  1. #1
    Join Date
    Oct 2014
    Posts
    268
    Provided Answers: 6

    Answered: Preferred method of updating tables?

    Right now I am leaning towards a temp table that pulls my aggregates in, then joins to the destination table on metric ID and year-month. But I noticed my boss's boss, who knows this stuff a lot better than I do, seemed to delete that whole range and do direct inserts instead.

    Same criteria, but a different approach. I like updating in the new results rather than dropping or deleting the contents. I like using temp tables too; it makes it easy to just SELECT INTO them and join off of them against the destination table that will be updated. Rough sketches of both approaches are below.
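
    For concreteness, here is roughly what the two approaches look like. All table, column, and variable names (dbo.SourceData, dbo.DestTable, MetricID, YearMonth, Amount, @StartYM, @EndYM) are made-up placeholders, and YearMonth is assumed to be an int like 201410:

    -- Approach 1: stage the aggregates in a temp table, then update in place.
    SELECT MetricID, YearMonth, SUM(Amount) AS Total
    INTO #Agg
    FROM dbo.SourceData
    GROUP BY MetricID, YearMonth;

    UPDATE d
    SET d.Total = a.Total
    FROM dbo.DestTable AS d
    JOIN #Agg AS a
        ON a.MetricID = d.MetricID
        AND a.YearMonth = d.YearMonth;

    -- Approach 2: delete the whole range, then insert fresh rows.
    DECLARE @StartYM int = 201401, @EndYM int = 201412;

    DELETE FROM dbo.DestTable
    WHERE YearMonth BETWEEN @StartYM AND @EndYM;

    INSERT INTO dbo.DestTable (MetricID, YearMonth, Total)
    SELECT MetricID, YearMonth, SUM(Amount)
    FROM dbo.SourceData
    WHERE YearMonth BETWEEN @StartYM AND @EndYM
    GROUP BY MetricID, YearMonth;

    One difference worth noting: the update-only version leaves any brand-new (MetricID, YearMonth) pairs untouched, while delete-and-insert picks them up automatically.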

  2. Best Answer
    Posted by Pat Phelan (see post #2 below)


  3. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I generally try to avoid temp tables, although I have no problem with using them when they are truly necessary.

    In SQL Server, tempdb is almost always the most heavily used database. When you run into problems scaling up an application, tempdb is frequently a bottleneck. You can mitigate this problem many ways, but if I have a choice I prefer to write code that won't have the problem instead of finding better ways to mitigate it. I've found that by thinking the problems through, using expressions generously, and sometimes doing "interesting" things with CTEs I can almost always avoid the need to use temporary tables.
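
    For example, the staging step in the sketch above can be folded into a single statement with a CTE (same hypothetical names as before). The optimizer may still spill work to tempdb internally, but there is no explicit temp table to create, populate, and drop:

    WITH Agg AS (
        SELECT MetricID, YearMonth, SUM(Amount) AS Total
        FROM dbo.SourceData
        GROUP BY MetricID, YearMonth
    )
    UPDATE d
    SET d.Total = a.Total
    FROM dbo.DestTable AS d
    JOIN Agg AS a
        ON a.MetricID = d.MetricID
        AND a.YearMonth = d.YearMonth;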

    If you have something like temp tables that works well for you, I don't object and wouldn't make you change until your hardware can no longer support the load, but you can be assured that there will come a point where you'll have to find a faster/lower-impact way to do things. It might be easier to follow your boss's lead and learn the more efficient way from the start instead of having to change your style/tactics in the middle of your career.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  4. #3
    Join Date
    Oct 2014
    Posts
    268
    Provided Answers: 6
    Database theory is such a funny thing. CTEs are great for smaller sets, but on larger sets they seem to struggle on performance. Temp tables are faster for larger sets, but like you mentioned, Pat, they tax tempdb more, potentially causing a bottleneck.

    Is there ever a perfect answer?

    I agree that ultimately I will need to follow his lead. He has been dealing with database analytics for over 10 years, whereas I have ~2 years of pure analytics and ~4 years of data cleansing, using Access, Excel, Notepad++, etc.

    So obviously my skill set is inferior, for now.

  5. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I've never tried a CTE with over two billion rows, but to me that's a respectable set of data. The performance was certainly acceptable to me, although I did need to change the clustered index on one of the six tables for that particular request.
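
    Swapping out a clustered index like that might look roughly like this (the index and table names here are invented; if the clustered index backs a primary key constraint, you'd drop and recreate the constraint instead):

    -- Rebuild the clustered index so its leading columns match the
    -- query's join/filter pattern. All names are hypothetical.
    DROP INDEX CIX_BigTable ON dbo.BigTable;

    CREATE CLUSTERED INDEX CIX_BigTable
        ON dbo.BigTable (MetricID, YearMonth);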

    As a general rule I try to avoid creating queries that return over a million rows or two gigabytes of data. These result sets aren't hard to generate and I rarely encounter performance issues, but they get to be rather challenging to process or even to send to most other applications.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.
