  1. #1
    Join Date
    Jun 2002
    Location
    Singapore
    Posts
    71

    Unanswered: Any Faster Way To Process This?

    Hello All.

    I have this script that I need to run every Saturday evening and it takes more than 5 hours to complete. Is there a better way to structure my script? Please advise. Thank you.


    update Table_A
    set financial_yr = t2.Master_financial_year,
        financial_period = t2.Master_financial_period
    from Master_financial_Table t2
    where frst_requested_date >= t2.Start_date
      and frst_requested_date <= t2.End_date
      and financial_year > 2000


    There are a total of 4.6 million records (and growing weekly) in Table_A, and 42 records in Master_financial_Table (a standard lookup table).

    Indexes have been created for these tables.

  2. #2
    Join Date
    Nov 2003
    Location
    FRANCE
    Posts
    393

    Re: Any Faster Way To Process This?

    Maybe this will be faster:
    this way you don't read Master_financial_Table 4 million times.

    Declare @Master_financial_year DateTime
    Declare @Master_financial_period DateTime
    Declare @Start_date DateTime
    Declare @End_date DateTime

    Select @Master_financial_year=Master_financial_year,
    @Master_financial_period=Master_financial_period,
    @Start_date=Start_date,
    @End_date=End_date
    From Master_financial_Table

    Update Table_A
    Set financial_yr=@Master_financial_year,
    financial_period = @Master_financial_period
    Where frst_requested_date between @Start_date and @End_date and
    financial_year>2000

  3. #3
    Join Date
    Nov 2003
    Posts
    94
    I suspect that Table_A has been given a non-clustered index on frst_requested_date. Given that “where frst_requested_date >= t2.Start_date and frst_requested_date <= t2.End_date” is going to select a very large number of records, if Table_A also has a clustered index, you should consider removing it.

    In SQL Server 2000, if a table has a clustered index, the leaf nodes of all non-clustered indexes contain the key values of the clustered index corresponding to the index match. So, when a non-clustered index match is found it is followed by a bookmark-lookup on the table’s clustered index. On *very* large tables even a highly specific index can return many thousands of records, thereby inducing thousands of clustered index seeks in order to resolve the secondary bookmark lookup. SQL Server should be smart enough to realize a table scan is going to be better, but often it doesn’t, and so very large clustered tables exhibit unpleasant performance when a query is conducted through a non-clustered index.

    If a table is non-clustered (or heaped), the leaf nodes of all its indexes consist of offset pointers directly into the data blob. This has a significant performance risk in that any index page splitting becomes expensive to fixup on insert or update, but all indexes recover their data by jumping directly to the data row without thousands or millions of bookmark lookups or full blown table scans.

    If you can generate a query plan and discover many bookmark lookups are taking place, it’s worth considering unclustering Table_A.
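
    A quick way to check in SQL Server 2000 is to capture the plan text and search it for Bookmark Lookup operators. A minimal sketch, using the original poster's update (with SHOWPLAN_TEXT on, the statement is compiled but not executed):

    set showplan_text on
    go
    update Table_A
    set financial_yr = t2.Master_financial_year,
        financial_period = t2.Master_financial_period
    from Master_financial_Table t2
    where frst_requested_date >= t2.Start_date
      and frst_requested_date <= t2.End_date
      and financial_year > 2000
    go
    set showplan_text off
    go
    -- look through the returned plan for "Bookmark Lookup" operators;
    -- a lookup nested under a large index seek is the pattern described above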

  4. #4
    Join Date
    Oct 2001
    Location
    Naples, FL
    Posts
    273
    If a table is non-clustered (or heaped), the leaf nodes of all its indexes consist of offset pointers directly into the data blob. This has a significant performance risk in that any index page splitting becomes expensive to fixup on insert or update, but all indexes recover their data by jumping directly to the data row without thousands or millions of bookmark lookups or full blown table scans.
    I disagree. If the table is a heap, the pointers point to the RID (row identifier), which encodes the file, page, and row. Not sure what you mean by data blob.

    and so very large clustered tables exhibit unpleasant performance when a query is conducted through a non-clustered index.
    I disagree again. SQL will choose a clustered index scan when a full scan is more efficient; a clustered index seek actually uses the index, and SQL will usually choose to ignore the NCI and just scan the table (a clustered index scan). Also, while you mention page splitting, a heap reclaims empty space, which is horrible because it takes time for SQL to find that space on insert. And choosing the right clustered key keeps your inserts fast and your NCI lookups fast, since your key should be small (smaller than the RID lookup).

    I have found that the occasions when a clustered index should not be on a table are very rare.

    For the poster: experiment with your indexes and play with the "set statistics io on" command. It shows how many page reads your query costs, which lets you tell whether removing or adding an index will really benefit you or hurt you.
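
    A minimal sketch of that (the date range here is made up; substitute a representative slice of your data):

    set statistics io on
    go
    select count(*)
    from Table_A
    where frst_requested_date >= '20030101'   -- hypothetical range
      and frst_requested_date <  '20030401'
    go
    set statistics io off
    go
    -- the Messages output reports per-table I/O, along the lines of:
    --   Table 'Table_A'. Scan count 1, logical reads N, physical reads 0, ...
    -- re-run after an index change and compare the logical reads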

    HTH
    ---------------
    Ray Higdon MCSE, MCDBA, CCNA

  5. #5
    Join Date
    Nov 2003
    Posts
    94
    Originally posted by rhigdon
    I disagree. If the table is a heap, the pointers point to the RID (row identifier), which encodes the file, page, and row. Not sure what you mean by data blob.

    I disagree again. SQL will choose a clustered index scan when a full scan is more efficient; a clustered index seek actually uses the index, and SQL will usually choose to ignore the NCI and just scan the table (a clustered index scan). Also, while you mention page splitting, a heap reclaims empty space, which is horrible because it takes time for SQL to find that space on insert. And choosing the right clustered key keeps your inserts fast and your NCI lookups fast, since your key should be small (smaller than the RID lookup).

    I have found that the occasions when a clustered index should not be on a table are very rare.

    For the poster: experiment with your indexes and play with the "set statistics io on" command. It shows how many page reads your query costs, which lets you tell whether removing or adding an index will really benefit you or hurt you.

    HTH
    You are quite welcome to disagree. However, very large data tables behave differently from more modest deployments.

    I quote from http://www.sql-server-performance.co...analysis5d.asp

    "The key observation for multi-row select queries is that there can be a very wide discrepancy between the point where query optimizer switches the execution plan to a Table Scan and the actually observed cross-over point."

    "Other important points include the following. Bookmark Lookups are less expensive for heap organized tables than tables with a clustered index. It is frequently recommended that tables have a clustered index. If clustering only benefits a small fraction of the queries (weighted by the number of rows involved), then it may be better to leave the table a heap."

  6. #6
    Join Date
    Oct 2001
    Location
    Naples, FL
    Posts
    273
    "Other important points include the following. Bookmark Lookups are less expensive for heap organized tables than tables with a clustered index. It is frequently recommended that tables have a clustered index. If clustering only benefits a small fraction of the queries (weighted by the number of rows involved), then it may be better to leave the table a heap."
    I agree, if you will not be having a lot of inserts or deletes; the reclaiming of empty space is non-optimal if either is occurring. The other benefit, that key locks with a clustered index are more efficient than row locks, should be taken into consideration if you will be joining to the table. I would be really interested to see the poster's logical IO with and without a clustered index. Very few tables I work with are read-only or used solely for querying; the only place I really have that is in a warehouse, which I query through cubes anyway rather than SQL.

    Looks like an interesting article, will have a read.
    ---------------
    Ray Higdon MCSE, MCDBA, CCNA

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Karolyn, limteckboon's query is not going to make four million passes through the table. That's ridiculous. Your solution is functionally equivalent to his, but with extra coding.

    limteckboon,
    It's unclear whether the fields frst_requested_date and financial_year are in Table_A or Master_financial_Table. It makes a difference in how the query should be written; please clarify. If they are in Table_A, then a JOIN instead of a WHERE clause might improve efficiency:

    update Table_A
    set financial_yr = t2.Master_financial_year,
        financial_period = t2.Master_financial_period
    from Table_A
    inner join Master_financial_Table t2
        on Table_A.frst_requested_date between t2.Start_date and t2.End_date
        and Table_A.financial_year > 2000

    In the meantime, you may get better performance if you DROP the indexes on Table_A prior to your update, especially any indexes involving columns financial_yr and financial_period. Add the indexes back in at the end of the process.
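
    For example (a sketch only; ix_ta_financial is a hypothetical index name, so list the real ones with sp_helpindex 'Table_A' first):

    -- SQL Server 2000 syntax for dropping and rebuilding a non-clustered index
    drop index Table_A.ix_ta_financial
    go
    -- ... run the big update here ...
    go
    create index ix_ta_financial on Table_A (financial_yr, financial_period)
    go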

    blindman

  8. #8
    Join Date
    Nov 2003
    Location
    FRANCE
    Posts
    393
    My query is not equivalent...

    There are no JOINs in my query,
    so the UPDATE will be executed faster.

  9. #9
    Join Date
    Nov 2003
    Location
    FRANCE
    Posts
    393
    Reindexing a 4-million-row table must cost a lot...

  10. #10
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    It costs less than continuously reshuffling the index pages.

  11. #11
    Join Date
    Jun 2002
    Location
    Singapore
    Posts
    71
    Hello All.

    Thank you very much for all your most valuable advice; it is greatly appreciated. Now I need time to digest it. I will try both suggested queries to see which is more suitable for me, but I am sure either one will perform better than mine.

    Once again, thanks a million to all.


    Best regards

  12. #12
    Join Date
    Sep 2003
    Posts
    522
    I don't see a difference between Karolyn's and blindy's code, except for the "extra coding", which actually eliminates a whole table from the update (that's actually a plus, blindy).

    limteckboon:
    Consider non-conventional data modifications. I guarantee that we can shrink the execution time to under 60 minutes, maybe even under 30, if you stop listening to the blabbering about indexing and inner joins.

    Did I get your attention?

  13. #13
    Join Date
    Nov 2003
    Location
    FRANCE
    Posts
    393
    ms_sql_dba posted in another thread

    "i've done this type of update on larger number of rows without using update. the trick? use queryout on bcp with values that you want to have, then truncate and bcp data back in."

    To build on this idea:
    ---> using BULK INSERT to load the data back in is faster than bcp
    ---> and set the 'select into/bulkcopy' option to True with this command:

    exec sp_dboption 'DatabaseName', 'select into/bulkcopy', 'TRUE'
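
    Roughly, the whole trick would look like this (a sketch under assumptions: YourDb, YourServer, and the file path are placeholders, and the select list must be widened to include all of Table_A's columns, in table order, for a native-format reload):

    -- 1. Export the rows with the new values already joined in
    --    (one line, run from a command prompt; -n = native format, -T = trusted connection):
    -- bcp "select t2.Master_financial_year, t2.Master_financial_period, a.frst_requested_date from YourDb..Table_A a join YourDb..Master_financial_Table t2 on a.frst_requested_date between t2.Start_date and t2.End_date" queryout c:\table_a.dat -n -T -S YourServer

    -- 2. Enable minimally logged loads, empty the table, and reload it:
    exec sp_dboption 'YourDb', 'select into/bulkcopy', 'TRUE'
    truncate table Table_A
    bulk insert Table_A from 'c:\table_a.dat' with (datafiletype = 'native')
    exec sp_dboption 'YourDb', 'select into/bulkcopy', 'FALSE'

    Dropping Table_A's indexes before the BULK INSERT and recreating them afterwards would make the load faster still.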

  14. #14
    Join Date
    Nov 2003
    Location
    FRANCE
    Posts
    393
    simple... *** admiration ***

  15. #15
    Join Date
    Oct 2003
    Posts
    706

    Re: Any Faster Way To Process This?

    If you're going to examine 4 million rows, a table scan is almost certainly going to be done, but I doubt that the scanning time is going to be the issue. What will certainly be an issue is the fact that you are updating every single one of them, every single time!
    • Once you've processed 4 million records this week, how many of them are really going to change by next week? Your query should select, and update the status of, only those records which have changed (or that do not have a known status yet). It seems to me that if you are keying off the "first-requested" date, not many records will ever change status a second time! Take full advantage of that!
    • If you must make the change to all the records, write a script that processes 1,000 records at a time, then commits; otherwise the server must be prepared to roll back every one of those changes! (Both of these points are sketched in the code after this list.)
    • Any change to an indexed column causes the index to be updated, every time. This should be avoided.
    • Just as an afterthought: could these be calculated fields? I mean, with only forty-something date ranges in total ...
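
    A sketch combining the first two points (it assumes financial_yr is NULL on rows that have not been stamped yet; adjust the filter to however "already processed" shows up in your data):

    -- SQL Server 2000 batching via SET ROWCOUNT: each pass updates at most
    -- 1000 rows, and in autocommit mode each statement commits on its own
    set rowcount 1000
    while 1 = 1
    begin
        update Table_A
        set financial_yr = t2.Master_financial_year,
            financial_period = t2.Master_financial_period
        from Table_A
        inner join Master_financial_Table t2
            on Table_A.frst_requested_date between t2.Start_date and t2.End_date
        where Table_A.financial_year > 2000
          and Table_A.financial_yr is null     -- only rows not yet stamped (assumption)
        if @@rowcount = 0 break
    end
    set rowcount 0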

    No matter how efficient a computer or a piece of software may be, the best way to get good performance out of it is to ask it to do absolutely no more than it has to. I think that the root cause of the problem is that you are making the server do far too much work.
    ChimneySweep(R): fast, automatic
    table repair at a click of the
    mouse! http://www.sundialservices.com
