Page 1 of 2 12 LastLast
Results 1 to 15 of 27
  1. #1
    Join Date
    Sep 2003
    Posts
    54

    Unanswered: How to optimize this sql statement?

    I have a very simple T-SQL statement which takes hours to run. The statement is something like:

    update table1 set column1 = coalesce(table2.column1, ''),
    column2 = table2.column2, column3 = table2.column3
    from table1, table2
    where table1.column4 = table2.column4

    The machine is a P3 800 512 ram and 80G HD with 8M cache. I'm running Windows 2000 Server and SQL Server 2000. Table1 has about 500,000 records and Table2 has about 2 millions records. I have added non-clustered indexes for table1.column4 and table2.column4.

    When I check Execution Plan, it has Hash Match/Aggregate which costs about 50%.


    Thank you for any help.

  2. #2
    Join Date
    Sep 2003
    Posts
    522
    i'd replace your where clause with inner join...on... but that's just semantics. if you have an index on the joining fields that's pretty much it. if number of matching values in table2.column4 is relatively small (let's say 100 or so) you can create a temp table by selecting distinct column4 from table2, and then do an update of table1 by joining it with the temp table.

  3. #3
    Join Date
    Sep 2003
    Posts
    54
    Table1 and Table2 are parent and child tables. Each table1 records are related to about 5 or more table2 records. Does it make sense the query takes hours to run?

    Also, I notice the process is sleeping mode and has wait type pageiolatch.

  4. #4
    Join Date
    Sep 2003
    Posts
    522
    this is what i found on this waittype:

    Short-term light-weight synchronization objects. Latches are not held for the duration of a transaction. Typical latching operations occur during row transfers to memory, controlling modifications to row offset table, etc. Consequently, latch duration is normally sensitive to available memory.

    If the wait is significant, it normally indicates cache contention.
    also look at this site for more info.

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    "512 ram"? Tell me that's not 512K ram? If so, thats a problem right there.

    You are updating parent columns 1, 2, and 3 from the child table? Are these parent columns part of the parent table's clustered index (traditionally the first columns of a table are part of the clustered index). If so, that is also going to be expensive, and I'd have to wonder about your database schema if you are updating a parent's clustered index from a child's values.

    Lastly, make sure the execution plan is using the indexes you have added.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  6. #6
    Join Date
    Sep 2003
    Posts
    522
    i don't think you can find a 512k memory stick that would fit a piii-800 server. i'd say it's 512meg that the poster refers to. though the design could have been a little better (agree on positioning of clustered index at the head of the field list). have you tried to convert your update to a select with a count(table2.column4)? once you do that, look at the plan and see if you get an index seek.

  7. #7
    Join Date
    Sep 2003
    Posts
    54
    I find most of the hardware/software information provided in the original post is not correct. Here is the updated information:

    Operating System: Microsoft Windows NT - 5.0(2195)
    Product Version: 8.00.760 (SP3)
    OS memory: 256(MB)
    Hard Drive: 12.6 GB

    update table1 set column1 = coalesce(table2.column1, ''),
    column2 = table2.column2, column3 = table2.column3
    from table1, table2
    where table1.column4 = table2.column4

    Table1: 1343488 records
    Table2: 524288 records

    It took about 7 hrs to run the T-SQL statement. Does the Estimated Execution Plan provide some hint how long the T-SQL statement will run?

    This is the Estimated Execution Plan from SQL Query Analyzer:

    Update
    Cost: 0%

    Physical operation: UPDATE
    Logical operation: UPDATE
    Estimated row count: 1148114
    Estimated subtree cost: 307

    Table Update/Update
    Cost: 0%

    Physical operation: Table Update
    Logical operation: Update
    Estimated row count: 1148114
    Estimated row size: 21
    Estimated I/O cost: 0.0103
    Estimated CPU cost: 1.15
    Estimated number of executes: 1.0
    Estimated cost: 1.158450(0%)
    Estimated subtree cost: 307

    Compute Scalar
    Cost: 0%

    Physical operation: Compute Scalar
    Logical operation: Compute Scalar
    Estimated row count: 1148114
    Estimated row size: 63
    Estimated I/O cost: 0.000000
    Estimated CPU cost: 0.114
    Estimated number of executes: 1.0
    Estimated cost: 0.114800(0%)
    Estimated subtree cost: 306

    Top
    Cost: 0%

    Physical operation: Top
    Logical operation: Top
    Estimated row count: 1148114
    Estimated row size: 63
    Estimated I/O cost: 0.000000
    Estimated CPU cost: 0.114
    Estimated number of executes: 1.0
    Estimated cost: 0.114800(0%)
    Estimated subtree cost: 305

    Hash Match/Aggregate
    Cost: 55%

    Physical operation: Hash Match
    Logical operation: Aggregate
    Estimated row count: 1148114
    Estimated row size: 63
    Estimated I/O cost: 106
    Estimated CPU cost: 64.0
    Estimated number of executes: 1.0
    Estimated cost: 169.770770(55%)
    Estimated subtree cost: 305

    Hash Match/Inner Join
    Cost: 55%

    Physical operation: Hash Match
    Logical operation: Inner Join
    Estimated row count: 1148114
    Estimated row size: 83
    Estimated I/O cost: 27.7
    Estimated CPU cost: 38.1
    Estimated number of executes: 1.0
    Estimated cost: 65.797972(21%)
    Estimated subtree cost: 136

    Table Scan
    Cost: 21%

    Physical operation: Table Scan
    Logical operation: Table Scan
    Estimated row count: 524288
    Estimated row size: 73
    Estimated I/O cost: 64.8
    Estimated CPU cost: 0.576
    Estimated number of executes: 1.0
    Estimated cost: 65.341042(21%)
    Estimated subtree cost: 65.3

    Index Scan
    Cost: 1%

    Physical operation: Index Scan
    Logical operation: Index Scan
    Estimated row count: 1343488
    Estimated row size: 25
    Estimated I/O cost: 2.99
    Estimated CPU cost: 1.48
    Estimated number of executes: 1.0
    Estimated cost: 4.468086(1%)
    Estimated subtree cost: 4.47

    When I run the script, I check the Process Info and see status: sleeping and wait type: pageiolatch.

    I plan to run the script on a better machine to see how long it takes:
    P4 2.60 GHz
    768MB of RAM
    Windows XP Professional SP1
    SQL Server 2000 SP3
    Hard Drive 1: 30GB 2MB Buffer (tempdb)
    Hard Drive 2: 80GB 8MB Buffer (my database)


    Thank you for any suggestion.

  8. #8
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Look, start by taking ms_sql_dba's suggestion and change your SQL statement to use a JOIN instead of the WHERE clause:

    update table1 set column1 = coalesce(table2.column1, ''),
    column2 = table2.column2, column3 = table2.column3
    from table1 inner join table2 on table1.column4 = table2.column4

    Then please answer my questions from my previous post.

    Also, are these tables being accessed by other logins during the process?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  9. #9
    Join Date
    Sep 2003
    Posts
    54
    Hi blindman,

    Sorry for not replying your post in my last post.

    1. The memory is 256MB.
    2. I have not changed to use inner join since it should not affect performance but I agree that I should change it to follow ANSI standard.
    3. Column1, Column2, Column3 are not part of the clustered index.
    4. I have included the Estimated Execution Plan in my last post. It seems that the table2 (the parent table) index will not be used since Table Scan will be used and table1 (the child table) index will be used since Index Scan will be used.
    5. I'm the only person accessing that SQL Server.

    Hi ms_sql_dba,

    I have looked at the Estimated Execution Plan when I change the statement to:

    select count(table1.column4)
    from table1, table2
    where table1.column4 = table2.column4

    or

    select count(table2.column4)
    from table1, table2
    where table1.column4 = table2.column4

    The Plan will have:
    Hash Match/Inner Join: 76%
    Index Scan (table2): 6%
    Index Scan (tabel1): 17%

    Thank you for any help.

  10. #10
    Join Date
    Apr 2004
    Location
    Kansas City, MO
    Posts
    734
    1. Change your join to an INNER LOOP JOIN.
    --And yes you will have to change the join type because it won't work otherwise.
    --See what that does for you. Sometimes this can be more efficient, although SQL Server is supposed to figure that out for you.
    2. How many records are in the parent table? Try running again with the INNER LOOP JOIN and see if you still get that table scan still. If you do, see what it does when you put an index hint on it (not normally recommended either).

    UPDATE t1
    SET
    t1.column1 = coalesce(t2.column2, ''),
    t1.column2 = t2.column2,
    t1.column3 = t2.column3
    FROM
    table1 t1
    INNER LOOP JOIN table2 t2 on t1.column4 = t2.column4
    MeanOldDBA
    derrickleggett@hotmail.com
    When life gives you a lemon, fire the DBA.

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    256 MB is not nearly enough memory. Add memory and you will increase your processing speed.

    INNER LOOP JOIN?

    OK, I'll bite. What does the LOOP do beyond a straight INNER JOIN?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  12. #12
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    I think what ms_sql_dba was referring to was INDEX SEEK. If you're getting INDEX SCAN, it may actually be even worse than TABLE SCAN.

    BOL:

    Arguments
    LOOP | HASH | MERGE

    Specifies that the join in the query should use looping, hashing, or merging. Using LOOP | HASH | MERGE JOIN enforces a particular join between two tables.

    This is not very clear without further info, so I'm going to have a look at the differences and see if Mr. Leggett is right (hey, no offense, but I don't take any advice without knowing if it's not a hoax )
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  13. #13
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    OK, Mr. Leggett, I think you're very close, in fact, too close to be ignored. Based on your post you actually looked at the "Hash Match/Inner Join" section of the plan. LOOP join hint may actually improve the execution as opposed to HASH which the optimizer appeared to have chosen. Have you tried that, pcsql?
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  14. #14
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    select * from authors a inner join titleauthor ta on a.au_id = ta.au_id
    --Table 'titleauthor'. Scan count 23, logical reads 46, physical reads 0, read-ahead reads 0.
    --Table 'authors'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

    select * from authors a inner LOOP join titleauthor ta on a.au_id = ta.au_id
    --Table 'titleauthor'. Scan count 23, logical reads 46, physical reads 0, read-ahead reads 0.
    --Table 'authors'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0.


    select * from authors a inner HASH join titleauthor ta on a.au_id = ta.au_id
    --Table 'titleauthor'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
    --Table 'authors'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.


    select * from authors a inner MERGE join titleauthor ta on a.au_id = ta.au_id
    --Table 'titleauthor'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
    --Table 'authors'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  15. #15
    Join Date
    Sep 2003
    Posts
    54
    Hi Everyone,

    I use the Index Tuning Wizard and add a nonclustered index for all the columns in the statement. The index contains Column1, Column2, Column3, Column4.
    With this index, the statement took about 15 minutes or less. The Estimated Execution Plan:

    Update - Cost 0%
    Table Update/Update - Cost 1%
    Compute Scalar - Cost 0%
    Top - Cost 0%
    Stream Aggregate/Aggregate - Cost 5%
    Nested Loops/Inner Join - Cost 3%
    Index Scan (table1) - Cost 3%
    Index Seek (table2) - Cost 88%

    Can someone explain to me why this index will speed the performance?

    Now, I'm working on few other statements which have similar problem but the record sets are even larger. I'm currently trying the inner loop join suggested by derrickleggett on one of the statement.

    Thank you for any help.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •