  1. #1
    Join Date
    Jan 2008
    Posts
    14

    Unanswered: How can this query be improved to increase performance

I have a table with 26 million rows. I have 9 queries that take at least 16 hours to complete. The queries are run every three months, but I would like to change them as they take too long to process. I have a feeling the design of the query can be improved.

All 9 queries look like this:
    Code:
    update transaction
    set counted='L'
    where counted is null 
    and 
    (
     user_id like 'AB12EFXX%'
     or user_id like 'ABCD1XX%'
     or user_id like 'ABCDE2FXX%'
     or user_id like 'ABCDEF3XX%'
     or user_id like 'ABC4DEFXX%'
     or user_id like 'ABCD5EFXX%'
     or user_id like 'ABCDE6FXX%'
     or user_id like 'ABCDE7FXX%'
     or user_id like 'ABCDE8FXX%'
     or user_id like 'ABCDE9FXX%'
     or user_id like 'ABCDE11FXX%'
     or user_id like 'ABCDEFXX12%'
     or user_id like 'ABCDEFXX45%'
     or user_id like 'ABCDEFXXFTYT%'
     or user_id like 'ABCDEFXXD%'
     or user_id like 'ABCDEFXX9%'
     or user_id like 'ABCDEFXX8%'
     or user_id like 'ABCDEFX7X%'
     or user_id like 'ABCDEF6XX%'
     or user_id like 'ABCDEFX5GX%'
     or user_id like 'ABCDEFXXX%'
     or user_id like 'ABCDEF3XXX%'
     or user_id like 'ABCDEF1XX%'
     or user_id like 'ABCDEF0XX%'
    )
I had a think about this, and I believe what it's doing is checking each of the above patterns against all 26 million rows. The query above has about 25 patterns, so it must be going through the 26 million rows 25 times. And it does this 9 times because I have 9 scripts.

This is just a guess, but if true it explains why it takes at least 16 hours to complete. Now I'm looking for ideas and suggestions as to how the above can be improved, as that is where I am failing.

All the other queries use the same syntax; the only difference is that some use "NOT LIKE" instead of "LIKE".

    Thanks in advance

    ps. user_id is indexed
    Last edited by @ziggy; 01-21-08 at 02:19.

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    are you sure that query runs? it looks likely to produce syntax errors, because all those LIKEs need to be connected by ORs
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jan 2008
    Posts
    14
Yes, you are right. I just typed an example of how the query looks and forgot the ORs. I've corrected it.

  4. #4
    Join Date
    Jul 2003
    Location
    Amsterdam, Nederland
    Posts
    450

What about this:

    update transaction
    set counted='L'
    where counted is null
    and user_id in
(SELECT user_id FROM transaction WHERE
    user_id like 'AB12EFXX%'
    or user_id like 'ABCD1XX%'
    or user_id like 'ABCDE2FXX%'
    or user_id like 'ABCDEF3XX%'
    or user_id like 'ABC4DEFXX%'
    or user_id like 'ABCD5EFXX%'
    or user_id like 'ABCDE6FXX%'
    or user_id like 'ABCDE7FXX%'
    or user_id like 'ABCDE8FXX%'
    or user_id like 'ABCDE9FXX%'
    or user_id like 'ABCDE11FXX%'
    or user_id like 'ABCDEFXX12%'
    or user_id like 'ABCDEFXX45%'
    or user_id like 'ABCDEFXXFTYT%'
    or user_id like 'ABCDEFXXD%'
    or user_id like 'ABCDEFXX9%'
    or user_id like 'ABCDEFXX8%'
    or user_id like 'ABCDEFX7X%'
    or user_id like 'ABCDEF6XX%'
    or user_id like 'ABCDEFX5GX%'
    or user_id like 'ABCDEFXXX%'
    or user_id like 'ABCDEF3XXX%'
    or user_id like 'ABCDEF1XX%'
    or user_id like 'ABCDEF0XX%'
    )

  5. #5
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
How many records are updated, approximately?

In this example, "counted is null" prevents using an index on the COUNTED column, as NULL values are not indexed. But if there are not many rows to update compared to the 26 million total (say, fewer than 1,000,000), you'd be better off creating an index on the COUNTED column and storing a special value for the rows to be processed instead of NULL.

At least that is worth a try.

Don't forget to gather statistics on the index before testing.
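
A rough sketch of that approach (the index name and the 'N' marker value here are just illustrative, not from the original poster):

Code:
-- index the flag column; only useful once NULL is replaced by a real value
create index transaction_counted_ix on transaction (counted);

-- one-time backfill: mark unprocessed rows with 'N' instead of NULL
update transaction set counted = 'N' where counted is null;
commit;

-- gather statistics so the optimizer knows how selective the index is
begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'TRANSACTION',
    cascade => true);  -- also gathers index statistics
end;
/
The quarterly updates would then filter on counted = 'N' instead of counted is null.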

Which version of Oracle are you using?

    Regards,

    rbaraer
    ORA-000TK : No bind variable detected... Shared Pool Alert code 5 - Nuclear query ready .

  6. #6
    Join Date
    Jun 2004
    Location
    Liverpool, NY USA
    Posts
    2,509
It is only scanning once, but it has to look at every row because NULL values are not normally indexed. However, there is an old trick. Say you have a column that will never be null (primary key, anyone?). In the example, we will call the always-not-null column key1 and the sometimes-null column null1. Make an index as follows:

    create index my_table_i1 on my_table(null1,key1);

    This will allow the null columns to be indexed and found quickly in a scan.
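
For the table in this thread, that might look like the following sketch (assuming a NOT NULL primary key column, called trans_id here purely for illustration):

Code:
-- because trans_id is never null, every row gets an index entry,
-- including the rows where counted is null
create index transaction_counted_i1 on transaction (counted, trans_id);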
    Bill
    You do not need a parachute to skydive. You only need a parachute to skydive twice.

  7. #7
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Quote Originally Posted by beilstwh
However, there is an old trick. Say you have a column that will never be null (primary key, anyone?). In the example, we will call the always-not-null column key1 and the sometimes-null column null1. Make an index as follows:

    create index my_table_i1 on my_table(null1,key1);

    This will allow the null columns to be indexed and found quickly in a scan.
Nice trick indeed, but the index could be big with 26 million rows.

Worth giving it a try anyway, you're right.

    rbaraer
    ORA-000TK : No bind variable detected... Shared Pool Alert code 5 - Nuclear query ready .

  8. #8
    Join Date
    Jul 2007
    Posts
    96
I'm going to give a suggestion, but I should say up front that it's based on my limited knowledge of databases. So just let me know if my assumptions are wrong...

Your user IDs seem to be built by concatenating a variable prefix with the rest of the ID. If you stored those parts separately, it might be faster to fetch them.

    Say you have a userId_Prefix table that has the following data:
    Id [PK] [Unique] [Clustered Index]
    'AB12EFXX'
    'ABCD1XX'
    'ABCDE2FXX'
    'ABCDEF3XX'
    'ABC4DEFXX'
    'ABCD5EFXX'
    'ABCDE6FXX'
    'ABCDE7FXX'
    'ABCDE8FXX'
    'ABCDE9FXX'
    'ABCDE11FXX'
    'ABCDEFXX12'
    'ABCDEFXX45'
    'ABCDEFXXFTYT'
    'ABCDEFXXD'
    'ABCDEFXX9'
    'ABCDEFXX8'
    'ABCDEFX7X'
    'ABCDEF6XX'
    'ABCDEFX5GX'
    'ABCDEFXXX'
    'ABCDEF3XXX'
    'ABCDEF1XX'
    'ABCDEF0XX'

With this in place, your users table would be much more efficient if the user id were actually stored in two columns: prefixId & theRest.

In my opinion this change would significantly boost your query performance. It would also be much more friendly to the reporting side of life. But then again, I might be wrong...

Updating your system to reflect this design would be a one-time cost that could be undertaken during a maintenance window. And, if implemented correctly, it would be transparent to the application and therefore to the users. A sketch of what this might look like is below.
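
All table and column names here are hypothetical, just to show the shape of the idea:

Code:
-- hypothetical lookup table holding the known prefixes
create table userid_prefix (
  id varchar2(12) primary key
);

-- one-time migration: split user_id into two columns
alter table transaction add (prefix_id varchar2(12), id_rest varchar2(30));

-- the quarterly update becomes an equality match instead of 24 LIKEs
update transaction t
set t.counted = 'L'
where t.counted is null
  and t.prefix_id in (select p.id from userid_prefix p);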

  9. #9
    Join Date
    Jun 2004
    Location
    Liverpool, NY USA
    Posts
    2,509
Just an idea: if the flag needs to be set, why not create a trigger on your transaction table that sets the counted flag when the data matches your particular pattern? That means the flag would be set at the point of insert. A minimal sketch of such a trigger, reusing (and abbreviating) the patterns from the first post:

Code:
create or replace trigger transaction_counted_trg
before insert on transaction
for each row
begin
  -- set the flag at insert time instead of in a quarterly batch update
  if :new.counted is null
     and (:new.user_id like 'AB12EFXX%'
          or :new.user_id like 'ABCD1XX%'
          -- ... remaining LIKE patterns from the original query ...
          or :new.user_id like 'ABCDEF0XX%')
  then
    :new.counted := 'L';
  end if;
end;
/
    Bill
    You do not need a parachute to skydive. You only need a parachute to skydive twice.

  10. #10
    Join Date
    Jan 2008
    Posts
    14
    Quote Originally Posted by RBARAER
How many records are updated, approximately?

In this example, "counted is null" prevents using an index on the COUNTED column, as NULL values are not indexed. But if there are not many rows to update compared to the 26 million total (say, fewer than 1,000,000), you'd be better off creating an index on the COUNTED column and storing a special value for the rows to be processed instead of NULL.

At least that is worth a try.

Don't forget to gather statistics on the index before testing.

Which version of Oracle are you using?

    Regards,

    rbaraer
Out of the 26M rows, about half will be updated. The database version is 10gR2.

  11. #11
    Join Date
    Jan 2008
    Posts
    14
    The idea with the trigger sounds very interesting. Does this have any disadvantages?

  12. #12
    Join Date
    Jul 2007
    Posts
    96
    Quote Originally Posted by @ziggy
    The idea with the trigger sounds very interesting. Does this have any disadvantages?
Your DML scripts executed against the triggered table will suffer a performance hit, since you're doing the validations/updates every time a user is inserted instead of doing a bulk update during a maintenance routine.

If implemented right it shouldn't be _that_ significant, or even noticeable at all...

  13. #13
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    >I have a table with 26 million rows. I have 9 queries that take at least 16 hours to complete.
    That is about 2 hours per query.
    26M rows is NOT a big table.
    I would suggest running SQL_TRACE & TKPROF to actually see where the time is being spent.
    Oracle should be able to spin across 26M rows in a couple of minutes.
IMO, something else is happening that either you are not telling us or don't know about.
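
For reference, a sketch of how the tracing might be set up (the exact trace file location and tkprof invocation depend on your environment):

Code:
alter session set tracefile_identifier = 'upd_counted';
alter session set sql_trace = true;

-- run the slow UPDATE here, then:
alter session set sql_trace = false;

-- on the database server, format the raw trace with tkprof, e.g.:
-- tkprof <trace_file>.trc upd_counted_report.txt sys=no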

    Code:
    15:04:07 SQL> select count(*) from  dw_2000000010662.line_item;
    
      COUNT(*)
    ----------
      39204708
    
    15:05:39 SQL>
    My DB just did 39M+ rows in under TWO minutes.
    Last edited by anacedent; 01-21-08 at 19:07.
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  14. #14
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Quote Originally Posted by anacedent
    >I have a table with 26 million rows. I have 9 queries that take at least 16 hours to complete.
    That is about 2 hours per query.
    26M rows is NOT a big table.
    I would suggest running SQL_TRACE & TKPROF to actually see where the time is being spent.
    Oracle should be able to spin across 26M rows in a couple of minutes.
IMO, something else is happening that either you are not telling us or don't know about.

    Code:
    15:04:07 SQL> select count(*) from  dw_2000000010662.line_item;
    
      COUNT(*)
    ----------
      39204708
    
    15:05:39 SQL>
    My DB just did 39M+ rows in under TWO minutes.
You're right, anacedent. The problem is that ORs are often (if not always) transformed into UNION ALLs, which can kill performance. In such a case, it would be roughly like running the query once per user_id pattern in a nested loop, so you would get 24 * 2 minutes = 48 minutes (yes, this is very approximate). Anyway, I agree: an execution plan, or even better a tkprof report, could really help us.

If it can be done via a trigger, then I don't see why it is not done directly by the insert statements...? I personally would never advise using triggers (with the exception of INSTEAD OF triggers on views), because they hide business logic and make things much more complex as the application grows. I thought this was some sort of update done after batch processing of the rows. Is it not?

    Regards,

    rbaraer
    ORA-000TK : No bind variable detected... Shared Pool Alert code 5 - Nuclear query ready .

  15. #15
    Join Date
    Dec 2003
    Posts
    1,074
In 10.1.0.4, it looks like an index on user_id would be utilized. That's what I expected. Why does everyone think the test for NULL on a non-key field would override Oracle's ability to use an index on a primary key, when both are included in the same query?

    Code:
    create table million (user_id varchar2(7) primary key, counted varchar2(1));
    - - -
    table created 
    
begin
 for i in 1 .. 100000
 loop
   insert into million (user_id) values (to_char(i));
 end loop;
end;
/
    
    select count(*) from million;
    - - - 
    100000
    
    update million
    set counted = 'L'
    where counted is null and
    (user_id like '10099%' or 
     user_id like '20099%' or 
     user_id like '30099%' or 
     user_id like '40099%' or 
     user_id like '50099%' or 
     user_id like '60099%' or 
     user_id like '70099%' or 
     user_id like '80099%' or 
     user_id like '90099%' );
    Explain Plan:
You see the UNION ALL effect (the CONCATENATION step), but the index on user_id is also being used. I think the statistics are the answer, unless the user_ids selected comprise a list so large that the index would be ignored anyway.

    Code:
    Operation	Object Name	Rows	Bytes	Cost	Object Node	In/Out	PStart	PStop
    
    UPDATE STATEMENT Optimizer Mode=ALL_ROWS		9  	 	9.02948185983382  	 	      	             	 
      UPDATE	FORBESC.MILLION	  	 	 	 	      	             	 
        CONCATENATION		  	 	 	 	      	             	 
          TABLE ACCESS BY INDEX ROWID	FORBESC.MILLION	1  	7  	1.00918399467754  	 	      	             	 
            INDEX RANGE SCAN	FORBESC.SYS_C00183187	1  	 	2.00215845593643  	 	      	             	 
          TABLE ACCESS BY INDEX ROWID	FORBESC.MILLION	1  	7  	1.0025747245411  	 	      	             	 
            INDEX RANGE SCAN	FORBESC.SYS_C00183187	1  	 	2.00171743761252  	 	      	             	 
          TABLE ACCESS BY INDEX ROWID	FORBESC.MILLION	1  	7  	1.00253189707984  	 	      	             	 
            INDEX RANGE SCAN	FORBESC.SYS_C00183187	1  	 	2.00169653219846  	 	      	             	 
          TABLE ACCESS BY INDEX ROWID	FORBESC.MILLION	1  	7  	1.00253187488574  	 	      	             	 
            INDEX RANGE SCAN	FORBESC.SYS_C00183187	1  	 	2.0016966324299  	 	      	             	 
          TABLE ACCESS BY INDEX ROWID	FORBESC.MILLION	1  	7  	1.00253187377603  	 	      	             	 
            INDEX RANGE SCAN	FORBESC.SYS_C00183187	1  	 	2.00169663744147  	 	      	             	 
          TABLE ACCESS BY INDEX ROWID	FORBESC.MILLION	1  	7  	1.00253187372055  	 	      	             	 
            INDEX RANGE SCAN	FORBESC.SYS_C00183187	1  	 	2.00169663769205  	 	      	             	 
          TABLE ACCESS BY INDEX ROWID	FORBESC.MILLION	1  	7  	1.00253187371777  	 	      	             	 
            INDEX RANGE SCAN	FORBESC.SYS_C00183187	1  	 	2.00169663770458  	 	      	             	 
          TABLE ACCESS BY INDEX ROWID	FORBESC.MILLION	1  	7  	1.00253187371763  	 	      	             	 
            INDEX RANGE SCAN	FORBESC.SYS_C00183187	1  	 	2.0016966377052  	 	      	             	 
          TABLE ACCESS BY INDEX ROWID	FORBESC.MILLION	1  	7  	1.00253187371763  	 	      	             	 
            INDEX RANGE SCAN	FORBESC.SYS_C00183187	1  	 	2.00169663770524
    --=cf
