Results 1 to 3 of 3
  1. #1
    Join Date
    Mar 2001
    Location
    Lexington, KY
    Posts
    606

    Unanswered: SQL Performance Tuning

    I have four tables in our Oracle 9i DBMS:
    sorgeor( sorgeor_code, sorgeor_start_range, sorgeor_end_range )
    Cardinality 230,000

    spraddr( spraddr_pidm, spraddr_atyp_code, spraddr_status_ind, spraddr_zip, etc. )
    Cardinality 2,200,000

    aprcatg( aprcatg_pidm, aprcatg_donr_code )
    Cardinality 260,000

    spbpers( spbpers_pidm, spbpers_dead_ind )
    Cardinality 716,000

    SORGEOR is a ‘georegion’ table. Basically we define geographical areas and store the zip code ranges in the sorgeor table. For example, I might have a georegion row that looks like this:
    sorgeor{ OH_COL, 43080-0000, 43085-9999 }
    sorgeor{ OH_COL, 43090-0000, 43091-9999 }

    This means that we’ve defined the Columbus, OH metro georegion to include anyone whose zip code is between 43080-000, 43085-9999 or 43090-0000 to 43091-9999.

    I would be included in this list because my spraddr row looks like this:
    spraddr{ 12345, ‘MA’, NULL, 43081-1234 }

    My brother would not be, because his address has expired:
    spraddr{ 11111, ‘MA’, ‘INACTIVE’, 43082-3333 }

    My neighbor would not be included even though he lives in Columbus because he is deceased:
    spraddr{ 1234, ‘MA’, NULL, 43080 }
    spbpers{ 1234, ‘DECEASED’ }

    My mom would not be included even though she lives in Columbus because she does not have any aprcatg_donr_codes which start with A:
    spraddr{ 9999, ‘MA’, NULL, 43091-1111 }
    aprcatg{ 9999, ‘PG’ }

    My goal is, for particular georegions, get a count of everyone who is still alive, has an ‘A%’ donr_code, and still lives in that georegion (status_ind IS NULL and spraddr_to_date < SYSDATE or spraddr_to_date IS NULL) and is of type MA (spraddr_atyp_code of ‘MA’).

    This is what I currently have:
    Code:
      SELECT /*+INDEX( SPRADDR SPRADDR_KEY_INDEX ) +INDEX( APRCATG PK_APRCATG ) +ALL_ROWS*/
      	   COUNT( DISTINCT spraddr_pidm )
      	 , sorgeor_geor_code
        FROM sorgeor
       INNER
        JOIN spraddr
      	ON spraddr_pidm IN( SELECT /*+INDEX( APRCATG PK_APRCATG )*/ aprcatg_pidm
     						 FROM aprcatg
     					 WHERE aprcatg_pidm = spraddr_pidm
     						 AND aprcatg_donr_code LIKE 'A%' )
         AND spraddr_pidm IN( SELECT spbpers_pidm
     						 FROM spbpers
     					 WHERE spbpers_pidm = spraddr_pidm
     						 AND spbpers_dead_ind IS NULL )
         AND spraddr_atyp_code = 'MA'
         AND spraddr_status_ind IS NULL
         AND NVL( spraddr_to_date, SYSDATE - 1 ) < SYSDATE
         AND spraddr_zip BETWEEN sorgeor_start_range AND sorgeor_end_range   
      GROUP
      	BY sorgeor_geor_code
    Here is the query plan for the first query:
    Code:
      0, SELECT STATEMENT
        1,  SORT,  GROUP BY
         2,   TABLE ACCESS,   BY INDEX ROWID,   SORGEOR
      	3,	NESTED LOOPS
      	 4,	 NESTED LOOPS
      	  5,	  NESTED LOOPS
     	 6,	 INDEX,	 FULL SCAN,	 PK_APRCATG
     	 7,	 TABLE ACCESS,	 BY INDEX ROWID,	 SPRADDR
     	 8,	 INDEX,		RANGE SCAN,		SPRADDR_KEY_INDEX
     	 9,	 TABLE ACCESS,	 BY INDEX ROWID,	 SPBPERS
     	 10,	 INDEX,	 UNIQUE SCAN,	 PK_SPBPERS
      	 11,	 INLIST ITERATOR
     	 12,	 INDEX,	 RANGE SCAN,	 SORGEOR_KEY_INDEX2
    Here is if I remove the index hints:
    Code:
      0, SELECT STATEMENT
        1,  SORT,  GROUP BY
         2,   NESTED LOOPS
      	3,	NESTED LOOPS
      	 4,	 HASH JOIN
     	 5,	 TABLE ACCESS,	 FULL,	 SPRADDR
     	 6,	 TABLE ACCESS,	 FULL,	 APRCATG
      	 7,	 INLIST ITERATOR
     	 8,	 TABLE ACCESS,	 BY INDEX ROWID,	 SORGEOR
     	 9,	 INDEX,	 RANGE SCAN,	 SORGEOR_KEY_INDEX2
     	10,	TABLE ACCESS,	BY INDEX ROWID,	SPBPERS
     	 11,	 INDEX,	 UNIQUE SCAN,	 PK_SPBPERS
    If I try this query:
    Code:
      SELECT  /*+ALL_ROWS INDEX( SPRADDR SPRADDR_KEY_INDEX )*/  
      	   COUNT( DISTINCT spraddr_pidm ) 
      	 , sorgeor_geor_code 
        FROM sorgeor 
       INNER 
        JOIN spraddr 
      	ON EXISTS( SELECT /*+INDEX( APRCATG PK_APRCATG )*/ *
     						 FROM aprcatg 
     					 WHERE aprcatg_pidm = spraddr_pidm 
     						 AND aprcatg_donr_code LIKE 'A%' ) 
         AND EXISTS ( SELECT /*+INDEX( SPBPERS PK_SPBPERS )*/ *
     						 FROM spbpers 
     					 WHERE spbpers_pidm = spraddr_pidm 
     						 AND spbpers_dead_ind IS NULL ) 
         AND spraddr_atyp_code = 'MA' 
         AND spraddr_status_ind IS NULL 
         AND NVL( spraddr_to_date, SYSDATE - 1 ) < SYSDATE 
         AND spraddr_zip BETWEEN sorgeor_start_range AND sorgeor_end_range  
       GROUP 
      	BY sorgeor_geor_code
    I get the following plan:
    Code:
      0, SELECT STATEMENT
        1,  SORT,  GROUP BY
         2,   NESTED LOOPS
      	3,	MERGE JOIN
      	 4,	 SORT,	 JOIN
     	 5,	 TABLE ACCESS,	 BY INDEX ROWID,	 SPRADDR
      	   6,	   NESTED LOOPS
     	 7,	 INDEX,		FULL SCAN,		PK_APRCATG
     	 8,	 INDEX,		RANGE SCAN,		SPRADDR_KEY_INDEX
      	 9,	 FILTER
      	  10,	  SORT,	  JOIN
     	 11,	 TABLE ACCESS,	 BY INDEX ROWID,	 SORGEOR
     	 12,	 INDEX,		RANGE SCAN,		SORGEOR_KEY_INDEX2
     	13,	TABLE ACCESS,	BY INDEX ROWID,	SPBPERS
     	 14,	 INDEX,	 UNIQUE SCAN,	 PK_SPBPERS
    The third query is unconditionally better than the first two. The problem is that it is still pretty slow. For georegion IREL39369 (371 rows in SORGEOR) it returns 538 people in about four minutes. For georegion IREL39106 (5249 rows) it returns 711 people in about fifty minutes. This is nearly linear. If I extrapolate this linear relationship to the total number of georegion rows I get approximately forty-some odd hours to run for all of them.

    Any ideas on how to optimize these queries and/or any query ideas from the gallery? This is a 3rd party application DB and I cannot (without headaches) change the schema which includes adding indexes or creating new tables.
    Thanks,

    Matt

  2. #2
    Join Date
    Mar 2001
    Location
    Lexington, KY
    Posts
    606
    One note, the 'real' spraddr primary key contains a contrived 'spraddr_seq_no' -- a sequence number. I might be able to have two active MA addresses -- for example if I own two homes. This means that a person can possibly qualify for multiple georegions, which is fine.
    Thanks,

    Matt

  3. #3
    Join Date
    Apr 2004
    Location
    Arizona
    Posts
    75
    Specify the fields and only the fields you are going to use in your query unless you are doing something like a direct copy or a insert where you need all of the fields you should try to stear clear of Select *.

    Other things you could do is to Prepare the table before opening it and setting the read to dirty read.

    Also try to avoid Inner, Left and Right Joins. Use a natural join if you can. I know this is hard because some times you need the row even if the forign key is blank.

    Onle last thing you could, which looks like you have already done, is to setup indexes on your keys.




    It looks like your speed issue may be in the Inner Join. See if you can do with out the inner join and use a natural join.

    Also Try Creating an Index on "sorgeor_geor_code" to speed up your group by.

    If this does not work, I have found that creating a view solves this problem if your query does not use any paramters.

    I hope this helps.
    Last edited by AceOmega; 07-13-04 at 16:43.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •