If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > SQL Query Doubt

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 02-20-06, 16:32
ksolomon ksolomon is offline
Registered User
 
Join Date: Jun 2004
Posts: 38
SQL Query Doubt

My current SQL query behaves poorly. The SQL is
SELECT DISTINCT A.no, A.subject_cd, seq_nbr FROM etl.student A
WHERE
A.subject_cd IN ('3800', '4000', '5000', '6000')
AND
A.seq_nbr IN ( SELECT MAX(seq_nbr) FROM etl.student B WHERE
A.no = B.no AND
A.subject_cd = B.subject_cd )



Sample Data
======================
no subject_cd seq_nbr
1 3800 1
1 3800 2
1 4000 3
2 4000 4
2 5000 5
3 5000 6
4 6000 7

Expected O/P
==========================
no subject_cd seq_nbr
1 3800 2
2 4000 4
2 5000 5
3 5000 6
4 6000 7

How can I improve the query. I have indexes on all the 3 columns and the database statistics is also updated using runstats.

Any idea ?

Thanks,Vij
Reply With Quote
  #2 (permalink)  
Old 02-20-06, 16:57
sathyaram_s sathyaram_s is offline
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,534
You have a co-related subquery ... try using joins ..

The have not syntax checked it ...

with temp as
(
SELECT no,subject_cd,MAX(seq_nbr) FROM etl.student B
where subject_cd IN ('3800', '4000', '5000', '6000')
group by no,subject_cd
)
select distinct a.no,a.subject_cd,seq_nbr from etl.student A,
etl.student B
where
A.no=B.no and A.subject_cd=b.subject_cd
and A.subject_cd IN ('3800', '4000', '5000', '6000')

hmmm ... there should be a much better method ..

Cheers
Sathyaram

Quote:
Originally Posted by ksolomon
My current SQL query behaves poorly. The SQL is
SELECT DISTINCT A.no, A.subject_cd, seq_nbr FROM etl.student A
WHERE
A.subject_cd IN ('3800', '4000', '5000', '6000')
AND
A.seq_nbr IN ( SELECT MAX(seq_nbr) FROM etl.student B WHERE
A.no = B.no AND
A.subject_cd = B.subject_cd )



Sample Data
======================
no subject_cd seq_nbr
1 3800 1
1 3800 2
1 4000 3
2 4000 4
2 5000 5
3 5000 6
4 6000 7

Expected O/P
==========================
no subject_cd seq_nbr
1 3800 2
2 4000 4
2 5000 5
3 5000 6
4 6000 7

How can I improve the query. I have indexes on all the 3 columns and the database statistics is also updated using runstats.

Any idea ?

Thanks,Vij
__________________
Visit the new-look IDUG Website , register to gain access to the excellent content.
Reply With Quote
  #3 (permalink)  
Old 02-20-06, 17:04
sathyaram_s sathyaram_s is offline
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,534
max will anyways cause a sort ...

So, why not something like this ?

I have given a template only
select * from
(select no,subject_cd,rownumber() over (partition by no,subject_cd order by seq_num desc) as rowid from etl.student
) as x where rowid=1

HTH

Sathyaram
__________________
Visit the new-look IDUG Website , register to gain access to the excellent content.
Reply With Quote
  #4 (permalink)  
Old 02-20-06, 20:16
ksolomon ksolomon is offline
Registered User
 
Join Date: Jun 2004
Posts: 38
Thanks Sathyaram, The 2nd option which you suggested using the OLAP function gave me a huge improvement.
Reply With Quote
  #5 (permalink)  
Old 03-01-06, 05:49
Rob den Heijer Rob den Heijer is offline
Registered User
 
Join Date: Feb 2006
Location: Utrecht, Netherlands
Posts: 16
This is probably the best solution, Sathyaram.
In some instances, it might help to create the index with 'reverse scan' enabled. This could improve max() function performance; differences between min() and max() should disappear, performance-wise.
In the statement where the index is created, use 'ALLOW REVERSE SCANS' .

See
http://safariexamples.informit.com/0...n/r0000919.htm

cheers,
Rob.
Reply With Quote
  #6 (permalink)  
Old 03-01-06, 07:18
sathyaram_s sathyaram_s is offline
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,534
Ron, Thanks for pointing out my error ..

What I actually meant to say was 'Group by' will cause a sort ...

Thanks for the link

Sathyaram
__________________
Visit the new-look IDUG Website , register to gain access to the excellent content.
Reply With Quote
  #7 (permalink)  
Old 03-01-06, 07:25
Rob den Heijer Rob den Heijer is offline
Registered User
 
Join Date: Feb 2006
Location: Utrecht, Netherlands
Posts: 16
So sorry!
I meant you were right in supplying the 'partition by' solution. That's the most direct answer, and usage of this syntax needs to be encouraged!

But I discovered recently why min() can perform much better than max(), and I wanted to share this knowledge. Both your approach and mine can help others.
Reply With Quote
  #8 (permalink)  
Old 03-02-06, 15:47
DBA-Jr DBA-Jr is offline
Registered User
 
Join Date: Mar 2005
Posts: 108
It seems one record is missing from ksolomon's "Expected O/P" - "1 4000 3". If I understand ksolomon's need corrently, The original code:
Quote:
SELECT DISTINCT A.no, A.subject_cd, seq_nbr FROM etl.student A
WHERE
A.subject_cd IN ('3800', '4000', '5000', '6000')
AND
A.seq_nbr IN ( SELECT MAX(seq_nbr) FROM etl.student B WHERE
A.no = B.no AND
A.subject_cd = B.subject_cd )
should be easily converted to a the simplest one:
Code:
SELECT no, subject_cd, max(seq_nbr) 
   FROM etl.student 
   WHERE subject_cd IN ('3800', '4000', '5000', '6000')
   group by no, subject_cd
The "group by" grantees "distinct" is applied. The subselect is simply unnecessary. Am I too simple to be correct? Please correct me if I am actually wrong.

Last edited by DBA-Jr; 03-02-06 at 15:55.
Reply With Quote
  #9 (permalink)  
Old 03-02-06, 18:36
sathyaram_s sathyaram_s is offline
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,534
hmmm ... may be ...

My understanding was that for each subject, select the row with the max seq number

ksolomon should clarify ...







Quote:
Originally Posted by DBA-Jr
It seems one record is missing from ksolomon's "Expected O/P" - "1 4000 3". If I understand ksolomon's need corrently, The original code:

should be easily converted to a the simplest one:
Code:
SELECT no, subject_cd, max(seq_nbr) 
   FROM etl.student 
   WHERE subject_cd IN ('3800', '4000', '5000', '6000')
   group by no, subject_cd
The "group by" grantees "distinct" is applied. The subselect is simply unnecessary. Am I too simple to be correct? Please correct me if I am actually wrong.
__________________
Visit the new-look IDUG Website , register to gain access to the excellent content.
Reply With Quote
  #10 (permalink)  
Old 03-02-06, 19:15
ksolomon ksolomon is offline
Registered User
 
Join Date: Jun 2004
Posts: 38
DBA-Jr was right. Even I missed that row in my expected output. Sorry for the confusion.
Reply With Quote
  #11 (permalink)  
Old 03-03-06, 03:23
Rob den Heijer Rob den Heijer is offline
Registered User
 
Join Date: Feb 2006
Location: Utrecht, Netherlands
Posts: 16
DBA-Jr,
I think you are very much right. This is very simple, and (AFAICT) correct.
We can now wait for performance results (with or without ALLOW REVERSE SCANS, I hope)
Rob.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On