If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Top N Query on a large table

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-11-06, 06:00
rodionos rodionos is offline
Registered User
 
Join Date: Aug 2006
Posts: 1
Top N Query on a large table

Hello!

My knowledge of SQL is limited, so thanks for reading and hopefully responding to this simple question on DB2 for Windows.

I have a large warehouse table (3 mln rows with timestamps) and I need to select distinct values from a column (character type). A query SELECT (COLNAME) FROM TABNAME takes over (40 seconds). In these 3 mln rows, there are only 50 distinct values, and I'm only concerned with distinct values in the most recent (say 1000) rows. Unfortunately, a sorted query with an ORDER BY clause using FETCH FIRST N ROWS executes equally slow, probably because timestamp is written in a character column (another story).

What options do I have?

Thanks!

Sergei
Reply With Quote
  #2 (permalink)  
Old 08-11-06, 06:46
grofaty grofaty is offline
Registered User
 
Join Date: Jan 2003
Posts: 1,570
1. create index on column colname something like:
create index on schema.myindex on schema.tablename (colname desc allow reverse scans)

2. run statistics:
runstats on table schema.tablename

3. run sql:
SELECT DISTINCT COLNAME FROM TABNAME

What is the result?

Hope this helps,
Grofaty
Reply With Quote
  #3 (permalink)  
Old 08-11-06, 08:23
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
Concerning your second solution:
Even when an index is present on "timestamps", using FETCH FIRST 1000 ROWS combined with ORDER BY might not use the index if the optimizer judges that sorting a 3-mln row table is faster than doing an indexed scan. (The "FETCH FIRST N ROWS" does not influence the optimizer's choice.)
In that case it might help to add "OPTIMIZE FOR 1 ROW" to the query.
Alternatively, use an index on "timestamps" and a WHERE condition something like "timestamps > CAST(current date - 7 days AS char(10))".
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On