Page 1 of 2 12 LastLast
Results 1 to 15 of 23
  1. #1
    Join Date
    Nov 2004
    Posts
    285
    Provided Answers: 1

    Unanswered: Selecting distinct records quickly

    Good day,

    I have a table of approximately 10 million rows. The table has 3 field making up the key, namely:
    ID, Date, Program

    I need to extract all the distinct Program's from the table.
    I have don so with:
    Code:
    Select distinct Program from table
    This unfortunately takes roughly 2 minutes which is far to long. Is there something I can do to help speed this process up?

    Thanks in advance.

  2. #2
    Join Date
    Jan 2004
    Location
    In a large office with bad lighting
    Posts
    1,040
    Go look up covering index. Implement same. See improved response time.

    -- This is all just a Figment of my Imagination --

  3. #3
    Join Date
    Jul 2003
    Location
    Michigan
    Posts
    1,941
    What's a "covering index"? I can't find it in BOL.
    Inspiration Through Fermentation

  4. #4
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    10 million rows - 2 minutes...
    Doesn't sound too bad to me?
    George
    Home | Blog

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    a covering index "covers" the requirements of the query, because it contains all the columns referenced in the query

    the 3-column key in this instance is a covering index
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Since you are looking for distinct values of the program column, make sure that you have an index that includes program as the first (logically left-most) column. See what indexes you have now, but if you don't have one that starts with the program column, make a new index that does start with it.

    -PatP

  7. #7
    Join Date
    Jul 2003
    Location
    Michigan
    Posts
    1,941
    That leads me to another question about composite keys and indexes.
    Many times, not only will I index the composite key, I may index some (or all) of the fields in the key. Is that poor practice? Should I be using a covering query instead?

    EDIT: I guess Pat's reply answers my question
    Inspiration Through Fermentation

  8. #8
    Join Date
    Jan 2005
    Posts
    28
    Hi,

    You can use the NoLock hint.

    Also if Program values is selective you can create an index on the Program column. Then your query will be faster.

    Eralper
    http://www.kodyaz.com

  9. #9
    Join Date
    Nov 2004
    Posts
    285
    Provided Answers: 1
    Once the distinct select statement has been run it only returns about 700 rows, surely there must be a way to speed up the process. I understand that 2 minutes is not that long in terms of 10 million rows, but to retrieve only 700 from that and take 2 minutes seems a very long time.

  10. #10
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    The SQL statement lokos at EVERY row.
    2 minutes = 120 seconds.
    120 secs / 10million rows = 0.000012 seconds per row.
    I'd say that was quick.

    EDIT: That's approximately 83,333 records per second.
    George
    Home | Blog

  11. #11
    Join Date
    Jan 2004
    Location
    In a large office with bad lighting
    Posts
    1,040
    Quote Originally Posted by plsh
    Once the distinct select statement has been run it only returns about 700 rows, surely there must be a way to speed up the process. I understand that 2 minutes is not that long in terms of 10 million rows, but to retrieve only 700 from that and take 2 minutes seems a very long time.
    The time is not only a function of the number of rows, but also the number of data pages that have to be retrieved from disk.

    Let's say, for example, that your data is 800 bytes. So of your 10 million rows you can fit 10 of them on an 8K data page. So the i/o sub system has to pull 1 million data pages (table scan) to determine the uniqueness of your search criteria ... and no it cannot pull only a portion of each data row.

    Now let's say that your covering index consists of 80 bytes ... now you only have to pull 100,000 rows of the index pages to answer your question.

    Which do you think is faster?

    -- This is all just a Figment of my Imagination --

  12. #12
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    If you are frequently querying for disting Programs, then I'd suggest that should be a separate table with a foreign key in your stored in your current table. Enforce relational integrity, etc. That is the proper way to handle this. I suspect you performance is poor because your design is not normalized, and no amount of coding is going to fully compensate for that.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  13. #13
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Quote Originally Posted by eralper
    Hi,

    You can use the NoLock hint.
    You sure can, and then you will have take a shower after the fact, because you will get dirty

    Also if Program values is selective you can create an index on the Program column. Then your query will be faster.

    Eralper
    http://www.kodyaz.com
    That's debatable

    In any case the blind dude is correct....and then throw a trigger on the base table to update insert delete the new table to keep it in synch, or add RI like he says....but then you need a process to make sure the parent table gets the data before the child
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  14. #14
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    I would have to agree with blindman here : it would seem that you need to normalise your tables correctly first.

  15. #15
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    aschk, that's two woulds on one sentence

    leaving aside for a moment the question of whether the table {ID,date,Program} is not normalized (i say it is), can anyone suggest a good primary key for this new Program table

    stands back apprehensively
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •