Results 1 to 8 of 8
  1. #1
    Join Date
    Oct 2015
    Posts
    6

    Angry Unanswered: Simple query over large dataset taking too much time.

    Hi,

    I have been having a hard time with this query, I have a webpage which is supposed to run this query a dozen of times, and with an execution time up to 40 - 50 seconds per query, you can see how this is becoming quite problematic as the database keeps growing.

    Query :
    select distinct(scanid) from misspelled where word = ?

    knowing that the table 'misspelled' can grow up to a dozen million rows, is there any way to make this work ?

    Thanks, here are more details :

    Query : select distinct(scanid) from misspelled where word = ?

    Explain Analyze : http://explain.depesz.com/s/7ttn

    Table definition : http://pastebin.com/D6jrT2NC

    Version 9.4.0

    If you need anything else please let me know.

    Thanks
    Last edited by abdou-boulegh; 10-27-15 at 21:56.

  2. #2
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Please read this:

    https://wiki.postgresql.org/wiki/Slow_Query_Questions

    and the post the missing information
    I will not read nor answer questions where the SQL code is messy and not formatted properly using [code] tags: http://www.dbforums.com/misc.php?do=bbcode#code

    Tips for good questions:

    http://tkyte.blogspot.de/2005/06/how...questions.html
    http://wiki.postgresql.org/wiki/SlowQueryQuestions
    http://catb.org/esr/faqs/smart-questions.html

  3. #3
    Join Date
    Oct 2015
    Posts
    6

    details

    Explain Analyze :

    http://explain.depesz.com/s/7ttn

    explain (analyze, buffers):

    http://explain.depesz.com/s/REC

    table definition ( misspelled ) :

    http://pastebin.com/D6jrT2NC

    postgres version : 9.4.0

    Hardware : 3.8 GHZ I7-2600K CPU, 16 GB of RAM

    This should be enough, if there is anything else you need to know please let me know.

  4. #4
    Join Date
    Oct 2015
    Posts
    6

    Details

    Thanks, here are more details :

    Query : select distinct(scanid) from misspelled where word = ?

    Explain Analyze : http://explain.depesz.com/s/7ttn

    Table definition : http://pastebin.com/D6jrT2NC

    Version 9.4.0

    If you need anything else please let me know.

  5. #5
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    This looks strange. The second plan looks quite reasonable with 820ms. This is probably caused by everything being in the cache. Which in turn means your harddisk seems to be extremely slow, otherwise I can't explain why scanning 250 MB should take 38 seconds.

    The number of blocks scanned in relation to the number of rows is also quite high - maybe you are suffering from a table (and index) bloat.
    Can you run vacuum full analyze misspelled and see if that changes anything?
    The output of that statement might also be interesting.

    9.4.0. is not the most resent 9.4 release though, but I doubt that this has anything to do with it.

    Btw (unrelated to your problem): distinct is NOT a function. Writing select distinct(scanid) is useless, distinct always operates on all columns of the select list. You should remove those unnecessary parentheses. It's bad habit that will bite you in the foot if you do that for more than one column, because (column_1, column_2) is something different than column_1, column_2
    Last edited by shammat; 11-01-15 at 11:59.
    I will not read nor answer questions where the SQL code is messy and not formatted properly using [code] tags: http://www.dbforums.com/misc.php?do=bbcode#code

    Tips for good questions:

    http://tkyte.blogspot.de/2005/06/how...questions.html
    http://wiki.postgresql.org/wiki/SlowQueryQuestions
    http://catb.org/esr/faqs/smart-questions.html

  6. #6
    Join Date
    Oct 2015
    Posts
    6

    strange indeed

    "This looks strange. The second plan looks quite reasonable with 820ms. This is probably caused by everything being in the cache. "

    Yes, this is due to the cache being used, I only get a long execution time the first time I run a certain query, After that the time become < 1 s.

    I ran the vacuum full analyze misspelled command and the output was :
    Query returned successfully with no result in 200186 ms.

    Knowing that the misspelled table can go to dozens of millions in size, is there any reason for this delay ? or any hope to fix it ?

  7. #7
    Join Date
    Oct 2015
    Posts
    6

    thanks

    And thanks for the tip about the distinct thing.

  8. #8
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by abdou-boulegh View Post
    I ran the vacuum full analyze misspelled command and the output was :
    Query returned successfully with no result in 200186 ms.

    Knowing that the misspelled table can go to dozens of millions in size, is there any reason for this delay? or any hope to fix it ?
    The question is: did the execution plan change after that?
    I will not read nor answer questions where the SQL code is messy and not formatted properly using [code] tags: http://www.dbforums.com/misc.php?do=bbcode#code

    Tips for good questions:

    http://tkyte.blogspot.de/2005/06/how...questions.html
    http://wiki.postgresql.org/wiki/SlowQueryQuestions
    http://catb.org/esr/faqs/smart-questions.html

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •