Results 1 to 8 of 8
  1. #1
    Join Date
    Aug 2007
    Location
    France
    Posts
    14

    Unanswered: Optimizer : query rewrite and execution plan ?

    Hi

    I have discovered an issue on my Postgresql database recently installed : it seems that the optimizer can not, when possible, simplify and rewrite a simple query before running it. Here is a simple and reproducible example :

    my_db=# create table test (n numeric);
    CREATE
    my_db=# insert into test values (1); --> run 10 times
    INSERT
    my_db=# insert into test values (0); --> run 10 times
    INSERT
    my_db=# select count(*) from test;
    count
    -------
    20
    (1 row)
    my_db=# vacuum full analyze test;
    VACUUM
    my_db=# explain select * from test where n = 1;
    QUERY PLAN
    ------------------------------------------------------
    Seq Scan on test (cost=0.00..1.25 rows=10 width=9)
    Filter: (n = 1::numeric)
    (2 rows)

    my_db=# explain select * from test where n = 1 and n = 1;
    QUERY PLAN
    -----------------------------------------------------
    Seq Scan on test (cost=0.00..1.30 rows=5 width=9)
    Filter: ((n = 1::numeric) AND (n = 1::numeric))
    (2 rows)

    In the first SELECT query (with "where n=1"), the estimated number of returned rows is correct (10), whereas in the second SELECT query (with "where n=1 and n=1"), the estimated number of returned rows is 5 (instead of 10 !)
    So the optimizer has under-estimated the number of rows returned
    That issue is very annoying because with generated SQL queries (from Business Objects for example) on big tables, it is possible that some queries have several times the same "where" condition ("where n=1 and n=1" for example), and as the optimizer is under-estimating the number of returned rows, some bad execution plans can be chosen (nested loops instead of hash joins for example)

    Is the estimated number of returned rows directly linked to the decision of the optimizer to chose Hash Joins or Nested Loops in join queries ?
    Is there a way for the Postgresql optimizer to be able to simplify and rewrite the SQL statements before running them ? Are there some parameters that could change the execution plans ?

    Thanks by advance for your help

  2. #2
    Join Date
    Nov 2006
    Posts
    82
    strange, i didn't realize the optimizer works that way. the simplest way to help optimizer is to use IN clause rather than AND connected statements, i checked
    optimier works then better. (sorry for english)

  3. #3
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Odd.

    Now, you don't have an index on your table, so it knows it has to perform a sequential scan.

    It (vacuum full analyze) saw only 2 discrete values of N, and apparently assumed that with a table with one field that this field would be unique.

    It's a fairly standard rule to have a primary key in your table design, so I'm not sure that its valid for you to extrapolate invalid performance from a contrived instance.
    Last edited by loquin; 02-05-08 at 17:01.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  4. #4
    Join Date
    Aug 2007
    Location
    France
    Posts
    14
    Thanks for your reply

    I tried to add a new column as primary key, and create an index on my column n but it did not change anything :

    my_db=# create table test (id serial primary key, n numeric);
    NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
    NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
    CREATE TABLE
    my_db=# insert into test(n) values (1); --> run 10 times
    INSERT 0 1
    my_db=# insert into test(n) values (0); --> run 10 times
    INSERT 0 1
    my_db=# select * from test;
    id | n
    ----+---
    1 | 1
    2 | 1
    3 | 1
    4 | 1
    5 | 1
    6 | 1
    7 | 1
    8 | 1
    9 | 1
    10 | 1
    11 | 0
    12 | 0
    13 | 0
    14 | 0
    15 | 0
    16 | 0
    17 | 0
    18 | 0
    19 | 0
    20 | 0
    (20 rows)

    my_db=# vacuum full analyze test;
    VACUUM
    my_db=# explain select * from test where n = 1;
    QUERY PLAN
    ------------------------------------------------------
    Seq Scan on test (cost=0.00..1.25 rows=10 width=13)
    Filter: (n = 1::numeric)
    (2 rows)

    my_db=# explain select * from test where n = 1 and n = 1;
    QUERY PLAN
    -----------------------------------------------------
    Seq Scan on test (cost=0.00..1.30 rows=5 width=13)
    Filter: ((n = 1::numeric) AND (n = 1::numeric))
    (2 rows)

    my_db=# create index indx_test on test(n);
    CREATE INDEX
    my_db=# vacuum full analyze test;
    VACUUM
    my_db=# select * from pg_stats where tablename = 'test';
    schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
    ------------+-----------+---------+-----------+-----------+------------+------------------+-------------------+-------------------------------+-------------
    public | test | id | 0 | 4 | -1 | | | {1,2,4,6,8,10,12,14,16,18,20} | 1
    public | test | n | 0 | 9 | 2 | {0,1} | {0.5,0.5} | | -0.503759
    my_db=# explain select * from test where n = 1;
    QUERY PLAN
    ------------------------------------------------------
    Seq Scan on test (cost=0.00..1.25 rows=10 width=13)
    Filter: (n = 1::numeric)
    (2 rows)

    my_db=# explain select * from test where n = 1 and n = 1;
    QUERY PLAN
    -----------------------------------------------------
    Seq Scan on test (cost=0.00..1.30 rows=5 width=13)
    Filter: ((n = 1::numeric) AND (n = 1::numeric))
    (2 rows)

    Is there any advanced documentation on the Postgresql optimizer that could describe how it works ? It would like to know for example what can make the optimizer chose hash join or nested loop : cost ? estimated number of returned rows ? or both ?
    Last edited by scheu; 02-06-08 at 04:23.

  5. #5
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Quote Originally Posted by scheu
    ... Is there any advanced documentation on the Postgresql optimizer that could describe how it works ? It would like to know for example what can make the optimizer chose hash join or nested loop : cost ? estimated number of returned rows ? or both ?
    Other than Chapter 13 and Chapter 54 of the documenation?

    Possibly Explaining Explain and Performance Tuning PostgreSQL

    BTW - which version of PostgreSQL are you using? (I assumed 8.2 in the documentation links, above)
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  6. #6
    Join Date
    Aug 2007
    Location
    France
    Posts
    14
    I use Postgresql 8.2
    Thanks for the links

  7. #7
    Join Date
    Aug 2007
    Location
    France
    Posts
    14
    Somebody told me in another forum that this issue was fixed in release 8.3, but I have not tested it yet

  8. #8
    Join Date
    Aug 2007
    Location
    France
    Posts
    14
    After testing, it is indeed fixed in release 8.3

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •