| |
Welcome to the dBforums forums.
You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!
If you have any problems with the registration process or your account login, please contact contact support.
If you prefer not to see double-underlined words and corresponding ads, place your cursor here for ContentLink opt out.
|
 |

02-05-08, 09:16
|
|
Registered User
|
|
Join Date: Aug 2007
Location: France
Posts: 14
|
|
|
Optimizer : query rewrite and execution plan ?
|
Hi
I have discovered an issue on my Postgresql database recently installed : it seems that the optimizer can not, when possible, simplify and rewrite a simple query before running it. Here is a simple and reproducible example :
my_db=# create table test (n numeric);
CREATE
my_db=# insert into test values (1); --> run 10 times
INSERT
my_db=# insert into test values (0); --> run 10 times
INSERT
my_db=# select count(*) from test;
count
-------
20
(1 row)
my_db=# vacuum full analyze test;
VACUUM
my_db=# explain select * from test where n = 1;
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..1.25 rows=10 width=9)
Filter: (n = 1::numeric)
(2 rows)
my_db=# explain select * from test where n = 1 and n = 1;
QUERY PLAN
-----------------------------------------------------
Seq Scan on test (cost=0.00..1.30 rows=5 width=9)
Filter: ((n = 1::numeric) AND (n = 1::numeric))
(2 rows)
In the first SELECT query (with "where n=1"), the estimated number of returned rows is correct (10), whereas in the second SELECT query (with "where n=1 and n=1"), the estimated number of returned rows is 5 (instead of 10 !)
So the optimizer has under-estimated the number of rows returned
That issue is very annoying because with generated SQL queries (from Business Objects for example) on big tables, it is possible that some queries have several times the same "where" condition ("where n=1 and n=1" for example), and as the optimizer is under-estimating the number of returned rows, some bad execution plans can be chosen (nested loops instead of hash joins for example)
Is the estimated number of returned rows directly linked to the decision of the optimizer to chose Hash Joins or Nested Loops in join queries ?
Is there a way for the Postgresql optimizer to be able to simplify and rewrite the SQL statements before running them ? Are there some parameters that could change the execution plans ?
Thanks by advance for your help
|
|

02-05-08, 16:01
|
|
Registered User
|
|
Join Date: Nov 2006
Posts: 12
|
|
strange, i didn't realize the optimizer works that way. the simplest way to help optimizer is to use IN clause rather than AND connected statements, i checked
optimier works then better. (sorry for english)
|
|

02-05-08, 16:58
|
|
Super Moderator
|
|
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,354
|
|
|
Odd.
Now, you don't have an index on your table, so it knows it has to perform a sequential scan.
It (vacuum full analyze) saw only 2 discrete values of N, and apparently assumed that with a table with one field that this field would be unique.
It's a fairly standard rule to have a primary key in your table design, so I'm not sure that its valid for you to extrapolate invalid performance from a contrived instance.
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
|
Last edited by loquin : 02-05-08 at 17:01.
|

02-06-08, 04:14
|
|
Registered User
|
|
Join Date: Aug 2007
Location: France
Posts: 14
|
|
Thanks for your reply
I tried to add a new column as primary key, and create an index on my column n but it did not change anything :
my_db=# create table test (id serial primary key, n numeric);
NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE
my_db=# insert into test(n) values (1); --> run 10 times
INSERT 0 1
my_db=# insert into test(n) values (0); --> run 10 times
INSERT 0 1
my_db=# select * from test;
id | n
----+---
1 | 1
2 | 1
3 | 1
4 | 1
5 | 1
6 | 1
7 | 1
8 | 1
9 | 1
10 | 1
11 | 0
12 | 0
13 | 0
14 | 0
15 | 0
16 | 0
17 | 0
18 | 0
19 | 0
20 | 0
(20 rows)
my_db=# vacuum full analyze test;
VACUUM
my_db=# explain select * from test where n = 1;
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..1.25 rows=10 width=13)
Filter: (n = 1::numeric)
(2 rows)
my_db=# explain select * from test where n = 1 and n = 1;
QUERY PLAN
-----------------------------------------------------
Seq Scan on test (cost=0.00..1.30 rows=5 width=13)
Filter: ((n = 1::numeric) AND (n = 1::numeric))
(2 rows)
my_db=# create index indx_test on test(n);
CREATE INDEX
my_db=# vacuum full analyze test;
VACUUM
my_db=# select * from pg_stats where tablename = 'test';
schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
------------+-----------+---------+-----------+-----------+------------+------------------+-------------------+-------------------------------+-------------
public | test | id | 0 | 4 | -1 | | | {1,2,4,6,8,10,12,14,16,18,20} | 1
public | test | n | 0 | 9 | 2 | {0,1} | {0.5,0.5} | | -0.503759
my_db=# explain select * from test where n = 1;
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..1.25 rows=10 width=13)
Filter: (n = 1::numeric)
(2 rows)
my_db=# explain select * from test where n = 1 and n = 1;
QUERY PLAN
-----------------------------------------------------
Seq Scan on test (cost=0.00..1.30 rows=5 width=13)
Filter: ((n = 1::numeric) AND (n = 1::numeric))
(2 rows)
Is there any advanced documentation on the Postgresql optimizer that could describe how it works ? It would like to know for example what can make the optimizer chose hash join or nested loop : cost ? estimated number of returned rows ? or both ?
|
Last edited by scheu : 02-06-08 at 04:23.
|

02-07-08, 14:19
|
|
Super Moderator
|
|
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,354
|
|
Quote:
|
Originally Posted by scheu
... Is there any advanced documentation on the Postgresql optimizer that could describe how it works ? It would like to know for example what can make the optimizer chose hash join or nested loop : cost ? estimated number of returned rows ? or both ?
|
Other than Chapter 13 and Chapter 54 of the documenation?
Possibly Explaining Explain and Performance Tuning PostgreSQL
BTW - which version of PostgreSQL are you using? (I assumed 8.2 in the documentation links, above)
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
|
|

02-08-08, 04:04
|
|
Registered User
|
|
Join Date: Aug 2007
Location: France
Posts: 14
|
|
I use Postgresql 8.2
Thanks for the links
|
|

02-20-08, 06:25
|
|
Registered User
|
|
Join Date: Aug 2007
Location: France
Posts: 14
|
|
Somebody told me in another forum that this issue was fixed in release 8.3, but I have not tested it yet
|
|

03-14-08, 12:51
|
|
Registered User
|
|
Join Date: Aug 2007
Location: France
Posts: 14
|
|
After testing, it is indeed fixed in release 8.3
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|