Welcome to the dBforums forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact contact support.

If you prefer not to see double-underlined words and corresponding ads, place your cursor
here for ContentLink opt out.

Go Back  dBforums > Database Server Software > PostgreSQL > Optimizer : query rewrite and execution plan ?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 02-05-08, 09:16
scheu scheu is offline
Registered User
 
Join Date: Aug 2007
Location: France
Posts: 14
Optimizer : query rewrite and execution plan ?

Hi

I have discovered an issue on my Postgresql database recently installed : it seems that the optimizer can not, when possible, simplify and rewrite a simple query before running it. Here is a simple and reproducible example :

my_db=# create table test (n numeric);
CREATE
my_db=# insert into test values (1); --> run 10 times
INSERT
my_db=# insert into test values (0); --> run 10 times
INSERT
my_db=# select count(*) from test;
count
-------
20
(1 row)
my_db=# vacuum full analyze test;
VACUUM
my_db=# explain select * from test where n = 1;
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..1.25 rows=10 width=9)
Filter: (n = 1::numeric)
(2 rows)

my_db=# explain select * from test where n = 1 and n = 1;
QUERY PLAN
-----------------------------------------------------
Seq Scan on test (cost=0.00..1.30 rows=5 width=9)
Filter: ((n = 1::numeric) AND (n = 1::numeric))
(2 rows)

In the first SELECT query (with "where n=1"), the estimated number of returned rows is correct (10), whereas in the second SELECT query (with "where n=1 and n=1"), the estimated number of returned rows is 5 (instead of 10 !)
So the optimizer has under-estimated the number of rows returned
That issue is very annoying because with generated SQL queries (from Business Objects for example) on big tables, it is possible that some queries have several times the same "where" condition ("where n=1 and n=1" for example), and as the optimizer is under-estimating the number of returned rows, some bad execution plans can be chosen (nested loops instead of hash joins for example)

Is the estimated number of returned rows directly linked to the decision of the optimizer to chose Hash Joins or Nested Loops in join queries ?
Is there a way for the Postgresql optimizer to be able to simplify and rewrite the SQL statements before running them ? Are there some parameters that could change the execution plans ?

Thanks by advance for your help
Reply With Quote
  #2 (permalink)  
Old 02-05-08, 16:01
rski rski is offline
Registered User
 
Join Date: Nov 2006
Posts: 12
strange, i didn't realize the optimizer works that way. the simplest way to help optimizer is to use IN clause rather than AND connected statements, i checked
optimier works then better. (sorry for english)
Reply With Quote
  #3 (permalink)  
Old 02-05-08, 16:58
loquin loquin is offline
Super Moderator
 
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,354
Odd.

Now, you don't have an index on your table, so it knows it has to perform a sequential scan.

It (vacuum full analyze) saw only 2 discrete values of N, and apparently assumed that with a table with one field that this field would be unique.

It's a fairly standard rule to have a primary key in your table design, so I'm not sure that its valid for you to extrapolate invalid performance from a contrived instance.
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert


Last edited by loquin : 02-05-08 at 17:01.
Reply With Quote
  #4 (permalink)  
Old 02-06-08, 04:14
scheu scheu is offline
Registered User
 
Join Date: Aug 2007
Location: France
Posts: 14
Thanks for your reply

I tried to add a new column as primary key, and create an index on my column n but it did not change anything :

my_db=# create table test (id serial primary key, n numeric);
NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE
my_db=# insert into test(n) values (1); --> run 10 times
INSERT 0 1
my_db=# insert into test(n) values (0); --> run 10 times
INSERT 0 1
my_db=# select * from test;
id | n
----+---
1 | 1
2 | 1
3 | 1
4 | 1
5 | 1
6 | 1
7 | 1
8 | 1
9 | 1
10 | 1
11 | 0
12 | 0
13 | 0
14 | 0
15 | 0
16 | 0
17 | 0
18 | 0
19 | 0
20 | 0
(20 rows)

my_db=# vacuum full analyze test;
VACUUM
my_db=# explain select * from test where n = 1;
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..1.25 rows=10 width=13)
Filter: (n = 1::numeric)
(2 rows)

my_db=# explain select * from test where n = 1 and n = 1;
QUERY PLAN
-----------------------------------------------------
Seq Scan on test (cost=0.00..1.30 rows=5 width=13)
Filter: ((n = 1::numeric) AND (n = 1::numeric))
(2 rows)

my_db=# create index indx_test on test(n);
CREATE INDEX
my_db=# vacuum full analyze test;
VACUUM
my_db=# select * from pg_stats where tablename = 'test';
schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
------------+-----------+---------+-----------+-----------+------------+------------------+-------------------+-------------------------------+-------------
public | test | id | 0 | 4 | -1 | | | {1,2,4,6,8,10,12,14,16,18,20} | 1
public | test | n | 0 | 9 | 2 | {0,1} | {0.5,0.5} | | -0.503759
my_db=# explain select * from test where n = 1;
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..1.25 rows=10 width=13)
Filter: (n = 1::numeric)
(2 rows)

my_db=# explain select * from test where n = 1 and n = 1;
QUERY PLAN
-----------------------------------------------------
Seq Scan on test (cost=0.00..1.30 rows=5 width=13)
Filter: ((n = 1::numeric) AND (n = 1::numeric))
(2 rows)

Is there any advanced documentation on the Postgresql optimizer that could describe how it works ? It would like to know for example what can make the optimizer chose hash join or nested loop : cost ? estimated number of returned rows ? or both ?

Last edited by scheu : 02-06-08 at 04:23.
Reply With Quote
  #5 (permalink)  
Old 02-07-08, 14:19
loquin loquin is offline
Super Moderator
 
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,354
Quote:
Originally Posted by scheu
... Is there any advanced documentation on the Postgresql optimizer that could describe how it works ? It would like to know for example what can make the optimizer chose hash join or nested loop : cost ? estimated number of returned rows ? or both ?
Other than Chapter 13 and Chapter 54 of the documenation?

Possibly Explaining Explain and Performance Tuning PostgreSQL

BTW - which version of PostgreSQL are you using? (I assumed 8.2 in the documentation links, above)
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert

Reply With Quote
  #6 (permalink)  
Old 02-08-08, 04:04
scheu scheu is offline
Registered User
 
Join Date: Aug 2007
Location: France
Posts: 14
I use Postgresql 8.2
Thanks for the links
Reply With Quote
  #7 (permalink)  
Old 02-20-08, 06:25
scheu scheu is offline
Registered User
 
Join Date: Aug 2007
Location: France
Posts: 14
Somebody told me in another forum that this issue was fixed in release 8.3, but I have not tested it yet
Reply With Quote
  #8 (permalink)  
Old 03-14-08, 12:51
scheu scheu is offline
Registered User
 
Join Date: Aug 2007
Location: France
Posts: 14
After testing, it is indeed fixed in release 8.3
Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On