If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Data Access, Manipulation & Batch Languages > ANSI SQL > Trying to get rid of subqueries

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-13-09, 22:56
onthemovefit onthemovefit is offline
Registered User
 
Join Date: Oct 2009
Posts: 2
Trying to get rid of subqueries

Hi-

I've got my query working using subselects, but would really like to get away from them as I know there can be performance issues. Here's the query.

select do.* from DATA_OBJECTS as do where
do.id in (select data_object_id from permissions_for_data_objects where user_id=?))
OR
(do.id in (select data_object_id from permissions_for_data_objects_share_group
where share_group_id in (select share_group_id from share_group_user where user_id=?)))
OR do.id in (SELECT data_object_id FROM permissions_for_data_objects_corp_group WHERE corp_group_id IN (3, 2, 4))


In nutshell I need data objects where the user has permissions in permissions_for_data_objects or permissions_for_data_objects_share_group or permissions_for_data_objects_corp_group.

I've also wondered if separate queries would be a better choice.

Any advice would be great appreciated.

Thanks,

Dave
Reply With Quote
  #2 (permalink)  
Old 10-14-09, 03:35
JarlH JarlH is offline
Registered User
 
Join Date: Dec 2008
Location: At work...
Posts: 68
I'm not sure moving the sub-selects will improve the performance. A four-way (left outer?) join may also be utterly slow if you haven't got the appropriate indexes.

And if you have the indexes, the current query will be pretty fast.
Reply With Quote
  #3 (permalink)  
Old 10-14-09, 04:14
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Correlating the subqueries as EXISTS statements rather than IN is typically significantly faster (how much faster depends on the size of these four tables, data distribution and available indexes). There are subtle semantic differences between EXISTS and IN. I wouldn't use left outer join for a couple of reasons.
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #4 (permalink)  
Old 10-14-09, 08:17
onthemovefit onthemovefit is offline
Registered User
 
Join Date: Oct 2009
Posts: 2
Really, I re-wrote it using left joins last night as follows. Haven't verified all the results as of yet.

Why not use left join?

select do.* from
DATA_OBJECTS as do left join PERMISSIONS_FOR_DATA_OBJECTS as pdo on id=data_object_id
left join permissions_for_data_objects_share_group as pdosg on do.id=pdosg.data_object_id
left join share_group_user as sgu on sgu.share_group_id=pdosg.share_group_id
left join PERMISSIONS_FOR_DATA_OBJECTS_CORP_GROUP pdocg on do.id=pdocg.data_object_id
where pdo.user_id=2 or sgu.user_id=2 or pdocg.corp_group_id in (0)
Reply With Quote
  #5 (permalink)  
Old 10-14-09, 09:30
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
There are two reasons I prefer not to use left join for this.
1) Accuracy: If these are 1:m relationships then you will\ can get duplicates in your results.
2) Consistency: I consistently code my SQL so that tables ONLY appear in the FROM clause if I am actually reading data from those tables. If their only purpose is to establish whether or not a record exists then it goes in the EXISTS clause. To parse this query you need to cross reference the WHERE clause with the FROM clause to establish exactly what is going on here and that there are no redundant joins. Also, the reverse of this query would now only require changing of EXISTS to NOT EXISTS. Reversing your query would take a lot more changes. Obviously it does not look like it applies for this particular requirement.
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #6 (permalink)  
Old 10-14-09, 12:04
gvee gvee is offline
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
Good explanation and reasoning Poots; I hadn't thought of it that way before.

I rarely use EXISTS; I should really make a start!
__________________
George
Twitter | Blog
Reply With Quote
  #7 (permalink)  
Old 10-14-09, 16:28
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Quote:
Originally Posted by pootle flump
1) Accuracy: If these are 1:m relationships then you will\ can get duplicates in your results.
I don't understand this one. If you have an 1:n relationship and do a LOJ, you get some specific semantics for the result (incl. duplicates on the "1" side). If you use a subselect instead, you get other semantics. It really depends on what you need and "accuracy" is surely not a criteria for that.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #8 (permalink)  
Old 10-14-09, 17:02
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,403
Quote:
Originally Posted by onthemovefit
I've got my query working using subselects, but would really like to get away from them as I know there can be performance issues.
As general rule this is definitely not true.
It entirely depends on your DBMS and the data in your tables.

I have had situations where subselect have even been faster, but most of the time I observe that the optimizer generates the same exection plan anyway.

Once you have a correct solution without a subselect you should analyze the execution plan to make sure that your assumption is correct (for that query).
Ideally you even trace the real IO that happens when running the query (if your DBMS supports that)
Reply With Quote
  #9 (permalink)  
Old 10-15-09, 04:12
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Quote:
Originally Posted by stolze
I don't understand this one. If you have an 1:n relationship and do a LOJ, you get some specific semantics for the result (incl. duplicates on the "1" side). If you use a subselect instead, you get other semantics. It really depends on what you need and "accuracy" is surely not a criteria for that.
Accuracy is perhaps not the best word. Perhaps "redundant rows" would have been better. The number of redundant rows is dependant on the number of matching rows in the many table when using LOJ. There are (or should not be, assuming there are no dupes in the "1" table) no redundant rows when using EXISTS.
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On