Results 1 to 9 of 9
  1. #1
    Join Date
    Oct 2009
    Posts
    2

    Unanswered: Trying to get rid of subqueries

    Hi-

    I've got my query working using subselects, but would really like to get away from them as I know there can be performance issues. Here's the query.

    select do.* from DATA_OBJECTS as do where
    do.id in (select data_object_id from permissions_for_data_objects where user_id=?))
    OR
    (do.id in (select data_object_id from permissions_for_data_objects_share_group
    where share_group_id in (select share_group_id from share_group_user where user_id=?)))
    OR do.id in (SELECT data_object_id FROM permissions_for_data_objects_corp_group WHERE corp_group_id IN (3, 2, 4))


    In nutshell I need data objects where the user has permissions in permissions_for_data_objects or permissions_for_data_objects_share_group or permissions_for_data_objects_corp_group.

    I've also wondered if separate queries would be a better choice.

    Any advice would be great appreciated.

    Thanks,

    Dave

  2. #2
    Join Date
    Dec 2008
    Location
    At work...
    Posts
    92
    I'm not sure moving the sub-selects will improve the performance. A four-way (left outer?) join may also be utterly slow if you haven't got the appropriate indexes.

    And if you have the indexes, the current query will be pretty fast.

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Correlating the subqueries as EXISTS statements rather than IN is typically significantly faster (how much faster depends on the size of these four tables, data distribution and available indexes). There are subtle semantic differences between EXISTS and IN. I wouldn't use left outer join for a couple of reasons.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  4. #4
    Join Date
    Oct 2009
    Posts
    2
    Really, I re-wrote it using left joins last night as follows. Haven't verified all the results as of yet.

    Why not use left join?

    select do.* from
    DATA_OBJECTS as do left join PERMISSIONS_FOR_DATA_OBJECTS as pdo on id=data_object_id
    left join permissions_for_data_objects_share_group as pdosg on do.id=pdosg.data_object_id
    left join share_group_user as sgu on sgu.share_group_id=pdosg.share_group_id
    left join PERMISSIONS_FOR_DATA_OBJECTS_CORP_GROUP pdocg on do.id=pdocg.data_object_id
    where pdo.user_id=2 or sgu.user_id=2 or pdocg.corp_group_id in (0)

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    There are two reasons I prefer not to use left join for this.
    1) Accuracy: If these are 1:m relationships then you will\ can get duplicates in your results.
    2) Consistency: I consistently code my SQL so that tables ONLY appear in the FROM clause if I am actually reading data from those tables. If their only purpose is to establish whether or not a record exists then it goes in the EXISTS clause. To parse this query you need to cross reference the WHERE clause with the FROM clause to establish exactly what is going on here and that there are no redundant joins. Also, the reverse of this query would now only require changing of EXISTS to NOT EXISTS. Reversing your query would take a lot more changes. Obviously it does not look like it applies for this particular requirement.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  6. #6
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Good explanation and reasoning Poots; I hadn't thought of it that way before.

    I rarely use EXISTS; I should really make a start!
    George
    Home | Blog

  7. #7
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by pootle flump
    1) Accuracy: If these are 1:m relationships then you will\ can get duplicates in your results.
    I don't understand this one. If you have an 1:n relationship and do a LOJ, you get some specific semantics for the result (incl. duplicates on the "1" side). If you use a subselect instead, you get other semantics. It really depends on what you need and "accuracy" is surely not a criteria for that.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  8. #8
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Quote Originally Posted by onthemovefit
    I've got my query working using subselects, but would really like to get away from them as I know there can be performance issues.
    As general rule this is definitely not true.
    It entirely depends on your DBMS and the data in your tables.

    I have had situations where subselect have even been faster, but most of the time I observe that the optimizer generates the same exection plan anyway.

    Once you have a correct solution without a subselect you should analyze the execution plan to make sure that your assumption is correct (for that query).
    Ideally you even trace the real IO that happens when running the query (if your DBMS supports that)

  9. #9
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by stolze
    I don't understand this one. If you have an 1:n relationship and do a LOJ, you get some specific semantics for the result (incl. duplicates on the "1" side). If you use a subselect instead, you get other semantics. It really depends on what you need and "accuracy" is surely not a criteria for that.
    Accuracy is perhaps not the best word. Perhaps "redundant rows" would have been better. The number of redundant rows is dependant on the number of matching rows in the many table when using LOJ. There are (or should not be, assuming there are no dupes in the "1" table) no redundant rows when using EXISTS.
    Testimonial:
    pootle flump
    ur codings are working excelent.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •