Results 1 to 10 of 10
  1. #1
    Join Date
    Jun 2007
    Location
    London
    Posts
    117

    Unanswered: Lots of Joins: Speed Impact?

    Hello all,

    I recently got advice from another forum when I had a query issue. I have opted for this solution primarily because I can loop through the join lines in my application logic so I can populate as many tag searches as the user wants to narrow down their search. Btw, I am building a sort of bookmarking service.

    Code:
    SELECT bookmarks.*
    FROM bookmarks
    JOIN tags AS T1 ON T1.bookmark_id = bookmarks.id AND T1.tag = 'clean'
    JOIN tags AS T2 ON T2.bookmark_id = bookmarks.id AND T2.tag = 'simple'
    I could potentially have something like this:

    Code:
    SELECT bookmarks.*
    FROM bookmarks
    JOIN tags AS T1 ON T1.bookmark_id = bookmarks.id AND T1.tag = 'clean'
    JOIN tags AS T2 ON T2.bookmark_id = bookmarks.id AND T2.tag = 'simple'
    JOIN tags AS T3 ON T3.bookmark_id = bookmarks.id AND T3.tag = 'clean'
    JOIN mvv AS T4 ON T4.bookmark_id = bookmarks.id AND T4.mvs = '6'
    JOIN mvv AS T5 ON T5.bookmark_id = bookmarks.id AND T5.mvs = '12'
    JOIN mvv AS T6 ON T6.bookmark_id = bookmarks.id AND T6.mvs = '18'
    I am getting worried this might be really in-efficient. I would appreciate more help on this or a better way of doing this if it needs improving and it most likely does!?!

    I will only ever be joining no more than 4 tables but they can have lots of criteria.

    Thanks all, I appreciate any help.

  2. #2
    Join Date
    Apr 2010
    Posts
    5
    if there is index on tags.tag and also index on mvv.mvs, and the duplication of these two columns are small, and there is index on bookmarks.id, I think this query is ok. Please check the execution plan, and do some test.

  3. #3
    Join Date
    Jun 2007
    Location
    London
    Posts
    117
    Quote Originally Posted by guhongying View Post
    if there is index on tags.tag and also index on mvv.mvs, and the duplication of these two columns are small
    Without an index does this query look inefficient to you - just the fact that I am using JOIN again and again on the same tables?

    The columns tags.tag and mvv.mvs are likely to contain duplication as a bookmark can have many tags and each bookmark can have the same tag applied to it. However, there will not ever be a duplicate of tags for a particular bookmark.

    Quote Originally Posted by guhongying View Post
    Please check the execution plan, and do some test.
    I am preparing test data to use the EXPLAIN, I am not very good with analysing the output of EXPLAIN!
    Last edited by compsci; 04-08-10 at 07:08.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Code:
    SELECT bookmarks.*
      FROM bookmarks
    INNER
      JOIN ( SELECT bookmark_id 
               FROM tags
              WHERE tag IN ( 'clean', 'simple', ) -- 2 tags listed
             GROUP
                 BY bookmark_id 
             HAVING COUNT(*) = 2 ) AS b -- check 2 tags were found
        ON b.bookmark_id = bookmarks.id 
    INNER
      JOIN ( SELECT bookmark_id 
               FROM mvv
              WHERE mvs IN ( '6', '12', '18' ) -- 3 mvs listed
             GROUP
                 BY bookmark_id 
             HAVING COUNT(*) = 3 ) AS m -- check 3 mvs were found
        ON m.bookmark_id = bookmarks.id
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Jul 2009
    Location
    New Delhi
    Posts
    2
    One great way to JOIN while maintaining efficiency would be if you order your join for minimization of Cartesian Product result size, if you have a ballpark expectation of the data size of the database.

    Think of it - what is a JOIN in terms of fundamental Relational Algebra? It is Cartesian Product, followed by Select (Sigma, not the RDBMS SELECT) followed by Project (Pi).

    So if the Cartesian Product size is small all through, the Sigma operation and in fact the Pi operation too becomes faster.

    How to minimize the size of the Cartesian Product result? You can do it only if you have an approximate idea of your table sizes. Order the JOIN operations in a way that you minimize this.

    If you do not have an idea of the expected sizes, then this is not going to work in practice. All of it is just a practical approach (engineering approach), without any theoretical bound.

  6. #6
    Join Date
    Jun 2007
    Location
    London
    Posts
    117
    Hi r937,

    I tried this query which is basically your query but looking for matching one tag:

    Code:
    SELECT bookmarks.* FROM bookmarks INNER JOIN
    (SELECT bookmark_id FROM tags WHERE tag IN ('clean') 
    GROUP BY bookmark_id HAVING COUNT(*) = 1) 
    AS b ON b.bookmark_id
    However, the above returns bookmarks that do not even have the tag 'clean'? Here is my DB schema:

    Code:
    CREATE TABLE IF NOT EXISTS `bookmarks` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `title` char(250) NOT NULL,
      `link` text NOT NULL,
      `date_added` datetime NOT NULL,
      `privacy_type` int(11) NOT NULL,
      `user_id` int(11) NOT NULL,
      PRIMARY KEY (`id`),
      KEY `user_id` (`user_id`)
    ) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=71 ;
    
    CREATE TABLE IF NOT EXISTS `tags` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `bookmark_id` int(11) DEFAULT NULL,
      `user_id` int(11) NOT NULL,
      `tag` char(45) NOT NULL,
      PRIMARY KEY (`id`),
      KEY `bookmark_id` (`bookmark_id`)
    ) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;
    
    ALTER TABLE `bookmarks`
      ADD CONSTRAINT `user_id` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
    
    ALTER TABLE `tags`
      ADD CONSTRAINT `bookmark_id` FOREIGN KEY (`bookmark_id`) REFERENCES `bookmarks` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by compsci View Post
    However, the above returns bookmarks that do not even have the tag 'clean'?
    your ON clause is deficient

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Jun 2007
    Location
    London
    Posts
    117
    Quote Originally Posted by r937 View Post
    your ON clause is deficient
    Do you mean removing this:

    Code:
    ON b.bookmark_id
    I did this and I still get rows returned that do not have the tag 'clean'! Sorry if i misunderstood, but my sql skills are not very good.

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    take a closer look at the ON clause in post #4

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  10. #10
    Join Date
    Jun 2007
    Location
    London
    Posts
    117
    Quote Originally Posted by r937 View Post
    take a closer look at the ON clause in post #4


    Got it, it works great now. Thanks r937!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •