If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Lots of Joins: Speed Impact?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 04-07-10, 17:02
compsci compsci is offline
Registered User
 
Join Date: Jun 2007
Location: London
Posts: 117
Lots of Joins: Speed Impact?

Hello all,

I recently got advice from another forum when I had a query issue. I have opted for this solution primarily because I can loop through the join lines in my application logic so I can populate as many tag searches as the user wants to narrow down their search. Btw, I am building a sort of bookmarking service.

Code:
SELECT bookmarks.*
FROM bookmarks
JOIN tags AS T1 ON T1.bookmark_id = bookmarks.id AND T1.tag = 'clean'
JOIN tags AS T2 ON T2.bookmark_id = bookmarks.id AND T2.tag = 'simple'
I could potentially have something like this:

Code:
SELECT bookmarks.*
FROM bookmarks
JOIN tags AS T1 ON T1.bookmark_id = bookmarks.id AND T1.tag = 'clean'
JOIN tags AS T2 ON T2.bookmark_id = bookmarks.id AND T2.tag = 'simple'
JOIN tags AS T3 ON T3.bookmark_id = bookmarks.id AND T3.tag = 'clean'
JOIN mvv AS T4 ON T4.bookmark_id = bookmarks.id AND T4.mvs = '6'
JOIN mvv AS T5 ON T5.bookmark_id = bookmarks.id AND T5.mvs = '12'
JOIN mvv AS T6 ON T6.bookmark_id = bookmarks.id AND T6.mvs = '18'
I am getting worried this might be really in-efficient. I would appreciate more help on this or a better way of doing this if it needs improving and it most likely does!?!

I will only ever be joining no more than 4 tables but they can have lots of criteria.

Thanks all, I appreciate any help.
Reply With Quote
  #2 (permalink)  
Old 04-07-10, 21:15
guhongying guhongying is offline
Registered User
 
Join Date: Apr 2010
Posts: 5
if there is index on tags.tag and also index on mvv.mvs, and the duplication of these two columns are small, and there is index on bookmarks.id, I think this query is ok. Please check the execution plan, and do some test.
Reply With Quote
  #3 (permalink)  
Old 04-08-10, 06:03
compsci compsci is offline
Registered User
 
Join Date: Jun 2007
Location: London
Posts: 117
Quote:
Originally Posted by guhongying View Post
if there is index on tags.tag and also index on mvv.mvs, and the duplication of these two columns are small
Without an index does this query look inefficient to you - just the fact that I am using JOIN again and again on the same tables?

The columns tags.tag and mvv.mvs are likely to contain duplication as a bookmark can have many tags and each bookmark can have the same tag applied to it. However, there will not ever be a duplicate of tags for a particular bookmark.

Quote:
Originally Posted by guhongying View Post
Please check the execution plan, and do some test.
I am preparing test data to use the EXPLAIN, I am not very good with analysing the output of EXPLAIN!

Last edited by compsci; 04-08-10 at 06:08.
Reply With Quote
  #4 (permalink)  
Old 04-08-10, 08:47
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
Code:
SELECT bookmarks.*
  FROM bookmarks
INNER
  JOIN ( SELECT bookmark_id 
           FROM tags
          WHERE tag IN ( 'clean', 'simple', ) -- 2 tags listed
         GROUP
             BY bookmark_id 
         HAVING COUNT(*) = 2 ) AS b -- check 2 tags were found
    ON b.bookmark_id = bookmarks.id 
INNER
  JOIN ( SELECT bookmark_id 
           FROM mvv
          WHERE mvs IN ( '6', '12', '18' ) -- 3 mvs listed
         GROUP
             BY bookmark_id 
         HAVING COUNT(*) = 3 ) AS m -- check 3 mvs were found
    ON m.bookmark_id = bookmarks.id
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #5 (permalink)  
Old 04-08-10, 12:19
rampurhaat rampurhaat is offline
Registered User
 
Join Date: Jul 2009
Location: New Delhi
Posts: 2
One great way to JOIN while maintaining efficiency would be if you order your join for minimization of Cartesian Product result size, if you have a ballpark expectation of the data size of the database.

Think of it - what is a JOIN in terms of fundamental Relational Algebra? It is Cartesian Product, followed by Select (Sigma, not the RDBMS SELECT) followed by Project (Pi).

So if the Cartesian Product size is small all through, the Sigma operation and in fact the Pi operation too becomes faster.

How to minimize the size of the Cartesian Product result? You can do it only if you have an approximate idea of your table sizes. Order the JOIN operations in a way that you minimize this.

If you do not have an idea of the expected sizes, then this is not going to work in practice. All of it is just a practical approach (engineering approach), without any theoretical bound.
Reply With Quote
  #6 (permalink)  
Old 04-08-10, 12:23
compsci compsci is offline
Registered User
 
Join Date: Jun 2007
Location: London
Posts: 117
Hi r937,

I tried this query which is basically your query but looking for matching one tag:

Code:
SELECT bookmarks.* FROM bookmarks INNER JOIN
(SELECT bookmark_id FROM tags WHERE tag IN ('clean') 
GROUP BY bookmark_id HAVING COUNT(*) = 1) 
AS b ON b.bookmark_id
However, the above returns bookmarks that do not even have the tag 'clean'? Here is my DB schema:

Code:
CREATE TABLE IF NOT EXISTS `bookmarks` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` char(250) NOT NULL,
  `link` text NOT NULL,
  `date_added` datetime NOT NULL,
  `privacy_type` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=71 ;

CREATE TABLE IF NOT EXISTS `tags` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `bookmark_id` int(11) DEFAULT NULL,
  `user_id` int(11) NOT NULL,
  `tag` char(45) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `bookmark_id` (`bookmark_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;

ALTER TABLE `bookmarks`
  ADD CONSTRAINT `user_id` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;

ALTER TABLE `tags`
  ADD CONSTRAINT `bookmark_id` FOREIGN KEY (`bookmark_id`) REFERENCES `bookmarks` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
Reply With Quote
  #7 (permalink)  
Old 04-08-10, 12:28
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
Quote:
Originally Posted by compsci View Post
However, the above returns bookmarks that do not even have the tag 'clean'?
your ON clause is deficient

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #8 (permalink)  
Old 04-08-10, 12:37
compsci compsci is offline
Registered User
 
Join Date: Jun 2007
Location: London
Posts: 117
Quote:
Originally Posted by r937 View Post
your ON clause is deficient
Do you mean removing this:

Code:
ON b.bookmark_id
I did this and I still get rows returned that do not have the tag 'clean'! Sorry if i misunderstood, but my sql skills are not very good.
Reply With Quote
  #9 (permalink)  
Old 04-08-10, 12:42
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
take a closer look at the ON clause in post #4

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #10 (permalink)  
Old 04-08-10, 13:00
compsci compsci is offline
Registered User
 
Join Date: Jun 2007
Location: London
Posts: 117
Quote:
Originally Posted by r937 View Post
take a closer look at the ON clause in post #4


Got it, it works great now. Thanks r937!
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On