| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

04-07-10, 17:02
|
|
Registered User
|
|
Join Date: Jun 2007
Location: London
Posts: 117
|
|
|
Lots of Joins: Speed Impact?
|
|
Hello all,
I recently got advice from another forum when I had a query issue. I have opted for this solution primarily because I can loop through the join lines in my application logic so I can populate as many tag searches as the user wants to narrow down their search. Btw, I am building a sort of bookmarking service.
Code:
SELECT bookmarks.*
FROM bookmarks
JOIN tags AS T1 ON T1.bookmark_id = bookmarks.id AND T1.tag = 'clean'
JOIN tags AS T2 ON T2.bookmark_id = bookmarks.id AND T2.tag = 'simple'
I could potentially have something like this:
Code:
SELECT bookmarks.*
FROM bookmarks
JOIN tags AS T1 ON T1.bookmark_id = bookmarks.id AND T1.tag = 'clean'
JOIN tags AS T2 ON T2.bookmark_id = bookmarks.id AND T2.tag = 'simple'
JOIN tags AS T3 ON T3.bookmark_id = bookmarks.id AND T3.tag = 'clean'
JOIN mvv AS T4 ON T4.bookmark_id = bookmarks.id AND T4.mvs = '6'
JOIN mvv AS T5 ON T5.bookmark_id = bookmarks.id AND T5.mvs = '12'
JOIN mvv AS T6 ON T6.bookmark_id = bookmarks.id AND T6.mvs = '18'
I am getting worried this might be really in-efficient. I would appreciate more help on this or a better way of doing this if it needs improving and it most likely does!?!
I will only ever be joining no more than 4 tables but they can have lots of criteria.
Thanks all, I appreciate any help.
|
|

04-07-10, 21:15
|
|
Registered User
|
|
Join Date: Apr 2010
Posts: 5
|
|
if there is index on tags.tag and also index on mvv.mvs, and the duplication of these two columns are small, and there is index on bookmarks.id, I think this query is ok. Please check the execution plan, and do some test.
|
|

04-08-10, 06:03
|
|
Registered User
|
|
Join Date: Jun 2007
Location: London
Posts: 117
|
|
|
|
Quote:
Originally Posted by guhongying
if there is index on tags.tag and also index on mvv.mvs, and the duplication of these two columns are small
|
Without an index does this query look inefficient to you - just the fact that I am using JOIN again and again on the same tables?
The columns tags.tag and mvv.mvs are likely to contain duplication as a bookmark can have many tags and each bookmark can have the same tag applied to it. However, there will not ever be a duplicate of tags for a particular bookmark.
Quote:
Originally Posted by guhongying
Please check the execution plan, and do some test.
|
I am preparing test data to use the EXPLAIN, I am not very good with analysing the output of EXPLAIN!
|
Last edited by compsci; 04-08-10 at 06:08.
|

04-08-10, 08:47
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
Code:
SELECT bookmarks.*
FROM bookmarks
INNER
JOIN ( SELECT bookmark_id
FROM tags
WHERE tag IN ( 'clean', 'simple', ) -- 2 tags listed
GROUP
BY bookmark_id
HAVING COUNT(*) = 2 ) AS b -- check 2 tags were found
ON b.bookmark_id = bookmarks.id
INNER
JOIN ( SELECT bookmark_id
FROM mvv
WHERE mvs IN ( '6', '12', '18' ) -- 3 mvs listed
GROUP
BY bookmark_id
HAVING COUNT(*) = 3 ) AS m -- check 3 mvs were found
ON m.bookmark_id = bookmarks.id
|
|

04-08-10, 12:19
|
|
Registered User
|
|
Join Date: Jul 2009
Location: New Delhi
Posts: 2
|
|
One great way to JOIN while maintaining efficiency would be if you order your join for minimization of Cartesian Product result size, if you have a ballpark expectation of the data size of the database.
Think of it - what is a JOIN in terms of fundamental Relational Algebra? It is Cartesian Product, followed by Select (Sigma, not the RDBMS SELECT) followed by Project (Pi).
So if the Cartesian Product size is small all through, the Sigma operation and in fact the Pi operation too becomes faster.
How to minimize the size of the Cartesian Product result? You can do it only if you have an approximate idea of your table sizes. Order the JOIN operations in a way that you minimize this.
If you do not have an idea of the expected sizes, then this is not going to work in practice. All of it is just a practical approach (engineering approach), without any theoretical bound.
|
|

04-08-10, 12:23
|
|
Registered User
|
|
Join Date: Jun 2007
Location: London
Posts: 117
|
|
Hi r937,
I tried this query which is basically your query but looking for matching one tag:
Code:
SELECT bookmarks.* FROM bookmarks INNER JOIN
(SELECT bookmark_id FROM tags WHERE tag IN ('clean')
GROUP BY bookmark_id HAVING COUNT(*) = 1)
AS b ON b.bookmark_id
However, the above returns bookmarks that do not even have the tag 'clean'? Here is my DB schema:
Code:
CREATE TABLE IF NOT EXISTS `bookmarks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` char(250) NOT NULL,
`link` text NOT NULL,
`date_added` datetime NOT NULL,
`privacy_type` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=71 ;
CREATE TABLE IF NOT EXISTS `tags` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`bookmark_id` int(11) DEFAULT NULL,
`user_id` int(11) NOT NULL,
`tag` char(45) NOT NULL,
PRIMARY KEY (`id`),
KEY `bookmark_id` (`bookmark_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;
ALTER TABLE `bookmarks`
ADD CONSTRAINT `user_id` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
ALTER TABLE `tags`
ADD CONSTRAINT `bookmark_id` FOREIGN KEY (`bookmark_id`) REFERENCES `bookmarks` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
|
|

04-08-10, 12:28
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
Quote:
Originally Posted by compsci
However, the above returns bookmarks that do not even have the tag 'clean'?
|
your ON clause is deficient

|
|

04-08-10, 12:37
|
|
Registered User
|
|
Join Date: Jun 2007
Location: London
Posts: 117
|
|
Quote:
Originally Posted by r937
your ON clause is deficient
|
Do you mean removing this:
I did this and I still get rows returned that do not have the tag 'clean'! Sorry if i misunderstood, but my sql skills are not very good.
|
|

04-08-10, 12:42
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
take a closer look at the ON clause in post #4

|
|

04-08-10, 13:00
|
|
Registered User
|
|
Join Date: Jun 2007
Location: London
Posts: 117
|
|
Quote:
Originally Posted by r937
take a closer look at the ON clause in post #4
|

Got it, it works great now. Thanks r937!
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|