Results 1 to 12 of 12
  1. #1
    Join Date
    Mar 2008
    Posts
    11

    Unanswered: subquery question

    here is a pretty complicated query...


    Code:
    
    SELECT *
    FROM table_categories 
    JOIN table_files 
    ON table_categories.file_id = table_files.file_id 
    
    WHERE table_categories.unique_id IN ( 
    	( SELECT i2t1.object_id 
    		FROM table_tags t0 
    		CROSS JOIN table_tags t1 
    		CROSS JOIN table_tags t2 
    		INNER JOIN table_tagged_objects i2t0 
    		ON t0.id = i2t0.tag_id 
    		INNER JOIN table_tagged_objects i2t1 
    		ON i2t0.object_id = i2t1.object_id 
    		AND i2t1.tag_id = t1.id 
    		LEFT JOIN table_tagged_objects i2t2 
    		ON i2t0.object_id = i2t2.object_id 
    		AND i2t1.object_id = i2t2.object_id 
    		AND i2t2.tag_id= t2.id WHERE t0.tag = 'fruit' 
    		AND t1.tag = 'appl' 
    		AND t2.tag = 'green' 
    		AND i2t2.object_id IS NULL) 
    	) 
    		
    AND table_categories.display_type='1' 
    AND (table_categories.visible=1 OR table_categories.author_id='26') 
    ORDER BY table_categories.title ASC 
    LIMIT 0,24
    the subquery is a search for apple+fruit, exclude green. The outer query is selecting from a table of records and joining onto it (the pictures).

    Is it really inefficient to use a subquery in IN () ? I am not sure if I have too many options since the search query is so complicated. Any insight would be extremely helpful-

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by morguefile
    Is it really inefficient to use a subquery in IN () ?
    in general, no, not at all

    however, your CROSS CROSS INNER INNER LEFT looks hellish inefficient
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Mar 2008
    Posts
    11
    I don't think it is (maybe I am wrong), I am using and example from mysql. (slide 21)

    http://www.slideshare.net/jonathanba...em-with-mysql/

    The idea is using a group / count method would have been inefficient because it has to use a temporary table, while this method does not.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    there is one way to see if it is inefficient or not: run an EXPLAIN on it
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Mar 2008
    Posts
    11
    I did- it doesn't use temporary file

    http://screencast.com/t/k5Z3MkIdLN

    But what I was reading was that since the subquery is in an IN(...) it would be doing the subquery (A*B) rather than (A+B). Not sure if that is whats happening

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    what about this:
    Code:
    SELECT table_tagged_objects.object_id 
      FROM table_tags 
    INNER
      JOIN table_tagged_objects
        ON table_tagged_objects.tag_id = table_tags.id
     WHERE table_tags.tag IN ( 'fruit','apple','green' )
    GROUP
        BY table_tagged_objects.object_id
    HAVING COUNT(CASE WHEN table_tags.tag = 'fruit'
                      THEN 1 END ) > 0
       AND COUNT(CASE WHEN table_tags.tag = 'apple'
                      THEN 1 END ) > 0
       AND COUNT(CASE WHEN table_tags.tag = 'green'
                      THEN 1 END ) = 0
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Mar 2008
    Posts
    11
    sorry went on vacation. I tried the above statement but it doesn't work since it uses temporary file. Anytime it has to use count it becomes very inefficient. I thought I had it but after some testing, this still doesn't work for unison searches. (OR)


    SELECT i2t1.object_id
    FROM table_tags t0
    CROSS JOIN table_tags t1
    CROSS JOIN table_tags t2
    INNER JOIN table_tagged_objects i2t0
    ON t0.id = i2t0.tag_id
    INNER JOIN table_tagged_objects i2t1
    ON i2t0.object_id = i2t1.object_id
    OR i2t1.tag_id = t1.id LEFT
    JOIN table_tagged_objects i2t2
    ON i2t0.object_id = i2t2.object_id
    AND i2t1.object_id = i2t2.object_id
    AND i2t2.tag_id= t2.id WHERE t0.tag = 'fruit'
    AND t1.tag = 'appl'
    AND t2.tag = 'green'
    AND i2t2.object_id IS NULL

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by morguefile
    ... but it doesn't work since it uses temporary file. Anytime it has to use count it becomes very inefficient.
    "doesn't work" <> "becomes inefficient"

    besides being slow, does it actually run? does it actually produce the correct results?

    if so, then optimizing it is a secondary step -- the first step is getting correct results

    what indexes have you defined on your tables?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Mar 2008
    Posts
    11
    it did work, using explain produced...

    Using where; Using temporary; Using filesort

    The results showed results of apples that were green. Rather than all fruits. (not that that is wrong)

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    actually, it is wrong

    the results should not have any green objects -- check the HAVING clause again

    and what indexes have you defined on your tables?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Mar 2008
    Posts
    11
    oops - sorry you are right it returned only red apples.


    CREATE TABLE IF NOT EXISTS `table_tagged_objects` (
    `tag_id` int(10) unsigned NOT NULL default '0',
    `tagger_id` int(10) unsigned NOT NULL default '0',
    `object_id` int(10) unsigned NOT NULL default '0',
    `tagged_on` datetime NOT NULL default '0000-00-00 00:00:00',
    `file_type` tinyint(1) NOT NULL default '0',
    `display_type` int(3) NOT NULL,
    PRIMARY KEY (`tag_id`,`tagger_id`,`object_id`),
    KEY `tag_id_index` (`tag_id`),
    KEY `tagger_id_index` (`tagger_id`),
    KEY `object_id_index` (`object_id`),
    KEY `display_type` (`display_type`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;


    CREATE TABLE IF NOT EXISTS `table_tags` (
    `id` int(10) unsigned NOT NULL auto_increment,
    `tag` varchar(30) character set latin1 NOT NULL default '',
    `raw_tag` varchar(50) character set latin1 NOT NULL default '',
    PRIMARY KEY (`id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=13 ;

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    i'm not the best guy to discuss optimization strategy, but it seems to me that the tagger_id in the middle of the 3-part index is what's keeping that index from being used properly

    you could also use an index on (tag,id)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •