Results 1 to 9 of 9
  1. #1
    Join Date
    Jul 2007
    Posts
    10

    Unanswered: Searching for logical combination of tags

    I have a database where video is tagged with tags. It has three tables: a Videos table with id and other information, a Tags table with id and tag, and I have a join table Videos_Tags with video_id and tag_id which connects videos to tags and vice versa.

    I want to offer the users of a web interface the possibility to search for complex combinations of tags, for example:

    'romantic'
    --AND
    'short'
    --OR
    'action'
    --AND
    'long'

    or any other combination.
    So also (('romantic' OR 'short') AND 'action') should be possible.

    I need to dynamically build the sql query which returns the right videos.

    for the first case, I suppose the query has to be like this:

    select * from videos
    JOIN Videos_Tags a on videos.id = a.video_id
    JOIN Videos_Tags b on videos.id = b.video_id
    JOIN Tags x on a.tag_id = x.id
    JOIN Tags y on b.tag_id = y.id
    WHERE
    x.tag = 'romantic' AND y.tag = 'short'
    OR
    x.tag = 'action' AND y.tag = 'long'

    but for the second query:

    select * from videos
    JOIN Videos_Tags a on videos.id = a.video_id
    JOIN Videos_Tags b on videos.id = b.video_id
    JOIN Tags x on a.tag_id = x.id
    OIN Tags y on b.tag_id = y.id
    WHERE
    (x.tag = 'romantic' OR x.tag = 'short')
    AND
    y.tag = 'action'


    I'm having a hard time building these queries dynamically. If I have three tags combined with AND, I have to join three times, but it gets complicated when the logical tree gets more complicated.
    Are there some general rules here? I suppose there are but I didn't really find something.

    I thought it would be a common problem but it seems like most 'solutions' just use a simple 'IN' or even store their tags as a comma-separated string in their main table...

    Thanks!

    --edit

    I think I got it!

    'romantic'
    --AND
    'short'
    --OR
    'action'
    --AND
    'long'

    results in a tree like this:
    _________________OR
    _______________/_____\
    ____________AND_____AND
    _________/______\____/___\
    _______rom._____sh._act.__long

    I have to start at the root of the tree. If it's an OR I give every child the same letter, e.g. 'A'.
    If it's an AND I give every child a different letter, e.g 'A', 'B' and 'C'
    This has to be done recursive, and every time the newly passed letter has to be added by the letter the child already had. This means that in the example above the AND and AND each get letter 'A', 'romantic' gets 'AA', 'short' gets 'AB', 'action' gets 'AA' and 'long' gets 'AB'.

    Any comments?
    Last edited by CyberMonkey; 04-29-08 at 19:50.

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    the following solutions assume that you cannot assign the same tag to the same video more than once -- the easiest way to do this is to declare the 2 FKs in the Videos_Tags table as the PK

    ( 'romantic' AND 'short' ) OR ( 'action' AND 'long' ) --
    Code:
    SELECT videos.id
         , videos.otherstuff 
      FROM Tags
    INNER
      JOIN Videos_Tags 
        ON Videos_Tags.tag_id = Tags.id
    INNER
      JOIN videos
        ON videos.id = Videos_Tags.video_id
     WHERE Tags.tag IN ( 'romantic', 'short'
                       , 'action', 'long' )
    GROUP
        BY videos.id
    HAVING SUM(CASE WHEN Tags.tag IN ( 'romantic', 'short' )
                    THEN 1 ELSE 0 END ) = 2
        OR SUM(CASE WHEN Tags.tag IN ( 'action', 'long' )
                    THEN 1 ELSE 0 END ) = 2
    i'm sure you could come up with other combinations, and they all have similar, simple solutions

    note: in mysql, you are allowed to have other columns in the SELECT that aren't mentioned in the GROUP BY

    in other database systems, you would have to include them in the GROUP BY (which might be problematic if they include a BLOB column, but you can simply push everything down one subquery level)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jul 2007
    Posts
    10
    That's pretty nice, I would never have thought of that. Thanks a lot!!

    It's indeed a suitable and performant solution for searching for tags. However, I wonder what I have to do if I also want to include where clauses for columns of Videos?

    Let's say I want a video with

    (tag='romantic' AND tag='short') or (videos.title LIKE 'The%' AND tag='kurosawa')

    Is this possible with this solution with GROUP?

    Thanks!

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Code:
    ...
     WHERE Tags.tag IN ( 'romantic', 'short' , 'kurosawa' )
    ...
    HAVING SUM(CASE WHEN Tags.tag IN ( 'romantic', 'short' )
                    THEN 1 ELSE 0 END ) = 2
        OR SUM(CASE WHEN Tags.tag = 'kurosawa'
                      OR videos.title LIKE 'The%' 
                    THEN 1 ELSE 0 END ) = 2
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Jul 2007
    Posts
    10
    Fantastic! Thanks

  6. #6
    Join Date
    Jul 2007
    Posts
    10
    I just tried your last solution. Is it possible that when I have a video with id = 2 and title 'The seven samurai' and with 5 different tags, the solution is not correct?

    This is because the join creates five lines with videos.id = 2, each with a different tag but all with the same title.

    However, if I change the OR to AND and the outcome of the sum to 1, it works.

    Code:
    ...
     WHERE Tags.tag IN ( 'romantic', 'short' , 'kurosawa' )
    ...
    HAVING SUM(CASE WHEN Tags.tag IN ( 'romantic', 'short' )
                    THEN 1 ELSE 0 END ) = 2
        OR SUM(CASE WHEN Tags.tag = 'kurosawa'
                      AND videos.title LIKE 'The%' 
                    THEN 1 ELSE 0 END ) = 1
    This kind of complicates the building of the query I suppose. I have to count the number of tags in a SUM to know what the sum has to be equal to.
    It seems to me my initial solution with joins is more elegant as it is easier to implement the logical hierararchy, although the performance will be much worse.

    What do you think?

    Thanks again

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by CyberMonkey
    I just tried your last solution. Is it possible that when I have a video with id = 2 and title 'The seven samurai' and with 5 different tags, the solution is not correct?
    it's possible -- which 5 tags does it have?


    Quote Originally Posted by CyberMonkey
    I have to count the number of tags in a SUM to know what the sum has to be equal to.
    and the difficulty with a correlated subquery in the HAVING clause is... ?


    Quote Originally Posted by CyberMonkey
    It seems to me my initial solution with joins is more elegant as it is easier to implement the logical hierararchy, although the performance will be much worse.
    okay, go ahead, enjoy yourself, join as many times as you want

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Jul 2007
    Posts
    10
    Quote Originally Posted by r937
    and the difficulty with a correlated subquery in the HAVING clause is... ?
    That's not difficult indeed, the problem is the code which makes it happen.
    Not only the query has to work, but the query has to be build-able with code which is robust and elegant and I'm afraid that won't be the case with the group-by.

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    the GROUP BY query is simple and elegant, and involves only counting in the HAVING clause

    the join approach has a variable number of joins based on the problem, and if you are more comfortable coding that, then there you have your answer!!
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •