If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Data Access, Manipulation & Batch Languages > ANSI SQL > Searching for logical combination of tags

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 04-29-08, 18:25
CyberMonkey CyberMonkey is offline
Registered User
 
Join Date: Jul 2007
Posts: 10
Searching for logical combination of tags

I have a database where video is tagged with tags. It has three tables: a Videos table with id and other information, a Tags table with id and tag, and I have a join table Videos_Tags with video_id and tag_id which connects videos to tags and vice versa.

I want to offer the users of a web interface the possibility to search for complex combinations of tags, for example:

'romantic'
--AND
'short'
--OR
'action'
--AND
'long'

or any other combination.
So also (('romantic' OR 'short') AND 'action') should be possible.

I need to dynamically build the sql query which returns the right videos.

for the first case, I suppose the query has to be like this:

select * from videos
JOIN Videos_Tags a on videos.id = a.video_id
JOIN Videos_Tags b on videos.id = b.video_id
JOIN Tags x on a.tag_id = x.id
JOIN Tags y on b.tag_id = y.id
WHERE
x.tag = 'romantic' AND y.tag = 'short'
OR
x.tag = 'action' AND y.tag = 'long'

but for the second query:

select * from videos
JOIN Videos_Tags a on videos.id = a.video_id
JOIN Videos_Tags b on videos.id = b.video_id
JOIN Tags x on a.tag_id = x.id
OIN Tags y on b.tag_id = y.id
WHERE
(x.tag = 'romantic' OR x.tag = 'short')
AND
y.tag = 'action'


I'm having a hard time building these queries dynamically. If I have three tags combined with AND, I have to join three times, but it gets complicated when the logical tree gets more complicated.
Are there some general rules here? I suppose there are but I didn't really find something.

I thought it would be a common problem but it seems like most 'solutions' just use a simple 'IN' or even store their tags as a comma-separated string in their main table...

Thanks!

--edit

I think I got it!

'romantic'
--AND
'short'
--OR
'action'
--AND
'long'

results in a tree like this:
_________________OR
_______________/_____\
____________AND_____AND
_________/______\____/___\
_______rom._____sh._act.__long

I have to start at the root of the tree. If it's an OR I give every child the same letter, e.g. 'A'.
If it's an AND I give every child a different letter, e.g 'A', 'B' and 'C'
This has to be done recursive, and every time the newly passed letter has to be added by the letter the child already had. This means that in the example above the AND and AND each get letter 'A', 'romantic' gets 'AA', 'short' gets 'AB', 'action' gets 'AA' and 'long' gets 'AB'.

Any comments?

Last edited by CyberMonkey; 04-29-08 at 18:50.
Reply With Quote
  #2 (permalink)  
Old 04-29-08, 19:10
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
the following solutions assume that you cannot assign the same tag to the same video more than once -- the easiest way to do this is to declare the 2 FKs in the Videos_Tags table as the PK

( 'romantic' AND 'short' ) OR ( 'action' AND 'long' ) --
Code:
SELECT videos.id
     , videos.otherstuff 
  FROM Tags
INNER
  JOIN Videos_Tags 
    ON Videos_Tags.tag_id = Tags.id
INNER
  JOIN videos
    ON videos.id = Videos_Tags.video_id
 WHERE Tags.tag IN ( 'romantic', 'short'
                   , 'action', 'long' )
GROUP
    BY videos.id
HAVING SUM(CASE WHEN Tags.tag IN ( 'romantic', 'short' )
                THEN 1 ELSE 0 END ) = 2
    OR SUM(CASE WHEN Tags.tag IN ( 'action', 'long' )
                THEN 1 ELSE 0 END ) = 2
i'm sure you could come up with other combinations, and they all have similar, simple solutions

note: in mysql, you are allowed to have other columns in the SELECT that aren't mentioned in the GROUP BY

in other database systems, you would have to include them in the GROUP BY (which might be problematic if they include a BLOB column, but you can simply push everything down one subquery level)
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 04-29-08, 19:33
CyberMonkey CyberMonkey is offline
Registered User
 
Join Date: Jul 2007
Posts: 10
That's pretty nice, I would never have thought of that. Thanks a lot!!

It's indeed a suitable and performant solution for searching for tags. However, I wonder what I have to do if I also want to include where clauses for columns of Videos?

Let's say I want a video with

(tag='romantic' AND tag='short') or (videos.title LIKE 'The%' AND tag='kurosawa')

Is this possible with this solution with GROUP?

Thanks!
Reply With Quote
  #4 (permalink)  
Old 04-29-08, 19:38
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Code:
...
 WHERE Tags.tag IN ( 'romantic', 'short' , 'kurosawa' )
...
HAVING SUM(CASE WHEN Tags.tag IN ( 'romantic', 'short' )
                THEN 1 ELSE 0 END ) = 2
    OR SUM(CASE WHEN Tags.tag = 'kurosawa'
                  OR videos.title LIKE 'The%' 
                THEN 1 ELSE 0 END ) = 2
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #5 (permalink)  
Old 04-29-08, 19:41
CyberMonkey CyberMonkey is offline
Registered User
 
Join Date: Jul 2007
Posts: 10
Fantastic! Thanks
Reply With Quote
  #6 (permalink)  
Old 04-30-08, 05:06
CyberMonkey CyberMonkey is offline
Registered User
 
Join Date: Jul 2007
Posts: 10
I just tried your last solution. Is it possible that when I have a video with id = 2 and title 'The seven samurai' and with 5 different tags, the solution is not correct?

This is because the join creates five lines with videos.id = 2, each with a different tag but all with the same title.

However, if I change the OR to AND and the outcome of the sum to 1, it works.

Code:
...
 WHERE Tags.tag IN ( 'romantic', 'short' , 'kurosawa' )
...
HAVING SUM(CASE WHEN Tags.tag IN ( 'romantic', 'short' )
                THEN 1 ELSE 0 END ) = 2
    OR SUM(CASE WHEN Tags.tag = 'kurosawa'
                  AND videos.title LIKE 'The%' 
                THEN 1 ELSE 0 END ) = 1
This kind of complicates the building of the query I suppose. I have to count the number of tags in a SUM to know what the sum has to be equal to.
It seems to me my initial solution with joins is more elegant as it is easier to implement the logical hierararchy, although the performance will be much worse.

What do you think?

Thanks again
Reply With Quote
  #7 (permalink)  
Old 04-30-08, 06:37
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by CyberMonkey
I just tried your last solution. Is it possible that when I have a video with id = 2 and title 'The seven samurai' and with 5 different tags, the solution is not correct?
it's possible -- which 5 tags does it have?


Quote:
Originally Posted by CyberMonkey
I have to count the number of tags in a SUM to know what the sum has to be equal to.
and the difficulty with a correlated subquery in the HAVING clause is... ?


Quote:
Originally Posted by CyberMonkey
It seems to me my initial solution with joins is more elegant as it is easier to implement the logical hierararchy, although the performance will be much worse.
okay, go ahead, enjoy yourself, join as many times as you want

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #8 (permalink)  
Old 04-30-08, 07:15
CyberMonkey CyberMonkey is offline
Registered User
 
Join Date: Jul 2007
Posts: 10
Quote:
Originally Posted by r937
and the difficulty with a correlated subquery in the HAVING clause is... ?
That's not difficult indeed, the problem is the code which makes it happen.
Not only the query has to work, but the query has to be build-able with code which is robust and elegant and I'm afraid that won't be the case with the group-by.
Reply With Quote
  #9 (permalink)  
Old 04-30-08, 07:28
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
the GROUP BY query is simple and elegant, and involves only counting in the HAVING clause

the join approach has a variable number of joins based on the problem, and if you are more comfortable coding that, then there you have your answer!!
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On