dBforums > Database Server Software > MySQL

#1 | raymon | 04-29-09, 11:46
Multiple rows returned in OR query due to cartesian product

Hi,

Sorry to post such a basic question, but I've spent all day trying to solve this (simple) problem without success, and I don't know the right wording to use in Google to find a solution. I also tried the MySQL.com forum but didn't get any answer.

I've made up an example so my problem is easy to understand...

I have the following tables:

mysql> select * from food;
+----+--------+
| id | name   |
+----+--------+
|  1 | orange |
|  2 | meat   |
|  3 | apple  |
|  4 | water  |
+----+--------+

mysql> select * from calories;
+--------+------+
| foodId | Kcal |
+--------+------+
|      1 |   10 |
|      2 |   75 |
|      3 |    5 |
|      4 |    0 |
+--------+------+

mysql> select * from similar;
+----------+----------+------------+
| foodId_A | foodId_B | similarity |
+----------+----------+------------+
|        1 |        2 |         10 |
|        1 |        3 |         80 |
|        3 |        2 |         15 |
|        1 |        4 |          5 |
|        4 |        2 |          5 |
|        3 |        4 |          5 |
+----------+----------+------------+

(In this table, the pairs (foodId_A, foodId_B) follow no rule about which food is A and which is B. Therefore, we have to test whether our food of interest appears as foodId_A or as foodId_B.)


Now, I want to get all foods that are either similar to an orange (similarity>50) or have a lot of calories (Kcal>70). That is, I want any food that satisfies at least one of the two conditions: similar to an orange (id=1) or a high calorie count.

The answer in this simple example should be 'apple' (it is similar to an orange) and 'meat' (it has lots of calories).

When I do the query...

select *
from food, similar, calories
where (calories.foodId=food.id and calories.Kcal>50)
   or ((similar.foodId_A=food.id and similar.foodId_B=1 and similar.similarity>50)
    or (similar.foodId_B=food.id and similar.foodId_A=1 and similar.similarity>50));

... I get:

+----+-------+----------+----------+------------+--------+------+
| id | name  | foodId_A | foodId_B | similarity | foodId | Kcal |
+----+-------+----------+----------+------------+--------+------+
|  2 | meat  |        1 |        2 |         10 |      2 |   75 |
|  3 | apple |        1 |        3 |         80 |      1 |   10 |
|  2 | meat  |        1 |        3 |         80 |      2 |   75 |
|  3 | apple |        1 |        3 |         80 |      2 |   75 |
|  3 | apple |        1 |        3 |         80 |      3 |    5 |
|  3 | apple |        1 |        3 |         80 |      4 |    0 |
|  2 | meat  |        3 |        2 |         15 |      2 |   75 |
|  2 | meat  |        1 |        4 |          5 |      2 |   75 |
|  2 | meat  |        4 |        2 |          5 |      2 |   75 |
|  2 | meat  |        3 |        4 |          5 |      2 |   75 |
+----+-------+----------+----------+------------+--------+------+

This contains the 'apple', 'meat' answer I was expecting (once DISTINCT is applied to the SELECT). But it is clearly not efficient: in a real database there are so many records that I get thousands of duplicate rows and the query takes forever.

I understand this happens because MySQL is building a cartesian product: when the calorie condition matches, every row of table similar is paired with that food, and when the similarity condition matches, every row of table calories is paired with it. But I don't know how to avoid this...
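To see the row multiplication concretely, here is a minimal sketch that loads the three example tables into an in-memory SQLite database and runs the query from the question, selecting only food.name. Assumptions: SQLite, via Python's standard sqlite3 module, stands in for MySQL here, and SETUP and QUERY are just illustrative names.

```python
import sqlite3
from collections import Counter

# Assumption: in-memory SQLite stands in for MySQL; the rows are exactly
# the ones listed in the question.
SETUP = """
CREATE TABLE food (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE calories (foodId INTEGER, Kcal INTEGER);
CREATE TABLE similar (foodId_A INTEGER, foodId_B INTEGER, similarity INTEGER);
INSERT INTO food VALUES (1, 'orange'), (2, 'meat'), (3, 'apple'), (4, 'water');
INSERT INTO calories VALUES (1, 10), (2, 75), (3, 5), (4, 0);
INSERT INTO similar VALUES (1, 2, 10), (1, 3, 80), (3, 2, 15),
                           (1, 4, 5), (4, 2, 5), (3, 4, 5);
"""

# The question's query: three tables in the FROM list, but each OR branch
# constrains only two of them, so the third table is left unconstrained.
QUERY = """
SELECT food.name
FROM food, similar, calories
WHERE (calories.foodId = food.id AND calories.Kcal > 50)
   OR (similar.foodId_A = food.id AND similar.foodId_B = 1 AND similar.similarity > 50)
   OR (similar.foodId_B = food.id AND similar.foodId_A = 1 AND similar.similarity > 50)
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
names = [row[0] for row in conn.execute(QUERY)]
conn.close()

# Count how many times each food name comes back.
counts = Counter(names)
print(len(names), dict(sorted(counts.items())))  # 10 {'apple': 4, 'meat': 6}
```

'meat' comes back once per row of similar (6 rows) because the calorie branch never constrains similar, and 'apple' comes back once per row of calories (4 rows) because the similarity branch never constrains calories. On real tables those multipliers become the full table sizes.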

What else do I have to include in my query so that those duplicate rows do not come back as an answer?

Thank you very much for your help,

Ramon

#2 | gvee (www.gvee.co.uk) | 04-29-09, 12:16
Code:
-- each food together with its calorie count
CREATE VIEW fud
  AS
SELECT f.id
     , f.name
     , c.kcal
FROM   food As f
 INNER
  JOIN calories As c
    ON f.id = c.foodid;
Code:
-- every pair listed in both directions, so the food of interest can always be matched on foodid_a
CREATE VIEW sim
  AS
SELECT foodid_a
     , foodid_b
     , similarity
FROM   similar
UNION ALL
SELECT foodid_b
     , foodid_a
     , similarity
FROM   similar;
Code:
-- note: only foods that share a row in similar with the orange are candidates
SELECT a.id
     , a.name
     , a.kcal
     , b.id
     , b.name
     , b.kcal
     , s.similarity
FROM   sim As s
 INNER
  JOIN fud As a
    ON a.id = s.foodid_a
 INNER
  JOIN fud As b
    ON b.id = s.foodid_b
WHERE  a.name = 'orange'
AND    ( s.similarity > 50
      OR b.kcal > 70 );
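As a quick check of the view-based approach above, here is a minimal sketch that creates the two views in an in-memory SQLite database loaded with the question's rows and runs the final query, selecting only b.name. Assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, and the tables are plain food/calories/similar rather than the "@"-prefixed names.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are from the question.
SETUP = """
CREATE TABLE food (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE calories (foodId INTEGER, Kcal INTEGER);
CREATE TABLE similar (foodId_A INTEGER, foodId_B INTEGER, similarity INTEGER);
INSERT INTO food VALUES (1, 'orange'), (2, 'meat'), (3, 'apple'), (4, 'water');
INSERT INTO calories VALUES (1, 10), (2, 75), (3, 5), (4, 0);
INSERT INTO similar VALUES (1, 2, 10), (1, 3, 80), (3, 2, 15),
                           (1, 4, 5), (4, 2, 5), (3, 4, 5);
"""

# The two views: fud joins food to calories, sim lists each pair both ways.
VIEWS = """
CREATE VIEW fud AS
SELECT f.id, f.name, c.kcal
FROM food AS f
INNER JOIN calories AS c ON f.id = c.foodid;

CREATE VIEW sim AS
SELECT foodid_a, foodid_b, similarity FROM similar
UNION ALL
SELECT foodid_b, foodid_a, similarity FROM similar;
"""

# Orange is always on the "a" side, the candidate food on the "b" side.
QUERY = """
SELECT b.name
FROM sim AS s
INNER JOIN fud AS a ON a.id = s.foodid_a
INNER JOIN fud AS b ON b.id = s.foodid_b
WHERE a.name = 'orange'
  AND (s.similarity > 50 OR b.kcal > 70)
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP + VIEWS)
names = sorted(row[0] for row in conn.execute(QUERY))
conn.close()
print(names)  # ['apple', 'meat']
```

Each candidate food appears once per pair with the orange, so no cartesian product is formed. A high-calorie food with no row in similar alongside the orange would not be returned, though every non-orange food in this sample data has such a row.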
__________________
George

#3 | raymon | 04-29-09, 12:33
Thank you, George!

Is this the only way to do it? Can it be done in one single query, without creating the views? In this simple example creating the views is not a problem, but in my real scenario, creating a view for each case would be a nightmare...

#4 | gvee | 04-29-09, 12:53
The views are not essential: you can turn them into derived tables if required, e.g.
Code:
SELECT a.id
     , a.name
     , a.kcal
     , b.id
     , b.name
     , b.kcal
     , s.similarity
FROM   (
        SELECT foodid_a
             , foodid_b
             , similarity
        FROM   similar
        UNION ALL
        SELECT foodid_b
             , foodid_a
             , similarity
        FROM   similar
       ) As s
 INNER
  JOIN
...
But the code gets pretty messy.

I'm sure someone else will be along soon to tell us that my solution is needlessly complicated anyway.

#5 | raymon | 04-30-09, 03:44
Thank you again. I'll try this approach, unless somebody reading this post tells me there is a less complicated solution.

#6 | raymon | 05-01-09, 13:55
Well... After asking everywhere, I finally found the answer by myself. I think this is the best way to write the query I was looking for. If someone thinks there is a better answer, I'll be very happy to hear about it.

SELECT *
FROM calories JOIN food LEFT JOIN similar ON
     ((similar.foodId_A=food.id and similar.foodId_B=1 and food.id=calories.foodId)
   or (similar.foodId_B=food.id and similar.foodId_A=1 and food.id=calories.foodId))
WHERE (Kcal>50 or similarity>50) and similarity IS NOT NULL;
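Here is a minimal sketch that runs this LEFT JOIN query against the question's sample rows to confirm it returns one row per food. Assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, and the SELECT list is narrowed from * to three columns for readability.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are from the question.
SETUP = """
CREATE TABLE food (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE calories (foodId INTEGER, Kcal INTEGER);
CREATE TABLE similar (foodId_A INTEGER, foodId_B INTEGER, similarity INTEGER);
INSERT INTO food VALUES (1, 'orange'), (2, 'meat'), (3, 'apple'), (4, 'water');
INSERT INTO calories VALUES (1, 10), (2, 75), (3, 5), (4, 0);
INSERT INTO similar VALUES (1, 2, 10), (1, 3, 80), (3, 2, 15),
                           (1, 4, 5), (4, 2, 5), (3, 4, 5);
"""

# "calories JOIN food" has no ON clause, so it is a cross join; the LEFT
# JOIN's ON clause is what ties food to calories and to the similar pair.
QUERY = """
SELECT food.name, calories.Kcal, similar.similarity
FROM calories JOIN food LEFT JOIN similar ON
     ((similar.foodId_A = food.id AND similar.foodId_B = 1 AND food.id = calories.foodId)
   OR (similar.foodId_B = food.id AND similar.foodId_A = 1 AND food.id = calories.foodId))
WHERE (Kcal > 50 OR similarity > 50) AND similarity IS NOT NULL
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = sorted(conn.execute(QUERY).fetchall())
conn.close()
print(rows)  # [('apple', 5, 80), ('meat', 75, 10)]
```

The `similarity IS NOT NULL` filter makes the LEFT JOIN behave like an inner join, so, as with the view-based query, a high-calorie food with no row in similar alongside food 1 would not be returned. With this sample data every non-orange food has such a row, so the result is still correct.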

#7 | ashish_mat1979 | 05-08-09, 08:58
The query below should solve your problem and might be faster:

SELECT DISTINCT f.name FROM similar s LEFT JOIN food f ON f.id=(if(foodId_A=1,foodId_B,foodId_A))
where (foodId_A=1 or foodId_B=1) and similarity>50
UNION DISTINCT
SELECT f1.name FROM calories c1 LEFT JOIN food f1 ON c1.foodId=f1.id where c1.Kcal>70
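Here is a minimal sketch that checks the UNION query against the question's sample rows. Each branch touches only two tables, so no cartesian product is formed, and the calorie branch does not depend on the similar table at all. Assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, so MySQL's IF() is swapped for a CASE expression and UNION DISTINCT for plain UNION, which also removes duplicates.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are from the question.
SETUP = """
CREATE TABLE food (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE calories (foodId INTEGER, Kcal INTEGER);
CREATE TABLE similar (foodId_A INTEGER, foodId_B INTEGER, similarity INTEGER);
INSERT INTO food VALUES (1, 'orange'), (2, 'meat'), (3, 'apple'), (4, 'water');
INSERT INTO calories VALUES (1, 10), (2, 75), (3, 5), (4, 0);
INSERT INTO similar VALUES (1, 2, 10), (1, 3, 80), (3, 2, 15),
                           (1, 4, 5), (4, 2, 5), (3, 4, 5);
"""

# Branch 1: foods similar to food 1, whichever column food 1 sits in.
# Branch 2: high-calorie foods, independent of the similar table.
QUERY = """
SELECT DISTINCT f.name
FROM similar s
LEFT JOIN food f
  ON f.id = CASE WHEN s.foodId_A = 1 THEN s.foodId_B ELSE s.foodId_A END
WHERE (s.foodId_A = 1 OR s.foodId_B = 1) AND s.similarity > 50
UNION
SELECT f1.name
FROM calories c1
LEFT JOIN food f1 ON c1.foodId = f1.id
WHERE c1.Kcal > 70
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
names = sorted(row[0] for row in conn.execute(QUERY))
conn.close()
print(names)  # ['apple', 'meat']
```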
__________________
Ashish