Unanswered: Multiple rows returned in OR query due to cartesian product
sorry to post such a basic question, but I've been all day trying to solve this (simple) problem with no success, and I don't know the exact wording I have to use in Google to find a solution. I also tried on the MySQL.com forum but didn't get any answer.
I've made up an example so my problem is easy to understand...
I have the following tables:
mysql> select * from food;
| id | name |
| 1 | orange |
| 2 | meat |
| 3 | apple |
| 4 | water |
(in this table, the pairs (foodId_A, foodId_B) do not follow any rules in terms of which food is A or B. Therefore, we will have to test whether our food of interest is foodId_A or foodId_B)
Now, I want to get all foods that are either similar to an orange (similarity>50) or have a lot of calories (Kcal>70). That is, I want to get any food that respects at least one of the two conditions: similar to an orange ( with id=1) or high calory count.
The answer in this simple example should be 'apple' (it is similar to an orange) and 'meat' (it has lots of calories).
When I do the query...
select * from food, similar, calories where (calories.foodId=food.id and calories.Kcal>50) or ((similar.foodId_A=food.id and similar.foodId_B=1 and similar.similarity>50) or (similar.foodId_B=food.id and similar.foodId_A=1 and similar.similarity>50) );
which is the 'apple', 'meat' answer I was expecting (if applying distinct to the select). But this is clearly not efficient, since in a real database there are so many records that I am getting thousands of duplicate rows and it takes forever to get an answer.
I understand this is happening because mysql is doing a cartesian product between table similarities (when applying the condition on calories) and table calories (when applying the condition on similarity). But I don't know how to solve this...
What else do I have to include in my query so that those duplicate rows do not come back as an answer?
CREATE VIEW fud
FROM @food As f
JOIN @calories As c
ON f.id = c.foodid
CREATE VIEW sim
FROM sim As s
JOIN fud As a
ON a.id = s.foodid_a
JOIN fud As b
ON b.id = s.foodid_b
WHERE a.name = 'orange'
AND ( s.similarity > 50
OR b.kcal > 70 )
Is this the only way to do it? Can it be done without creating the views in one single query? In this simple example creating the views is not a problem, but in my real scenario, creating a view for each case would be a nightmare...
Well... After asking everywhere, I finally found the answer by myself. I think this is the most optimal way to do the query I was looking for. If someone thinks there is a better answer, I'll be very happy to hear about it
FROM calories JOIN food LEFT JOIN similar ON
((similar.foodId_A=food.id and similar.foodId_B=1 and food.id=calories.foodId)
(similar.foodId_B=food.id and similar.foodId_A=1 and food.id=calories.foodId))
WHERE (Kcal>50 or similarity>50) and similarity IS NOT NULL;
Below query can solve your problem and might be faster:
SELECT DISTINCT f.name FROM similar s LEFT JOIN food f ON f.id=(if(foodId_A=1,foodId_B,foodId_A))
where (foodId_A=1 or foodId_B=1) and similarity>50
SELECT f1.name FROM calories c1 LEFT JOIN food f1 ON c1.foodId=f1.id where c1.Kcal>70