If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > How to search over 3 tables

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-10-08, 00:33
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
How to search over 3 tables

Hello ALL

The project we're working on is a document retrieval system

I have 3 tables as follows. Table 1 contains words from a example doc. Table 2 contains words from documents in a database. Table3 contain only one column ‘DocID’.

I would like to search the database to find those docs similar to the example doc by calculating similarity score between the example doc and each doc in the database. Specifically, the simi score is calculated by adding up the sum of word frequencies of all matched word. Additionally, I am only interested in the those documents whose DocID appear in table3.

Code:
Table 1
Word    Freq
Book      2
Desk      3
Pen        3
Board      3


Table2
DocID   Word    Freq
1         Book     3
1         English    2
1          Math     1
2         Desk       2
2         Machine    5
2         Power      2
3         Desk       3
3         Teacher    3
3         Class       2
4         Building    1
4         Tower     2


Table3
DocID
1
2
I am not sure the following statement is correct or not
Code:
SELECT
      DocID, table1.Freq+ table2.Freq
FROM
      table1, table2, table3
WHERE
     table1.Word = table2.Word AND  table2.DocID = table3.DocID;
Reply With Quote
  #2 (permalink)  
Old 11-10-08, 06:31
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
there is always one very good way to determine if a query is correct or not

test it and see what happens

often, this will tell you right away if it is incorrect, if it returns something that you know is wrong

if, on the other hand, it appears to work correctly, then your task is to test it on increasingly robust and comprehensive data samples, up to and including full volume, live data


so, what happened when you tested it?

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 11-10-08, 10:30
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
Ok, I will try my statement and report the result here tomorrow.

But, as you know, in some cases, a statement is not necessarily CORRECT even though it works well with simple man-made testing data. On the other hand, for real life large databases,it is often difficult for one to have a correct answer which can be used as criterion to evaluate a query result. If this happens, it is definately a nightmare for a project. Hence, I often visit this forum to post my own statements to seek help from experts like you.
thanks
Reply With Quote
  #4 (permalink)  
Old 11-10-08, 10:37
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
i'm not really an expert

i've just done more testing than you

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #5 (permalink)  
Old 11-10-08, 11:13
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
I have not tested the statement. I will test it later.

I recall the problem posted in
How to avoid using inefficient IN/NOT IN in a query SQL statement
and r937's answer

Code:
  FROM tb1
LEFT OUTER
  JOIN tb2
    ON tb2.Name = tb1.Name
LEFT OUTER
  JOIN tb3
    ON tb3.Name = tb1.Name
Suggested by the above answer, I figure out the following statement for the problem posted on this thread.
Code:
SELECT
      Tb2.DocID, (Tb1.Freq + Tb2.Freq )Total
INNER JOIN
      Tb1.Word = Tb2.Word
INNER JOIN
      Tb2.DocID  =  Tb3.DocID;
Again, I am not sure whether the above statement is correct or not. I will test it and report later.
Reply With Quote
  #6 (permalink)  
Old 11-19-08, 09:56
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
I figured out the correct answer as follows,
Code:
SELECT
      DocID, SUM(table1.Freq+ table2.Freq) Total
FROM
      table1, table2, table3
WHERE
     table1.Word = table2.Word AND  table2.DocID = table3.DocID
GROUP BY table2.DocID;
the above statement works well with the following data
Code:
Code:
Table 1
Word    Freq
Book      2
Desk      3
Pen        3
Board      3


Table2
DocID   Word    Freq
1         Book     3
1         English    2
1          Math     1
2         Desk       2
2         Machine    5
2         Power      2
3         Desk       3
3         Teacher    3
3         Class       2
4         Building    1
4         Tower     2


Table3
DocID
1
2
r937 please any suggestions?
Reply With Quote
  #7 (permalink)  
Old 11-19-08, 10:29
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
i'm not sure what you're asking

you said you "figured out the correct answer as follows" and also "the above statement works well"

so what's the problem?
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #8 (permalink)  
Old 11-20-08, 05:42
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
I judged that the statement may be correct by testing it with the small dataset. I am still not sure whether it is REALLY correct for real data ( I cannot know the correct answer for real data in advance). Also, I am not sure it is the best one in terms of efficiency. SO, I would like have your opinion.
Thanks.
Reply With Quote
  #9 (permalink)  
Old 11-20-08, 06:47
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Why don't you just expand your test data to include some of the issues you're worried about. If your current data tests nothing then it's hardly test data. Then run your code on this new test data to see if it works - if it doesn't then you know you have something to fix. The good thing about having better test data is that you can use it going forwards to test any future SQL code you might write.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On