Results 1 to 9 of 9
  1. #1
    Join Date
    Apr 2007
    Posts
    130

    Unanswered: How to search over 3 tables

    Hello ALL

    The project we're working on is a document retrieval system

    I have 3 tables as follows. Table 1 contains words from a example doc. Table 2 contains words from documents in a database. Table3 contain only one column ‘DocID’.

    I would like to search the database to find those docs similar to the example doc by calculating similarity score between the example doc and each doc in the database. Specifically, the simi score is calculated by adding up the sum of word frequencies of all matched word. Additionally, I am only interested in the those documents whose DocID appear in table3.

    Code:
    Table 1
    Word    Freq
    Book      2
    Desk      3
    Pen        3
    Board      3
    
    
    Table2
    DocID   Word    Freq
    1         Book     3
    1         English    2
    1          Math     1
    2         Desk       2
    2         Machine    5
    2         Power      2
    3         Desk       3
    3         Teacher    3
    3         Class       2
    4         Building    1
    4         Tower     2
    
    
    Table3
    DocID
    1
    2
    I am not sure the following statement is correct or not
    Code:
    SELECT
          DocID, table1.Freq+ table2.Freq
    FROM
          table1, table2, table3
    WHERE
         table1.Word = table2.Word AND  table2.DocID = table3.DocID;

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    there is always one very good way to determine if a query is correct or not

    test it and see what happens

    often, this will tell you right away if it is incorrect, if it returns something that you know is wrong

    if, on the other hand, it appears to work correctly, then your task is to test it on increasingly robust and comprehensive data samples, up to and including full volume, live data


    so, what happened when you tested it?

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Apr 2007
    Posts
    130
    Ok, I will try my statement and report the result here tomorrow.

    But, as you know, in some cases, a statement is not necessarily CORRECT even though it works well with simple man-made testing data. On the other hand, for real life large databases,it is often difficult for one to have a correct answer which can be used as criterion to evaluate a query result. If this happens, it is definately a nightmare for a project. Hence, I often visit this forum to post my own statements to seek help from experts like you.
    thanks

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    i'm not really an expert

    i've just done more testing than you

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Apr 2007
    Posts
    130
    I have not tested the statement. I will test it later.

    I recall the problem posted in
    http://www.dbforums.com/showthread.php?t=1635250&page=2
    and r937's answer

    Code:
      FROM tb1
    LEFT OUTER
      JOIN tb2
        ON tb2.Name = tb1.Name
    LEFT OUTER
      JOIN tb3
        ON tb3.Name = tb1.Name
    Suggested by the above answer, I figure out the following statement for the problem posted on this thread.
    Code:
    SELECT
          Tb2.DocID, (Tb1.Freq + Tb2.Freq )Total
    INNER JOIN
          Tb1.Word = Tb2.Word
    INNER JOIN
          Tb2.DocID  =  Tb3.DocID;
    Again, I am not sure whether the above statement is correct or not. I will test it and report later.

  6. #6
    Join Date
    Apr 2007
    Posts
    130
    I figured out the correct answer as follows,
    Code:
    SELECT
          DocID, SUM(table1.Freq+ table2.Freq) Total
    FROM
          table1, table2, table3
    WHERE
         table1.Word = table2.Word AND  table2.DocID = table3.DocID
    GROUP BY table2.DocID;
    the above statement works well with the following data
    Code:
    Code:
    Table 1
    Word    Freq
    Book      2
    Desk      3
    Pen        3
    Board      3
    
    
    Table2
    DocID   Word    Freq
    1         Book     3
    1         English    2
    1          Math     1
    2         Desk       2
    2         Machine    5
    2         Power      2
    3         Desk       3
    3         Teacher    3
    3         Class       2
    4         Building    1
    4         Tower     2
    
    
    Table3
    DocID
    1
    2
    r937 please any suggestions?

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    i'm not sure what you're asking

    you said you "figured out the correct answer as follows" and also "the above statement works well"

    so what's the problem?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Apr 2007
    Posts
    130
    I judged that the statement may be correct by testing it with the small dataset. I am still not sure whether it is REALLY correct for real data ( I cannot know the correct answer for real data in advance). Also, I am not sure it is the best one in terms of efficiency. SO, I would like have your opinion.
    Thanks.

  9. #9
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Why don't you just expand your test data to include some of the issues you're worried about. If your current data tests nothing then it's hardly test data. Then run your code on this new test data to see if it works - if it doesn't then you know you have something to fix. The good thing about having better test data is that you can use it going forwards to test any future SQL code you might write.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •