# Thread: Which 'select' statement for querying is more efficient?

1. Registered User
Join Date
Apr 2007
Posts
130

## Unanswered: Which 'select' statement for querying is more efficient?

Hello ALL,

There are two tables in my database. Table DocDb has three fields, Doc_ID, representing the number of a Document, Word, representing words appearing in the document, Occ_Freq, representing the occurrence count of a word.

Another table ExampleDb has two fields, Word and Occ_Freq, having the same meaning as in the table DocDb. Word represents the word appearing in an example document; Occ_Freq represents the count of a word.

I would like calculate the similarity between the example doc and every doc in the DocDb by summing up occurrence count of matched word, and sort the documents by their similarity scores.

I have figure out two 'select' statements, but I am not sure which one is more efficient.

(1)
char ExampleDb_Word[3][20];
strcpy(ExampleDb_Word[0], "Water");
strcpy(ExampleDb_Word[0], "Plant");
strcpy(ExampleDb_Word[0], "Sun");

for(i= 0; i < 10000; i++)
{
for(j=0; j<3; j++)
{

select Occ_Freq from DocDb WHERE Word= ExampleDb_Word[j];

//do summing up here.

}
}
I suspect this method would be time consuming as there are two for loops.

(2)
select DocDb.Occ_Freq, ExampleDb.Occ_Freq from DocDb,ExampleDb WHERE DocDb.Word=ExampleDb.Word

Would anyone can tell me which method is better and Why?

any help and suggestion will be appreciated very much.

Table DocDb

Doc_ID Word Occ_Freq
------------------------------
1 Water 9
1 Bottle 6
1 Milk 3
2 Lamp 7
2 Desk 8
2 Food 5
3 Water 4
3 Plant 3
3 Green 2
4 House 8
4 Biulding 2
4 Renovation 5
....... ......... ..
10000 Sea 7
10000 Temperatur 6
10000 Wave 3

Table ExampleDb
Word Occ_Freq
-------------------
Water 3
Plant 2
Sun 2

2. SQL Consultant
Join Date
Apr 2002
Location
Posts
20,002
any time you execute thousands of queries to get an answer is going to be ~way~ slower than one query

i ~think~ you can do this in one query, but i don't know for sure what "summing up occurrence count of matched word, and sort the documents by their similarity scores" means

perhaps you could illustrate this by continuing your water-plant-sun example and showing us how you actually calculate the similarity of each of the first 4 documents

3. Registered User
Join Date
Apr 2007
Posts
130
Originally Posted by r937
any time you execute thousands of queries to get an answer is going to be ~way~ slower than one query

i ~think~ you can do this in one query, but i don't know for sure what "summing up occurrence count of matched word, and sort the documents by their similarity scores" means

perhaps you could illustrate this by continuing your water-plant-sun example and showing us how you actually calculate the similarity of each of the first 4 documents

Doc 1 share one word 'Water' with the example doc, so the similarity score is (9 + 3) = 12. Doc 3 share two words, 'Water' and 'Plant', so the simi score is (4+3) +(3+2) = 12. Other docs have a 0 score as they do not share any word with the example doc. This evaluation method for document similiarity is just for illustration purpose. The actual method would be a complicated one.

So do you have any further specific suggestions.
Last edited by cy163; 06-06-08 at 19:51.

4. SQL Consultant
Join Date
Apr 2002
Location
Posts
20,002
Code:
```SELECT d.Doc_ID
, SUM(d.Occ_Freq + e.Occ_Freq) AS score
FROM DocDb AS d
LEFT OUTER
JOIN ExampleDb AS e
ON e.Word = d.Word
GROUP
BY d.Doc_ID
ORDER
BY score DESC```

5. Registered User
Join Date
Apr 2007
Posts
130
Originally Posted by r937
Code:
```SELECT d.Doc_ID
, SUM(d.Occ_Freq + e.Occ_Freq) AS score
FROM DocDb AS d
LEFT OUTER
JOIN ExampleDb AS e
ON e.Word = d.Word
GROUP
BY d.Doc_ID
ORDER
BY score DESC```

Thanks again r937.

the real picture is a bit complicated. the are another field for the two tables,'TitleWord_YesNo', indicating whether the word is a word in the document title. Matching on words in the document title will be put more emphasis (usually original score will be multiplyed by a coefficient, say 2.0) than matching on non title words. Can I use 'if' statement in a SQL statement.

Doc 1 share one word 'Water' with the example doc, so the similarity score is (9 + 3) *2.0 = 24, if 'Water' is a title word both in Doc1 and the example doc

6. SQL Consultant
Join Date
Apr 2002
Location
Posts
20,002
Originally Posted by cy163
Can I use 'if' statement in a SQL statement.
yes

it's called CASE

7. Registered User
Join Date
Apr 2007
Posts
130
Originally Posted by r937
yes

it's called CASE

I really learn a lot by communicating with you.

have a nice weekend

8. SQL Consultant
Join Date
Apr 2002
Location
Posts
20,002
ta very much

9. World Class Flame Warrior
Join Date
Jun 2003
Location
Ohio
Posts
12,595
If you are interested, I have an Access database I set up to do this, for tracking error messages and solutions to other technical issues.
I found that simply looking for word matches was a poor algorithm, and that I had to weight each word according to the frequency with which it had successfully been matched to prior solutions.
The math was a fun challenge, which you could probably dig out of the formulas I came up with.
It actually works pretty dang well.

10. Registered User
Join Date
Jul 2003
Location
San Antonio, TX
Posts
3,662
And the solution is...where is it? Don't be stingy, I post my code (even if it is ugly )

11. Registered User
Join Date
Apr 2007
Posts
130
Originally Posted by blindman
If you are interested, I have an Access database I set up to do this, for tracking error messages and solutions to other technical issues.
I found that simply looking for word matches was a poor algorithm, and that I had to weight each word according to the frequency with which it had successfully been matched to prior solutions.
The math was a fun challenge, which you could probably dig out of the formulas I came up with.
It actually works pretty dang well.

Could you please present the formulas and more detailed information

12. World Class Flame Warrior
Join Date
Jun 2003
Location
Ohio
Posts
12,595
When matching articles, I parse each word of the articles into a WordList table.
I calculate the "Selectivity" value (from 0 to 1) of each word based upon how frequently it is used vs how frequently it has successfully matched articles in the past.
Then I calculate the match between two articles as the sum product of the matching word selectivities. To calculate the sum product, I use LOG functions:

Match = 1-Exp(Sum(Log(WordSelectivity)))

13. Registered User
Join Date
Apr 2007
Posts
130
Originally Posted by r937
yes

it's called CASE

Code:
```SELECT d.Doc_ID
, SUM(CASE d.TitleWord='1' AND e.TitleWord='1'  THEN 2*(d.Occ_Freq + e.Occ_Freq),
CASE d.TitleWord='0' OR e.TitleWord='0'  THEN d.Occ_Freq + e.Occ_Freq) AS score
FROM DocDb AS d
LEFT OUTER
JOIN ExampleDb AS e
ON e.Word = d.Word
GROUP
BY d.Doc_ID
ORDER
BY score DESC```
It seems that the above statement is wrong and does not work. What is the correct way to differentiate cases in the SUM clause based on table fields.

Actually, what i want to implement is sum up weight * (d.Occ_Freq + e.Occ_Freq), where the coefficient 'weight' depends on a condition expression consisting of some table fields.
if condition_expression = 1 then weight =2;
if condition_expression = 2 then weight =3;
if condition_expression = 3 then weight =4;
......
Last edited by cy163; 08-29-08 at 20:57. Reason: correction

14. Registered User
Join Date
Apr 2007
Posts
130

## How use CASE in a special situation

Hello,

I made this thread to learn an efficient way to query a database using SELECT statement. Friend r937 suggested using SUM and CASE. However, I do not exactly know how to use them in my case. So, please take a look and help me.

Felix.
Last edited by r937; 08-31-08 at 21:54.

15. SQL Consultant
Join Date
Apr 2002
Location
Posts
20,002
Code:
```SUM(
CASE WHEN condition_expression = 1 THEN 2
WHEN condition_expression = 2 THEN 3
WHEN condition_expression = 3 THEN 4
ELSE NULL END
* (d.Occ_Freq + e.Occ_Freq)
)```

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•