Hi All,
The execution time for the following query statement exceeds 25s. Please tell me of where I am wrong
I have 3 tables DocsFeature, CandidateDocs, ExampleDoc.
Generally, the query is used to find similar documents which are relevant to an example document. The example document is represented by a set of feature words listed in table ExampleDoc. Only those documents are SELECTED that share at least one word with the example document and whose DocID appear in CandidateDocs.
1.
DocsFeature contains feature words extracted from a collection of documents.
DocsFeature has the following fields
Code:
DocID char(8) (Serial Number)
Word varchar(20) (Feature words)
Frequent tinyint (Occurrence times for a word in this document)
Tag char(2) (v for verb, n for noun)
DocsFeature has index on DocID and Word
sample records are as follows
Code:
....
A111 Computer 9 n (Document A111 contains 'Computer',and it occurres 9 times)
C27 Dolphin 5 n (Document C27 contains 'Dolphin',and it occurres 5 times)
D21 Food 2 n (Document D21 contains 'Food',and it occurres 2 times)
.....
2.
CandidateDocs contains a list of DocID, having only one field
Code:
DocID char(8) (Serial Number)
CandidateDocs has index on DocID
3. ExampleDoc fields
Code:
Word varchar(20) (Feature words)
Frequent tinyint (Occurrence times for a word in the example doc)
Tag char(2) (v for verb, n for noun)
ExampleDoc has index on Word.
I do the following query
Code:
SELECT
a.DocID, SUM(a.Frequent + e.Frequent) AS Freq
FROM
DocsFeature a, ExampleDoc e, CandidateDocs b
WHERE
a.Word = e.Word AND a.DocID = b.DocID
There are 10 million records in CandidateDocs, about 90 records in ExampleDoc, and 20 thousand records in CandidateDocs.
In this case, It would takes 25s.
Please help me to improve the eifficiency of this statement.
any help will be appreciated very much.