If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Query Speed

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-10-09, 05:49
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
Query Speed

Hi All,

The execution time for the following query statement exceeds 25s. Please tell me of where I am wrong

I have 3 tables DocsFeature, CandidateDocs, ExampleDoc.

Generally, the query is used to find similar documents which are relevant to an example document. The example document is represented by a set of feature words listed in table ExampleDoc. Only those documents are SELECTED that share at least one word with the example document and whose DocID appear in CandidateDocs.

1.
DocsFeature contains feature words extracted from a collection of documents.

DocsFeature has the following fields
Code:
DocID  char(8) (Serial Number)
Word   varchar(20) (Feature words)
Frequent  tinyint   (Occurrence times for a word in this document)
Tag     char(2)     (v for verb, n for noun)
DocsFeature has index on DocID and Word

sample records are as follows
Code:
....
A111    Computer     9    n (Document A111 contains 'Computer',and it occurres 9 times)
C27      Dolphin         5   n (Document C27 contains 'Dolphin',and it occurres 5 times)

D21      Food            2    n (Document D21 contains 'Food',and it occurres 2 times)

.....

2.

CandidateDocs contains a list of DocID, having only one field

Code:
DocID   char(8) (Serial Number)
CandidateDocs has index on DocID

3. ExampleDoc fields
Code:
Word   varchar(20) (Feature words)
Frequent  tinyint   (Occurrence times for a word in the example doc)
Tag     char(2)     (v for verb, n for noun)
ExampleDoc has index on Word.

I do the following query
Code:
SELECT 
       a.DocID, SUM(a.Frequent + e.Frequent) AS Freq 

FROM 
        DocsFeature a, ExampleDoc e, CandidateDocs b 
WHERE
        a.Word = e.Word     AND  a.DocID  = b.DocID
There are 10 million records in CandidateDocs, about 90 records in ExampleDoc, and 20 thousand records in CandidateDocs.

In this case, It would takes 25s.

Please help me to improve the eifficiency of this statement.
any help will be appreciated very much.
Reply With Quote
  #2 (permalink)  
Old 08-10-09, 06:22
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
I didn't read your text but I did notice that the group by was missing:
Code:
SELECT a.DocID, SUM(a.Frequent + e.Frequent) AS Freq 
FROM   DocsFeature a, ExampleDoc e, CandidateDocs b 
WHERE  a.Word = e.Word     AND  a.DocID  = b.DocID
GROUP  BY a.DocID
You could make it a lot faster by restricting what you're showing rather than just joining every record to every other record. If you have 10m records in your tables it might be worth investing a little time and getting a little stronger at SQL.

PS it would have been more helpful to simply have the create table and index statements.
Reply With Quote
  #3 (permalink)  
Old 08-10-09, 20:40
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
Quote:
Originally Posted by mike_bike_kite
I didn't read your text but I did notice that the group by was missing:
Code:
SELECT a.DocID, SUM(a.Frequent + e.Frequent) AS Freq 
FROM   DocsFeature a, ExampleDoc e, CandidateDocs b 
WHERE  a.Word = e.Word     AND  a.DocID  = b.DocID
GROUP  BY a.DocID
You could make it a lot faster by restricting what you're showing rather than just joining every record to every other record. If you have 10m records in your tables it might be worth investing a little time and getting a little stronger at SQL.

PS it would have been more helpful to simply have the create table and index statements.
Thanks, mike_bike_kite.
Yes, I forgot the trailing 'GROUP BY a.DocID'. I just typed those words from memory.

the following are the create statements for DocsFeature and ExampleDoc,

Code:
CREATE TABLE `DocsFeature` (
  `DocID` varchar(8) COLLATE latin1_bin NOT NULL,
  `Word` varchar(20) COLLATE latin1_bin NOT NULL,
  `Freq` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `TitleWord` tinyint(1) NOT NULL,
  `CommonWord` tinyint(1) NOT NULL,
  `NrNsNtNz` char(2) COLLATE latin1_bin NOT NULL,
  KEY `idx_DocID` (`DocID`),
  KEY `idx_Word` (`Word`),
  KEY `idx_NrNsNtNz_TitleWord` (`NrNsNtNz`,`TitleWord`),
  KEY `idx_TitleWord` (`TitleWord`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_bin | 


 CREATE TABLE `ExampleDoc` (
  `Word` varchar(20) DEFAULT NULL,
  `Freq` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `TitleWord` tinyint(1) DEFAULT NULL,
  `CommonWord` tinyint(1) DEFAULT NULL,
  `NrNsNtNz` char(2) DEFAULT NULL,
  KEY `idx_Word` (`Word`),
  KEY `idx_NrNsNtNz` (`NrNsNtNz`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
As for CandidateDocs, it has only one column, DocID. It is created as a temp table on the fly when the program is running.

The SQL statement which I am complaining is as follows
Code:
CREATE TABLE TempTable_1   
   AS SELECT   
      DocsFeature.DocID AS DocID, SUM( DocsFeature.Freq+  ExampleDoc.Freq) Score  
   FROM  
       DocsFeature, ExampleDoc, CandidateDocs
    WHERE 
         DocsFeature.Word = ExampleDoc.Word  
        AND CandidateDocs.DocID =  DocsFeature.DocID  
    group by 
               DocsFeature.DocID
    HAVING COUNT(DocID) > 1;
That is, I need to put the query result into a temp table.

I have search and study many MySQL optimazition articals. However, up to now, i have not gain any significant improvement. Anyone can help me.

Last edited by cy163; 08-13-09 at 10:44.
Reply With Quote
  #4 (permalink)  
Old 08-10-09, 21:11
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
how long have you been working on this?

i seem to recall helping you with it several times over what seems like several years

have you learned how to do an EXPLAIN yet?
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #5 (permalink)  
Old 08-11-09, 22:05
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
The EXPLAIN result is as follows

Code:
+----+-------------+----------------+------+------------------------+------------------------+---------+-------------------------------+------+---------------------------------+
| id | select_type | table          | type | possible_keys          | key                    | key_len | ref                           | rows | Extra                           |
+----+-------------+----------------+------+------------------------+------------------------+---------+-------------------------------+------+---------------------------------+
|  1 | SIMPLE      | ExampleDoc     | ALL  | idx_Word               | NULL                   | NULL    | NULL                          |  102 | Using temporary; Using filesort | 
|  1 | SIMPLE      | DocsFeature    | ref  | idx_DocID,idx_Word     | idx_Word               | 22      | MANAGERIE.ExampleDoc.Word  |   61 | Using where                     | 
|  1 | SIMPLE      | CandidateDocs  | ref  | idx_DocIDCandidateDocs | idx_DocIDCandidateDocs | 8       | MANAGERIE.DocsFeature.DocID   |    1 | Using where; Using index        | 
+----+-------------+----------------+------+------------------------+------------------------+---------+-------------------------------+------+---------------------------------+
Is it possible for me to further improve the query?

Last edited by cy163; 08-12-09 at 10:03.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On