If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > how to find out values shared by some members

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-15-09, 20:58
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
how to find out values shared by some members

Hello ALL,

I have a table contains words extracted from documents which are belonging to a number of categories. Each category contains a number of documents. For each category, I want to find out those words which occur at least in N documents belonging to this category.
Code:
Category    DocID       Word       Freq      Tag
Art             A12         Paint        3           verb
Art             A12        Exhibition    5           noun
Art             A12         Show        2           verb
Art             A12          Ticket       4          noun

Art             B34         Exhibition   2          noun
Art             B34         Musum       3          noun
Art             B34         Contemporary   2      adj
Art             B34         Show               1      noun
Art             B34         Visitor             2       noun


Art             E97         gallery             3      noun
Art             E97         Show               3      noun
Art             E97         Exhibition         2       noun


Electronics    A32        Circuit           2       noun
Electronics    A32        IC                 2       noun
Electronics    A32        Factory         2       noun
Electronics    A32       Electronics      3       noun

Electronics    F11        HiFi               2     noun
Electronics    F11        IC                 2     noun
Electronics    F11        Circuit           2     noun

Electronics   C55         IC                 3      noun
Electronics   C55         Circuit           3      noun
Electronics   C55         Transmit        2      verb
Electronics   C55         Receive         5      verb
I wonder how to do this query.

Thanks
Reply With Quote
  #2 (permalink)  
Old 01-15-09, 21:08
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Quote:
Originally Posted by cy163
I wonder how to do this query.
oh, come on, it's not that hard

you've been writing SQL on this schema for months and months

surely you have learned enough by now to give it a try on your own

otherwise, we're going to start asking you for a part of your salary...

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 01-16-09, 09:43
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
sorry.
Actually, I have tried to sovle the problem by myself, but i failed.
Reply With Quote
  #4 (permalink)  
Old 01-16-09, 09:46
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
show your attempt, and explain what results it returned and why those results were incorrect
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #5 (permalink)  
Old 01-16-09, 11:05
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
Is this one right?
Code:
SELECT
  a.Category, a.DocID, a.Word 
FROM 
     cateData as a 
INNER JOIN cateData as b 

ON 
    (a.Category = b.Category and a.Word = b.Word) 
where 
     a.DocId <> b.DocID;
Reply With Quote
  #6 (permalink)  
Old 01-16-09, 11:14
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Code:
SELECT Category    
     , DocID       
     , Word       
  FROM cateData
GROUP
    BY Category    
     , DocID       
     , Word    
HAVING COUNT(*) >= N
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #7 (permalink)  
Old 01-17-09, 03:50
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
I never know GROUP BY can be followed by more than one field names before. So, It is no doubt that I could not figure out this solution by myself.

Thanks r937.

Forget to mention, I also need the average value of occurrence frequency of a selected word. So, I adapt the solution. Is it right?

Code:
SELECT Category    
     , DocID       
     , Word 
     , AVG(Freq) as Avg_Freq     
  FROM cateData
GROUP
    BY Category    
     , DocID       
     , Word    
HAVING COUNT(*) >= N
Reply With Quote
  #8 (permalink)  
Old 01-17-09, 04:11
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Quote:
Originally Posted by cy163
Is it right?
yes, it is
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #9 (permalink)  
Old 01-19-09, 09:27
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
Today, I test the query with real data and find out the right statement is

Code:
SELECT Category    
     , DocID       
     , Word 
     , AVG(Freq) as Avg_Freq     
  FROM cateData
GROUP
    BY Category    
   
     , Word    
HAVING COUNT(*) >= N
That is, the GROUP BY clause should be
Code:
GROUP BY
    Category    
  , Word
DocID should not appear in the GROUP BY clause.
Reply With Quote
  #10 (permalink)  
Old 01-19-09, 09:35
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
you are certainly making good progress in learning SQL
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On