Results 1 to 10 of 10
  1. #1
    Join Date
    Apr 2007
    Posts
    130

    Unanswered: how to find out values shared by some members

    Hello ALL,

    I have a table contains words extracted from documents which are belonging to a number of categories. Each category contains a number of documents. For each category, I want to find out those words which occur at least in N documents belonging to this category.
    Code:
    Category    DocID       Word       Freq      Tag
    Art             A12         Paint        3           verb
    Art             A12        Exhibition    5           noun
    Art             A12         Show        2           verb
    Art             A12          Ticket       4          noun
    
    Art             B34         Exhibition   2          noun
    Art             B34         Musum       3          noun
    Art             B34         Contemporary   2      adj
    Art             B34         Show               1      noun
    Art             B34         Visitor             2       noun
    
    
    Art             E97         gallery             3      noun
    Art             E97         Show               3      noun
    Art             E97         Exhibition         2       noun
    
    
    Electronics    A32        Circuit           2       noun
    Electronics    A32        IC                 2       noun
    Electronics    A32        Factory         2       noun
    Electronics    A32       Electronics      3       noun
    
    Electronics    F11        HiFi               2     noun
    Electronics    F11        IC                 2     noun
    Electronics    F11        Circuit           2     noun
    
    Electronics   C55         IC                 3      noun
    Electronics   C55         Circuit           3      noun
    Electronics   C55         Transmit        2      verb
    Electronics   C55         Receive         5      verb
    I wonder how to do this query.

    Thanks

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by cy163
    I wonder how to do this query.
    oh, come on, it's not that hard

    you've been writing SQL on this schema for months and months

    surely you have learned enough by now to give it a try on your own

    otherwise, we're going to start asking you for a part of your salary...

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Apr 2007
    Posts
    130
    sorry.
    Actually, I have tried to sovle the problem by myself, but i failed.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    show your attempt, and explain what results it returned and why those results were incorrect
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Apr 2007
    Posts
    130
    Is this one right?
    Code:
    SELECT
      a.Category, a.DocID, a.Word 
    FROM 
         cateData as a 
    INNER JOIN cateData as b 
    
    ON 
        (a.Category = b.Category and a.Word = b.Word) 
    where 
         a.DocId <> b.DocID;

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Code:
    SELECT Category    
         , DocID       
         , Word       
      FROM cateData
    GROUP
        BY Category    
         , DocID       
         , Word    
    HAVING COUNT(*) >= N
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Apr 2007
    Posts
    130
    I never know GROUP BY can be followed by more than one field names before. So, It is no doubt that I could not figure out this solution by myself.

    Thanks r937.

    Forget to mention, I also need the average value of occurrence frequency of a selected word. So, I adapt the solution. Is it right?

    Code:
    SELECT Category    
         , DocID       
         , Word 
         , AVG(Freq) as Avg_Freq     
      FROM cateData
    GROUP
        BY Category    
         , DocID       
         , Word    
    HAVING COUNT(*) >= N

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by cy163
    Is it right?
    yes, it is
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Apr 2007
    Posts
    130
    Today, I test the query with real data and find out the right statement is

    Code:
    SELECT Category    
         , DocID       
         , Word 
         , AVG(Freq) as Avg_Freq     
      FROM cateData
    GROUP
        BY Category    
       
         , Word    
    HAVING COUNT(*) >= N
    That is, the GROUP BY clause should be
    Code:
    GROUP BY
        Category    
      , Word
    DocID should not appear in the GROUP BY clause.

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    you are certainly making good progress in learning SQL
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •