Results 1 to 4 of 4
  1. #1
    Join Date
    Sep 2004
    Posts
    3

    Unanswered: Unique words statement

    Hello all. I've been pulling my hair out for the last few weeks trying to come up with a statement that will do what I want. I'm hoping someone can lend some help.

    Basically I have a table of articles with titles. I want to go thru the titles and find out what words show up the most. For example, if I had these two article titles in two records:

    Microsoft develops new software for NASA
    NASA blames software problem on Microsoft

    I would get the following results - the word and the number of times it appears:

    Microsoft 2
    NASA 2
    Software 2

    The statement should ignore those words that only appear once. It would be nice to skip static words like the, and, a, etc.. (Or words 3 characters or less)

    I have fulltext query enabled on the table which works great for searching, but not for what I want it to do.

    Any suggestions? Thanks in advance.

  2. #2
    Join Date
    Apr 2003
    Location
    Phoenix, AZ
    Posts
    177
    Here's a start. This function splits a string into individual result rows:

    ================================================== ==
    CREATE FUNCTION dbo.Split(@String nvarchar(4000), @Delimiter char(1))
    RETURNS @Results TABLE (Items nvarchar(4000))
    AS
    /*

    Usage:

    select * from dbo.Split('Mary had a little lamb',' ')

    */
    BEGIN
    DECLARE @INDEX INT
    DECLARE @SLICE nvarchar(4000)
    -- HAVE TO SET TO 1 SO IT DOESNT EQUAL ZERO FIRST TIME IN LOOP
    SELECT @INDEX = 1
    WHILE @INDEX !=0


    BEGIN
    -- GET THE INDEX OF THE FIRST OCCURENCE OF THE SPLIT CHARACTER
    SELECT @INDEX = CHARINDEX(@Delimiter,@STRING)
    -- NOW PUSH EVERYTHING TO THE LEFT OF IT INTO THE SLICE VARIABLE
    IF @INDEX !=0
    SELECT @SLICE = LEFT(@STRING,@INDEX - 1)
    ELSE
    SELECT @SLICE = @STRING
    -- PUT THE ITEM INTO THE RESULTS SET
    INSERT INTO @Results(Items) VALUES(@SLICE)
    -- CHOP THE ITEM REMOVED OFF THE MAIN STRING
    SELECT @STRING = RIGHT(@STRING,LEN(@STRING) - @INDEX)
    -- BREAK OUT IF WE ARE DONE
    IF LEN(@STRING) = 0 BREAK
    END
    RETURN
    END

    ==================================================

    Then you could use something like:

    select Items
    , count(*)
    from dbo.Split('Mary Mary had a Little little lamb',' ')
    group by Items
    having count(*) > 1
    Fred Prose

  3. #3
    Join Date
    Sep 2004
    Posts
    3

    Thanks Fred

    That function rocks - I'm going to use that for something else. But it does not do *exactly* what I need.

    The function works with one row(or string) of data.

    I need to loop thru all the records and see which words appear the most. There could be 1000 records with titles that I need to find out which word shows up the most.

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    So take the function and use it to populate a subtable of individual words in each title. If you want to automate the process, then put the function in triggers on your title table.
    Then all you need is an aggregate query to get the results you want.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •