Page 1 of 2 12 LastLast
Results 1 to 15 of 17
  1. #1
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6

    Unanswered: full text noise word problems for sql 2000

    I have a sql server 2000 (still waiting on the DBAs to upgrade me) database with a number of full text catalogues hanging off of it. The full text searches generate errors when nothing but noise words are passed in or if after all of the noise words are passed the remaining search sting is so short that full text generates too many results. The procedures in the database are invoked by clients both fat and thin that exist on desktops web servers and application servers.

    My initial thought was to build a function to remove all of the noise words from the search string, evaluate the length of the string, and then decide whether or not to do a Full text or a double sided wildcard search. However while I writing this function (and wishing I had CLR to use for this one), I amrealizing that all of these nested REPLACE statements might suck performance wise.

    Any suggestions? I need it to be super fast.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  2. #2
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    No nesting required..
    Code:
    DECLARE @noise_words table (
       word varchar(50)
    )
    
    INSERT INTO @noise_words (word) VALUES ('noise')
    INSERT INTO @noise_words (word) VALUES ('sentence')
    
    DECLARE @search_string varchar(50)
        SET @search_string = 'sentence containing noise words'
    
    SELECT @search_string = Replace(@search_string, word, '')
    FROM   @noise_words
    
    SELECT @search_string
    George
    Home | Blog

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Problem: noise words are typically "is", "it", "in".

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Code:
    DECLARE @noise_words table (
       word varchar(50)
    )
    
    INSERT INTO @noise_words (word) VALUES ('is')
    INSERT INTO @noise_words (word) VALUES ('in')
    INSERT INTO @noise_words (word) VALUES ('it')
    
    DECLARE @search_string varchar(50)
        SET @search_string = 'sentence containing noise words'
    
    SELECT @search_string = Replace(@search_string, word, '')
    FROM   @noise_words
    
    SELECT @search_string
    The output is strangely apt.

  5. #5
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Your point?
    Code:
    INSERT INTO @noise_words (word) VALUES (' is ')
    INSERT INTO @noise_words (word) VALUES (' it ')
    Note the spaces
    George
    Home | Blog

  6. #6
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Or even just
    Code:
    SELECT @search_string = Replace(@search_string, ' ' + word + ' ', '')
    FROM   @noise_words
    George
    Home | Blog

  7. #7
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    I have written something very like what you want, but unfortunately it was in CLR. The problem, as you say, is you have string manipulation of two sets of data that must be matched in a cross join (i.e. every row in the first matched to every row of the second).

    How big are your two sets? And is there nothing on the web to help you sort this error out at the source rather than roll your own solution?

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Code:
    DECLARE @noise_words table (
       word varchar(50)
    )
    
    INSERT INTO @noise_words (word) VALUES (' is ')
    INSERT INTO @noise_words (word) VALUES (' in ')
    INSERT INTO @noise_words (word) VALUES (' it ')
    
    DECLARE @search_string varchar(50)
        SET @search_string = 'it will fail with this won't it?'
    
    SELECT @search_string = Replace(@search_string, word, '')
    FROM   @noise_words
    
    SELECT @search_string

  9. #9
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    George - the problem is the noise word might start a sentence, end a sentence or be up against a non-alpha. If replace allowed pattern matches you could do it but without you will struggle.

  10. #10
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Starting or ending a string is no issue, but I had not considered the non-alphas; back to the drawing board for a mo...
    George
    Home | Blog

  11. #11
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Hmmm.....
    Code:
    DECLARE @noise_words table (
       word varchar(50)
    )
    
    INSERT INTO @noise_words (word) VALUES ('is')
    INSERT INTO @noise_words (word) VALUES ('in')
    INSERT INTO @noise_words (word) VALUES ('it')
    
    DECLARE @punctuation table (
       mark char(1)
    )
    
    INSERT INTO @punctuation (mark) VALUES ('?')
    INSERT INTO @punctuation (mark) VALUES ('!')
    INSERT INTO @punctuation (mark) VALUES (',')
    INSERT INTO @punctuation (mark) VALUES ('.')
    INSERT INTO @punctuation (mark) VALUES ('"')
    
    DECLARE @search_string varchar(100)
        SET @search_string = 'it is in, is it? Awesome. A word ending with an "in" would be thin.'
    
    
    SET @search_string = ' ' + @search_string + ' '
    
    SELECT @search_string = Replace(Replace(Replace(Replace(@search_string, ' ' +  word + ' ', '  '), mark + word + ' ', ' '), ' ' + word + mark, ' '), ' ' + mark + word + mark + ' ', '  ')
    FROM   @noise_words
         , @punctuation
    
    SELECT @search_string
    I won't give up that easily!
    Last edited by gvee; 05-13-10 at 12:38.
    George
    Home | Blog

  12. #12
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    You are a prick, George. You've come up with a much simpler solution in 10 minutes than the one I struggled to create in a week.

    You'll need more punctuation

  13. #13
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Oh goody - it failed!

  14. #14
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    This would be more efficient:
    Code:
    DECLARE @noise_words table (
       word varchar(50)
    )
    
    INSERT INTO @noise_words (word) VALUES ('is')
    INSERT INTO @noise_words (word) VALUES ('in')
    INSERT INTO @noise_words (word) VALUES ('it')
    
    DECLARE @punctuation table (
       mark char(1)
    )
    
    INSERT INTO @punctuation (mark) VALUES ('?')
    INSERT INTO @punctuation (mark) VALUES ('!')
    INSERT INTO @punctuation (mark) VALUES (',')
    INSERT INTO @punctuation (mark) VALUES ('.')
    
    DECLARE @search_string varchar(50)
        SET @search_string = 'it is in, is it? Awesome.'
    
    
    SET @search_string = ' ' + @search_string + ' '
    
    SELECT  @search_string = Replace(Replace(Replace(@search_string, ' ' +  word + ' ', ''), mark + word, ''), word + mark, '')
    FROM    @noise_words
          , @punctuation
    WHERE   @search_string LIKE '%[^a-zA-Z]' + word + '[^a-zA-Z]%'    --" XX "
         OR 
            @search_string LIKE '%[^a-zA-Z]' + word                   --" XX"
         OR 
            @search_string LIKE word + '[^a-zA-Z]%'                   --"XX "
    
    SELECT @search_string

  15. #15
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quote Originally Posted by pootle flump View Post
    You are a prick, George.
    Charming.
    Quote Originally Posted by pootle flump View Post
    You've come up with a much simpler solution in 10 minutes than the one I struggled to create in a week.
    Does not absolve the aforementioned insult.

    P.S. I've edited that response with something that doesn't fail
    George
    Home | Blog

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •