Page 1 of 3 123 LastLast
Results 1 to 15 of 33
  1. #1
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10

    Unanswered: Swear word replacement puzzle

    Apologies, can't think of a good title...

    An interesting puzzle; given a list of swear words, clean a column of data on presentation only (i.e. the filth is to be stored in the column but filtered on selection).
    A late addition to the problem is replacement words - i.e. if a replacement word exists then use that, otherwise slap some asterisks in to mask the rudeness.

    I've come up with what I consider to be a fairly nifty solution, but it doesn't scale well - so I thought I'd set this challenge to you lot to take a stab at and see what everyone would come up with.

    Here's the set up
    Code:
    CREATE TABLE dbo.swear_words (
       swear_word   varchar(10)
     , replace_with varchar(10)
    )
    
    INSERT INTO dbo.swear_words (swear_word, replace_with)
          SELECT 'ass'  , 'butt'
    UNION SELECT 'poo'  , NULL
    UNION SELECT 'flump', 'badger'
    
    CREATE TABLE dbo.posts (
       detail  varchar(100)
    )
    
    INSERT INTO dbo.posts (detail)
          SELECT 'A classic problem with hilarious consequences'
    UNION SELECT 'Pootle is a flump!'
    UNION SELECT 'Rudy secretly wants to be a flump too'
    
    /** Desired output
    +-----------------------------------------------+------------------------------------------------+
    | filth                                         | filtered                                       |
    +===============================================+================================================+
    | A classic problem with hilarious consequences | A clbuttic problem with hilarious consequences |
    | Pootle is a flump!                            | ***tle is a badger!                            |
    | Rudy secretly wants to be a flump too         | Rudy secretly wants to be a badger too         |
    +-----------------------------------------------+------------------------------------------------+
    */
    Spoiler (i.e. my solution)
    Code:
    
    CREATE FUNCTION dbo.filter_swear_words (
       @detail varchar(100)
    )
      RETURNS varchar(100)
    AS
      BEGIN
        SELECT @detail = Replace(@detail, swear_word, Coalesce(replace_with, Replicate('*', Len(swear_word))))
        FROM   dbo.swear_words
        WHERE  CharIndex(swear_word, @detail) > 0
    
        RETURN @detail
      END
    GO
    
    SELECT detail As [filth]
         , dbo.filter_swear_words(detail) As [filtered]
    FROM   dbo.posts
    
    Any one up for the challenge?
    George
    Home | Blog

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Ignoring the obvious attempt to publicly undermine & ridicule a certain highly respected and valuable member of the forum (I think Rudy deserves far more respect than that), do you really want classic to become clbuttic?

    Spoiler:
    BTW - the replicate\ len thing is how I would handle the '*' too.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Oh yeah - but I would defo not do this as a scalar function. Are you insane?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    but i do secretly want to become a flu..., er, badger
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    That's the point Flump - how else can I acheive this?!

    On small result sets this is fine, so as long as I implement paging I should be alright in the long term.

    Some quick tests show that 10K rows is ~ 2seconds on development server; so not as bad as first thought - but after this threshold things quickly turn sour.
    Quote Originally Posted by pootle flump
    do you really want classic to become clbuttic?
    Do you not read thedailywtf?
    George
    Home | Blog

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Ah - now do you know why it scales like a dog?
    Make it an in-line table function - it will be much more efficient. Do you not read Adam Machanic?

    tdwtf - I do - relevance?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  7. #7
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Mate, I knew why it scaled like a dog
    Quote Originally Posted by georgev
    I've come up with what I consider to be a fairly nifty solution, but it doesn't scale well
    I'll have a look at making it an inline table function, but I'm not sure I know how (based on the method used). Report back soon


    http://thedailywtf.com/Articles/The-...-Mistake-.aspx
    George
    Home | Blog

  8. #8
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    I've not read much from AM before, but found the article you're speaking of (I think): http://sqlblog.com/blogs/adam_machan...ring-post.aspx

    That confirms exactly what I had thought before but has not led me to a solution yet
    George
    Home | Blog

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    i do read tdwtf but not daily

    that link is clbuttic

    regarding swear words, would a full text index help? at least it has the (theoretical) ability to identify whole words, without pbutting every word through a regex
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  10. #10
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    The issue here is that I need to perform multiple replacements, which I have ben unable to achieve without the inline solution (hence me submitting a challenge for you chaps and chapettes to solve!).

    The full text indexing suggestion is a good one if I end up with an in-line solution. There are lots of options other than CharIndex (such as Like and Replace) but I think FTI will be more efficient.
    George
    Home | Blog

  11. #11
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    My apologies - I didn't read it properly...... No inline solution....
    Testimonial:
    pootle flump
    ur codings are working excelent.

  12. #12
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Best I can hope for is to implement FTI and page my resultsets then?
    George
    Home | Blog

  13. #13
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    I tried something like this to scrub HTML. One solution was pretty close to yours. I did try recursive CTE but that was pretty slow IIRC. Never made it into prod in the end.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  14. #14
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Oooh - I've just remembered an idea I had for my HTML scrubber that would not work for that but could for this.

    It involves dynamic SQL though. Almost certainly much more efficient, assuming there is a high ratio of rows to be cleaned Vs rows in the swear replacement table...
    Testimonial:
    pootle flump
    ur codings are working excelent.

  15. #15
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    perf of SQL for string parsing isn't that great. Have you considered using the CLR for this?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •