Page 1 of 2 12 LastLast
Results 1 to 15 of 18
  1. #1
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1

    Unanswered: Find a sentence with words separated by multiple spaces

    Hi everyone,

    I am trying to find an efficient way to perform a special query. Let me explain what I want.

    Let's say we are looking for all description that match "this is the target". In fact, I want to find records that match those 4 words in this sequence disregarding the number of spaces (I mean spaces, tabs, Cr, Lf, etc) between them.

    This has to be done without REGEX (would be too easy!). Besides throwing a bunch of REPLACE(REPLACE(REPLACE())) to strip separators, anyone has a better idea on how to do that in SQL, only in SQL with no UDF, just plain DB2 SQL ???

  2. #2
    Join Date
    Nov 2005
    Location
    IL
    Posts
    557
    like "this%is%the%target"

    I think this should work
    --
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows

    DB2 v9.7.0.6 os 6.1.0.0

  3. #3
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1
    Quote Originally Posted by Cougar8000
    like "this%is%the%target"

    I think this should work
    No it won't... You'd get a match for something like "thisXisXthe55555target"

  4. #4
    Join Date
    Nov 2005
    Location
    IL
    Posts
    557
    isn't it what you were looking for?
    --
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows

    DB2 v9.7.0.6 os 6.1.0.0

  5. #5
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    .........
    ........
    Last edited by n_i; 07-02-08 at 13:20.
    ---
    "It does not work" is not a valid problem statement.

  6. #6
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1

    More detailed example

    In my example, if the searched string is "this is the target", I'd like to have results like :

    "this is the target"
    "this is the target"
    "this is the target"
    etc...

    Basically, I'm just ignoring the number of spaces between the words. it could be 1, 2, 3, 1000000 spaces between each word, it doesn't matter. Also, by space I mean separators (carriage return, tabs, line feed, etc).

  7. #7
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1

    More detailed example

    Sorry for the repost... The editor stripped the spaces between words! LOL

    To make it more clear, here's a more detailed example.

    If the searched string is "this is the target", I would expect results like :


    "this[9 spaces]is[1 tab]the[1 carriage return][1 line feed]target"
    "this[3 tabs][1 space]is[2 carrage returns]the[2 spaces]target"
    "this[1 space]is[16 spaces]the[2 tabs][11 spaces][1 carriage return][1 tab]target"
    "this[80 spaces]is[4 line feeds]the[5 spaces]target"
    "this[1 space]is[1 space]the[1 space]target"

    Well, you get the picture. For practical reasons, we can assume that no word is going to be

    separated by more than 80 separator characters. Separators are : space, tab, line feed,

    carriage return. The number of separators bewteen each word can be anything <= 80.

    As I said, all this has to be done in "plain" SQL, i.e. no UDF, no REGEX, just plain basic

    SQL.

  8. #8
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Have a look at this article: http://www.ibm.com/developerworks/db...03stolze1.html

    It describes a bit how you can parse strings with pure and simple SQL. You can use the same technique to analyze your strings and tolerate the different amount of spaces.

    However, what's wrong with a UDF? Unless this is an exercise, in which case you should try to solve it yourself, I would like to understand the requirement for this restriction.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  9. #9
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1
    Quote Originally Posted by stolze
    Have a look at this article: http://www.ibm.com/developerworks/db...03stolze1.html

    It describes a bit how you can parse strings with pure and simple SQL. You can use the same technique to analyze your strings and tolerate the different amount of spaces.

    However, what's wrong with a UDF? Unless this is an exercise, in which case you should try to solve it yourself, I would like to understand the requirement for this restriction.
    I know, makes no sense but they (the client), ahem, "don't like" UDF...

  10. #10
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Without UDFs you are lost. Trying to do such parsing in the main SQL statement only results in a convoluted mess. Thus, separating functionality out into UDFs (either SQL PL or external C/C++, Java, ... code) is really helping to make things easier to understand and keep it maintainable.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  11. #11
    Join Date
    Dec 2005
    Posts
    273
    consider a logic like this:

    1) Replace any valid separating character with blanc
    2) remove all blancs
    3) search the result for "thisisthetarget"

    this can be done by:

    ... WHERE REPLACE ( TRANSLATE ( searched-string , X'40404040', X'????????' ) , ' ' ) = 'thisisthetarget'

    Where ?? is the hexadecimal representation of the valid separators
    Last edited by umayer; 07-03-08 at 10:00.

  12. #12
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1
    Quote Originally Posted by umayer
    consider a logic like this:

    1) Replace any valid separating character with blanc
    2) remove all blancs
    3) search the result for "thisisthetarget"

    this can be done by:

    ... WHERE REPLACE ( TRANSLATE ( searched-string , X'40404040', X'????????' ) , ' ' ) = 'thisisthetarget'

    Where ?? is the hexadecimal representation of the valid separators
    That's what I came up with... A bunch of REPLACE to convert Tabs, CR and LF to spaces and other REPLACE to trim multiple spaces down to a single space... Really really ugly! Looks like Lisp code with all those parentheses!

  13. #13
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    That's what I came up with... A bunch of REPLACE to convert Tabs, CR and LF to spaces and other REPLACE to trim multiple spaces down to a single space... Really really ugly! Looks like Lisp code with all those parentheses!
    I think this strategy is not too ugry(somebody may not agree).
    1) TRANSLATE to convert Tab, CR and LF to space.
    2) Five repetitive REPLACEs to replace multiple blanks to one blank.
    REPLACE n5 blanks to one blank, REPLACE n4 blanks to one blank, ..., REPLACE n2 blanks to one blank, REPLACE two blanks to one blank.
    Some combinations of n5,n4,n3,n2 are
    Code:
    N2          N3          N4          N5          Maximum Length
    ----------- ----------- ----------- ----------- --------------
              2           3           6          21            460
              2           3           6          22            460
              2           3           7          21            460
              2           3           7          22            460
              2           4           6          21            460
              2           4           6          22            460
              2           4           7          21            460
              2           4           7          22            460
              3           3           6          21            460
              3           3           6          22            460
              3           3           7          21            460
              3           3           7          22            460
              3           4           6          21            460
              3           4           6          22            460
              3           4           7          21            460
              3           4           7          22            460
    
      16 record(s) selected.
    I couldn't find an answer to replace 80 blanks to one blank with 4 REPLACEs.
    From two to 40 blanks could be replaced by one blank with 4 REPLACEs.
    Code:
    N2          N3          N4          M          
    ----------- ----------- ----------- -----------
              2           3           6          40
              2           4           7          40
              2           4           6          40
              2           3           7          40
              3           3           6          40
              3           3           7          40
              3           4           6          40
              3           4           7          40
              2           3           8          38
              3           3           8          38
    
      10 record(s) selected.
    Last edited by tonkuma; 07-03-08 at 21:59.

  14. #14
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Code:
    ------------------------------ Commands Entered ------------------------------
    WITH Test_data(id, source) AS (
    VALUES 
     (1, 'this         is'||x'09'||'the'||x'0d0a'||'target')
    ,(2, 'this'||x'090909'||' '||'is'||x'0d0d'||'the  target')
    ,(3, 'this'||x'090909'||' '||'is'||x'0d0d'||'thetarget')
    ,(4, 'this is                the'||x'0909'||'           '||x'0d09'||'target')
    /*        1___5___10____5___20____5___30____5___40____5___50____5___60____5___70____5___80  */
    ,(5, 'this                                                                                is'||x'0a0a0a0a'||'the     target')
    ,(6, 'this                                     s                                          is'||x'0a0a0a0a'||'the     target')
    ,(7, 'this is the target')
    )
    /* End of Test data */
    SELECT id, source
      FROM Test_data
     WHERE REPLACE(
           REPLACE(
           REPLACE(
           REPLACE(
           REPLACE( TRANSLATE(source, '   ', x'090a0d')
                  , '                     ', ' ') /* 21 blanks -> one blank */
                  , '      ', ' ') /* 6 blanks -> one blank */
                  , '   ', ' ') /* 3 blanks -> one blank */
                  , '  ', ' ') /* 2 blanks -> one blank */
                  , '  ', ' ') /* 2 blanks -> one blank */
           = 'this is the target'
    ;
    ------------------------------------------------------------------------------
    
    ID          SOURCE                                                                                                    
    ----------- ----------------------------------------------------------------------------------------------------------
              1 this         is	the
    target                                                                               
              2 this			 isthe  target                                                                                   
              4 this is                the		           	target                                                           
              5 this                                                                                is
    
    
    
    the     target  
              7 this is the target                                                                                        
    
      5 record(s) selected.

  15. #15
    Join Date
    Dec 2005
    Posts
    273
    No need to replace 5 blancs by 1, and 4 blancs by 1, and 3 blancs by 1 ...

    simply replace any blanc by an empty string ( by omitting the third parameter of the REPLACE function )

    Then you only need ONE TRANSLATE() and ONE REPLACE()

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •