Results 1 to 2 of 2
  1. #1
    Join Date
    Jan 2003
    Posts
    1,629
    Provided Answers: 1

    Unanswered: SQL question: How to display only third word from character field for each of the rec

    Hi,
    using DB2 v11 fixpack 2 on Linux.

    PROBLEM: From character field I would like to get ONLY THIRD WORD from each of the record.

    From sample table:
    Code:
    COL1
    ------------------------
    aaa bbb ccc ddd
    aaaa bbbb cccc dddd
    aaaaa bbbbb ccccc ddddd
    I would like to get third word from each of the row:
    Code:
    COL1
    ------------------------
    ccc
    cccc
    ccccc
    Sample:
    Code:
    create table admin.tab (col1 char(30));
    insert into admin.tab values ('aaa bbb ccc ddd');
    insert into admin.tab values ('aaaa bbbb cccc dddd');
    insert into admin.tab values ('aaaaa bbbbb ccccc ddddd');
    So far I have tried to solve the problem using regular expression substring function REGEXP_SUBSTR. Details: https://www.ibm.com/support/knowledg.../r0061497.html
    Note: This is new function in DB2 v11.

    General syntax:
    Code:
    select
        regexp_substr(col1, 'REGULAR EXPRESSION', 1, 1, 'im') as data
    from
        admin.tab
    Theory of regular expressions:
    \S is non-space character
    \s is space character
    + is one or more characters

    Combining \S+ is word (all characters between spaces or better said: non-space characters)
    Combining \s+ is one or more spaces

    To get first word:
    \S+

    To get first and second word (separated by spaces)
    \S+\s+\S+
    Note: "\S+" first word "\s+" one or more spaces "\S+" second word

    To get first, second and third word (separated by spaces)
    \S+\s+\S+\s+\S+

    Above works fine. Now to my problem. I would only like to get ONLY THIRD WORD from each of the record. I have searched the web and found the following general (non-DB2) question How to extract nth word using regular expression: https://stackoverflow.com/questions/...lar-expression where solution is to use ?: inside parenthesis to remove first two words.

    I have checked DB2 documentation and according to Regular expression control characters https://www.ibm.com/support/knowledg.../r0061533.html there is info:
    (?: ... ) Non-capturing parentheses. Groups the included pattern, but does not provide capturing of matching text.

    So back to my REGULAR EXPRESSION it should be:
    (?:\S+\s+\S+\s+)
    to remove first two words. An final:
    (?:\S+\s+\S+\s+)(\S)
    to remove first two words and display third word. But above does work like expected (remove first two words and display only third word), but instead it displays first three words. Looks like ?: does not have any effect.

    Any idea how to write regular expression to get only third word from each of the record?
    Thanks
    Last edited by grofaty; 08-07-17 at 05:09.

  2. #2
    Join Date
    Jul 2016
    Location
    Moscow
    Posts
    259
    Provided Answers: 39
    Hi,

    Try this:
    Code:
    select xmlcast(xmlquery('fn:tokenize($s, "\s+")[3]' passing s as "s") as varchar(50))
    from table(values 
      'aaa bbb ccc ddd'
    , 'aaaa bbbb cccc dddd'
    , 'aaaaa bbbbb ccccc ddddd'
    , 'aaaaa bbbbb ccccc'
    , 'aaaaa bbbbb'
    ) t(s)
    Regards,
    Mark.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •