Page 1 of 2 12 LastLast
Results 1 to 15 of 27
  1. #1
    Join Date
    Jul 2011
    Posts
    9

    Unanswered: Removing duplicate values while concatenating 4 columns

    Hello,

    Could any one please help in trying to find a solution for the problem below.
    There are the columns COL1,COL2,COL3,DESC
    After concatenating all the fields below, i should be able to avoid redundant data or remove duplicate data that is concatenated from COL1,COL2,COL3,DESC.
    The whole idea is to get a unique string that should contain the data from all the columns after concatenation
    COL1||COL2||COL3||DESC

    COL1 COL2 COL3 DESC
    MAZDA XYZ COMM MAZDA XYZ AWD COMM WITH UPGRADE
    BMW 3.25 9iJKpon BMW 325i REG 9iJKpon


    What i am looking for is a single string containing the unique values after concatenating COL1||COL2||COL3||DESC
    MAZDA XYZ AWD COMM WITH UPGRADE
    BMW 3.25 325I REG 9iJKpon

    Thanks for ur help in advance..

    Regards,
    --KA
    Last edited by karv; 07-19-11 at 06:41.

  2. #2
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    If I didn't misunderstand your requirements,
    here is a hint showing steps with reference documents to solve the problem.

    1) UNPIVOT
    unpivot col1, col2 and col3 to col_x
    http://sirdug.org/downloads/SQLonFire_1_SirDUG.pdf

    2) SELECT DISTINCT col_x ...
    select-clause in
    subselect - IBM DB2 9.7 for Linux, UNIX, and Windows


    3) concatenate strings in a column of multiple rows
    LISTAGG function.
    http://www.dbforums.com/db2/1668330-...ing-comma.html
    Last edited by tonkuma; 07-19-11 at 05:49. Reason: Add reference document to 2).

  3. #3
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    If you want to remove redundant data in col1, col2, col3
    and to remove redundant data in desc,
    then use step 1) and step 3) in my previous post.
    In the reference thread in step 3),
    I showed also techniques to remove redundant data from each columns(basic_skill2 and basic_skill1 in the examples).
    Last edited by tonkuma; 07-19-11 at 06:00. Reason: Change "a technique" to "techniques"

  4. #4
    Join Date
    Jun 2007
    Location
    germany
    Posts
    155
    select col1||col2||col3||desc as concatenated_columns
    from ?
    group by col1||col2||col3||desc
    Dick Brenholtz, Ami in Deutschland

  5. #5
    Join Date
    Jul 2011
    Posts
    9

    Removing duplicate values while concatenating 4 columns

    Thanks for the response tonkuma and Dick Brenholtz

    I have edited the requirement with example, will try the solutions u have suggested in the mean while..


    Thanks,
    KA

  6. #6
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    select col1||col2||col3||desc as concatenated_columns
    from ?
    group by col1||col2||col3||desc
    There may be
    1) same data in differnt columns in a row.
    2) same data in different row of same column or different columns.

    For example:
    With this sample data.
    Code:
    ID          COL1 COL2 COL3 DESC 
    ----------- ---- ---- ---- -----
              1 abc  def  abc  xxxxx
              1 ghi  jkl  mno  yyy  
              1 mno  abc  jkl  xxxxx
              2 pqr  stu  vwx  xxxxx
              2 pqr  vwx  def  zzzzz
    your query's result duplicates 'abc', 'mno' etc.
    Code:
    select col1||col2||col3||desc as concatenated_columns
    from sample_data
    group by col1||col2||col3||desc
    ;
    ------------------------------------------------------------------------------
    
    CONCATENATED_COLUMNS
    --------------------
    abcdefabcxxxxx      
    ghijklmnoyyy        
    mnoabcjklxxxxx      
    pqrstuvwxxxxxx      
    pqrvwxdefzzzzz      
    
      5 record(s) selected.

  7. #7
    Join Date
    Jun 2007
    Location
    germany
    Posts
    155
    yeah, the permutations will exist
    but, the TS said, and I qoute:
    unique string
    'abc' and 'cba' are unique strings.
    the GROUP BY satisfies that requirement.

    but if order is not important,
    then yes your examples are the solution,
    since the TS obviously does not want to clean-up his table.
    Dick Brenholtz, Ami in Deutschland

  8. #8
    Join Date
    Jul 2011
    Posts
    9
    I am i missing anything..
    since the query is still showing me the duplicate results in the same column.

    here i am trying with 2 columns..


    col1 col2
    ABC ABC123
    BCD BCD 1234

    SELECT COL1|| ' '||COL2 from table
    group by COL1|| ' '||COL2

    Result
    ABC ABC123
    BCD BCD 1234

  9. #9
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    COL1 COL2 COL3 DESC
    MAZDA XYZ COMM MAZDA XYZ AWD COMM WITH UPGRADE
    BMW 3.25 9iJKpon BMW 325i REG 9iJKpon
    Where are the boundaries of each columns?

    If you want to remove redundant words(and not redundant column values),
    I think you must separate words in each column(or each strings of concatenated columns).

    This needs additional technique.

  10. #10
    Join Date
    Jul 2011
    Posts
    9
    I don't have control on the data that is coming from legacy systems..
    But once in my staging i can manipulate the data..

  11. #11
    Join Date
    Jul 2011
    Posts
    9
    Below is the sample data..

    col1
    MAZDA
    BMW

    ****
    col2
    XYZ
    3.25

    ********
    col3
    COMM*
    REG 9iJKpon

    *******
    desc
    MAZDA XYZ AWD COMM WITH UPGRADE
    BMW 325i 9iJKpon
    Last edited by karv; 07-19-11 at 07:34.

  12. #12
    Join Date
    Jun 2007
    Location
    germany
    Posts
    155
    karv,

    you can not use the concatenation solution that I provided.
    and I don't think anything will remove one of these: '3.25' '325i'
    Dick Brenholtz, Ami in Deutschland

  13. #13
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    karv,

    please provide enough and complete sample data,
    if you want to receive significant and/or usefull responses.

    Your provided data were vague and not complete for me.
    For example:
    1) How to make this intermediate data from bare data?
    (How to know each of MAZDAs make one concatenated string and BMWs make another concatenated string?
    (Do you want to make a string from rows with same values of COL1?)
    COL1 COL2 COL3 DESC
    MAZDA XYZ COMM MAZDA XYZ AWD COMM WITH UPGRADE
    BMW 3.25 9iJKpon BMW 325i REG 9iJKpon
    or, concatenate columns in a row only and your data were these kind?
    row 1
    COL1: MAZDA XYZ COMM
    COL2: MAZDA XYZ AWD COMM WITH UPGRADE

    row 2
    COL1: BMW 3.25 9iJKpon
    COL2: BMW 325i REG 9iJKpon
    2) did some words in desc repeat data in col1, col2, col3?
    col1
    MAZDA
    BMW

    ****
    col2
    XYZ
    3.25

    ********
    col3
    COMM*
    REG 9iJKpon

    *******
    desc
    MAZDA XYZ AWD COMM WITH UPGRADE
    BMW 325i 9iJKpon
    3) If sequence of words were significant, how to get this result
    BMW 3.25 325I REG 9iJKpon
    from this data?
    BMW 3.25 9iJKpon BMW 325i REG 9iJKpon
    If I removed second and subsequent repeated words from left to right,
    the result may be
    BMW 3.25 9iJKpon 325i REG
    Last edited by tonkuma; 07-19-11 at 09:13.

  14. #14
    Join Date
    Jul 2011
    Posts
    9
    Hello tonkuma,
    I have attached a sample set of data from the Source.
    Hope this helps..
    Please let me know if you need more information.

    Thanks to all the members who have helped so far!!


    --KARV
    Attached Files Attached Files
    Last edited by karv; 07-19-11 at 09:25.

  15. #15
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    I couldn't see the data in your file.
    My unzip program recovered your file to Book1.xlsx.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •