If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Removing duplicate values while concatenating 4 columns

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 07-19-11, 02:46
karv karv is offline
Registered User
 
Join Date: Jul 2011
Posts: 9
Removing duplicate values while concatenating 4 columns

Hello,

Could any one please help in trying to find a solution for the problem below.
There are the columns COL1,COL2,COL3,DESC
After concatenating all the fields below, i should be able to avoid redundant data or remove duplicate data that is concatenated from COL1,COL2,COL3,DESC.
The whole idea is to get a unique string that should contain the data from all the columns after concatenation
COL1||COL2||COL3||DESC

COL1 COL2 COL3 DESC
MAZDA XYZ COMM MAZDA XYZ AWD COMM WITH UPGRADE
BMW 3.25 9iJKpon BMW 325i REG 9iJKpon


What i am looking for is a single string containing the unique values after concatenating COL1||COL2||COL3||DESC
MAZDA XYZ AWD COMM WITH UPGRADE
BMW 3.25 325I REG 9iJKpon

Thanks for ur help in advance..

Regards,
--KA

Last edited by karv; 07-19-11 at 05:41.
Reply With Quote
  #2 (permalink)  
Old 07-19-11, 04:06
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
If I didn't misunderstand your requirements,
here is a hint showing steps with reference documents to solve the problem.

1) UNPIVOT
unpivot col1, col2 and col3 to col_x
http://sirdug.org/downloads/SQLonFire_1_SirDUG.pdf

2) SELECT DISTINCT col_x ...
select-clause in
subselect - IBM DB2 9.7 for Linux, UNIX, and Windows


3) concatenate strings in a column of multiple rows
LISTAGG function.
Combine values from different records using comma

Last edited by tonkuma; 07-19-11 at 04:49. Reason: Add reference document to 2).
Reply With Quote
  #3 (permalink)  
Old 07-19-11, 04:48
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
If you want to remove redundant data in col1, col2, col3
and to remove redundant data in desc,
then use step 1) and step 3) in my previous post.
In the reference thread in step 3),
I showed also techniques to remove redundant data from each columns(basic_skill2 and basic_skill1 in the examples).

Last edited by tonkuma; 07-19-11 at 05:00. Reason: Change "a technique" to "techniques"
Reply With Quote
  #4 (permalink)  
Old 07-19-11, 05:30
dbzTHEdinosaur dbzTHEdinosaur is offline
Registered User
 
Join Date: Jun 2007
Location: germany
Posts: 96
select col1||col2||col3||desc as concatenated_columns
from ?
group by col1||col2||col3||desc
__________________
Dick Brenholtz, Ami in Deutschland
Reply With Quote
  #5 (permalink)  
Old 07-19-11, 05:46
karv karv is offline
Registered User
 
Join Date: Jul 2011
Posts: 9
Removing duplicate values while concatenating 4 columns

Thanks for the response tonkuma and Dick Brenholtz

I have edited the requirement with example, will try the solutions u have suggested in the mean while..


Thanks,
KA
Reply With Quote
  #6 (permalink)  
Old 07-19-11, 05:46
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Quote:
select col1||col2||col3||desc as concatenated_columns
from ?
group by col1||col2||col3||desc
There may be
1) same data in differnt columns in a row.
2) same data in different row of same column or different columns.

For example:
With this sample data.
Code:
ID          COL1 COL2 COL3 DESC 
----------- ---- ---- ---- -----
          1 abc  def  abc  xxxxx
          1 ghi  jkl  mno  yyy  
          1 mno  abc  jkl  xxxxx
          2 pqr  stu  vwx  xxxxx
          2 pqr  vwx  def  zzzzz
your query's result duplicates 'abc', 'mno' etc.
Code:
select col1||col2||col3||desc as concatenated_columns
from sample_data
group by col1||col2||col3||desc
;
------------------------------------------------------------------------------

CONCATENATED_COLUMNS
--------------------
abcdefabcxxxxx      
ghijklmnoyyy        
mnoabcjklxxxxx      
pqrstuvwxxxxxx      
pqrvwxdefzzzzz      

  5 record(s) selected.
Reply With Quote
  #7 (permalink)  
Old 07-19-11, 06:17
dbzTHEdinosaur dbzTHEdinosaur is offline
Registered User
 
Join Date: Jun 2007
Location: germany
Posts: 96
yeah, the permutations will exist
but, the TS said, and I qoute:
Quote:
unique string
'abc' and 'cba' are unique strings.
the GROUP BY satisfies that requirement.

but if order is not important,
then yes your examples are the solution,
since the TS obviously does not want to clean-up his table.
__________________
Dick Brenholtz, Ami in Deutschland
Reply With Quote
  #8 (permalink)  
Old 07-19-11, 06:18
karv karv is offline
Registered User
 
Join Date: Jul 2011
Posts: 9
I am i missing anything..
since the query is still showing me the duplicate results in the same column.

here i am trying with 2 columns..


col1 col2
ABC ABC123
BCD BCD 1234

SELECT COL1|| ' '||COL2 from table
group by COL1|| ' '||COL2

Result
ABC ABC123
BCD BCD 1234
Reply With Quote
  #9 (permalink)  
Old 07-19-11, 06:19
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Quote:
COL1 COL2 COL3 DESC
MAZDA XYZ COMM MAZDA XYZ AWD COMM WITH UPGRADE
BMW 3.25 9iJKpon BMW 325i REG 9iJKpon
Where are the boundaries of each columns?

If you want to remove redundant words(and not redundant column values),
I think you must separate words in each column(or each strings of concatenated columns).

This needs additional technique.
Reply With Quote
  #10 (permalink)  
Old 07-19-11, 06:21
karv karv is offline
Registered User
 
Join Date: Jul 2011
Posts: 9
I don't have control on the data that is coming from legacy systems..
But once in my staging i can manipulate the data..
Reply With Quote
  #11 (permalink)  
Old 07-19-11, 06:30
karv karv is offline
Registered User
 
Join Date: Jul 2011
Posts: 9
Below is the sample data..

col1
MAZDA
BMW

****
col2
XYZ
3.25

********
col3
COMM*
REG 9iJKpon

*******
desc
MAZDA XYZ AWD COMM WITH UPGRADE
BMW 325i 9iJKpon

Last edited by karv; 07-19-11 at 06:34.
Reply With Quote
  #12 (permalink)  
Old 07-19-11, 06:56
dbzTHEdinosaur dbzTHEdinosaur is offline
Registered User
 
Join Date: Jun 2007
Location: germany
Posts: 96
karv,

you can not use the concatenation solution that I provided.
and I don't think anything will remove one of these: '3.25' '325i'
__________________
Dick Brenholtz, Ami in Deutschland
Reply With Quote
  #13 (permalink)  
Old 07-19-11, 08:03
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
karv,

please provide enough and complete sample data,
if you want to receive significant and/or usefull responses.

Your provided data were vague and not complete for me.
For example:
1) How to make this intermediate data from bare data?
(How to know each of MAZDAs make one concatenated string and BMWs make another concatenated string?
(Do you want to make a string from rows with same values of COL1?)
Quote:
COL1 COL2 COL3 DESC
MAZDA XYZ COMM MAZDA XYZ AWD COMM WITH UPGRADE
BMW 3.25 9iJKpon BMW 325i REG 9iJKpon
or, concatenate columns in a row only and your data were these kind?
Quote:
row 1
COL1: MAZDA XYZ COMM
COL2: MAZDA XYZ AWD COMM WITH UPGRADE

row 2
COL1: BMW 3.25 9iJKpon
COL2: BMW 325i REG 9iJKpon
2) did some words in desc repeat data in col1, col2, col3?
Quote:
col1
MAZDA
BMW

****
col2
XYZ
3.25

********
col3
COMM*
REG 9iJKpon

*******
desc
MAZDA XYZ AWD COMM WITH UPGRADE
BMW 325i 9iJKpon
3) If sequence of words were significant, how to get this result
Quote:
BMW 3.25 325I REG 9iJKpon
from this data?
Quote:
BMW 3.25 9iJKpon BMW 325i REG 9iJKpon
If I removed second and subsequent repeated words from left to right,
the result may be
BMW 3.25 9iJKpon 325i REG

Last edited by tonkuma; 07-19-11 at 08:13.
Reply With Quote
  #14 (permalink)  
Old 07-19-11, 08:21
karv karv is offline
Registered User
 
Join Date: Jul 2011
Posts: 9
Hello tonkuma,
I have attached a sample set of data from the Source.
Hope this helps..
Please let me know if you need more information.

Thanks to all the members who have helped so far!!


--KARV
Attached Files
File Type: zip Book1.zip (7.7 KB, 8 views)

Last edited by karv; 07-19-11 at 08:25.
Reply With Quote
  #15 (permalink)  
Old 07-19-11, 09:05
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
I couldn't see the data in your file.
My unzip program recovered your file to Book1.xlsx.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On