If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Flag the data based on duplicate?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-12-11, 02:31
laknar laknar is offline
Registered User
 
Join Date: Jul 2008
Posts: 80
Flag the data based on duplicate?

i have duplicate data in table. where i need to flag

if Col1 is duplicate and Col2 is 20 or 30 then Flag 'N'

else if Col1 is duplicate and Col2 is 20 or 30 or 40 then Flag 'Y'

Col1,Col2
10,20
10,30
20,20
20,30
20,40

Expected Output

Col1,Col2,Col3

10,20,N
10,30,N
20,20,Y
20,30,Y
20,40,Y
Reply With Quote
  #2 (permalink)  
Old 08-12-11, 02:37
przytula_guy przytula_guy is offline
Registered User
 
Join Date: Apr 2006
Location: Belgium
Posts: 1,159
have you already tried some sql or waiting for the others to find a solution for you ??
have a look at group by - having - count...
IBM DB2 9.5 Information Center for Linux, UNIX, and Windows
__________________
Best Regards, Guy Przytula
Database Software Consultant
DB2 UDB LUW Certified V7-V8-V9-V9.7 DB Admin - Dprop..
Information Server Datastage Certified
http://www.infocura.be
Reply With Quote
  #3 (permalink)  
Old 08-12-11, 02:52
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Quote:
if Col1 is duplicate and Col2 is 20 or 30 then Flag 'N'

else if Col1 is duplicate and Col2 is 20 or 30 or 40 then Flag 'Y'
Your requirements also duplicated.

And you didn't wrote what to do
if NOT(Col1 is duplicate and Col2 is 20 or 30 or 40)
in other words...
if (Col1 is not duplicate) or (Col2 is other than 20 or 30 or 40) then do what?

Last edited by tonkuma; 08-12-11 at 02:55.
Reply With Quote
  #4 (permalink)  
Old 08-12-11, 03:58
laknar laknar is offline
Registered User
 
Join Date: Jul 2008
Posts: 80
Col1,Col2
10,20
10,30
20,20
20,30
20,40
01,20
02,30
03,40
03,50

Expected Output

Col1,Col2,Col3

10,20,N
10,30,N
20,20,Y
20,30,Y
20,40,Y
01,20,N
02,30,N
03,40,Y
03,50,Y


Code:
if Col1 is duplicate and Col2 is 20 or 30 then Flag 'N' 
else if Not(Col1 is duplicate) and Col2 is 20  then Flag 'N'
else if Not(Col1 is duplicate) and Col2 is 30  then Flag 'N'
else Col1 is duplicate and Not(Col2 is 20 or 30) then Flag 'Y'
Reply With Quote
  #5 (permalink)  
Old 08-12-11, 04:20
przytula_guy przytula_guy is offline
Registered User
 
Join Date: Apr 2006
Location: Belgium
Posts: 1,159
have you already tried some sql or waiting for the others to find a solution for you ??
I myself I don't like "spoonfeeding" requests...
__________________
Best Regards, Guy Przytula
Database Software Consultant
DB2 UDB LUW Certified V7-V8-V9-V9.7 DB Admin - Dprop..
Information Server Datastage Certified
http://www.infocura.be
Reply With Quote
  #6 (permalink)  
Old 08-12-11, 07:04
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Quote:
Code:
if Col1 is duplicate and Col2 is 20 or 30 then Flag 'N' 
else if Not(Col1 is duplicate) and Col2 is 20  then Flag 'N'
else if Not(Col1 is duplicate) and Col2 is 30  then Flag 'N'
else Col1 is duplicate and Not(Col2 is 20 or 30) then Flag 'Y'
How to flag following cases?
'Y' or 'N' or other value or null?

col1, col2, flag
111, 25, ?
121, 20, ?
121, 27, ?
122, 30, ?
122, 34, ?
141, 20, ?
141, 21, ?
141, 22, ?
141, 30, ?

col1 is not duplicate and col2 is not 20 nor 30.
col1 is duplicate(2 rows only) and col2 of one row is 20 and col2 of another row is not 20 nor 30.
col1 is duplicate(2 rows only) and col2 of one row is 30 and col2 of another row is not 20 nor 30.

Last edited by tonkuma; 08-12-11 at 07:32. Reason: Add rows col1=141
Reply With Quote
  #7 (permalink)  
Old 08-12-11, 09:47
laknar laknar is offline
Registered User
 
Join Date: Jul 2008
Posts: 80
col1, col2, flag
111, 25, Y
121, 20, Y
121, 27, Y
122, 30, Y
122, 34, Y
141, 20, Y
141, 21, Y
141, 22, Y
141, 30, Y

The requirement here is we have to check only for 20 or 30 and col1 is either duplicate or unique then we have to flag as 'N'.

142,20,N
143,30,N
144,20,N
144,30,N
145,20,Y
145,30,Y
145,40,Y
146,null,Y
null,null,Y

Last edited by laknar; 08-12-11 at 09:51.
Reply With Quote
  #8 (permalink)  
Old 08-12-11, 11:13
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Quote:
The requirement here is we have to check only for 20 or 30 and col1 is either duplicate or unique then we have to flag as 'N'.
I thought this was not enough, because the following case was unclear.
(Note: it is not easy to know rows with a condition are not exists.)
if col2 of a row is 20 or 30 and col1 is duplicate and there is another row with same col1 and col2 is not 20 nor 30.

My understandings of your requirements from examples are...
Code:
Conditions of flag 'N'
1) col1 is not duplicate and col2 is 20 or 30
2) col1 is duplicate two rows and one of col2 is 20 and another col2 is 30
in other words:
 all col2 with same col1 are 20 or 30. 

All other rows flag 'Y'
Reply With Quote
  #9 (permalink)  
Old 08-12-11, 11:19
laknar laknar is offline
Registered User
 
Join Date: Jul 2008
Posts: 80
yes, exactly.

Quote:
Conditions of flag 'N'
1) col1 is not duplicate and col2 is 20 or 30
2) col1 is duplicate two rows and one of col2 is 20 and another col2 is 30
in other words:
all col2 with same col1 are 20 or 30.

All other rows flag 'Y'
i have tried and still working on several queries with Analytical functions like (Row_Number()) and Aggregate functions.


any help is greatly appreciated.

thanks.
Reply With Quote
  #10 (permalink)  
Old 08-12-11, 11:35
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
One way may be...
Code:
Conditions of flag 'N'
1) col1 is not duplicate and col2 is 20 or 30
    COUNT(*) OVER(PARTITION BY col1) = 1
AND col2 IN (20 , 30)

2) col1 is duplicate two rows and one of col2 is 20 and another col2 is 30
    COUNT(*) OVER(PARTITION BY col1) = 2
AND MIN(col2) OVER(...) ...
AND MAX(col2) ... ...
Reply With Quote
  #11 (permalink)  
Old 08-12-11, 12:36
laknar laknar is offline
Registered User
 
Join Date: Jul 2008
Posts: 80
Thank you tonkuma for you hint.

Code:
create table testt(col1 number,col2 varchar(5));

insert into testt values(10,'20');
insert into testt values(10,'30');
insert into testt values(20,'20');
insert into testt values(30,'30');
insert into testt values(40,'20');
insert into testt values(40,'30');
insert into testt values(40,'40');
insert into testt values(50,'40');


select col1,col2,case when (count(*) Over(partition by col1))=1 and (col2='20' or col2='30') Then 'N' 
when (count(*) Over(partition by col1))=2 and ((Min(col2) Over(partition by col1))='20' or (Max(col2) Over(partition by col1))='30') Then 'N'
else 'Y' end as FLAG  from testt
I have developed this query. is this correct?
Reply With Quote
  #12 (permalink)  
Old 08-12-11, 12:57
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
I think
... ((Min(col2) Over(partition by col1))='20' AND (Max(col2) Over(partition by col1))='30') ...

For example:
121, '20', Y
121, '27', Y
Reply With Quote
  #13 (permalink)  
Old 08-12-11, 19:29
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Example of some other variations:
Code:
     , CASE
       WHEN COUNT(*)  OVER(PARTITION BY col1) <= 2
        AND MIN(col2) OVER(PARTITION BY col1) IN ('20' , '30')
        AND MAX(col2) OVER(PARTITION BY col1) IN ('20' , '30')
       THEN 'N'
       ELSE 'Y'
       END  AS flag1

     , CASE
       WHEN VARCHAR( COUNT(*) OVER(PARTITION BY col1) )
         || MIN(col2) OVER(PARTITION BY col1)
         || MAX(col2) OVER(PARTITION BY col1)
            IN ('12020' , '13030' , '22030')
       THEN 'N'
       ELSE 'Y'
       END  AS flag2

     , CASE
       WHEN SUM( CASE
                 WHEN col2 IN ('20' , '30')
                 THEN 0
                 ELSE 1
                 END
               ) OVER(PARTITION BY col1) = 0
       THEN 'N'
       ELSE 'Y'
       END  AS flag3

     , CASE
       WHEN '20:30:'
            LIKE '%' || VARCHAR( LISTAGG( col2 || ':%' )
                                 OVER( PARTITION BY col1 ORDER BY col2
                                       ROWS BETWEEN UNBOUNDED PRECEDING
                                                AND UNBOUNDED FOLLOWING )
                        , 3999 )
       THEN 'N'
       ELSE 'Y'
       END  AS flag4
Reply With Quote
  #14 (permalink)  
Old 08-12-11, 23:14
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
More examples:
Code:
     , SUBSTR(
          'NY'
        , SIGN( SUM( CASE
                     WHEN col2 IN ('20' , '30')
                     THEN 0
                     ELSE 1
                     END
                   ) OVER(PARTITION BY col1)
              ) + 1
        , 1
       )  AS flag5

     , CHR( 78 + SIGN( SUM( COALESCE( 1 - LOCATE('20' , col2)
                                        - LOCATE('30' , col2)
                                    , 1
                                    )
                          ) OVER(PARTITION BY col1)
                     ) * 11
          ) AS flag6

     , CHR( 78 + SIGN(  COUNT(*) OVER(PARTITION BY col1)
                      - COUNT( NULLIF(  LOCATE('20' , col2)
                                      + LOCATE('30' , col2)
                                      , 0
                                     )
                             ) OVER(PARTITION BY col1)
                     ) * 11
          ) AS flag7
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On