Results 1 to 12 of 12
  1. #1
    Join Date
    Jul 2003
    Posts
    34

    Unanswered: Eliminate duplicate records(range) from the query

    I am trying to elimate duplicate records from my table and identiy the key records which are unique. My comparision is many to many.

    In the below example,key1 (X00003) values all match with Key2(X00004) so they are duplicate records so I need only one key, but key3 (X00005) doesn't match with X000003 or X00004 so I need that in the output.

    TABLE1
    key VALUE
    X00003 00344
    X00003 00345
    X00003 00346
    X00003 00241
    X00003 00334

    X00004 00344
    X00004 00345
    X00004 00346
    X00004 00241
    X00004 00334

    X00005 00344
    X00005 00345
    X00005 00346
    X00005 00241
    X00005 00334
    X00005 00400


    Ouput

    X00004
    X00005

    Can we acheive this using DB2 SQL? Any help?

    Thans,
    Gopu

  2. #2
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    key1 (X00003) values all match with Key2(X00004) so they are duplicate records so I need only one key
    Why did you took X00004, and not X00003?
    I can say "Key_a(X00004) values all match with key_b(X00003)" (so, remove X00004 from the results).

  3. #3
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Some techniques used in Exact Relational Division(or Relational Division without Remainder) may be useful for this issue.

    Please search in this forum with those keywords.

  4. #4
    Join Date
    Jul 2003
    Posts
    34

    Eliminate duplicate records(range) from the query

    @tonkuma, thank you for your response.

    X00003 or X00004 should be fine for the output. I don't mind getting either or the keys as they both are the same.

  5. #5
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Please try this.

    Example 1:
    Code:
    ------------------------------ Commands Entered ------------------------------
    WITH
     table1
    ( key , value ) AS (
    VALUES
      ( 'X00003' , '00344' )
    , ( 'X00003' , '00345' )
    , ( 'X00003' , '00346' )
    , ( 'X00003' , '00241' )
    , ( 'X00003' , '00334' )
    , ( 'X00004' , '00344' )
    , ( 'X00004' , '00345' )
    , ( 'X00004' , '00346' )
    , ( 'X00004' , '00241' )
    , ( 'X00004' , '00334' )
    , ( 'X00005' , '00344' )
    , ( 'X00005' , '00345' )
    , ( 'X00005' , '00346' )
    , ( 'X00005' , '00241' )
    , ( 'X00005' , '00334' )
    , ( 'X00005' , '00400' )
    )
    SELECT MAX(key) AS result
     FROM  (SELECT key
                 , LISTAGG(value , ',') WITHIN GROUP(ORDER BY value) AS values
             FROM  table1
             GROUP BY
                   key
           )
     GROUP BY
           values
    ;
    ------------------------------------------------------------------------------
    
    RESULT
    ------
    X00004
    X00005
    
      2 record(s) selected.

  6. #6
    Join Date
    Jul 2003
    Posts
    34
    tonkuma,
    I am trying your query in DB2 and getting error around WITHIN GROUP. We have DB2 version 10 for Z/OS. Will it work in DB2 Z/OS.

    Also , I changed the query to retrieve the data from the table instead of hardcoding.

    WITH X AS
    ( SELECT KEY1, CAT_NO
    FROM R1.SPCG
    WHERE KEY1 IN (
    SELECT KEY1 FROM R2.CONTR_SPCG)
    )
    SELECT MAX(KEY1) AS result
    FROM (SELECT KEY1
    , LISTAGG(CAT_NO , ',')
    WITHIN GROUP(ORDER BY CAT_NO) AS values
    FROM X
    GROUP BY KEY1
    )
    GROUP BY VALUES
    SQL error at or before GROUP
    (line 10, position 21).

    Explanation:
    This error occurred because QMF encountered invalid SQL syntax in the
    area of the query referenced in the message above. The statement at this
    location is:

    WITHIN GROUP(ORDER BY SPCG_NO) AS VALUES

  7. #7
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Quote Originally Posted by gopurs View Post
    ...
    We have DB2 version 10 for Z/OS. Will it work in DB2 Z/OS.

    ...
    So, please use XMLAGG insead of LISTAGG.
    More concretely, nest of funcions like XMLSERIALIZE( XMLAGG( XMLTEXT(...) ... ) ... ) may be necessary.

    Please see manuals for exact syntax:
    DB2 10 - DB2 SQL - XMLAGG

    DB2 10 - DB2 SQL - XMLSERIALIZE

    DB2 10 - DB2 SQL - XMLTEXT
    Last edited by tonkuma; 04-27-14 at 01:26.

  8. #8
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Example 2: Find duplicate keys.
    Code:
    WITH
     with_rn_cnt AS (
    SELECT key , value
         , ROW_NUMBER()
              OVER( PARTITION BY key
                        ORDER BY value ) AS rn
         , COUNT(*)
              OVER( PARTITION BY key   ) AS cnt
     FROM  table1
    )
    /*
    SELECT DISTINCT
           key
     FROM  with_rn_cnt /*table1*/
    EXCEPT /*ALL*/
    */
    SELECT a.key
         , b.key
     FROM  with_rn_cnt AS a
     INNER JOIN
           with_rn_cnt AS b
      ON   b.key > a.key
      AND  b.cnt = a.cnt
      AND  b.rn  = a.rn
     GROUP BY
           a.key
         , b.key
     HAVING
           COUNT(CASE a.value WHEN b.value THEN 0 END)
         = MAX(a.cnt)
    ;

    Example 3: Eliminate duplicate keys.
    Code:
    WITH
     with_rn_cnt AS (
    SELECT key , value
         , ROW_NUMBER()
              OVER( PARTITION BY key
                        ORDER BY value ) AS rn
         , COUNT(*)
              OVER( PARTITION BY key   ) AS cnt
     FROM  table1
    )
    SELECT DISTINCT
           key
     FROM  with_rn_cnt /*table1*/
    EXCEPT /*ALL*/
    SELECT a.key
    /*
         , b.key
    */
     FROM  with_rn_cnt AS a
     INNER JOIN
           with_rn_cnt AS b
      ON   b.key > a.key
      AND  b.cnt = a.cnt
      AND  b.rn  = a.rn
     GROUP BY
           a.key
         , b.key
     HAVING
           COUNT(CASE a.value WHEN b.value THEN 0 END)
         = MAX(a.cnt)
    ;

  9. #9
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    I tested Example 2 and 3 on DB2 9.7.5 for Windows with additional test data, like...

    Test Data:
    Code:
    ------------------------------ Commands Entered ------------------------------
    WITH
     table1
    ( key , value ) AS (
    VALUES
    /* same set of values with X00004 */
      ( 'X00003' , '00344' )
    , ( 'X00003' , '00345' )
    , ( 'X00003' , '00346' )
    , ( 'X00003' , '00241' )
    , ( 'X00003' , '00334' )
    /* same set of values with X00003 */
    , ( 'X00004' , '00344' )
    , ( 'X00004' , '00345' )
    , ( 'X00004' , '00346' )
    , ( 'X00004' , '00241' )
    , ( 'X00004' , '00334' )
    /* extra 1 value than X00003 and X00004 */
    , ( 'X00005' , '00344' )
    , ( 'X00005' , '00345' )
    , ( 'X00005' , '00346' )
    , ( 'X00005' , '00241' )
    , ( 'X00005' , '00334' )
    , ( 'X00005' , '00400' )
    /* less 1 value than X00005 */
    /* less 1 value and extra 1 value than X00003 and X00004 */
    , ( 'X00006' , '00344' )
    , ( 'X00006' , '00345' )
    , ( 'X00006' , '00241' )
    , ( 'X00006' , '00334' )
    , ( 'X00006' , '00400' )
    /* extra 1 value than X00003 and X00004 */
    /* less 1 value and extra 1 value than X00005 */
    , ( 'X00007' , '00344' )
    , ( 'X00007' , '00345' )
    , ( 'X00007' , '00346' )
    , ( 'X00007' , '00241' )
    , ( 'X00007' , '00334' )
    , ( 'X00007' , '00500' )
    /* subset of(less 1 value than) X00003 and X00004 */
    , ( 'X00008' , '00344' )
    , ( 'X00008' , '00345' )
    , ( 'X00008' , '00241' )
    , ( 'X00008' , '00334' )
    /* same set of values with X00003 and X00004 */
    , ( 'X00002' , '00344' )
    , ( 'X00002' , '00345' )
    , ( 'X00002' , '00346' )
    , ( 'X00002' , '00241' )
    , ( 'X00002' , '00334' )
    )

    Example 2: Find duplicate keys.
    Code:
    WITH
     with_rn_cnt AS (
    SELECT key , value
         , ROW_NUMBER()
              OVER( PARTITION BY key
                        ORDER BY value ) AS rn
         , COUNT(*)
              OVER( PARTITION BY key   ) AS cnt
     FROM  table1
    )
    /*
    SELECT DISTINCT
           key
     FROM  with_rn_cnt /*table1*/
    EXCEPT /*ALL*/
    */
    SELECT a.key
         , b.key
     FROM  with_rn_cnt AS a
     INNER JOIN
           with_rn_cnt AS b
      ON   b.key > a.key
      AND  b.cnt = a.cnt
      AND  b.rn  = a.rn
     GROUP BY
           a.key
         , b.key
     HAVING
           COUNT(CASE a.value WHEN b.value THEN 0 END)
         = MAX(a.cnt)
    ;

    Result:
    Code:
    ------------------------------------------------------------------------------
    
    KEY    KEY   
    ------ ------
    X00002 X00003
    X00002 X00004
    X00003 X00004
    
      3 record(s) selected.

    Example 3: Eliminate duplicate keys.
    Code:
    WITH
     with_rn_cnt AS (
    SELECT key , value
         , ROW_NUMBER()
              OVER( PARTITION BY key
                        ORDER BY value ) AS rn
         , COUNT(*)
              OVER( PARTITION BY key   ) AS cnt
     FROM  table1
    )
    SELECT DISTINCT
           key
     FROM  with_rn_cnt /*table1*/
    EXCEPT /*ALL*/
    SELECT a.key
    /*
         , b.key
    */
     FROM  with_rn_cnt AS a
     INNER JOIN
           with_rn_cnt AS b
      ON   b.key > a.key
      AND  b.cnt = a.cnt
      AND  b.rn  = a.rn
     GROUP BY
           a.key
         , b.key
     HAVING
           COUNT(CASE a.value WHEN b.value THEN 0 END)
         = MAX(a.cnt)
    ;

    Result:
    Code:
    ------------------------------------------------------------------------------
    
    KEY   
    ------
    X00004
    X00005
    X00006
    X00007
    X00008
    
      5 record(s) selected.

  10. #10
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    I thought that Example 2 and 3 were vorbose.
    Here are some more compact examples.

    Example 4: Find duplicate keys.
    Code:
    WITH
     with_cnt AS (
    SELECT key , value
         , COUNT(*)
              OVER( PARTITION BY key   ) AS cnt
     FROM  table1
    )
    /*
    SELECT DISTINCT
           key
     FROM  with_cnt /*table1*/
    EXCEPT
    */
    SELECT a.key
         , b.key
     FROM  with_cnt AS a
     INNER JOIN
           with_cnt AS b
      ON   b.key   > a.key
      AND  b.value = a.value 
      AND  b.cnt   = a.cnt
     GROUP BY
           a.key
         , b.key
     HAVING
           COUNT(*) = MAX(a.cnt)
    ;

    Example 5: Eliminate duplicate keys.
    Code:
    WITH
     with_cnt AS (
    SELECT key , value
         , COUNT(*)
              OVER( PARTITION BY key   ) AS cnt
     FROM  table1
    )
    SELECT DISTINCT
           key
     FROM  with_cnt /*table1*/
    EXCEPT
    SELECT a.key
    /*
         , b.key
    */
     FROM  with_cnt AS a
     INNER JOIN
           with_cnt AS b
      ON   b.key   > a.key
      AND  b.value = a.value
      AND  b.cnt   = a.cnt
     GROUP BY
           a.key
         , b.key
     HAVING
           COUNT(*) = MAX(a.cnt)
    ;

  11. #11
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    This may be another solution.

    Example 6a: Eliminate duplicate keys.
    Code:
    WITH
     with_cnt AS (
    SELECT key , value
         , COUNT(*)
              OVER( PARTITION BY key   ) AS cnt
     FROM  table1
    )
    SELECT a_key
     FROM  (SELECT a.key AS a_key
                 , b.key AS b_key
                 , COUNT(*)    AS cnt_all
                 , MAX(a.cnt)  AS cnt
             FROM  with_cnt AS a
             INNER JOIN
                   with_cnt AS b
              ON   b.key   >= a.key
              AND  b.value =  a.value
              AND  b.cnt   =  a.cnt
             GROUP BY
                   a.key
                 , b.key
           )
     GROUP BY
           a_key
     HAVING
           COUNT( CASE cnt_all
                  WHEN cnt     THEN 0
                  END
                ) = 1
    ;

    Example 6b: Eliminate duplicate keys.
    Note: Replace HAVING clause in Example 6a by the following.
    Code:
     HAVING
           MAX( CASE
                WHEN a_key < b_key THEN
                     cnt_all
                ELSE 0
                END
              ) < MAX(cnt)
    ;
    Last edited by tonkuma; 04-30-14 at 10:00. Reason: Simplified HAVING clause in Example 6a.

  12. #12
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    If upper limit of number of different value(s) in the table was known and small(say 63),
    this might be possible.
    Note: The result of a RANK, DENSE_RANK, or ROW_NUMBER specification is BIGINT.

    Example 7:
    Code:
    SELECT key
     FROM  (SELECT key
                 , ROW_NUMBER()
                      OVER( PARTITION BY SUM( POWER(2 , d_rank - 1) )
                                ORDER BY key DESC
                          ) AS rn
             FROM  (SELECT key
                         , DENSE_RANK()
                              OVER( ORDER BY value ) AS d_rank
                     FROM  table1
                   )
             GROUP BY
                   key
           )
     WHERE rn = 1
     ORDER BY
           key
    ;
    Last edited by tonkuma; 05-18-14 at 17:17. Reason: Replace "upper limit of number of value(s) for a key" with "upper limit of number of different value(s) in the table"

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •