Results 1 to 14 of 14
  1. #1
    Join Date
    Jul 2014
    Posts
    6

    Question Unanswered: Case-Sensitive Search for Duplicate Entries

    Hello,
    I'm facing a unique issue with a DB2 database query. I have a table called ITEMS within which exists a column that contains the names for a set of items. Some of these items are duplicates of each other (with matching casing) and some others are duplicates with cases not matching (e.g Chair, cHair, CHAIR, etc.).

    I need to Look for the items that match the second scenario, change their casing all consistent with only having their first letters capitalized (i.e. all be converted to Chair) AND get numbered (i.e. Chair 1, Chair 2, Chair 3, etc.).

    I have an example of such scenario below for your reference:

    Code:
       ID    |     NAME     |     IS_TYPE
    ---------------------------------------
    1        | chair        |      1
    2        | Chair        |      1
    3        | cHair        |      1
    4        | chAir        |      1
    5        | CHAIR        |      1
    6        | Table        |      1
    7        | TAble        |      1
    8        | TaBle        |      1
    9        | table        |      1
    10       | Auto_Info    |      1
    11       | Auto_Info    |      1
    12       | Auto_Info    |      1
    13       | Auto_Info    |      1
    14       | Auto_Info    |      1
    converted to:

    Code:
       ID    |     NAME     |     IS_TYPE
    ---------------------------------------
    1        | Chair 1      |      1
    2        | Chair 2      |      1
    3        | Chair 3      |      1
    4        | Chair 4      |      1
    5        | Chair 5      |      1
    6        | Table 1      |      1
    7        | Table 2      |      1
    8        | Table 3      |      1
    9        | Table 4      |      1
    10       | Auto_Info    |      1
    11       | Auto_Info    |      1
    12       | Auto_Info    |      1
    13       | Auto_Info    |      1
    14       | Auto_Info    |      1
    What I have so far:

    I'm very new to databases and I've only been able to search for duplicates within the column and number them:

    Code:
    SELECT DISTINCT ID
          ,CASE WHEN (count(*) OVER (PARTITION BY NAME)) > 1 THEN
              replace(NAME, NAME,  
                       NAME || ' ' || row_number() OVER (PARTITION BY NAME ORDER BY ID))
           ELSE NAME END AS NAME
    FROM   ITEMS;

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Check out the UPPER and LOWER scalar functions.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Jul 2014
    Posts
    6
    Marcus,
    Would you please elaborate on how those commands could be used. I am already familiar with the LOWER and UPPER functions and through about using a select for duplicates with LOWER and EXCEPT it from a selection with duplicates. This however will not work since Lower will change the casing of all the entries and therefore not work with an EXCEPT function.

  4. #4
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    If you are using Oracle and you already asked on Oracle forum,
    why are you asking same issue on DB2 forum?

    http://www.dbforums.com/oracle/17025...racle-11g.html

  5. #5
    Join Date
    Jul 2014
    Posts
    6
    Tonkum,
    I appreciate your investigative work on this question. The reason why I asked the same question in the Oracle forum is that I am trying to solve this problem in Oracle, DB2, and SQL Server, both for personal and work related reasons. As a result, I thought it would be appropriate (and proper) to ask my question for each database type in the appropriate place. Hence the Oracle was asked within the Oracle support forum and DB2 landed here, in the DB2 section of dbforums.com.

    Now if you are going to ask that why I didn't use that solution for DB2? the answer is that I already did. But couldn't get it to work and my lack of knowledge in this area didn't help with this matter either.

    I hope my response helps with your understanding of the legitimacy of my intentions on this forum.

  6. #6
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Here is an example on DB2.

    Test data:
    Code:
    ------------------------------ Commands Entered ------------------------------
    WITH
     items
    ( id , name , is_type ) AS (
    VALUES
      (  1 , 'chair'     , 1 )
    , (  2 , 'Chair'     , 1 )
    , (  3 , 'cHair'     , 1 )
    , (  4 , 'chAir'     , 1 )
    , (  5 , 'CHAIR'     , 1 )
    , (  6 , 'Table'     , 1 )
    , (  7 , 'TAble'     , 1 )
    , (  8 , 'TaBle'     , 1 )
    , (  9 , 'table'     , 1 )
    , ( 10 , 'Auto_Info' , 1 )
    , ( 11 , 'Auto_Info' , 1 )
    , ( 12 , 'Auto_Info' , 1 )
    , ( 13 , 'Auto_Info' , 1 )
    , ( 14 , 'Auto_Info' , 1 )
    )
    SELECT * FROM items;
    ------------------------------------------------------------------------------
    
    ID          NAME      IS_TYPE    
    ----------- --------- -----------
              1 chair               1
              2 Chair               1
              3 cHair               1
              4 chAir               1
              5 CHAIR               1
              6 Table               1
              7 TAble               1
              8 TaBle               1
              9 table               1
             10 Auto_Info           1
             11 Auto_Info           1
             12 Auto_Info           1
             13 Auto_Info           1
             14 Auto_Info           1
    
      14 record(s) selected.

    Query Example 1:
    Code:
    SELECT id
         , CASE
           WHEN MAX(
                   DENSE_RANK() OVER( PARTITION BY LOWER(name)
                                          ORDER BY name        )
                ) OVER( PARTITION BY LOWER(name) )
                > 1
           THEN INITCAP(name)
                || ' ' || VARCHAR(
                             CHAR(
                                ROW_NUMBER() OVER( PARTITION BY LOWER(name)
                                                       ORDER BY id          )
                             )
                           , 2
                          )
           ELSE name
           END  AS name
         , is_type
     FROM  items AS t
     ORDER BY
           id
    ;

    Result:
    Code:
    ------------------------------------------------------------------------------
    
    ID          NAME         IS_TYPE    
    ----------- ------------ -----------
              1 Chair 1                1
              2 Chair 2                1
              3 Chair 3                1
              4 Chair 4                1
              5 Chair 5                1
              6 Table 1                1
              7 Table 2                1
              8 Table 3                1
              9 Table 4                1
             10 Auto_Info              1
             11 Auto_Info              1
             12 Auto_Info              1
             13 Auto_Info              1
             14 Auto_Info              1
    
      14 record(s) selected.
    Last edited by tonkuma; 07-12-14 at 03:56. Reason: Removed sub-query from Query Example.

  7. #7
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    A variation.

    Example 2:
    Code:
    SELECT id
         , CASE
           WHEN RANK() OVER( PARTITION BY LOWER(name)
                                 ORDER BY name ASC    )
              + RANK() OVER( PARTITION BY LOWER(name)
                                 ORDER BY name DESC   )
                > 2
           THEN INITCAP(name)
                || ' ' || ROW_NUMBER() OVER( PARTITION BY LOWER(name)
                                                 ORDER BY id          )
           ELSE name
           END  AS name
         , is_type
     FROM  items AS t
     ORDER BY
           id
    ;

    Result:
    Code:
     
    ID          NAME                           IS_TYPE    
    ----------- ------------------------------ -----------
              1 Chair 1                                  1
              2 Chair 2                                  1
              3 Chair 3                                  1
              4 Chair 4                                  1
              5 Chair 5                                  1
              6 Table 1                                  1
              7 Table 2                                  1
              8 Table 3                                  1
              9 Table 4                                  1
             10 Auto_Info                                1
             11 Auto_Info                                1
             12 Auto_Info                                1
             13 Auto_Info                                1
             14 Auto_Info                                1
    
      14 record(s) selected.

  8. #8
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    This may be more effective than Example 1 or 2.

    Because, Example 1 and 2 include OLAP specifications having different ORDER BY clauses in a query.
    So, extra sort(s) corresponding to each ORDER BY clause might be necessary.

    Note: Make the query 1 line shorter than before.
    Example 3:
    Code:
    SELECT id
         , CASE MIN(name) OVER( PARTITION BY LOWER(name) )
           WHEN MAX(name) OVER( PARTITION BY LOWER(name) )
           THEN name
           ELSE INITCAP(name)
                || ' ' || ROW_NUMBER() OVER( PARTITION BY LOWER(name)
                                                 ORDER BY id          )
           END  AS name
         , is_type
     FROM  items AS t
     ORDER BY
           id
    ;
    Last edited by tonkuma; 07-13-14 at 00:33. Reason: Note: Make the query 1 line shorter than before.

  9. #9
    Join Date
    Jul 2014
    Posts
    6
    Tonkuma,
    Thanks for your reply. Would you happen to know how I can update the table with the results?

    Thanks

  10. #10
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    NeuDBUser,
    Do you have a billing address for Tonkuma?
    Dave

  11. #11
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    sorry about that, but the forum is to help determine what a problem is and possible resolutions not to do the work for you. Tonkuma put in quite a bit of his time giving you detailed instructions and asking you questions along the way. I think you should look at turning the select statement into an update statement in the same manner you would change any select into an update. Start with the key word, then the set statement.
    Dave

  12. #12
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Quote Originally Posted by NeuDBUser View Post
    Tonkuma,
    Thanks for your reply. Would you happen to know how I can update the table with the results?

    Thanks
    Please try to use UPDATE (fullselect).
    IBM Knowledge Center: DB2 for Linux UNIX and Windows 9.7.0>Database reference>SQL>Statements>UPDATE

    Syntax

    searched-update:

    Code:
    >>-UPDATE--+-table-name-----------------+----------------------->
               +-view-name------------------+   
               +-nickname-------------------+   
               +-ONLY--(--+-table-name-+--)-+   
               |          '-view-name--'    |   
               '-(--fullselect--)-----------'   
    
    ...
    ...
    For example:
    Code:
    UPDATE
    (/* put modified Example 3 here. */
    SELECT id
         , name
         , MIN(name) OVER( PARTITION BY LOWER(name) ) AS min_name
    ...
    ...
     FROM  items
    )
     SET   name = ...
     WHERE ...
    ;

  13. #13
    Join Date
    Jul 2014
    Posts
    6
    Tonkuma,
    Thanks so much for your help with this problem. I hope one day I can be confident enough in database scripts that I too can come on this forum and provide assistance to others on these wonderful and highly educational forums.

    This code will later assist hundreds of others that might face the same issue as I did, and educate them on how it can be solved.

    It is only with the spread of knowledge that we can make better versions of ourselves, which to me provides much more value than a check sent to a "billing address". You, without any sarcasm and cynicism, chose to volunteer your time to assist hundreds of hopeful programmers, and I once again thank you for that.

    My note to dav1mo and the other trolls: If you have nothing to contribute to this community, then I suggest that you simply back off and do not dilute these forums with sarcasm or cynicism.

    ---------------------------------------------------------------------------------

    Ok back to the code. Let me share what I came up with for the solution to this problem (using what you gave me initially). I solved it before you replied back to my question, so it is a bit different:

    Code:
    --------------------------------------------------------------------------
    --Create a "temporary" table (TEMP_TABLE) to store a modified copy of the ITEMS table. This column will have its duplicates with different
    --casing normalized into a uniform format, and numbered from 1...n.
    --After this column is copied back into the original ITEMS table then 
    --TEMP_TABLE will be dropped.
    ---------------------------------------------------------------------------------
    
    create table TEMP_ITEMS as (
    SELECT ID,
          CASE MIN(NAME) OVER( PARTITION BY LOWER(NAME) )
           WHEN MAX(NAME) OVER( PARTITION BY LOWER(NAME) )
           THEN NAME
           ELSE INITCAP(NAME)
                || ' ' || ROW_NUMBER() OVER( PARTITION BY LOWER(NAME)
                                                 ORDER BY ID          ) 
           END AS NAME
    FROM  ASSERTION AS t
    )WITH NO DATA;
    
    INSERT INTO TEMP_ITEMS (
    SELECT ID,
          CASE MIN(NAME) OVER( PARTITION BY LOWER(NAME) )
           WHEN MAX(NAME) OVER( PARTITION BY LOWER(NAME) )
           THEN NAME
           ELSE INITCAP(NAME)
                || ' ' || ROW_NUMBER() OVER( PARTITION BY LOWER(NAME)
                                                 ORDER BY ID          ) 
           END AS NAME
    FROM  ITEMS AS t
    );
    
    ---------------------------------------------------------------------------------
    --Use "merge" command to replace the NAME in ITEMSwith the one on 
    --TEMP_TABLE
    ---------------------------------------------------------------------------------
    
    merge into ITEMS target
    using (SELECT ID, NAME FROM TEMP_TABLE) source
    on (target.ID = source.ID)
    when matched then
    update set target.NAME = source.NAME;
    
    ---------------------------------------------------------------------------------
    --Drop TEMP_TABLE since we are done with it.
    ---------------------------------------------------------------------------------
    
    DROP TABLE TEMP_TABLE;

  14. #14
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    Neu,
    Nothing to do with being a troll. Has to do with as I mentioned we are here to help and give advice, not do the work for you. Yes, you should post your end result so that folks in the future can see how you handled your issue, as you did above.
    I do take issue with folks being given extensive assistance from anyone on here, then saying, well you already did all that now how about you just finish it for me, so that I can implement it.
    Tonkuma is probably one of the most helpful people on dbforums(all categories as well, not just DB2), I give advice on these as well, but no where near the detail or depth that he goes to, as I normally do not have that kind of time. I'm, also, more of the belief that you will get more out of something with assistance and pointers rather than it being done for you and that if I am doing your job for you then why should you be the only one making a profit from my work. We do the same for people with homework assignements, we try not to give them a finished product, but rather point them in a general direction.
    Dave

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •