  1. #1

    SQL - Join 2 tables with duplicates

    Hello SQL experts!

    I'm having some trouble with a SQL query in PowerPivot (Excel); nobody has found a solution so far. I need your help!


    Table A:
    ID | Field A | Field B | Field C
    x1 | A1 | B1 | C1
    x1 | A2 | B2 | C2
    x1 | A3 | B3 | C3
    x1 | A4 | B4 | C4
    x1 | A5 | B5 | C5
    x2 | A6 | B6 | C6
    x2 | A7 | B7 | C7
    x2 | A8 | B8 | C8
    x2 | A9 | B9 | C9
    x2 | A10 | B10 | C10
    ...
    [Every ID appears 5 times, with values in the other fields that may differ or repeat]

    Table B:
    ID | Field D
    x1 | D1
    x1 | D1
    x1 | D1
    x1 | D1
    x1 | D1
    x2 | D2
    x2 | D2
    x2 | D2
    x2 | D2
    x2 | D2
    ...
    [Every ID appears 5 times, always with the same value in Field D]

    What I want:
    ID | Field A | Field B | Field C | Field D
    x1 | A1 | B1 | C1 | D1
    x1 | A2 | B2 | C2 | D1
    x1 | A3 | B3 | C3 | D1
    x1 | A4 | B4 | C4 | D1
    x1 | A5 | B5 | C5 | D1
    x2 | A6 | B6 | C6 | D2
    x2 | A7 | B7 | C7 | D2
    x2 | A8 | B8 | C8 | D2
    x2 | A9 | B9 | C9 | D2
    x2 | A10 | B10 | C10 | D2
    ...


    Table A has 500,000 rows; the result MUST have EXACTLY 500,000 rows.
    Table B has 800,000 rows, including some IDs I don't need.
    I need to keep ALL the rows from Table A and add "Field D".
    Every "ID" in Table A is present 5 times in both tables.

    A basic left join creates duplicates, since every row of Table A has 5 matches in Table B.
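    To make the failure mode concrete, here is the naive join (table and column names `table_a`, `table_b`, `field_a`..`field_d` are my guesses from the sketches above):

    ```sql
    -- Each of the 5 rows per ID in table_a matches all 5 rows for
    -- that ID in table_b, so every ID yields 25 output rows and
    -- 500,000 input rows would balloon to 2,500,000.
    SELECT a.id
         , a.field_a
         , a.field_b
         , a.field_c
         , b.field_d
      FROM table_a AS a
      LEFT JOIN table_b AS b
        ON b.id = a.id
    ;
    ```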


    Can someone help me? Thank you!
    Regards.

  2. #2
    "I'm having some trouble with a SQL query on PowerPivot (Excel), ..."
    If the trouble is specific to Excel (or a tool on Excel),
    you may want to ask in the Microsoft Excel - dBforums section.


    If it is a general SQL question:
    the design of Table B (i.e. every ID repeated 5 times with the same value in Field D) is poorly normalized.
    Removing the duplicate rows avoids your trouble.
    So either remove the duplication from Table B itself,
    or create a new table (e.g. a table named B_norm) that holds Table B with the duplicate rows removed, like...
    Code:
    CREATE TABLE B_norm
    ( id    CHAR(2) NOT NULL
    , col_d CHAR(2) NOT NULL
    , PRIMARY KEY ( id , col_d )
    );
    
    INSERT INTO B_norm
    ( id , col_d )
    SELECT DISTINCT
           id
         , field_d
     FROM  table_b
    ;
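    Once B_norm exists, a plain join brings back exactly one Field D per row of Table A, because B_norm has a single row per ID. A sketch, assuming the same hypothetical names as above:

    ```sql
    -- B_norm holds one (id, col_d) pair per ID, so this left join
    -- preserves Table A's row count instead of multiplying it by 5.
    SELECT a.id
         , a.field_a
         , a.field_b
         , a.field_c
         , n.col_d
      FROM table_a AS a
      LEFT JOIN B_norm AS n
        ON n.id = a.id
    ;
    ```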

    Third possibility:
    Use "ROW_NUMBER() OVER (PARTITION BY id ORDER BY ...) AS row_num" in sub-queries for Table A and Table B,
    and add equality of the row_num values to the join condition.
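    A sketch of that third approach, again with guessed table and column names (note that standard SQL requires an ORDER BY inside the OVER clause; the ordering columns here are arbitrary choices):

    ```sql
    -- Number the rows 1..5 within each ID on both sides, then join
    -- on (id, row_num) so each row of table_a matches exactly one
    -- row of table_b. The pairing is arbitrary, which is harmless
    -- here because field_d is constant within an ID.
    SELECT a.id
         , a.field_a
         , a.field_b
         , a.field_c
         , b.field_d
      FROM ( SELECT id, field_a, field_b, field_c
                  , ROW_NUMBER() OVER (PARTITION BY id ORDER BY field_a) AS row_num
               FROM table_a ) AS a
      JOIN ( SELECT id, field_d
                  , ROW_NUMBER() OVER (PARTITION BY id ORDER BY field_d) AS row_num
               FROM table_b ) AS b
        ON b.id = a.id
       AND b.row_num = a.row_num
    ;
    ```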

  3. #3
    select a.*, b.field_d
    from table_a a
    inner join (select distinct c.id, c.field_d
                from table_b c) as b
      on a.id = b.id

    Don't blame me for the performance. To speed it up, look at Tonkuma's suggestion above.
