Results 1 to 10 of 10
  1. #1
    Join Date
    Apr 2012
    Posts
    32

    Unanswered: GROUP BY on multiple tables

    Hello all,

    I have 2 tables with a common key resembling:

    Table A
    Key No. of Lines (result of a count(*) group by key)
    A 12
    B 40
    C 33
    D 45
    E 60

    Table B
    Key No. of Lines (result of a count(*) group by key)
    B 12
    C 39
    D 44
    F 65

    What I need is a full outer join on these two tables so as to produce a result set resembling:

    Key No. of lines in Table A No. of lines in Table B
    A 12 0
    B 40 12
    C 33 39
    D 45 44
    E 60 0
    F 0 65

    I seem to be unable to do a full outer join and then execute a count(*) on the separate tables.

    Can someone please help me?

    Thanks a lot

    S. BASU

  2. #2
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Two ideas.

    (1) full outer join two subqueries doing "count(*) group by key"(one for Table A, another for Table B).

    (2a) If there were some column(s)
    with which each row in Table A having same Key correspond to a row in Table B having same Key,
    do a full outer join adding the columns in ON condition,
    and get results by "COUNT(Table_a.Key) AS No. of lines in Table A"(same as for Table B).

    (2b) If there were not such columns,
    add "ROW_NUMBER() OVER(PARTITION BY key)" in subqueries and do a full outer join adding the ROW_NUMBER columns in ON condition,
    and get results same as (2a).

  3. #3
    Join Date
    Apr 2012
    Posts
    32
    Hey

    Thanks a lot for the tips.
    First, I admit that I'm not an SQL guru so your 2nd solution is kinda Greek to me. That said I implemented the first and its more or less fine except for NULL columns, i.e. for the lines in each table where the key value is null.
    Thus if in table A I have 100 lines for which the key column (the one I'm grouping on and not a key column of the table) is null then the subquery for table A returns:

    CTDERE_TAB TOTAL
    NULL 2861

    And the subquery for table B returns:
    CTDERE_TAB TOTAL
    NULL 2861

    Which is actually the same. However on full outer joining these two I get:

    CTDERE_TAB TOTAL CTDERE_HISTO TOTAL_HISTO
    NULL NULL NULL 2861
    NULL 2861 NULL NULL

    Ideally I'd have liked to have one NULL and both totals on the same line.

    Any idea where I'm going wrong?

    Thanks again

  4. #4
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Key No. of lines in Table A No. of lines in Table B
    A 12 0
    B 40 12
    C 33 39
    D 45 44
    E 60 0
    F 0 65
    To get the results like that, use COALESCE functions.
    Here is an example:
    Code:
    SELECT COALESCE(A.key , B.key)     AS key
         , COALESCE(A.no_of_lines , 0) AS "No. of lines in A"
         , COALESCE(B.no_of_lines , 0) AS "No. of lines in B"
     FROM  ...

  5. #5
    Join Date
    Apr 2012
    Posts
    32
    This is exactly what I can't seem to get to.
    To get A.no_of_lines and B.no_of_lines I am trying count(A.*) and count(B.*) but am getting an error since it cannot seem to decipher * in this context.

    Also I presume that this solution has nothing to do with the solution 1 you suggested in your earlier post? ie full outer join two subqueries doing "count(*) group by key"(one for Table A, another for Table B).?

  6. #6
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    I felt that the major reason of the difficulty of communication between you and me might be in the facts that
    (a) I don't know/understand your knowledge and capabilities about relational database and (specially) DB2
    and
    (b) my poor English capability (and my ASD?).

    So, I thought that it might be better to make conversations by concrete examples(i.e. sample data and expected results from the data).
    Note: You provided intermediate result(i.e. results of group by key). But you didn't provide base table data.

    Could you provide that(sample data(CREATE TABLE statements and INSERT statements to the tables) and expected results from the data) ?



    Anyway,
    this was my tested query based on
    (1) full outer join two subqueries doing "count(*) group by key"(one for Table A, another for Table B).
    Example 1:
    Code:
    SELECT COALESCE(a.key , b.key)     AS "Key"
         , COALESCE(a.no_of_lines , 0) AS "No. of lines in A"
         , COALESCE(b.no_of_lines , 0) AS "No. of lines in B"
     FROM  (SELECT key
                 , COUNT(*) AS no_of_lines
             FROM  Table_A
             GROUP BY
                   key
           ) AS a
     FULL  OUTER JOIN
           (SELECT key
                 , COUNT(*) AS no_of_lines
             FROM  Table_B
             GROUP BY
                   key
           ) AS b
      ON   a.key = b.key
    ;
    Last edited by tonkuma; 02-26-14 at 10:21. Reason: Add Note:

  7. #7
    Join Date
    Apr 2012
    Posts
    32
    Hi

    Sorry for the late reply. The DB server was down last evening so I've just managed to test your solution now.

    Please don't apologize.... It was my mistake that i didn't understand. I was substituting wrong values. However, now I've corrected it now and this is what I get:

    SELECT COALESCE(a.CTDERE , b.CTDERE) AS "CTDERE"
    , COALESCE(a.no_of_lines , 0) AS "No. of lines in A"
    , COALESCE(b.no_of_lines , 0) AS "No. of lines in B"
    FROM (SELECT CTDERE
    , COUNT(*) AS no_of_lines
    FROM CORP.NIEMBT
    GROUP BY
    CTDERE
    ) AS a
    FULL OUTER JOIN
    (SELECT CTDERE
    , COUNT(*) AS no_of_lines
    FROM CORP.NIEMBT_HISTO WHERE COD_PRD_REF = (SELECT MAX(COD_PRD_REF) FROM NIEMBT_HISTO)
    GROUP BY
    CTDERE
    ) AS b
    ON a.CTDERE = b.CTDERE
    ;

    The result however is same as before with the exception that the key column is mentioned once, which is great:
    CTDERE No. of lines in A No. of lines in B
    A 106505 106505
    R 8164 8164
    NULL 2861 0
    NULL 0 2861

    However I'd ideally need the NULL keys grouped together and both the 2861 values on the same line.

    The create table statements are as follows:

    Parent table NIEMBT:

    CREATE TABLE CORP.NIEMBT
    (
    CTDERE CHARACTER (1),
    YLINU1 VARCHAR (15) NOT NULL
    );

    Archive table NIEMBT_HISTO:

    CREATE TABLE CORP.NIEMBT_HISTO
    (
    COD_PRD_REF VARCHAR (6) NOT NULL,
    CTDERE CHARACTER (1),
    YLINU1 VARCHAR (15) NOT NULL
    );

    Due to the volume of the tables though it is impossible for me to include all the INSERT statements for the 2 tables since the number of lines run into thousands.

    However to explain a little more clearly I can say that:
    • In both tables the field CTDERE contains different values including NULL.
    • The archive table has one field extra - COD_PRD_REF which is basically the year and month in YYYYMM format. Thus when we join on the archive table we join on the subset of that table for which the date is the latest and thus the WHERE condition therein.


    Thus what is happening is, while all the non null values are correctly grouped together, the NULL values are not.

    I hope I'm clear? I'd be most happy to provide further clarifications. This is getting interesting

  8. #8
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    The essential cause of your issue must be "CTDERE contains NULL".

    Add an OR condition like
    Code:
      ON   a.CTDERE = b.CTDERE
      OR   a.CTDERE IS NULL AND b.CTDERE IS NULL

  9. #9
    Join Date
    Apr 2012
    Posts
    32
    Perfect. Finally its good.

    Deeply indebted to you mate.

    Thanks a lot:-)

  10. #10
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Another example based on
    (2b) If there were not such columns,
    add "ROW_NUMBER() OVER(PARTITION BY key)" in subqueries and do a full outer join adding the ROW_NUMBER columns in ON condition,
    and get results same as (2a).
    Example 2:
    Note: "COUNT(a.YLINU1)" was used instead of "COUNT(Table_a.Key)".
    Because, Key(i.e. CTDERE) contains NULL, and YLINU1 is NOT NULL.

    Code:
    SELECT COALESCE(a.CTDERE , b.CTDERE) AS CTDERE
         , COUNT(a.YLINU1) AS "No. of lines in A"
         , COUNT(b.YLINU1) AS "No. of lines in B"
     FROM  (SELECT t.*
                 , ROW_NUMBER() OVER(PARTITION BY CTDERE) AS row_no
             FROM  NIEMBT AS t
           ) AS a
     FULL  OUTER JOIN
           (SELECT t.*
                 , ROW_NUMBER() OVER(PARTITION BY CTDERE) AS row_no
             FROM  NIEMBT_HISTO AS t
           ) AS b
      ON
       (   a.CTDERE = b.CTDERE
        OR a.CTDERE IS NULL AND b.CTDERE IS NULL
       )
       AND a.row_no = b.row_no
     GROUP BY
           COALESCE(a.CTDERE , b.CTDERE)
    ;

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •