Results 1 to 4 of 4
  1. #1
    Join Date
    Feb 2011
    Posts
    43
    Provided Answers: 1

    Unanswered: Distinct LISTAGG in subquery

    How does one do a distinct LISTAGG in a subquery in which an element in the WHERE clause relates it to the parent query?

    I know how to do a distinct LISTAGG here, in a simple query:

    SELECT
    LISTAGG(DATA_COLUMN, ', ') WITHIN GROUP(ORDER BY DATA_COLUMN)
    FROM
    (
    SELECT DISTINCT
    TABLE1.DATA_COLUMN
    FROM
    TABLE1
    WHERE
    TABLE1.PRIMARY_KEY = 123456
    )

    And I can get a simple LISTAGG of all DATA_COLUMN for TABLE1 records related to a TLORDER record, like this:

    SELECT
    TABLE1.DATA_COLUMN,
    (
    SELECT
    LISTAGG(TABLE2.DATA_COLUMN, ', ') WITHIN GROUP(ORDER BY TABLE2.DATA_COLUMN)
    FROM
    TABLE2
    WHERE
    TABLE2.FOREIGN_KEY = TABLE1.PRIMARY_KEY
    )
    FROM
    TABLE1
    WHERE
    TABLE2.PRIMARY_KEY = 123456


    But how do I get a distinct LISTAGG when the subquery needs to tie back to the parent query? I tried this:

    SELECT
    TABLE1.DATA_COLUMN,
    (
    SELECT
    LISTAGG(TABLE2.DATA_COLUMN, ', ') WITHIN GROUP(ORDER BY TABLE2.DATA_COLUMN)
    FROM
    (
    SELECT DISTINCT
    TABLE2.DATA_COLUMN
    FROM
    TABLE2
    WHERE
    TABLE2.FOREIGN_KEY = TABLE1.PRIMARY_KEY
    )
    )
    FROM
    TABLE1
    WHERE
    TABLE2.PRIMARY_KEY = 123456

    This fails because it sees TABLE1.PRIMARY_KEY in the WHERE clause of the subquery as being referred to out of context. I suppose I can see that, since there is no reference to TABLE1 in the LISTAGG that calls the SELECT DISTINCT; TABLE1 exists only in the parent query. But how do I get the SELECT DISTINCT in the source of the LISTAGG to filter to an element in the primary query (TABLE1 in my example) rather than a hard filter on an explicit value? I may well be off here in my basic approach, but the only options I have seen elsewhere are to use regular expressions to filter duplicates out of the results of the LISTAGG.

  2. #2
    Join Date
    Jul 2016
    Location
    Moscow
    Posts
    261
    Provided Answers: 39
    This doesn't work:
    Code:
    SELECT
    TABLE1.DATA_COLUMN,
    (
    SELECT LISTAGG(TABLE2.DATA_COLUMN, ', ') WITHIN GROUP(ORDER BY TABLE2.DATA_COLUMN)
    FROM
    TABLE2
    WHERE
    TABLE2.FOREIGN_KEY = TABLE1.PRIMARY_KEY
    )
    FROM
    TABLE1
    WHERE
    TABLE2.PRIMARY_KEY = 123456
    Let's try the example which anyone can reproduce.
    This does work:
    Code:
    SELECT T.TABNAME,
    (
    SELECT
    LISTAGG(C.COLNAME, ', ') WITHIN GROUP(ORDER BY C.COLNAME)
    FROM SYSCAT.COLUMNS C
    WHERE C.TABSCHEMA = T.TABSCHEMA AND C.TABNAME=T.TABNAME
    )
    FROM SYSCAT.TABLES T
    WHERE T.TABSCHEMA = 'SYSCAT' AND T.TABNAME='TABLES'
    So, can you explain what you want to achieve on this example?
    Regards,
    Mark.

  3. #3
    Join Date
    Feb 2011
    Posts
    43
    Provided Answers: 1
    Very good point, Mark. I mis-typed one thing when I replaced my actual table/column names with generic names TABLE1, TABLE2, etc. My final WHERE clause should have referred to TABLE1. It should have been this:

    Code:
    SELECT
    TABLE1.DATA_COLUMN,
    (
    SELECT LISTAGG(TABLE2.DATA_COLUMN, ', ') WITHIN GROUP(ORDER BY TABLE2.DATA_COLUMN)
    FROM
    TABLE2
    WHERE
    TABLE2.FOREIGN_KEY = TABLE1.PRIMARY_KEY
    )
    FROM
    TABLE1
    WHERE
    TABLE1.PRIMARY_KEY = 123456
    But to answer your example/question:

    Code:
    SELECT T.TABNAME,
    (
    SELECT
    LISTAGG(C.COLNAME, ', ') WITHIN GROUP(ORDER BY C.COLNAME)
    FROM SYSCAT.COLUMNS C
    WHERE C.TABSCHEMA = T.TABSCHEMA AND C.TABNAME=T.TABNAME
    )
    FROM SYSCAT.TABLES T
    WHERE T.TABSCHEMA = 'SYSCAT' AND T.TABNAME='TABLES'
    My goal would never occur in the result of this query because it is impossible for SYSCAT.COLUMNS to ever have two instances of the same column for the same table, unlike my TABLE1/TABLE2 example. But if it could occur with SYSCAT TABLES/COLUMNS, my goal would be to list this:

    COLNAME#1

    instead of this

    COLNAME#1, COLNAME#1

    in any case where the LISTAGG produces a duplicate, while, for a table not having any duplicates, showing this:

    COLNAME#1, COLNAME#2

    In my case, unlike SYSCAT, I have a TABLE1/TABLE2 combination for which LISTAGG (from TABLE2, related to TABLE1) produces a result like that below. Where the first two columns come from TABLE1 and the third is a LISTAGG from TABLE1 related to via FK/PK relationship, I get this:

    789152 321987 ABC, DEF
    546876 519875 DEF, GHI
    123456 987654 ABC, ABC

    I would like this instead, returning the duplicate in the third row as a de-duplicated entry.

    789152 321987 ABC, DEF
    546876 519875 DEF, GHI
    123456 987654 ABC (i.e. distinct on the LISTAGG result here so the same thing is not )

    Or this rather cluttered result:

    789152 321987 ABC, ABC, ABC, DEF, GHI, GHI, JKL

    needs to instead be this de-duplicated version:

    789152 321987 ABC, DEF, GHI, JKL

  4. #4
    Join Date
    Jul 2016
    Location
    Moscow
    Posts
    261
    Provided Answers: 39
    Try this:
    Code:
    with 
      t1 (pk, d) as (values 
      (1, '1')
    , (2, '2')
    )
    , t2 (fk, d) as (values 
      (1, '1')
    , (1, '1')
    , (1, '2')
    , (1, '3')
    , (2, '1')
    , (2, '2')
    )
    select t1.*, g.d_agg
    from t1
    join (
    select fk, listagg(d, ',') within group (order by d) d_agg
    from (select distinct fk, d from t2)
    group by fk
    ) g on g.fk=t1.pk
    where t1.pk=1;
    Regards,
    Mark.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •