Page 1 of 2 12 LastLast
Results 1 to 15 of 26
  1. #1
    Join Date
    Jan 2007
    Posts
    33

    Unanswered: SQL Optimization - Multiple element

    Hi

    We have following query which insert


    -- Element=ADD1

    INSERT INTO STAGE.CLIENT_DATA
    (SELECT C.ADDRESSID, NEXTVAL FOR STAGE.ID_SEQ,'ADD1', C1.RA_STNBR
    FROM STAGE.CLIENT C1,STAGE.CLIENT_ADDRESSDATA1 C
    WHERE
    C1.RID=C.RID AND NOT EXISTS (SELECT * FROM DB2INST3.ELEMENT A WHERE A.ADDRESSID=C.ADDRESSID
    AND A.ELEMENTTYPE='ADD1'));

    INSERT INTO STAGE.CLIENT_DATA
    (SELECT D.ADDRESSID, D.ID,'ADD1',C1.RA_STNBR
    FROM STAGE.CLIENT_ADDRESSDATA1 C, DB2INST3.ELEMENT D,STAGE.CLIENT C1
    WHERE C.ADDRESSID=D.ADDRESSID AND D.ELEMENTTYPE='ADD1' AND C1.RID=C.RID);

    PS: First query to update existing records, second to insert new record.

    The above queries will be repeated for each address element such as ADD2, ADD3, CITY, STATE,ZIP etc....

    The # of element= 2 times the query for single table.

    If table is having huge data then we are runnning the same query multiple times, which will take long time and definately is not advisable solution.. Is there any alternative in DB2 UDB to write single query for insert /update which takes care of all address element as listed above. We are using db2 udb 8.2 for linux.

    Thanks
    Diwakar

  2. #2
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    The first statement won't update anything. It is an INSERT statement and, thus, can only insert new rows. If you need to perform an UPDATE and INSERT statement together, then have a look at the MERGE statement.

    If I got this right, you have only different behavior for the INSERT operation depending on the existence of rows in the ELEMENT table, right? I'd say that a regular LEFT OUTER JOIN should do what you want:
    Code:
    INSERT
    INTO   stage.client_data
    SELECT ca.addressid,
           coalesce(e.id, NEXTVAL FOR stage.id_seq),
           'ADD1',
           c.ra_stnbr
    FROM   stage.client AS c JOIN
           stage.client_addressdata1 AS ca ON ( c.rid = ca.rid ) LEFT OUTER JOIN
           ( SELECT id
             FROM   db2inst3.element
             WHERE  elementtype = 'ADD1 ) AS e ON ( ca.addressid = e.addressid )
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  3. #3
    Join Date
    Jan 2007
    Posts
    33
    Knut,

    Thanks for quick response.... I am using MERGE for actual prod table.. This is stage table and used as intermediate table between denormalized table and actual prod table.

    So, if I have multiple address element, then do I need to repeat the same query that many times (# of address elements)?

    How do I take care multiple adress element within single query?

    e.g. 'ADD2',
    'ADD3' and so on.

    Thnaks
    Diwakar

  4. #4
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    What are the semantics of the ADDx values? When will you use ADD1, ADD2, and so on? Are you always querying the same 3 tables with the same conditions (just that the ADDx varies)?
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  5. #5
    Join Date
    Jan 2007
    Posts
    33
    Knut,

    Actually requirement is to polulate all address element eg.

    ADD1 = 4560 E
    ADD2 = NW ?
    CITY= Avon
    STATE=NJ
    COUTNY ?? and so on..

    All records will have these fields in address and we would need to polulate above table with all address element. I had originally provided only 2 SQLs for ADD1, but I am having 12 SQLs for 6 address element.But after LEFT OUTER JOIN the SQLs will still be 6 for 6 address element..

    Is there anyway to combine all 6 together w/o affecting the data.

    Thanks
    Diwakar

  6. #6
    Join Date
    Jan 2007
    Posts
    33
    Yes, ADDx will change and actual column name [for values of source table].

    otherwise query will remain same for all 6 address element.

    Thanks
    Diwakar

  7. #7
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Could you provide an example of 2 SQL statements, i.e. two ADDx's? I'm still having trouble grasping what you actually want to achieve.

    p.s: My general rule is that you can combine many, many things in a single SQL statement - if you know how. So my answer would be that we should find something to handle all 6 ADDx parts in a single statement (and hopefully with a single table scan only).
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  8. #8
    Join Date
    Jan 2007
    Posts
    33
    Knut,

    Here are 2 out of 6 address lement SQLs.
    --ADD1
    INSERT INTO STAGE.CLIENT_DATA
    (SELECT C.ADDRESSID, NEXTVAL FOR STAGE.ID_SEQ,'ADD1', C1.RA_STNBR
    FROM STAGE.CLIENT C1,STAGE.CLIENT_ADDRESSDATA1 C
    WHERE
    C1.RID=C.RID AND NOT EXISTS (SELECT * FROM DB2INST3.ELEMENT A WHERE A.ADDRESSID=C.ADDRESSID
    AND A.ELEMENTTYPE='ADD1'));

    INSERT INTO STAGE.CLIENT_DATA
    (SELECT D.ADDRESSID, D.ID,'ADD1',C1.RA_STNBR
    FROM STAGE.CLIENT_ADDRESSDATA1 C, DB2INST3.ELEMENT D,STAGE.CLIENT C1
    WHERE C.ADDRESSID=D.ADDRESSID AND D.ELEMENTTYPE='ADD1' AND C1.RID=C.RID);

    --ADD2

    INSERT INTO STAGE.CLIENT_DATA
    (SELECT C.ADDRESSID, NEXTVAL FOR STAGE.ID_SEQ,'ADD2', C1.RA_DIR
    FROM STAGE.CLIENT C1,STAGE.CLIENT_ADDRESSDATA1 C
    WHERE
    C1.RID=C.RID AND NOT EXISTS (SELECT * FROM DB2INST3.ELEMENT A WHERE A.ADDRESSID=C.ADDRESSID
    AND A.ELEMENTTYPE='ADD2'));

    INSERT INTO STAGE.CLIENT_DATA
    (SELECT D.ADDRESSID, D.ID,'ADD1',C1.RA_DIR
    FROM STAGE.CLIENT_ADDRESSDATA1 C, DB2INST3.ELEMENT D,STAGE.CLIENT C1
    WHERE C.ADDRESSID=D.ADDRESSID AND D.ELEMENTTYPE='ADD2' AND C1.RID=C.RID);

    Similarly it will CITY, STATE, ZIP, COUNTY and more. After using LEFT outer join, two SQLs are combined but still there will be 6 SQLs [1 per address element].

    Thanks
    Diwakar

  9. #9
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    If I got this rigth, then each address element (ADD1, ADD2, CITY, STATE, ...) is stored in a separate row, correct? If so, then you could use a simple UNION operator (or better yet: UNION ALL to avoid sort operations for duplicate elimination):
    Code:
    INSERT INTO ...
    ( SELECT ..., 'ADD1', ... FROM ... )
    UNION ALL
    ( SELECT ..., 'ADD2', ... FROM ... )
    UNION ALL
    ( SELECT ..., 'CITY' ... FROM ... )
    It is then the DB2 optimizer's job to come up with the best plan for that.

    An alternative would be to build this step-by-step to avoid the UNION operatiors and, thus, do this in a single sweep over the underlying tables. The critical piece is that each entry in table STAGE.CLIENT_ADDRESSDATA1 has to fan out to 6 rows (one for each address element). Those 6 elements may or may not exist in the ELEMENTS table.
    Code:
    INSERT
    INTO   stage.client_data
    SELECT ca.addressid,
           COALESCE(e.id, NEXTVAL FOR stage.id_seq),
           types.name,
           c.ra_stnbr
    FROM   stage.client AS c JOIN
           stage.client_addressdata1 AS ca ON ( c.rid = ca.rid ),
           -- cross join with static table of address elements to get the fan-out
           TABLE ( VALUES ( 'ADD1', 'ADD2', 'CITY', 'STATE', 'ZIP', 'COUNTY' ) ) AS types(name)
           -- now the outer join to add any existing ID information
           LEFT OUTER JOIN
           ( SELECT id, elementtype
             FROM   db2inst3.element ) AS e ON ( ca.addressid = e.addressid AND e.elementtype = types.name)
    We introduced a cross join for the fan-out. After the cross join (but before the outer join), each row in CLIENT_ADDRESSDATA1 causes 6 rows to be produced for the final insert, i.e. one row for each address element. The last step is still the left outer join, which determines if there is an element with that type name and for that address already in the ELEMENT table.

    I suggest that you try this statement by adding first one join, executing the query, then add the other join and verify the results again - because I'm not sure I got everything correct at the first shot.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  10. #10
    Join Date
    Jan 2007
    Posts
    33
    Knut,

    Thanks a lot for quick response ..Yes, each address element will have seperate row...

    The column "c.ra_stnbr" in above query will populate the value for ADD1, but since there is sequence generating the values, how can we take care of all address element into one... If yes, how...

    Secondly, how to add actual columns like "c.ra_stnbr" for other lement such as c.dir, c.ra_city etc...

    thanks
    Diwakar

  11. #11
    Join Date
    Jan 2007
    Posts
    33
    I mean ,there are multiple source column to populate the values for address element. In original SQL we are using:

    ra_stnbr to get ADD1
    ra_dir=ADD2 and for other element we will have
    ra_city=CITY
    ra_state=STATE
    ra_zip=ZIP and ra_country=COUNTY

    Thanks
    Diwakar

  12. #12
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Sorry, I missed that part. It's just a trivial issue that can be solved with a CASE expression like this:
    Code:
    CASE types.name
       WHEN 'ADD1' THEN ra_stnbr
       WHEN 'ADD2' THEN ra_dir
       WHEN 'CITY' THEN ra_city
       WHEN 'STATE' THEN ra_state
       WHEN 'ZIP' THEN ra_zip
       WHEN 'COUNTY' THEN ra_county
    END
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  13. #13
    Join Date
    Jan 2007
    Posts
    33
    I have updated with all the chanes, but I am getting:

    SQL0338N An ON clause associated with a JOIN operator or in a MERGE statement is not valid.

    error..

    I hv verified syntax but could not figured it out.

    Thanks
    Diwakar

  14. #14
    Join Date
    Jan 2007
    Posts
    33
    and when use:
    TABLE ( VALUES ( 'ADD1','ADD2'........) ) AS types(name) LEFT OUTER JOIN, it gives error:
    SQL0158N The number of columns specified for "TYPES " is not the same as the number of columns in the result table.

    but if used only 'ADD1' only it works.

  15. #15
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Try this instead:
    Code:
    TABLE ( VALUES ('ADD1'), ('ADD2'), ('CITY'), ...) ) AS types(name)
    This will produce a table with 6 rows and 1 column instead of a table with 6 columns and 1 rows.

    As for the SQL0338, I guess that the problem is that the last ON clause accesses column CA.ADDRESSID and that is over a cross join (instead of an INNER join). Move this predicate to the WHERE clause, i.e. change the last ON clause to:
    Code:
    ON ( e.elementtype = types.name)
    and add this at the very end:
    Code:
    WHERE ca.addressid = e.addressid
    p.s: As long as you agree with my argumentation, I believe that we only have to deal with the syntax issues. I didn't try/verify the statements myself and just typed them...
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •