Results 1 to 8 of 8
  1. #1
    Join Date
    Sep 2012
    Posts
    20

    Unanswered: SQL Query - Question as my one runs forever

    Hello,

    Hope anybody here can help me with following (The following shows the table structure and the issue with the SQL Query I am facing)

    Table: Entity (with around 59000 rows)
    Account-------EntityID--------Firstname--------Lastname
    1000-------------23-----------Karl--------------Heinz
    1000-------------67-----------Thorsten----------Peters
    1001-------------45-----------Karl--------------Heinz
    ......

    Table: Account (36000)
    Acount----------Accounttype-----Key
    1000------------10--------------50
    1000------------10--------------26
    1002------------20--------------32
    .....

    Table: Address (36000 rows)
    Key--------Addressstreest------Postal------Country
    25-----------Mainstreet---------58561------USA
    32-----------Street2------------56345------Germany
    ........



    What I need to get now is the following: I need to show from the above 3 tables all information, where the Firstname, Lastname and Postalcode combination appears more than once.

    I tried it in 2 different ways, but non of it worked
    First way (simplified version (without the address link), but this one runs already endless)
    Code:
    SELECT Entity1.ACCOUNT AS ACCOUNT_1ST
    	,Entity1.ENTITYID AS ENTITYID_1ST
    	,Entity1.DOFB AS DATE_OF_BIRTH_1ST
    	,Entity1.FIRSTNAME
    	,Entity1.LASTNAME
    	,Entity2.ACCOUNT AS ACCOUNT_2ND
    	,Entity2.ENTITYID AS ENTITYID_2ND
    	,Entity2.DOFB AS DATE_OF_BIRTH_2ND
    	,Entity2.FIRSTNAME
    	,Entity2.LASTNAME
    FROM Entity AS Entity1
    	,Entity AS Entity2
    WHERE Entity1.FIRSTNAME = Entity2.FIRSTNAME
    	AND Entity1.LASTNAME = Entity2.LASTNAME
    	AND Entity1.ENTITYID <> Entity2.ENTITYID
    ORDER BY Entity1.ENTITYID


    Therefore I tired the second one but this one runs as well endless (waited 15 min)
    Code:
    SELECT ACCOUNT.ACCOUNT
    	,Entity.ENTITYID
    	,Entity.FIRSTNAME
    	,Entity.LASTNAME
    	,ACCOUNTADDR.ADDRESSSTREET
    	,ACCOUNTADDR.POSTAL
    	,ACCOUNTADDR.Country
    FROM (
    	(
    		ACCOUNT AS ACCOUNT INNER JOIN Entity AS Entity ON ACCOUNT.ACCOUNT = Entity.ACCOUNT
    		) INNER JOIN Address AS ACCOUNTADDR ON ACCOUNT.Key = ACCOUNTADDR.Key
    		AND ACCOUNTADDR.DEFF <= CURDATE()
    		AND ACCOUNTADDR.STOPDATE > CURDATE()
    	)
    WHERE TRIM(CONCAT (
    			Entity.FIRSTNAME
    			,Entity.LASTNAME
    			)) IN (
    		SELECT TRIM(CONCAT (
    					Entity_DUPPLICATE.FIRSTNAME
    					,Entity_DUPPLICATE.LASTNAME
    					))
    		FROM Entity AS Entity_DUPPLICATE
    		GROUP BY Entity_DUPPLICATE.FIRSTNAME
    			,Entity_DUPPLICATE.LASTNAME
    		HAVING count(*) > 1
    		)
    ORDER BY ACCOUNT.ACCOUNT
    	,Entity.ENTITYID


    FYI - The where clause part below runs quite quick, just the complete query runs forever
    Code:
    SELECT TRIM(CONCAT (
    					Entity_DUPPLICATE.FIRSTNAME
    					,Entity_DUPPLICATE.LASTNAME
    					))
    		FROM Entity AS Entity_DUPPLICATE
    		GROUP BY Entity_DUPPLICATE.FIRSTNAME
    			,Entity_DUPPLICATE.LASTNAME
    		HAVING count(*) > 1



    Does anyone can help me here?

    Thanks very much in advance!
    Regards,

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    At least at first glance, the SELECT statement in your first query looks fine to me. I suspect that it needs an index on FIRSTNAME, LASTNAME, ENTITYID in order to perform well.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    (1) I agree with PatP,
    appropriate index(es) need in order to perform well.

    (2) It is not neccesary to concatenate columns into one string to use IN predicate.
    IN predicate - IBM DB2 9.7 for Linux, UNIX, and Windows
    IN predicate
    Code:
    >>-+-expression1--+-----+--IN--+-(fullselect1)-------------+----+-><
       |              '-NOT-'      |    .-,---------------.    |    |   
       |                           |    V                 |    |    |   
       |                           +-(------expression2---+--)-+    |   
       |                           '-expression2---------------'    |   
       '-(----row-value-expression----)--+-----+--IN--(fullselect2)-'   
                                         '-NOT-'
    Try some modified queries with appropriate indexes.

    Example 1: I prefer to use short aliases and not to use unnecessary parentheses.
    Code:
    SELECT act.ACCOUNT
         , ent.ENTITYID
         , ent.FIRSTNAME
         , ent.LASTNAME
         , adr.ADDRESSSTREET
         , adr.POSTAL
         , adr.Country
     FROM  Entity  AS Ent
     INNER JOIN
           ACCOUNT AS act
      ON   act.ACCOUNT = Ent.ACCOUNT
     INNER JOIN
           Address AS adr
      ON   adr.Key      =  act.Key
       AND adr.DEFF     <= CURDATE()
       AND adr.STOPDATE >  CURDATE()
     WHERE (  ent.FIRSTNAME
            , ent.LASTNAME
           ) IN (
            SELECT FIRSTNAME
                 , LASTNAME
             FROM  Entity
             GROUP BY
                   FIRSTNAME
                 , LASTNAME
             HAVING
                   count(*) > 1
           )
     ORDER BY
           act.ACCOUNT
         , Ent.ENTITYID
    Example 2:
    Code:
    SELECT act    .ACCOUNT
         , Ent    .ENTITYID
         , ent_dup.FIRSTNAME
         , ent_dup.LASTNAME
         , adr    .ADDRESSSTREET
         , adr    .POSTAL
         , adr    .Country
     FROM  (SELECT FIRSTNAME
                 , LASTNAME
             FROM  Entity
             GROUP BY
                   FIRSTNAME
                 , LASTNAME
             HAVING
                   count(*) > 1
           ) ent_dup
     INNER JOIN
           Entity  AS Ent
      ON   ent.FIRSTNAME = ent_dup.FIRSTNAME
       AND ent.LASTNAME  = ent_dup.LASTNAME
     INNER JOIN
           ACCOUNT AS act
      ON   act.ACCOUNT = Ent.ACCOUNT
     INNER JOIN
           Address AS adr
      ON   adr.Key = act.Key
       AND CURDATE() BETWEEN adr.DEFF AND adr.STOPDATE - 1 DAY
     ORDER BY
           act.ACCOUNT
         , Ent.ENTITYID

  4. #4
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Example 3:
    Code:
    SELECT c.account
         , e.entityid
         , e.firstname
         , e.lastname
         , d.addressstreet
         , d.postal
         , d.country
     FROM  (SELECT e.*
                 , COUNT(*)
                      OVER( PARTITION BY firstname , lastname ) AS count_names
             FROM  entity AS e
           ) AS e
     INNER JOIN
           account AS c
      ON   c.account = e.account
     INNER JOIN
           address AS d
      ON   d.Key     = c.Key
       AND CURDATE() BETWEEN d.deff AND d.stopdate - 1 DAY
     WHERE e.count_names > 1
     ORDER BY
           account
         , entityid
    By the way,
    is CURDATE() an UDF(user defined funcion)?
    Why not use the CURRENT DATE (or CURRENT_DATE) special register?

  5. #5
    Join Date
    Jan 2013
    Posts
    358
    Provided Answers: 1
    >> Hope anybody here can help me with following (The following shows the table structure and the issue with the SQL Query I am facing) <<

    Where is your DDL?? Does your boss make you programming without any idea of what the data looks like? What you did post is garbage.

    Why do you think that “Entity” (singular name so there is only one of them!) is clear precise specific data element name? How could you be more vague? Remember your first course in Logic and the Law of Identity?? “To be is to be something in particular. To be nothing in particular or to be everything in general is to be NOTHING AT ALL.” This is the foundation of all Western Thought and you missed it.

    What you have done is build a 1970's pointer chain database in SQL! You just say “key” and I had to laugh at you. This is wrong. Not a little wrong. TOTALLY wrong.

    CREATE TABLE Accounts –-more than one!! Not in your DDL
    (account_nbr CHAR(5) NOT NULL PRIMARY KEY,
    account_type CHAR(2) NOT NULL
    CHECK (account_type IN (..)),
    ..);

    The following table is total crap. Each country needs a table. In properly designed schema, you will not be correcting your data with 1960's COBOL string functions.

    CREATE TABLE Addresses
    (street_address VARCHAR(35) NOT NULL,
    postal_code VARCHAR(12) NOT NULL,
    country_code CHAR(3) NOT NULL, --iso codes, please
    ..);

    >> What I need to get now is the following: I need to show from the above 3 tables all information, where the First_name, Last_name and Postal_code combination appears more than once. <<

    Gee, a real SQL programmer would have prevented redundancy with DDL. Which you did not post. You have just been told by someone with good credits in SQL (wrote the Standards, nine books, 1200+ articles, etc) that you need to throw out everything you have and the way you do it. Please consider this.

  6. #6
    Join Date
    Sep 2012
    Posts
    20
    @Celko
    a) Just to let you know, this is a database that is absolutely fix meaning it cannot be amended as I am not the system host and was developed that way long time ago. I just wanted to extract some information out of it.
    b) The column names + tablenames where chosen that time like this. Again not by me. In addition I simplified some tables +columns that just the data for this question is reflected.
    So this question here was not about "Did I develop a good tablestructure, did I choose nice names etc, etc" it was just about the SQL Query I wrote.


    @tonkuma
    Thanks very much for your help. Will try your examples and let you know. But I think it really has to do with the indexing of the table+columns why the SQL is so slowly executed.

  7. #7
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by Celko View Post
    blah blah blah
    Is there a smiley for raised eyebrows?
    ---
    "It does not work" is not a valid problem statement.

  8. #8
    Join Date
    Sep 2012
    Posts
    20
    @tonkuma.
    The slowness really comes from the indexing of the tablecolumns. But thanks for the info that "It is not neccesary to concatenate columns into one string to use IN predicate.
    ", cause this will speed up the query execution as well.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •