Results 1 to 15 of 15
  1. #1
    Join Date
    Mar 2003
    Posts
    97

    Unanswered: Deleting dupes in special cases

    I need to delete all rows that match at least one of the account_id values of another row *and* that has the same email address. However, if they have the same email address and none of the account_id values then I need to keep it. I've attached a sample dataset along with the expected results.

    I have this:
    DELETE [acctID_emailAddress_tmp] FROM [acctID_emailAddress_tmp]
    JOIN
    (select emailaddress, account_id, max(contact_id_tmp) max_cid
    from [acctID_emailAddress_tmp]
    group by emailaddress, account_id) AS tempImportTable
    ON tempImportTable.[emailaddress] = [acctID_emailAddress_tmp].[emailaddress]
    WHERE [acctID_emailAddress_tmp].[contact_id_tmp] < tempImportTable.[max_cid]
    AND tempImportTable.[account_id] = [acctID_emailAddress_tmp].[account_id];

    but it doesn't work since it's keeping the subset of the dupe row(s).

    Can someone shed some light?

    TIA
    Attached Files Attached Files

  2. #2
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    So why do you keep 2 instead of 1?

    EDIT: You won't be able to answer that one..unless you say that because it comes "first"...is that it?
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  3. #3
    Join Date
    Mar 2003
    Posts
    97
    Because it comes first

  4. #4
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    OK

    Code:
    USE Northwind
    GO
    
    SET NOCOUNT ON
    CREATE TABLE myTable99(temp_id int, contact_id_tmp int, account_id int, emailAddress varchar(255))
    GO
    
    INSERT INTO myTable99(temp_id, contact_id_tmp, account_id, emailAddress)    
    SELECT 1,           1,              3,           'test@acme.com' UNION ALL
    SELECT 2,           1,              5,           'test@acme.com' UNION ALL
    SELECT 3,           1,              10,          'test@acme.com' UNION ALL
    SELECT 4,           2,              12,          'test@acme.com' UNION ALL
    SELECT 5,           2,              3,           'test@acme.com' UNION ALL
    SELECT 6,           2,              9,           'test@acme.com'
    GO
    
    SELECT * 
      FROM ( SELECT * FROM myTable99 l 
     	  WHERE contact_id_tmp IN (
    	SELECT MAX(contact_id_tmp) 
    	FROM (
    		  SELECT contact_id_tmp, emailAddress
    		    FROM myTable99
    		GROUP BY contact_id_tmp, emailAddress
    		  HAVING COUNT(*) > 1) AS XXX)
    	) AS L
      JOIN ( SELECT * FROM myTable99 l 
     	  WHERE contact_id_tmp NOT IN (
    	SELECT MAX(contact_id_tmp) 
    	FROM (
    		  SELECT contact_id_tmp, emailAddress
    		    FROM myTable99
    		GROUP BY contact_id_tmp, emailAddress
    		  HAVING COUNT(*) > 1) AS XXX)
    	) AS R
        ON L.account_id = R.account_Id
    GO
    
    SET NOCOUNT OFF
    DROP TABLE myTable99
    GO
    You can convert that in to a delete...let me know if you can't
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  5. #5
    Join Date
    Mar 2003
    Posts
    97
    Thanks very much. Looks like this will do the trick if I ever figure out the delete statement for it =\

  6. #6
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Did you cut and paste the example in to QA and run it?

    Also, I would use that as SELECT to see the population BEFORE you unleash the delete...oh and make sure you back up the table BEFORE you do the DELETE
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  7. #7
    Join Date
    Mar 2003
    Posts
    97
    Yes, I ran it in QA and got the expected results. I need to delete R.contact_id_tmp

    Currently trying to figure out the delete statement...

  8. #8
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Come on...that's too easy

    Code:
    DELETE FROM myTable99 WHERE contact_id_tmp IN (
    SELECT R.contact_id_tmp
      FROM ( SELECT * FROM myTable99 l 
     	  WHERE contact_id_tmp IN (
    	SELECT MAX(contact_id_tmp) 
    	FROM (
    		  SELECT contact_id_tmp, emailAddress
    		    FROM myTable99
    		GROUP BY contact_id_tmp, emailAddress
    		  HAVING COUNT(*) > 1) AS XXX)
    	) AS L
      JOIN ( SELECT * FROM myTable99 l 
     	  WHERE contact_id_tmp NOT IN (
    	SELECT MAX(contact_id_tmp) 
    	FROM (
    		  SELECT contact_id_tmp, emailAddress
    		    FROM myTable99
    		GROUP BY contact_id_tmp, emailAddress
    		  HAVING COUNT(*) > 1) AS XXX)
    	) AS R
        ON L.account_id = R.account_Id
    )
    
    GO
    
    SELECT * FROM myTable99
    GO
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  9. #9
    Join Date
    Mar 2003
    Posts
    97
    OK. This just got uglier... Turns out accounts will be stored as a comma separated list in one field. See the attachment for an example.

    Now, is it possible to de-dupe from a comma separated list all in one field? I know I could create a new table and loop over each account-emailAddress combo, but I'd like to know if it's possible to avoid doing this (since it will be much slower) and de-dupe straight from the source table.

    Thanks again
    Attached Files Attached Files

  10. #10
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Read this

    http://www.sqlteam.com/forums/topic....udf,csv,string

    Then create a new table where all the values are on 1 row, then use the query in this thread
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  11. #11
    Join Date
    Mar 2003
    Posts
    97
    I tried the UDF but I'm getting 0 rows back. I've attached the sample dataset (1 row) and the UDF call.
    Attached Files Attached Files

  12. #12
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    You really should normalized your data....I feel I'm doing more harm than good...

    EDIT: This is pure theft from this Article written by A most Valuable Yak , Rob Volk


    Code:
    USE Northwind
    GO
    
    SET NOCOUNT ON
    SET NOCOUNT ON
    CREATE TABLE myTable99(temp_id int IDENTITY(1,1), contact_id_tmp int, account_id int, emailAddress varchar(255))
    CREATE TABLE myTable00(contact_id int, accounts varchar(200), emailAddress varchar(255))
    CREATE TABLE Tally(ID int)
    GO
    
    INSERT INTO myTable00(contact_id, accounts, emailAddress)
    SELECT 1,		'1,2,3,10,15',		'test@acme.com' UNION ALL
    SELECT 2,		'5,10,20,30',		'test@acme.com' UNION ALL
    SELECT 3,		'88,42',		'test@acme.com' UNION ALL
    SELECT 4,		'1,2,3',		'test2@acme.com'
    GO
    
    DECLARE @x int
    SELECT @x = 1
    WHILE @x < 100
      BEGIN
    	INSERT INTO Tally(ID) SELECT @x
    	SELECT @x = @x + 1
      END
    GO
    
    INSERT INTO myTable99(contact_id_tmp, Account_Id, emailAddress)
    SELECT contact_id AS contact_id_tmp, 
    NullIf(SubString(',' + accounts + ',' , ID , CharIndex(',' , ',' + accounts + ',' , ID) - ID) , '') AS Account_Id,
    emailAddress
    FROM Tally, myTable00 
    WHERE ID <= Len(',' + accounts + ',') AND SubString(',' + accounts + ',' , ID - 1, 1) = ',' 
    AND CharIndex(',' , ',' + accounts + ',' , ID) - ID > 0  --remove this line to keep NULL rows
    
    SELECT * FROM myTable99
    GO
    
    -- Now just add the other SQL
    
    SET NOCOUNT OFF
    DROP TABLE myTable99
    DROP TABLE myTable00
    DROP TABLE Tally
    GO
    Last edited by Brett Kaiser; 02-08-05 at 13:01.
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  13. #13
    Join Date
    Mar 2003
    Posts
    97
    This is exactly what I needed, but...users have decided to go back and separate each account in individual rows!!

    Thanks a lot for your help. I guess I'll check out sqlteam.com from now on before posting here.

  14. #14
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Even if they do...the data is still not normalized if you need an operation like this...anyway, it was a neat exercise...
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  15. #15
    Join Date
    Mar 2003
    Posts
    97
    This is just 1/3 of the entire process. The data is normalized in its final state.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •