Results 1 to 4 of 4
  1. #1
    Join Date
    Jan 2002
    Posts
    189

    Unanswered: Finding duplicates - is there a more efficient way?

    Hello again

    My table:

    Code:
    CREATE TABLE `emlContacts` (
      `id` int(10) unsigned NOT NULL auto_increment,
      `email` varchar(150) default NULL,
      `firstname` varchar(45) default NULL,
      `lastname` varchar(45) default NULL,
      `unsubscribed` tinyint(1) default '0',
      `client_id` int(10) unsigned default NULL,
      `list_id` int(10) unsigned default NULL,
      `delivery_failure` int(10) unsigned default '0',
      PRIMARY KEY  (`id`)
    )
    currently contains just under 13,000 records.

    I'm trying to select all the duplicate email addresses for a given list ID. I'm using this query:

    Code:
    SELECT id, email
    FROM emlContacts c
    WHERE c.id IN(
    	SELECT c2.id
    	FROM emlContacts c2
    	WHERE Exists (
    		SELECT email, Count(id)
    		FROM emlContacts
    		WHERE emlContacts.email = c2.email
    		AND emlContacts.list_id = c2.list_id
    		AND emlContacts.list_id = 23
    		GROUP BY emlContacts.email
    		HAVING Count(emlContacts.id) > 1
    	)
    )
    AND c.id NOT IN(
    	SELECT Min(id)
    	FROM emlContacts AS c3
    	WHERE Exists (
    		SELECT email, Count(id)
    		FROM emlContacts
    		WHERE emlContacts.email = c3.email
    		AND emlContacts.list_id = c3.list_id
    		AND emlContacts.list_id = 23
    		GROUP BY emlContacts.email
    		HAVING Count(emlContacts.id) > 1
    	)
    	GROUP BY email
    )
    ... and it's simply churning away without actually returning anything. Has been doing for about ten minutes now. Is there anything I can do to make it... well, "work" would be nice

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Code:
    SELECT c.id
         , c.email
         , c.firstname
         , c.lastname
         , c.unsubscribed
         , c.client_id
         , c.list_id
         , c.delivery_failure
      FROM emlContacts AS c
    INNER
      JOIN (
           SELECT list_id
                , email
             FROM emlContacts
           GROUP 
               BY list_id
                , email
           HAVING COUNT(*) > 1
           ) AS dupes
        ON dupes.list_id = c.list_id
       AND dupes.email   = c.email
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jan 2002
    Posts
    189
    Aha. Well, I've read it through over and over and I still can't say I'm entirely sure what it's doing, but it definitely works a treat Many thanks.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    the subquery produces a derived table which contains exactly what you need -- all the emails in each list_id which are duplicated

    then this derived table is joined back to the original table to list each individual row for every duplicate

    it's actually quite simple

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •