If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Finding duplicates - is there a more efficient way?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-16-09, 08:58
Spudhead Spudhead is offline
Registered User
 
Join Date: Jan 2002
Posts: 189
Finding duplicates - is there a more efficient way?

Hello again

My table:

Code:
CREATE TABLE `emlContacts` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `email` varchar(150) default NULL,
  `firstname` varchar(45) default NULL,
  `lastname` varchar(45) default NULL,
  `unsubscribed` tinyint(1) default '0',
  `client_id` int(10) unsigned default NULL,
  `list_id` int(10) unsigned default NULL,
  `delivery_failure` int(10) unsigned default '0',
  PRIMARY KEY  (`id`)
)
currently contains just under 13,000 records.

I'm trying to select all the duplicate email addresses for a given list ID. I'm using this query:

Code:
SELECT id, email
FROM emlContacts c
WHERE c.id IN(
	SELECT c2.id
	FROM emlContacts c2
	WHERE Exists (
		SELECT email, Count(id)
		FROM emlContacts
		WHERE emlContacts.email = c2.email
		AND emlContacts.list_id = c2.list_id
		AND emlContacts.list_id = 23
		GROUP BY emlContacts.email
		HAVING Count(emlContacts.id) > 1
	)
)
AND c.id NOT IN(
	SELECT Min(id)
	FROM emlContacts AS c3
	WHERE Exists (
		SELECT email, Count(id)
		FROM emlContacts
		WHERE emlContacts.email = c3.email
		AND emlContacts.list_id = c3.list_id
		AND emlContacts.list_id = 23
		GROUP BY emlContacts.email
		HAVING Count(emlContacts.id) > 1
	)
	GROUP BY email
)
... and it's simply churning away without actually returning anything. Has been doing for about ten minutes now. Is there anything I can do to make it... well, "work" would be nice
Reply With Quote
  #2 (permalink)  
Old 01-16-09, 09:18
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Code:
SELECT c.id
     , c.email
     , c.firstname
     , c.lastname
     , c.unsubscribed
     , c.client_id
     , c.list_id
     , c.delivery_failure
  FROM emlContacts AS c
INNER
  JOIN (
       SELECT list_id
            , email
         FROM emlContacts
       GROUP 
           BY list_id
            , email
       HAVING COUNT(*) > 1
       ) AS dupes
    ON dupes.list_id = c.list_id
   AND dupes.email   = c.email
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 01-16-09, 13:20
Spudhead Spudhead is offline
Registered User
 
Join Date: Jan 2002
Posts: 189
Aha. Well, I've read it through over and over and I still can't say I'm entirely sure what it's doing, but it definitely works a treat Many thanks.
Reply With Quote
  #4 (permalink)  
Old 01-16-09, 13:27
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
the subquery produces a derived table which contains exactly what you need -- all the emails in each list_id which are duplicated

then this derived table is joined back to the original table to list each individual row for every duplicate

it's actually quite simple

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On