Results 1 to 2 of 2
  1. #1
    Join Date
    Jan 2002
    Posts
    189

    Unanswered: deleting duplicates

    I have the following table:

    Code:
    CREATE TABLE  `emlContacts_backup` (
      `id` int(10) unsigned NOT NULL auto_increment,
      `email` varchar(150) default NULL,
      `firstname` varchar(45) default NULL,
      `lastname` varchar(45) default NULL,
      `unsubscribed` tinyint(1) default '0',
      `client_id` int(10) unsigned default NULL,
      `list_id` int(10) unsigned default NULL,
      `delivery_failure` int(10) unsigned default '0',
      PRIMARY KEY  (`id`)
    )
    I need to find all the entries with the same email and list_id and, for any duplicates found, delete all but the most recent (ie: the one with the highest id)

    I've got this far:

    Code:
    SELECT c2.id, c2.email, c2.list_id
    FROM emlContacts_backup c2
    INNER JOIN (
      SELECT email, list_id, COUNT(*) FROM emlContacts_backup GROUP BY email, list_id HAVING COUNT(*) > 1
    ) AS c
    ON c.email = c2.email AND c.list_id = c2.list_id
    WHERE c2.list_id = 9
    ORDER BY c2.list_id, c2.email
    ... which gets me a list of all the duplicate entries. However, I'm stuck on the second bit: turning that into something that'll delete all but the most recent of each group of duplicates found.

    I've got a feeling that it's just a case of wrapping that select twice, in something like this:

    Code:
    DELETE FROM emlContacts_backup WHERE id IN(
    
      SELECT c2.id
      FROM emlContacts_backup c2
      INNER JOIN (
        SELECT email, list_id, COUNT(*) FROM emlContacts_backup GROUP BY email, list_id HAVING COUNT(*) > 1
      ) AS c
      ON c.email = c2.email AND c.list_id = c2.list_id
      WHERE c2.list_id = 9
      ORDER BY c2.list_id, c2.email
    
    )
    
    AND id not in(
    
      SELECT MAX(c2.id)
      FROM emlContacts_backup c2
      INNER JOIN (
        SELECT email, list_id, COUNT(*) FROM emlContacts_backup GROUP BY email, list_id HAVING COUNT(*) > 1
      ) AS c
      ON c.email = c2.email AND c.list_id = c2.list_id
      WHERE c2.list_id = 9
      ORDER BY c2.list_id, c2.email
    
    )
    ... but I'm not sure and I thought I'd ask. Besides, it looks nastily inefficient to me and I was wondering if there was a faster way.

    ETA: yes, it IS horribly, horribly slow. There are 10000 records in that table and it's been running ten minutes already. I really hope there's another way
    Last edited by Spudhead; 09-25-08 at 09:03.

  2. #2
    Join Date
    Sep 2008
    Posts
    8
    If you only have 10000 this might not be that bad

    Code:
    DELETE e1 FROM emlContacts_backup e1, emlContacts_backup e2  
    	WHERE e1.email = e2.email 
    	AND e1.list_id = e2.list_id 
    	AND e1.id < e2.id
    Last edited by BargainPredator; 09-27-08 at 01:02.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •