Results 1 to 6 of 6
  1. #1
    Join Date
    Oct 2003
    Posts
    38

    Unanswered: Remove Duplicates From Table

    I have an Organization table which has over 1400 names of organizations we work with or worked with at one time or another. But we have a lot of duplicates like Advanced Solutions International and Advanced Solutions International, Inc. just small differences and I would like to clean this table up. How can I remove the duplicates and how can I keep this from happening again?
    Currently, the users select the organizations from a drop down list and when they add an organization they have a pop up message that asks if they are sure they want to add a new organization. How do I restrict this enough to stop duplicates but flexible enough to allow them to add org. names?

  2. #2
    Join Date
    Sep 2001
    Location
    Chicago, Illinois, USA
    Posts
    601
    This is not a programming issue, this is a management policy issue.

    1,400 records is not that many records. How often do they add a record? Once, twice a day?

    You should have one person, an administrator, who can add new records.

    Everyone else, hands off.

    Using a SOUNDEX algorithm is another alternative. Using a SOUNDEX algorithm, if someone enters a new record, you could pop up a form with other records which are similar and ask the user if the record they are creating duplicates one of the previous records?

  3. #3
    Join Date
    Oct 2003
    Posts
    38
    Thanks, I'll look into the SOUNDEX algorithm.
    But is there a way to clean up the duplicates that I currently have. They all have different AutoIDs so the duplicate query I ran didn't find anything. So that's why I figure I would have to do this through code.

  4. #4
    Join Date
    Oct 2003
    Posts
    38
    Sorry, to answer your question.
    It's not very often they add records maybe 2 or 3 times a month or maybe every other month.

  5. #5
    Join Date
    Sep 2003
    Location
    The extremely Royal borough of Kensington, London
    Posts
    778
    delete from table
    where id IN (select min(id) from table group by columns having count(*) >1);
    Bessie Braddock: Winston, you are drunk!
    Churchill: And Madam, you are ugly. And tomorrow, I'll be sober, and you will still be ugly.

  6. #6
    Join Date
    Sep 2001
    Location
    Chicago, Illinois, USA
    Posts
    601
    The problem is that there are NO duplicates in your table.

    You may think they are duplicates, but how is the computer supposed to know they are duplicates ?

    Can you give us a rule to follow ? I don't think so.

    As long as one character is different, they are different, they are not duplicates.

    Now, you might come up with a rule that says, if they are the same, except for a trailing 'Inc', then they are duplicates. And that would be okay. But then two rows are different because someone added a trailing "inc." (note the period).

    You are asking to do something that is almost impossible for a computer.

    The best you can do is establish a set of rules as to what constitutes a duplicate. But to establish those rules would require more time than just deleting the duplicates by hand.

    The second best is to use SOUNDEX. Also time consuming.

    But, the bottom line is that this is a management issue.

    Someone is doing a poor job of estabhinshing who and how can add records.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •