Results 1 to 4 of 4
  1. #1
    Join Date
    May 2012

    Unanswered: Removing duplicates?

    We have a process in my place of work where some data is placed into a staging table. a batch job runs and takes this data (tracking numbers, delivery times etc) and attempts to put it into another table, but the job often fails because for some reason there's duplicate data, ie often the delivery guy presses the enter button twice, so it has two records for one delivery.

    In the staging table, we have about 8 columns, but the 3 important ones we look for are the tracking number, action code and action date. If there are 2 or more instances of the same tracking number, action code and action date, the job fails due to duplicates.


    Tracking Number Action Code action date
    12345 C 14052012 1106
    12345 C 14052012 1106

    So the process we have at the moment is to run this code against the staging table -

    SELECT COUNT(*) AS Expr1, TrackingNumber + ActionCode + ActionDate AS Expr2
    FROM stgtable
    GROUP BY TrackingNumber + ActionCode + ActionDate

    This will then highlight the duplicates, like below -

    Expr1 Expr2
    2 12345
    2 67575
    1 45322

    So the duplicates here are highlighted as the top 2. The way we remove these duplicates is a bit complicated. When the job tries to move the data from the staging table to another table, when it fails it outputs a .txt file with all the data. We take the tracking numbers, and one by one do a 'find' in the .txt file, look for each tracking number and delete one of the duplicate rows.

    What i'd like to be able to do, is to just run some SQL code that would identify the duplicates, which we're already doing, but then for the next step it would delete the duplicate data without us having to go and search for 80 duplicate tracking numbers in a file with over 1000 rows!

    Is this possible? Does this make sense?

  2. #2
    Join Date
    Nov 2004
    Provided Answers: 4
    In the INSERT script to copy data from the staging table to the production table, you could include DISTINCT to remove doubles.

    INSERT INTO myProductionTaable (....) 
    FROM myStagingTable
    This only works when all the columns in the duplicate rows are equal, if only the PK is the same, this simple solution won't work.
    With kind regards . . . . . SQL Server 2000/2005/2012

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  3. #3
    Join Date
    Apr 2012
    Try something like this:

    with CTE as
            ROW_NUMBER() OVER(PARTITION BY TrackingNumber, ActionCode, ActionDate 
                              ORDER BY stgID) as RowNo
        FROM stgtable
    INSERT INTO myProductionTaable (....) 
        SELECT ... FROM CTE where RowNo = 1
    Hope this helps.

  4. #4
    Join Date
    Jun 2003
    Provided Answers: 1
    Add an Identity column to your staging table.
    Then run this code to remove the duplicates, keeping only the latest entry:
    delete	A
    from	stagingtable as A
    	inner join stagingtable as B
    		on A.TrackingNumber = B.TrackingNumber
    		and A.ActionCode = B.ActionCode
    		and A.ActionDate = B.ActionDate
    		and A.IdentityValue < B.IdentityValue
    If it's not practically useful, then it's practically useless.

    blindman "sqlblindman"

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts