Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Dec 2002
    Location
    Préverenges, Switzerland
    Posts
    3,740

    Unanswered: SQL help - remove duplicates (with rules)

    i am trying to beat some flatfile data into shape
    data is of the form

    string1, string2, string3, date4, integer5, fieldX, fieldY, fieldZ

    in principle the combination
    string1, string2, string3, date4
    is supposed to be unique.

    it is not!
    i have a thousand repeated records with same or different integer5

    i would like to keep the record with the highest integer5 and delete the other replicates.

    Code:
    <---- supposedly unique ------>							
    string1	string2	string3	date4	    integer5	fieldX	              
    A	A	A	01/01/07	1	this one is unique
    B	B	B	01/01/07	3	keep either one of this pair with identical integer5
    B	B	B	01/01/07	3	keep either one of this pair with identical integer5
    C	C	C	02/02/07	5	this one is unique
    D	D	D	03/03/07	6	one of a trio - keep this one (highest integer5)
    D	D	D	03/03/07	5	one of a trio - drop this one
    D	D	D	03/03/07	4	one of a trio - drop this one
    anyone got some clever SQL ?

    i don't care how many steps or how ugly the SQL is - this is a one-time import so it can take as long as it likes.

    izy
    currently using SS 2008R2

  2. #2
    Join Date
    Apr 2005
    Location
    Zagreb - Croatia
    Posts
    372

    Hello izy!

    I suggest;
    Make a new Table with the same fields.
    Put the PRIMARY KEY on String1, String2, String3, Date4 fields.
    Make an Append query (append in this new Table), and put
    Descending sort on Integer5 field. Run this query.
    Put Ascending sort on all Primari key fields.
    Last edited by MStef-ZG; 06-19-08 at 04:48.

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Hi Izy

    You can't use aggregates for update queries but maybe you can for deletes. If not you will need to insert the return from the derived table into a physical table.

    Code:
    DELETE naughty_file.*
    FROM naughty_file
    LEFT OUTER JOIN
        (
            SELECT      string1
                , string2
                , string3
                , date4
                , MAX(integer5) AS max_interger5
            FROM    naughty_file
            GROUP BY string1
                , string2
                , string3
                , date4
        ) AS maxywaxy
    ON     maxywaxy.string1     = naughty_file.string1
    AND    maxywaxy.string2     = naughty_file.string2
    AND    maxywaxy.string3     = naughty_file.string3
    AND    maxywaxy.date4         = naughty_file.date4
    AND    maxywaxy.max_interger5    = naughty_file.interger5
    WHERE    maxywaxy.string1 IS NULL
    HTH

  4. #4
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Code:
    SELECT unique_x.string1
         , unique_x.string3
         , unique_x.string3
         , unique_x.date4
         , unique_x.integer5
    FROM  (
           SELECT string1
                , string2
                , string3
                , date4 As [date4]
                , Max(integer5) As [integer5]
           FROM   @x
           GROUP
               BY string1
                , string2
                , string3
                , date4
          ) As [unique_x]
    INNER
     JOIN @x x
       ON x.string1  = unique_x.string1
      AND x.string2  = unique_x.string2
      AND x.string3  = unique_x.string3
      AND x.date4    = unique_x.date4
      AND x.integer5 = unique_x.integer5
    Last edited by gvee; 06-19-08 at 05:18. Reason: changed one alias
    George
    Home | Blog

  5. #5
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Shoulda refreshed... Poots - why LEFT over INNER?

    Oh and to explain why mine's a SELECT - it's because you should create a new table with RI and then INSERT-SELECT the results in to it (which is what poots was getting at methinks)
    George
    Home | Blog

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Nope - mine was to delete the dupies from the source table. Then one may start adding PKs and constraints. Creating a new table using your query is another viable solution.

    What comes first? The data or the constraint? Hmmm.......

  7. #7
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quote Originally Posted by pootle flump
    What comes first? The data or the constraint?
    Is that a rhetoric

    ...In fact, you're effectively using an INNER join considering your WHERE clause
    George
    Home | Blog

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    No I'm not you plonker. I'm effectively creating a NOT EXISTS clause.

  9. #9
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    *feels like a plonker*
    I don't quite see it, but I do see why it's not a INNER now .
    George
    Home | Blog

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    If you changed mine to a select statement and UNIONED it to yours we would recreate the table.

    Your query selects the records Izy wants to retain. My query selects the records Izy wants to delete.

    My where clause says "include only records that do not appear in the derived table". Since the derived table constains all the records we want to keep - vwalah - the naughty records are deletified.

  11. #11
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Ohhhhhhhh!
    George
    Home | Blog

  12. #12
    Join Date
    Apr 2004
    Location
    metro Detroit
    Posts
    634
    I run into this with dates quite often. I create a MaxDate query then link it to the table in a second query.

    In your case, create a query that groups by strings 1-3 and date4 and grabs the max of integer5. Then create an append query with the Max query and the original table linked on all 5 fields.

  13. #13
    Join Date
    Dec 2002
    Location
    Préverenges, Switzerland
    Posts
    3,740
    thanks much people, but here's how it is after a quick look...

    poots:
    your original wont fly
    replacing interger with integer and destroying some of your patient indenting produces a "cant delete from table" message.
    switching to SELECT instead of DELETE almost works - it correctly finds the lower two integer5 in a trio, but doesnt return either of a pair having the same integer5 (?? Max can't decide what to return from two identical values ??)

    george:
    Access barfs at your SQL. cant even save. tried several modifs and still couldnt save. bah!

    rogue:
    you are in the same direction as poots & george but they are trying it in a single query. i think i tried your approach over the weekend but i'm a bit dizzy after staring at this for so long - i'll try again.

    if anyone wants to have a go, i attach a dummy db with some test data and three versions of poots' query. i'm looking for four records retained (or three delete candidates identified)

    izy
    Attached Files Attached Files
    currently using SS 2008R2

  14. #14
    Join Date
    Apr 2004
    Location
    metro Detroit
    Posts
    634
    Copy the structure of demoTable to tblGoodRecs.

    Create query qryMaxInt:
    SELECT demoTable.string1, demoTable.string2, demoTable.string3, demoTable.date4, Max(demoTable.integer5) AS MaxOfinteger5
    FROM demoTable
    GROUP BY demoTable.string1, demoTable.string2, demoTable.string3, demoTable.date4;


    Create query qryApdGoodRecs:
    INSERT INTO tblGoodRecs ( string1, string2, string3, date4, integer5, fieldX, fieldY, fieldZ )
    SELECT demoTable.string1, demoTable.string2, demoTable.string3, demoTable.date4, demoTable.integer5, First(demoTable.fieldX) AS FirstOffieldX, First(demoTable.fieldY) AS FirstOffieldY, First(demoTable.fieldZ) AS FirstOffieldZ
    FROM demoTable INNER JOIN qryMaxInt ON (demoTable.string1 = qryMaxInt.string1) AND (demoTable.string2 = qryMaxInt.string2) AND (demoTable.string3 = qryMaxInt.string3) AND (demoTable.date4 = qryMaxInt.date4) AND (demoTable.integer5 = qryMaxInt.MaxOfinteger5)
    GROUP BY demoTable.string1, demoTable.string2, demoTable.string3, demoTable.date4, demoTable.integer5;

    Run qryApdGoodRecs.
    Last edited by rogue; 06-19-08 at 14:47.

  15. #15
    Join Date
    Dec 2002
    Location
    Préverenges, Switzerland
    Posts
    3,740


    bravo rogue!
    it works exactly as advertised and exactly as desired.

    tomorrow i will dig out my weekend attempts and see what i did stupid that stopped my attempt.

    thanks, izy
    currently using SS 2008R2

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •