Results 1 to 8 of 8
  1. #1
    Join Date
    Jul 2007
    Posts
    3

    Unanswered: Removing dups programmatically

    Hi folks!
    Hoping someone in here has dealt with this sort of thing before......

    The problem:
    I have a table with several columns, none of which can be defined as "primary key" (reasons available on request). What I'd like to do is eliminate all but 1 copy of any record that is the same as any other record in 3 of these fields (Date, Time, and MachineName). Sort of like a "sort unique" on those 3 columns, if that makes any sense. This seems like it shouldn't be too tough, but being a novice in VBScript (or really ANY VB dialect), I'm not sure how to go about it... If only this were on a Mainframe running VM, it'd be about 5 lines of Pipe, but...

    Anyway, if any of you folks could point me in the right direction on this, I'd be very grateful....

    PS - I've already got part of the process running in a VBScript, so if this could be done in the same VBScript, that'd be a BIG plus!

    Thanks loads for any help you folks can offer,

    Joe (frustrated Mainframe guy) Parker

    JP

  2. #2
    Join Date
    Jul 2003
    Location
    Michigan
    Posts
    1,941
    Looks like this is just a one time deal?
    I don't suppose you can access the database directly, so that you can write queries against it, rather than writing a script to execute a query?

    What I would do is create a new table, pull the DISTINCT records out of the old table into the new table. Something like:
    (this will create the second table on the fly)
    Code:
    SELECT DISTINCT FIELD1, FIELD2, ....
    INTO TABLE2
    FROM TABLE1
    Then delete the records from the old table and append the records from table2 back into table1. After that, put together a multi-column primary key to prevent the dup's in the future.

    Another method would be to add another column to the table that will autoincrement to an integer value, giving each row a distinct ID. Then you
    can write a "delete duplicate" query against the table.

    What platform is the database? (Sql Server, Access, ...)
    Inspiration Through Fermentation

  3. #3
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Code:
    select Date, Time, MachineName
     , max(field4), max(field5), ... max(fieldN)
    from daTable
    group by Date, Time, MachineName
    this result set is your de_duped table, you can either save it, print it, export it, whatever

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  4. #4
    Join Date
    Jul 2007
    Posts
    3
    Thanks for the replies, folks!

    Would that it were a "one-time deal"! Unfortunately, it's an on-going thing, so it needs to be done programmatically, preferrably in VBScript, since that's already doing part of the process....

    Basically what I'm trying to accomplish is to "select distinct" on the combination of those three fields, and save ALL the fields for records that meet the "distinct" requirements, and (here's another wrinkle) whack all the others from the table. Preferrably NOT writing them to a separate table, although, I could work with that if I HAD to...

    Thanks again for your attention/assistance

    Joe (where's VM and CMS Pipes when I need 'em?) Parker

  5. #5
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    and you can't define date, time, and machine name as a compound Primary Kay?
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  6. #6
    Join Date
    Jul 2007
    Posts
    3
    Well, in fact I have "mooshed" the three fields together, and named the column "KeyField", if that helps any.

    The situation is that there's a VBScript that runs and stuffs .csv files into a database table. These files WILL contain records that duplicate existing ones in the table. Problem is, that if I define that "KeyField" as a Primary Key, the entire "stuff the records in the database" process fails when a duplicate is encountered. For this reason I chose to NOT have a Primary Key in this table.
    It seems like I should be able to do something like a "Select distinct" on that "KeyField" column, and take the results and write them to another table, replacing whatever content was already there..... However this raises a bit of any issue with the first table becoming huge over time....

    Anyhow, that's a problem for another day....

    Thanks for any help you folks can offer!

    Joe

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    suggestion:

    have your vbscript stuff your csv files into your database as a new table

    then use sql to insert or update into your main table

    that way, the dupes never occur in the first place

    simple, really, and no mooshing required
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Quote Originally Posted by oldmainframehack
    Well, in fact I have "mooshed" the three fields together, and named the column "KeyField", if that helps any.
    FYI - you can specify all three existing fields to be part of a compound primary key. No "mooshing" needed...

    (Plus, I agree with r937 about importing to a temporary table first, and then move the data into the live table.)
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •