  1. #1
    Join Date
    Aug 2010
    Posts
    2

    Unanswered: text file remove duplicates ?

    I have a large text file (over 50 MB) of entries, and I want to remove the duplicates. I tried loading it into a localhost MySQL server as unique values, but that pushed the processor load very high and the temperature of my Mac to nearly 90 °C. It also sorted the final result alphabetically, whereas I want to keep the entries in their original order.

    Is there another way of doing this, even a slow one, that is less processor intensive? What do you suggest?

  2. #2
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    how do you know what is the valid row, and what is the invalid row(s)?

    the order in which a row is inserted into a db has no intrinsic value or sort order; it is just how the db stuffs the data into storage. if you want a specific order then you have to expressly tell the db what order you want when you extract the data, using an ORDER BY clause on your SELECT statement.

    assuming this is a one off..
    I'd be tempted to stuff the data into the db, making certain there is an autonumber column or some other means of identifying the original sequence. then identify what the duplicates are, get rid of 'em, then put on your unique constraint. I'm guessing your current alpha sort order is because your primary key is an alpha one.
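
    A minimal sketch of that sequence, for illustration only. It uses Python's built-in sqlite3 module with an in-memory database rather than the MySQL server from the question, and the table name `entries`, the column names, and the function `dedupe_with_db` are all made up for the example. The same idea should carry over to MySQL with an AUTO_INCREMENT column, though the exact SQL may need adjusting.

```python
import sqlite3


def dedupe_with_db(lines):
    """Load lines with an autonumber id, drop later duplicates, return survivors in original order."""
    con = sqlite3.connect(":memory:")  # assumption: in-memory SQLite stands in for the real server
    con.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
    )
    # The autonumber id records the original sequence of the rows.
    con.executemany("INSERT INTO entries (name) VALUES (?)", ((line,) for line in lines))
    # Keep the lowest id for each value, i.e. the first occurrence; delete the rest.
    con.execute(
        "DELETE FROM entries WHERE id NOT IN (SELECT MIN(id) FROM entries GROUP BY name)"
    )
    # Only now add the unique constraint, so future duplicates are rejected.
    con.execute("CREATE UNIQUE INDEX entries_name_uq ON entries (name)")
    # The order has to be requested explicitly when reading the data back.
    rows = con.execute("SELECT name FROM entries ORDER BY id").fetchall()
    con.close()
    return [name for (name,) in rows]


print(dedupe_with_db(["pear", "apple", "pear", "fig", "apple"]))
# ['pear', 'apple', 'fig']
```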

    another technique may be to write a data take-on programme that loads the data, checks for duplicates, and applies any corrections/deletions as required. however this will be a lot slower than a bulk file upload.
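
    A rough sketch of such a programme, written here without a database at all: it reads the input line by line, remembers every value already seen in a set, and writes out only the first occurrence of each. The function name `dedupe_stream` is made up for the example, and it assumes one entry per line and that the set of distinct lines fits in memory, which seems plausible for a file of about 50 MB.

```python
import io


def dedupe_stream(src, dst):
    """Copy lines from src to dst, skipping any line already written. Returns the number kept."""
    seen = set()
    kept = 0
    for line in src:
        key = line.rstrip("\n")  # compare entries without the trailing newline
        if key in seen:
            continue  # duplicate of an earlier line: skip it
        seen.add(key)
        dst.write(key + "\n")  # first occurrence: keep it, in its original position
        kept += 1
    return kept


# Small in-memory demonstration; real use would pass open file objects instead.
src = io.StringIO("pear\napple\npear\nfig\napple\n")
dst = io.StringIO()
print(dedupe_stream(src, dst))  # 3
print(dst.getvalue(), end="")   # pear, apple, fig on separate lines
```

    For a real file, something like `dedupe_stream(open("entries.txt"), open("entries_unique.txt", "w"))` would do, where both file names are placeholders. The work is a single pass over the file with one set lookup per line.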

    as to whether your Apple gets hot or not, that sounds more like a problem with that specific box. either the device is defective (so you may need a new iFan, or perhaps some iCables), or you may need to open the box up and make sure the cooling system isn't clagged up with iDust
    I'd rather be riding on the Tiger 800 or the Norton

  3. #3
    Join Date
    Aug 2010
    Posts
    2
    Thank you for replying, and excuse my limited knowledge.

    The duplicate values are identical, so it does not matter which one is deleted. I would like to keep the entries in the order in which they first appear in the list.

    I don't understand the meaning of "unique constraint". For example, I have created a table with id and name, both unique. However, after adding some data to it, I can still insert duplicates, and that confuses me. Isn't a unique constraint supposed to prevent adding/inserting/importing duplicates?

  4. #4
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    are the ID and name columns unique?
    how have you defined the ID?

    a unique constraint means that the value (or the combination of values, if you are using more than one column in the key definition) must be unique.

    so if your ID column is an autonumber column then it will always be unique, as the db engine makes the ID different each time. a unique key that covers ID and name together would therefore still accept repeated names, because the ID part always differs.
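
    To illustrate the difference, here is a small sketch using Python's built-in sqlite3 module instead of MySQL. The table names `combo` and `single` are invented for the example, and the original poster's actual table definition is not known. In `combo` the unique key covers (id, name) together, so repeated names get through; in `single` the name column carries its own unique constraint, so the second insert is rejected.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # assumption: in-memory SQLite, not the poster's MySQL setup

# Unique key on the combination (id, name): the autonumber id differs on
# every row, so the combination is always unique and duplicate names pass.
con.execute(
    "CREATE TABLE combo (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, UNIQUE (id, name))"
)
con.execute("INSERT INTO combo (name) VALUES ('pear')")
con.execute("INSERT INTO combo (name) VALUES ('pear')")
dup_names_allowed = con.execute("SELECT COUNT(*) FROM combo").fetchone()[0]

# Unique constraint on name alone: the second identical name is refused.
con.execute("CREATE TABLE single (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE)")
con.execute("INSERT INTO single (name) VALUES ('pear')")
try:
    con.execute("INSERT INTO single (name) VALUES ('pear')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(dup_names_allowed, rejected)  # 2 True
```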
