Results 1 to 11 of 11
  1. #1
    Join Date
    Feb 2005
    Location
    Puerto Rico
    Posts
    16

    Question Unanswered: How to delete repeated group of rows

    Peers,

    I have a table which has no constraint at all (not supposed). By mistake, I have balances triplicated for the same month and site.

    I need to delete the last two loading of data made to the table for that specific month and preserve the first load of data.

    Does Oracle has an internal row id or row num. that would allow me to delete the records I don't want?

    I tried rownum within a query but this field appears to identify the query records, not the id of the record in the table.

    Regards,

    J.C.

  2. #2
    Join Date
    Feb 2005
    Location
    Leesburg, VA
    Posts
    42
    Oracle has a ROWID you can select out, but if you've done ANY deletes or updates to the data since the initial insertion it is not a valid tool for basing your deletes on.

    It's very possible to delete the duplicates, but not always knowing which one was the original unless you have a create_date column or have not changed your rows.
    Steven Karam
    Oracle 10g Certified Master
    Web: OrclDBA.com
    Email: steve@orcldba.com

  3. #3
    Join Date
    Feb 2005
    Location
    Leesburg, VA
    Posts
    42
    Actually, I'm sorry...even then it might be a problem if you have been deleting or updating any other tables that exist on the same DB file in the same data blocks...not sure what you will be able to do in this case.
    Steven Karam
    Oracle 10g Certified Master
    Web: OrclDBA.com
    Email: steve@orcldba.com

  4. #4
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    If it's just a "remove duplicates" problem, as I understand it, an easy way to do what you want would be to
    Code:
    CREATE TABLE temptable AS 
    SELECT DISTINCT (your columns) 
    FROM yourtable 
    WHERE (conditions on month and site);
    so as to put the data you want to keep in another table (only one time ), then
    Code:
    DELETE FROM yourtable
    WHERE (conditions on month and site)
    so as to delete the same data, and finally
    Code:
    INSERT INTO yourtable 
    SELECT (your columns) 
    FROM temptable;
    
    Commit;
    
    Drop table temptable;
    HTH & Regards,

    RBARAER
    Last edited by RBARAER; 03-01-05 at 11:11.

  5. #5
    Join Date
    May 2004
    Location
    Dominican Republic
    Posts
    721
    "I need to delete the last two loading of data made to the table for that specific month and preserve the first load of data."

    And how do you identify the "first" from the "last" there ?

  6. #6
    Join Date
    Feb 2005
    Location
    Puerto Rico
    Posts
    16

    Cool Clarifying my case

    Isn't the loading to the table sequential? It is an append process from the ETL tool. Am I correct in my statement?

    Plus, My staging table is equal in total records, sum of qty and sum of costs, as the first 13,579 records for the specific period in my DW Target table, the one in question.

    Let me know your thoughts Martinez...I'll appreciate your feedback.

    J.C.

  7. #7
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Isn't the loading to the table sequential?
    There is nothing in Oracle that would allow you to say, between two records that have the exact same data, "this one was inserted before the other", NOTHING unless YOU add a column that says so (insert datetime, an integer incremented at each insert...). So, there is nothing here to help you determine which rows belonged to the first, second or third load. Hence my solution (I supposed the 3 loads inserted exactly the same data) : just insert ONE copy od these data (thanks to DISTINCT) in another table ("temptable"), delete all data from these 3 loads from the first table ("yourtable"), and then insert everything from "temptable" into "yourtable". You will then have only one copy of these data left in your original table. Isn't it what you want ? If it isn't, please clarify your needs.

    Regards,

    RBARAER

  8. #8
    Join Date
    Feb 2005
    Location
    Puerto Rico
    Posts
    16

    Talking This is my preliminar solution under test

    1. I created another table called BK_INVENTORY from the original table
    with only the records that belong to the period in question.

    2. Then I identified the first 13547 records of BK_INVENTORY and
    updated its accounting_period to 200512.

    3. I inserted these records with accounting_period 200512 from
    BK_INVENTORY to the original table.

    4. Deleted all records for the accounting period in question from
    original table.

    5. In original table, updated the inserted records to
    accounting_period in question.

    6. Now it has the original records, the ones that were inserted first.

    Thanks SteveKaram and RBARAER...teamwork always has better results...the brainstorming is broad.

  9. #9
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    OK, I'm glad if you got what you wanted, but how did you "identified the first 13547 records of BK_INVENTORY" in step 2 ?

    On which criteria ?

    Did you have a field that allowed you to identify the order of inserts ?

  10. #10
    Join Date
    Feb 2005
    Location
    Puerto Rico
    Posts
    16

    Unhappy Confused

    Well.

    Now you gave me something to think about. I totally assumed that the loading was sequentila based on some browsing through the table. I compared with my staging table and the pattern of records and values was the same in the sample I compared from both.

    I will give it more thinking. The distinct didn't returned the same number of records. The second and third loadings have different costs for some items, as they changed during the month.

    I have the following fields that uniquely identify a record:
    data_set_code acc_period status_date
    site_A 200501 1-30-05

    But the status date is the timestamp in ETL Tool when extraction step occured. Not an Oracle timestamp when record was inserted. A peer by mistake loaded the "same" data.

    I'll keep you informed.

    J.C.

  11. #11
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Maybe you did not put the right conditions for the SELECT DISTINCT... As I understand it, you inserted data for "site_A" and month "200501" twice, and you want to keep them without duplicates. Status_date is a "load time" and is the only difference between the duplicates. If it's right, then I'll give you more precisions about my solution (I assume BK_INVENTORY is what I called "temptable"):

    Code:
    CREATE TABLE BK_INVENTORY(data_set_code, acc_period, status_date, otherfield1, otherfield2) AS 
    SELECT DISTINCT data_set_code, acc_period, TRUNC(sysdate), otherfield1, otherfield2 
    FROM originaltable 
    WHERE data_set_code = 'site_A'
        AND acc_period='200501';
    
    DELETE FROM originaltable 
    WHERE data_set_code = 'site_A'
        AND acc_period='200501';
    
    INSERT INTO originaltable 
    SELECT data_set_code, acc_period, status_date, otherfield1, otherfield2
    FROM BK_INVENTORY;
    
    Commit;
    
    Drop table BK_INVENTORY;
    Replace otherfield1 and otherfield2 with your fields, add the other ones you need, they're just here for the example. Inserting DISTINCT with TRUNC(SYSDATE) in place of status_date insures that you'll get each line of a load only one time.

    Of course it may not be the only solution, and yours may be correct, it's just that I don't clearly see how you "discriminated" duplicates.

    HTH & Regards,

    RBARAER

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •