Results 1 to 7 of 7
  1. #1
    Join Date
    Dec 2010
    Posts
    3

    Unanswered: Deleting duplicates -Updates running long

    Hi All,


    I have tables as below:

    Table: phone

    Columns:
    Phone_id
    phone_type
    phone_num
    extension

    Rows : phone_id phone_type phone_num extension

    123 OFFICE 9875378930 345
    124 HOME 2345789400

    Table: company_phone

    company_id (pk -->company(company_id))
    phone_id(pk -->phone(phone_id))
    create_date
    modified_date

    table: company

    company_id (pk)
    company_name
    company_number

    We have duplicates in phone table for combination of phone_type,phone_num and extension.

    I am trying to retain one unique record in phone table :

    I am updating company_phone with the phone id of the unique row retained and then
    deleting the duplicates from phone table.

    I am using a PLSQL block as :

    cursor
    fetch all
    duplicate phone records

    here if we have 3 duplicated records,I will keep 1 row
    and for other 2 id's,I will go and update the company phone table
    with the id kept above.

    get all the company phone records for
    phone id got in above step
    and update

    Here I have arouund 2 lakhs of duplicate records out of which I have to
    keep only 3500 records and delete rest.

    this requires updation of company_phone table records of around 2 lakhs rows.

    This approach is taking houurs of time.

    Could anyone suggest any alternate abd better way of doing this ?

    Thanks in advance.
    Vijay

  2. #2
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    >keep only 3500 records and delete rest.
    create "temp_phone" table with 3500 rows
    rename phone table to new/different name
    rename temp_phone to phone table
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  3. #3
    Join Date
    Dec 2010
    Posts
    3
    Hi,

    Thanks for the reply.I cant use your approach.I have to update child records before deleting the duplicates from master table (phone).

    I am using a PLSQL block as :

    DECLARE

    CURSOR phone_duplicate_cur IS
    SELECT ph.phone_id,
    temp.*
    FROM phone ph,
    (SELECT COUNT(*) count,
    phone_type,
    phone_number,
    extension
    FROM PHONE
    GROUP BY phone_type,
    phone_number,
    extension
    HAVING COUNT(*) > 1 ) temp
    WHERE ph.phone_type = temp.phone_type
    AND ph.phone_num = temp.phone_num


    CURSOR intermediate_cur(in_phone_id NUMBER) IS
    SELECT 'COMPANY' intermediate,
    company_id intermediate_id,
    phone_id
    FROM company_phone
    WHERE phone_id = in_phone_id
    UNION ALL
    SELECT 'CONTACT' intermediate,
    cp.contact_id intermediate_id,
    phone_id
    FROM contact_phone cp
    WHERE cp.phone_id = in_phone_id

    BEGIN


    FOR rec_duplicate IN phone_duplicate_cur
    LOOP

    IF NVL(prev_phone_num,0) <> rec_duplicate.phone_num THEN

    v_phone_id := rec_duplicate.phone_id;
    END IF;

    IF v_phone_id <> rec_duplicate.phone_id THEN

    FOR rec_intermediate IN intermediate_cur(rec_duplicate.phone_id)
    LOOP
    --This is to update child records
    IF rec_intermediate.intermediate = 'COMPANY' THEN

    UPDATE company_phone
    SET phone_id = v_phone_id
    WHERE company_id = rec_intermediate.intermediate_id
    AND phone_id = rec_intermediate.phone_id;


    ELSIF rec_intermediate.intermediate = 'CONTACT' THEN

    UPDATE contact_phone
    SET phone_id = v_phone_id
    WHERE contact_id = rec_intermediate.intermediate_id
    AND phone_id = rec_intermediate.phone_id;
    END IF;

    DELETE FROM phone
    WHERE phone_id = rec_duplicate.phone_id;

    END IF;

    prev_phone_num := rec_duplicate.phone_num;


    END LOOP;

    COMMIT;

    EXCEPTION
    WHEN OTHERS THEN
    RAISE;
    END;

    this script is taking around 50 minutes.
    Please advise...any alternate ways

  4. #4
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    >this script is taking around 50 minutes.
    >Please advise...any alternate ways

    post EXPLAIN PLAN for SQL so we can see where time is being spent.
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  5. #5
    Join Date
    Dec 2010
    Posts
    3
    Hi anacedent,

    Thanks for reply again.I think the sql's in first cursor is taking around 17 seconds.
    second cursor sql will be executed for each phone id retrieved from 1st cursor.
    first cursor fetches around 2 lakhs records. so the second SQL is executed for 170000 times.
    I think we have to some how tune this section.
    any way we can do select duplicates and update of child tables and delete of duplicates using SQL ?
    I will post the explain plan on monday ?
    Thanks ..
    Vijay

  6. #6
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    row by row is slow by slow.
    a pure SQL solution will be MUCH faster than PL/SQL
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  7. #7
    Join Date
    Feb 2006
    Posts
    173
    Provided Answers: 1
    This is some sample sql you can modify/use to remove duplicates:

    Code:
    -- SQL to remove duplicate rows from database table.
    
    
    -- Drop Views will fail the first time this runs because the views do not exist yet!!
    Drop View your_schema.ViewD1;
    Drop View your_schema.ViewD2;
    Drop View your_schema.ViewD3;
    Drop View your_schema.ViewD4;
    
    
    -- This view groups and only includes rows that have the same value in the selected columns on more than 1 row.
    Create View your_schema.ViewD1 ( phone_type, phone_num, extension, count ) AS
     Select phone_type, phone_num, extension, Count(*)
      From your_schema.company_phone
     Group by phone_type, phone_num, extension
     Having Count(*) > 1;
    
    
    -- View is created by joining original table to first view created to retrieve
    -- the rowid of rows that have the dupilcate values in the selected columns.
    Create View your_schema.ViewD2 ( phone_type, phone_num, extension, Dup ) AS
     Select a.phone_type, a.phone_num, a.extension, a.rowid
      From your_schema.company_phone a inner join your_schema.ViewD1 b
        on a.phone_type  =  b.phone_type
       and a.phone_num   =  b.phone_num
       and a.extension   =  b.extension;
    
    
    -- View is created to get the last rowid of a duplicate row in a group of duplicate rows.
    Create View your_schema.ViewD3 ( phone_type, phone_num, extension, Last_rowid ) AS
     Select phone_type, phone_num, extension, MAX( rowid )
      From your_schema.ViewD2
     Group by phone_type, phone_num, extension;
    
    
    -- Create view on original table making the rowid of each row a column so it can be looked
    -- up in the following "Delete" SQL statement.
    Create View your_schema.ViewD4 AS
     Select phone_type, phone_num, extension, rowid AS Ori_rowid
      From your_schema.company_phone;
    
    
    -- Delete last duplicate row from per group of dupilcate rows.
    DELETE From your_schema.ViewD4
     WHERE Ori_rowid in ( Select Last_rowid From your_schema.ViewD3 );

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •