  1. #1
    Join Date
    Sep 2002
    Location
    Hong Kong
    Posts
    13

    Unanswered: Fix legacy data - Missing primary key + duplicate records + large table

    We have a large table that is very old and that not many people look after. Recently a report that queries this table started having performance problems. Eventually we found that the table is missing a primary key and contains duplicate data, which means "alter table add primary key" doesn't work.

    Besides, the size of the table means that something like "insert into new_table_with_pk select distinct * from old_table" takes an unacceptable amount of time to run.
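    To be concrete, this is roughly the statement I mean (new_table_with_pk and old_table are just placeholder names):
    Code:
    -- copy only the distinct rows into a new table that already has the primary key
    -- new_table_with_pk and old_table are placeholder names
    insert into new_table_with_pk
    select distinct * from old_table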

    Do you have any recommendations for fixing this? As the application runs on Oracle, Sybase and SQL Server, is there a cross-database approach that will work?

  2. #2
    Join Date
    Jun 2003
    Posts
    269

    Re:

    How big is your table? Have you tried creating a nonclustered index on the field(s) of the table that your report uses in its filter condition? You have to make this change when your database is under less load.
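    For example, something along these lines, assuming the report filters on a date and a customer column (the index, table, and column names are only placeholders):
    Code:
    -- nonclustered index on the columns the report filters on
    -- ix_report_filter, old_table, report_date, customer_id are placeholder names
    create nonclustered index ix_report_filter
    on old_table (report_date, customer_id)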
    I love cursor, though I never use it. Cos' I don't want to misuse it.
    ----------------------------------------------

    ----cheers
    ----http://mallier.blogspot.com

  3. #3
    Join Date
    Sep 2002
    Location
    Hong Kong
    Posts
    13
    Quote Originally Posted by mallier
    How big is your table? Have you tried creating a nonclustered index on the field(s) of the table that your report uses in its filter condition? You have to make this change when your database is under less load.
    It is about 2 million rows. We haven't created a nonclustered index yet because we would like to remove the duplication and create a primary key first. So you mean first create the index, then do [insert into new_table select distinct * from old_table]?
    Visit my homepage at http://www.carfield.com.hk

  4. #4
    Join Date
    Jun 2003
    Posts
    269

    Re:

    First of all, 2 million records is not that big. The DISTINCT keyword is the killer here; it will hurt the performance of your query.
    Some options are:

    Change your database option 'select into/bulkcopy' to 'true' before you do the SELECT * INTO.
    Code:
    exec sp_dboption 'urdatabase', 'select into/bulkcopy', 'true'
    This will improve the performance of the SELECT INTO query.
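    That is, if you do copy the data into a new table, a statement like this will be minimally logged while the option is on (new_table and old_table are placeholder names):
    Code:
    -- create the new table and copy the rows in one step
    -- new_table and old_table are placeholder names
    select *
    into new_table
    from old_table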

    You should try the CHECKSUM function in SQL to remove the duplicates.
    Before that, add a new identity column to the existing table as the primary key:
    Code:
    alter table urtable add [id] int identity(1,1)
        constraint id_pk primary key
    E.g.: count duplicates on city, state, zip, contract in the authors table, and delete the duplicates:
    Code:
    use pubs
    go
    -- delete the row with the higher au_id from each group of duplicate rows
    -- (re-run if a group has more than two copies; each pass removes one)
    delete authors from (
        select max(au_id) as au_id, max(city) as city, checksumvalue
        from (
            select au_id, city, checksum(city, state, zip, contract) as checksumvalue
            from authors
        ) as t
        group by checksumvalue
        having count(*) > 1
    ) as t1
    where authors.au_id = t1.au_id and authors.city = t1.city
    Note: you should compare at least one of the duplicated columns in the WHERE clause (because the CHECKSUM function is not 100% accurate); as you can see, I used the 'city' field in the query above.
    In the same way, you should compare against the new [id] column you created, as in the sketch below.
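    Here is a minimal sketch of the same pattern on your own table, assuming placeholder names old_table for the table and col1, col2, col3 for the columns that define a duplicate:
    Code:
    -- same pattern on your table; old_table and col1, col2, col3 are placeholder names
    -- [id] is the new identity primary key added above
    -- each run removes one copy per duplicate group; repeat until no rows are deleted
    delete old_table from (
        select max([id]) as [id], max(col1) as col1, checksumvalue
        from (
            select [id], col1, checksum(col1, col2, col3) as checksumvalue
            from old_table
        ) as t
        group by checksumvalue
        having count(*) > 1
    ) as t1
    where old_table.[id] = t1.[id] and old_table.col1 = t1.col1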



    Finally, you have to build an index on that table for better performance.

    Come back to me if you have doubts about how to remove the duplicates using the CHECKSUM function. Try to post the table's DDL as well.


    NB: don't forget to change the database option back with
    Code:
    exec sp_dboption 'urdatabase', 'select into/bulkcopy', 'false'
    after finishing this job.
    Last edited by mallier; 11-18-05 at 08:12.
    I love cursor, though I never use it. Cos' I don't want to misuse it.
    ----------------------------------------------

    ----cheers
    ----http://mallier.blogspot.com
