  1. #1
    Join Date: Mar 2003 | Posts: 10

    Unanswered: Problem getting rid of non-distinct records

    Hello,

    I have a problem I need some help with. I have a table called "user" that contains the userid, name, and address. However, for users with multiple listed addresses there are duplicate records. Example:

    userid, name, address:

    0001, John Smith, 123 main rd.
    0001, John Smith, 456 second rd.
    0002, Bob Jones, 435 another rd.

    I need to combine all the duplicate records into a single entry by concatenating the addresses (i.e. "0001, John Smith, 123 main rd/456 second rd").

    I can select the duplicate rows by doing:

    SELECT * FROM [user]
    WHERE userid IN (
        SELECT userid FROM [user]
        GROUP BY userid
        HAVING COUNT(*) > 1 )
    ORDER BY userid ASC

    Does anyone know how I could concatenate all the duplicate rows into one and then delete the duplicates? I don't care what order the addresses are concatenated in. Any ideas?

    Thanks in advance,
    Andrei Girenkov

  2. #2
    Join Date: Feb 2003 | Location: Milano, Italy | Posts: 80
    For normalization reasons you should AVOID doing what you ask. A better solution, though it requires you to modify the architecture of your database, is to move the addresses into a separate address table:

    User Table:

    0001, John Smith
    0002, Bob Jones

    Address Table:

    0001, 123 main rd.
    0001, 456 second rd.
    0002, 435 another rd.
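
    For example, the DDL for that design might look like this (just a sketch, assuming SQL Server; the table names and column sizes are illustrative):

    CREATE TABLE [user]
    (
        userid varchar(10)  NOT NULL PRIMARY KEY,
        name   varchar(100) NOT NULL
    )
    go

    CREATE TABLE address
    (
        userid  varchar(10)  NOT NULL REFERENCES [user](userid),
        address varchar(100) NOT NULL
    )
    go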

    To solve your problem as stated, though, I think you have no way other than to use cursors, since the number of equal rows you need to concatenate will vary.
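
    Not tested, but a minimal sketch of that cursor approach in T-SQL might look like this (the [user] column names, the #combined temp table, and the '/' separator are assumptions taken from the first post):

    DECLARE @userid varchar(10), @name varchar(100), @address varchar(100)
    DECLARE @prev_userid varchar(10), @prev_name varchar(100), @combined varchar(4000)

    CREATE TABLE #combined (userid varchar(10), name varchar(100), address varchar(4000))

    DECLARE addr_cur CURSOR FOR
        SELECT userid, name, address FROM [user] ORDER BY userid

    OPEN addr_cur
    FETCH NEXT FROM addr_cur INTO @userid, @name, @address

    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @userid = @prev_userid
            SET @combined = @combined + '/' + @address  -- same user: append the address
        ELSE
        BEGIN
            -- new user: write out the previous one, then start accumulating again
            IF @prev_userid IS NOT NULL
                INSERT INTO #combined VALUES (@prev_userid, @prev_name, @combined)
            SELECT @prev_userid = @userid, @prev_name = @name, @combined = @address
        END
        FETCH NEXT FROM addr_cur INTO @userid, @name, @address
    END

    IF @prev_userid IS NOT NULL  -- write out the last user
        INSERT INTO #combined VALUES (@prev_userid, @prev_name, @combined)

    CLOSE addr_cur
    DEALLOCATE addr_cur

    -- replace each user's rows with the single concatenated row
    DELETE FROM [user] WHERE userid IN (SELECT userid FROM #combined)
    INSERT INTO [user] (userid, name, address)
        SELECT userid, name, address FROM #combined
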
    Davide Mauri
    http://www.davidemauri.it

  3. #3
    Join Date: Mar 2003 | Posts: 10
    Thanks for the reply, but I already solved the problem. I ended up doing it with for loops. It took a long time to execute, but it's a one-time operation.

    ~Andrei

  4. #4
    Join Date: Oct 2003 | Posts: 1
    Originally posted by Andrei_Girenkov
    Thanks for the reply, but I already solved the problem. I ended up doing it with for loops. It took a long time to execute, but it's a one-time operation.

    ~Andrei
    This could be for you or any other reader. It's an example I put together that shows you efficient ways to find duplicate records in a database table. It also applies to most relational databases that support the Structured Query Language (SQL).



    -- set up a small test table; record_id is varchar, so quote the literals
    create table tmp_awahs_dupes_killer
    (
        record_id varchar(100),
        name varchar(100)
    )
    go

    insert into tmp_awahs_dupes_killer values('100', 'In')
    insert into tmp_awahs_dupes_killer values('200', 'These')
    insert into tmp_awahs_dupes_killer values('300', 'Rows')
    insert into tmp_awahs_dupes_killer values('400', ',')
    insert into tmp_awahs_dupes_killer values('500', 'These')
    insert into tmp_awahs_dupes_killer values('500', 'Are')
    insert into tmp_awahs_dupes_killer values('500', 'The')
    insert into tmp_awahs_dupes_killer values('500', 'Dupes')
    go

    -- return every row whose record_id appears more than once
    select *
    from tmp_awahs_dupes_killer t1
    where (select count(t2.record_id)
           from tmp_awahs_dupes_killer t2
           where t2.record_id = t1.record_id) > 1

    drop table tmp_awahs_dupes_killer
    go

  5. #5
    Join Date: Jun 2003 | Location: Ohio | Posts: 12,592 | Provided Answers: 1
    Manowar is right. Your database is already denormalized enough. I predict a post six months from now requesting help parsing all those concatenated addresses into separate elements again.

    awahteh: I suspect your query would be more efficient using Andrei_Girenkov's uncorrelated subquery method (the IN ... GROUP BY ... HAVING query in post #1), though the optimizer might convert your syntax to that plan before execution anyway. If it doesn't, you will end up executing your subquery once for every record in your main table, while Andrei_Girenkov's method runs the aggregate only once.
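
    For example (just a sketch using awahteh's test table, and as noted, the optimizer may already generate the same plan):

    SELECT *
    FROM tmp_awahs_dupes_killer
    WHERE record_id IN (
        SELECT record_id
        FROM tmp_awahs_dupes_killer
        GROUP BY record_id
        HAVING COUNT(*) > 1 )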

    blindman
