Results 1 to 5 of 5
  1. #1
    Join Date
    Mar 2004
    Location
    Glasgow, Scotland
    Posts
    107

    Unanswered: Remove Duplicate Data

    I have a query that for one reason or another produces duplicate information in the result set. I have tried using DISTINCT and GROUP BY to remove the duplicates but because of the nature of the data I cannot get this to work, here is an example fo the data I am working with

    ID Name Add1 Add2
    1 Matt 16 Nowhere St Glasgow
    1 Matt 16 Nowhere St Glasgow, Scotland
    2 Jim 23 Blue St G65 TX
    3 Bill 45 Red St
    3 Bill 45 red St London

    The problem is that a user can have one or more addresses!! I would like to be able to remove the duplicates by keeping the first duplicate ID that appears and getting rid of the second one. Any ideas?

    Cheers

  2. #2
    Join Date
    Dec 2004
    Posts
    5
    1. think about normalizing your data ...
    2. add some primary key (identity column primaryid?)
    3. then try following

    select * from yourtable y
    where y.primaryid = (select min(yourtable.primaryid) from yourtable y2 where y2.id = y.id)

    or

    select * from yourtable y1
    join (select min(primaryid) as primaryid from yourtable group by id) y2 on y2.primaryid = y1.primaryid

    or something like that...

  3. #3
    Join Date
    Mar 2004
    Location
    Glasgow, Scotland
    Posts
    107
    Hi Ludenka,

    Thanks for that. Unfortunately Normalising the data is not an option, I just have to work with it the way it is...Managed to get it to work using a rather long winded stored procedure but I'll have a look at your suggestion and see if I can do it a better way:

    DECLARE @patno int
    DECLARE @patfname varChar(255)
    DECLARE @patsname varChar(255)
    DECLARE @DOB DATETIME
    DECLARE @patadd1 varChar(255)
    DECLARE @patadd2 varChar(255)
    DECLARE @PreviousPatientID int

    CREATE TABLE #tmp_tblPrivDemo (
    patno int,
    patfname varChar(255),
    patsname varChar(255),
    DOB DATETIME,
    patadd1 varChar(255),
    patadd2 varChar(255)
    )

    --First of all load all Patient details into a cursor
    DECLARE PatientIDCursor CURSOR FOR
    SELECT AxTbl_Patient.Patient_ID,
    AxTbl_Patient.Forename1,
    AxTbl_Patient.Surname,
    AxTbl_Patient.Date_Of_Birth,
    AxTbl_Patient_Address.Address1,
    AxTbl_Patient_Address.Address2
    FROM
    AxTbl_Patient
    LEFT JOIN AxTbl_Patient_Address ON AxTbl_Patient_Address.Patient_ID = AxTbl_Patient.Patient_ID
    LEFT JOIN AxTbl_Patient_GP ON AxTbl_Patient_GP.Patient_ID = AxTbl_Patient.Patient_ID
    LEFT JOIN [SpFrm_Follow_Up_Death_Details_v1-0] ON [SpFrm_Follow_Up_Death_Details_v1-0].Patient_ID = AxTbl_Patient.Patient_ID
    INNER JOIN [SpFrm_Diagnostic_Procedures_v1-1] ON [SpFrm_Diagnostic_Procedures_v1-1].Patient_ID = AxTbl_Patient.Patient_ID
    WHERE
    (([SpFrm_Diagnostic_Procedures_v1-1].[Date of Diagnosis] >= '01-Apr-2003' AND [SpFrm_Diagnostic_Procedures_v1-1].[Date of Diagnosis] <= '30-Sep-2003') AND (AxTbl_Patient_Address.IsCurrentAddress = 'yes' OR AxTbl_Patient_Address.IsCurrentAddress IS NULL) AND (AxTbl_Patient_GP.Date_GP_Changed IS NULL))

    OPEN PatientIDCursor
    FETCH NEXT FROM PatientIDCursor INTO @patno, @patfname, @patsname, @DOB, @patadd1, @patadd2
    SET @PreviousPatientID = 0

    WHILE @@FETCH_STATUS = 0
    BEGIN
    --Then only add to temp table if not already added, so no dups.
    IF @PreviousPatientID <> @patno
    BEGIN
    INSERT INTO #tmp_tblPrivDemo
    SELECT @patno, @patfname, @patsname, @DOB, @patadd1, @patadd2
    END
    SET @PreviousPatientID = @patno
    FETCH NEXT FROM PatientIDCursor INTO @patno, @patfname, @patsname, @DOB, @patadd1, @patadd2

    END --end loop.
    CLOSE PatientIDCursor
    DEALLOCATE PatientIDCursor

    SELECT * FROM #tmp_tblPrivDemo ORDER BY patno
    DROP TABLE #tmp_tblPrivDemo

  4. #4
    Join Date
    Dec 2004
    Posts
    5
    well, normalizing is not important for the task, it was only a hint...

    are you able to add the identity column?
    if yes, then my solution should work
    if not, you could replace all of those

    y1.primaryid = y2.primaryid

    with comparison off all in table included columns, but ... it's not nice at all

  5. #5
    Join Date
    Feb 2004
    Posts
    88
    Quote Originally Posted by mmcdonald
    I have a query that for one reason or another produces duplicate information in the result set. I have tried using DISTINCT and GROUP BY to remove the duplicates but because of the nature of the data I cannot get this to work, here is an example fo the data I am working with

    ID Name Add1 Add2
    1 Matt 16 Nowhere St Glasgow
    1 Matt 16 Nowhere St Glasgow, Scotland
    2 Jim 23 Blue St G65 TX
    3 Bill 45 Red St
    3 Bill 45 red St London

    The problem is that a user can have one or more addresses!! I would like to be able to remove the duplicates by keeping the first duplicate ID that appears and getting rid of the second one. Any ideas?

    Cheers
    perhaps a quicker way of doing it than your cursor would be:

    declare @temp_name_addr TABLE (
    id int not null,
    name char(n) not null,
    add1 char(n) null,
    add2 char(n) null
    )

    insert @temp_name_addr (id, name)
    select distinct id, name
    from permanent_table

    update T
    set T.add1 = P.add1,
    T.add2 = P.add2
    from @temp_name_addr T,
    permanent_table P
    where T.id = P.id

    select * from @temp_name_addr

    ...of course which address you get will be arbitrary (both in your solution and mine) there's no rigid "first" without an order by clause...

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •