Results 1 to 15 of 15
  1. #1
    Join Date
    Feb 2004
    Location
    Alpine Califormia
    Posts
    1,789

    Unanswered: Deleting Duplicate Records

    Hi guys I am trying to delete some duplicate records out of a table. So far I have been able to identify them. Just trying to do some clean up, of uneccessary updates.

    Code:
    CREATE TABLE [dbo].[Student_tbl](
    	[FamilyID] [nvarchar](255) NOT NULL,
    	[Parent ID] [nvarchar](255) NULL,
    	[Referral Location] [nvarchar](255) NULL,
    	[Referral Date] [nvarchar](255) NULL,
    	[Student First Name] [nvarchar](255) NULL,
    	[Student Last Name] [nvarchar](255) NULL,
    	[Student SS#] [nvarchar](255) NULL,
    	[Student Sex] [nvarchar](255) NULL,
    	[Student Date of Birth] [nvarchar](255) NULL,
    	[StudentID] [int] IDENTITY(1,1) NOT NULL,
     CONSTRAINT [PK_Student_tbl] PRIMARY KEY CLUSTERED 
    (




    Code:
    SELECT [Parent ID],Count([Parent ID]) AS NumberOfDups
    From Student_tbl
    GROUP BY [Parent ID]
    HAVING COUNT(*) >1

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Whoa there lady!!!

    Based on what I see of your query, I think you might be headed for huge trouble. What exactly is a [Parent ID] in this table?

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Join Date
    Feb 2004
    Location
    Alpine Califormia
    Posts
    1,789
    Quote Originally Posted by Pat Phelan
    Whoa there lady!!!

    Based on what I see of your query, I think you might be headed for huge trouble. What exactly is a [Parent ID] in this table?

    -PatP

    the [Parent ID] is not the pk. its nvarchar(255) which has a combination of the first and last name. This is a database that someone else created, so I'm doing allot of clean up. an example of it is

    [Parent ID] Numberofdups

    alecyvonne 2


    allendeena 5
    Last edited by desireemm; 05-14-09 at 20:16.

  4. #4
    Join Date
    Nov 2004
    Location
    Canada
    Posts
    58
    Could you not just create a table with a primary key and select distinct into?

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    So a quick sniff test of that query tells me that allendeena has five children, which tells me that there might be duplicate records but doesn't tell me if there are any duplicate records. Am I understanding that correctly?

    What do you get if you run:
    Code:
    SELECT *
       FROM [Student_tbl] AS a
       WHERE 1 < (SELECT Count(*)
          FROM [Student_tbl] AS b
          WHERE  b.[Parent ID] = a.[Parent ID])
       ORDER BY a.[Parent ID], a.[StudentID]
    Do you find any siblings?

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  6. #6
    Join Date
    Feb 2004
    Location
    Alpine Califormia
    Posts
    1,789
    Quote Originally Posted by Pat Phelan
    So a quick sniff test of that query tells me that allendeena has five children, which tells me that there might be duplicate records but doesn't tell me if there are any duplicate records. Am I understanding that correctly?

    What do you get if you run:
    Code:
    SELECT *
       FROM [Student_tbl] AS a
       WHERE 1 < (SELECT Count(*)
          FROM [Student_tbl] AS b
          WHERE  b.[Parent ID] = a.[Parent ID])
       ORDER BY a.[Parent ID], a.[StudentID]
    Do you find any siblings?

    -PatP

    Hi pat, when I run the query here is what I get. I am trying to get these two tables Parent_tbl and Student_tbl to have a relationship but its not working. Because a student can be a child or an adult. and if its the adult is the childs parent they want ot track the parent because a parent can also be a student. Also they have different activities they do. Which is another table to track thier activities. I hope that makes sense
    Attached Thumbnails Attached Thumbnails results.jpg  
    Last edited by desireemm; 05-15-09 at 13:02.

  7. #7
    Join Date
    Jul 2007
    Location
    Belgium
    Posts
    15
    So if I understand correctly, a student can exist without a parent? If that is your "business rule", then I think you might be going about the modeling process from the wrong angle.

    You may want to consider having a bridge table between Student_tbl and Parent_tbl, called StudentWithParent or something. In StudentWithParent, you will have 2 columns: Student_ID and Parent_ID. If a student has a parent, a row will exist in the bridge table to show the relationship. If the student does not have a parent (because the student is an adult, or an orphan I guess), then no row will exist in the bridge table.

    So,

    Student_tbl 1-to-n StudentWithParent n-to-1 Parent_tbl

    where n >= 0

    And while you're at it, consider creating surrogate primary keys for each row in Student_tbl and Parent_tbl to use in StudentWithParent. A surrogate key is simply a meaningless, usually sequential, integer number assigned uniquely to each row in a table. They're perfect for joins and representing relationships as mentioned.

    There are other ways to model this problem. If the above doesn't work for you, then we can try another approach.

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Based on the results that you posted, none of those look like duplicate students to me.

    This kind of relationship gets complex as you get into multi-generational situations. One person cab be both a child and a parent (where they were enrolled as a child, then they have childern and enroll those children too).

    I'd approach this with a single table for people. Build one view from that table to show parents (people that have children in the system) and another view to show children (people that have parents in the system). Keep in mind that it is possible for any given person in the people table to show up in either the parent or the children views and possibly in both of the views.

    Becasue family relationships are complex, I'd have a separate table something like what Tod suggested. It isn't uncommon these days for some children to have biological parents as well as one or more step parents, and foster parents often come into the picture too. In order to keep your sanity when modeling this, consider the child to be the "anchor" in the family table, since you will always have a child in every family relationship that matters for your program, but you won't always have all of the possible adults.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  9. #9
    Join Date
    Feb 2004
    Location
    Alpine Califormia
    Posts
    1,789
    Quote Originally Posted by Pat Phelan
    Based on the results that you posted, none of those look like duplicate students to me.

    This kind of relationship gets complex as you get into multi-generational situations. One person cab be both a child and a parent (where they were enrolled as a child, then they have childern and enroll those children too).

    I'd approach this with a single table for people. Build one view from that table to show parents (people that have children in the system) and another view to show children (people that have parents in the system). Keep in mind that it is possible for any given person in the people table to show up in either the parent or the children views and possibly in both of the views.

    Becasue family relationships are complex, I'd have a separate table something like what Tod suggested. It isn't uncommon these days for some children to have biological parents as well as one or more step parents, and foster parents often come into the picture too. In order to keep your sanity when modeling this, consider the child to be the "anchor" in the family table, since you will always have a child in every family relationship that matters for your program, but you won't always have all of the possible adults.

    -PatP

    thank you so much guys, thats what I needed to know. That helps allot becasue I was not sure as to how to handle this one.

  10. #10
    Join Date
    Feb 2004
    Location
    Alpine Califormia
    Posts
    1,789
    Quote Originally Posted by Pat Phelan
    Based on the results that you posted, none of those look like duplicate students to me.

    This kind of relationship gets complex as you get into multi-generational situations. One person cab be both a child and a parent (where they were enrolled as a child, then they have childern and enroll those children too).

    I'd approach this with a single table for people. Build one view from that table to show parents (people that have children in the system) and another view to show children (people that have parents in the system). Keep in mind that it is possible for any given person in the people table to show up in either the parent or the children views and possibly in both of the views.

    Becasue family relationships are complex, I'd have a separate table something like what Tod suggested. It isn't uncommon these days for some children to have biological parents as well as one or more step parents, and foster parents often come into the picture too. In order to keep your sanity when modeling this, consider the child to be the "anchor" in the family table, since you will always have a child in every family relationship that matters for your program, but you won't always have all of the possible adults.

    -PatP

    I ran that query on the Student Table and there was allot of duplicates there, it does not have a relationship with parent table (could not enforce referential integrity).

  11. #11
    Join Date
    Jul 2007
    Location
    Belgium
    Posts
    15
    If no relationship exists, then creating a relationship to the parent table is optional (n = 0) and is best represented using Pat's technique or through some form of bridge table.

    Also, surrogate primary keys can really be helpful in these sorts of situations.

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I'm very much in favor of surrogate keys, but I also define unique constraints for the natural keys too. I have surrogate keys on virtually every table that I create because I avoid using any column the user can "see" for primary keys if I can. If any user can see it (clerk, manager, or heven forbid legislator) then sooner or later someone will want to change it.

    Natural Keys are logically equivalent to Primary Keys, so both need to be protected.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  13. #13
    Join Date
    Jul 2007
    Location
    Belgium
    Posts
    15
    Exactly. So in this case, the candidate key would be the natural key (or composite attributes that define a unique row), and he surrogate would be the primary key...

  14. #14
    Join Date
    Feb 2004
    Location
    Alpine Califormia
    Posts
    1,789
    Quote Originally Posted by tod.mckenna
    So if I understand correctly, a student can exist without a parent? If that is your "business rule", then I think you might be going about the modeling process from the wrong angle.

    You may want to consider having a bridge table between Student_tbl and Parent_tbl, called StudentWithParent or something. In StudentWithParent, you will have 2 columns: Student_ID and Parent_ID. If a student has a parent, a row will exist in the bridge table to show the relationship. If the student does not have a parent (because the student is an adult, or an orphan I guess), then no row will exist in the bridge table.

    So,

    Student_tbl 1-to-n StudentWithParent n-to-1 Parent_tbl

    where n >= 0

    And while you're at it, consider creating surrogate primary keys for each row in Student_tbl and Parent_tbl to use in StudentWithParent. A surrogate key is simply a meaningless, usually sequential, integer number assigned uniquely to each row in a table. They're perfect for joins and representing relationships as mentioned.

    There are other ways to model this problem. If the above doesn't work for you, then we can try another approach.

    How would I go about creating this surrogate key?? I dont think I am creating this bridge table correctly. I created a family_tbl

    Code:
    CREATE TABLE [dbo].[Family_tbl](
    	[StudentID] [nvarchar](255) NULL,
    	[ParentID] [nvarchar](50) NULL,
    	[Child_Parent] [nvarchar](50) NULL,
    	[FamilyID] [int] IDENTITY(1,1) NOT NULL,
     CONSTRAINT [PK_Family_tbl] PRIMARY KEY CLUSTERED 
    (
    	[FamilyID] ASC
    Last edited by desireemm; 06-24-09 at 11:11.

  15. #15
    Join Date
    Feb 2004
    Location
    Alpine Califormia
    Posts
    1,789
    ok I'm gtg now

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •