Results 1 to 8 of 8
  1. #1
    Join Date
    Feb 2012
    Posts
    13

    Unanswered: Having trouble normalizing a database

    I don't have a lot of experience with SQL or databases. Most of the work I've done in the past has been with Excel, which is very different. But my wife asked me to help here design an Access database. This database will be used at a school (university) to hold basic data about the students in the program.
    I had received the original data in the form of two excel spreadsheets, which I had to merge. I then imported them into Access into one table. I knew that to make things easier, that I had to normalize the table.

    The original table:
    students
    • student_id
    • W_number
    • last_name
    • first_name
    • middle
    • address
    • city
    • state
    • zip_code
    • phone_number
    • email
    • gpa
    • standing
    • major_code
    • degree_code
    • college
    • major
    • graduation_date
    • plans
    • status


    I went ahead and created new tables for major, degree, college, graduation date, standing and status. I also created a new table for location with zip, city and state in it, then used a junction table to connect it to the main terminal. Finally, I created fields in the main students table to act as foreign keys for all of the new tables (major_id, degree_id, etc.) and created the relationships between the main tables and the new tables.
    The issue I'm running into now is how to get the data out of the original columns and into the new tables. None of the updates that I've tried have worked. I've been able to figure out the data I need using SELECT, but can't get it to update. Also need to know how to update the new columns (foreign keys) with the id numbers from the new terminals.
    I'm sure this is something really easy that I'm missing, but I just can't figure it out
    Any help would be appreciated.
    Thanks

    Harlan

  2. #2
    Join Date
    Feb 2012
    Posts
    13
    If it helps, here is a diagram of what I am trying to do. As I said before, I don't have a ton of experience with databases, so if what I'm doing isn't correct, let me know. I'm always looking to learn more
    Attached Files Attached Files

  3. #3
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    I don't think you 'get' normalisation as yet.
    there should be no need to have the graduation data in a separate table, its an element of the students personal record. im not to certain what you are trying to model but in an ideal world you'd probably have the courses a struden is studying in a separate (intersection) table.
    ie
    courses
    st.rudents
    students courses, the PK of which would be the course id and the student id

    also im nto sure why you feel you need to have a table for student zip codes, the student zip code is an attribute of the students address, pulling it out into a separate table doens't do anything (well if it does I don't understand it..)

    I don't fully nderstand the US zip code system but I don't understand what what you zip code entity is trying to represent. I would expect there to be other tables such as city and state as parent tables to that table. I presume that a zip code in itself should be unique enough not to require a zipcodeID aswell as a zipcode

    similarly Im surprised to see in your major table a major_id and major_code. I would have expected the major_code itself to be unique and therefore make an ID colun redundant
    there are also other tabels Id expect to see but arent there, such as college...

    to answer you question of how do you stuff your exel data into your tables, well the answer is a series of insert queries.
    first off import your excel data into your access db
    take note of the column and table names.
    then you need to 'stuff' the data into the 'parent' tables first.
    eg
    lets say you want to populate the majordegree table
    Code:
    insert into major (major_code,Major) select distinct excel_col1, excel_col2 from anexceltable
    that assumes that the major code and course name are in a table called anexceltable and are stored in columns called excel_col1 (major_code) and excel_col2 (major). you will naturally have to change these to refledct the names in your application

    do that for the other parent tables...

    then to make ceretain you use the foreign key values you need to JOIN to the parent tables to get the right foreign keys. because you are starting out Id suggest that you do this an easier way, by just importing the data directly into the student table (including the descriptions/values of the parent tables. ie include the major course name, the degree name and so on. add the columns you need for those parent tables foreign key. make certain they are the same datatype (and size) as the parent table)
    then update each ine as a separate query

    update mystudenttable set major_code = select major.major_code from major where major.major = mystudenttable.major

    and then repeat for all other parent tables.
    the advantage of this approach is thaat although it takes longer you can be assured its less complex meaning htere is less chance of a screw up, less risk that a failure in one join wont bring down the whole operation. you can check to make certain your data is 'sane'. if it isn't then modify your query and re run it until you do get sane answers. you haven't lost anything or overwritten anything vital as you still have the original data.
    I'd rather be riding on the Tiger 800 or the Norton

  4. #4
    Join Date
    Feb 2012
    Posts
    13
    I think you kind of misunderstood what it is I'm doing here.
    First off, the database was given to me by my wife. I have no say over what goes in it. I'm just taking information from an Excel spreadsheet and converting it to a database.
    Second, I already imported all of the information into the first table. Excel is now out of the equation.
    Now to answer your questions:
    also im nto sure why you feel you need to have a table for student zip codes, the student zip code is an attribute of the students address, pulling it out into a separate table doens't do anything (well if it does I don't understand it..)

    I don't fully nderstand the US zip code system but I don't understand what what you zip code entity is trying to represent. I would expect there to be other tables such as city and state as parent tables to that table. I presume that a zip code in itself should be unique enough not to require a zipcodeID aswell as a zipcode
    For a US address, there is usually the street and street number (which is not in this database), city, state and zip code (postal code). The reason I was separating out the zip code and giving it its own id (zip_id) is that there are some zip codes that include multiple cities. For an example one zip code (84401) could have as many as two or three cities in it. I'm thinking that instead of using the zip_id, I can just use a composite primary key (zip + city). But then how would I link that to the main table? Because I can't just use the zip as the foreign key (can I?). This is where I'm thinking of using a junction table. But I'm still not sure.
    similarly Im surprised to see in your major table a major_id and major_code. I would have expected the major_code itself to be unique and therefore make an ID colun redundant
    there are also other tabels Id expect to see but arent there, such as college..
    I've decided to do away with the major_id, but I realized I have a problem. There are three majors in the data that I was given. They are all different major codes, but all the same major. It depends on which college they belong to. How should I handle this situation? I'm thinking on joining the major and college tables. Does that make sense?
    As for the graduation date, the reason I've pulled that into it's own table is that many students graduate on the same day. Does that matter? Do I really need that separate table?
    Now on to the data:
    To start, the parent table is going to be students. That's the main table. I've been able to insert most of the data from the parent table into child tables by hand. The only one I'm having an issue with is the graduation table. That's because the graduation_id is not an auto number, so I can't just use an UPDATE statement on the dates. I need to use the UPDATE statement on both the dates and the ID, but I don't know how to increment the ID.
    The other question is how do I UPDATE the foreign keys in the parent table (college_id, standing_id, etc.) to match up with the actual value? None of the statements I've tried have worked. I keep getting errors, though I'm wondering if it's because I'm using Access and the syntax is different.
    I've deleted all of the relationships between the parent and child tables for now until I can get the correct data into the parent table. Then I will DROP the columns I don't need and hope that everything matches up.
    If anyone else is willing to help out, it would be appreciated.

    Thanks

  5. #5
    Join Date
    Feb 2012
    Posts
    13
    So after some digging, I've figured out how to populate the foreign keys. My syntax was incorrect. It was this:
    Code:
    UPDATE students
    INNER JOIN standing
    ON students.standing = standing.standing
    SET students.standing_id = standing.standing_id;
    So I've updated degree, standing and status. Still not sure what the best course of action is for graduation date, the majors and colleges and locations (city, state and zip).

    For the majors and colleges, this is what I have (data-wise):

    College/ Major/ Major Code
    College of Health Professions/ Health Promotion/ 2008
    College of Education/ Health Promotion/ 5013
    College of Education/ Health Promotion/ 5012

    What would be the best way to put this data in tables?

    Thanks

  6. #6
    Join Date
    Mar 2009
    Posts
    5,442
    Provided Answers: 14
    I'm totally with healdem here and I think you answered without analyzing and/or completely understanding his answer.
    Quote Originally Posted by hmsiegel View Post
    First off, the database was given to me by my wife. I have no say over what goes in it.
    What normalization is and what you wife wants are two different things, except if your wife is a database guru, which she is not, obviously (no offence here).
    Quote Originally Posted by hmsiegel View Post
    For a US address, there is usually the street and street number (which is not in this database), city, state and zip code (postal code). The reason I was separating out the zip code and giving it its own id (zip_id) is that there are some zip codes that include multiple cities. For an example one zip code (84401) could have as many as two or three cities in it. I'm thinking that instead of using the zip_id, I can just use a composite primary key (zip + city). But then how would I link that to the main table? Because I can't just use the zip as the foreign key (can I?).
    The zip code still remains an attribute of the student's address, notheless and I don't see any reason to store the adress in a seperate table. Even if there was such a reason, e.g. because students have more than one address, would not change anything: Having different cities with the same zip code simply means that it is not a candidate key (i.e. it could not be used as a primary key in an addresses or cities table) and this address table would have it's own Identity column (which could not be the zip code, see above).
    Quote Originally Posted by hmsiegel View Post
    The only one I'm having an issue with is the graduation table. That's because the graduation_id is not an auto number, so I can't just use an UPDATE statement on the dates. I need to use the UPDATE statement on both the dates and the ID, but I don't know how to increment the ID.
    What refrains you from using an Identity column (autonumber) in that table?

    Generally speaking, here's what I do in such cases:

    1. Select the data that will be moved to a lookup table.

    2. Create that lookup table with an Identity column.

    3. Fill the lookup table with data from the original table (Col0 is the Identity column):
    Code:
    INSERT INTO LookupTable ( Col1, Col2 )
        SELECT Col3, Col5
          FROM OriginalTable
      GROUP BY Col3, Col5
      ORDER BY Col3, Col5;
    4. Add a Foreign Key column to OriginalTable (FK_LookupTable).

    5. Fill FK_LookupTable. The Jet Engine (Access database engine) only accepts a subset of the whole SQL instruction set, so we need to use a rather odd construct (it would be easier in T-SQL on a SQL Server):
    Code:
    UPDATE OriginalTable INNER JOIN 
           LookupTable ON (OriginalTable.Col3 = LookupTable.Col1) AND 
                          (OriginalTable.Col5 = LookupTable.Col2)
       SET FK_LookupTable = LookupTable.Col0;
    6. Check the results:
    Code:
    SELECT OriginalTable.Col0, OriginalTable.FK_LookupTable, OriginalTable.Col5
      FROM OriginalTable LEFT JOIN 
           LookupTable ON OriginalTable.FK_LookupTable = LookupTable.Col0
     WHERE LookupTable.Col0 Is Null;
    7. Correct the non-matching rows that appear in the query here above.

    8. Delete Col3 and Col5 from OriginalTable.

    If you use a junction table, the process is a tad more complex but the principle remains the same.
    Have a nice day!

  7. #7
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    OK, so understanding where things lie at present. You cant make changes to the design becuase its controlled by your wife, yet you are coercing that design into database.


    takign a stpe back, my comment about not fully graspign normalisation still holds. I don't see any crfedible reasonwhy you'd want to store graduation dates in a table of their own, especailly with an autonumber ID as the PK. Why?
    ..well its wrong on several levels. lets assume there is a need for a graduation table (persaonlly I can't see it, but lets presume their is one). a date is unique there is never more than one 15th of August in any one year, so storing the date 15 Aug 2014 is unique in itself, it doesn't need an autogenerated ID to get a unique value.
    there should be no need to be concerned about storage, with modern computers storage is very very cheap, besides which storing the date as opposed to an autogenerated ID will save at most 3, possibly 4 bytes per row.. negligiable in modern times. if you were using a severely limited storage medium then OK mebbe your approach has some merit.
    granted you would have to make certain that you only EVER store the date part in the column. FYI Access like most other db's stores dates and times in the same datatype. see here on how Access stores dates.

    OK so lets look at your graduation date table. does it justify a separate table. I'd argue not. the graduation date is an attribute of the student course or student depedning on your design. it doesnt' justify the need for a separate table. you are not storing anyhtign unique to a graduation date, other than the date itself. so what is the purpose of the table. by havign a spearate table to me that infers there is somethign special about graduation dates. yes graduation is significant to students but its not significant in db terms. Im sure you have your reasons why you want to have a separate table, but Im not convinced its 'right'

    OK so you have multiple courses ('major' from different institutions so that deos indeed justify an auto number column as you have to translate the outside world reference to a common internal reference

    US ZIP codes. its my understanding that the US postal code are not that different to UK Post Codes. the first 3 digits represent the main sorting office, the next identifies the local office and so on. in fact you can download a version of the ZIP database here. again Id argue there is no need to have an intermediate table (student zip) for the zip code its an attribute of the students address (granted you could well argue that perhaps you ought to have a sub table for student address (eg term time, home etc...)).
    however unlike the UK system where 7 digits uniquely identifies a specific delivery address or locale of a small group of address the US system uses 5 or 5+4.. tharts where your problem coudl well lie. but you dont' own the ZIP database so why worry about it? by all means have FK from the zip codes table to the student address.
    I'd rather be riding on the Tiger 800 or the Norton

  8. #8
    Join Date
    Feb 2012
    Posts
    13
    I apologize if I came off as harsh or ungrateful. When I said that the database was given to me by my wife, I was only referring to the data that is in it. You (healdem) said something about courses. Of courses would be nice information to have. But that's not what they are tracking. Just the information that I have.
    I believe that the design of the database is better now. I took your suggestions as well. I didn't move the graduation dates to a separate table. You're right, it didn't make sense. I also did away with the students_zip table, and just created a zip table, although I did use zip_id because of the differences in cities. Also, I realized that there should have only been 2 major codes. The refer to the same program, just in different colleges. So I did away with the college table, and added the college field to the major table.
    So here's my question now. I would like to create a form so that adding the information is easier, although I'm not entirely sure how to do that. I tried creating a form, and it looked nice, but I wasn't able to bring up any of the record or add anything. I'm sure it had to do with the query behind the scenes.
    I'm sure I'll need to create a couple of reports too, thought I'm not sure what they will include yet.
    Thanks for all the help.
    I've included the update model as well as the actual database (in zip).
    As always, any and all suggestions are welcome.
    Thanks again.
    Attached Files Attached Files

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •