Page 1 of 2 12 LastLast
Results 1 to 15 of 18

Thread: Key fields

  1. #1
    Join Date
    Feb 2004
    Posts
    34

    Unanswered: Key fields

    Edit: the question is more clear in http://www.dbforums.com/showthread.p...=1#post4477661

    OK this question is not really SQL server related but since we will build the database in MS SQL Server (2000) i think this is the proper forum to ask this question.

    It's actually a quite simple question i gues, but my experience with databases is very limited (buiding my first serious database actually).

    when i have two tables Organization and Samples coming from this organization with the following fields:

    Code:
    Organization:
    
    InstituteID (identification)
    Name
    Adress
    ...
    
    Sample:
    
    SampleID (sample identification)
    InstituteID (references to the Institute table)
    SampleDate
    Location
    ...
    In general is it better to use the sampleID and InstituteID combined as primary key in the second (sample) table (and build an index on the combined key) or to generate a new unique key for each record? And why should i choose either one of these options?

    One advantage of the first option is you dont have to generate keys yourself (or use some autonumbering system), however the disadvantage is the key grows bigger and bigger when adding more tables (fraction taken from samples, analysis performed on fractions, Analysis giving results).
    Last edited by irenicuz; 10-15-05 at 04:19.

  2. #2
    Join Date
    May 2002
    Posts
    299
    This is an aged old question/debate. Artificial key allows you to have a narrow key but natural key allows you to cover the query. Depending on who you ask, the opinion and supporting *proof* in perf gain can be shown.

    Personally, I would go with natural key but I do use artificial key in the past.

    Here is a sample of previous discussion on such topic.

    http://groups.google.com/groups?q=sq...vs+natural+key
    --
    -oj
    http://www.rac4sql.net

  3. #3
    Join Date
    Feb 2004
    Posts
    34
    I am already reading a lot about it so ill hope i can decide for myself but still any opinions are appreciated. One more thing that might be important is that when using natural keys in deeper tables (such as results) a natural key can result in up to 7 fields to make a record unique.

    Edit: Another thing to consider: It is most likely because of the complexity of data checking (not done by the DBMS) needed before records are inserted most of the data will be inserted by a single administrator executing these procedures.
    Last edited by irenicuz; 10-14-05 at 08:53.

  4. #4
    Join Date
    Feb 2004
    Posts
    34
    One more question:

    If i have say a number of countries (max length 20 characters) should i put them in a seperate table (they dont have aditional attributes) and use integer numbers to refer to this table or just store the full country name in the field? If i do add them in a different table you haev to generate id's for them and need 1 more join to join the data. On the other hand you do save space in the database, but is it really worth the hassle?

  5. #5
    Join Date
    May 2002
    Posts
    299
    If this is an OLTP system, the normalization should be observed. Thus, a seperate table for COUNTRIES would be needed.

    As to your earlier question about the wide multi-column key, I would suggest you search for Kimberly Tripp's article/paper on covering indexes.
    --
    -oj
    http://www.rac4sql.net

  6. #6
    Join Date
    Feb 2004
    Posts
    34
    ok im really a starter so why would that be? For data inserts/updates it will take longer and use more space if not normaliing countries, but for data retreival joining another table will make the time to retreive results longer not?

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    I'm confused. Is SampleID unique for each Sample record, as its name implies? If so, then under no circumstances would you want to combine it with InstituteID as a composite key. It is a primary key on its own.

    If SampleID is not unique for each Sample record, then you might want to rethink your object names. If the Sample record is simply implementing a many-to-many join between Institutes and some other sample table, then an additional surrogate key may not be necessary, but may simplify application programming.

    I am one of the camp that puts a surrogate key on everything, because composite keys become unwieldy after several layers of day, and I believe that consistency in development is important.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  8. #8
    Join Date
    Feb 2004
    Posts
    34
    Different institutes supply unique Samples. Though the SampleID institute 1 gives to a sample can be the same as intitute 2 gives to another sample. That is why only instituteID+SampleID make a record unique.

    From this sample, fractions are taken, the fractions are analyzed and the analysis gives results (so all 1 to many relations)

    if i continue using keys the way i use them in the sample table example i will end up in the results table with the following fields identifying a record:

    InstituteID
    SampleID
    FractionMethodID
    AnalysisMethodID
    MeaurementNumber
    CASNumber (identifying a result)

    So the 2 questions i have:

    Should i replace this key (build mostly from natural keys) with 1 numeric key value (composite key)? This means more joins to retrieve certain information (all tables have to be included to link results to an institute)? What will work faster?

    Or should i replace the natural keys in every table with a composite key and use this composite keys as replacement for natural keys but use them in the same way i use the natural keys now so i am still able to use less joins to retrieve data? (you know what i mean here?)

  9. #9
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    It is a matter of choice and style. I make sure every table I create has a non-composite primary key, usually surrogate.
    Regardless, you will want a unique index/constraint on InstituteID/SampleID if you elect not to make it the primary key.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  10. #10
    Join Date
    Feb 2004
    Posts
    34
    I have read pages and pages about this topic already, but every person, site or newsgroup seems to have a different opinion. That makes it not easier to come to a final decission. Even checking performance afterwards seems no option since at this moment we have no idea about how big the database will grow and this also seems to influences what path to take.

    It all comes down to the following 1 question:

    What works faster for data RETRIEVAL. Normalize as far as u can go and generate surrogate keys in every table where necessary, resulting in more joins to retrieve data or use natural composite keys that can result in keys up to 7 fields BUT not having to join a lot of tables to get the same result? Expect the database to grow pretty large over time.

    If anyone could answer that it would help greatly in coming to a decission about what path to take.

  11. #11
    Join Date
    Feb 2004
    Location
    Burbank & Santa Cruz de la Sierra
    Posts
    3,716
    I think you are trying to make it simpler than it is. You really can't boil it down to that one either-or decision.

    Database design is what it is because thought and trial-n-error have brought it to where it is. Complete/Full normalization is ideal if you want to get anything from anywhere, coming from any direction. The reality is that there are/may be tradeoffs to performance in a fully-normalized DB. This accounts for much of the reason there are different LEVELS of normalization.

    If you really want speed in the retrieval of data, you perhaps should use an indexed flat file, or a single table with a single unique key. Yeah, going way the other way, but you get the idea.

    There is no silver bullet.

    Everything's a tradeoff.

    Which accounts for the number and varied opinions on the subject.
    aka "Paul"
    Non est ei similis.

    I just read "100 Things To Do Before You Die". I was surprised that "Yell for help!!" wasn't one of them

  12. #12
    Join Date
    Feb 2004
    Posts
    34
    I know this is not an easy topic with one streight forward answer. But my problem is that because i am still a rookie with database design i am trying to find out what the things are that are most difficult to do for a DBMS (MS SQL server in this case), before trying to decide with what option i will go (mostly surrogate or natural keys)

    I dont expect a clear answer but am hoping that someone who does have experience with the MS SQL server DBMS and who has build a similar database like i do (a database with say six 1 to many related tables and some lookup tables attached to it) could come up with the basic advantages and disadvantages of using surrogate keys in a database similar to mine.

    If i have 6 tables and the 6th table uses the primary key of the 5th table + 1 field to make tuples in in this 6th table unique i just have the feeling it is not right since you have a lot of fields in more than 1 table (not properly normalized if u ask me). It feels better to just create 1 surrogate key for each 1 of these fields (which does result in having one surrogate primary key unqiue for a tuple and having a surrogate reference key refering to its parent table.)

    But am i right here? Will Select queries run fast in both systems as long as the tables are properly indexed or does 1 of these methods have a sgnificant disadvantage opposed to the other?

    I just want to get informed well before i am making a decission based on a feeling that is not right

  13. #13
    Join Date
    Aug 2005
    Posts
    100
    If the database is expected to be large, you do not want to carry around any extra storage space. Bigger backups, more resources to archive and purge data etc.

    It's not a question of all or nothing, it's possible to use natural keys to a certain level in the hierarchy, then at some point assign a surrogate identity for further referencing from that point.

    Personally I prefer to use natural keys, but use surrogates when it makes sense to do so.

    As for the performance questions you had, I think neither method has a distinct advantage over the other. That depends more on usage patterns of the data (and indexes). But methodwise, properly implemented both are ok.

    rockmoose

  14. #14
    Join Date
    Feb 2004
    Posts
    34
    So when does it make sense for you to use surrogates?

  15. #15
    Join Date
    Feb 2004
    Posts
    34
    After reading a lot i am leaning toward surrogate keys now. Mostly because using a number of colums as primary key and clustering them makes the the indexes larger, which can result in deecreased performance. Next to that a lot more storage is required and updates to data (that won't hapen often though) require a lot of cascade updating.

    The only big disadvantage is having to use most tables in every query to filter on certain data. However i have the feeling that when the DBMS is smart enough in query optimization this shoulden't cause a too high performance penalty

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •