Results 1 to 9 of 9
  1. #1
    Join Date
    May 2008
    Posts
    3

    Q about normalization and functional dependencies

    Hi everyone!

    I have a university project to design a database system consisting of a bunch of entity sets and relationships. I have a basic idea for a system which would benefit me in my electronics hobby, basically an application to help me keep track of what components I have where, how many, and how many of them are needed for a given project.

    The requirements for the system is for all relation schemas to be in BCNF, and I basically know what this means. However, in choosing the schemas for my relations, I become unsure as to how functional dependencies are actually determined or "how they come about".

    Take for example the case where I have an entity set Project(projectid, projectname, projectdescription). My idea is to have pID as the key for the relation and allow several projects with the same names (and theoretically even the same descriptions, down to the very last case-sensitive letter). Thus I would imagine that there would be a FD like this: pID -> pName pDesc. However, I start to wonder if simply using pID is enough of a key, or can there be another functional dependency between e.g. pDesc -> pName? Because it's very unlikely that I'll assign the same description to two projects, the description could be used to identify a given project, though that's not the idea. My idea is to open my software application and display a list of all projects in the database (including their ID numbers) and select one that I want to work on. Fetch the ID number of the tuple and then continue work.

    So basically what I'm asking is how do I know when functional dependencies appear? Do I choose them myself at design time e.g. by specifying that pID is the only property that can be guaranteed to be unique? Or are FDs determined all the time depending on what data is in the relation? If I have for example two tuples in the Project relation, (1, "Project name", "Description 1") and (2, "Project name", "Description 2") then judging from this there would be actually three FDs, right? pID -> pName, pDesc -> pName, pID pDesc -> pName. Most literature that I have read (books, webpages etc.) seem to derive functional dependencies from the actual tuples in a relation, rather than from the purpose and intended operation of the schema or relation as such.

    I was hopeful that someone could set the records straight on this one. I'm so confused that I don't know what's left or right anymore.

    Cheers

    Mankku

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    I'm not really an expert in formal normalisation, but since nobody else has responded I'll have a go...

    I agree that tutorials and exercises on normalisation often give the impression that it is done by examining a sample set of data - that's a false impression. In reality you need to ask and obtain answers to questions about the data like "can two projects ever have the same description" or "how do you uniquely identify a project". Typically this is done by the analyst (or developer acting as analyst) interviewing experts in the subject area - though often common sense will tell you the answer anyway.

    Attributes like Project ID, Project Name and Project Description are all potentially candidate keys - i.e. if the business decrees that no 2 projects shall ever have the same Name or Description then either of these could be used as a key. Project ID should be unique by definition (if it isn't unique, it doesn't deserve the name "ID") so is a candidate key. If they are all candidate keys then they are all functionally dependent on each other i.e. Project ID -> Project Name and Project Name -> Project ID, etc. One of them becomes the Primary Key: this choice is arbitrary in theory, but in practice you would choose the ID as being the most convenient.

    There will be other attributes that are clearly not candidate keys - e. g. Project Start Date (unless the business has some bizarre rule that says you can't start 2 projects on the same day!)

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    My tuppen'th -
    Since this is for uni, just make the project name and desciption unique and candidate keys. Forget whether or not they may possibly, one day, with the right wind behind them be duplicated in reality. If you don't constrain them you have not demonstrated what you have demonstrated in this post, that you understand the concept of keys and dependencies.

    What you then have is the ID field is a surrogate key and the functional dependency issues you are struggling with no longer apply.

    I'm not very academically minded when it comes to relational design either so again you might get another persons more pure take on it all.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  4. #4
    Join Date
    May 2008
    Posts
    3
    Hi, andrewst and pootle flump! Thank you for your replies! I had almost given up hope of ever receiving any replies on this subject.

    My intention all along has been to have the projectID be unique, however I became unsure as to how the rest would go. pootle flump's comment that I should make everything unique sounds quite reasonable, I guess I just couldn't decide that by myself.

    So if we have all three attributes unique then they each form candidate keys, additionally any combination of them are keys (superkeys?) and if we would have designated the projectID as the primary key, then it would be called a surrogate key? I'm just trying to straighten out the terminology so that I can impress my lecturer and maybe score an extra plus on the report. In fact, he hasn't really distinguished between keys more than defining "key" and "superkey". By "key" he refers to a minimal key so I guess that equals a candidate key?

    Thanks for helping me out on this one, as the project progresses I'll probably post a few more questions.

    Cheers

    Mankku

  5. #5
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Any combination of attributes that includes a key is a superkey - e.g. {Project ID, Start Date} or {Project ID, Start Date, Manager Name, Project Status}. A minimal key is a superkey from which no attributes can be removed without losing the key property - e.g. {Project ID}.

    A surrogate key in database theory is a key whose values are internally maintained by the DBMS and not known by the user (who would identify the row by some other key). An artificial key is a key that is not "natural" i.e. has been added for convenience, but is known by the user. Project ID would probably be an artificial key. (However, most people call artifical keys surrogate keys!)

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by andrewst
    (However, most people call artifical keys surrogate keys!)
    a very interesting distinction

    i may want to adopt this terminology

    can you give me an example of a surrogate key within this scheme?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    May 2008
    Posts
    3
    Thanks for the added information, Tony. As for picking out a surrogate key in the Project scheme, if we assume that all columns are required to be unique then projectID could be a value that's not displayed to the user. Instead, we let the user pick projects based on project name and description, and the ID is not shown but can still be used internally. Correct?

    Mankku

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Correct. Of course, one could also argue that the surrogate key is superfluous too
    Testimonial:
    pootle flump
    ur codings are working excelent.

  9. #9
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Quote Originally Posted by r937
    can you give me an example of a surrogate key within this scheme?
    I suppose any "IDENTITY"-type key that assigns a meaningless number to identify each row for use in foreign keys and joins but that is never displayed on a screen or report. This could be where a perfectly good natural key exists like State Code for a States table but the DB designer is a surrogate-zealot so assigns it a State ID anyway.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •