Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Aug 2008
    Posts
    25

    When to split in more tables

    I guess since I'm new to this forum I can start with a silly question about a design issue I'm struggling with. I have this app which will let people track the training they do. It will be a web app hosted on my server, not distributed and run on PHP and Mysql. Very little chance that will change in the next years. I'm sure about that sincde it's a personal project and I don't know any other databases then mysql. So the arguments about "what if you switch databases" is not relevant here.

    So each week users might add a few trainings, which consist of a date, method, category, time, intensity, load and some details. here's my first go at the table design:
    Code:
    CREATE TABLE `mc_trainings` (		
    	`id` INT UNSIGNED NOT NULL AUTO_INCREMENT, 
    	`date` DATE NOT NULL,
    	`method` ENUM('routetraining','bouldertraining') NOT NULL,
    	`category` ENUM('strength','resistance','endurance','everything') NOT NULL DEFAULT 'everything',
    	`time` SMALLINT UNSIGNED NULL,
    	`level` ENUM('3','4','5a','5a+','5b','5b+','5c','5c+','6a','6a+','6b','6b+','6c','6c+','7a','7a+','7b','7b+','7c','7c+','8a','8a+','8b','8b+','8c','8c+','9a','9a+','9b','9b+') NOT NULL,
    	`load`  ENUM('1','2','3','4','5','6','7','8','9','10') NOT NULL DEFAULT '3',	
    	`details` VARCHAR(255) NULL,
    	`user_id` INT UNSIGNED NOT NULL, 			
    	PRIMARY KEY (`id`),
    	FOREIGN KEY (`user_id`) REFERENCES `mc_users` (`id`)
    ) ENGINE=InnoDB;
    From reading a bit here, I know some of you might have a heart attack viewing all those ENUMs in there. I hope you survive ..

    Anyway, if I remove all those in separate tables, I end up with this
    Code:
    CREATE TABLE `mc_trainings` (		
    	`id` INT UNSIGNED NOT NULL AUTO_INCREMENT, 
    	`date` DATE NOT NULL,
    	`method_id` SMALLINT UNSIGNED NOT NULL,
    	`category_id` SMALLINT UNSIGNED NOT NULL,
    	`time` SMALLINT UNSIGNED NULL,
    	`level` ENUM('3','4','5a','5a+','5b','5b+','5c','5c+','6a','6a+','6b','6b+','6c','6c+','7a','7a+','7b','7b+','7c','7c+','8a','8a+','8b','8b+','8c','8c+','9a','9a+','9b','9b+') NOT NULL,
    	`load`  SMALLINT UNSIGNED NOT NULL,	
    	`details` VARCHAR(255) NULL,
    	`user_id` INT UNSIGNED NOT NULL, 			
    	PRIMARY KEY (`id`),
    	FOREIGN KEY (`method_id`) REFERENCES `mc_trainingmethods` (`id`),
    	FOREIGN KEY (`category_id`) REFERENCES `mc_trainingcategories` (`id`),
    	FOREIGN KEY (`user_id`) REFERENCES `mc_users` (`id`)
    ) ENGINE=InnoDB;
    
    # there will be 3 - 5 methods
    CREATE TABLE `mc_trainingmethods` (
    	`id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    	`method` VARCHAR(255) NOT NULL,
    	PRIMARY KEY (`id`)
    ) ENGINE=InnoDB;
    
    # there will be about 5 - 7 categories
    CREATE TABLE `mc_trainingcategories` (
    	`id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    	`category` VARCHAR(255) NOT NULL,
    	PRIMARY KEY (`id`)
    ) ENGINE=InnoDB;
    
    # load will be a number between 0 and 10
    CREATE TABLE `mc_trainingload` (
    	`id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    	`load` SMALLINT UNSIGNED NOT NULL,
    	PRIMARY KEY (`id`)
    ) ENGINE=InnoDB;
    This means every query to retrieve some trainings will mean a join on at least 5 tables. I know that from a pure design point, having all these separate tables might be better, but in practice it just seems so cumbersome.

    Sure you can say that there might be updates to the ENUMs in the first table. I might want to add a category in a year time for example. However, since I am the only developer here, is it really impossible/unwanted to do that?

    Any (other) thoughts?

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    The biggest problem I see is this:
    Quote Originally Posted by Mattiman
    Very little chance that will change in the next years.
    Most spectacular failures usually begin with such statements.

    ENUMs are just constraints, and their use doesn't seem to violate normalization principles for as long as they remain attributes of "training". However, at the very moment they turn into entities themselves (e.g you decide you want to add a description of what "bouldertraining" is), you are doomed.

    Another downside of using ENUMs in your situation is this: if you want to help users enter their data by presenting them with drop-down lists of possible values, you will have to maintain the lists in two places (database and HTML/PHP code). In a sense, that also turns those ENUMs into entities.

    Anyway, if your goal is to just publish a list of "mc_trainings" on the web, do as you wish. However, if you want to learn something about application design, I suggest you start doing things properly from the beginning.
    ---
    "It does not work" is not a valid problem statement.

  3. #3
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    nick, very nice explanation of the evils of ENUM

    good job
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  4. #4
    Join Date
    Aug 2008
    Posts
    25
    Thanks for your reply Nick. I'll try to reply to your statements as best as I can below:
    Most spectacular failures usually begin with such statements
    yes, that's probably true. But I bet there are enough situations where there was indeed no change or very little. The specific situation here is: it's a personal project, I'm learning as I go. I made the choice to use PHP and MySql. I'm not going to build something now, not using any PHP/Mysql specific features, because the app "might have to be ported to .NET and Oracle in 3 years time". If that is the case, that will be so much extra work for me, that migrating some data from one database to another is probably only a small part of the job.

    So I agree with you that there will change things. 100% sure. But I only wanted to describe the context of the project, because I think it makes a difference in how portable everything has to be. This is not a project being handed over to corporate client X, who's going to have to migrate databases in 2 years time because of some higher level decision in the company.

    ENUMs are just constraints, and their use doesn't seem to violate normalization principles for as long as they remain attributes of "training". However, at the very moment they turn into entities themselves (e.g you decide you want to add a description of what "bouldertraining" is), you are doomed.
    That's an interesting point. I will give this some more thought.

    Another downside of using ENUMs in your situation is this: if you want to help users enter their data by presenting them with drop-down lists of possible values, you will have to maintain the lists in two places (database and HTML/PHP code). In a sense, that also turns those ENUMs into entities.
    What I thought I would do is query the db for the possible values and then show those in a dropdown list. But I can see now (haven't researched it fully), that doing a query for the ENUMs is more difficult than doing the same for a list of rows when I have the possible values in a separate table ....

    Anyway, if your goal is to just publish a list of "mc_trainings" on the web, do as you wish. However, if you want to learn something about application design, I suggest you start doing things properly from the beginning.
    Maybe I'm misunderstanding this line, but it comes across as a bit condescending. No need for that. I come here for the exact reason that I want to learn about doing things properly. That's why I ask my question(s).

    So thanks a lot so far for the input, very helpful.

    Quote Originally Posted by r937
    nick, very nice explanation of the evils of ENUM
    Funny, I was almost sure by just reading a few threads on this forum that you'd respond this way
    There are probably (for you) good reasons for this response. But remember that just saying something is evil doesn't explain much. I did read your previous rants about ENUM, and I understand most arguments. But I'm not sure they imply that ENUMs are evil by them selves. There might be many situations in which choosing something else then ENUM for a field has more advantages and less disadvantages. But that's something else then saying something is evil, can absolutely never ever be used, etc

    Anyway, thanks for your input so far guys. If I understand both of you well, I should look into the multi table solution instead of the use of ENUMs for method, category and load. Are there other issues you see which might be improved?

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Mattiman
    Are there other issues you see which might be improved?
    don't use reserved words as column names, as you will be forced forever to escape them, e.g. DATE and TIME and LOAD

    also, i noticed that your main table has a load column, but for some reason you did not define it as a foreign key

    before you do, consider ~not~ using an auto_increment for the load table
    Code:
    CREATE TABLE mc_trainingloads 
    ( trainingload SMALLINT UNSIGNED NOT NULL PRIMARY KEY 
    ) ENGINE=InnoDB;
    then the foreign key would be...
    Code:
    CREATE TABLE mc_trainings 
    ( ...  
    , trainingload SMALLINT UNSIGNED NOT NULL 
    , FOREIGN KEY ( trainingload ) 
           REFERENCES trainingloads ( trainingload )
    , ...
    ) ENGINE=InnoDB;
    a surrogate key here would be quite useless
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  6. #6
    Join Date
    Aug 2008
    Posts
    25
    Quote Originally Posted by r937
    don't use reserved words as column names, as you will be forced forever to escape them, e.g. DATE and TIME and LOAD
    Ah, thanks for mentioning that. I knew about that, but didn't remember it.

    Quote Originally Posted by r937
    also, i noticed that your main table has a load column, but for some reason you did not define it as a foreign key

    before you do, consider ~not~ using an auto_increment for the load table
    It's correct that I forgot to define the load as a foreign key. About the training load (which will be a number between 0 and 10) and not using an auto_increment: it does feel a bit strange to have these two tables:
    Code:
    CREATE TABLE `mc_trainings` (		
      `trainingload`  SMALLINT UNSIGNED NOT NULL,	
      FOREIGN KEY (`trainingload`) REFERENCES `mc_trainingloads` (`trainingload`),
    ) ENGINE=InnoDB;
    
    CREATE TABLE `mc_trainingloads` (
      `trainingload` SMALLINT UNSIGNED NOT NULL PRIMARY KEY
    ) ENGINE=InnoDB;
    In which mc_trainingloads only contains a single or double digit. But maybe that's something to get used to. From the programming view, I might as well ditch the second table and just insert the value of 0-10 direct in the first table. But of course then you miss the database enforcement of the value having to be 0-10.

    About not having a n auto-increment surrogate: If I understand you correctly: you think the surrogate is useless because the value itself is an incrementing number?

    About the other references (to method and category): would it be ok in those cases to use surrogate keys (auto increment ids)?

    Again, thanks for the valuable input

  7. #7
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Quote Originally Posted by Mattiman

    Maybe I'm misunderstanding this line, but it comes across as a bit condescending. No need for that.
    Didn't mean it to sound that way. Let me try and rephrase that. I don't know your background or aspirations. May be you are a professional athlete, or an MD, and your goal is to quickly build something that allows you to collect research information about different approaches to physical training. If that is the case, it does not really matter how you implement your application, as long as it does what you need now. You may never need to touch it again.

    If, however, your goals is, at least in part, to practice design and coding techniques and principles of data and application architecture, which would be the case if your career is in the IT field, your best approach would be to look beyond the short-term advantages of "quick and dirty" solutions and try to see a bigger picture.
    ---
    "It does not work" is not a valid problem statement.

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Mattiman
    But of course then you miss the database enforcement of the value having to be 0-10.
    no, you don't

    the database enforcement means that only those values in the referenced table can be used, so if you had loads of 1,2,3,5,8,15, then only those values would be allowed

    Quote Originally Posted by Mattiman
    About not having a n auto-increment surrogate: If I understand you correctly: you think the surrogate is useless because the value itself is an incrementing number?
    no, not because it is incrementing, which it doesn't have to be, but because the values themselves are perfectly adequate

    a surrogate key is often used when the "real" key (some other column(s) with unique values) would be cumbersome

    for instance, suppose you have a forum database, with discussion threads in various forums, and you wanted to ensure that each thread in a forum had a unique title, but also that thread titles might be repeated across forums, which themselves have unique names

    for instance, each forum might have a sticky thread called "Rules for this forum"

    then the "real" key for a thread would be a composite of the forum title and the thread title

    so far so good, yeah? but this means that referring to a thread would require a foreign key also consisting of composite forum and thread titles -- clearly, this is too cumbersome, and a surrogate integer is used instead

    for your load table, there is no cumber (if i can make up a word), as the surrogate would be exactly the same size as the key it's standing in for

    make sense?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Aug 2008
    Posts
    25
    Hi Nick, thanks for your reply. I can give some more background info about the project (and me): I am a web designer/developer, learning these things on the go. This project is a personal project, but it will be a public web app. So, I am looking for best practices. It's not a quick and dirty thing I create locally for myself. I am trying to learn good design and coding techniques. So any input of why something is not considered a good choice (what are advantages and disadvantages) is very much appreciated!

    In case I do try to critically discuss something, it's not because I want to prove I'm right and someone else is wrong, it's just because I try to discover everything I can about the subject.

    Quote Originally Posted by r937
    no, you don't

    the database enforcement means that only those values in the referenced table can be used, so if you had loads of 1,2,3,5,8,15, then only those values would be allowed
    maybe I didn't write it clearly enough. What I mean is: what if I would not have the second table with "trainingloads"? And just have my PHP logic enforce that trainingloads be one of the allowed values. I have to do that anyways, because the model classes I design will contain all necessary validation rules for the model. So in that sense, any restrictions in the database are "just" extra reinforcements/validations of the data. Does that make sense?

    Quote Originally Posted by r937
    a surrogate key is often used when the "real" key (some other column(s) with unique values) would be cumbersome
    Ok, that is clear I think. I even remember reading some discussions about surrogate keys or not. But then if I follow your logic, what about the other tables for trainingmethods and trainingcategories:

    Code:
    # bouldertraining, routetraining, campusboard, systemboard, fingerboard, gym, ..., ...
    CREATE TABLE `mc_trainingmethods` (
      `id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
      `method` VARCHAR(255) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB;
    
    # there will be about 5 - 7 categories: strength, resistance, endurance, everything, ..
    CREATE TABLE `mc_trainingcategories` (
      `id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
      `category` VARCHAR(255) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB;
    I might as well change those table to:
    Code:
    CREATE TABLE `mc_trainingmethods` (
      `method` VARCHAR(255) NOT NULL PRIMARY KEY
    ) ENGINE=InnoDB;
    
    CREATE TABLE `mc_trainingcategories` (
      `category` VARCHAR(255) NOT NULL PRIMARY KEY 
    ) ENGINE=InnoDB;
    is it not?

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Mattiman
    ... And just have my PHP logic enforce that trainingloads be one of the allowed values. I have to do that anyways, because the model classes I design will contain all necessary validation rules for the model. So in that sense, any restrictions in the database are "just" extra reinforcements/validations of the data. Does that make sense?
    yes, it makes sense, but you have it backwards -- the PHP validation is the "extra" part (although i am not disdaining it, as it makes for a happier user experience)

    the database validation should be in place regardless of what you do in the application -- after all, not all data comes in via the application (maybe in your case it does, but this is not always true), and having all your validation in application code sort of ties you down to using only application code to interact with your data

    and remember, data always outlives applications



    Quote Originally Posted by Mattiman
    is it not?
    no, it is not

    although it would work, most people feel that a VARCHAR(255) foreign key would bloat the data table too much
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Aug 2008
    Posts
    25
    Quote Originally Posted by r937
    and remember, data always outlives applications
    is that so? maybe. never thought about that. maybe I will

    Quote Originally Posted by r937
    although it would work, most people feel that a VARCHAR(255) foreign key would bloat the data table too much
    ok, that sounds like a good argument

  12. #12
    Join Date
    Aug 2008
    Posts
    25
    To continu on the ENUM question: if you have the following field in a table "users":
    Code:
    `user_sex` enum('M','F') default 'M',
    would you go so far in this case to also ditch the ENUM field and create a separate table user_sex, with 2 rows, M and F, and reference that?

  13. #13
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    you're asking me? yes, i probably would

    what if you wanted to add a couple more values, H and T?

    how would you be sure you know what they meant? easiest would be to have a sex name that you could look up in the sex codes table
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  14. #14
    Join Date
    Aug 2008
    Posts
    25
    H and T? M and F is just male and female. I could also do
    ENUM('male','female')

  15. #15
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    yes, H and T

    see? you need a translation, don't you
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •