Page 1 of 7 123 ... LastLast
Results 1 to 15 of 96
  1. #1
    Join Date
    Jun 2011
    Posts
    3

    Best way to design a table with start and end data

    I am designing a database that stores trip data for mileage tracking purposes. The database contains a Trips table that stores date/time, location and odometer information for both the starting and the ending locations. I want to be able to store a partial trip in the database without compromising integrity.

    For example, a user is preparing to depart on a business trip. The user logs in to the web app using his or her mobile device, enters location and odometer information, and starts a new trip. This partial trip is stored on the database, so that once the user arrives, he or she can log in to a different mobile device and enter the destination location and odometer information so that the trip can be completed.

    The Trips table currently stores data for both the starting and ending locations. All columns are non-nullable. Obviously, this does not allow storing null values for the end location data. If I allow null values in the end columns, then my web app frontend has to check the entire table for null end values at startup so it can complete an unfinished trip. This undoubtedly has performance implications when dealing with many users and trips. Also, allowing nulls for all trips compromises data integrity in the Trips table, because only one trip at a time should be unfinished. I don't really want to allow incompleted trips, except for the special case listed above.

    Should I separate the start and end locations into separate tables? Maybe keep track somehow of which trip id is incomplete? If so, how? Is there a better solution? I am very inexperienced with database design, so I would appreciate some help with how to implement a design for the above requirements.

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Your arguments for not allowing nulls are unconvincing.
    If you do not allows nulls, your web app frontend will still have to check the entire table for whatever dummy value you use in order to complete an unfinished trip.
    If you do not allow nulls, then how are you going to keep data integrity from being compromised when more than one trip is unfinished? With dummy values, it will surely LOOK like the trip is finished, but that is a poor substitute for actual data integrity.

    Your performance and integrity issues can be solved with an indexed view on your table.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Jun 2011
    Posts
    3
    You have a good point, and I definitely agree with you. I have gotten some input from other sources, and the current plan is to allow nulls, and to index by UserId to cut down on query times. However, I'd love to hear more. I am not very familiar with indexes (or databases in general), and I only just know about views. Should we use views? Is there some resource you can direct me to that explains these concepts?

    Thanks for your input.

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Do a little research on Indexed Views. Though it is a more advanced SQL Server concept, it may prove beneficial to you. You could, for instance, create an indexed view on your table showing only records with NULL end columns, and put a unique constraint on the view to disallow anybody appearing on it more than once. This will throw an error if someone tries to create two simultaneous start records, and will also greatly speed up accessing the incomplete records.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Jun 2011
    Posts
    3
    Thanks so much for the advice. You've been very helpful.

  6. #6
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    It seems like a trip that has finished and one that is current are two very different database predicates with different attributes. I'd expect to have at least two separate tables for those different types of fact. Logically the starting date/time and finishing date/time would belong in different tables.

    Populating nulls for attributes that don't exist for a current trip is surely a kludge that's only going to cause ambiguity and incorrect results. There's no obvious reason to use either "dummy values" or "dummy nulls" for that case.

    Consider recording and treating your time intervals as half-open intervals. I typically find that it's easier to get consistent results using half-open intervals rather than closed intervals.

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by dportas View Post
    Logically the starting date/time and finishing date/time would belong in different tables.
    david, would you please expand on this a bit

    at first glance, it's remarkably similar to saying that in a person database, logically, the first name and last name would belong in different tables
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    No, no, no, Rudy.
    It is analogous to saying that First Name and Last Name should be stored in one table, but middle name should be stored in a separate table since, as in the case of my girlfriend (such a sweetie), not everyone has a middle name. And "Codd Forbid" we allow null values in a middle name field.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  9. #9
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Database design consists of identifying the types of facts to be recorded in the database and then creating data structures to record those fact types accurately. If the facts you are recording do not have end dates then why add a nonexistent end date to them and populate it with nulls?

    No accurate representation of information requires nulls. In all cases nulls are introduced by the database designer - very often for technical reasons that have nothing to do with the accuracy of the data. Therefore the designer ought to have a good reason in mind before adding nulls to his data model. The default assumption ought to be not to add nulls.

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    david, i think many of us are familiar with the theory, but seriously, where do you draw the line?

    i'd like to see you respond the blindman's NULL middle name situation

    in my opinion, there are times when practicality trumps theory
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Where you draw the line is a good question. At one extreme every database could consist of a single table where every column was nullable. Alternatively you can go for Normal Form and no nulls. I'd suggest that 5NF is generally the safest starting assumption for any database design. One thing that might cause you to deviate from that is if you need to implement some business rule that can't easily be supported between multiple tables. DBMSs sometimes make it hard to implement constraints that reference more than one table (other than the very simplest RI constraints). So if some constraint is important to you then you may have to compromise. The decision needs to be made based on the merits of each particular situation.

    In this example you would probably want a constraint to ensure that StartDate is earlier than EndDate. So if your DBMS doesn't provide a good way to do that then that could be a reasonable justification for combining the data into one single table. Doing so has significant disadvantages though. Completed journeys must have other attributes that don't apply to uncompleted ones. If you allow nulls you create the possiblity that data could be inconsistent in ways that you didn't intend - some attributes of the completed journey could be populated and other attributes not populated. So you may need extra constraints to prevent that. Also the database users will have to deal with erroneous or misleading results caused by the nulls or dummy values that you invent for the attributes which don't apply.

    Quote Originally Posted by r937 View Post
    i'd like to see you respond the blindman's NULL middle name situation
    What is the scenario being proposed? To use a null for a person without a middle name? In a previous discussion I recall Blindman saying that nulls in SQL represent unknown values. That's not true but it is a very common misconception. On that basis null wouldn't make a good choice for the middle name attribute of a person without a middle name. It would be wrong to suggest that a person's middle name was unknown when we know that they don't have a middle name. A zero-length string indicates much more clearly that a person doesn't have a middle name. Where names are mandatory then they probably need a "length>0" constraint. Don't make the name nullable or the constraint won't be enforced. If they are optional then they belong in another table. In one database I worked with I had a table keyed on people's names because deduplication using names and knowing the frequency of each name was important.

    Quote Originally Posted by r937 View Post
    in my opinion, there are times when practicality trumps theory
    Theory is practical. Theory is either practical or wrong - and when it's shown to be wrong then the result is just new theory to replace the old one. Fortunately, most of the basic database design theory that millions of people apply or benefit from every day is very well established and has a proven track record over many years of analysis and use. Theory does not say: "Every database must be in 5NF". Theory says: "If a database is not in 5NF then it can have certain kinds of data integrity problems." Practicality can't ever trump that because in practice it's perfectly true.

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    thank you ... very helpful analysis
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  13. #13
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Using a zero-length string for a middle name is an acceptable solution.
    But splitting start and end dates into separate tables in David's scenario is a horrible idea. It serves absolutely zero practical purpose, and only complicates both the database design and the sql coding required.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  14. #14
    Join Date
    May 2008
    Posts
    277
    Not to wade into the whole "nulls are evil" debate, but what's wrong with using a dummy "infinite" time value to represent in-progress/unfinished trips? PostgreSQL supports 'infinity' as a special value for timestamps and dates; using '9999-12-31' would seem to be an acceptable alternative.

    I suppose NULL might be more semantically correct -- if we don't know when the trip will end, leaving the end time blank makes sense -- but from the standpoint of practicality, I'd think using a dummy value removes a lot of COALESCEs and checking for IS NULL from your SQL.

    It's pretty dry reading, and I've not yet waded through the entire thing, but I've found Richard Snodgrass's Developing Time-Oriented Database Applications in SQL to be quite useful in addressing how to handle time issues in databases. For what it's worth, the solution above is what he recommends.

  15. #15
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    I fail to see how one can justify intentionally putting incorrect data into a table.
    And you aren't saving anything. You're still going to need to write exception code for that dummy value, just like you would for the null.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •