If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Best way to design a table with start and end data

Closed Thread
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 06-02-11, 16:16
davidhaymond davidhaymond is offline
Registered User
 
Join Date: Jun 2011
Posts: 3
Best way to design a table with start and end data

I am designing a database that stores trip data for mileage tracking purposes. The database contains a Trips table that stores date/time, location and odometer information for both the starting and the ending locations. I want to be able to store a partial trip in the database without compromising integrity.

For example, a user is preparing to depart on a business trip. The user logs in to the web app using his or her mobile device, enters location and odometer information, and starts a new trip. This partial trip is stored on the database, so that once the user arrives, he or she can log in to a different mobile device and enter the destination location and odometer information so that the trip can be completed.

The Trips table currently stores data for both the starting and ending locations. All columns are non-nullable. Obviously, this does not allow storing null values for the end location data. If I allow null values in the end columns, then my web app frontend has to check the entire table for null end values at startup so it can complete an unfinished trip. This undoubtedly has performance implications when dealing with many users and trips. Also, allowing nulls for all trips compromises data integrity in the Trips table, because only one trip at a time should be unfinished. I don't really want to allow incompleted trips, except for the special case listed above.

Should I separate the start and end locations into separate tables? Maybe keep track somehow of which trip id is incomplete? If so, how? Is there a better solution? I am very inexperienced with database design, so I would appreciate some help with how to implement a design for the above requirements.
  #2 (permalink)  
Old 06-03-11, 11:14
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Your arguments for not allowing nulls are unconvincing.
If you do not allows nulls, your web app frontend will still have to check the entire table for whatever dummy value you use in order to complete an unfinished trip.
If you do not allow nulls, then how are you going to keep data integrity from being compromised when more than one trip is unfinished? With dummy values, it will surely LOOK like the trip is finished, but that is a poor substitute for actual data integrity.

Your performance and integrity issues can be solved with an indexed view on your table.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
  #3 (permalink)  
Old 06-03-11, 15:14
davidhaymond davidhaymond is offline
Registered User
 
Join Date: Jun 2011
Posts: 3
You have a good point, and I definitely agree with you. I have gotten some input from other sources, and the current plan is to allow nulls, and to index by UserId to cut down on query times. However, I'd love to hear more. I am not very familiar with indexes (or databases in general), and I only just know about views. Should we use views? Is there some resource you can direct me to that explains these concepts?

Thanks for your input.
  #4 (permalink)  
Old 06-03-11, 16:55
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Do a little research on Indexed Views. Though it is a more advanced SQL Server concept, it may prove beneficial to you. You could, for instance, create an indexed view on your table showing only records with NULL end columns, and put a unique constraint on the view to disallow anybody appearing on it more than once. This will throw an error if someone tries to create two simultaneous start records, and will also greatly speed up accessing the incomplete records.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
  #5 (permalink)  
Old 06-03-11, 18:02
davidhaymond davidhaymond is offline
Registered User
 
Join Date: Jun 2011
Posts: 3
Thanks so much for the advice. You've been very helpful.
  #6 (permalink)  
Old 06-06-11, 08:05
dportas dportas is offline
Registered User
 
Join Date: Dec 2007
Location: London, UK
Posts: 732
It seems like a trip that has finished and one that is current are two very different database predicates with different attributes. I'd expect to have at least two separate tables for those different types of fact. Logically the starting date/time and finishing date/time would belong in different tables.

Populating nulls for attributes that don't exist for a current trip is surely a kludge that's only going to cause ambiguity and incorrect results. There's no obvious reason to use either "dummy values" or "dummy nulls" for that case.

Consider recording and treating your time intervals as half-open intervals. I typically find that it's easier to get consistent results using half-open intervals rather than closed intervals.
  #7 (permalink)  
Old 06-06-11, 12:58
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by dportas View Post
Logically the starting date/time and finishing date/time would belong in different tables.
david, would you please expand on this a bit

at first glance, it's remarkably similar to saying that in a person database, logically, the first name and last name would belong in different tables
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
  #8 (permalink)  
Old 06-06-11, 14:20
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
No, no, no, Rudy.
It is analogous to saying that First Name and Last Name should be stored in one table, but middle name should be stored in a separate table since, as in the case of my girlfriend (such a sweetie), not everyone has a middle name. And "Codd Forbid" we allow null values in a middle name field.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
  #9 (permalink)  
Old 06-06-11, 18:28
dportas dportas is offline
Registered User
 
Join Date: Dec 2007
Location: London, UK
Posts: 732
Database design consists of identifying the types of facts to be recorded in the database and then creating data structures to record those fact types accurately. If the facts you are recording do not have end dates then why add a nonexistent end date to them and populate it with nulls?

No accurate representation of information requires nulls. In all cases nulls are introduced by the database designer - very often for technical reasons that have nothing to do with the accuracy of the data. Therefore the designer ought to have a good reason in mind before adding nulls to his data model. The default assumption ought to be not to add nulls.
  #10 (permalink)  
Old 06-06-11, 18:37
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
david, i think many of us are familiar with the theory, but seriously, where do you draw the line?

i'd like to see you respond the blindman's NULL middle name situation

in my opinion, there are times when practicality trumps theory
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
  #11 (permalink)  
Old 06-07-11, 05:49
dportas dportas is offline
Registered User
 
Join Date: Dec 2007
Location: London, UK
Posts: 732
Where you draw the line is a good question. At one extreme every database could consist of a single table where every column was nullable. Alternatively you can go for Normal Form and no nulls. I'd suggest that 5NF is generally the safest starting assumption for any database design. One thing that might cause you to deviate from that is if you need to implement some business rule that can't easily be supported between multiple tables. DBMSs sometimes make it hard to implement constraints that reference more than one table (other than the very simplest RI constraints). So if some constraint is important to you then you may have to compromise. The decision needs to be made based on the merits of each particular situation.

In this example you would probably want a constraint to ensure that StartDate is earlier than EndDate. So if your DBMS doesn't provide a good way to do that then that could be a reasonable justification for combining the data into one single table. Doing so has significant disadvantages though. Completed journeys must have other attributes that don't apply to uncompleted ones. If you allow nulls you create the possiblity that data could be inconsistent in ways that you didn't intend - some attributes of the completed journey could be populated and other attributes not populated. So you may need extra constraints to prevent that. Also the database users will have to deal with erroneous or misleading results caused by the nulls or dummy values that you invent for the attributes which don't apply.

Quote:
Originally Posted by r937 View Post
i'd like to see you respond the blindman's NULL middle name situation
What is the scenario being proposed? To use a null for a person without a middle name? In a previous discussion I recall Blindman saying that nulls in SQL represent unknown values. That's not true but it is a very common misconception. On that basis null wouldn't make a good choice for the middle name attribute of a person without a middle name. It would be wrong to suggest that a person's middle name was unknown when we know that they don't have a middle name. A zero-length string indicates much more clearly that a person doesn't have a middle name. Where names are mandatory then they probably need a "length>0" constraint. Don't make the name nullable or the constraint won't be enforced. If they are optional then they belong in another table. In one database I worked with I had a table keyed on people's names because deduplication using names and knowing the frequency of each name was important.

Quote:
Originally Posted by r937 View Post
in my opinion, there are times when practicality trumps theory
Theory is practical. Theory is either practical or wrong - and when it's shown to be wrong then the result is just new theory to replace the old one. Fortunately, most of the basic database design theory that millions of people apply or benefit from every day is very well established and has a proven track record over many years of analysis and use. Theory does not say: "Every database must be in 5NF". Theory says: "If a database is not in 5NF then it can have certain kinds of data integrity problems." Practicality can't ever trump that because in practice it's perfectly true.
  #12 (permalink)  
Old 06-07-11, 06:10
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
thank you ... very helpful analysis
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
  #13 (permalink)  
Old 06-07-11, 10:28
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Using a zero-length string for a middle name is an acceptable solution.
But splitting start and end dates into separate tables in David's scenario is a horrible idea. It serves absolutely zero practical purpose, and only complicates both the database design and the sql coding required.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
  #14 (permalink)  
Old 06-07-11, 11:19
futurity futurity is offline
Registered User
 
Join Date: May 2008
Posts: 270
Not to wade into the whole "nulls are evil" debate, but what's wrong with using a dummy "infinite" time value to represent in-progress/unfinished trips? PostgreSQL supports 'infinity' as a special value for timestamps and dates; using '9999-12-31' would seem to be an acceptable alternative.

I suppose NULL might be more semantically correct -- if we don't know when the trip will end, leaving the end time blank makes sense -- but from the standpoint of practicality, I'd think using a dummy value removes a lot of COALESCEs and checking for IS NULL from your SQL.

It's pretty dry reading, and I've not yet waded through the entire thing, but I've found Richard Snodgrass's Developing Time-Oriented Database Applications in SQL to be quite useful in addressing how to handle time issues in databases. For what it's worth, the solution above is what he recommends.
  #15 (permalink)  
Old 06-07-11, 12:33
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
I fail to see how one can justify intentionally putting incorrect data into a table.
And you aren't saving anything. You're still going to need to write exception code for that dummy value, just like you would for the null.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Closed Thread

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On