| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |
|

06-02-11, 16:16
|
|
Registered User
|
|
Join Date: Jun 2011
Posts: 3
|
|
|
Best way to design a table with start and end data
|
|
I am designing a database that stores trip data for mileage tracking purposes. The database contains a Trips table that stores date/time, location and odometer information for both the starting and the ending locations. I want to be able to store a partial trip in the database without compromising integrity.
For example, a user is preparing to depart on a business trip. The user logs in to the web app using his or her mobile device, enters location and odometer information, and starts a new trip. This partial trip is stored on the database, so that once the user arrives, he or she can log in to a different mobile device and enter the destination location and odometer information so that the trip can be completed.
The Trips table currently stores data for both the starting and ending locations. All columns are non-nullable. Obviously, this does not allow storing null values for the end location data. If I allow null values in the end columns, then my web app frontend has to check the entire table for null end values at startup so it can complete an unfinished trip. This undoubtedly has performance implications when dealing with many users and trips. Also, allowing nulls for all trips compromises data integrity in the Trips table, because only one trip at a time should be unfinished. I don't really want to allow incompleted trips, except for the special case listed above.
Should I separate the start and end locations into separate tables? Maybe keep track somehow of which trip id is incomplete? If so, how? Is there a better solution? I am very inexperienced with database design, so I would appreciate some help with how to implement a design for the above requirements.
|
|

06-03-11, 11:14
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
Your arguments for not allowing nulls are unconvincing.
If you do not allows nulls, your web app frontend will still have to check the entire table for whatever dummy value you use in order to complete an unfinished trip.
If you do not allow nulls, then how are you going to keep data integrity from being compromised when more than one trip is unfinished? With dummy values, it will surely LOOK like the trip is finished, but that is a poor substitute for actual data integrity.
Your performance and integrity issues can be solved with an indexed view on your table.
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|

06-03-11, 15:14
|
|
Registered User
|
|
Join Date: Jun 2011
Posts: 3
|
|
|
|
You have a good point, and I definitely agree with you. I have gotten some input from other sources, and the current plan is to allow nulls, and to index by UserId to cut down on query times. However, I'd love to hear more. I am not very familiar with indexes (or databases in general), and I only just know about views. Should we use views? Is there some resource you can direct me to that explains these concepts?
Thanks for your input.
|
|

06-03-11, 16:55
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
Do a little research on Indexed Views. Though it is a more advanced SQL Server concept, it may prove beneficial to you. You could, for instance, create an indexed view on your table showing only records with NULL end columns, and put a unique constraint on the view to disallow anybody appearing on it more than once. This will throw an error if someone tries to create two simultaneous start records, and will also greatly speed up accessing the incomplete records.
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|

06-03-11, 18:02
|
|
Registered User
|
|
Join Date: Jun 2011
Posts: 3
|
|
Thanks so much for the advice. You've been very helpful.
|
|

06-06-11, 08:05
|
|
Registered User
|
|
Join Date: Dec 2007
Location: London, UK
Posts: 732
|
|
It seems like a trip that has finished and one that is current are two very different database predicates with different attributes. I'd expect to have at least two separate tables for those different types of fact. Logically the starting date/time and finishing date/time would belong in different tables.
Populating nulls for attributes that don't exist for a current trip is surely a kludge that's only going to cause ambiguity and incorrect results. There's no obvious reason to use either "dummy values" or "dummy nulls" for that case.
Consider recording and treating your time intervals as half-open intervals. I typically find that it's easier to get consistent results using half-open intervals rather than closed intervals.
|
|

06-06-11, 12:58
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
Quote:
Originally Posted by dportas
Logically the starting date/time and finishing date/time would belong in different tables.
|
david, would you please expand on this a bit
at first glance, it's remarkably similar to saying that in a person database, logically, the first name and last name would belong in different tables
|
|

06-06-11, 14:20
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
No, no, no, Rudy.
It is analogous to saying that First Name and Last Name should be stored in one table, but middle name should be stored in a separate table since, as in the case of my girlfriend (such a sweetie), not everyone has a middle name. And "Codd Forbid" we allow null values in a middle name field.
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|

06-06-11, 18:28
|
|
Registered User
|
|
Join Date: Dec 2007
Location: London, UK
Posts: 732
|
|
Database design consists of identifying the types of facts to be recorded in the database and then creating data structures to record those fact types accurately. If the facts you are recording do not have end dates then why add a nonexistent end date to them and populate it with nulls?
No accurate representation of information requires nulls. In all cases nulls are introduced by the database designer - very often for technical reasons that have nothing to do with the accuracy of the data. Therefore the designer ought to have a good reason in mind before adding nulls to his data model. The default assumption ought to be not to add nulls.
|
|

06-06-11, 18:37
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
david, i think many of us are familiar with the theory, but seriously, where do you draw the line?
i'd like to see you respond the blindman's NULL middle name situation
in my opinion, there are times when practicality trumps theory
|
|

06-07-11, 05:49
|
|
Registered User
|
|
Join Date: Dec 2007
Location: London, UK
Posts: 732
|
|
Where you draw the line is a good question. At one extreme every database could consist of a single table where every column was nullable. Alternatively you can go for Normal Form and no nulls. I'd suggest that 5NF is generally the safest starting assumption for any database design. One thing that might cause you to deviate from that is if you need to implement some business rule that can't easily be supported between multiple tables. DBMSs sometimes make it hard to implement constraints that reference more than one table (other than the very simplest RI constraints). So if some constraint is important to you then you may have to compromise. The decision needs to be made based on the merits of each particular situation.
In this example you would probably want a constraint to ensure that StartDate is earlier than EndDate. So if your DBMS doesn't provide a good way to do that then that could be a reasonable justification for combining the data into one single table. Doing so has significant disadvantages though. Completed journeys must have other attributes that don't apply to uncompleted ones. If you allow nulls you create the possiblity that data could be inconsistent in ways that you didn't intend - some attributes of the completed journey could be populated and other attributes not populated. So you may need extra constraints to prevent that. Also the database users will have to deal with erroneous or misleading results caused by the nulls or dummy values that you invent for the attributes which don't apply.
Quote:
Originally Posted by r937
i'd like to see you respond the blindman's NULL middle name situation
|
What is the scenario being proposed? To use a null for a person without a middle name? In a previous discussion I recall Blindman saying that nulls in SQL represent unknown values. That's not true but it is a very common misconception. On that basis null wouldn't make a good choice for the middle name attribute of a person without a middle name. It would be wrong to suggest that a person's middle name was unknown when we know that they don't have a middle name. A zero-length string indicates much more clearly that a person doesn't have a middle name. Where names are mandatory then they probably need a "length>0" constraint. Don't make the name nullable or the constraint won't be enforced. If they are optional then they belong in another table. In one database I worked with I had a table keyed on people's names because deduplication using names and knowing the frequency of each name was important.
Quote:
Originally Posted by r937
in my opinion, there are times when practicality trumps theory
|
Theory is practical. Theory is either practical or wrong - and when it's shown to be wrong then the result is just new theory to replace the old one. Fortunately, most of the basic database design theory that millions of people apply or benefit from every day is very well established and has a proven track record over many years of analysis and use. Theory does not say: "Every database must be in 5NF". Theory says: "If a database is not in 5NF then it can have certain kinds of data integrity problems." Practicality can't ever trump that because in practice it's perfectly true.
|
|

06-07-11, 06:10
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
thank you ... very helpful analysis
|
|

06-07-11, 10:28
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
Using a zero-length string for a middle name is an acceptable solution.
But splitting start and end dates into separate tables in David's scenario is a horrible idea. It serves absolutely zero practical purpose, and only complicates both the database design and the sql coding required.
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|

06-07-11, 11:19
|
|
Registered User
|
|
Join Date: May 2008
Posts: 270
|
|
Not to wade into the whole "nulls are evil" debate, but what's wrong with using a dummy "infinite" time value to represent in-progress/unfinished trips? PostgreSQL supports 'infinity' as a special value for timestamps and dates; using '9999-12-31' would seem to be an acceptable alternative.
I suppose NULL might be more semantically correct -- if we don't know when the trip will end, leaving the end time blank makes sense -- but from the standpoint of practicality, I'd think using a dummy value removes a lot of COALESCEs and checking for IS NULL from your SQL.
It's pretty dry reading, and I've not yet waded through the entire thing, but I've found Richard Snodgrass's Developing Time-Oriented Database Applications in SQL to be quite useful in addressing how to handle time issues in databases. For what it's worth, the solution above is what he recommends.
|
|

06-07-11, 12:33
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
I fail to see how one can justify intentionally putting incorrect data into a table.
And you aren't saving anything. You're still going to need to write exception code for that dummy value, just like you would for the null.
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|