Page 1 of 3 123 LastLast
Results 1 to 15 of 42
  1. #1
    Join Date
    Aug 2008
    Posts
    19

    newbie question about db design

    I'm fairly new to SQL and have a design question. For a small project I have several places where I need the same data structure, representing for each day of the week (Mon to Sun) either a range of hours or no range for that day. For example, a work schedule might be Mon 9am-3pm, Tue 9am-6pm, and Thu 1pm to 11pm, and nothing on the other days.

    But I also need the same thing to represent other things within the same project, such as the hours that a store is open each week, and several other things. For all of these there only needs to be one interval (or none) for each day of the week, and it's sufficient for intervals to start and end on hour boundaries. In a programming language I'd define a class and methods to represent this and then use it over and over. What's the right approach design-wise with SQL?

    I was thinking I'd have a single table of WeeklyIntervals with columns id (primary key), MonStart (0-23), MonEnd (0-23), TueStart, ... and so on. A day without shift would set the corresponding start and end to zero. Then an Employee table might have a column Workschedule that's a foreign key (WeeklyIntervals id). Similarly a Store table might have an Hours column.

    But I also thought about having an Intervals table with each row representing just one interval, and another column to indicate which "set" of intervals it belongs to. So I'd have id, setid, day of week ("Sun", "Mon" ...), start (0-23) and end (0-23). A set of intervals would all share the same setid. This seems cleaner. Having MonStart, MonEnd, TueStart, TueEnd, and so on seems more awkward. But with this simpler design I have to check to see that a set of intervals doesn't have duplicate days. And how do I get a new setid key when I want to create a new set? And how is an empty set of intervals (no intervals for any day of the week) going to be represented? A setid that matches zero rows would do it, but then I'm back to how to get the next setid to use, since a setid that has no rows in the table may currently be used as a foreign key in another table. This initially seemed cleaner, until I started trying to figure out how to make it work!

    Any pointers or suggestions would be welcome!

  2. #2
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Welcome to the forums Bo Baffett!

    I've moved this thread to the Database Concepts & Design topic, where it should get the best responses
    George
    Home | Blog

  3. #3
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Bo Baffett
    And how is an empty set of intervals (no intervals for any day of the week) going to be represented?
    i'll answer this part first -- by rows that aren't there!!!

    the day of the week should be part of the table's key
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    This sounds like homework to me. If it is homework, we need to understand a lot more about the class and the assignment to give you much useful help. If it isn't homework, I'd take r937's suggestion as a start, but I'd go much further to make the design more resiliant to both common real world problems and changes that I'd expect to the design.

    Can you give us a bit more insight into what "real world" tasks you want to accomplish? That will help us give you better suggestions.

    -PatP

  5. #5
    Join Date
    Aug 2008
    Posts
    19

    Not homework!

    I'm not a student and this isn't homework, although I'm not surprised that it's so simple that it could be homework. It's a project for my job, to turn some Python code I wrote into a nice demo. I wanted to use a database rather than something ad-hoc with flat files, partly as an attempt teach myself a little SQL. (I started out as an EE, then did an MS in CS but database wasn't one of the required classes, unfortunately.) I'm doing the UI with pylons and SQLAlchemy, and using SQLite for the db but will need to switch to MS SQL later if the demo is good enough to give this project a longer lifespan, which I very much hope turns out to be the case.

    Thanks for any replies!

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    excellent, i shall try to toss in a reference to neutrinos when i can

    so, did you like my suggestion?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Aug 2008
    Posts
    19

    empty set representation

    Quote Originally Posted by r937
    i'll answer this part first -- by rows that aren't there!!!
    That makes sense for the case of an empty set of intervals, but then I don't see how to know when a "setid" key exists in some other table as a foreign key, pointing to an empty set of rows. The only idea I had for that was to have another table, Sets, that did nothing but keep track of the active setid's. Maybe that autoincrement primary key is its only column. So for an empty set, there would be a row in the Sets table with primary key setid, but then no rows in the other table with that key. Or something.

    Quote Originally Posted by r937
    the day of the week should be part of the table's key
    Should it? I'm probably thinking in Python rather than thinking in SQL here, but when I want to do something with one of these sets of intervals I thought I'd usually want to retrieve the entire set (by setid) at once. So then I wouldn't want the day of the week to be a primary key, right? Or did you mean that making the day of the week part of the key helps somehow with the empty set representation?

    Thanks for your help!

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    here's some people data --

    1 Todd
    2 Rudy

    here's some schedule data --

    1 Tue 9-5
    1 Wed 1:30-10
    2 Sun 3:30-8
    2 Mon 2-10
    3 Tue 1:30-10

    the primary key in the scheduling data is a composite (of either 2 columns or 3, the difference depending on a business rule regarding whether a person can be scheduled more than once in a single day)

    notice how the day is part of the key

    so, how do you "point" to a row that isn't there?

    allow me to demonstrate:

    what hours is Todd working on Thu?

    now ask yourself two questions:

    1. you did a lookup using WHERE wkday='Thu' and got NULL back, right? so where did 'Thu' come from

    2. you did not need a 'Thu' column to find out that Todd isn't working on Thu


    sorry if my reply sounds flippant, i am trying to explain very difficult modelling concepts with very simple examples

    if you understand the example but not the concept it's trying to illustrate, please do ask
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Aug 2008
    Posts
    19

    I think I got your point there

    Since there can only be one interval per day, making the setid and day-of-week both primary keys makes sense. I can still query just on the setid when I want the entire set, as would usually be the case. Most commonly, since this is interfacing to existing code, I would grab the entire set, translate it into an instance of the Python class I'm using to represent these sets, and hand it off to the existing code to do its thing.

    Okay, but then I need a way to grab the next unused setid. Suppose I'm adding a new person to the person table. Each person has a set of shifts that they are currently working, and a set of shifts that they've requested. So Kenny is the new person, and has no current shifts. If the next unused setid is 102, then the 'current' field for Kenny gets the value 102. A query on the schedule table for setid=102 returns zero rows, showing that Kenny has no current shifts assigned. At some later point shifts might be added to schedule set 102, but for now it's empty.

    Now the program wants to add the preferred shifts for Kenny, and I want to create a new schedule set, presumably with setid=103 as the next unused key. But there are no rows in the schedule table with key 102, because setid=102 is an empty set. So how do you figure out that setid=102 is actually being used, i.e., that there's another table with setid=102 as a foreign key, pointing to that particular set which happens to be empty at the moment?

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Bo Baffett
    So how do you figure out that setid=102 is actually being used, i.e., that there's another table with setid=102 as a foreign key, pointing to that particular set which happens to be empty at the moment?
    that's not possible

    the whole deal with foreign keys is that they must refer to an existing primary key

    so the premise is somehow wrong, you can't "reserve" a set without inserting rows for it

    by the way, what's a set?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Aug 2008
    Posts
    19

    So ...

    It sounds like the idea I suggested earlier about having a second table with one row per schedule is the only idea still standing then. Your earlier example oversimplifies the problem, because schedules don't just apply to people. In fact a person entry needs to have a schedule representing the actual schedule, and a schedule representing the preferred schedule. And a store has a schedule of hours that the store is open.

    So, we have something like:

    People:
    id name sch pref
    1 Rudy 101 102
    2 Carl 103 104

    Schedules
    id
    101
    102
    103
    104

    Intervals
    sid day start end
    101 Mon 9am 5pm
    102 Mon 6am 3pm
    102 Tue 6am 3pm
    104 Tue 9am 5pm
    104 Wed 9am 5pm


    The Schedule table has just the primary key column. The primary key in Intervals is a composite of the schedule id (a foreign key into the Schedule table) and the day, since a day of the week can have at most one interval in a schedule.

    In this example, Rudy's current schedule is just Mon 9am-5pm, but Rudy's preferred schedule is Mon and Tue 6am-3pm. Carl has no hours currently scheduled (Schedule id=103 has no rows in the Intervals table, i.e., the empty set of intervals). He wants Tue and Wed 9am-5pm.



    If you see any problems in this please let me know. Thanks for the help.

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    that looks okay, but now try to do it without that schedule table

    the biggest secret to data modelling (which is the stage you're at) is to do it without surrogate keys

    yes, it's fine to have a person table, and assign a person_id instead of using the person's name as the key

    but that is a very slippery slope, and soon enough, you find yourself creating tables with ids and you have no idea what those ids actually stand for -- remember, they are surrogates and stand for some other unique identifying key
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  13. #13
    Join Date
    Aug 2008
    Posts
    19
    Quote Originally Posted by r937
    that looks okay, but now try to do it without that schedule table
    I don't see how. There's also a Store table that needs to point to schedules, as I mentioned, and other uses for this as well.

    I'm assuming it's better to re-use the schedule implementation rather than having separate tables, i.e., StoreHours for a store, and CurrentSchedule and PreferredSchedule for employees, and a few others, with all of these having the same structure. I can see how to avoid the schedule table if I have these five or so tables that all implement essentially the same data structure, differing only in the keys (store, employee, etc.). But then I've multiplied the amount of code to maintain by a factor of five, including the code to define the tables and the Python code that uses the model.



    Quote Originally Posted by r937
    yes, it's fine to have a person table, and assign a person_id instead of using the person's name as the key
    Names aren't unique, and the person_id is actually going to be an employee ID that's assigned elsewhere and used in other pre-existing tables (which is why I eventually have to switch from SQLite to MS SQL) so I hadn't really considered any other approach here.

    Quote Originally Posted by r937
    but that is a very slippery slope, and soon enough, you find yourself creating tables with ids and you have no idea what those ids actually stand for -- remember, they are surrogates and stand for some other unique identifying key
    That and wikipedia helped me to understand the idea of surrogate keys a little better, but in this case I'm not seeing an alternative, other than to have five or so tables (and the associated code to use them) that all replicate the same functionality.

  14. #14
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by r937
    but that is a very slippery slope, and soon enough, you find yourself creating tables with ids and you have no idea what those ids actually stand for -- remember, they are surrogates and stand for some other unique identifying key
    ...or you can use natural keys, and pretty soon you'll find yourself with five, six, and seven or more columns making up a composite primary key, wreaking havoc on relational integrity.
    Frankly if, "soon enough", you find yourself unable to figure out that PersonID stands for a record in the Persons table, then your Alzheimer's has probably progressed beyond the point where you should be writing code.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  15. #15
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by blindman
    ...or you can use natural keys, and pretty soon you'll find yourself with five, six, and seven or more columns making up a composite primary key, wreaking havoc on relational integrity.
    thanks for supporting me -- this is exactly the dangerous situation i was warning against

    see, when the natural key grows beyond more than a couple of columns, then two things happen: the hierarchy of table relationships becomes progressively complex, and the temptation just to toss a surrogate key at it becomes nigh irresistible

    of course, just because a surrogate key is then assigned does not somehow make the complex relationships go away -- it masquerades them

    blindman is one of these guys blessed with prescience, so he knows when he assigns a surrogate what he's doing, while remaining fully cognizant of the natural relationships inherent in the complex hierarchy of related tables

    but people new to data modelling do not have the benefit of blindman's intuition and professional experience

    whether you use a surrogate key or not (and i do, too), you must never lose sight of the logical relationships and their cardinality/optionality, which are inherent in the logical design and so easily glossed over by DBAs who just wanna build tables with GUID keys, man

    blindman is also very jealous of my logical modelling abilities, leading him to poke fun at me because i'm old, but i take it as a sign of affection -- besides, i am old

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •