Results 1 to 15 of 15
  1. #1
    Join Date
    Dec 2005
    Posts
    74

    Question Unanswered: database of field trips from a daycare (was "Help with building a query")

    I am working on a database design and since I'm new to SQL, I want to test the feasibility of the design before I commit to it. That is, I want to make sure that it is possible to build a single (and thus, fast) query to look up the data I need. The database is for a control system with lots of boring details so I'll adapt the problem and present a simple equivalent example. I'd appreciate any help with this.

    Let's say I have a database of field trips from a daycare. Every time the kids are taken out on a field trip, an entry is made in a simple table with two fields: Field Trip Configuration ID (FTC ID) and attending Student ID. For each student attending the trip there will be an entry with the appropriate FTC ID (not to be confused with the Federal Trade Commission Identification number! ) Depending on which kids are going to the trip, a banner, which will be displayed at the front of the bus, is created or reused, provided one had already been made. The FTC ID tells the teacher which banner to use. Before each trip, the teacher looks up the database by entering the set of Student IDs and if an existing configuration matches, it is shown to the teacher so they know to go get the banner with that ID. If there is no match, a new FTC ID is generated, and the teacher goes and makes a new banner with that ID.

    The above design probably seems horrendous to most of you. The trouble is in my actual structure - "students" are control and acquisition channels, and there can be many of them, with lots of possible configurations (test-sessions). The size of the database can get pretty large, so I think I'm stuck with such design. Nonetheless, I'd entertain suggestions for alternatives, as long as they don't violate the requirements of my project.

    My specific questions is: what would be a single SQL query to identify the (unique) FTC ID based on the set of Student IDs? A CREATE query for the above would only be a bonus. Thanks in advance!

    Kamen

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    As there is nothing that guarantees that the set of students will be unique for any given field trip, there may be many trips returned for any given group of students. For instance, the art classes trip to the museum could easily have exactly the same students every trip, and that could very easily be the same group as their monthly trip to the art store.

    If that is Ok, then you could use something like:
    Code:
    CREATE TABLE tFTC (
       FTCID	INT		NOT NULL
       PRIMARY KEY (FTCID)
    ,  name		VARCHAR(50)	NOT NULL
       )
    
    CREATE TABLE tKids (
       KIDID	INT		NOT NULL
       PRIMARY KEY (KIDID)
    ,  name		VARCHAR(50)	NOT NULL
       )
    
    CREATE TABLE tRoster (
       FTCID	INT		NOT NULL
       FOREIGN KEY (FTCID) REFERENCES tFTC (FTCID)
    ,  KIDID	INT		NOT NULL
       FOREIGN KEY (KIDID) REFERENCES tKids (KIDID)
       PRIMARY KEY (FTCID, KIDID)
       )
    
    INSERT tFTC (FTCID, name)
       SELECT 1, 'Hydraulic Trombone launch' UNION
       SELECT 2, 'Turbo Paint stretch' UNION
       SELECT 3, 'Despair for Dummies Demo'
    
    INSERT INTO tKids (KIDID, name)
       SELECT 1, 'Bob' UNION
       SELECT 2, 'Carol' UNION
       SELECT 3, 'Ted' UNION
       SELECT 4, 'Alice'
    
    INSERT INTO tRoster (FTCID, KIDID)
       SELECT 1, 1 UNION
       SELECT 1, 2 UNION
       SELECT 2, 3 UNION
       SELECT 2, 4 UNION
       SELECT 3, 1 UNION
       SELECT 3, 2 UNION
       SELECT 3, 3 UNION
       SELECT 3, 4
    
    SELECT FTCID
       FROM tRoster
       GROUP BY FTCID
       HAVING 2 = Count(*)
          AND 2 = Sum(CASE WHEN KIDID IN (1, 2) THEN 1 END)
    -PatP

  3. #3
    Join Date
    Dec 2005
    Posts
    74

    Thumbs up

    Wow, Pat! This is amazing - such a quick and exhaustive reply! I think I need to go learn some more SQL just to understand your example. By the way, I am reading the book "Understanding SQL" by Martin Gruber (which was laying around on a shelf in the office). It is from 1990 and reflects the ANSI SQL 89 standard. I have also been searching the 'Net for answers. Should I get a newer book?

    I knew the heart of the query would be the GROUP BY clause, but couldn't get around the limitation stated in my sources, saying that you have to use it with aggregate functions. Other than that, I am also having hard times understanding the actions of the HAVING clause in your example, partly because I'm not sure what case is being searched (student IDs 1 and 2?) And also because the CASE expression is not in my book... I'll go search the 'Net.

    Also, I think I may have mislead you to the purpose of the database. I did say it was going to be a database of field trips, while I actually am not interested in storing the actual field trips - I only need the field trip configurations, which would be unique. The reason for that, and thus the only purpose of the database, is so the teacher is able to always reuse an existing banner, and not have to make a new one that would have duplicated an existing one. Would that simplify the design you presented?

    Tanks again for the help, I really appreciate this!

    Kamen

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Sorry, today has been a really busy day, so I haven't gotten back to you until now.

    Yes, the target list of KIDID values is provided within the CASE. The HAVING clause is almost exactly like the WHERE clause, except that it is evaluated after the GROUP BY has taken affect instead of on the candidate rows projected by the JOIN operation.

    The real "gotcha" comes from the fact that you could easily have two or more identical lists of kids for completely different FTCs. For instance you could have one list of all the kids in the school, and another for the kids that would like free money... You need to allow for the fact that any given list of kids could return more than one FTC.

    -PatP

  5. #5
    Join Date
    Dec 2005
    Posts
    74
    Thanks again, Pat. Just one more thing. I won't bother you with the SQL language details that I still don't understand - I'm buying a newer book today (any recommendations?) But I want to clarify one detail about the formulation of the problem.

    Uniqueness is very important to me. If other elements of the problem definition must be changed, than that's what I have to do but I must have a unique result. The whole intent of this query is to ensure that a list of unique field trip configurations is generated. That is, if the query fails it means it's a new configuration, then create a new entry in the database. If one exists - then reuse it (for the school-bus banner). My ultimate goal is to use the configuration to look up the name of a table in the database. That is, for each configuration, there will be tons of data about each student recorded. Thus I want to create a table to store that data but I don't want too many tables created, and every time the same configuration is used (i.e., the same set of students is going to a field trip, regardless of the purpose of the trip), I can reuse the old table (i.e., keep appending data to it, rather than to a new table).

    I would formulate the real problem I am trying to design but it is quite extensive and I really do not want to abuse your patience and waste your time with my tasks. I hope the example would be relevant enough to be able to sort this out. Thanks!

    Kamen

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    If I understand your problem well enough from the analogy we're using (students and Field Trips), the unique combination of students is what defines an FTC for you, even though in the real world that isn't the case at all. As long as the unique combination of inputs is what defines a particular group, then you'll be fine using the logic (and the SQL) that I've presented.

    As far as book suggestions, a lot depends on which database engine(s) you're using. If you can find something that seems close to what you want that is written by Joe Celko, I'd recommend that. Joe can be a bit abrasive at times, and he's certainly headstrong, but he really knows SQL and can present it well. If that doesn't help, I'd tend to recommend books published by O'Reilly Press.

    -PatP

  7. #7
    Join Date
    Dec 2005
    Posts
    74
    First, thank you for the book suggestion - in mean time I had been reviewing the SQL books on Amazon and I had already set my sights on the "SQL in a Nutshell", published by O'Reilly press. Now, after your recommendation, this will be the book I'm getting. I could not find anything appropriate by Celko.

    Back to the example SQL code you wrote. Towards the end of my book there is an appendix that mentions some non-standard (as of SQL-89) operations. This got me again thinking about how I would have designed the query had I been able to use an imperative programming language (which is what I'm used to doing). I just want to give an example that would illustrate my train of thought, in order to confirm or rule out my approach. Here it is.

    Using the example you presented, we have three field trip configurations: FTCID 1, 2 and 3, represented by KIDID 1,2; 3,4; and 1,2,3,4, respectively. Let's say on the next trip only Ted and Alice will be going (KIDID 3 and 4), so the teacher wants to know whether she already has a banner or needs to make a new one. Once again, the banner for FTCID 3 would not work because this time Bob and Carol are not going. Would your query return FTCID 2 and nothing else? Most importantly, could you do something like this to find the FTCID that includes KIDID 3 and 4, but nothing else:

    Code:
    SELECT FTCID FROM tRoster WHERE KIDID = 3
    INTERSECT
    SELECT FTCID FROM tRoster WHERE KIDID = 4
    MINUS
    SELECT FTCID FROM tRoster WHERE NOT KIDID IN (3, 4)
    If this is a bad idea, would it be because of the use of the INTERSECT and MINUS (the latter is not even supported by my MS SQL 2005) or because of some sort of inefficiency?

    By the way, one of my problems is that I'm not tied to a specific DBMS. Our software product is intended for use by a wide variety of applications and it may need to interface to all sorts of DBMS. For now, I am using ODBC to connect to different databases (playing with MS Access and SQL). I am trying to stay as compatible as possible but I don't want to make major performance sacrifices.

    The main reason for suggesting the alternative above is to make sure that I'm presenting the problem correctly. A side effect would be learning - I'll take any knowledge I can get. I must say that at this time I am unable to fully understand the SELECT statement in your example - the restriction of the rows returned by the GROUP BY in the HAVNIG clause by COUNT-ing, combined with the SUM ... it's just too much actions crammed into too little "space" for me to comprehend, but I am working on it. (A little break-down would be highly appreciated but I don't want to impose.)

    Thanks again for your help - this has been very useful to me!

    Kamen

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Your imperative example helps clarify things. Sometimes I miss SmallTalk!

    Let's start by adding some decoration to the SELECT to make things a bit clearer, then I'll expand on that...
    Code:
    SELECT FTCID		-- Give me a list of FTCID values
       FROM tRoster		-- from the field trip roster
       GROUP BY FTCID	-- grouped by FTCID
       HAVING 2 = Count(*)	-- that have the correct number of total members
          AND 2 = Sum(CASE WHEN KIDID IN (1, 2) THEN 1 END)	-- and specified members
    The (1, 2) is the list of specific members that we're looking for in this case, equivalent to the kids going on a trip. The 2 we're comparing against is the expected number of possible matches (1, 2) that we could possibly have.

    Thinking in terms of sets (and sets are what relational algebra is based upon), what we're saying boils down to:

    1. Construct sets of candidate rows based on FTCID
    2. Reject any set that doesn't have the correct number of members (including any sub or super sets)
    3. Reject any set that doesn't have exactly the members specified.

    At this point, any remaining set is correct (it has only the right members, and all of the right members).

    -PatP

  9. #9
    Join Date
    Dec 2005
    Posts
    74
    Thank you very much! I am now studying this to try and completely understand it. I built a sample database on my MS SQL server and I'll be using MS Query to try and run your example. I'll let you know how it works.
    Kamen

  10. #10
    Join Date
    Dec 2005
    Posts
    74
    This is great, I now have some basic understanding of at least why the above SELECT statement does what it is supposed to do! Now, if I could only figure out why it has to be done that way... For example, why couldn't I change it in the following way:
    Code:
    SELECT FTCID
       FROM tRoster
       GROUP BY FTCID
       HAVING 2 = Count(*)
          AND (KIDID IN (1, 2))
    I know I can't do that - I tried it on my sample database - the query returned no rows. Plus you would have formulated it that way, if it could be done. I have the feeling it has to do something with the way the HAVING clause parses the expression within the sets defined by the GROUP BY, but I just can't figure out the difference.

    Either way, Pat, you have helped tremendously, and I thank you for that!
    Kamen

  11. #11
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    This gets really complicated to explain in a short, simple answer like a post. Joe Celko does the best job of this that I've ever seen in his SQL Tutorial class, where he lays out the seven steps or stages in resolving a SQL SELECT statement.

    The reason that you can't effectively control the participating row values in the HAVING clause is because that clause doesn't take effect until after the groups are rolled up... There aren't any individual values anymore!

    This starts to get even more complex as you realize that the code that I posted dances around a number of logical problems by referencing a CASE expression in the HAVING clause, which actually gets instantiated at the atomic row level, but never used until it is grouped (via the Sum expression), and never explicitly touched by my code until the HAVING is evaluated. This is perfectly valid SQL, and it makes sense both to read the statement and to think about how it gets constructed, but without understanding the SQL staging it is murderous to work out exactly how the blasted thing works (as you've noticed).

    This leads directly into one of those opportunities for me to climb onto my pulpit and preach, but right now I don't have the time to engage in that (I need to sneak off for lunch). Maybe I'll launch into that diatribe this evening (I'll bet you can hardly wait)!

    -PatP

  12. #12
    Join Date
    Dec 2005
    Posts
    74
    Hosanna!

  13. #13
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Since SQL2 (1992), "MINUS" was officially renamed "EXCEPT".
    Most of the time, though (and certainly in your example), it's possible (and better) to use an "AND" condition in WHERE (or HAVING) instead of INTERSECT, and use "AND NOT ..." instead of EXCEPT.
    Thus:
    Code:
    SELECT FTCID FROM tRoster
    WHERE  KIDID = 3 AND KIDID = 4 AND KIDID IN (3, 4)
    Of course, this does not make sense, and it's not equivalent to your INTERSECT and MINUS, because of the presence of duplicate FTCIDs.
    The problem being that you try to think in groups, while a WHERE condition thinks in rows.
    There is no single row containing both KIDID=3 and KIDID=4.
    There is only (possibly) a group (defined by that fact that all rows have the same FTCID) with exactly two rows, one with KIDID=3 and one with KIDID=4.

    Thus your INTERSECT and EXCEPT should operate on the GROUP BY level, or equivalently you could use three HAVING conditions, exactly what Pat Phelan came up with:
    Code:
    SELECT FTCID FROM tRoster GROUP BY FTCID
    HAVING  contains(KIDID,3) AND contains(KIDID,4)
       AND  NOT contains(KIDID, anything other than 3, 4)
    Unfortunately there is no aggregate function "contains", we have to implement it using one (or more) of the available functions COUNT, SUM, MIN, MAX or AVG. That's exactly what Pat did.

    In the INTERSECT and EXCEPT way, some of the HAVING conditions, viz. those which can be evalutated on row level (so do not need to combine information from multiple rows), can (and should) be moved to WHERE conditions, though.
    Here is a possible INTERSECT & EXCEPT equivalent:
    Code:
    SELECT FTCID FROM tRoster WHERE KIDID = 3
    INTERSECT
    SELECT FTCID FROM tRoster WHERE KIDID = 4
    EXCEPT
    SELECT FTCID FROM tRoster GROUP BY FTCID HAVING COUNT(*) > 2
    which is of course equivalent to
    Code:
    SELECT FTCID FROM tRoster WHERE KIDID = 3
    INTERSECT
    SELECT FTCID FROM tRoster WHERE KIDID = 4
    INTERSECT
    SELECT FTCID FROM tRoster GROUP BY FTCID HAVING COUNT(*) = 2
    and we're essentially back to Pat's solution, after combining the three HAVING conditions into one single, with ANDs.

    Strictly speaking, you would have needed "SELECT DISTINCT FTCID" in the first two queries, but the use of UNION, INTERSECT and EXCEPT automatically remove duplicates.
    A slightly more performant equivalent query would thus be:
    Code:
    SELECT DISTINCT FTCID FROM tRoster WHERE KIDID = 3
    INTERSECT ALL
    SELECT DISTINCT FTCID FROM tRoster WHERE KIDID = 4
    EXCEPT ALL
    SELECT FTCID FROM tRoster GROUP BY FTCID HAVING COUNT(*) > 2
    Well, in your situation, you probably have the guarantee that there are no duplicate FTCIDs for the same KIDID, so you may discard the two DISTINCTs altogether, giving an even more performant solution:
    Code:
    SELECT FTCID FROM tRoster WHERE KIDID = 3
    INTERSECT ALL
    SELECT FTCID FROM tRoster WHERE KIDID = 4
    INTERSECT ALL
    SELECT FTCID FROM tRoster GROUP BY FTCID HAVING COUNT(*) = 2
    Last edited by Peter.Vanroose; 02-19-06 at 08:03.
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  14. #14
    Join Date
    Feb 2006
    Posts
    113
    Peter,

    Thank you very much for that clear, detailed explanation. It is a huge help to me - an Excel user doing some SQL in MS Query. I've joined this forum to learn more SQL.

    Along the way it has explained why my use of UNION to join tables from different data was dropping out duplicates. When I identified the problem I had corrected it by changing to "UNION ALL" without knowing why. I had originally copied the "UNION ALL" from whereever I'd first seen it but in one usage had only typed "UNION". I won't make that mistake again!

    Appreciate your many posts to help newbies.

    regards,
    Fazza

  15. #15
    Join Date
    Dec 2005
    Posts
    74
    An update to this. I am almost finished with the implementation of the program but I ran into a problem executing the SQL query that Pat suggested against a MS Access *.mdb file using ODBC (latest driver). It works fine with my SQL Express 2005 server using either SQL Client or ODBC, but it gives me a syntax error with Access (more info here ). I need to be able to provide multiple DBMS compatibility and this is very frustrating, considering I had to already make exceptions for Access (such as accounting for non-standard data types, e.g., "DOUBLE" for "FLOAT", etc.)
    Kamen

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •