  1. #1
    Join Date
    May 2009
    Posts
    8

    any advantage to minimize number of tables?

    Hey all,

    I come mainly from an embedded background. I took database design courses at university while getting my engineering degree, so most of my db knowledge is fairly academic.

    Anyways, I'm working on a project that interacts with a database, and I find its design a bit weird. It essentially stores a list of counters for tests.

    Test1
    counter1 at time1
    counter2 at time1
    ...
    counterx at timen

    Test2
    counter1 at time1
    counter1 at time2
    ...
    countery at timen

    Now, the way the DB is arranged looks like they tried to minimize the number of tables. So the information is stored in one big 'counter' table with the following columns.

    counterid, test, time, counter, value
    1, Test1, time1, counter1, value1
    2, Test1, time1, counter2, value2
    ---

    I would think this would be better represented by:

    Test Table (1 table, with 1 entry per test):
    testid, test name, counterTable

    Test Counter Table (1 table per test, with one entry per counter):
    counter, CounterValuesTableID

    CounterValuesTable (1 table for every test for every counter)
    time, value
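
    To make that concrete, here's roughly what I mean as SQL (the names and types are just my sketch, not the actual schema):

    -- roughly what exists now: one big table
    CREATE TABLE counter (
        counterid INT,
        test      VARCHAR(50),
        time      DATETIME,
        counter   VARCHAR(50),
        value     INT
    );

    -- roughly what I had in mind: a test table pointing at per-test counter tables
    CREATE TABLE test (
        testid       INT,
        testname     VARCHAR(50),
        countertable VARCHAR(50)   -- name of that test's counter table
    );
    -- ...plus a separate (time, value) table created per test per counter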

    ---

    Is there any reason you would want to have all the counters and values in one giant table? Seems to me it would totally muck up performance. Is it perhaps simpler to view the data from various tools if there are fewer tables?

    Also, what is the best tool to view/manipulate databases from a programmer's point of view? Right now, I just use Visual Studio 2008 to connect to the DB and view/edit its contents.

    Thanks all,

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    one table per test is really poor design, and one table per test per counter would be a total disaster

    performance on large tables is optimized by the use of indexes that are appropriate to the nature of the queries to be performed
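
    for example, an index like this (using the column names from your example; the index name is just an illustration) lets the database jump straight to one test's rows instead of scanning the whole table --

    CREATE INDEX ix_counter_test_time ON counter ( test, time );

    -- a query like this can then use the index
    SELECT time, counter, value
      FROM counter
     WHERE test = 'Test1'
     ORDER BY time;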

    visual studio works on only a small number of database systems (where small means 1), and there's a good argument for a front end tool that works with multiple database systems

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    May 2009
    Posts
    8
    why would having a separate table per test be really bad?

    Isn't that what db normalization (if memory serves me right) is all about?
    Am I wrong to think of DBs as giant hash tables?

    If I go to, say, get the counter values for a particular test, I only search the relevant data if everything is in its own table.

    If everything is in one table, I have to search through the entire table looking to match the testID and the counterID.

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Thinking from an embedded perspective, if you have to send bytes out to ports and there are only 500 valid ports, how much sense would it make to code separate routines to send all 256 possible byte values to all 500 ports? You'd only have 128000 routines to manage, and each one would be exactly what you need.

    One aspect of normalization works like subroutine (or macro) parameterization. By factoring out the common attributes like counter, test, and time you have one table that can store all of your tests instead of potentially thousands of tables. This makes the system much easier to both understand and manage, as well as eliminating lots and lots of wasted space.
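
    To stretch the analogy, the single table then works like one parameterized routine; guessing at the column names, the "arguments" are just values in the WHERE clause:

    -- one "routine" that serves any test and any counter
    SELECT time, value
      FROM counter
     WHERE test = 'Test1'         -- "argument" number one
       AND counter = 'counter1';  -- "argument" number two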

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    I don't really get your examples but then I don't know your data. One point that might be relevant is The Principle of Orthogonal Design (see "Date on Database" by Chris Date, published by Apress). This is the idea that (roughly speaking) it ought to be impossible for the same tuple to appear in more than one place in the database. It's a practice that can minimise the amount of logic and processing required, even though that's not Date's main argument in its favour.

    Like other design rules this is "formalised common sense" I suppose. On occasions I have consciously applied just such a principle in design, even before I knew of it as a formal rule by any name.

  6. #6
    Join Date
    May 2009
    Posts
    8
    dportas and pat,

    Yes, that is what I was thinking. DB normalization would be good.
    The DB would be better if it were normalized... having a table per test... at the very least.

    This is in contrast to how it currently is, with one giant table with lots of duplicated data.

    What I am curious about is why r937 suggests the use of one large table.
    "one table per test is really poor design, and one table per test per counter would be a total disaster"

    This translates to: "normalization would be a total disaster"

    Unless my examples were misunderstood.

  7. #7
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by scamper_22
    What I am curious about is why r937 suggests the use of one large table.
    "one table per test is really poor design, and one table per test per counter would be a total disaster"

    This translates to: "normalization would be a total disaster"
    More tables doesn't necessarily mean "more normalized" and one table doesn't necessarily mean "less normalized". The regular Normal Forms have nothing to say about having possibly redundant tables. For example if the schema:

    R{A,B}

    is in 5NF then so could be the schema:

    R1{A,B}, R2{A,B}, R3{A,B}, ... etc

    You can't say that one is "more normalized" than the other.

    However, according to Date, if it's the case that my second example allows the same tuple to appear in more than one relation then the former design with just ONE relation is a "better" one. If you don't follow that rule then you may be forced to implement the same logic in three or more places.
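
    As a contrived SQL illustration (the table and column names are invented), the second schema might be:

    CREATE TABLE R1 ( A INT, B INT );
    CREATE TABLE R2 ( A INT, B INT );
    CREATE TABLE R3 ( A INT, B INT );
    -- nothing stops the same (A,B) row from appearing in all three,
    -- so any rule about (A,B) ends up implemented three times over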

    I'm not assuming anything about which design is better in your case. Just my 0.02.

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Hold your horses there Newt! I might not have made myself clear.

    My analogy to embedded programming was if you had 500 ports that you might possibly need to send bytes to. Having the 128000 separate routines (one for each byte value going to each of the 500 ports) would be like having separate tables for each test. One routine with two arguments (port number and byte value) would be like having the single table with columns to describe the applicable data for each row.

    There is no way on earth that I'd code and manage 128000 routines or 128000 tables when just one of them with appropriate attributes (or arguments) would work. I'm way too lazy to tackle that just for the amusement value!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  9. #9
    Join Date
    May 2009
    Posts
    8
    alright, let me provide a bit of a clearer example. It's a simplified version of the one above.

    There are lots of tests. Each test contains a list of values, each recorded at a time.

    Suppose we have the following table.


    testID, DateTime, value
    1, 12:03, 100
    1, 12:04, 101
    1, 12:05, 102
    ...
    2, 1:30, 200
    2, 1:45, 300
    ..

    So suppose I often do a query to get all the values for a test.
    With all this in one table, the database would have to search through this giant table finding all the rows with a matching testID.

    If however it was in different tables... for example... one table per test.

    TestTable
    TestID, TestTableName
    1, table1
    2, table2

    Table1
    dateTime, value
    12:03, 100
    12:04, 101
    12:05, 102
    ...

    Table2
    1:30, 200
    1:45, 300

    This way, it would be much easier to get the values for a particular test... they are all in one table for that test.

    Does this make the question clearer?

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by scamper_22
    With all this in one table, it would mean it would have to search through this giant table finding all the rows with a matching testID.
    no, it does not mean that

    you are not familiar with database indexes?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    May 2009
    Posts
    8
    No, I'm familiar with indexes... but why would you rely on them instead of just organizing the data in a logical manner?

    I'd imagine the index would be backed by another hidden hash or table.

    Is it basically a consensus that in general, this is how these sorts of databases are designed... big tables relying on indexes?

    I don't have an issue with it... just found it curious. It would save me some headaches if I had to design it. Instinctively, I would have designed it as separate tables :P

  12. #12
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by scamper_22
    No I'm familiar with indexes
    Be careful when designing your index - at first glance it looks obvious to have a compound index on testid and DateTime, but if you only have 2 values for testid then the index will never be used. Most optimisers will decide that because testid only reduces the search by 50%, it's not worth using the index at all. If you have lots of test ids then this point can be ignored. Also, DateTime is likely to be a keyword, so it's a bad choice for a field name.
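
    For instance (the table, column and index names here are only illustrative), something along these lines, with the column renamed so it doesn't clash with the DATETIME keyword:

    CREATE TABLE test_result (
        testid      INT      NOT NULL,
        recorded_at DATETIME NOT NULL,  -- renamed from DateTime
        value       INT      NOT NULL
    );

    -- the compound index only earns its keep if testid has many distinct values
    CREATE INDEX ix_result_test_time ON test_result ( testid, recorded_at );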

    Quote Originally Posted by scamper_22
    but why would you rely on them instead of just organizing the data in a logical manner
    What would you do if someone came along with another 5 tests? Or another 25 tests? Having separate tables would mean you'd need to create new tables each time, and the code (to insert, search and show the data) would probably need to be altered to use these new tables. All this effort for no performance gain at all.
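
    With everything in one table (sticking with the illustrative test_result table above), a new test is just new rows - no new tables and no code changes:

    -- test number 26 turns up: just insert its rows
    INSERT INTO test_result ( testid, recorded_at, value )
    VALUES ( 26, '2009-05-27 12:03:00', 100 );

    -- and the same query serves any test, old or new
    SELECT recorded_at, value
      FROM test_result
     WHERE testid = 26
     ORDER BY recorded_at;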

  13. #13
    Join Date
    May 2009
    Posts
    8
    "What would you do if someone came along with another 5 tests? "

    you add new tables... that's a given. I don't see the code being that complicated.

    On a new test, you just create a new sql table programmatically. One extra step in the creation.

    The rest of the code would not change. Searching, showing data... why would they change at all?

    The table names for each test would be stored in a main table
    testid, CounterTable
    Last edited by scamper_22; 05-27-09 at 19:23.

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by scamper_22
    "What would you do if someone came along with another 5 tests? "

    you add new tables... that's a given.
    but that's a poor design

    consider the query for finding the average of several tests -- the more tests, the clumsier this query
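
    e.g. something like this, using the table names from your simplified example (test_values is just a made-up name for the single table) --

    -- one table: the same query works for any number of tests
    SELECT testID, AVG(value)
      FROM test_values
     GROUP BY testID;

    -- one table per test: the query grows with every test you add
    SELECT 1 AS testID, AVG(value) FROM table1
    UNION ALL
    SELECT 2 AS testID, AVG(value) FROM table2;
    -- ...and another UNION ALL branch for each additional test
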
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by scamper_22
    "What would you do if someone came along with another 5 tests? "

    you add new tables... that's a given. I don't see the code being that complicated.

    On a new test, you just create a new sql table programmatically. One extra step in the creation.

    The rest of the code would not change. Searching, showing data... why would they change at all?

    The table names for each test would be stored in a main table
    testid, CounterTable
    So your table lookup table would be a little like a manually coded index, i.e. you look up the key value, find the subset of data, then read the data. Not bad, apart from the fact that:
    • you'd need to code all this yourself
    • it would be slower than the inbuilt indexes
    • you'd have to maintain this new table
    • you'd end up with more complex code throughout your application (for instance, try grabbing all the test results for today across 25 tables - see the sketch below)
    and I suppose after doing all this you'd get almost the same functionality as using normal indexes, which are available to you for no effort at all!
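
    To labour that last point, here's roughly what the two versions look like (sticking with the column names from your simplified example, though I'd still rename dateTime; test_values is just a made-up name for the single table):

    -- one table: today's results for every test in one go
    SELECT testID, dateTime, value
      FROM test_values
     WHERE dateTime >= '2009-05-27' AND dateTime < '2009-05-28';

    -- one table per test: the same thing repeated per table
    SELECT 1 AS testID, dateTime, value FROM table1
     WHERE dateTime >= '2009-05-27' AND dateTime < '2009-05-28'
    UNION ALL
    SELECT 2 AS testID, dateTime, value FROM table2
     WHERE dateTime >= '2009-05-27' AND dateTime < '2009-05-28';
    -- ...and another 23 branches like those for the rest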

    Out of curiosity, would you use indexes in your proposed separate test tables, or would you split these tables again into perhaps sub-tables for each hour/minute/second?

    Mike
