Page 1 of 2 12 LastLast
Results 1 to 15 of 26
  1. #1
    Join Date
    Sep 2003
    Location
    The extremely Royal borough of Kensington, London
    Posts
    778
    I recently noticed a website that stored information about user's accross various tables. Ie. As not every user would have a homepage, email and other various attributes many-many tables were created to avoid a majority of null values.

    When information about a specific user was requested, 10 queries were performed. 10 table joins for a single query or 10 unions. This I presume is fine, providing the tables do not become very large?

    Would it not be better in a web environment to perform a series of individual queries per table based on the user_id?

    However, the cost of joining outweighs a majority of nulls in a single table.
    Last edited by r123456; 12-18-03 at 08:14.
    Bessie Braddock: Winston, you are drunk!
    Churchill: And Madam, you are ugly. And tomorrow, I'll be sober, and you will still be ugly.

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    I guess it depends on the specific DBMS. For Oracle, a single table with lots of nulls in it would surely be the fastest approach - the nulls occupy little or no space in the table. A typical indexed user record lookup would require only a handful of reads - certainly less than 10!

    Why might it be better to read the records one at a time than to join them in a single query?

  3. #3
    Join Date
    Sep 2003
    Location
    The extremely Royal borough of Kensington, London
    Posts
    778
    Why might it be better to read the records one at a time than to join them in a single query?
    I no longer believe it would be quicker, as new queries would need to be run and passed to the DBMS with their result set returned.

    Would the optional user information fields in the DBforum user profiles such as website, occupation etc, be stored in seperate many-many tables.
    Last edited by r123456; 12-18-03 at 11:48.
    Bessie Braddock: Winston, you are drunk!
    Churchill: And Madam, you are ugly. And tomorrow, I'll be sober, and you will still be ugly.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    you would store them in separate tables only if you were (a) afraid of nulls, or (b) a fifth-normal-form junkie

    and you have to be really really weird to believe that a database has to be normalized all the way to 5NF

    i'd go with one table

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Sep 2003
    Location
    The extremely Royal borough of Kensington, London
    Posts
    778
    Given that many people on this site do not complete the majority of their personal profile fields does this imply a very large collection of nulls. Personally if I could predict that the majority of values for a given set of columns would be null, I would create a many-many or is this still not an issue, given that none of these columns would be foreign keys.

    I visited a website earlier to search for a member, only to be alerted that 11 queries were executed. This appeared rather strange as I searched simply for a member.
    Last edited by r123456; 12-18-03 at 12:24.
    Bessie Braddock: Winston, you are drunk!
    Churchill: And Madam, you are ugly. And tomorrow, I'll be sober, and you will still be ugly.

  6. #6
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Yes, I would expect to find many nulls in the personal profile table. Given that nulls are a fact of life in a SQL database, whether there are few or many nulls seems moot. I have no objection to nullable foreign keys, either.

    I would imagine that of the 11 queries, 10 were to get additional "many to one" details for the person, e.g. history of questions asked, list of favourite foods or whatever. If they were to get additional "one to one" details such as email address, date of birth then I would think the application was poorly coded.

    One of the issues with SQL DBMSs is that to a certain extent, logical design and physical design are tangled. While it may be logically desirable to split off nullable columns into separate tables with NOT NULL columns (as relational purists would like), physically you get better performance with the single table. And the SQL is easier to write too!

  7. #7
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    One table and Indexes, lots of indexes.

  8. #8
    Join Date
    Dec 2003
    Location
    Toronto
    Posts
    15
    What about data model flexibility?
    What if the business is asking for a new client attribute?
    What if the business is asking for morphing a certain attribute - let's say USERTYPE - into few distinct attributes like INDUSTRY, GEOGRAPHY, etc?
    Are you gonna delete and add columns?
    What about all the apps using this database? You should re-engineer all the stored procedures, triggers, etc?

    What if you are serving an OO application and the BLL (or DOM) is asking for the persistence of such objects like Address of a User?

    Having one table with many nulls is fast, easy and straightforward, except it is not offering flexibility.

  9. #9
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Single tables for queries are ultimately the fastest thus the rationale behind data warehousing. If this database is to be transactional or undergoing significant change a more normalized approach is preferable.

  10. #10
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Originally posted by cezarm
    What about data model flexibility?
    What if the business is asking for a new client attribute?
    What if the business is asking for morphing a certain attribute - let's say USERTYPE - into few distinct attributes like INDUSTRY, GEOGRAPHY, etc?
    Are you gonna delete and add columns?
    What about all the apps using this database? You should re-engineer all the stored procedures, triggers, etc?

    What if you are serving an OO application and the BLL (or DOM) is asking for the persistence of such objects like Address of a User?

    Having one table with many nulls is fast, easy and straightforward, except it is not offering flexibility.
    Flexibility is overrated, especially by OO programmers (and quite how flexible is the OO code anyway?)

    If the business wants a new attribute, we can add one in a jiffy. Of course, any change may impact stored procedures, triggers etc. Presumably it will also impact the OO application also? Or are you advocating the classic "thing, property and thing_property" tables that make the database as flexible as you like and to hell with integrity and queryability (is that a word?)

    I don't understand the point about persistence of Address. Obviously, we will have address attributes or a separate address table - where's the difficulty here?

    Come to that, what is it that you are proposing instead? A table per attribute, or is it that "thing" table I mentioned above?

  11. #11
    Join Date
    Dec 2003
    Location
    Toronto
    Posts
    15
    Originally posted by andrewst
    Flexibility is overrated, especially by OO programmers (and quite how flexible is the OO code anyway?)

    If the business wants a new attribute, we can add one in a jiffy. Of course, any change may impact stored procedures, triggers etc. Presumably it will also impact the OO application also? Or are you advocating the classic "thing, property and thing_property" tables that make the database as flexible as you like and to hell with integrity and queryability (is that a word?)

    I don't understand the point about persistence of Address. Obviously, we will have address attributes or a separate address table - where's the difficulty here?

    Come to that, what is it that you are proposing instead? A table per attribute, or is it that "thing" table I mentioned above?
    First of all, flexibility is a major issue in any serious application. If you think that you'll have to maintain and support further development of your app for a long period of time (let's say more then 1 year), then a flexible data model can save you a lot of pain.

    As I guess you already know, business is usual fuzzy, unclear and have the very bad habit to change it's mind without notice.

    To be more clear, I'll elaborate a little bit the story with address persistence.

    At the first release the business is defining a customer by being a person with a single address containing fields like Street, City, Province/State, ZIP/PostalCode, Country, Region, etc.
    You already may have doubts that the business model is somehow wrong, but you are told that this is the reality and you have to keep things simple, understandable and clean. Very nice!

    At the second release, the business already learned that a customer may have more then one address. In fact it has at least two of them, namely mailing address and invoice address....
    More then that, they would like to have some reports on web traffic analysis based on geographical information contained in the Address entity.
    Now, in order to do this, it's obvious that you don't have just a single table for Address but rather you have a set of entities like Country, State, City, etc.

    At the third release the business is becoming really smart and it's starting to ask himself what is a customer. It is a person? It is a company? Same person can act with multiple roles? If yes, what about the roles of a company?
    For example, the business may discover that it's model is not a business 2 consumer but rather a business 2 business. It learns that a company can be very well a customer as a person is.
    Then, they are thinking to build a more abstract entity called Business Party which can be either a Person or a Company or whatever else they can imagine (for example the IT Department of a certain company).

    Now this Business Party entity will need at least two or three addresses: mailing address, shipping address and invoiciong address.
    How you gonna deal with this?

    In the mean time the OO architects and developers are asking for a data model who is mapping with their Domain Object model.
    If the architecture of an OO application is good, than that app is quite fleible and changes can be done easy and fast.
    Now, because they are so fast, the OO guys are asking the same thing on your side. It is not acceptable for them to wait for database development....

    Are all the above familiar? If not, I guess a single table for Customer is the most desirable approach. Thinks should stay simple if not required otherwise.

    So, "thing, property and thing_property" is a good thing if you apply it with moderation.
    I don't understand why you are saying that integrity is affected.
    Yes, is true, the database will not be so "querable", but it offer greater flexibility and will guarantee a much better data consistency.

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    yes, that sounds familiar, but the only problem i have is that it takes three business releases to recognize the problem

    a competent data modeller would have uncovered the address issues in the very first modelling session with the business users

    the modeller cannot be meek

    the modeller must push the design flexibility at every stage, must ask the right questions, and must, above all, not let the DBAs just collapse everything into one table for convenience ("oh, but we always denormalize for performace, right?")


    rudy

  13. #13
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Just to clear up one possible point of confusion: I am not advocating shoe-horning everything about a customer into one table regardless. My original point was that any data that definitely has a (possibly optional) one-to-one relationship with the customer should normally go into the customer table as a nullable column. Like maybe middle name.

    Issues like those you bring up about addresses can be foreseen in advance by a good analyst. But in any case, exactly what table design are you proposing for customers and addresses that enables one to evolve from a "quick and simple" approach to a more complex one with one-to-many relationships, extra entities and subtypes, without changing the database design? Please elaborate!

    If it is the "thing and property" approach, then my objection about integrity comes into the picture. If your customer_property table looks like this:

    customer_id property_type property_value
    123 Date of birth 01-Jan-1980
    123 Country UK
    123 Sex M

    How do you ensure that only valid entries are made? i.e. that Date of Birth is a date, Country is from the set of countries, and Sex is M or F? Are you going to do it all in the application?

    Sorry, but application developers generally know little about database design and integrity, and should stick to building pretty front ends ;-) Treating the database as a dumb bucket is a bad idea!

  14. #14
    Join Date
    Dec 2003
    Location
    Toronto
    Posts
    15
    Yes, you are absolutely right.
    I've tried to keep the example simple.
    Of course, you should recognise all these things from the first iteration.

    But what if you are inheriting a legacy database with hundreds of stored procedures called in myriads of places in legacy apps with the source code written in ancient Greek?

    And what if the business is a multiheaded hydra with different opinions and with a poor project management?

    Then, you know what you have to do, but you can't do it right away.

    Then the battle you are speaking about is quite difficult and you have to progress in small steps.
    But you must have the big picture in mind, even nobody is understanding what the hack are you speaking about...

    I guess it's a matter of available energy, patience and willingness to make good against the will of the patient...

    More then that, implementing such a model from the first step can be a killing pill, as nobody will understand it and will take a great effort to accommodate with....

  15. #15
    Join Date
    Dec 2003
    Location
    Toronto
    Posts
    15
    Originally posted by andrewst
    Just to clear up one possible point of confusion: I am not advocating shoe-horning everything about a customer into one table regardless. My original point was that any data that definitely has a (possibly optional) one-to-one relationship with the customer should normally go into the customer table as a nullable column. Like maybe middle name.

    Issues like those you bring up about addresses can be foreseen in advance by a good analyst. But in any case, exactly what table design are you proposing for customers and addresses that enables one to evolve from a "quick and simple" approach to a more complex one with one-to-many relationships, extra entities and subtypes, without changing the database design? Please elaborate!

    If it is the "thing and property" approach, then my objection about integrity comes into the picture. If your customer_property table looks like this:

    customer_id property_type property_value
    123 Date of birth 01-Jan-1980
    123 Country UK
    123 Sex M

    How do you ensure that only valid entries are made? i.e. that Date of Birth is a date, Country is from the set of countries, and Sex is M or F? Are you going to do it all in the application?

    Sorry, but application developers generally know little about database design and integrity, and should stick to building pretty front ends ;-) Treating the database as a dumb bucket is a bad idea!
    Here it's a matter of trade off...
    For entities like Customer, Business Party, Invoice, Order, etc, such kind of "thing and property" approach may be dangerous.

    But when you are modeling processes like "ordering process" or "website user roles" or similar things where the information is quite complex and the rules (verbs) are supposed to change quite frequently, a "thing and property" approach is a desirable one.

    About the front & middle tier developers:
    They should not have to deal with this complexity on their end. If you use a database app which has a good support for stored procedures or similar business logic encapsulation on database level, then this is the place where the stuff should go.

    If you are going on the path of "things and property" you should be the only responsible for data integrity and should have the proper API and validations in place.

    So, it's a matter of trade off....

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •