Page 1 of 3 123 LastLast
Results 1 to 15 of 36
  1. #1
    Join Date
    Dec 2004
    Posts
    22

    Database Design Question (more columns, or more rows?)

    I'm sure this is asked often but I can't find a similar enough post.

    If I have a table that needs to include (for each row) some number of boolean flags, is it better to:

    -Add columns for each possible flag used to store the boolean values (probably around 10 values, give or take a few)

    -Add another table that has a foreign key to the original table, which stores the boolean statuses-- this may be up to ~10 rows per item entered into the original table, if all of the flags are set to true. (A false flag wouldn't have any entry in the secondary table at all.)

    From an ease of use/design standpoint (and possibly performance? - Oracle db), it seems adding columns is easily the best choice, but how many columns is too many? There's also the fact that I may need to add/drop columns if I ever need to change these boolean flags. I don't feel this is a big deal, but am I wrong about that?

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    the design with multiple columns on the row is easier to write SQL for

    (for me, this is sufficient )

    the design with multiple rows is relationally purer (5th or 6th normal form; i'm sure dportas will be along in a moment to clarify), provided that these multiple rows are in separate tables!!

    if they're in the same table, then you need an additional "metadata" column containing information about which attribute the column is for, and this now resembles the ill-conceived EAV model, which you should avoid
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jan 2003
    Location
    British Columbia
    Posts
    44
    Are you familiar with Bitwise Operations? Since we are talking about "Flags", I would simply give each flag an enumerated value, suitable for bitwise operations (each value a power of 2). Then store them all in one integer column in the "original table".

  4. #4
    Join Date
    Dec 2004
    Posts
    22
    Wouldn't using bit flags make accessing that data much more complicated? Especially if I ever integrate with some reporting tool?

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Quote Originally Posted by Garoad
    Wouldn't using bit flags make accessing that data much more complicated? Especially if I ever integrate with some reporting tool?
    Yes, using bit masks instead of booleans makes nearly everything more complicated, but it can save you several bytes per row. If you are paying over $1000 per megabyte of storage then this is important, but if you're paying those rates I'd really love to chat with you about some real deals that I can make you on hardware!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  6. #6
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by Garoad
    Wouldn't using bit flags make accessing that data much more complicated? Especially if I ever integrate with some reporting tool?
    Yes it would. For 10 attributes put them in separate columns.

    A reason to use binary might be if you have some application that consumes the data that way and the data size is such that it isn't efficient to store and manipulate individual values as columns or rows.

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by brucevde
    Are you familiar with Bitwise Operations? Since we are talking about "Flags", I would simply give each flag an enumerated value, suitable for bitwise operations (each value a power of 2). Then store them all in one integer column in the "original table".
    okay, here's a quick test to see if this approach is practical

    let's say that you have assigned separate meanings to each of 10 different flags as follows --
    Code:
    n  2^n   meaning    
    0    1   smart      
    1    2   thin       
    2    4   musical    
    3    8   pretty     
    4   16   artistic   
    5   32   sexy       
    6   64   healthy    
    7  128   rich       
    8  256   fecund     
    9  512   available
    let's say that these enumerated values are stored in an integer column, so that, for example, the value 937 translates as: smart, pretty, sexy, rich, fecund, and available

    now here's the quick test...

    what is the SQL to find all values that represent "smart or musical or artistic, as well as available"?

    if the flags were stored as separate columns with the appropriate names, and with 0 and 1 values (thus leaving aside whether the database system supports actual boolean values TRUE and FALSE), the SQL would be ...
    Code:
    WHERE ( smart=1 OR musical=1 OR artistic=1) AND available=1
    using the enumerated values in an integer column, what would the SQL be?

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Jan 2003
    Location
    British Columbia
    Posts
    44
    There is not much difference in the sql statement. The idea is the same, create expressions in the where clause that evaluate to true or false.

    The following will find all Available Smart, Available Musical, Available Artistic

    Code:
    Where Attributes & 512 = 512 And (Attributes & 1 = 1 Or Attributes & 4 = 4 Or Attributes & 16 = 16)
    Both the following statements will find all Available Smart Musical Or Available Smart Artistic

    Code:
    Where Attributes & 512 = 512 And (Attributes & 5 = 5 Or Attributes & 17 = 17)
    Where Attributes & 517 = 517 OR Attributes & 529 = 529
    The following will find all Available Smart Musical Artistic

    Code:
    Where Attributes & 533 = 533
    The & is the Bitwise And operator in SQL Server.
    Last edited by brucevde; 08-21-09 at 15:24.

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by brucevde
    There is not much difference in the sql statement.
    except for ...

    1) readability, i.e. self-documentation

    2) portability (not everybody is on sql server, eh)

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  10. #10
    Join Date
    Jan 2003
    Location
    British Columbia
    Posts
    44
    1) Depends on the person doing the reading.
    2) Pffttt... Obviously, you would use the correct syntax for whichever database system you are using. I have worked for 20 years in this biz (the last 10 as a contractor) and have never had to "port" any code from one db to another - meaning, in my experience, companies rarely if ever switch from one database system to another.

  11. #11
    Join Date
    Mar 2009
    Location
    Sydney, Australia
    Posts
    258
    Garoad

    So have you gotten a clear answer to your question ?

    The confusion is because you already have it in your mind that the identification of each of the optional attributes is boolean, and thus needs only {0|1} to represent it. This interferes with the ability to model the data independently. Let's call the main table Person, and the attributes you describe Description. The determining issue is, the Person table can have one-or-more Descriptions; each Description can apply to one-or-more Persons. An ordinary associative table is required: PersonDescription.

    From both ease-of-use and and ease-of-maintenance (not only the "need to change these boolean flags") perspectives, as well as maintaining the spirit of the Relational Model, the answer is:
    - Use a separate "lookup" table for each boolean flag, with it's expanded meaning: Description.
    - use an associative table (child of the main table and the Lookup table); this will store one row for each intersection, not "status" as you put it, but existence: PersonDescription.
    - adding, changing or dropping Descriptions is now dead-set easy.
    Regards
    Derek Asirvadem (Formerly DerekA)
    Information Architect / Senior Sybase DBA
    Copyright 2009 Software Gems Pty Ltd

    I answer questions from the Original Poster only. If you have a genuine question, as the moderators have requested, start a new thread.

    http://www.softwaregems.com.au

  12. #12
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    I prefer Derek's method too. Often when there are a number of booleans like this, the column names are really drawn from values from the same domain. If they are mutually exclusive, then a single column is required in the person table. If they are m:m like this then have a table of PersonCharacteristics (just my perferred name for PersonDescription) linked to a table of Characteristics.

    To answer Rudy's hypothitical question with such a design, I would probably use CASE and SUM, but I'll admit I've not thought too hard about the best method.

    Nice to see you back in 'Concepts too Derek
    Testimonial:
    pootle flump
    ur codings are working excelent.

  13. #13
    Join Date
    Mar 2009
    Location
    Sydney, Australia
    Posts
    258
    Thanks, pootle_flump. Hopefully the site has matured and there is more tolerance of people with a different skill set and a different approach.

    Yeah. The problem is actually a common mistake, I see it all the time. People focus on the use; the solution; how to get the data they want on a report, and they are thus encumbered in dealing with just the data; modeling just the data. While the treatment and use (extraction, expansion) is not to be divorced entirely from the data, an absolute demand to the process is, the modeling must be independent of the treatment and use. It is not a paradox that once the modeling has been done correctly, the treatment/use is actually easy; in my experience, when the treatment/use is difficult, that is in fact simple evidence that the data has not been normalised. People just get the sequence mixed up.

    The Person::Description relation is many-to-many, period. The Description has to be a entity. It is not an attribute of the Person.PK; it is not 1:1 with the Person.PK. Therefore it should not be in the Person table.

    A single column (if they are all mutually exclusive or not) is a treatment before the normalisation (and thus a typical unnormalised implementation). Such an implementation will be unnecessarily fixed (8 or 16 bits; or a fixed set of single columns). When the number of Descriptions per Person exceeds that (and data does grow), you need a rebuild/remodel.

    In fact, the assumption that they are bits is an encumbrance. It is a fixed idea from a specific treatment (I just need a bit to indicate which of each Characteristic is "on" for each Person). That might be true for a particular report, but it has nothing to do with the data, or the other reports. In modeling terms, they are not bits or varchars; they are Existence or Occurrence in an associative table. Which is the ordinary result of resolving the many-to-many relation. And as an ordinary associative table, there are no limits to size or range.

    In the early stages (I will purposely refrain from using standard terms such as 1NF, 3NF, BCNF, because most posters here have quite different ideas as to what the terms mean, and I don't want to get side-tracked in such discussions), what the key of the Description is, and how (not if) it is associated, do not matter. It is just an entity, like Person. Both can be assumed to have proper (or substandard) Relational keys before the modeling process is complete.

    And sure, it is not complete, there is more to do. We do not want repeating groups of Description text in the associative table (although some would jump to point out that it is "faster", that is, again, reversing the prescribed order, because those are physical, implementation considerations). Before the process is complete, yes, both Description and PersonDescription will have good PKs, and no repeating groups.

    Now for the treatment.
    1 For the Person GUI, Description is a drop-down, with multiple choice enabled.
    2 For the reports for people who accept that the Person:Description relation is 1-to-many, for each Person row, there are multiple Description rows. Much like Invoice and LineItem.
    3 For that special (bit-centric) report. Here you are limited by the specific platform. In Sybase and DB2 it is dead-set easy (MS does different things in each version, so there you are limited to platform-and-version): Use subqueries to populate your fixed set of columns, or to turn on your fixed set of bits, or bits-in-an-integer. And be ready to change code whenever the fixed set changes. Note, it is the report/code that has to change, because it has a fixed mindset; not the database.
    4 For all normal reports via a report tool, these can usually handle pivots and concatenating columns, so wherever Person is a row, Description can be a column (in the report, not the database). It can alternately be a concatenated (derived) single column: "FatBaldGood".

    No one is reduced to figuring out IDs or bit flags or bit values in a single integer. To find Persons who are sexy, one merely:
    Code:
    SELECT  LastName,
            FirstName,
            PhoneNo
        FROM  Description d,
              PersonDescription pd,
              Person p
        WHERE d.Description   = "Sexy"
        AND   d.DescriptionId = pd.DescriptionId
        AND   pd.PersonId     = p.PersonId
    Last, treatment/rendition is always last. Yes, you could certainly get clever, and use enumerated values for DescriptionId and thus simplify the sequel for [3] and [4] by using SUM(DescriptionId). That won't get you the expanded text, but it will correctly load a single integer or bigint column. The limitation is then 16 or 32 Description entries. Still no change to the correct data model.
    Attached Files Attached Files
    Regards
    Derek Asirvadem (Formerly DerekA)
    Information Architect / Senior Sybase DBA
    Copyright 2009 Software Gems Pty Ltd

    I answer questions from the Original Poster only. If you have a genuine question, as the moderators have requested, start a new thread.

    http://www.softwaregems.com.au

  14. #14
    Join Date
    Mar 2009
    Location
    Sydney, Australia
    Posts
    258
    The & is the Bitwise And operator in SQL Server.
    And Sybase (which preceded it) and DB2. IIRC (it has been a while) Oracle does the same but the notation is different. They are extensions to ANSI SQL, but essential to high performance (although terse) coding, every vendor has them. No comment on Rudy, but I would say any SQL coder (not user) having difficulty with powers-of-two or bitwise operations, is seriously hindered.
    using the enumerated values in an integer column, what would the SQL be ?
    Bruce has answered the question, at the level of the question.

    But for the rest of the readers, there's more; what is implied in the question. The point is, the value of the ID column, or of the bit, or of the summed-bit column, is completely irrelevant; in real code in a normalised database, one does not use it or care what the value is; much as one does not and should not, care what the PersonId for Fred Bloggs is. Additionally, there are usually more than one ways to do the same thing in Sequel, depending on ones proficiency.

    Here's another method for your consideration, a stored proc which obtains Persons who have any of a set of Descriptions, which are passed as parameters; I've coded for four. (Note, this is related to the quoted post, not my previous post; I am responding to that question/exchange, I am not retracting or invalidating my previous post.) I already have a cluster in the database with a similar construct, which I butchered so that I could test the code before posting, so please bear with me ...
    - for Part, read Person
    - for Keyword, read Description
    - for KeywordSum, read Bruce's integer or bigint column
    Code:
    CREATE PROC GetPartByKeyword_sp
    AS
            @Keyword1 _Description,         -- parameters/sargs, one is mandatory
            @Keyword2 _Description = NULL, 
            @Keyword3 _Description = NULL, 
            @Keyword4 _Description = NULL   -- allow up to 8/16/32 sargs
            
    SELECT  p.ShortName,
            p.KeywordSum                    -- limited to 8/16/32 Keywords
        FROM  OE_Part p
        WHERE p.KeywordSum & (
            SELECT SUM(KeywordId)
                FROM OE_Keyword
                WHERE Keyword = @Keyword1
                OR   (Keyword = @Keyword2 AND @Keyword2 != NULL)
                OR   (Keyword = @Keyword3 AND @Keyword3 != NULL)
                OR   (Keyword = @Keyword4 AND @Keyword4 != NULL)
                --   allow up to 8/16/32 sargs
                ) != 0
    Not a value or an enumerated value or a power-of-two anywhere; no need to know what 255 or 937 means; or what the ID for or "sexy" is.
    1) readability, i.e. self-documentation
    Readability is not an issue as long as the coder uses a consistent style. Self-documentation is dependent on (a) good naming conventions and (b) standards. It has nothing to do with bitwise operations. (Ok, if you've never seen them before, there is a learning curve, fair enough, but that is a separate issue; and it is certainly not a reason to exclude them.)
    2) portability (not everybody is on sql server, eh)
    I've answered that point at the top of this post. It is portable. (This column Database Concepts and Design, is for the Relational Database Management Systems; we have been discussing SQL specifically thus far in the forum/thread; we are not suddenly going to switch to PC spreadsheets with little or no Database capabilities and no Relational capabilities.)
    Last edited by Derek Asirvadem; 08-27-09 at 12:00.
    Regards
    Derek Asirvadem (Formerly DerekA)
    Information Architect / Senior Sybase DBA
    Copyright 2009 Software Gems Pty Ltd

    I answer questions from the Original Poster only. If you have a genuine question, as the moderators have requested, start a new thread.

    http://www.softwaregems.com.au

  15. #15
    Join Date
    Dec 2004
    Posts
    22
    Derek is right. If I understand correctly that I need to model this as a full-on many-to-many relationship with a cross reference table... This was the first thing I thought of but I wanted an alternative option, because it seems to make a mess of things as the schema grows (2 extra tables), including a cross reference table with a ton of rows in it. I would think this would offer trashy performance compared to simply adding flag columns and letting the view technology handle how to correctly represent the flags (check boxes). But then reporting tools would be an issue still.. (ahh I hate reporting tools!)

    I don't like many-to-many relations, they're always a pain to deal with.

    Maybe I should be using an object database instead of a relational. :/

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •