Results 1 to 9 of 9
  1. #1
    Join Date
    Jul 2004
    Posts
    4

    Question How to store config options?

    My internet site trexx.ch allows users to create their own homepage with just a few clicks. However, there are a lot of things that each user can configure. With each release there are some new options. Now my question: What's the better approach? To have one user table with many columns for each option, (where you have to add a column every time you want a new one) or to have a user table AND an options table with the following structure?
    Code:
    user_id                 int
    option_name             varchar
    option_type             enum('int', 'varchar', 'date')
    option_int_value        int (null)
    option_varchar_value    varchar (null) 
    option_date_value       date (null)
    What would you recommend and why? Thanks for your help,
    tREXX

  2. #2
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Hi,

    I would recommend the second solution with an options table but with a slight change.

    First, the reason why I like the second solution.

    No matter how many options you add, the structure of the underlying table does not change. Therefore, all programs that work currently will work no matter how many options you add. Adding new options will only require you to write programs that know how to handle the new option and integrate it with the main program.

    The small change I would recommend is to have "value" and "option_type" columns instead of:
    option_type enum('int', 'varchar', 'date')
    option_int_value int (null)
    option_varchar_value varchar (null)
    option_date_value date (null)
    That way, you can add "images" if you so desire and you do not have to add an extra column to this table.

    The value can be either the actual text, or a pointer to an image or URL, etc. The option_type gives you enough information to allow you to do the right thing for it.

    Hope that was useful.

    As an aside, there are two key points related to software development that appear here.

    The second solution you proposed is essentially a way to keep things that change (the number of options, etc.) separate from the things that remain relatively unchanged (like any user information). This gives you a very clear separation in code and lets you develop much faster, reducing development time at least by half (rough estimate.)

    The other is the use of "data driven development" where you realize that the type assigned to the value of the option is data and should not be hard coded into the data structure. The flexibility gained, and the resultant raising of the level of abstraction in your programs, can result in an order of magnitude improvement in the development time and drastically reduce the maintenance effort.

    Ravi

  3. #3
    Join Date
    Sep 2004
    Posts
    5
    rajiravi: I'm not sure I like the idea of making the value a simple text field. If you do that then you have to make sure there are routines around *every* insert and update statement that ensure that the values can be converted to and from the target type from the string. For some types, this leads to a loss of precision (floating point) or efficiency (blob values must be encoded in the character encoding of the column). You also lose formatting capabilities of the DB with regard to special types. Most DBs offer a lot of date formatting and manipulation routines. These require a lot of extra work if you have to cast your text to a date first. Also, it makes it impossible to query using {<, >, BETWEEN, etc.} on date fields, and probably other fields where the string sorting doesn't match the native sorting, again, without converting to the right type from the string value.

    In other words, it makes the database much more inefficient, and the user code much more error prone. I think I'd prefer the OP's post as it stands, but I'd like even better an alternative solution, if there is one.

    This is similar to the problem in my post "How to allow user defined fields" but not quite the same. I'm sort of disappointed that there aren't any more elegant solutions to this problem. I'm sure it must be pretty common. I've actually encountered it myself out in the wild, in a competitor's document imaging product my company needed to convert data from.

    I do have to wonder what the OP's second solution has over his first. Your main concern is ease of upgrading, right? The options table is pretty much constant for any given release, right? So either way, you're going to have to provide a script that makes sure a certain table has all the necessary elements. In one case, it's the site table and columns. In the other, it's the options table and rows. I don't see a big difference there. And your user code might be considerably simplified by foregoing the options table in favor of your first solution. But maybe not.

  4. #4
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    pdf23ds writes that he fears a loss of efficiency and inability to write effective queries.

    While these are valid concerns, they must be considered within the overall requirements of the application. Based on the original post, I believe that the main need is for a flexible design that lets the developer/webmaster add features (options). Therefore, the flexible design approach suggested by the original poster seems a good starting point.

    Now, let us examine the performance implications of this design.

    The UI (User Interface, a browser for web applications) has all the information available to know what type of data (text, date, number [including floats]) is valid for an option. It is relatively easy to write a UI that will present, for example, a calendar to select from when it sees a date option. Or that will ensure that a value entered is a number.

    Even when there are three valid fields, as in the original suggestion of option_int_value, etc., the UI has to do some validation up-front to ensure that only valid dates are entered in date fields. You do not want to leave the enforcement to the database because, in general, it is best to capture errors as close to source as possible. The UI is best able to tell a valid value from an invalid one right away.

    I do not see any performance issues here. Javascript or JSPs generating Dynamic HTML is a possible solution.

    Coming to the question of blobs, in general, large objects are not stored directly in the database. Oftentimes, you store them in filesystems and have pointers identifying the location of the files. The code for an html page that has images in it exemplifies this approach. The html code itself never stores the image, but only points to the location of the image file.

    Such an approach is easily implemented in a UI. Based on the type of option, the UI can make the right decision, with or without further advice from the end-user.

    With my suggested approach, there are no performance issues so far.

    As for the question of queries, I am unable to see the purpose of queries asking for date values in certain options. Nevertheless, assuming that all date option-types are actually dates in some agreed upon format, it is easy to write the where query as:
    where option_name = 'A'
    and option_type = 'Date'
    and value = '01-Jan-2004'
    With a single table approach the query would be:
    where option_name = 'A'
    and option_type = 'Date'
    and option_date_value = '01-Jan-2004'
    Wow! The two queries are almost identical!

    Once again, I see neither a performance hit nor a problem with error prone code. The only errors that will be introduced are through an incomplete understanding of the problem (to put it politely.)

    Let us think through the columns in the original table again.. Here they are, repeated for convenience:

    Code:
     user_id                 int
          option_name             varchar
          option_type             enum('int', 'varchar', 'date')
          option_int_value        int (null)
          option_varchar_value    varchar (null) 
          option_date_value       date (null)

    If you observe closely, there is a subtle duplication involved in this description.

    The allowed values for option_type and the columns option_***_value are the same. This causes severe problems, that every time a new option_type is added, you must add a new column. A design that requires you to change data structures again and again is not good.


    Thanks,
    Ravi

  5. #5
    Join Date
    Sep 2004
    Posts
    5
    I'll let the OP evaluate most of your arguments, as they aren't material to my concerns.

    "The allowed values for option_type and the columns option_***_value are the same. This causes severe problems, that every time a new option_type is added, you must add a new column. A design that requires you to change data structures again and again is not good."

    I think that in this case the obvious solution is be to go ahead and include columns (and option_type members) for all the data types supported by the database, or all that the OP would ever consider using in any future revision of his or her software. As there would be a presumably small number of rows in this table, there wouldn't be a space issue with giving varchars and decimals and such very high lengths/precisions. And in the event that new data types are added to the database itself... Well, I don't think that's an issue. So there would never be a need to add new columns to the table.

  6. #6
    Join Date
    Jul 2004
    Posts
    4
    Thanks for your response, both of you. It seems like you'd have the same arguments for both approaches as i do :-) I think (from a developer point of view) the first solution is a more convenient approach, because it makes selecting and data manipulation easy.
    The second approach has a prettier, flexible design but requires more work to be done, writing a class that handles all the selecting, data manipulation and type-converting stuff.

    Anyway, it doesn't make a great difference for now. I guess it depends also on the number of possible options. As long as there aren't a lot (let's say < 30), it may be alright to have columns. If i would have hundred of options per user, i'd prefer the options table, because you could also add a column for some kind of categorization to that table.

  7. #7
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    The coding time for the second option is less than that for the first option. Do it and you will find out.

    It never varies!

    The first argument against the better method is almost always "performance issues". When that argument is refuted, the second argument invariably says the inferior option is better because it is "simpler". This is never backed up by any evidence or experimentation.

    In this case, the second solution has a simpler table structure, and consequently, less code. In fact, the second solution is the simpler solution.

    Given a choice between two options, one that has no shortcomings and another that is rigid and inflexible, the vast majority of developers choose the rigid, inflexible option. And then they end up with an application that is hard to maintain or enhance.


    Oh, well, so be it.

    Thanks,
    Ravi

  8. #8
    Join Date
    Jul 2004
    Posts
    4
    Ok, i've done some work and tried both options :-)

    Let's assume we'd like to have a view (for instance a simple view for a forum) like this:
    Code:
    ME_MEMBER_ID    ME_NAME ME_LOCATION ME_AGE  PO_MESSAGE
    1               trexx   Baden       20      trexx was here
    2               sepp    Rietheim    30      sepp was here too
    With some options stored in columns the tables might look like this:
    Code:
    MEMBERS
    ---------------------------------------------    
    ME_NAME ME_MEMBER_ID    ME_LOCATION    ME_AGE
    trexx   1               Baden          20
    sepp    2               Rietheim       30
        
    POSTS
    ---------------------------------------------
    PO_POST_ID  PO_MEMBER_ID    PO_TEXT
    100         1               trexx was here
    101         2               sepp was here too
    Now selecting the view is easy:
    Code:
    select
        me_member_id,
        me_name,
        me_location,
        me_age,
        po_text
    from
        members,
        posts
    where me_member_id=po_member_id
    Ok... Now we want to achieve the same thing, but we store the user options in an options table instead in columns. What we have now, might look like this:
    Code:
    MEMBERS
    ---------------------------------------------    
    ME_NAME ME_MEMBER_ID
    trexx   1
    sepp    2
    
    OPTIONS
    ---------------------------------------------    
    OP_MEMBER_ID    OP_NAME     OP_VALUE
    1               location    Baden
    1               age         20
    2               location    Rietheim
    2               age         30
    
    POSTS
    ---------------------------------------------    
    PO_POST_ID  PO_MEMBER_ID    PO_TEXT
    100         1               trexx was here
    101         2               sepp was here too
    Now there is no easy way to get the view that we want. If we simply join all the tables we get two rows per user, post and option. so we need something more complicated (tested on mysql 4.1.0, the REPEAT is just a workaround to avoid a bug):
    Code:
    select
        me_member_id,
        me_name,
        mo_location,
        mo_age,
        po_text
    from
        (select 
            me_member_id     mo_member_id, 
            max(me_location) mo_location, 
            max(me_age)      mo_age 
         from (    
             select 
                op_member_id    me_member_id, 
                repeat(' ', 20) me_location, 
                op_value        me_age
             from options
             where op_name='age'
             union all    
             select 
                op_member_id    me_member_id, 
                op_value        me_location, 
                '-'             me_age
             from options
             where op_name='location'
         ) tmp
         group by me_member_id) 
        mo_member_options,
        members,
        posts
    where 1=1
        and me_member_id=mo_member_id
        and me_member_id=po_member_id
    Is this the way to go? I guess i would create a view out of the subselect from the statement above (if mysql supports that someday *g*). But then, when we add a new option, we would also have to alter the view again. Is this really better than altering a table and some columns? I think, no matter how good you optimise your indizies and queries, the second statement will execute slower than the simple join... or am i completely wrong?

  9. #9
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Hi,


    I would put an index on the options table on the two columns (op_member_id, op_name)

    If your DBMS allows functions in select statements, you could create functions like this: (pseudo code)
    Code:
     
    function get_age(me_id)
    select op_value 
    into val
    from options
    where op_member_id = me_id
    and op_name = 'age';
    return val;
    Because of the index on the two columns in the where clause, performance should be quite fast.



    The view will be:
    Code:
     
    Select member_id, 
    get_age(me_member_id),
    get_location(me_member_id),
    ...
    from options
    where .....
    I think this view is quite clear and self-explanatory.


    Note: A further optimization can be performed by writing a generic function get_value as follows: (again pseudo-code)
    Code:
     
    function get_value( member_id, option_name)
    select op_value 
    into val
    from options
    where op_member_id = member_id
    and op_name = option_name;
    return val;
    Then the various get_xxx functions simply become:
    Code:
     
    function get_age (member_id)
    	return get_value(member_id, 'age');
    This way, there is one sql statement only, the DBMS parses it once and uses it forever (as long as the DB is up). Also, I believe that the code is relatively simple to understand.


    I feel that this structure will give acceptable performance.

    Also, if you want to test adding a new option quickly from the application, you do not have to modify the view, just add the data, and the application code could do something like:
    select get_value(me_member_id, 'new-option')
    from options
    where op_member_id = ...



    Hope that helps.

    Ravi

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •