If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > How to store config options?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 09-11-04, 09:22
tREXX tREXX is offline
Registered User
 
Join Date: Jul 2004
Posts: 4
Question How to store config options?

My internet site trexx.ch allows users to create their own homepage with just a few clicks. However, there are a lot of things that each user can configure. With each release there are some new options. Now my question: What's the better approach? To have one user table with many columns for each option, (where you have to add a column every time you want a new one) or to have a user table AND an options table with the following structure?
Code:
user_id                 int
option_name             varchar
option_type             enum('int', 'varchar', 'date')
option_int_value        int (null)
option_varchar_value    varchar (null) 
option_date_value       date (null)
What would you recommend and why? Thanks for your help,
tREXX
Reply With Quote
  #2 (permalink)  
Old 09-11-04, 21:28
rajiravi rajiravi is offline
Registered User
 
Join Date: Apr 2004
Location: Toronto, Canada
Posts: 249
Hi,

I would recommend the second solution with an options table but with a slight change.

First, the reason why I like the second solution.

No matter how many options you add, the structure of the underlying table does not change. Therefore, all programs that work currently will work no matter how many options you add. Adding new options will only require you to write programs that know how to handle the new option and integrate it with the main program.

The small change I would recommend is to have "value" and "option_type" columns instead of:
option_type enum('int', 'varchar', 'date')
option_int_value int (null)
option_varchar_value varchar (null)
option_date_value date (null)
That way, you can add "images" if you so desire and you do not have to add an extra column to this table.

The value can be either the actual text, or a pointer to an image or URL, etc. The option_type gives you enough information to allow you to do the right thing for it.

Hope that was useful.

As an aside, there are two key points related to software development that appear here.

The second solution you proposed is essentially a way to keep things that change (the number of options, etc.) separate from the things that remain relatively unchanged (like any user information). This gives you a very clear separation in code and lets you develop much faster, reducing development time at least by half (rough estimate.)

The other is the use of "data driven development" where you realize that the type assigned to the value of the option is data and should not be hard coded into the data structure. The flexibility gained, and the resultant raising of the level of abstraction in your programs, can result in an order of magnitude improvement in the development time and drastically reduce the maintenance effort.

Ravi
Reply With Quote
  #3 (permalink)  
Old 09-11-04, 22:17
pdf23ds pdf23ds is offline
Registered User
 
Join Date: Sep 2004
Posts: 5
rajiravi: I'm not sure I like the idea of making the value a simple text field. If you do that then you have to make sure there are routines around *every* insert and update statement that ensure that the values can be converted to and from the target type from the string. For some types, this leads to a loss of precision (floating point) or efficiency (blob values must be encoded in the character encoding of the column). You also lose formatting capabilities of the DB with regard to special types. Most DBs offer a lot of date formatting and manipulation routines. These require a lot of extra work if you have to cast your text to a date first. Also, it makes it impossible to query using {<, >, BETWEEN, etc.} on date fields, and probably other fields where the string sorting doesn't match the native sorting, again, without converting to the right type from the string value.

In other words, it makes the database much more inefficient, and the user code much more error prone. I think I'd prefer the OP's post as it stands, but I'd like even better an alternative solution, if there is one.

This is similar to the problem in my post "How to allow user defined fields" but not quite the same. I'm sort of disappointed that there aren't any more elegant solutions to this problem. I'm sure it must be pretty common. I've actually encountered it myself out in the wild, in a competitor's document imaging product my company needed to convert data from.

I do have to wonder what the OP's second solution has over his first. Your main concern is ease of upgrading, right? The options table is pretty much constant for any given release, right? So either way, you're going to have to provide a script that makes sure a certain table has all the necessary elements. In one case, it's the site table and columns. In the other, it's the options table and rows. I don't see a big difference there. And your user code might be considerably simplified by foregoing the options table in favor of your first solution. But maybe not.
Reply With Quote
  #4 (permalink)  
Old 09-11-04, 23:33
rajiravi rajiravi is offline
Registered User
 
Join Date: Apr 2004
Location: Toronto, Canada
Posts: 249
pdf23ds writes that he fears a loss of efficiency and inability to write effective queries.

While these are valid concerns, they must be considered within the overall requirements of the application. Based on the original post, I believe that the main need is for a flexible design that lets the developer/webmaster add features (options). Therefore, the flexible design approach suggested by the original poster seems a good starting point.

Now, let us examine the performance implications of this design.

The UI (User Interface, a browser for web applications) has all the information available to know what type of data (text, date, number [including floats]) is valid for an option. It is relatively easy to write a UI that will present, for example, a calendar to select from when it sees a date option. Or that will ensure that a value entered is a number.

Even when there are three valid fields, as in the original suggestion of option_int_value, etc., the UI has to do some validation up-front to ensure that only valid dates are entered in date fields. You do not want to leave the enforcement to the database because, in general, it is best to capture errors as close to source as possible. The UI is best able to tell a valid value from an invalid one right away.

I do not see any performance issues here. Javascript or JSPs generating Dynamic HTML is a possible solution.

Coming to the question of blobs, in general, large objects are not stored directly in the database. Oftentimes, you store them in filesystems and have pointers identifying the location of the files. The code for an html page that has images in it exemplifies this approach. The html code itself never stores the image, but only points to the location of the image file.

Such an approach is easily implemented in a UI. Based on the type of option, the UI can make the right decision, with or without further advice from the end-user.

With my suggested approach, there are no performance issues so far.

As for the question of queries, I am unable to see the purpose of queries asking for date values in certain options. Nevertheless, assuming that all date option-types are actually dates in some agreed upon format, it is easy to write the where query as:
where option_name = 'A'
and option_type = 'Date'
and value = '01-Jan-2004'

With a single table approach the query would be:
where option_name = 'A'
and option_type = 'Date'
and option_date_value = '01-Jan-2004'

Wow! The two queries are almost identical!

Once again, I see neither a performance hit nor a problem with error prone code. The only errors that will be introduced are through an incomplete understanding of the problem (to put it politely.)

Let us think through the columns in the original table again.. Here they are, repeated for convenience:

Code:
 user_id                 int
      option_name             varchar
      option_type             enum('int', 'varchar', 'date')
      option_int_value        int (null)
      option_varchar_value    varchar (null) 
      option_date_value       date (null)

If you observe closely, there is a subtle duplication involved in this description.

The allowed values for option_type and the columns option_***_value are the same. This causes severe problems, that every time a new option_type is added, you must add a new column. A design that requires you to change data structures again and again is not good.


Thanks,
Ravi
Reply With Quote
  #5 (permalink)  
Old 09-12-04, 00:36
pdf23ds pdf23ds is offline
Registered User
 
Join Date: Sep 2004
Posts: 5
I'll let the OP evaluate most of your arguments, as they aren't material to my concerns.

"The allowed values for option_type and the columns option_***_value are the same. This causes severe problems, that every time a new option_type is added, you must add a new column. A design that requires you to change data structures again and again is not good."

I think that in this case the obvious solution is be to go ahead and include columns (and option_type members) for all the data types supported by the database, or all that the OP would ever consider using in any future revision of his or her software. As there would be a presumably small number of rows in this table, there wouldn't be a space issue with giving varchars and decimals and such very high lengths/precisions. And in the event that new data types are added to the database itself... Well, I don't think that's an issue. So there would never be a need to add new columns to the table.
Reply With Quote
  #6 (permalink)  
Old 09-12-04, 08:14
tREXX tREXX is offline
Registered User
 
Join Date: Jul 2004
Posts: 4
Thanks for your response, both of you. It seems like you'd have the same arguments for both approaches as i do :-) I think (from a developer point of view) the first solution is a more convenient approach, because it makes selecting and data manipulation easy.
The second approach has a prettier, flexible design but requires more work to be done, writing a class that handles all the selecting, data manipulation and type-converting stuff.

Anyway, it doesn't make a great difference for now. I guess it depends also on the number of possible options. As long as there aren't a lot (let's say < 30), it may be alright to have columns. If i would have hundred of options per user, i'd prefer the options table, because you could also add a column for some kind of categorization to that table.
Reply With Quote
  #7 (permalink)  
Old 09-12-04, 10:32
rajiravi rajiravi is offline
Registered User
 
Join Date: Apr 2004
Location: Toronto, Canada
Posts: 249
The coding time for the second option is less than that for the first option. Do it and you will find out.

It never varies!

The first argument against the better method is almost always "performance issues". When that argument is refuted, the second argument invariably says the inferior option is better because it is "simpler". This is never backed up by any evidence or experimentation.

In this case, the second solution has a simpler table structure, and consequently, less code. In fact, the second solution is the simpler solution.

Given a choice between two options, one that has no shortcomings and another that is rigid and inflexible, the vast majority of developers choose the rigid, inflexible option. And then they end up with an application that is hard to maintain or enhance.


Oh, well, so be it.

Thanks,
Ravi
Reply With Quote
  #8 (permalink)  
Old 09-14-04, 08:56
tREXX tREXX is offline
Registered User
 
Join Date: Jul 2004
Posts: 4
Ok, i've done some work and tried both options :-)

Let's assume we'd like to have a view (for instance a simple view for a forum) like this:
Code:
ME_MEMBER_ID    ME_NAME ME_LOCATION ME_AGE  PO_MESSAGE
1               trexx   Baden       20      trexx was here
2               sepp    Rietheim    30      sepp was here too
With some options stored in columns the tables might look like this:
Code:
MEMBERS
---------------------------------------------    
ME_NAME ME_MEMBER_ID    ME_LOCATION    ME_AGE
trexx   1               Baden          20
sepp    2               Rietheim       30
    
POSTS
---------------------------------------------
PO_POST_ID  PO_MEMBER_ID    PO_TEXT
100         1               trexx was here
101         2               sepp was here too
Now selecting the view is easy:
Code:
select
    me_member_id,
    me_name,
    me_location,
    me_age,
    po_text
from
    members,
    posts
where me_member_id=po_member_id
Ok... Now we want to achieve the same thing, but we store the user options in an options table instead in columns. What we have now, might look like this:
Code:
MEMBERS
---------------------------------------------    
ME_NAME ME_MEMBER_ID
trexx   1
sepp    2

OPTIONS
---------------------------------------------    
OP_MEMBER_ID    OP_NAME     OP_VALUE
1               location    Baden
1               age         20
2               location    Rietheim
2               age         30

POSTS
---------------------------------------------    
PO_POST_ID  PO_MEMBER_ID    PO_TEXT
100         1               trexx was here
101         2               sepp was here too
Now there is no easy way to get the view that we want. If we simply join all the tables we get two rows per user, post and option. so we need something more complicated (tested on mysql 4.1.0, the REPEAT is just a workaround to avoid a bug):
Code:
select
    me_member_id,
    me_name,
    mo_location,
    mo_age,
    po_text
from
    (select 
        me_member_id     mo_member_id, 
        max(me_location) mo_location, 
        max(me_age)      mo_age 
     from (    
         select 
            op_member_id    me_member_id, 
            repeat(' ', 20) me_location, 
            op_value        me_age
         from options
         where op_name='age'
         union all    
         select 
            op_member_id    me_member_id, 
            op_value        me_location, 
            '-'             me_age
         from options
         where op_name='location'
     ) tmp
     group by me_member_id) 
    mo_member_options,
    members,
    posts
where 1=1
    and me_member_id=mo_member_id
    and me_member_id=po_member_id
Is this the way to go? I guess i would create a view out of the subselect from the statement above (if mysql supports that someday *g*). But then, when we add a new option, we would also have to alter the view again. Is this really better than altering a table and some columns? I think, no matter how good you optimise your indizies and queries, the second statement will execute slower than the simple join... or am i completely wrong?
Reply With Quote
  #9 (permalink)  
Old 09-14-04, 11:18
rajiravi rajiravi is offline
Registered User
 
Join Date: Apr 2004
Location: Toronto, Canada
Posts: 249
Hi,


I would put an index on the options table on the two columns (op_member_id, op_name)

If your DBMS allows functions in select statements, you could create functions like this: (pseudo code)
Code:
 
function get_age(me_id)
select op_value 
into val
from options
where op_member_id = me_id
and op_name = 'age';
return val;
Because of the index on the two columns in the where clause, performance should be quite fast.



The view will be:
Code:
 
Select member_id, 
get_age(me_member_id),
get_location(me_member_id),
...
from options
where .....
I think this view is quite clear and self-explanatory.


Note: A further optimization can be performed by writing a generic function get_value as follows: (again pseudo-code)
Code:
 
function get_value( member_id, option_name)
select op_value 
into val
from options
where op_member_id = member_id
and op_name = option_name;
return val;
Then the various get_xxx functions simply become:
Code:
 
function get_age (member_id)
	return get_value(member_id, 'age');
This way, there is one sql statement only, the DBMS parses it once and uses it forever (as long as the DB is up). Also, I believe that the code is relatively simple to understand.


I feel that this structure will give acceptable performance.

Also, if you want to test adding a new option quickly from the application, you do not have to modify the view, just add the data, and the application code could do something like:
select get_value(me_member_id, 'new-option')
from options
where op_member_id = ...



Hope that helps.

Ravi
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On