Hi all

I have problem with data model design for my application. There are two main problems to solve:

1. Source Data - set of enities, connected with relations. Set of properties describing each of them is not well defined:
  • will be set up during system installation, and is likely to change over time
  • some of entities have "type" property, which determines set of attributes actually used
  • new types (and therefore - sets of attributes) are defined during system operation

2. There is set of transformations, which calcullate some values from Source Data's attributes.
  • transformations are dynamically defined during system operation.
  • transformations should be considered as calcullated source data entity's columns.

Additional requirements:
  • results of transformations should be available at database level (DB is used for integration with some subsystems), but generally, persisting transformation results is not necessary (they can be calcullated at runtime)
  • system should be able to access source data and transformed attributes in "regular" way (query for each column, use it in order by and where);

Source data storage considered so far:
1. Wide table:
  • contain columns for all possible attributes
  • null values where corresponding property is not available;
  • has to be periodically extended by adding new columns.

2. Dynamic schema
  • database schema is modified as necessary; lots of DDL generated and executed at runtime

3. Decomposed storage:
  • Core entity data (PK, FKs, type, ...), plus one of:
    1. vertical storage (Attribute-Value map for each entity)
    2. AV - separate table for each attribute type variant
    3. Dynamic Table (single table for each attribute)

  • Core entity data
  • variable attributes stored in some weakly structured format

5. XML storage
  • Core entity data
  • variable attributes stored as XML document
  • Generally similar to p4

1. Handled in application layer.
  • Database holds transformation definitions (serialized transformation objects).
  • Transformations are easy to implement in high level language
  • Transformation results have to be stored in database; if attibute set of source data entities is not constrained - results can be stored as regular entity attributes.
  • Otherwise - DB -> app calls may be necessary (application logic calls from within database)

2. Handled by database - dynamic SQL
  • transformations defined in specialized tables (will require forseeing all transformation types, in order to design storage for all possible transformation parameters)
  • dynamic sql is generated using transformation descriptions and used in queries

3. Handled by database - dynamic Functions
  • transformations defined in database, as in p.2
  • there is set of utility procedures, which generate stored procedure/function definitions using transformation descriptions.
  • queries use these functions

Handling transformations at application level seems to be most straightforward choice, but storing transformation results in db may be a problem (for large number of transformations, there will be lots of results, and most of them will probably not be used; unnecessary data redundancy) - so I'm considering dynamic functions approach.

For data storage two options seem reasonable: dynamic tables or XML.

Maybe anyone developed similar system, and have some experience in handling weakly structured data?

And by the way - such design decissions seem to be common for some problem areas (eg CRM systems), but I didn't manage to find comprehensive problem analysis anywhere. Any links to atricles, books, etc greatly appreciated.