1. Registered User
Join Date
Oct 2008
Posts
14

## Data Modelling Problem - Please Assist

Hi,

I am working on an database for statistics collection and I have the following problem.

I need to be able do the following:
1. User Decides Data disaggregation factors (such as gender/location/time etc)
2. Then one can store data according to disaggregation factors, so if the factor is gender, data should be entered for males and females (say if we are studying number of patients treated, at the point of data entry one would enter number of male patients and number of female patient. In the case of location disaggregation factor, user would enter any number of locations (could be towns), and data would be entered/stored accordingly. So instead of male and female data as in the first example, data would be stored for all patients, location wise. To complicate matters a bit further, the disaggregation factors can be combined. This means a user can decide a certain piece of data is to be aggregated by both gender and location, and one should be able to store/query the records accordingly (i.e. store male patients from a certain location). The disaggregation factors could be anything a user decides them to me.

Can you advise how to best model this using Relational Model?

2. Registered User
Join Date
Dec 2007
Location
London, UK
Posts
741
Sounds pretty straightforward so far:

PatientLocation {PatientID*, Location}
PatientGender {PatientID*, Gender}

(*=Key)

Aggregation of both together just requires a join.

3. Registered User
Join Date
Jul 2009
Location
Michigan
Posts
125
Okay, I have read this several times and am not totally sure I understand what you want, but here is my best guess.

First you need a table to define what the disaggregation factors are:

DisaggregationFactor:
DisaggregationFactorID int (serially assigned number)
Description varchar(50) (gender, location, age, etc)

Next you need a table to store the valid responses for each disaggregation factor:

FactorValue:
FactorValueID int (serially assigned number)
DisaggregationFactorID int (FK)
ValidValue varchar(50) (male/female, Jamestown, Ages 1-10, etc)

Next you need a table to identify the participants, even if you only give them a number:

Participant:
ParticipantID int (serially assigned number)
Name varchar(50)

Now you need a table to store the disaggregation factor values for each participant:

Response:
ParticipantID int (FK)
FactorValueID (FK)

To get an aggregate for any given disaggregation factor you need use joins to get from the DisaggregationFactor you want to the various possible FactorValues to the Response table with matching IDs.

Assuming you have multiple statistical studies going on at once you will need to add a study table and the study key to each of the tables.

Hope I came at least close to what you are asking.

4. vaguely human
Join Date
Jun 2007
Location
London
Posts
2,527
Originally Posted by gravity
Can you advise how to best model this using Relational Model?
It's very difficult to work out what you want from your description. If you're just producing user defined reports on patients then couldn't you just use put the fields in the patient table and then add another "meta" table to hold the various allowed fields (gender, location) ie
Code:
Patient ( id, name, gender, location, etc )
Fields ( name ) -- gender, location etc
Then just build the SQL to do the reporting ie ask the user if they want to limit by one of the fields and if so pull the allowed values ie where gender="Male". It's a can of worms deciding where you store the allowed values but you basically have the following choices:
• Pull the valid values from the Patient table but then how do you initialise the starting values.
• Having FKs on lookup tables is fine but you now have to work out which table to grab the data from plus your screen to maintain these lookup values must manage all these tables.
• Using a single lookup table (OTLT) makes it easy to manage and access the values but you loose the automatic referential checks - your choice.
• Using constraints to limit values is a non starter as you can't access the allowed values.
• You could also just let the user type in their own value and forget about allowed values all together.

Next you'd need to ask the user what field(s) the user wants to total by and add this to the SQL. Finally you just run the SQL and display the results. I did a similar system where the user could then drill down through the data. The generated SQL isn't complicated and might look like:
Code:
select gender, count(*)
from Patient
group by gender

select location, count(*)
from Patient
where gender = "Male"
group by location
If new fields will be added often then you'll have problems because you'll have to alter the structure of the database each time - ie add the new field to the patients table and add a new lookup table if you're using FKs. If this is the case then you might want to look at EAV but then this isn't exactly a relational approach and it's looked down upon generally.

5. Registered User
Join Date
Oct 2008
Posts
14
Originally Posted by dportas
Sounds pretty straightforward so far:

PatientLocation {PatientID*, Location}
PatientGender {PatientID*, Gender}

(*=Key)

Aggregation of both together just requires a join.
Thanks but as I said the disaggregation factors could be anything the user decides them to be (and hence aren't known in advance).

6. Registered User
Join Date
Dec 2007
Location
London, UK
Posts
741
Originally Posted by gravity
Thanks but as I said the disaggregation factors could be anything the user decides them to be (and hence aren't known in advance).
Then the question is what tools will these users use to create that data model and execute queries against it? It doesn't alter my suggestion for the design.

7. Registered User
Join Date
Oct 2008
Posts
14
Originally Posted by mike_bike_kite
It's very difficult to work out what you want from your description. If you're just producing user defined reports on patients then couldn't you just use put the fields in the patient table and then add another "meta" table to hold the various allowed fields (gender, location) ie
Code:
Patient ( id, name, gender, location, etc )
Fields ( name ) -- gender, location etc
Then just build the SQL to do the reporting ie ask the user if they want to limit by one of the fields and if so pull the allowed values ie where gender="Male". It's a can of worms deciding where you store the allowed values but you basically have the following choices:
• Pull the valid values from the Patient table but then how do you initialise the starting values.
• Having FKs on lookup tables is fine but you now have to work out which table to grab the data from plus your screen to maintain these lookup values must manage all these tables.
• Using a single lookup table (OTLT) makes it easy to manage and access the values but you loose the automatic referential checks - your choice.
• Using constraints to limit values is a non starter as you can't access the allowed values.
• You could also just let the user type in their own value and forget about allowed values all together.

Next you'd need to ask the user what field(s) the user wants to total by and add this to the SQL. Finally you just run the SQL and display the results. I did a similar system where the user could then drill down through the data. The generated SQL isn't complicated and might look like:
Code:
select gender, count(*)
from Patient
group by gender

select location, count(*)
from Patient
where gender = "Male"
group by location
If new fields will be added often then you'll have problems because you'll have to alter the structure of the database each time - ie add the new field to the patients table and add a new lookup table if you're using FKs. If this is the case then you might want to look at EAV but then this isn't exactly a relational approach and it's looked down upon generally.
If I understand you correctly, I can't have a table with a column called gender or location because these things are user defined. They can be anything the user can think of.

8. Registered User
Join Date
Oct 2008
Posts
14
Originally Posted by MarkATrombley
Okay, I have read this several times and am not totally sure I understand what you want, but here is my best guess.

First you need a table to define what the disaggregation factors are:

DisaggregationFactor:
DisaggregationFactorID int (serially assigned number)
Description varchar(50) (gender, location, age, etc)

Next you need a table to store the valid responses for each disaggregation factor:

FactorValue:
FactorValueID int (serially assigned number)
DisaggregationFactorID int (FK)
ValidValue varchar(50) (male/female, Jamestown, Ages 1-10, etc)

Next you need a table to identify the participants, even if you only give them a number:

Participant:
ParticipantID int (serially assigned number)
Name varchar(50)

Now you need a table to store the disaggregation factor values for each participant:

Response:
ParticipantID int (FK)
FactorValueID (FK)

To get an aggregate for any given disaggregation factor you need use joins to get from the DisaggregationFactor you want to the various possible FactorValues to the Response table with matching IDs.

Assuming you have multiple statistical studies going on at once you will need to add a study table and the study key to each of the tables.

Hope I came at least close to what you are asking.
Thanks Mark I was thinking along these lines too. One thing I cant seem to get my head around, the data needs to be recorder over a period of time. So if I am looking at a number or patients with a certain disease, and I decide to record/monitor the data by gender and location, I would record this data every so often (periodically). So every time the user has a new piece of data to record, they would have numbers for {male, location1}, {female, location1}, {male, location2}, {female, location2}, {male, location3}, {female, location3}...........{male, locationn}, {female, locationn}

Hopefully that makes sense.

9. vaguely human
Join Date
Jun 2007
Location
London
Posts
2,527
Originally Posted by gravity
If I understand you correctly, I can't have a table with a column called gender or location because these things are user defined. They can be anything the user can think of.
I must admit I was assuming your database held the information before you started reporting on it. If your db only holds gender and location and the user wants to report on blood type then you'll obviously have problems. Few questions:
• Does your db currently store the fields the user wants to report on?
• if not how often will they change and how will you populate any new fields with data?
• Will these fields change often?
• Roughly how much data will you be storing?
• Are those example reports a fair example of end reports or do some get much more complicated?

You're getting so many different types of answers because your description wasn't very clear and we probably need a few examples.

10. Registered User
Join Date
Oct 2008
Posts
14
Originally Posted by dportas
Then the question is what tools will these users use to create that data model and execute queries against it? It doesn't alter my suggestion for the design.
They will get someone to write a Front End app for them.

11. Registered User
Join Date
Dec 2007
Location
London, UK
Posts
741
Originally Posted by gravity
They will get someone to write a Front End app for them.
Then let that same person design the database.

12. Registered User
Join Date
Oct 2008
Posts
14
Originally Posted by mike_bike_kite
I must admit I was assuming your database held the information before you started reporting on it. If your db only holds gender and location and the user wants to report on blood type then you'll obviously have problems. Few questions:
• Does your db currently store the fields the user wants to report on?
No. It should make room for the user to define these.
• if not how often will they change and how will you populate any new fields with data?
The disaggregation factors differ depending on the project the user does. So in the case of the patient record project, these could be gender, location, bloodtype. If the same user has a public sensitization/training campaign, the factors may differ. In this case he may want to look at recipients' level of education etc
• Will these fields change often?
In a sense yes. They are project specific but all data should sit on the same DB
• Roughly how much data will you be storing?
I have no idea to be honest, but in time the DB will grow
• Are those example reports a fair example of end reports or do some get much more complicated?
I would say they are fair. But if they are to get more complicated I would rather be ready.

You're getting so many different types of answers because your description wasn't very clear and we probably need a few examples.

13. Registered User
Join Date
Jul 2009
Location
Michigan
Posts
125
Originally Posted by gravity
Thanks Mark I was thinking along these lines too. One thing I cant seem to get my head around, the data needs to be recorder over a period of time. So if I am looking at a number or patients with a certain disease, and I decide to record/monitor the data by gender and location, I would record this data every so often (periodically). So every time the user has a new piece of data to record, they would have numbers for {male, location1}, {female, location1}, {male, location2}, {female, location2}, {male, location3}, {female, location3}...........{male, locationn}, {female, locationn}

Hopefully that makes sense.
Since you never know how they are going to request the data to be aggregated you want to store it in as much detail as possible. If you add a date to the Response table you can record multiple responses (office visits, surveys, etc) for each patient. Then when they ask for a specific aggregation (gender and location, grouped by month) you can query the data (which is stored in its disaggregated form) and aggregate it the way they want.

To be more normalized you would actually add a table before Response that would assign a key to the office visit, survey, whatever and store the date there. Then instead of adding the date to the Response table you would add the key to this new table.

Glad I can be of help. Hopefully this new response gets you pointed in the right direction.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•