Dear Sirs et Madames,
I have recently been asked to combine an institutes varied and disparate health and demographic data sources into a centralized database which can be (eventually) queried via a query interface. Am trying to design the schema at the moment but would be glad of some advice before I move forward (My background is more as a Java programmer, I have never had to spend a significant amount of time on the database side of things). Scenario:
I have 3 applications which are the sources of data, HRS1, the original Data Collection application,HRS2 an upgrade of the original application (the data model changes somewhat) and the third is completely separate new application (OpenHDS) written in a different language (Java) and using MySQL as the database (the first 2 use VisualFoxPro), and the datamodel once again changes.
An additional complication is that the institute has several Demographic Surveillance sites. Over time, some sites have added columns to certain tables in order for them to capture more information , and as such most tables will have variations across site even when the datacollection application was the same. As an example let me just look at the Individual table from 2 separate sites:
HRS1 (Site A) Individual:
HRS1 (Site 2) Individual:
HRS2 (Site 1) Individual:
OpenHDS (Site 1 & 2) Individual:
Using this one Entity (Individual) as an example, how may one go about the steps of centralizing this data? The institute insists that no data should be left behind no matter how trivial it may seem. Am using PostgreSQL as the DBMS for the centralized database.
Please assist in any way, or let me know if there is any more information I can provide.
Then across list the various offices you are gathering data from. For each site list the corresponding column name or leave blank if they don't have a match. This will let you map each source into one table design. When you find a column at a site that doesn't exist in your column list add it. If you find the same column at multiple sites but with different sizes change the column size you initially entered to match the largest source size. At the end you will have a complete list of columns and a size that will work, and you will know what column name of each site cooresponds to the final column name. Now all you need to do is write a script or program to load the data.