If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Data Centralization, Advice Needed

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 09-28-10, 08:29
phoenixx phoenixx is offline
Registered User
 
Join Date: Sep 2010
Posts: 8
Data Centralization, Advice Needed

Dear Sirs et Madames,
I have recently been asked to combine an institutes varied and disparate health and demographic data sources into a centralized database which can be (eventually) queried via a query interface. Am trying to design the schema at the moment but would be glad of some advice before I move forward (My background is more as a Java programmer, I have never had to spend a significant amount of time on the database side of things). Scenario:

I have 3 applications which are the sources of data, HRS1, the original Data Collection application,HRS2 an upgrade of the original application (the data model changes somewhat) and the third is completely separate new application (OpenHDS) written in a different language (Java) and using MySQL as the database (the first 2 use VisualFoxPro), and the datamodel once again changes.

An additional complication is that the institute has several Demographic Surveillance sites. Over time, some sites have added columns to certain tables in order for them to capture more information , and as such most tables will have variations across site even when the datacollection application was the same. As an example let me just look at the Individual table from 2 separate sites:



HRS1 (Site A) Individual:

PERM_ID
REGION
OLD_REG
FAMILY_NUM
MEMBER_NUM
NAME
SEX
BIRTH_DATE
RELATIONSHIP_TO_HEAD
MOTHER_ID
FATHER_ID
HUSBAND_ID
ENTRY_TYPE
ENTRY_DATE
IS_PRESENT
EXIT_TYPE
EXIT_DATE
FIELD_WRKR
DATA_CLERK
DATAENTRY_DATE
STATUS_DAT
AGE

HRS1 (Site 2) Individual:

PERM_ID
REGION
FAMILY_NUM
MEMBER_NUM
NAME
SEX
BIRTH_DATE
RELATIONSHIP_TO_HEAD
MOTHER_ID
FATHER_ID
HUSBAND_ID
OCCUPATION
OCCUPATION_OTHER
EDUCATION
ENTRY_TYPE
ENTRY_DATE
EXIT_TYPE
EXIT_DATE
FIELD_WRKR
DATA_CLERK
DATAENTRY_DATE
STATUS_DAT
MARRIED
MARRIED_OTHER

HRS2 (Site 1) Individual:

INDIVIDID
NAME
GENDER
BIRTH_DATE
MOTHERID
FATHERID
FIELD_WRKR
DATA_CLERK
DATAENTRY_DATE
STATUS_DAT

OpenHDS (Site 1 & 2) Individual:

uuid
deleted
insertDate
status
voidDate
voidReason
dob
externalId
firstName
gender
lastName
middleName
pD
partialDate
collectedBy_uuid
insertedBy_uuid
voidedBy_uuid
father_uuid
mother_uuid
mother_uuid

Using this one Entity (Individual) as an example, how may one go about the steps of centralizing this data? The institute insists that no data should be left behind no matter how trivial it may seem. Am using PostgreSQL as the DBMS for the centralized database.

Please assist in any way, or let me know if there is any more information I can provide.

Thanks in advance
Reply With Quote
  #2 (permalink)  
Old 09-29-10, 07:10
MarkATrombley MarkATrombley is offline
Registered User
 
Join Date: Jul 2009
Location: Michigan
Posts: 125
Build a spreadsheet with every table and every column name and size such as:

Tbl1 - lastName - varchar(50)
Tbl1 - firstName - varchar(25)
etc.

Then across list the various offices you are gathering data from. For each site list the corresponding column name or leave blank if they don't have a match. This will let you map each source into one table design. When you find a column at a site that doesn't exist in your column list add it. If you find the same column at multiple sites but with different sizes change the column size you initially entered to match the largest source size. At the end you will have a complete list of columns and a size that will work, and you will know what column name of each site cooresponds to the final column name. Now all you need to do is write a script or program to load the data.
Reply With Quote
  #3 (permalink)  
Old 10-04-10, 02:20
phoenixx phoenixx is offline
Registered User
 
Join Date: Sep 2010
Posts: 8
Thanks Mark, will start from there... If I have any further questions will post again as a continuation of this same post.
Reply With Quote
Reply

Tags
centralization

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On