Results 1 to 3 of 3
  1. #1
    Join Date
    Sep 2010
    Posts
    8

    Data Centralization, Advice Needed

    Dear Sirs et Madames,
    I have recently been asked to combine an institutes varied and disparate health and demographic data sources into a centralized database which can be (eventually) queried via a query interface. Am trying to design the schema at the moment but would be glad of some advice before I move forward (My background is more as a Java programmer, I have never had to spend a significant amount of time on the database side of things). Scenario:

    I have 3 applications which are the sources of data, HRS1, the original Data Collection application,HRS2 an upgrade of the original application (the data model changes somewhat) and the third is completely separate new application (OpenHDS) written in a different language (Java) and using MySQL as the database (the first 2 use VisualFoxPro), and the datamodel once again changes.

    An additional complication is that the institute has several Demographic Surveillance sites. Over time, some sites have added columns to certain tables in order for them to capture more information , and as such most tables will have variations across site even when the datacollection application was the same. As an example let me just look at the Individual table from 2 separate sites:



    HRS1 (Site A) Individual:

    PERM_ID
    REGION
    OLD_REG
    FAMILY_NUM
    MEMBER_NUM
    NAME
    SEX
    BIRTH_DATE
    RELATIONSHIP_TO_HEAD
    MOTHER_ID
    FATHER_ID
    HUSBAND_ID
    ENTRY_TYPE
    ENTRY_DATE
    IS_PRESENT
    EXIT_TYPE
    EXIT_DATE
    FIELD_WRKR
    DATA_CLERK
    DATAENTRY_DATE
    STATUS_DAT
    AGE

    HRS1 (Site 2) Individual:

    PERM_ID
    REGION
    FAMILY_NUM
    MEMBER_NUM
    NAME
    SEX
    BIRTH_DATE
    RELATIONSHIP_TO_HEAD
    MOTHER_ID
    FATHER_ID
    HUSBAND_ID
    OCCUPATION
    OCCUPATION_OTHER
    EDUCATION
    ENTRY_TYPE
    ENTRY_DATE
    EXIT_TYPE
    EXIT_DATE
    FIELD_WRKR
    DATA_CLERK
    DATAENTRY_DATE
    STATUS_DAT
    MARRIED
    MARRIED_OTHER

    HRS2 (Site 1) Individual:

    INDIVIDID
    NAME
    GENDER
    BIRTH_DATE
    MOTHERID
    FATHERID
    FIELD_WRKR
    DATA_CLERK
    DATAENTRY_DATE
    STATUS_DAT

    OpenHDS (Site 1 & 2) Individual:

    uuid
    deleted
    insertDate
    status
    voidDate
    voidReason
    dob
    externalId
    firstName
    gender
    lastName
    middleName
    pD
    partialDate
    collectedBy_uuid
    insertedBy_uuid
    voidedBy_uuid
    father_uuid
    mother_uuid
    mother_uuid

    Using this one Entity (Individual) as an example, how may one go about the steps of centralizing this data? The institute insists that no data should be left behind no matter how trivial it may seem. Am using PostgreSQL as the DBMS for the centralized database.

    Please assist in any way, or let me know if there is any more information I can provide.

    Thanks in advance

  2. #2
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    Build a spreadsheet with every table and every column name and size such as:

    Tbl1 - lastName - varchar(50)
    Tbl1 - firstName - varchar(25)
    etc.

    Then across list the various offices you are gathering data from. For each site list the corresponding column name or leave blank if they don't have a match. This will let you map each source into one table design. When you find a column at a site that doesn't exist in your column list add it. If you find the same column at multiple sites but with different sizes change the column size you initially entered to match the largest source size. At the end you will have a complete list of columns and a size that will work, and you will know what column name of each site cooresponds to the final column name. Now all you need to do is write a script or program to load the data.

  3. #3
    Join Date
    Sep 2010
    Posts
    8
    Thanks Mark, will start from there... If I have any further questions will post again as a continuation of this same post.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •