Results 1 to 5 of 5
  1. #1
    Join Date
    Apr 2009
    Posts
    4

    Updating after normalizing tables

    I am developing a database for extracting information from a publicly-available voter file. It is a huge flat file containing over 900,000 records, and is updated weekly for downloading as a zipped text file, with the first row having assigned field names. It is pretty easy to just drop the table and import the entire file into the structure as long as I keep the data in one table.

    I have developed a schema to split it into normalized tables, but now I don't see how to update the related tables without going through the entire split after updating the original table. Any suggestions on how to proceed will be greatly appreciated.

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    always load your external file into a single "landing table" which has fields that mirror the file layout exactly

    once you've loaded the "landing table" successfully, you can use SQL queries to pull the information out and store it into the other related tables
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Apr 2009
    Posts
    4
    Thanks. I was kind of heading down that path, but hoping for something more elegant. What you suggest actually seems like the best method, because it will not be a simple update or append operation. Comparing the downloads from the end of January until last week shows that 5328 voters have been dropped and 2980 new voters were registered. In addition, individual fields may have changed in some of the records common to both dates (address, party affiliation, etc.). So, I will have to delete some records (or move them to another table), append some, and update some. All of which makes me wonder if it is really worthwhile to normalize the database, especially since I would not be adding or editing records manually.

  4. #4
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    the advantage of the "landing field" approach is that with judicious use of SQL you can simply transfer the records you can, leave the problem records in the landing zone to until you work out why they have failed to load (could be duplicates, data errors in the sending system, or whatever)

    as to whether the data should be normalised... its all depends on what you see this information being used for. If you know every purpose that this data could be used for right now then I suspect you may be right you may not need to normalise the data. However if your task involved any form of uncertainty then normalising should be considered. Ultimately its your choice, most respondants here I expect would say normalise... but that may not be required. Its comes down to you how much time you have got, how much experience you've got.

    If the DB is to be used for political campaigning or generating spam telephone, email or postal mail then I'd strongly reccomend that you dont' normalise it. Alternatively if you are not experienced in designing db's and its intended for the above purposes then I'd strongly recommend that you normalise immediately.
    Last edited by healdem; 04-14-09 at 05:22.
    I'd rather be riding on the Tiger 800 or the Norton

  5. #5
    Join Date
    Apr 2009
    Posts
    4
    Thanks, healdm and r937 -- I really appreciate your thoughtful replies. The principal use of this database is to identify voters in different precincts, to check such things as party affiliation, approximate age, and voting history, along with their physical addresses. Much of this information can be obtained by using an expensive online database maintained by a national party. Part of my motivation for using the public records and open source software is that we focus on grassroots volunteers, and my personal quest is to remove some of the influence of money on politics. Ultimately, I want to be able to train office staff to use the database to track any changes that occur with people who are members of our organization. I am developing a separate database for members, which has such non-public information as email addresses, and I use the unique voter registration number from the public data as the primary key in both databases.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •