Results 1 to 4 of 4
  1. #1
    Join Date
    Jul 2009
    Posts
    2

    Unanswered: A database with a memory

    Sorry, I'm relatively new to databases, so this may be a stupid question.

    I want to create a database using MySQL. This database will consist of raw data that will need to be cleaned up: outliers will need to be replaced by best estimates, high frequency data will need to be averaged over various time intervals, etc. When doing the cleaning, I do not want to overwrite any old data. That is, I want to keep the original raw data and also the new cleaned up data, and I want this data to be all in the same database. In other words, I kind of want a database with a memory. Nothing ever gets erased or overwritten. If needs be, one can revert to the old state.

    Does anyone here know of the most efficient way to do this using MySQL?

    Thanks.

    Mike

  2. #2
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Do the following:
    • Load the input data into a feed table
    • Put a date on these records and copy them to a raw data table that holds all the raw data for each day.
    • Process the feed data as you need to.
    • Put the processed data with the date into a processed data table.
    • Delete any ancient data you no longer need.

    You will do your reporting etc on the processed data table. It's not difficult but I'd suggest that you get someone who knows what they're doing to write this for you - it will save you a lot of time.

    Mike

  3. #3
    Join Date
    Jul 2009
    Posts
    2
    By creating two tables, would I not use twice the memory? This could be a problem as I expect the database to be large (~10-100 GB).

    Sometimes, "cleanizing" the data will just involve replacing the occasional outlier, presumed to be an error, with a best estimate, which hopefully better reflects the real value. Thus, the processed data table need not contain much data - it could be very sparse.

    Any more suggestions about how to efficiently do this?

    Quote Originally Posted by mike_bike_kite
    Do the following:
    • Load the input data into a feed table
    • Put a date on these records and copy them to a raw data table that holds all the raw data for each day.
    • Process the feed data as you need to.
    • Put the processed data with the date into a processed data table.
    • Delete any ancient data you no longer need.

    You will do your reporting etc on the processed data table. It's not difficult but I'd suggest that you get someone who knows what they're doing to write this for you - it will save you a lot of time.

    Mike

  4. #4
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by mwangi
    Any more suggestions about how to efficiently do this?
    My reply was an efficient solution to your original problem. Now you've supplied more information I could suggest having a table only containing the records that you've processed. This will affect performance when selecting the data ie you'll now need to select all the raw data where there isn't a replacement "processed" record and then add all the processed records (this can be done in one select if you like).

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •