If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > A database with a memory

Reply
 
LinkBack Thread Tools Display Modes
  #1 (permalink)  
Old 07-01-09, 17:53
mwangi mwangi is offline
Registered User
 
Join Date: Jul 2009
Posts: 2
A database with a memory

Sorry, I'm relatively new to databases, so this may be a stupid question.

I want to create a database using MySQL. This database will consist of raw data that will need to be cleaned up: outliers will need to be replaced by best estimates, high frequency data will need to be averaged over various time intervals, etc. When doing the cleaning, I do not want to overwrite any old data. That is, I want to keep the original raw data and also the new cleaned up data, and I want this data to be all in the same database. In other words, I kind of want a database with a memory. Nothing ever gets erased or overwritten. If needs be, one can revert to the old state.

Does anyone here know of the most efficient way to do this using MySQL?

Thanks.

Mike
Reply With Quote
  #2 (permalink)  
Old 07-02-09, 07:42
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,517
Do the following:
  • Load the input data into a feed table
  • Put a date on these records and copy them to a raw data table that holds all the raw data for each day.
  • Process the feed data as you need to.
  • Put the processed data with the date into a processed data table.
  • Delete any ancient data you no longer need.
You will do your reporting etc on the processed data table. It's not difficult but I'd suggest that you get someone who knows what they're doing to write this for you - it will save you a lot of time.

Mike
Reply With Quote
  #3 (permalink)  
Old 07-03-09, 14:26
mwangi mwangi is offline
Registered User
 
Join Date: Jul 2009
Posts: 2
By creating two tables, would I not use twice the memory? This could be a problem as I expect the database to be large (~10-100 GB).

Sometimes, "cleanizing" the data will just involve replacing the occasional outlier, presumed to be an error, with a best estimate, which hopefully better reflects the real value. Thus, the processed data table need not contain much data - it could be very sparse.

Any more suggestions about how to efficiently do this?

Quote:
Originally Posted by mike_bike_kite
Do the following:
  • Load the input data into a feed table
  • Put a date on these records and copy them to a raw data table that holds all the raw data for each day.
  • Process the feed data as you need to.
  • Put the processed data with the date into a processed data table.
  • Delete any ancient data you no longer need.
You will do your reporting etc on the processed data table. It's not difficult but I'd suggest that you get someone who knows what they're doing to write this for you - it will save you a lot of time.

Mike
Reply With Quote
  #4 (permalink)  
Old 07-05-09, 07:25
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,517
Quote:
Originally Posted by mwangi
Any more suggestions about how to efficiently do this?
My reply was an efficient solution to your original problem. Now you've supplied more information I could suggest having a table only containing the records that you've processed. This will affect performance when selecting the data ie you'll now need to select all the raw data where there isn't a replacement "processed" record and then add all the processed records (this can be done in one select if you like).
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On