By creating two tables, would I not use twice the memory? This could be a problem as I expect the database to be large (~10-100 GB).
Sometimes, "cleanizing" the data will just involve replacing the occasional outlier, presumed to be an error, with a best estimate, which hopefully better reflects the real value. Thus, the processed data table need not contain much data - it could be very sparse.
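To make the sparse idea concrete, here is a minimal sketch in Python with SQLite, chosen purely for illustration. The names `raw_data`, `corrections` and `clean_data` are hypothetical, as is the sample data. Assuming only the occasional outlier is replaced, the second table holds just the replaced values, and a view falls back to the raw value everywhere else.

```python
import sqlite3


def build_demo():
    """Build an in-memory database with raw data plus a sparse corrections table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE raw_data (id INTEGER PRIMARY KEY, reading REAL NOT NULL);

        -- Hypothetical sparse table: one row per reading judged to be an error.
        CREATE TABLE corrections (
            id INTEGER PRIMARY KEY REFERENCES raw_data(id),
            reading REAL NOT NULL
        );

        -- Report from this view: the corrected value if one exists, else the raw value.
        CREATE VIEW clean_data AS
            SELECT r.id, COALESCE(c.reading, r.reading) AS reading
            FROM raw_data r
            LEFT JOIN corrections c ON c.id = r.id;
    """)
    # Made-up sample readings; 950.0 plays the role of the presumed error.
    conn.executemany(
        "INSERT INTO raw_data (id, reading) VALUES (?, ?)",
        [(1, 10.1), (2, 9.8), (3, 950.0), (4, 10.3)],
    )
    # Only the outlier gets a row in the corrections table.
    conn.execute("INSERT INTO corrections (id, reading) VALUES (?, ?)", (3, 10.0))
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo()
    for row in demo.execute("SELECT id, reading FROM clean_data ORDER BY id"):
        print(row)
```

With this layout the extra storage grows with the number of corrections rather than with the size of the raw data, at the cost of a join every time the cleaned data is read.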
Any more suggestions on how to do this efficiently?
Quote:
Originally Posted by mike_bike_kite
Do the following:
- Load the input data into a feed table
- Put a date on these records and copy them to a raw data table that holds all the raw data for each day.
- Process the feed data as you need to.
- Put the processed data with the date into a processed data table.
- Delete any ancient data you no longer need.
You will do your reporting etc. on the processed data table. It's not difficult, but I'd suggest that you get someone who knows what they're doing to write this for you - it will save you a lot of time.
Mike
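For reference, the steps in the quoted post could be sketched roughly as follows, again in Python with SQLite. Everything here is an assumption for illustration: the table names `feed`, `raw_data` and `processed_data`, the `sensor`/`reading` columns, and especially the processing rule (readings above `max_valid` are replaced by the average of the remaining readings in the same load). The real processing step would be whatever the data actually needs.

```python
import sqlite3

# Hypothetical schema: a scratch feed table plus dated raw and processed tables.
SCHEMA = """
CREATE TABLE feed (sensor TEXT, reading REAL);
CREATE TABLE raw_data (load_date TEXT, sensor TEXT, reading REAL);
CREATE TABLE processed_data (load_date TEXT, sensor TEXT, reading REAL);
"""


def run_daily_load(conn, rows, load_date, max_valid=100.0, keep_from=None):
    """Run one load. Dates are ISO strings (YYYY-MM-DD) so they sort correctly."""
    with conn:  # commit on success, roll back if any step fails
        # 1. Load the input data into the feed table.
        conn.execute("DELETE FROM feed")
        conn.executemany("INSERT INTO feed (sensor, reading) VALUES (?, ?)", rows)

        # 2. Date the records and copy them, untouched, to the raw data table.
        conn.execute(
            "INSERT INTO raw_data SELECT ?, sensor, reading FROM feed",
            (load_date,),
        )

        # 3. Process the feed data (illustrative rule only: outliers become
        #    the average of the readings that look valid).
        (estimate,) = conn.execute(
            "SELECT AVG(reading) FROM feed WHERE reading <= ?", (max_valid,)
        ).fetchone()
        if estimate is not None:  # None means no valid reading to estimate from
            conn.execute(
                "UPDATE feed SET reading = ? WHERE reading > ?",
                (estimate, max_valid),
            )

        # 4. Put the processed data, with the date, into the processed table.
        conn.execute(
            "INSERT INTO processed_data SELECT ?, sensor, reading FROM feed",
            (load_date,),
        )

        # 5. Delete data older than the retention cutoff, if one is given.
        if keep_from is not None:
            conn.execute("DELETE FROM raw_data WHERE load_date < ?", (keep_from,))
            conn.execute(
                "DELETE FROM processed_data WHERE load_date < ?", (keep_from,)
            )


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    run_daily_load(db, [("a", 10.0), ("b", 12.0), ("c", 999.0)], "2024-01-01")
    print(db.execute("SELECT * FROM processed_data ORDER BY sensor").fetchall())
```

Note that this variant stores every row twice, once raw and once processed, which is the storage cost raised in the reply above. The trade-off is that reporting reads a single table with no join.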