Quote:
Originally Posted by rocker86
as far as the incoming rows, there would be updates+fresh creates all in the same set. However, the use case we know would mean that on a normal basis, only 10% would be updates, and only 4-5% would be new entries
|
"The use case we know"?
"only 10% would be updates"?
I can tell you are new at this, but you seem like a quick learner.
Rule Of (Database) Application Development #1: Code for 100%. Not 90%. Not 95%.
100%.
Rule Of (Database) Application Development #2: Don't design around "Use Cases". Design around your Business Model.
Use Cases are for developers, who spend more time debating what font style and background color to use than they do thinking about entities and relationships. Design your database around Use Cases in iteration #1, and I guarantee (or your money back) that it will be woefully unable to support the Use Cases in iteration #2.
But back to your problem...
1) Load the data into staging tables.
2) Cleanse the staging data (consistent use of nulls, zero-length strings, default values, etc).
3) Verify the staging data. (relational integrity, datatypes and domains, etc.) Flag any records that fail.
4) Update existing records.
5) Add new records.
6) Delete all successfully processed staging data.
I strongly advise against deleting existing data an reimporting it. You will never be able to store additional related data in your database that is not in your source data, and it will play havoc with relational integrity.