If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Differential inserts

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-18-10, 14:11
rocker86 rocker86 is offline
Registered User
 
Join Date: Jul 2009
Posts: 37
Differential inserts

Hi,
I have possibly a few hundred thousand entries from the user coming in. In the most usual case, only 10% would be changes/updates to existing entries. Rest would be exact copies of existing stuff. So the two ways to do this I could think of were:

1) Delete all existing rows, insert all fresh
2) Do a comparison of each incoming row, figure out what to update, if does not exist, do a create.

Now both of these options seem far from ideal to me. Can anyone help me out with a really neat way to figure out the delta and maybe do a bulk update followed by create?

thanks
Reply With Quote
  #2 (permalink)  
Old 10-18-10, 15:58
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Load all your imported data into staging tables.

Delete records from your staging tables based upon existing natural keys in your production tables.

Process what is left.

By the way, are you only dealing with new records, or do you need to be concerned about modifications to existing records as well?
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #3 (permalink)  
Old 10-18-10, 17:05
rocker86 rocker86 is offline
Registered User
 
Join Date: Jul 2009
Posts: 37
thanks for the quick reply!

as far as the incoming rows, there would be updates+fresh creates all in the same set. However, the use case we know would mean that on a normal basis, only 10% would be updates, and only 4-5% would be new entries
Reply With Quote
  #4 (permalink)  
Old 10-18-10, 18:31
Teddy Teddy is offline
Purveyor of Discontent
 
Join Date: Mar 2003
Location: The Bottom of The Barrel
Posts: 6,075
Are there performance requirements you need to comply with? You have the usual suspects on the table already, truncate and repopulate (easy, reasonably quick, involves some potential downtime), party on production in real time while processing records (only real reason to do this is if there's some constraint preventing you from using staging tables), and using blindman's suggestion of using staging tables to shape your data and the process the stuff you care about.

The major unknown so far is how expensive it is to compare your incoming data with your existing data. That could possibly be cost prohibitive resources wise, assuming you have tolerance for temporarily blowing out the data with a truncate/restore. If you don't have that tolerance, then you don't get much choice...
__________________
oh yeah... documentation... I have heard of that.

*** What Do You Want In The MS Access Forum? ***
Reply With Quote
  #5 (permalink)  
Old 10-18-10, 18:43
rocker86 rocker86 is offline
Registered User
 
Join Date: Jul 2009
Posts: 37
So we do have the concept of staging tables and are in fact storing everything that comes in there.
Wouldn't truncating+inserting be really expensive since for like a 100,000 rows, if there were only 20 with updated values and 10 with new rows, I would be deleting 100,000 and then inserting all those again!
On the other hand, doing a comparison would mean figuring out the deltas for each row in staging? The result of which would be to do an update in case there is a change in value for an existing record or do a create in case none exists. I see none of these to be neat for the small % of deltas we expect. In a case were a better solution does not exist, I would love to know which of the two methods is relatively better?

thanks again!
Reply With Quote
  #6 (permalink)  
Old 10-19-10, 09:27
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Quote:
Originally Posted by rocker86 View Post
as far as the incoming rows, there would be updates+fresh creates all in the same set. However, the use case we know would mean that on a normal basis, only 10% would be updates, and only 4-5% would be new entries
"The use case we know"?
"only 10% would be updates"?
I can tell you are new at this, but you seem like a quick learner.
Rule Of (Database) Application Development #1: Code for 100%. Not 90%. Not 95%.
100%.
Rule Of (Database) Application Development #2: Don't design around "Use Cases". Design around your Business Model.
Use Cases are for developers, who spend more time debating what font style and background color to use than they do thinking about entities and relationships. Design your database around Use Cases in iteration #1, and I guarantee (or your money back) that it will be woefully unable to support the Use Cases in iteration #2.

But back to your problem...
1) Load the data into staging tables.
2) Cleanse the staging data (consistent use of nulls, zero-length strings, default values, etc).
3) Verify the staging data. (relational integrity, datatypes and domains, etc.) Flag any records that fail.
4) Update existing records.
5) Add new records.
6) Delete all successfully processed staging data.

I strongly advise against deleting existing data an reimporting it. You will never be able to store additional related data in your database that is not in your source data, and it will play havoc with relational integrity.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On