Re: Updating a database based on another database
The following is my dilemma:
I have two databases: a DB2 database on ServerA and a MySQL database on ServerB. I would like to take a certain number of tables from ServerA and check whether any fields differ from the same tables on ServerB. If there are differences, then update ServerB with the changes. Can anyone suggest an approach other than walking every row and checking whether they are identical? Please note the tables are ENORMOUS in size. Also, just to clarify: no, I cannot access the DB2 server directly from an app, so I have to approach it this way.
You're going to need to do this in a program, and most probably row by row, because there is no common interface (that I am aware of) between DB2 and MySQL; i.e. you cannot do SELECT * FROM MySQL.table WHERE NOT EXISTS (SELECT * FROM DB2.table)
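A minimal sketch of that row-by-row program, using sqlite3 in-memory databases purely as stand-ins for the two servers (in practice you'd open the connections with DB2 and MySQL drivers instead; the table name `t` and its columns are my own invention for illustration):

```python
import sqlite3

source = sqlite3.connect(":memory:")  # stands in for ServerA (DB2)
target = sqlite3.connect(":memory:")  # stands in for ServerB (MySQL)

for conn in (source, target):
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

source.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
target.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "old")])

# Pull every source row and fix up any row that is missing or
# different on the target -- the brute-force approach being discussed.
for row_id, val in source.execute("SELECT id, val FROM t"):
    match = target.execute(
        "SELECT val FROM t WHERE id = ?", (row_id,)).fetchone()
    if match is None:
        target.execute("INSERT INTO t VALUES (?, ?)", (row_id, val))
    elif match[0] != val:
        target.execute("UPDATE t SET val = ? WHERE id = ?", (val, row_id))
target.commit()
```

On enormous tables this means reading every source row on every run, which is exactly the cost the poster wants to avoid.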
You'll need a program, but you can avoid looking at every row by adding a timestamp field to both tables. Update the timestamp on every insert and update, then periodically run the following process to sync the tables.
Extract all records where timestamp greater or equal to LAST_DONE.
Extract these records out into a file.
Load this file into a transfer table on the other database.
Update or insert these records into the main table on the other database.
Set LAST_DONE = max timestamp in your transfer table.
Clear the transfer table.
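The process above can be sketched in one function. This uses sqlite3 in-memory databases in place of the real DB2 source and MySQL target, the `(id, val, ts)` schema is my own, and sqlite's `INSERT OR REPLACE` stands in for MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`; a real setup would also extract to a file and bulk-load it rather than inserting directly:

```python
import sqlite3

def sync_since(source, target, last_done):
    """One pass of the timestamp sync; returns the new LAST_DONE."""
    target.execute("DELETE FROM transfer")      # clear the transfer table
    changed = source.execute(                   # extract changed records
        "SELECT id, val, ts FROM t WHERE ts >= ?", (last_done,)).fetchall()
    target.executemany(                         # load the transfer table
        "INSERT INTO transfer VALUES (?, ?, ?)", changed)
    target.execute(                             # update/insert main table
        "INSERT OR REPLACE INTO t SELECT id, val, ts FROM transfer")
    new_last = target.execute(                  # LAST_DONE = max timestamp
        "SELECT MAX(ts) FROM transfer").fetchone()[0]
    target.commit()
    return new_last if new_last is not None else last_done

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT, ts TEXT)")
target.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT, ts TEXT)")
target.execute("CREATE TABLE transfer (id INTEGER, val TEXT, ts TEXT)")
source.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "a", "2009-01-01 10:00:00"),
    (2, "b", "2009-01-01 11:00:00"),
])
last_done = sync_since(source, target, "2009-01-01 00:00:00")
```

Because the upsert is idempotent, re-transferring a few rows around the LAST_DONE boundary is wasteful but harmless.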
The above should be pretty quick to run and, if run every 30 minutes, would keep the two tables reasonably closely in sync. Obviously this won't cover deletion of records, but with a little imagination you could code for that. You might also want to cap the number of records transferred per run (e.g. 10k), which would let you drip-feed the data in initially; there would be issues if many records have the same timestamp, though.
I've used the above process to keep complete databases (of different types) in sync and it works reasonably well. You'll also want to monitor the process to make sure everything is still running.
My first thoughts are: copy TableA from ServerA to a temporary table on ServerB. Then your comparisons can simply be done in SQL with the correct JOIN ... Obviously, the flaw in this method is ... the tables are ENORMOUS
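For what that comparison would look like once the copy exists, here is a sketch (sqlite3 standing in for ServerB, with hypothetical table names `main_t` for ServerB's table and `copy_t` for the temporary copy of TableA): a single LEFT JOIN finds every row that is missing or different on ServerB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_t (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TEMP TABLE copy_t (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO main_t VALUES (1, 'a'), (2, 'stale'), (3, 'c');
    INSERT INTO copy_t VALUES (1, 'a'), (2, 'b');
""")

# Rows from the copy that ServerB lacks or holds a different value for.
diffs = conn.execute("""
    SELECT c.id, c.val
    FROM copy_t c
    LEFT JOIN main_t m ON m.id = c.id
    WHERE m.id IS NULL OR m.val <> c.val
""").fetchall()
```

The comparison itself is cheap once the data is local; the expensive part is shipping an enormous table across in the first place.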
Which is why this method should best be ignored
I don't agree that you'll need a program
So, apart from being extremely slow and inefficient, how would this method run without a controlling program?
After reading Mike's post, however, I must raise one question: does this process need to run in real time, or would it be acceptable to have the data checked every hour/day/week?
The method I proposed was to be run periodically which means it could be run at any frequency the user wants (5 min, 30 min, an hour, a day ... ). Obviously you'll want an index on the timestamp field to ensure good performance. If you do need the data in sync closer than a minute then we might need to rethink.
I agree - we do need some clarification on how closely the tables need to be synchronized (i.e. within the minute, hour, day), and it would be good to have a real value for enormous - standard definitions for enormous are:
enormous < 10K rows
Enormous < 100K rows
ENORMOUS > 5M rows
"it works reasonably well" just meant it did the job as required and didn't ever fall over.
If you need to sync more than one table, then store the LAST_DONE timestamps in a table along with the names of the tables being transferred. The chief advantage of this approach is that it doesn't matter at all if the program or either server dies, as nothing will ever be lost. Also, please note I should have put the "clear the transfer table" step at the start of the process rather than at the end.
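That control table might look something like this (a sketch with sqlite3 and names of my own choosing; the epoch default just means "transfer everything on the first run"):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One control row per synced table: if the program or a server dies
# mid-run, the next run simply re-extracts from the stored LAST_DONE.
conn.execute("""
    CREATE TABLE sync_control (
        table_name TEXT PRIMARY KEY,
        last_done  TEXT NOT NULL DEFAULT '1970-01-01 00:00:00'
    )""")
conn.execute("INSERT INTO sync_control (table_name) VALUES ('orders')")

def get_last_done(conn, table):
    return conn.execute(
        "SELECT last_done FROM sync_control WHERE table_name = ?",
        (table,)).fetchone()[0]

def set_last_done(conn, table, ts):
    conn.execute(
        "UPDATE sync_control SET last_done = ? WHERE table_name = ?",
        (ts, table))
    conn.commit()
```

Only advance LAST_DONE after the upsert into the main table has committed, so a crash between the two steps re-transfers rows rather than losing them.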