I have a local server with a linked server configured which contains the live data. The local server will run the stored procedure on a scheduled job to harvest the data from the linked server.
INSERT INTO local_server...local_table
SELECT * FROM linked_server...linked_table AS lnk
WHERE NOT EXISTS
    (SELECT * FROM local_server...local_table AS loc
     WHERE loc.key = lnk.key)
I have read there are some pitfalls to moving data across linked servers.
Will I have any problems with this type of transfer and if so, are there any alternatives?
Yeah, I could see it becoming slow especially as my archive tables get larger because it's going to have to check for existence on a larger and larger set of records each time.
All the tables do have a timestamp. My live (remote) server gets one new record per 32 tables about every minute. Let's say I run a job every hour on my local server to archive new records off the live server (keep in mind the live server will only be cleaned once per week). I want to minimize the redundant comparisons of the 'where not exists.' If I have a week's worth of data in my live table and the job on the archive db runs every hour, then I only want to pull new records from the last hour; the job would only consider records with a timestamp newer than one hour before the current time. I see where you're going with this and I've thought about it before.
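A timestamp-cutoff pull of that kind could look roughly like this (a sketch only; `created_at` is a placeholder column name, not the poster's actual schema, and after a failed run you would widen the cutoff or fall back to the full comparison):

```sql
-- Sketch: hourly incremental pull, narrowed by timestamp.
-- created_at and the four-part names are illustrative placeholders.
DECLARE @cutoff DATETIME = DATEADD(HOUR, -1, GETDATE());

INSERT INTO local_server...local_table
SELECT lnk.*
FROM linked_server...linked_table AS lnk
WHERE lnk.created_at > @cutoff
  AND NOT EXISTS
      (SELECT 1 FROM local_server...local_table AS loc
       WHERE loc.key = lnk.key);  -- cheap safety net on the narrowed set
```

Keeping the NOT EXISTS on the narrowed set costs little, since it now only compares an hour's worth of rows instead of the whole archive.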
What worries me is that if the job happens to fail for some reason, my archive server will have missed records for an hour or more. The redundant 'where not exists' would always catch back up if anything had been missed. I suppose before I clean the live database on the weekend during downtime, I could run the full 'where not exists' comparison to see if anything was missed. It shouldn't matter how long the processing takes then.
The simplest idea I have is to add an archive bit to my 32 live tables. Then it will be easy to tell what's been archived or not and cut down on processing time. What I don't like is having to include the unused archive bit on my archived tables. Some of my tables have hundreds of fields, so it would be too time-consuming to specify individual fields like 'select field1, field2, field3,...,field100' when it's much easier to write 'insert into archive_table select * from live_table where live_table.archived = 0'.
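For what it's worth, the archive-bit route would have roughly this shape (a sketch; `archived` is the hypothetical flag column and the table names are placeholders):

```sql
-- Sketch of the archive-bit approach. Note the order: flag first,
-- then copy the flagged rows, so rows arriving in between the two
-- statements are never flagged without being copied.
UPDATE live_table SET archived = 1 WHERE archived = 0;

INSERT INTO archive_table
SELECT * FROM live_table WHERE live_table.archived = 1;
```

If you copied first and flagged second, any row inserted between the two statements would get marked archived without ever reaching the archive table.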
OK, here's what I think, and this is a pretty robust solution I have implemented frequently.
Don't pull directly from your production tables on the remote server. Add INSERT/UPDATE/DELETE triggers on those tables that copy changes to delta tables for staging. The delta tables should have an additional bit column so that records can be flagged for transfer, and this bit should initially be set to zero.
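An INSERT trigger of that shape might look something like this (a sketch; `live_table`, `delta_live_table`, the column names, and the `transferred` flag are all illustrative, not a specific schema):

```sql
-- Hypothetical trigger: copies each newly inserted live row into a
-- delta/staging table, with the transfer flag initialized to 0.
CREATE TRIGGER trg_live_table_insert
ON live_table
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO delta_live_table (id, field1, field2, transferred)
    SELECT id, field1, field2, 0
    FROM inserted;
END;
```

You would add similar UPDATE and DELETE triggers, typically with an extra column recording the operation type so the archive side knows how to apply each change.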
Your local server then links to these delta tables and performs the following three actions:
1) Update the transfer flags of all records to 1.
2) Transfer all data where the transfer flag = 1 to the local server's tables.
3) Delete all records from the remote delta table where the transfer flag = 1.
The transfer flag is necessary to handle any new records that come into the delta table while the process is running. You do not want those inadvertently deleted at the end of the process.
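The three steps above can be sketched as follows (illustrative names again; `remote_server...delta_live_table` stands in for the linked delta table):

```sql
-- 1) Flag everything currently sitting in the delta table.
UPDATE remote_server...delta_live_table SET transferred = 1;

-- 2) Copy the flagged rows down to the local archive.
INSERT INTO local_table
SELECT id, field1, field2
FROM remote_server...delta_live_table
WHERE transferred = 1;

-- 3) Delete only the rows we flagged; rows that arrived mid-process
--    still have transferred = 0 and are left for the next run.
DELETE FROM remote_server...delta_live_table WHERE transferred = 1;
```

Because step 3 only touches flagged rows, a record inserted between steps 1 and 3 survives until the next run instead of being silently discarded.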
The cool thing about this method is that if for some reason your connection fails, or your local server crashes, the remote server continues merrily on loading data into the delta tables, and the data will be automatically processed once the system is up and running again. Also, there's no locking or stress on your production tables while the transfer is occurring.
If it's not practically useful, then it's practically useless.