Unanswered: Is there a better way to do this... (replication question)?
Summary of customer requirements:
60+ remote nodes (not managed by me) running an application with SQL 2000 (SP3) backend databases. Schemas are nominally the same (but not perfectly identical, though that's being corrected as we go). As currently written, the application will not support integrating all the databases into one node and running a single application over the web (something I suggested early on).
Customer wants all 60 nodes to feed into a central repository (for reporting purposes).
A subset of the 196+ tables in the schema are to be controlled (populated) centrally and the data propogated out. The remaining tables (the majority) are to be populated at the remote nodes and the data fed into the central repository. Remote nodes must see only a subset of the data for the non-centrally managed tables.
Updates must be fed into the central repository at least once daily. Volume is "heavy" at 3-5 nodes, "modest" at about 40 nodes and very light at the remaining. "Heavy" means 250-500 updates per hour (eight hour days, M-F); initial synchronization of 2 years of data is expected to take up to 8 hours. Connections to remote nodes are T-1 over a secure WAN. Connectivity is generally good although intermittent outages do occur and can last for several hours to a couple of days (e.g. over a weekend).
Remote nodes must have the ability to manage their databases, to include adding objects (views and custom triggers).
My proposal was to connect up the remote nodes using Merge Replication and use dynamic horizontal filters to partition the data. It seemed reasonable until I started dealing with the mechanics of Merge Replication and the nitty gritty details of synching ROWGUIDs across multiple instances which were independently populated.
Is there a better way to do this? The killer for me is keeping the synched ROWGUIDs the same across all instances while keeping the ROWGUIDs that are not supposed to be synched different (e.g., if someone clever populates a table with a SELECT * INTO statement and includes the ROWGUID column from a table where they are supposed to be different).
No, this is not a homework question, nor is it a cert question. It's a legitimate business need. I've done a fair amount of work with Merge Replication at this point, though I am by no means an expert. I've done enough to know that it works pretty well, but I know that it was not specifically designed for this situation. As I understand Merge Replication the publisher is assumed to be the "master" and the subcribers receive updates from the publisher and the process is supposed to start in the middle (the publisher) and flow out.
Here, that WILL be the case eventually, but I have to get over the hump of initial deployment where the individual nodes are independent and not fully synchronized.
Your thoughts and comments are appreciated.
I'm going home for the night...
Add: By the way, we've recently upgraded from SQL 2000 (SP4) to SQL 2005 (SP1). The improvements in Merge Replication manageability are enormous and very satisfying. Set-up configuration, monitoring and tear-down all proceed very smoothly and it's been really nice to work with. A HUGE improvement over SQL 2000 which I found to be an absolute nightmare to manage when attempting to use dynamic horizontal partitions.
Have you hugged your backup today?