We have a MRP system that uses DB2. What we would like to do is pull data from this system into SS2005 Cube. The way we currently have defined the flow of the data goes something like this;
SSIS uses an ODBC connection to connect to and pull data from DB2 into a staging environment on SS2005. Another SSIS package then pulls the data from the staging, goes through the ETL process and into a history/relational database and then onto the data mart.
The problem that we are faced with is determining a good strategy for ensuring we do not get duplicate data and or miss any of the data. We originally designed to use a flag on the DB2 side to mark each record that was processed into warehouse; unfortunately, this approach is horrendously slow due to the number of records the system pulls during each load process. The load process runs once an hour and we estimate approximately 10-15,000 records per load process. Our current approach takes approximately 2.5 - 4 hrs to complete the update progress which means the package has run several times since its initial start and bombing due to the update process not being completed.
This method has worked great for us in the past from a SQL Server to Sql Server environment but it does not seem to be the optimal choice for this particular scenario.
Is there a better approach that I am not considering? Any help would be greatly appreciated. Thank you!