I wonder whether anyone has any experience or advice on my problem.
I am designing an analysis tool to monitor the performance of about 200 data extract files as they are loaded into a Data Warehouse.
The key feature is to adjust the loading sequence automatically if specific extract files arrive early or late by identifying the dependencies on the missing files.
I will be very grateful for any suggestions or advice.
You have many extract files that arrive at various points in times. These files are to be processed, sometimes in specified order.For example if FileA has to be processed before FileB can be processed, then, if FileB arrives first, it must not be processed.
The tool that you are creating should wait until FileA arrives and is processed successfully.
This is essentially a workflow problem. Many vendors provide software solutions for these types of situations.
If you are developing in-house solutions, then the process that runs FileB must check if FileA has been processed before starting work on FileB.
In a Unix environment, you could create a shell script that does this. In a loop, it could check if FileA has been processed. If yes, process FileB. If not, "sleep" for some time and then repeat until FileA has been processed.
More complicated situations can be handled in a similar manner. Alternatively, you could use your database management system's scheduler to accomplish the same task.
Was that what you were looking for? Or did I misunderstand your problem?
You have understood the problem very well.
My current situation is that I have a Data Dictionary in an Access Database.
I use this to generate UNIX Shell and SQL Scripts to control the Loading into a Data Warehouse, using Informatica and Oracle.
Now I am getting hit with many more files than I had been told to expect so I am having trouble loading all the files in the overnight window.
Therefore I would like to add the ability to evaluate several options for sequencing and loading files, based on performance stats taken from actual log files.
Then I would like to automatically generate the Scripts to load the files in the sequence I have decided is optimal.
Of course, this might change every night.
I now want to add functionality for this kind of Analysis Engine on top of my Scripting Engine.