Results 1 to 5 of 5
  1. #1
    Join Date
    Apr 2003
    Location
    London, England
    Posts
    42

    Performance of loading a Data Warehouse

    I wonder whether anyone has any experience or advice on my problem.

    I am designing an analysis tool to monitor the performance of about 200 data extract files as they are loaded into a Data Warehouse.
    The key feature is to adjust the loading sequence automatically if specific extract files arrive early or late by identifying the dependencies on the missing files.

    I will be very grateful for any suggestions or advice.

    Barry

  2. #2
    Join Date
    May 2004
    Posts
    1
    The best solution is to define a small database (you may use the warehouse) with tables that describe the required files and the dependencies. Then you just have to write several queries.

  3. #3
    Join Date
    Apr 2003
    Location
    London, England
    Posts
    42

    Thanks but ...

    I have already done what you suggested - what's your next suggestion ?

    Barry

  4. #4
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Hi,

    Let me see if I understand the problem.

    You have many extract files that arrive at various points in times. These files are to be processed, sometimes in specified order.For example if FileA has to be processed before FileB can be processed, then, if FileB arrives first, it must not be processed.

    The tool that you are creating should wait until FileA arrives and is processed successfully.

    This is essentially a workflow problem. Many vendors provide software solutions for these types of situations.

    If you are developing in-house solutions, then the process that runs FileB must check if FileA has been processed before starting work on FileB.

    In a Unix environment, you could create a shell script that does this. In a loop, it could check if FileA has been processed. If yes, process FileB. If not, "sleep" for some time and then repeat until FileA has been processed.

    More complicated situations can be handled in a similar manner. Alternatively, you could use your database management system's scheduler to accomplish the same task.

    Was that what you were looking for? Or did I misunderstand your problem?

    Ravi

  5. #5
    Join Date
    Apr 2003
    Location
    London, England
    Posts
    42

    This is my DW Data Loading problem ...

    You have understood the problem very well.
    My current situation is that I have a Data Dictionary in an Access Database.
    I use this to generate UNIX Shell and SQL Scripts to control the Loading into a Data Warehouse, using Informatica and Oracle.
    Now I am getting hit with many more files than I had been told to expect so I am having trouble loading all the files in the overnight window.
    Therefore I would like to add the ability to evaluate several options for sequencing and loading files, based on performance stats taken from actual log files.
    Then I would like to automatically generate the Scripts to load the files in the sequence I have decided is optimal.
    Of course, this might change every night.
    I now want to add functionality for this kind of Analysis Engine on top of my Scripting Engine.

    I would appreciate any advice or suggestions.

    Barry

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •