  1. #1
    Join Date: Mar 2004
    Posts: 2

    Unanswered: Delete identification Process

    Hi,

    We are planning to use an ETL tool (most probably Informatica) to load
    data from flat files into a more generic database.

    So I have one general question: is there any way to identify records in
    the database that should be deleted because they are no longer present
    in the flat files?

    This is how it goes: I have around 14 flat files whose data is related
    across files. Take the Movie.data and Award.dat files as an example,
    where one movie is associated with n awards. Yesterday the provider
    said 4 awards were associated with a movie, so I linked those 4 awards
    to the movie in the database. Today he says only 3 awards are linked to
    that movie, so I have to identify the extra award that is still linked
    to the movie in the database and knock off that record. There are
    similarly complicated relationships among all 14 flat files, and the
    provider does not supply any indication such as "this record has to be
    deleted".

    So I need to know: is there any general pattern I can use to identify
    these deletions?
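
    For reference, the structures involved look roughly like this
    (simplified, with made-up column names):

        -- Simplified picture of the Movie/Award example.
        CREATE TABLE movie (
            movie_id  INTEGER      NOT NULL PRIMARY KEY,
            title     VARCHAR(200)
        );

        CREATE TABLE award (
            award_id    INTEGER      NOT NULL,
            movie_id    INTEGER      NOT NULL REFERENCES movie (movie_id),
            award_name  VARCHAR(200),
            PRIMARY KEY (award_id, movie_id)
        );
        -- Yesterday's files linked 4 award rows to a movie; today's files
        -- link only 3, and nothing in the files says which one went away.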

    Looking forward to your valuable suggestions.

    Thanks
    Daison

  2. #2
    Join Date: Aug 2003
    Location: Where the Surf Meets the Turf @Del Mar, CA
    Posts: 7,776
    Provided Answers: 1
    If I understand the proposed design (which I don't think is a good
    one), one straightforward solution would be to TRUNCATE all 14 tables
    at the start of the load process.
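
    A bare-bones sketch of what that looks like (table names are
    placeholders; the reload itself would be done by the Informatica
    sessions or a bulk-load utility):

        -- Empty the targets at the start of the load window, children
        -- before parents; depending on the database, foreign keys may
        -- need to be disabled before TRUNCATE is allowed.
        TRUNCATE TABLE award;
        TRUNCATE TABLE movie;
        -- ...then reload every table in full from the current flat files.
        -- Afterwards the database holds exactly what the files hold, so
        -- rows the provider dropped simply never reappear.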

  3. #3
    Join Date: Mar 2004
    Posts: 2
    Hi,

    We cannot delete the records permanently from the DB. We have to flag
    them, because the business wants to know about the deleted records, and
    we also have to carry these deletions through to the downstream
    systems.

    Considering the millions of records that get refreshed periodically, we
    cannot truncate the tables and reload the data daily. That would be a
    serious performance problem, and it would make it highly complex for
    the downstream systems to figure out what has changed and what has not.

    So we are trying to figure out a general pattern by which we can
    identify the deletes.
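
    To make that concrete, the statement we would need to end up with looks
    roughly like this (a sketch only; AWARD, STG_AWARD and the column names
    are placeholders, and exact syntax varies by database):

        -- AWARD is the target table with a DELETE_FLAG column;
        -- STG_AWARD is a staging table holding today's full Award.dat load.
        UPDATE award
           SET delete_flag = 'Y',
               deleted_date = CURRENT_DATE
         WHERE delete_flag = 'N'
           AND NOT EXISTS (SELECT 1
                             FROM stg_award s
                            WHERE s.movie_id = award.movie_id
                              AND s.award_id = award.award_id);

    Inserts and updates can be picked up the same way with the comparison
    reversed, so one pass per staging table covers the whole compare.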

    One solution is to do a UNIX diff to find the differences between the
    old and new files, load those differences into a separate table, and
    work out the deletions from there. But consider the volume of the data
    and the number of tables: we have around 6 providers giving these kinds
    of content, and together it comes to around 70-80 flat files.
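
    The same comparison can also be kept inside the database instead of
    diffing files: keep yesterday's keys in a snapshot table and let a set
    difference return only the deleted keys (again a sketch with made-up
    names; Oracle would use MINUS instead of EXCEPT):

        -- SNAP_AWARD_PREV holds the keys from yesterday's load,
        -- STG_AWARD the keys from today's.
        INSERT INTO award_deleted_keys (movie_id, award_id, detected_date)
        SELECT d.movie_id, d.award_id, CURRENT_DATE
          FROM (SELECT movie_id, award_id FROM snap_award_prev
                EXCEPT
                SELECT movie_id, award_id FROM stg_award) d;
        -- Afterwards SNAP_AWARD_PREV is refreshed from STG_AWARD, ready
        -- for the next day's compare.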

    So I am wondering what the general approach in data warehousing is to
    handle these kinds of scenarios.


    Thanks
    Daison
