We are planning to use an ETL tool(most
probably Informatica) to load data from
flat files to more Generic Database.
So i have one general question. Is
there any way I can identify the
deletion of records from the database
which is not there in the flat file.
This is how it goes..I have around 14
flat files where the data is related
within these data files. Lets take an
eg of Movie.data and Award.dat file. In
this one movie is associated to n
number of awards. Now the provider has
given yesterday 4 awards are associated
to one movie, i have linked these 4
awards to the movie in the database.
Now today he has given only 3 awards
are linked to that movie. Now I have to
identify the extra award which is
linked to that movie in the database
and knock of that record. The same way
I got come complicated relation between
these 14 flat files. And the provider
doesnt provide any information like
this record has to get deleted or
something like that.
So I need to know is there any general
pattern I can use to identify the
deletion of these records.
We cannot delete the records permanently from the DB. We have to flag the records because business want to know abt the deleted records and also we have to carry these deletion to the downstream systems also.
Considering millions of records which is getting periodically refreshed, we cannot truncate the table and load the data daily. This will have a serious performance issue as well as higly complex for the downstram system to figure out what has changed what not.
So we are trying to figure out a general pattern in which we can identify the deletes.
One solution is that, we can do a UNIX diff and find the difference of the files which has chnaged and load it into a seperate table and finally find the difference. But considering the volume of the data and the number of tables. In our case we are having around 6 providers giving these kinds of different content and together its comes around 70-80 flat files.
So I am wondering what is the general approach in DataWarehouse to handle these kind of scenarios.