Results 1 to 15 of 20
  1. #1
    Join Date
    Nov 2003
    Posts
    65

    Unanswered: comparing 2 files

    Hi,

    Let's say I have 2 files, file1 and file2. Each file has rows of data, possibly delimited by "|". Each row has an account or name, some other data, and a date. I want to take that name and date from file1 and check whether they exist in file2. If they do, delete that row from file1.
    If they don't, copy the row to file2 and then delete it from file1.
    The last step I might not do; I'll just end up with one large file and one small file instead of 2 large files with duplicate data, if any.

    Any help on this would be appreciated.

    Thanks in advance!

  2. #2
    Join Date
    Jul 2003
    Location
    Calcutta, India
    Posts
    42
    Let me get a better understanding of your query. You want to copy a record from file1 to file2 in case file2 does not have that record, and then delete the record from file1? Is that correct? And in case file2 does contain the record, you simply want to delete it from file1? Is that correct?
    If the answer to both questions is yes, you will end up with only one file, i.e. file2, and file1 will have no records. The logic is: when you compare the records, the comparison will either yield TRUE (record exists in file2) or FALSE (record does not exist in file2). If TRUE, simply delete the record from file1; otherwise (i.e. FALSE), copy the record to file2 and then delete it from file1. You are deleting the record from file1 in both cases, so at the end of the script you will be left with only one file, i.e. file2, and file1 will not have any records.
    Also let me know whether you want to do this in a Perl script or a Unix script, and how frequently you will be doing it. If not very frequently, then I suggest you use Unix scripting with the following 3 lines of code:

    cat file1 >> file2          # Append all the records from file1 to file2
    sort -u < file2 > file2.tmp # Sort file2 uniquely into a temp file
    mv file2.tmp file2          # Move the temp file over the original file
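    For reference, the TRUE/FALSE compare-and-append logic described in this post can also be sketched directly in Perl with a hash lookup. This is a minimal sketch, not code from the thread: the file names, the sample "Account|Date|Data" records, and the final truncation of file1 are all assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file names and sample records (assumptions for the sketch).
my ($file1, $file2) = ("file1.dat", "file2.dat");
open(my $w1, '>', $file1) or die $!;
print $w1 "acct1|2004-01-31|100\nacct2|2004-01-31|200\n";
close $w1;
open(my $w2, '>', $file2) or die $!;
print $w2 "acct1|2004-01-31|100\n";
close $w2;

# Load every record of file2 into a hash for constant-time lookups.
my %in_file2;
open(my $f2, '<', $file2) or die $!;
while (my $rec = <$f2>) {
    chomp $rec;
    $in_file2{$rec} = 1;
}
close $f2;

# TRUE  (record already in file2): just drop it from file1.
# FALSE (record not in file2): append it to file2, then drop it from file1.
open(my $f1,  '<',  $file1) or die $!;
open(my $out, '>>', $file2) or die $!;
while (my $rec = <$f1>) {
    chomp $rec;
    print $out "$rec\n" unless $in_file2{$rec};
}
close $f1;
close $out;

# Every record now lives in file2, so file1 can simply be emptied.
open(my $trunc, '>', $file1) or die $!;
close $trunc;
```

    This avoids re-reading file2 once per file1 record, which starts to matter as the files grow.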

    Let me know if I am wrong or if you need something else.

    Thanks
    adityanlal

  3. #3
    Join Date
    Nov 2003
    Posts
    65
    You are correct with your logic: I will end up with just one file at the end of the process.
    I will need this in a Perl script. I wish I could use Unix, but we haven't moved over to Unix yet, and to be honest I don't know if we ever will.
    This will be a monthly script run through a batch process I create.

    Thanks so much for your help on this matter.

    Originally posted by adityanlal
    Also let me know whether you want to do this in a Perl script or a Unix script, and how frequently you will be doing it. If not very frequently, then I suggest you use Unix scripting with the following 3 lines of code:

    cat file1 >> file2          # Append all the records from file1 to file2
    sort -u < file2 > file2.tmp # Sort file2 uniquely into a temp file
    mv file2.tmp file2          # Move the temp file over the original file

    Let me know if I am wrong or if you need something else.

    Thanks
    adityanlal

  4. #4
    Join Date
    Jul 2003
    Location
    Calcutta, India
    Posts
    42
    Say file1 and file2 have data in the following format:
    Account|Date|Data1|Data2

    Now we need to open file1, read it line by line, and for each line read from file1, compare the data with file2. There are 2 possibilities: (1) file2 has the data - delete it from file1; (2) file2 does not have the data - copy the data to file2 and then delete it from file1. Here is the script:

    open (FILE1, "< /home/usr/file1.dat");
    while ($rec1 = <FILE1>) {
        chop ($rec1);
        open (FILE2, "< /home/usr/file2.dat");
        while ($rec2 = <FILE2>) {
            chop ($rec2);
            $found = 0;
            if ($rec1 =~ $rec2) {
                $found = 1;
                last;
            }
            close FILE2_ORG;
            if ($found == 0) {
                open (FILE2, ">> /home/usr/file2.dat");
                print FILE2 $rec1;
                close FILE2;
            }
        }
    }

    The indenting is not good; bear with me.

    Let me know if you require any more help, or if it is not working.

    Thanks
    adityanlal

  5. #5
    Join Date
    Nov 2003
    Posts
    65
    Thanks for the fast reply. But I was wondering whether this would work if, let's say, I had to do a date check: don't move a record over if the date in file1 is not the same as the date in file2. Basically, can this script be written to do a date check instead of requiring the whole record to be the same?

  6. #6
    Join Date
    Jul 2003
    Location
    Calcutta, India
    Posts
    42
    You need to use the function "substr". The syntax is
    <Scalar Variable> = substr(<Scalar Variable>, OFFSET, LENGTH)

    You can take the date part out of the original record for both files, then compare the two dates and act accordingly.
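    For instance, assuming the "Account|Date|..." layout from earlier in the thread with a fixed-width 6-character account field (an assumption; with variable-width fields, split on the delimiter instead):

```perl
my $rec1 = "acct01|2004-01-31|100.00";
my $rec2 = "acct01|2004-01-31|999.99";

# substr(EXPR, OFFSET, LENGTH): skip the 7 characters of "acct01|",
# then take the 10-character date.
my $date1 = substr($rec1, 7, 10);
my $date2 = substr($rec2, 7, 10);

# With variable-width fields, split is safer:
# my ($acct, $date) = split /\|/, $rec1;

print "dates match\n" if $date1 eq $date2;
```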

    I hope this will solve your problem.

    Thanks
    adityanlal

  7. #7
    Join Date
    Nov 2003
    Posts
    65
    Thanks!
    That helped a lot, but I did not understand the need for FILE2_ORG.
    I replaced it with FILE2 and it worked. But now I get no data in either file;
    both files end up empty.
    I modified it a little bit so I can give it file1 and file2 at the prompt.

    Any ideas why both files end up with no data?

    Thanks again for helping me.

    Originally posted by adityanlal
    You need to use the function "substr". The syntax is
    <Scalar Variable> = substr(<Scalar Variable>, OFFSET, LENGTH)

    You can take the date part out of the original record for both files, then compare the two dates and act accordingly.

    I hope this will solve your problem.

    Thanks
    adityanlal

  8. #8
    Join Date
    Nov 2003
    Posts
    65
    OK. After some tweaking I got it to work by creating a temp file. How do I stop it from duplicating? I see each record a bunch of times: a file that had 15 records now has over 100. Very interesting.
    This is my code:

    $inputFile1 = shift @ARGV;
    $inputFile2 = shift @ARGV;
    $outFile2 = ">>" . "Temp.csv";

    open (FILE1, "<$inputFile1");
    while ($rec1 = <FILE1>) {
        chop ($rec1);
        open (FILE2, "<$inputFile2");
        while ($rec2 = <FILE2>) {
            chop ($rec2);
            $found = 0;
            if ($rec1 =~ $rec2) {
                print "Match Found!\n";
                $found = 1;
                last;
            }
            close FILE2_ORG;
            if ($found == 0) {
                ##open (FILE2, ">>$inputFile2");
                open (FILE3, $outFile2);
                print FILE3 $rec1 . "\n";
                close FILE3;
            }
        }
    }
    print "Done!\n";


    Let me know if I'm doing something wrong!

    Thanks,

    Originally posted by adityanlal
    You need to use the function "substr". The syntax is
    <Scalar Variable> = substr(<Scalar Variable>, OFFSET, LENGTH)

    You can take the date part out of the original record for both files, then compare the two dates and act accordingly.

    I hope this will solve your problem.

    Thanks
    adityanlal

  9. #9
    Join Date
    Jul 2003
    Location
    Calcutta, India
    Posts
    42
    I will try to check the code tomorrow and will give you the complete working code. Today is EOB for me, so I am going home.

    Thanks
    adityanlal

  10. #10
    Join Date
    Nov 2003
    Posts
    65
    Hey there...

    Any luck with the code?!

    Originally posted by adityanlal
    I will try to check the code tomorrow and will give you the complete working code. Today is EOB for me, so I am going home.

    Thanks
    adityanlal

  11. #11
    Join Date
    Feb 2004
    Posts
    4
    If FILE1 is gonna be small, and FILE2 is where things will wind up being large, how about this:

    Code:
    my $inputFile = shift @ARGV;
    my $destinationFile = shift @ARGV;
    
    open (UPDATES,"<$inputFile");
    chomp ( my @rec1 = <UPDATES> );
    close UPDATES;
    
    # use "+<" to open for input AND for output
    open (DESTINATION, "+<$destinationFile");
    while ($rec2 = <DESTINATION> ) 
    {
    	chop ($rec2);
    	foreach my $ix ( 0 .. $#rec1 ) {
    		# use your own "sub my_match { ... }"
    		if ( &my_match( $rec2, $rec1[ $ix ] ) {
    			print "Match Found for '$rec2'\n";
    			delete $rec1[$ix];
    			last;
    		}
    	}
    }
    # now we append the ones that weren't found, to file2:
    print DESTINATION map { "$_\n" } @rec1;
    close DESTINATION;
    print "Done!\n";
    does that work? (we read from file2 until we're at the end, then we append output to that very same channel... in theory.)
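    The &my_match placeholder in the snippet above is left to the reader; one hypothetical version (assuming a record "matches" when its first two "|"-delimited fields, account and date, agree) could look like:

```perl
# Hypothetical matcher: records match when account and date
# (the first two "|"-delimited fields) are equal.
sub my_match {
    my ($left, $right) = @_;
    my @f1 = split /\|/, $left;
    my @f2 = split /\|/, $right;
    return ($f1[0] eq $f2[0]) && ($f1[1] eq $f2[1]);
}

print "match\n" if my_match("acct01|2004-01-31|100.00",
                            "acct01|2004-01-31|999.99");
```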

    Of course, I heavily recommend a real live database for this kind of data massagement. (PostgreSQL, perhaps -- works like a champ even on Windows -- see http://google.com/search?q=cache:ACi...wnload+windows for details.)

  12. #12
    Join Date
    Nov 2003
    Posts
    65

    Cool

    Hey, thanks for the reply. I will test this out and let you know what comes of it. As for using other programs: I'm actually trying to clean 2 data files I receive, but they come with duplicate data, so I thought I'd try to get the duplicates out before I load them into a database.
    Which sounds like a good idea... but you never know.
    Thanks again.

  13. #13
    Join Date
    Nov 2003
    Posts
    65
    Well, I tried running it but I'm getting errors on "delete". Any suggestions for a replacement? unlink works for files; do you think it will work for arrays?

  14. #14
    Join Date
    Feb 2004
    Posts
    4

    Lightbulb

    Originally posted by llccoo
    Well, I tried running it but I'm getting errors on "delete". Any suggestions for a replacement? unlink works for files; do you think it will work for arrays?
    According to "perldoc -f delete" you can use delete to zap array items, just as we've got in the code. What kind of error do you get? (Cut and paste it, if possible.)

    BUT

    If you're just looking to eliminate duplicates in several input files before plopping them into a database, you may as well use Perl's hashing techniques:

    Code:
    my %data = ();
    while (<>) { # iterate through cmd-line @ARGV
      my $stuff = &munge( $_ ); # line of text from one of our files
      $data{$stuff}++;
    }
    Have your &munge() subroutine take the line of text and return all the pertinent data, which will be used as a key in the %data hash. Once you're out of the loop, you can step through the (unique!) keys and break them up into data columns for database insertion.
    Code:
    sub munge {
      my $line = shift;
      chomp($line);
      # process input so it's sensible
      # yada yada
      return $line; # string, used as hash key, for uniqueifying
    }
    Also see "How can I remove duplicate elements from a list or array?" and similar questions in "perldoc perlfaq4".
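    The perlfaq4 recipe mentioned above boils down to the same %seen-hash idea, e.g.:

```perl
my @records = ("a|1", "b|2", "a|1", "c|3", "b|2");

# Keep a record only the first time its hash key is seen.
my %seen = ();
my @unique = grep { !$seen{$_}++ } @records;

print join(",", @unique), "\n";   # a|1,b|2,c|3
```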

  15. #15
    Join Date
    Nov 2003
    Posts
    65
    I get the following errors:

    Code:
    I:\perl -c FileCompare3.pl
    syntax error at FileCompare3.pl line 29, near "delete"
    syntax error at FileCompare3.pl line 33, near "}"
    FileCompare3.pl had compilation errors.
    The only thing I can see is that we are accessing an array while the perldoc shows a hash example:
    Code:
     foreach $key (keys %HASH) {
         delete $HASH{$key};
     }
    Probably a syntax issue, but I'm blind as to what it could be.
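    For what it's worth, "perldoc -f delete" really does accept an array element, so the compile errors above are probably not delete's fault: in the snippet from post #11, the "if ( &my_match( ... )" line appears to be missing a closing parenthesis, which would produce exactly this kind of "syntax error near" report. A quick sketch of delete versus splice on arrays:

```perl
my @a = ("a|1", "b|2", "c|3");
delete $a[1];              # leaves an undef hole; the array length is unchanged
# scalar(@a) is still 3, but $a[1] is no longer defined

my @b = ("a|1", "b|2", "c|3");
splice(@b, 1, 1);          # actually removes the element; the array shrinks
# @b is now ("a|1", "c|3")
```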
