  1. #1
    Join Date
    Oct 2003
    Location
    Ireland
    Posts
    9

    Unanswered: Find and replace duplicate names in a file?

    Hi there,
    I have a piece of code that reads in a nested-parenthesis tree file that contains duplicate names. I want to add a number to the end of each name in the tree in order to get rid of these duplicates. When I find the name I want to add a number to, the code only adds a number to the first duplicate name and not to any of the other instances of it. Is there any way I can change the code to move on to the next name each time, or is there a better way of writing code to find and replace duplicates? Any help greatly appreciated. I'm adding the code below. Apologies for the state of it (I am a REAL beginner)!!
    Many thanks in advance,
    Rho

    #!/usr/bin/perl -w

    use warnings;

    open (INPUT, $ARGV[0]); # Where $ARGV[0] is a list of tree file names
    $count = 0;
    while (<INPUT>) {
        open (FILE, $_);
        open (OUTPUT, ">>Trees.out");

        while (<FILE>) {
            #Read in each tree from each file.
            if ($_ =~ /^\(/) {
                #Extract the tree and save it to a variable.
                chomp($_);
                $_ =~ s/^\s*//;
                $tree = $_;
                $tree1 = $tree;

                while ($tree =~ /(\w[\w'-]*)/g) {
                    print "$1\n";
                    $Rho = $1;

                    $tree1 =~ s/$Rho/$Rho$count/;
                    $count++;
                }
            }
            print "$tree1\n";
        }

        close (OUTPUT);
    }

  2. #2
    Join Date
    Nov 2003
    Posts
    65

    I would suggest using a boolean flag, so that you add a number only to the first instance you see and not to the ones that follow. Something like $found = 1 or 0, where 1 = true and 0 = false.

  3. #3
    Join Date
    Jun 2004
    Location
    Nowhere Near You
    Posts
    89
    Rho,

    When looking for duplicates, think in terms of a hash where your "word" is the key. Bump the value!

    Code:
    my(%h_Counts);
    while (<IN>) {
      chomp;
      foreach my $s_Word (split(/\s+/)) {
        if (! exists($h_Counts{$s_Word})) {
          # $s_Word has not been previously encountered
        }
        elsif ($h_Counts{$s_Word} == 1) {
          # $s_Word has been previously encountered once
        }
        else {
          # $s_Word has been previously encountered $h_Counts{$s_Word} times
        }
        $h_Counts{$s_Word}++;  # bump the count for this word
      }
    }
    If case is immaterial then you can upper case your word (or just upper case the line as it comes in). Of course you can always use Tie::Case.

    Perhaps this helps?
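
    For the tree in the original question, the same counting idea might be applied in a single substitution pass. This is only a sketch: the sub name number_duplicates and the sample tree are made up for illustration, the name pattern is the one from the first post, and leaving the first occurrence of each name untouched is an assumption about what is wanted.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not from the thread): appends a running number
    # to every repeated name in one tree string.
    sub number_duplicates {
        my ($tree) = @_;
        my %seen;    # name => how many times it has appeared so far
        # /g visits every name from left to right; /e runs the replacement as code.
        $tree =~ s{(\w[\w'-]*)}{ my $n = $seen{$1}++; $n ? "$1$n" : $1 }ge;
        return $tree;
    }

    # Made-up sample tree (an assumption, not data from the thread).
    my $tree = "((cat,dog),(cat,(dog,cat)));";
    print number_duplicates($tree), "\n";
    ```

    For the sample tree this prints ((cat,dog),(cat1,(dog1,cat2)));. Two caveats: a generated name such as cat1 could clash with a name that is already in the tree, and the pattern also matches bare numbers, so a tree with numeric labels would need a stricter pattern.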

  4. #4
    Join Date
    May 2004
    Posts
    28
    I needed to do exactly this. Here's how I did it:

    Code:
    for ($i=0;$i<$count;$i++)
    {
    	$dup[$i]=1;
    	for ($j=0;$j<$count;$j++)
    	{
    		# Skip comparing a name with itself
    		next if ($i == $j);
    
    		if ($user_name[$i] eq $user_name[$j])
    		{
    			# Append the running duplicate number, then bump it
    			$user_name[$j] = $user_name[$j].$dup[$i];
    			$dup[$i] = $dup[$i] + 1;
    		}
    	}
    }
    Let me explain it a little. Basically, it goes through the array @user_name, and any time it finds a duplicate name it attaches 1, 2, 3 and so on to the end of that name, depending on how many times the name has already shown up as a duplicate. So this code is set up to handle any number of duplicates.

    So all you need to do is get your data into an array, and then you can use my code.
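
    The nested loops above compare every pair of names. A single pass with a hash might do the same job. This is just a sketch: the contents of @user_name are made up, and the numbering (first occurrence unchanged, later ones get 1, 2, 3 and so on) is assumed from the description above.

    ```perl
    use strict;
    use warnings;

    # Made-up sample data (an assumption); in practice @user_name comes from your file.
    my @user_name = qw(anna bob anna carl bob anna);

    my %seen;    # name => number of times it has been seen so far
    for my $name (@user_name) {
        my $n = $seen{$name}++;
        # $name is an alias for the array element, so this edits @user_name in place.
        $name .= $n if $n;
    }

    print join(",", @user_name), "\n";
    ```

    With the sample data this prints anna,bob,anna1,carl,bob1,anna2.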

  5. #5
    Join Date
    Jun 2004
    Location
    Nowhere Near You
    Posts
    89
    Is this what you want?

    Code:
    #!/usr/bin/perl -w
    
    use strict;
    # Our test string
    $_="this is a test of the early warning system! if this were not a test, you would already be dead! early is relative!";
    
    # split $_ into words and count their occurrences
    my(%h_WordCounts,$s_Word);
    foreach $s_Word (split(/\W+/,$_)) {
      $h_WordCounts{$s_Word}++;
    }
    
    # If a word, say $s_Word, appears more than once then $h_WordCounts{$s_Word} will be greater than 1,
    # so we only loop over those words
    my($count,$s_Counter);
    foreach $s_Word (grep {$h_WordCounts{$_} > 1} keys %h_WordCounts) {
      $s_Counter++;
      # The first time a word occurs it is left unchanged; after that it gets $s_Counter as a suffix
      $count=0; s{\b$s_Word\b}{++$count == 1 ? $s_Word : $s_Word.$s_Counter}gex;
    }
    print;
    When fed this
    "this is a test of the early warning system! if this were not a test, you would already be dead! early is relative!"
    it returns something like this (the suffix each word gets depends on the hash's key order, so the numbers can differ between runs):
    "this is a test of the early warning system! if this5 were not a1 test2, you would already be dead! early4 is3 relative!"

  6. #6
    Join Date
    Jun 2004
    Location
    Nowhere Near You
    Posts
    89
    Code:
    #!/usr/bin/perl -w
    use strict;
    
    $_="this is a test of the early warning system! if this were not a test, you would already be dead! early is relative!\n";
    
    print;
    
    # split and stuff into a hash
    my(%h_WordOccurs,$s_Word);
    foreach $s_Word (split(/\W+/,$_)) {
      $h_WordOccurs{$s_Word}++;
    }
    
    # If a word, say $s_Word, appears more than once then $h_WordOccurs{$s_Word} will be greater than 1,
    # so keep only those words
    my(@a_Words)=sort grep {$h_WordOccurs{$_} > 1} keys %h_WordOccurs; # NB: the sort just makes the labels come out in alphabetical order
    
    # Create a hash that assigns the "label" to the "word"
    my(%h_WordToLabel);
    @h_WordToLabel{@a_Words}=(1..@a_Words);
    
    # Create a string for the regex
    my($s_Words)=join("|",@a_Words);
    
    # We need a hash to store the number of times that a word has appeared at "this point in the string"
    my(%h_WordHasOccurred);
    
    # Now we do our substitutions: the first occurrence becomes "word(label)", later ones just "(label)"
    s{\b($s_Words)\b}{(++$h_WordHasOccurred{$1}) == 1 ? "$1($h_WordToLabel{$1})" : "($h_WordToLabel{$1})"}gex;
    
    print;
    Transforms this:
    "this is a test of the early warning system! if this were not a test, you would already be dead! early is relative!"
    to this:
    "this(5) is(3) a(1) test(4) of the early(2) warning system! if (5) were not (1) (4), you would already be dead! (2) (3) relative!"
