  1. #1
    Join Date
    Nov 2004
    Posts
    1

    Unanswered: regexp - merge lines that do not meet a specific criterion

    I have not done this often, and although I have spent some time trying to come up with a solution, it is eluding me. I think I have the right idea. File fragment to be processed:

    *****************

    10915,"S","Phil","ing Valley Middle School",$0.00,0,,0
    10916,"Tr","Ny",,,,,,"999-999-9999",,,,"715000 works at Re-Max
    Wishing for housecoat with dogs, no luck",$0.00,0,,0
    10917,"Ro","Ox","3677 As Dr","W","BC","V4T 2W5",,"1111111111",,,,,$0.00,0,,0
    10918,"Sa","Fri",,"K","BC",,,"2222222222",,,,,$0.00,0,,0
    .
    .
    .
    .
    10355,"Val","Woj",,,,,,"3333333333",,,,"Solutions",$0.00,0,,0
    10356,"Ter","Bes",,,,,,"1211211212",,,,,$0.00,0,,0
    10357,"Phi","Har",,,,,,"1231231234",,,,"6 Woodcroft Ave
    St Catns
    x1x1x1
    999-999-3203
    check the address to see if it is still her's.",$0.00,0,,0
    10358,"Ra","Gak",,"Kel",,,,"3453453456",,,,,$0.00,0,,0
    10359,"J","Ru",,"V","BC",,,"7777777777",,,,"st ing to tell J to come again…
    (555) 899-9999",$0.00,0,,0
    10360,"Li","Sa",,"Win",,,,"4444444444",,,,"LDr Claremont",$0.00,0,,0
    10361,"Ke","Ta",,"K","BC",,,"5555555555",,,,,$0.00,0,,0
    10362,"Kat","son",,"V","BC",,,"6666666666",,,,,$0.00,0,,0

    *****************

    The following script processes the file and prints the ID of the "GOOD" records:

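    {
    use strict;
    use warnings;

    my $file = "D:/Data/Access Data/ModtblCustomers.txt";
    local $/;    # slurp mode: read the whole file in one go
    open( my $fh, '<', $file ) or die "Cannot open $file: $!\n";
    my $text = <$fh>;
    close $fh;
    # A "GOOD" record starts a line with a 1-5 digit ID and a comma.
    # Anchoring with ^ under /m also catches the first line of the file.
    while ( $text =~ /^(\d{1,5}),/mg )
    {
        print "$1\n";
    }
    exit;
    }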

    What I need to do is find each newline that is not followed by a record ID, i.e. the opposite of /\n^(d{1,5},)/ (I know this is not correct syntax, but that is part of the problem), and replace that newline with a space, thereby joining the lines and leaving me with only "GOOD" records.

    The data file is an export that can include multi-line comments. It must be processed for import.

    Any help will be greatly appreciated.

    John
    Last edited by jgatschuff; 11-26-04 at 14:46. Reason: remove sensitive data

  2. #2
    Join Date
    Jun 2004
    Location
    Nowhere Near You
    Posts
    89

    Regexp: Substitute EOL (\n) characters inside the quotations

    Hi,

    Perhaps what you really want to do is "Regexp: Substitute EOL (\n) characters inside the quotations?"
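
    A minimal sketch of that idea, assuming every stray newline sits inside a double-quoted field and that a literal quote inside a field is written as "" (the usual CSV convention). The sample text and the variable names ($text, $joined) are illustrative only, not taken from the original export.

```perl
use strict;
use warnings;

# Illustrative sample: the second record has a newline inside a quoted field.
my $text = <<'END';
10915,"S","Phil","ing Valley Middle School",$0.00,0,,0
10916,"Tr","Ny",,,,,,"999-999-9999",,,,"715000 works at Re-Max
Wishing for housecoat with dogs, no luck",$0.00,0,,0
10917,"Ro","Ox","3677 As Dr","W","BC","V4T 2W5",,"1111111111",,,,,$0.00,0,,0
END

# Match each complete quoted field ("" counts as an escaped quote) and
# replace every newline inside it with a single space.
(my $joined = $text) =~ s{("(?:[^"]|"")*")}{ (my $f = $1) =~ tr/\n/ /; $f }ge;

print $joined;
```

    Newlines outside the quotes are left alone, so each record ends up on exactly one line. An unbalanced quote anywhere in the file would throw the matching off, which is one reason a real CSV parser may be the safer choice.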

  3. #3
    Join Date
    Jan 2004
    Location
    Germany
    Posts
    167
    I think you shouldn't use a regex for this. There are several modules on CPAN that handle CSV files, e.g. DBD::CSV or Text::CSV::Simple. These modules avoid parsing mistakes.
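
    As a rough sketch of the module route, here is Text::CSV, a related CPAN parser (not one of the two named above) that is assumed to be installed. The in-memory sample and the @records array are illustrative; the real export would be opened by file name instead.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module; assumed to be installed

# Illustrative sample held in memory rather than read from disk.
my $data = <<'END';
10916,"Tr","Ny",,,,,,"999-999-9999",,,,"715000 works at Re-Max
Wishing for housecoat with dogs, no luck",$0.00,0,,0
10917,"Ro","Ox","3677 As Dr","W","BC","V4T 2W5",,"1111111111",,,,,$0.00,0,,0
END

# binary => 1 allows quoted fields to contain embedded newlines.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records;
while ( my $row = $csv->getline($fh) ) {
    tr/\n/ / for @$row;    # flatten multi-line comments onto one line
    push @records, $row;
}
close $fh;

print join( ' | ', @$_ ), "\n" for @records;
```

    The parser keeps track of the quoting, so the multi-line comment comes back as a single field and can be flattened before the data is written out again for import.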

  4. #4
    Join Date
    Jun 2004
    Location
    Nowhere Near You
    Posts
    89
    I presume that you really want to read in a CSV file and that the CSV file has "quoted newlines" and/or "quoted commas". So, here you go:

    Code:
    #!/usr/bin/perl -w
    
    use Inline::Files;
    
    # Version 2 --- number of fields is not known
    while (@a_Field=GetCSV(@a_Field)) {
      print join(' | ',@a_Field)."\n\n";
       };
    
      sub GetCSV{
        my($s_Line,$s_Text);
        while ($s_Line=<CSV2>) {
          #print "'$s_Line'\n";
          my(@a_Field);
          $s_Text.=$s_Line;
          while ($s_Text =~ m{([^",]*)([,\n])|"((?:[^"]|"")*)"([,\n])}gs) {
            unless (substr($`,-1,1) eq '"') {
              if (defined $1) {
                push(@a_Field,$1);
                if ($2 eq "\n") {
                  return @a_Field;
                   };
                 } else {
                my($s_Field);
                ($s_Field=$3)=~s/""/"/g;
                push(@a_Field,$s_Field);
                if ($4 eq "\n") {
                  return @a_Field;
                   };
                 };
               };
             };
           };
        return ();
         };
    
    __CSV2__
    10915,"S","Phil","ing Valley Middle School",$0.00,0,,0
    10916,"Tr","Ny",,,,,,"999-999-9999",,,,"715000 works at Re-Max
    Wishing for housecoat with dogs, no luck",$0.00,0,,0
    10917,"Ro","Ox","3677 As Dr","W","BC","V4T 2W5",,"1111111111",,,,,$0.00,0,,0
    10918,"Sa","Fri",,"K","BC",,,"2222222222",,,,,$0.00,0,,0
    10355,"Val","Woj",,,,,,"3333333333",,,,"Solutions",$0.00,0,,0
    10356,"Ter","Bes",,,,,,"1211211212",,,,,$0.00,0,,0
    10357,"Phi","Har",,,,,,"1231231234",,,,"6 Woodcroft Ave
    St Catns
    x1x1x1
    999-999-3203
    check the address to see if it is still her's.",$0.00,0,,0
    10358,"Ra","Gak",,"Kel",,,,"3453453456",,,,,$0.00,0,,0
    10359,"J","Ru",,"V","BC",,,"7777777777",,,,"st ing to tell J to come again…
    (555) 899-9999",$0.00,0,,0
    10360,"Li","Sa",,"Win",,,,"4444444444",,,,"LDr Claremont",$0.00,0,,0
    10361,"Ke","Ta",,"K","BC",,,"5555555555",,,,,$0.00,0,,0
    10362,"Kat","son",,"V","BC",,,"6666666666",,,,,$0.00,0,,0
    gives
    Code:
    C:\Felix\MARS 1st\code\hacks>perl csv.pl
    10915 | S | Phil | ing Valley Middle School | $0.00 | 0 |  | 0
    
    10916 | Tr | Ny |  |  |  |  |  | 999-999-9999 |  |  |  | 715000 works at Re-Max
    Wishing for housecoat with dogs, no luck | $0.00 | 0 |  | 0
    
    10917 | Ro | Ox | 3677 As Dr | W | BC | V4T 2W5 |  | 1111111111 |  |  |  |  | $0.00 | 0 |  | 0
    
    10918 | Sa | Fri |  | K | BC |  |  | 2222222222 |  |  |  |  | $0.00 | 0 |  | 0
    
    10355 | Val | Woj |  |  |  |  |  | 3333333333 |  |  |  | Solutions | $0.00 | 0 |  | 0
    
    10356 | Ter | Bes |  |  |  |  |  | 1211211212 |  |  |  |  | $0.00 | 0 |  | 0
    
    10357 | Phi | Har |  |  |  |  |  | 1231231234 |  |  |  | 6 Woodcroft Ave
    St Catns
    x1x1x1
    999-999-3203
    check the address to see if it is still her's. | $0.00 | 0 |  | 0
    
    10358 | Ra | Gak |  | Kel |  |  |  | 3453453456 |  |  |  |  | $0.00 | 0 |  | 0
    
    10359 | J | Ru |  | V | BC |  |  | 7777777777 |  |  |  | st ing to tell J to come again…
    (555) 899-9999 | $0.00 | 0 |  | 0
    
    10360 | Li | Sa |  | Win |  |  |  | 4444444444 |  |  |  | LDr Claremont | $0.00 | 0 |  | 0
    
    10361 | Ke | Ta |  | K | BC |  |  | 5555555555 |  |  |  |  | $0.00 | 0 |  | 0
    
    10362 | Kat | son |  | V | BC |  |  | 6666666666 |  |  |  |  | $0.00 | 0 |  | 0
