  1. #1
    Join Date
    Mar 2004
    Posts
    2

    Unanswered: Pattern Recognition and Extracting into files

    Hi,

    First post to these forums.

    Basically I need to take a file, sort it, then extract different patterns from it.

    The data is IP addresses and I want to split it up by subnet.

    E.g. sample data in hosts.log (made up):

    1.0.0.1
    1.0.0.2
    1.0.0.3
    1.0.1.4
    1.0.1.5
    1.0.1.6
    1.0.2.7
    1.0.2.8
    1.0.2.9
    (etc.etc.....)
    9.1.1.1
    9.1.2.2
    (etc.etc.....)

    Then split into

    1.0.0.log ::
    1.0.0.1
    1.0.0.2
    1.0.0.3

    1.0.1.log ::
    1.0.1.4
    1.0.1.5
    1.0.1.6

    Is this possible, or is it a lot of scripting? I don't mind how the output files are named, but naming them by subnet would be great.

    I have had a look at awk, sed etc. and have not been able to find anything that matches what I want to do.

    My apologies if this is a little OTT! :-)

  2. #2
    Join Date
    Jan 2004
    Location
    Bordeaux, France
    Posts
    320
    Try this:

    Code:
    sort -t. -n -k1,4 ip.txt |
    awk '
    {
       subnet = $1;
       sub(/\.[^.]*$/,"",subnet)
       if (subnet != subnet_prv) {
          if (subnet_prv != "") close(subnet_file);
          subnet_file = subnet ".log";
          print $1  > subnet_file;
       } else {
          print $1 >>  subnet_file;
       }
       subnet_prv = subnet;
    }
    '
    Jean-Pierre.

  3. #3
    Join Date
    Mar 2004
    Posts
    2
    Fantastic thank you.

    Now my next task is actually understanding how that just happened..

    :-)

  4. #4
    Join Date
    Jan 2004
    Location
    Bordeaux, France
    Posts
    320
    sort -t. -n -k1,4 ip.txt|
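    Sort the input file.
    -t. field delimiter = '.'
    -n numeric sort
    -k1,4 the sort key runs from field 1 through field 4 (the whole address); with -n only the leading number of that key is compared numerically

    awk '{ ... }'
    Process each line of the sort output.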

    subnet = $1;
    sub(/\.[^.]*$/,"",subnet)

    Determine the subnet by removing the last '.' and all characters after it from the IP address ($1).

    if (subnet != subnet_prv)
    Test whether the subnet differs from the previous one.

    if (subnet_prv != "") close(subnet_file);
    subnet_file = subnet ".log";
    print $1 > subnet_file;

    Then clause: the subnet is different from the previous one.
    If an output file is already open (for the previous subnet), close it.
    The new output filename is <subnet>.log.
    The IP address is written to the output file, which is created (or truncated if it already exists).

    print $1 >> subnet_file;
    Else clause: the subnet is the same.
    The IP address is appended to the output file.

    subnet_prv = subnet;
    Remember the subnet for the next line.
    Jean-Pierre.
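
The walkthrough above can be tried end to end with a sketch like the one below. It is not the original script: it assumes made-up sample data in hosts.log, works in a scratch directory from mktemp, and swaps the original sort options for one numeric key per octet so that 1.0.0.10 sorts after 1.0.0.2. The awk part follows the same idea, with the close() test written against subnet_prv.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample data, deliberately unsorted.
printf '%s\n' 1.0.1.5 1.0.0.10 9.1.1.1 1.0.0.2 1.0.1.4 1.0.0.1 > hosts.log

# One numeric key per octet; LC_ALL=C avoids locale-dependent ordering.
LC_ALL=C sort -t. -k1,1n -k2,2n -k3,3n -k4,4n hosts.log |
awk '
{
   subnet = $1
   sub(/\.[^.]*$/, "", subnet)      # drop the last octet
   if (subnet != subnet_prv) {
      # New subnet: close the previous output file, if there was one.
      if (subnet_prv != "") close(subnet_file)
      subnet_file = subnet ".log"
   }
   print $1 > subnet_file           # ">" truncates only when the file is first opened
   subnet_prv = subnet
}'

# Show one of the resulting files.
cat 1.0.0.log
```

With this sample, three files should appear in the scratch directory: 1.0.0.log, 1.0.1.log and 9.1.1.log.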

  5. #5
    Join Date
    Dec 2003
    Posts
    6

    Hi All,

    Please check this.....

    ***********************************************************
    # Prerequisite: the file temp must contain all your input.

    for i in ` sed 's/\./#/3' temp | cut -f 1 -d "#" | sort | uniq `
    do
    grep $i temp >> $i.log ;
    done ;

    ***********************************************************


    Best Regards,
    Hanuman.

  6. #6
    Join Date
    Jan 2004
    Location
    Bordeaux, France
    Posts
    320
    Another way to do the work ...

    The advantage of awk is that the input file is read only once; the sed/grep
    solution reads it once to list the subnets and then once more for each subnet.
    Jean-Pierre.
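
To illustrate the single-pass point, here is a minimal sketch (not from the thread) in which awk reads the file once and does not even need sorted input. It assumes made-up data in hosts.log and a scratch directory; the variable names are illustrative only.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up, unsorted sample data.
printf '%s\n' 9.1.1.1 1.0.0.2 1.0.1.4 1.0.0.1 > hosts.log

awk '
{
   subnet = $1
   sub(/\.[^.]*$/, "", subnet)   # drop the last octet
   file = subnet ".log"
   print $1 >> file              # append, because the file is reopened on every line
   close(file)                   # keep at most one output file open at a time
}' hosts.log

cat 1.0.0.log
```

The trade-off is that addresses stay in input order inside each file, and because of ">>" a second run appends to files left over from the first, so old *.log files should be removed beforehand.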

  7. #7
    Join Date
    Jan 2004
    Location
    Bordeaux, France
    Posts
    320
    @hanuman

    There is a small bug in your script ...

    For example, if the input file is:
    1.0.0.1
    1.0.0.2
    2.1.0.0

    The result is:
    1.0.0.log
    1.0.0.1
    1.0.0.2
    2.1.0.0

    2.1.0.log
    2.1.0.0

    This is because the pattern of the grep command is not anchored: 1.0.0 also matches inside 2.1.0.0.

    Code:
    for i in ` sed 's/\./#/3' temp | cut -f 1 -d "#" | sort | uniq `
    do
    grep "^$i" temp >> $i.log ;
    done ;
    You can also end the grep pattern with a dot ("^$i\."), so that subnet 1.0.1 does not also match 1.0.10.x.
    Jean-Pierre.
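
A stricter variant of the loop might look like the sketch below. It rests on assumptions beyond the thread: the sample data in temp is made up, pat is a hypothetical helper variable, and ">" is used instead of ">>" so that reruns do not duplicate lines. The dots in the subnet are escaped because an unescaped "." matches any character in a grep pattern; with "^$i" alone, subnet 1.0.1 would also pick up 1.0.10.7 and 110.1.0.1 from this sample.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample data chosen to trip up a loose pattern.
printf '%s\n' 1.0.1.4 1.0.10.7 110.1.0.1 1.0.1.5 > temp

for i in $(sed 's/\./#/3' temp | cut -f 1 -d "#" | sort | uniq)
do
    # Escape the dots so they match literally (1.0.1 becomes 1\.0\.1).
    pat=$(printf '%s\n' "$i" | sed 's/\./\\./g')
    # Anchor at line start and require a literal dot after the subnet.
    grep "^$pat\\." temp > "$i.log"
done

cat 1.0.1.log
```

The file is still read once per subnet, so the single-pass awk approach remains the cheaper option for large inputs.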
