  #1 (permalink)  
Old 03-08-04, 07:52
mattystorey001 mattystorey001 is offline
Registered User
 
Join Date: Mar 2004
Posts: 2
Pattern Recognition and Extracting into files

Hi,

First post to these forums.

Basically, I need to take a file, sort it, and then extract the different patterns in it.

The data is IP addresses, and I want to split it up by subnet.

For example, sample data in hosts.log (made up):

1.0.0.1
1.0.0.2
1.0.0.3
1.0.1.4
1.0.1.5
1.0.1.6
1.0.2.7
1.0.2.8
1.0.2.9
(etc.etc.....)
9.1.1.1
9.1.2.2
(etc.etc.....)

Then split into

1.0.0.log ::
1.0.0.1
1.0.0.2
1.0.0.3

1.0.1.log ::
1.0.1.4
1.0.1.5
1.0.1.6

Is this possible, or is it a lot of scripting? I don't mind how the output files are named, but naming them by subnet would be great.

I have had a look at awk, sed, etc., and have not been able to find anything that matches what I want to do.

My apologies if this is a little OTT! :-)
  #2 (permalink)  
Old 03-08-04, 08:19
aigles aigles is offline
Registered User
 
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319
Try this :

Code:
sort -t. -n -k1,4 ip.txt |
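awk '
{
   # The subnet is the address with its last octet removed
   subnet = $1;
   sub(/\.[^.]*$/,"",subnet)
   if (subnet != subnet_prv) {
      # New subnet: close the previous output file, if there is one
      if (subnet_prv != "") close(subnet_file);
      subnet_file = subnet ".log";
      print $1  > subnet_file;
   } else {
      print $1 >>  subnet_file;
   }
   subnet_prv = subnet;
}
'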
__________________
Jean-Pierre.
  #3 (permalink)  
Old 03-08-04, 08:58
mattystorey001 mattystorey001 is offline
Registered User
 
Join Date: Mar 2004
Posts: 2
Fantastic thank you.

Now my next task is actually understanding how that just happened..

:-)
  #4 (permalink)  
Old 03-08-04, 09:29
aigles aigles is offline
Registered User
 
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319
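sort -t. -n -k1,4 ip.txt |
Sorts the input file.
-t. sets the field delimiter to '.'
-n requests a numeric sort
-k1,4 defines a single sort key running from field 1 through field 4
Note that -n only evaluates the leading number of that key, so the result is not strictly numeric per octet (1.0.10.4 can sort before 1.0.2.7). For a strict order, use one key per field: -k1,1n -k2,2n -k3,3n -k4,4n.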
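
Since each octet of an address is a number in its own right, one way to check the ordering is to sort a small sample with one numeric key per field. This is only a sketch on made-up addresses; the file names sample_ips.txt and sorted_ips.txt are hypothetical and do not come from the thread.

```shell
# Work in a scratch directory so no existing files are touched
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Made-up addresses chosen so that text order and numeric order differ
printf '%s\n' 10.0.0.1 9.1.2.2 1.0.10.4 1.0.2.7 1.0.2.10 1.0.2.9 > sample_ips.txt

# One numeric key per octet: field 1, then 2, then 3, then 4
LC_ALL=C sort -t. -k1,1n -k2,2n -k3,3n -k4,4n sample_ips.txt > sorted_ips.txt
cat sorted_ips.txt
```

With these keys, 1.0.2.10 comes after 1.0.2.9 and 10.0.0.1 comes after 9.1.2.2, which a plain text sort would not give.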

awk '{ ... }'
Processes each line of the sort output.

subnet = $1;
sub(/\.[^.]*$/,"",subnet)

Determines the subnet by removing the last '.' and everything after it from the IP address ($1).

if (subnet != subnet_prv)
Tests whether the subnet differs from that of the previous line.

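if (subnet_prv != "") close(subnet_file);
subnet_file = subnet ".log";
print $1 > subnet_file;

Then clause: the subnet differs from the previous one.
If an output file is already open (for the previous subnet), it is closed. The test must use subnet_prv, the variable assigned at the end of the block; a misspelled name such as prv_subnet is never assigned and stays empty, so it cannot tell whether a file is open.
The new output filename is the subnet followed by ".log".
The IP address is written to this output file, which is created at that point.

print $1 >> subnet_file;
Else clause: the subnet is unchanged.
The IP address is appended to the output file.

subnet_prv = subnet;
Remembers the subnet for comparison with the next line.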
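
For comparison, the same split can be sketched more compactly by letting awk cut the fields itself. This is a hedged variant, not the script explained above: it assumes well-formed dotted-quad lines, uses the hosts.log name from the first post with made-up contents, and appends and closes on every line so that the input does not need to be sorted.

```shell
# Work in a scratch directory so no existing files are touched
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Made-up, deliberately unsorted sample input
printf '%s\n' 1.0.0.1 1.0.1.4 9.1.1.1 1.0.0.2 1.0.1.5 > hosts.log

awk -F. 'NF == 4 {
    out = $1 "." $2 "." $3 ".log"   # file name built from the first three octets
    print >> out                    # append, so lines of one subnet need not be adjacent
    close(out)                      # never hold more than one output file open
}' hosts.log
```

The trade-off is one open and close per input line, which is slower on large files, and because the output is appended, old .log files should be removed before a second run.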
__________________
Jean-Pierre.
  #5 (permalink)  
Old 03-11-04, 10:08
hanuman hanuman is offline
Registered User
 
Join Date: Dec 2003
Posts: 6
Hi All,

Please check this:

***********************************************************
# Prerequisite: the file temp must contain all of your input

for i in ` sed 's/\./#/3' temp | cut -f 1 -d "#" | sort | uniq `
do
grep $i temp >> $i.log ;
done ;

***********************************************************
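
To see what the loop iterates over, the pipeline inside the backquotes can be run on its own. A small sketch with made-up addresses; temp is the input name used above, and subnets.txt is a hypothetical file name added only to capture the result.

```shell
# Work in a scratch directory so no existing files are touched
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Made-up sample input
printf '%s\n' 1.0.0.1 1.0.0.2 2.1.0.0 > temp

# s/\./#/3 replaces only the third dot, giving e.g. 1.0.0#1;
# cut then keeps the part before the '#', and sort | uniq removes repeats
sed 's/\./#/3' temp | cut -f 1 -d '#' | sort | uniq > subnets.txt
cat subnets.txt
```

For this input the list holds one line per subnet, 1.0.0 and 2.1.0, and the loop runs grep once for each of them.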


Best Regards,
Hanuman.
  #6 (permalink)  
Old 03-11-04, 16:26
aigles aigles is offline
Registered User
 
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319
Another way to do the work ...

The advantage of awk is that the input file is read only once. With the
sed/grep solution, the file is read once to build the list of subnets and then once more for each subnet.
__________________
Jean-Pierre.
  #7 (permalink)  
Old 03-12-04, 02:04
aigles aigles is offline
Registered User
 
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319
@hanuman

There is a small bug in your script.

For example, if the input file is:
1.0.0.1
1.0.0.2
2.1.0.0

The result is :
1.0.0.log
1.0.0.1
1.0.0.2
2.1.0.0

2.1.0.log
2.1.0.0

This is because the pattern of the grep command is not anchored: the pattern 1.0.0 also matches inside 2.1.0.0.

Code:
for i in ` sed 's/\./#/3' temp | cut -f 1 -d "#" | sort | uniq `
do
grep "^$i" temp >> $i.log ;
done ;
You can also append the dot that follows the subnet, e.g. "^$i\.", so that subnet 1.0.1 does not also match 1.0.10.5.
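
Even an anchored pattern treats each '.' in the subnet as "any character", so ^1.0.1 still matches 1.0.10.5 and 120.1.7.7. One possible way around this, sketched here with made-up addresses, is to swap grep for a plain string comparison in awk; this is a substitute technique, not the script posted above.

```shell
# Work in a scratch directory so no existing files are touched
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Made-up input chosen to trip up regex-based matching
printf '%s\n' 1.0.1.4 1.0.10.5 120.1.7.7 2.1.0.1 > temp

for i in $(sed 's/\./#/3' temp | cut -f 1 -d '#' | sort -u)
do
    # index() compares plain strings, so the dots are literal;
    # the appended "." stops 1.0.1 from matching 1.0.10.x.
    # '>' rewrites the file, so a second run does not duplicate lines.
    awk -v p="$i." 'index($0, p) == 1' temp > "$i.log"
done
```

Each .log file then holds only the addresses that start with exactly that subnet followed by a dot.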
__________________
Jean-Pierre.