#1
12-13-04, 10:00
Serg
Registered User
Join Date: Feb 2004
Posts: 52
awk + if ?

Hi all.

I have been trying to solve this problem, but it is beyond
my current abilities. If you could at least say whether it is
possible, I would be very thankful. I wonder whether awk can
be used; someone suggested perl, but I prefer to stick with awk.

Here is how the raw data is laid out. There are N columns (let's say
8), like this (I add the labels A-H just to make things clear):

A B C D E F G H
-13.1057 3.2476 -9.4736 1 41 251 6 3.7889
-8.2136 -1.2834 -7.6682 1 44 251 8 3.9791
-6.5500 -3.0177 -8.1940 1 99 251 9 4.5311
-1.0685 0.0483 -1.40046 1 144 251 3 3.3905
-17.3394 3.3012 -5.5183 2 18 251 4 3.2886
-13.7365 3.9043 -7.7776 2 41 251 6 3.3796
-18.7815 0.6845 -4.5467 3 18 251 4 3.9160
-13.0219 3.4592 -7.7978 3 41 251 1 3.5433
-8.0730 -4.3466 -6.3149 3 99 251 9 3.8411
-12.4067 -1.4076 -10.1356 3 144 251 3 3.2097
-12.9859 3.0015 -8.7701 4 41 251 1 3.3707
-8.0811 -2.0228 -4.0008 4 44 251 7 4.8075
-8.5788 -3.8426 -7.8051 4 99 251 9 4.0662
-11.9980 -1.7409 -9.9496 4 144 251 3 3.2608
... . .. .. .. ...

The relevant columns are the fourth (D) and the last (H).
What I am looking for is some way to print only the
line that contains the lowest value of H for each
value of D. In this case:

D H
-1.0685 0.0483 -1.40046 1 144 251 3 3.3905
-17.3394 3.3012 -5.5183 2 18 251 4 3.2886
-12.4067 -1.4076 -10.1356 3 144 251 3 3.2097
-11.9980 -1.7409 -9.9496 4 144 251 3 3.2608

This would be much easier if the number of lines for each
value of D were known, but that is not the case.

I have solved many other problems with your help and with
previous posts here. At this point I know that I need two
"if"s: one to filter on column D and another to find the lowest
value of H. I am stuck on the second conditional.

Thanks for any suggestion,

Serg
#2
12-13-04, 10:15
vgersh99
Registered User
Join Date: Apr 2004
Location: Boston, MA
Posts: 325
nawk -f serg.awk myInput.txt

here's serg.awk:

Code:
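BEGIN {
   FLD_d=4   # column D: the group key
   FLD_h=8   # column H: the value to minimize
}

# First line seen for this value of D: store it and move on.
!($FLD_d in arr) { arr[$FLD_d] = $0; next }

# Otherwise compare H with the H of the stored line and keep the lower one.
{
  split(arr[$FLD_d], tmp, FS);
  if ( $FLD_h <= tmp[FLD_h] )
    arr[$FLD_d] = $0;
}

# Print one line per D; "for (i in arr)" visits keys in unspecified order.
END {
   for (i in arr)
      print arr[i];
}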
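
A variant of the same idea, offered only as a minimal sketch: rather than re-splitting the stored line on every comparison, it keeps the running minimum of H in its own array. The function name `min_h_per_group` and the array names `min`, `line` and `order` are illustrative and not part of serg.awk. It assumes whitespace-separated input with D in column 4 and H in column 8, and no header line.

```shell
#!/bin/sh
# Sketch: for each value of column 4 (D), print the line with the lowest
# value in column 8 (H). All names here are illustrative.
min_h_per_group() {
  awk '
    {
      d = $4          # group key (column D)
      h = $8 + 0      # adding 0 forces a numeric comparison of column H
      if (!(d in min)) {
        order[++n] = d              # remember the order groups first appear
        min[d] = h; line[d] = $0
      } else if (h < min[d]) {
        min[d] = h; line[d] = $0    # strictly lower H: replace stored line
      }
    }
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  ' "$@"
}

# Usage with a few rows taken from the sample data.
min_h_per_group <<'EOF'
-13.1057 3.2476 -9.4736 1 41 251 6 3.7889
-1.0685 0.0483 -1.40046 1 144 251 3 3.3905
-17.3394 3.3012 -5.5183 2 18 251 4 3.2886
-13.7365 3.9043 -7.7776 2 41 251 6 3.3796
EOF
```

Two differences from the script above are worth noting. The strict `<` keeps the first line seen when two lines tie on H, whereas `<=` keeps the last. The groups are printed in the order they first appear in the input, so if the input is already grouped by ascending D, as in the sample, the output comes out in that order too.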
__________________
vlad
+-----------------------+
| #include <disclaimer.h> |
+-----------------------+
#3
12-13-04, 10:54
Serg
Registered User
Join Date: Feb 2004
Posts: 52
Fantastic, Vlad!
It works very well. I will need to run a 'sort' command to rearrange
the lines, because the fourth column D must be in ascending
order. This is no big deal, considering what your script is capable of.

Thanks a lot !

Serg.

--------------------------------------------------------------
-11.9980 -1.7409 -9.9496 4 144 251 3 3.2608
-1.0685 0.0483 -1.40046 1 144 251 3 3.3905
-17.3394 3.3012 -5.5183 2 18 251 4 3.2886
-12.4067 -1.4076 -10.1356 3 144 251 3 3.2097

sort -k4,4n output > answer

-1.0685 0.0483 -1.40046 1 144 251 3 3.3905
-17.3394 3.3012 -5.5183 2 18 251 4 3.2886
-12.4067 -1.4076 -10.1356 3 144 251 3 3.2097
-11.9980 -1.7409 -9.9496 4 144 251 3 3.2608
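
The sort step can also be chained directly onto the awk output instead of going through an intermediate file. A minimal sketch, assuming a POSIX `sort` and whitespace-separated fields; the wrapper name `sort_by_d` is illustrative only:

```shell
#!/bin/sh
# Sketch: sort whitespace-separated lines numerically on the fourth field (D).
# -k4,4 restricts the sort key to field 4; the trailing n compares it as a number.
sort_by_d() {
  sort -k4,4n "$@"
}

# Usage: the unsorted lines shown above, read from standard input.
sort_by_d <<'EOF'
-11.9980 -1.7409 -9.9496 4 144 251 3 3.2608
-1.0685 0.0483 -1.40046 1 144 251 3 3.3905
-17.3394 3.3012 -5.5183 2 18 251 4 3.2886
-12.4067 -1.4076 -10.1356 3 144 251 3 3.2097
EOF
```

With the file names used in this thread, the whole job would then be a single pipeline along the lines of `nawk -f serg.awk myInput.txt | sort -k4,4n > answer`.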