
Thread: awk + if ?

  1. #1
    Join Date
    Feb 2004
    Posts
    52

    Unanswered: awk + if ?

    Hi all.

    I have been trying to find a solution to this problem, but
    it is beyond my current abilities. If you could at least say
    whether it is possible, I would be very thankful.
    I wonder whether awk can be used for this. Someone suggested
    perl, but I prefer to stick with awk.

    Here is how the raw data is laid out. There are N columns (let's say
    8) like this (I added the labels A-H just to make things clear):

    A B C D E F G H
    -13.1057 3.2476 -9.4736 1 41 251 6 3.7889
    -8.2136 -1.2834 -7.6682 1 44 251 8 3.9791
    -6.5500 -3.0177 -8.1940 1 99 251 9 4.5311
    -1.0685 0.0483 -1.40046 1 144 251 3 3.3905
    -17.3394 3.3012 -5.5183 2 18 251 4 3.2886
    -13.7365 3.9043 -7.7776 2 41 251 6 3.3796
    -18.7815 0.6845 -4.5467 3 18 251 4 3.9160
    -13.0219 3.4592 -7.7978 3 41 251 1 3.5433
    -8.0730 -4.3466 -6.3149 3 99 251 9 3.8411
    -12.4067 -1.4076 -10.1356 3 144 251 3 3.2097
    -12.9859 3.0015 -8.7701 4 41 251 1 3.3707
    -8.0811 -2.0228 -4.0008 4 44 251 7 4.8075
    -8.5788 -3.8426 -7.8051 4 99 251 9 4.0662
    -11.9980 -1.7409 -9.9496 4 144 251 3 3.2608
    ... . .. .. .. ...

    The relevant columns are the fourth (D) and the last (H).
    What I am looking for is some way to print only the
    line that contains the lowest value of H for each value
    of D. In this case,

    D H
    -1.0685 0.0483 -1.40046 1 144 251 3 3.3905
    -17.3394 3.3012 -5.5183 2 18 251 4 3.2886
    -12.4067 -1.4076 -10.1356 3 144 251 3 3.2097
    -11.9980 -1.7409 -9.9496 4 144 251 3 3.2608

    This would be much easier if the number of lines
    for each value of D were known, but this is not the case.

    I have solved many other problems with your help
    and with previous posts here. At this point I know that I need
    two "if"s: one to filter on column D and another to find the lowest
    value of H. I am stuck on the second conditional.

    Thanks for any suggestion,

    Serg

  2. #2
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    nawk -f serg.awk myInput.txt

    here's serg.awk:

    Code:
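    BEGIN {
       FLD_d=4   # column D: the grouping key
       FLD_h=8   # column H: the value to minimize
    }
    
    # First line seen for this D: store it and move on.
    !($FLD_d in arr) { arr[$FLD_d] = $0; next }
    
    # D already seen: keep the current line if its H is not larger than the stored one.
    {
      split(arr[$FLD_d], tmp, FS);
      if ( $FLD_h <= tmp[FLD_h] )
        arr[$FLD_d] = $0;
    }
    
    # Print one line per D; the order of "for (i in arr)" is unspecified.
    END {
       for (i in arr)
          print arr[i];
    }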
    vlad
    +-----------------------+
    | #include <disclaimer.h> |
    +-----------------------+
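
The script above re-splits the stored line every time it compares H values. A possible variant, shown here only as a sketch, keeps the running minimum in a second array instead. The names `sample_rows`, `min_h_per_d`, `min` and `best` are made up for this illustration, plain `awk` is assumed in place of `nawk`, and the rows are the sample data from the first post. Because the comparison is a strict `<`, a tie keeps the first row seen, whereas the `<=` in the script above keeps the last one.

```shell
#!/bin/sh
# Sample rows copied from the first post (columns A-H, no header line).
sample_rows() {
  printf '%s\n' \
    '-13.1057 3.2476 -9.4736 1 41 251 6 3.7889' \
    '-8.2136 -1.2834 -7.6682 1 44 251 8 3.9791' \
    '-6.5500 -3.0177 -8.1940 1 99 251 9 4.5311' \
    '-1.0685 0.0483 -1.40046 1 144 251 3 3.3905' \
    '-17.3394 3.3012 -5.5183 2 18 251 4 3.2886' \
    '-13.7365 3.9043 -7.7776 2 41 251 6 3.3796' \
    '-18.7815 0.6845 -4.5467 3 18 251 4 3.9160' \
    '-13.0219 3.4592 -7.7978 3 41 251 1 3.5433' \
    '-8.0730 -4.3466 -6.3149 3 99 251 9 3.8411' \
    '-12.4067 -1.4076 -10.1356 3 144 251 3 3.2097' \
    '-12.9859 3.0015 -8.7701 4 41 251 1 3.3707' \
    '-8.0811 -2.0228 -4.0008 4 44 251 7 4.8075' \
    '-8.5788 -3.8426 -7.8051 4 99 251 9 4.0662' \
    '-11.9980 -1.7409 -9.9496 4 144 251 3 3.2608'
}

# Hypothetical helper: print, for each D (field 4), the line with the lowest H (field 8).
min_h_per_d() {
  awk '
    # Store the row when its D value is new or its H value is strictly lower.
    !($4 in best) || $8 + 0 < min[$4] {
      min[$4]  = $8 + 0   # lowest H seen so far for this D
      best[$4] = $0       # the full line holding that minimum
    }
    # The traversal order of "for (d in best)" is unspecified.
    END { for (d in best) print best[d] }
  '
}

sample_rows | min_h_per_d
```

As with the original script, the output order is not guaranteed, so a sort on column D is still needed afterwards.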

  3. #3
    Join Date
    Feb 2004
    Posts
    52
    Fantastic, Vlad!
    It works very well. I will need to run a 'sort' command to rearrange
    the lines because the fourth column (D) must be in ascending
    order. This is no big deal, considering what your script is capable of.

    Thanks a lot !

    Serg.

    --------------------------------------------------------------
    -11.9980 -1.7409 -9.9496 4 144 251 3 3.2608
    -1.0685 0.0483 -1.40046 1 144 251 3 3.3905
    -17.3394 3.3012 -5.5183 2 18 251 4 3.2886
    -12.4067 -1.4076 -10.1356 3 144 251 3 3.2097

    sort -k4,4n output > answer

    -1.0685 0.0483 -1.40046 1 144 251 3 3.3905
    -17.3394 3.3012 -5.5183 2 18 251 4 3.2886
    -12.4067 -1.4076 -10.1356 3 144 251 3 3.2097
    -11.9980 -1.7409 -9.9496 4 144 251 3 3.2608
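
The awk-then-sort approach used in this thread can also be turned around: sort by D and then by H, and let awk print only the first line it meets for each D. The sketch below assumes a `sort` that accepts the `-k` key syntax and forces the C locale so that the decimal point is read the same way everywhere. `sample_rows`, `lowest_h_sorted` and `seen` are illustrative names, and the rows are the first six sample lines from the first post.

```shell
#!/bin/sh
# First six sample rows from the first post (D = 1 and D = 2 only).
sample_rows() {
  printf '%s\n' \
    '-13.1057 3.2476 -9.4736 1 41 251 6 3.7889' \
    '-8.2136 -1.2834 -7.6682 1 44 251 8 3.9791' \
    '-6.5500 -3.0177 -8.1940 1 99 251 9 4.5311' \
    '-1.0685 0.0483 -1.40046 1 144 251 3 3.3905' \
    '-17.3394 3.3012 -5.5183 2 18 251 4 3.2886' \
    '-13.7365 3.9043 -7.7776 2 41 251 6 3.3796'
}

# Hypothetical helper: order by D (field 4), then by H (field 8), both numerically,
# then keep only the first line for each D, which is the one with the lowest H.
lowest_h_sorted() {
  LC_ALL=C sort -k4,4n -k8,8n | awk '!seen[$4]++'
}

sample_rows | lowest_h_sorted
```

This version needs no second pass, since the result already comes out in ascending order of D. The trade-off is that the whole input is sorted, which may matter for very large files.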
