Page 1 of 2 12 LastLast
Results 1 to 15 of 18
  1. #1
    Join Date
    Jun 2004
    Posts
    20

    Unanswered: sorting SN Numbers

    Hello everyone,
    I was just wondering if anyone would happen to know how to sort scientific notation in a text file.
    I have numbers that are of the form:
    1.0000E+00
    9.9999E-01
    1.0000E+01
    1.0030E+01
    9.9998E-01
    1.0004E+03
    6.0000E+00
    9.9999E-02

    (for anyone who is unclear: 9.999E-2 = 9.999 x 10^(-2) = 0.09999)
    So as you can see sorting this is a bit more difficult than just doing a normal sort since there are exponents. so I figured why not just sort the column representing the exponents.
    so this is what I tried:
    sort +0.6 -0.11 infile.txt
    and the output recieved was:
    1.0000E+00
    6.0000E+00
    1.0000E+01
    1.0030E+01
    1.0004E+03
    9.9998E-01
    9.9999E-01
    9.9999E-02
    as you can see, it sorts the +'s and -'s first, and then the numbers after the + or - get put into accending order. Note how the accending order resets once it gets to "-".

    The output I want would be:
    1.0004E+03
    1.0030E+01
    1.0000E+01
    6.0000E+00
    1.0000E+00
    9.9998E-01
    9.9999E-01
    9.9999E-02
    where the exponents are ordered numerically correct. (if it was ordered from lowest to highest that would be fine too)
    I know there is a sort -n infile.txt which sorts numerically, but it doesnt handle the +'s and -'s
    In fact, if I replace the +'s with just spaces the -n does handle the negative values just fine. but this is going to be used on a Large file which has +'s and -'s in other areas so I cant just replace all the +'s with spaces...
    Please keep in mind Im showing only one column of many columns that are similar to this one.

    If someone could help, it would be greatly appreciated.
    Thanks a lot,
    nullpointer

  2. #2
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    how about:

    Code:
    nawk '{printf("%f\n", $0)}' file.txt | sort -n | nawk '{printf("%.4E\n", $0)}'

  3. #3
    Join Date
    Jun 2004
    Posts
    20

    sorting

    wow, thats great. It works perfectly. Thanks a lot.
    Is there any type of command so I can run this set of commands on a specific column in a text file?
    Ultimatly I have many columns of these numbers, and I want to sort all the rows by a certain column.

    Your help is much appreciated.
    Thanks

  4. #4
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    the solution assumes SINGLE column of input - not 'embedded' numbers in a text file.
    To do what I THINK you want might be a bit more difficult, but not by too much

    Pls provide a sample file.

  5. #5
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    Quote Originally Posted by nullptr
    wow, thats great. It works perfectly. Thanks a lot.
    Is there any type of command so I can run this set of commands on a specific column in a text file?
    Ultimatly I have many columns of these numbers, and I want to sort all the rows by a certain column.

    Your help is much appreciated.
    Thanks
    assuming a sample file like this:
    a 1.0000E+00 a
    b 9.9999E-01 b
    c 1.0000E+01 c
    e 1.0030E+01 e
    f 9.9998E-01 f
    g 1.0004E+03 g
    h 6.0000E+00 h
    i 9.9999E-02 i

    and sorting on the SECOND column - here's the modified solution.

    Code:
    c=2
    nawk -v c=${c}'{$c=sprintf("%f", $c);print}' file | sort -t ' ' -n -k ${c},${c} | nawk -v c=${c} '{$c=sprintf("%.4E", $c);print}'

  6. #6
    Join Date
    Jun 2004
    Posts
    20
    Quote Originally Posted by vgersh99
    the solution assumes SINGLE column of input - not 'embedded' numbers in a text file.
    To do what I THINK you want might be a bit more difficult, but not by too much

    Pls provide a sample file.
    Sorry, I should have been more specific. Ok so lets say I have a file that contains columns with numbers in Scientific Notation:

    Col1 Col2 Col3 Col4
    1.0000E+00 2.3333E-02 1.4500E-05 2.1233E-02
    9.9999E-01 3.6286E+03 1.0023E+03 1.6643E-02
    1.0000E+01 9.8754E+00 1.0234E-02 2.2845E-02
    1.0030E+01 3.7896E-01 1.9934E+08 2.3443E-02
    9.9998E-01 2.6541E-02 1.5386E+05 3.3333E-02
    1.3434E+03 1.9999E+04 1.1275E+07 2.3333E-02
    6.0000E+00 1.8441E-01 9.9999E-06 2.3333E-02
    9.9999E-02 1.8441E+01 1.1111E-01 4.3333E-02

    all the labels (ex. Col1, Col2...etc) should be above their corresponding column...
    Note: Im not sure if it makes a difference but all columns are not seperated by just 1 space there are some with multiple spaces.

    so how would I be able to sort this whole file by Col3?
    Thanks so much for the help
    Last edited by nullptr; 06-04-04 at 09:51.

  7. #7
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    Have you tried the latest post with c=3 ?

    Code:
    c=3
    nawk -v c=${c} '{$c=sprintf("%f", $c);print}' file | sort -t ' ' -n -k ${c},${c} | nawk -v c=${c} '{$c=sprintf("%.4E", $c);print}'

  8. #8
    Join Date
    Jun 2004
    Posts
    20
    Quote Originally Posted by vgersh99
    Have you tried the latest post with c=3 ?

    Code:
    c=3
    nawk -v c=${c} '{$c=sprintf("%f", $c);print}' file | sort -t ' ' -n -k ${c},${c} | nawk -v c=${c} '{$c=sprintf("%.4E", $c);print}'
    yeah I tried that. It works for the test file. But when I try it on my actual file which has the same format except, there are more spaces between certain columns and much larger...what this code does is reduce all the spacing between the columns to 1 space. which kinda makes a mess of things.

  9. #9
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    could you post a sample file with some rows with single spaces between columns AND some with multiple spaces, pls!

    Also, are all your columns of a certain width? Maybe padded with extra spaces?

  10. #10
    Join Date
    Jun 2004
    Posts
    20
    sorry of the inconvienience. ok, since all my spaces get compressed to 1 space when I try to show you the spacing..Im going to be using "#" to represent how many spaces there are. (Maximize your window to see this better)
    2.2012E+00#1.0000E+00###3.2330E-01###2.7671E-06#2.3437E-02#1.4687E-05
    that is one line...
    2.0306E-03#9.9993E-01###2.8104E-02##-1.6147E-06#6.4133E-02#1.6365E-05
    and that would be the second line I underlined what column I was looking to sort and I made bold the differences.
    and now heres a bunch more just so you'd be able to see the sort better...
    2.0327E-05#1.7712E+00#1.0000E+00###4.6113E-01###7.4498E-07#5.8594E-03#6.9177E-06
    1.4080E-07#4.1622E-01#1.0000E+00###4.0306E-01##-5.3213E-08#9.7656E-04#5.3213E-08
    7.5148E-05#2.2012E+00#1.0000E+00###3.2330E-01###2.7671E-06#2.3437E-02#1.4687E-05
    1.0866E-04#2.0306E-03#9.9993E-01###2.8104E-02##-1.6147E-06#6.4133E-02#1.6365E-05
    1.0730E-04#2.0164E-03#9.9993E-01###2.7961E-02##-1.5490E-06#6.3782E-02#1.6199E-05
    1.0777E-04#2.0328E-03#9.9993E-01###2.8131E-02##-1.6138E-06#6.4148E-02#1.6369E-05
    5.3249E-05#7.1753E-01#1.0000E+00###3.7768E-01##-2.3946E-07#2.0996E-02#5.9332E-06
    1.3083E-05#1.2323E-01#1.0000E+00###2.7064E-01##-2.6606E-08#5.8594E-03#1.6496E-06

    I hope this helps, thanks again for the help
    The column I want to sort (underlined above) always stays the same length, and never gets shifted from any other column changing, say if a number gets bigger in another column. This will not effect the position of the other columns.
    Last edited by nullptr; 06-25-04 at 10:26.

  11. #11
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    ok, how about:

    Code:
    #!/bin/ksh
    c=10
    nawk -v c=${c} '{a=sprintf("%f", $c);print a, $0}' file |  sort -n | cut -d' ' -f2-

  12. #12
    Join Date
    Jun 2004
    Posts
    20

    slight bug

    That code is really close.

    but there is a small problem...There are just over 2000 of those lines in the file Im sorting. The majority of those lines have a "1.0000E+00" but in the whole file there are roughly 100 that are "9.9999E-01" down to "9.9993E-01".
    when I ran your code, it put a bunch of the "1.0000E+00" at the top...then it sorted correctly.
    Just to be clear...this is what I saw:
    <SO2#>#AURV-UG/M3#####285#m########200###0#2.7567E-02#2.0327E-05#1.7712E+00#1.0000E+00###4.6113E-01###7.4498E-07#5.8594E-03#6.9177E-06
    ...a bunch like this with the 1.0000E+00
    then it jumps to 9.9993E-01 like so:
    <AM25>#AURV-UG/M3#####195#m########300###0#1.1366E-01#1.0730E-04#2.0164E-03#9.9993E-01###2.7961E-02##-1.5490E-06#6.3782E-02#1.6199E-05
    then it sorts correctly from that point on. Keep in mind not all of the 1.0000E+00 are at the top...there are some below as well when the sorting gets up to that number.
    The range of that column is actually 9.9993E-01 to 1.0000E+00

    Something I noticed was how you are converting all the numbers in the file to their orginal form then sorting the perticular column...but there are some columns which have an exponent like 3.2145E-23...would this mess up the results in any way?

    Thank you very much for all your help

  13. #13
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    I think you problem is with the 'first' field: sometimes what's supposed to be a single field contain a space:
    <NH3 > AURV-UG/M3

    and sometimes it does not:
    <AMTO> AURV-UG/M3

    Having an extra spaces 'screws up' the numbering of the following columns that nawk relies on.

    You can either make your input file 'well-behaved' making sure that the first field is "contigious" [no embeded space]
    OR
    do it "on the fly" and then recreate the original format.

    Try the code below for the second alternative:
    Code:
    #!/bin/ksh
    c=10
    sed -e 's/ >/#>/g' file | nawk -v c=${c} '{a=sprintf("%f", $c);print a " " $0}' | sort -n | cut -d' ' -f2- | sed -e 's/#>/ >/g'
    As for the 'sorting' question... make sure you're sorting on the correct column [c=10] as you would expect.

  14. #14
    Join Date
    Jun 2004
    Posts
    20

    Talking

    It works! Thats perfect!
    There was no way I'd be able to do that on my own.
    You've been a great help, thank you very much!

  15. #15
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    I'm glad it worked out for you - you're welcome!

    vlad

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •