1. Registered User
Join Date
Jun 2004
Posts
20

Hello everyone,
I was just wondering if anyone would happen to know how to sort scientific notation in a text file.
I have numbers that are of the form:
1.0000E+00
9.9999E-01
1.0000E+01
1.0030E+01
9.9998E-01
1.0004E+03
6.0000E+00
9.9999E-02

(for anyone who is unclear: 9.999E-2 = 9.999 x 10^(-2) = 0.09999)
So as you can see sorting this is a bit more difficult than just doing a normal sort since there are exponents. so I figured why not just sort the column representing the exponents.
so this is what I tried:
sort +0.6 -0.11 infile.txt
and the output recieved was:
1.0000E+00
6.0000E+00
1.0000E+01
1.0030E+01
1.0004E+03
9.9998E-01
9.9999E-01
9.9999E-02
as you can see, it sorts the +'s and -'s first, and then the numbers after the + or - get put into accending order. Note how the accending order resets once it gets to "-".

The output I want would be:
1.0004E+03
1.0030E+01
1.0000E+01
6.0000E+00
1.0000E+00
9.9998E-01
9.9999E-01
9.9999E-02
where the exponents are ordered numerically correct. (if it was ordered from lowest to highest that would be fine too)
I know there is a sort -n infile.txt which sorts numerically, but it doesnt handle the +'s and -'s
In fact, if I replace the +'s with just spaces the -n does handle the negative values just fine. but this is going to be used on a Large file which has +'s and -'s in other areas so I cant just replace all the +'s with spaces...
Please keep in mind Im showing only one column of many columns that are similar to this one.

If someone could help, it would be greatly appreciated.
Thanks a lot,
nullpointer

2. Registered User
Join Date
Apr 2004
Location
Boston, MA
Posts
325

Code:
`nawk '{printf("%f\n", \$0)}' file.txt | sort -n | nawk '{printf("%.4E\n", \$0)}'`

3. Registered User
Join Date
Jun 2004
Posts
20

## sorting

wow, thats great. It works perfectly. Thanks a lot.
Is there any type of command so I can run this set of commands on a specific column in a text file?
Ultimatly I have many columns of these numbers, and I want to sort all the rows by a certain column.

Thanks

4. Registered User
Join Date
Apr 2004
Location
Boston, MA
Posts
325
the solution assumes SINGLE column of input - not 'embedded' numbers in a text file.
To do what I THINK you want might be a bit more difficult, but not by too much

Pls provide a sample file.

5. Registered User
Join Date
Apr 2004
Location
Boston, MA
Posts
325
Originally Posted by nullptr
wow, thats great. It works perfectly. Thanks a lot.
Is there any type of command so I can run this set of commands on a specific column in a text file?
Ultimatly I have many columns of these numbers, and I want to sort all the rows by a certain column.

Thanks
assuming a sample file like this:
a 1.0000E+00 a
b 9.9999E-01 b
c 1.0000E+01 c
e 1.0030E+01 e
f 9.9998E-01 f
g 1.0004E+03 g
h 6.0000E+00 h
i 9.9999E-02 i

and sorting on the SECOND column - here's the modified solution.

Code:
```c=2
nawk -v c=\${c}'{\$c=sprintf("%f", \$c);print}' file | sort -t ' ' -n -k \${c},\${c} | nawk -v c=\${c} '{\$c=sprintf("%.4E", \$c);print}'```

6. Registered User
Join Date
Jun 2004
Posts
20
Originally Posted by vgersh99
the solution assumes SINGLE column of input - not 'embedded' numbers in a text file.
To do what I THINK you want might be a bit more difficult, but not by too much

Pls provide a sample file.
Sorry, I should have been more specific. Ok so lets say I have a file that contains columns with numbers in Scientific Notation:

Col1 Col2 Col3 Col4
1.0000E+00 2.3333E-02 1.4500E-05 2.1233E-02
9.9999E-01 3.6286E+03 1.0023E+03 1.6643E-02
1.0000E+01 9.8754E+00 1.0234E-02 2.2845E-02
1.0030E+01 3.7896E-01 1.9934E+08 2.3443E-02
9.9998E-01 2.6541E-02 1.5386E+05 3.3333E-02
1.3434E+03 1.9999E+04 1.1275E+07 2.3333E-02
6.0000E+00 1.8441E-01 9.9999E-06 2.3333E-02
9.9999E-02 1.8441E+01 1.1111E-01 4.3333E-02

all the labels (ex. Col1, Col2...etc) should be above their corresponding column...
Note: Im not sure if it makes a difference but all columns are not seperated by just 1 space there are some with multiple spaces.

so how would I be able to sort this whole file by Col3?
Thanks so much for the help
Last edited by nullptr; 06-04-04 at 08:51.

7. Registered User
Join Date
Apr 2004
Location
Boston, MA
Posts
325
Have you tried the latest post with c=3 ?

Code:
```c=3
nawk -v c=\${c} '{\$c=sprintf("%f", \$c);print}' file | sort -t ' ' -n -k \${c},\${c} | nawk -v c=\${c} '{\$c=sprintf("%.4E", \$c);print}'```

8. Registered User
Join Date
Jun 2004
Posts
20
Originally Posted by vgersh99
Have you tried the latest post with c=3 ?

Code:
```c=3
nawk -v c=\${c} '{\$c=sprintf("%f", \$c);print}' file | sort -t ' ' -n -k \${c},\${c} | nawk -v c=\${c} '{\$c=sprintf("%.4E", \$c);print}'```
yeah I tried that. It works for the test file. But when I try it on my actual file which has the same format except, there are more spaces between certain columns and much larger...what this code does is reduce all the spacing between the columns to 1 space. which kinda makes a mess of things.

9. Registered User
Join Date
Apr 2004
Location
Boston, MA
Posts
325
could you post a sample file with some rows with single spaces between columns AND some with multiple spaces, pls!

Also, are all your columns of a certain width? Maybe padded with extra spaces?

10. Registered User
Join Date
Jun 2004
Posts
20
sorry of the inconvienience. ok, since all my spaces get compressed to 1 space when I try to show you the spacing..Im going to be using "#" to represent how many spaces there are. (Maximize your window to see this better)
2.2012E+00#1.0000E+00###3.2330E-01###2.7671E-06#2.3437E-02#1.4687E-05
that is one line...
2.0306E-03#9.9993E-01###2.8104E-02##-1.6147E-06#6.4133E-02#1.6365E-05
and that would be the second line I underlined what column I was looking to sort and I made bold the differences.
and now heres a bunch more just so you'd be able to see the sort better...
2.0327E-05#1.7712E+00#1.0000E+00###4.6113E-01###7.4498E-07#5.8594E-03#6.9177E-06
1.4080E-07#4.1622E-01#1.0000E+00###4.0306E-01##-5.3213E-08#9.7656E-04#5.3213E-08
7.5148E-05#2.2012E+00#1.0000E+00###3.2330E-01###2.7671E-06#2.3437E-02#1.4687E-05
1.0866E-04#2.0306E-03#9.9993E-01###2.8104E-02##-1.6147E-06#6.4133E-02#1.6365E-05
1.0730E-04#2.0164E-03#9.9993E-01###2.7961E-02##-1.5490E-06#6.3782E-02#1.6199E-05
1.0777E-04#2.0328E-03#9.9993E-01###2.8131E-02##-1.6138E-06#6.4148E-02#1.6369E-05
5.3249E-05#7.1753E-01#1.0000E+00###3.7768E-01##-2.3946E-07#2.0996E-02#5.9332E-06
1.3083E-05#1.2323E-01#1.0000E+00###2.7064E-01##-2.6606E-08#5.8594E-03#1.6496E-06

I hope this helps, thanks again for the help
The column I want to sort (underlined above) always stays the same length, and never gets shifted from any other column changing, say if a number gets bigger in another column. This will not effect the position of the other columns.
Last edited by nullptr; 06-25-04 at 09:26.

11. Registered User
Join Date
Apr 2004
Location
Boston, MA
Posts
325

Code:
```#!/bin/ksh
c=10
nawk -v c=\${c} '{a=sprintf("%f", \$c);print a, \$0}' file |  sort -n | cut -d' ' -f2-```

12. Registered User
Join Date
Jun 2004
Posts
20

## slight bug

That code is really close.

but there is a small problem...There are just over 2000 of those lines in the file Im sorting. The majority of those lines have a "1.0000E+00" but in the whole file there are roughly 100 that are "9.9999E-01" down to "9.9993E-01".
when I ran your code, it put a bunch of the "1.0000E+00" at the top...then it sorted correctly.
Just to be clear...this is what I saw:
<SO2#>#AURV-UG/M3#####285#m########200###0#2.7567E-02#2.0327E-05#1.7712E+00#1.0000E+00###4.6113E-01###7.4498E-07#5.8594E-03#6.9177E-06
...a bunch like this with the 1.0000E+00
then it jumps to 9.9993E-01 like so:
<AM25>#AURV-UG/M3#####195#m########300###0#1.1366E-01#1.0730E-04#2.0164E-03#9.9993E-01###2.7961E-02##-1.5490E-06#6.3782E-02#1.6199E-05
then it sorts correctly from that point on. Keep in mind not all of the 1.0000E+00 are at the top...there are some below as well when the sorting gets up to that number.
The range of that column is actually 9.9993E-01 to 1.0000E+00

Something I noticed was how you are converting all the numbers in the file to their orginal form then sorting the perticular column...but there are some columns which have an exponent like 3.2145E-23...would this mess up the results in any way?

Thank you very much for all your help

13. Registered User
Join Date
Apr 2004
Location
Boston, MA
Posts
325
I think you problem is with the 'first' field: sometimes what's supposed to be a single field contain a space:
<NH3 > AURV-UG/M3

and sometimes it does not:
<AMTO> AURV-UG/M3

Having an extra spaces 'screws up' the numbering of the following columns that nawk relies on.

You can either make your input file 'well-behaved' making sure that the first field is "contigious" [no embeded space]
OR
do it "on the fly" and then recreate the original format.

Try the code below for the second alternative:
Code:
```#!/bin/ksh
c=10
sed -e 's/ >/#>/g' file | nawk -v c=\${c} '{a=sprintf("%f", \$c);print a " " \$0}' | sort -n | cut -d' ' -f2- | sed -e 's/#>/ >/g'```
As for the 'sorting' question... make sure you're sorting on the correct column [c=10] as you would expect.

14. Registered User
Join Date
Jun 2004
Posts
20
It works! Thats perfect!
There was no way I'd be able to do that on my own.
You've been a great help, thank you very much!

15. Registered User
Join Date
Apr 2004
Location
Boston, MA
Posts
325
I'm glad it worked out for you - you're welcome!