| |
Welcome to the dBforums forums.
You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!
If you have any problems with the registration process or your account login, please contact contact support.
If you prefer not to see double-underlined words and corresponding ads, place your cursor here for ContentLink opt out.
|
 |
|

06-03-04, 10:56
|
|
Registered User
|
|
Join Date: Jun 2004
Posts: 20
|
|
|
sorting SN Numbers
|
Hello everyone,
I was just wondering if anyone would happen to know how to sort scientific notation in a text file.
I have numbers that are of the form:
1.0000E+00
9.9999E-01
1.0000E+01
1.0030E+01
9.9998E-01
1.0004E+03
6.0000E+00
9.9999E-02
(for anyone who is unclear: 9.999E-2 = 9.999 x 10^(-2) = 0.09999)
So as you can see sorting this is a bit more difficult than just doing a normal sort since there are exponents. so I figured why not just sort the column representing the exponents.
so this is what I tried:
sort +0.6 -0.11 infile.txt
and the output recieved was:
1.0000E+00
6.0000E+00
1.0000E+01
1.0030E+01
1.0004E+03
9.9998E-01
9.9999E-01
9.9999E-02
as you can see, it sorts the +'s and -'s first, and then the numbers after the + or - get put into accending order. Note how the accending order resets once it gets to "-".
The output I want would be:
1.0004E+03
1.0030E+01
1.0000E+01
6.0000E+00
1.0000E+00
9.9998E-01
9.9999E-01
9.9999E-02
where the exponents are ordered numerically correct. (if it was ordered from lowest to highest that would be fine too)
I know there is a sort -n infile.txt which sorts numerically, but it doesnt handle the +'s and -'s
In fact, if I replace the +'s with just spaces the -n does handle the negative values just fine. but this is going to be used on a Large file which has +'s and -'s in other areas so I cant just replace all the +'s with spaces...
Please keep in mind Im showing only one column of many columns that are similar to this one.
If someone could help, it would be greatly appreciated.
Thanks a lot,
nullpointer
|
|

06-03-04, 15:24
|
|
Registered User
|
|
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
|
|
how about :
Code:
nawk '{printf("%f\n", $0)}' file.txt | sort -n | nawk '{printf("%.4E\n", $0)}'
|
|

06-03-04, 16:55
|
|
Registered User
|
|
Join Date: Jun 2004
Posts: 20
|
|
|
sorting
|
wow, thats great. It works perfectly. Thanks a lot.
Is there any type of command so I can run this set of commands on a specific column in a text file?
Ultimatly I have many columns of these numbers, and I want to sort all the rows by a certain column.
Your help is much appreciated.
Thanks
|
|

06-03-04, 17:06
|
|
Registered User
|
|
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
|
|
the solution assumes SINGLE column of input - not 'embedded' numbers in a text file.
To do what I THINK you want might be a bit more difficult, but not by too much
Pls provide a sample file.
|
|

06-03-04, 17:28
|
|
Registered User
|
|
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
|
|
Quote:
|
Originally Posted by nullptr
wow, thats great. It works perfectly. Thanks a lot.
Is there any type of command so I can run this set of commands on a specific column in a text file?
Ultimatly I have many columns of these numbers, and I want to sort all the rows by a certain column.
Your help is much appreciated.
Thanks
|
assuming a sample file like this:
a 1.0000E+00 a
b 9.9999E-01 b
c 1.0000E+01 c
e 1.0030E+01 e
f 9.9998E-01 f
g 1.0004E+03 g
h 6.0000E+00 h
i 9.9999E-02 i
and sorting on the SECOND column - here's the modified solution.
Code:
c=2
nawk -v c=${c}'{$c=sprintf("%f", $c);print}' file | sort -t ' ' -n -k ${c},${c} | nawk -v c=${c} '{$c=sprintf("%.4E", $c);print}'
|
|

06-04-04, 09:35
|
|
Registered User
|
|
Join Date: Jun 2004
Posts: 20
|
|
Quote:
|
Originally Posted by vgersh99
the solution assumes SINGLE column of input - not 'embedded' numbers in a text file.
To do what I THINK you want might be a bit more difficult, but not by too much
Pls provide a sample file.
|
Sorry, I should have been more specific. Ok so lets say I have a file that contains columns with numbers in Scientific Notation:
Col1 Col2 Col3 Col4
1.0000E+00 2.3333E-02 1.4500E-05 2.1233E-02
9.9999E-01 3.6286E+03 1.0023E+03 1.6643E-02
1.0000E+01 9.8754E+00 1.0234E-02 2.2845E-02
1.0030E+01 3.7896E-01 1.9934E+08 2.3443E-02
9.9998E-01 2.6541E-02 1.5386E+05 3.3333E-02
1.3434E+03 1.9999E+04 1.1275E+07 2.3333E-02
6.0000E+00 1.8441E-01 9.9999E-06 2.3333E-02
9.9999E-02 1.8441E+01 1.1111E-01 4.3333E-02
all the labels (ex. Col1, Col2...etc) should be above their corresponding column...
Note: Im not sure if it makes a difference but all columns are not seperated by just 1 space there are some with multiple spaces.
so how would I be able to sort this whole file by Col3?
Thanks so much for the help
|
Last edited by nullptr : 06-04-04 at 09:51.
|

06-04-04, 09:49
|
|
Registered User
|
|
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
|
|
Have you tried the latest post with c=3 ?
Code:
c=3
nawk -v c=${c} '{$c=sprintf("%f", $c);print}' file | sort -t ' ' -n -k ${c},${c} | nawk -v c=${c} '{$c=sprintf("%.4E", $c);print}'
|
|

06-04-04, 10:10
|
|
Registered User
|
|
Join Date: Jun 2004
Posts: 20
|
|
Quote:
|
Originally Posted by vgersh99
Have you tried the latest post with c=3 ?
Code:
c=3
nawk -v c=${c} '{$c=sprintf("%f", $c);print}' file | sort -t ' ' -n -k ${c},${c} | nawk -v c=${c} '{$c=sprintf("%.4E", $c);print}'
|
yeah I tried that. It works for the test file. But when I try it on my actual file which has the same format except, there are more spaces between certain columns and much larger...what this code does is reduce all the spacing between the columns to 1 space. which kinda makes a mess of things. 
|
|

06-04-04, 10:53
|
|
Registered User
|
|
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
|
|
could you post a sample file with some rows with single spaces between columns AND some with multiple spaces, pls!
Also, are all your columns of a certain width? Maybe padded with extra spaces?
|
|

06-04-04, 11:45
|
|
Registered User
|
|
Join Date: Jun 2004
Posts: 20
|
|
sorry of the inconvienience. ok, since all my spaces get compressed to 1 space when I try to show you the spacing..Im going to be using "#" to represent how many spaces there are. (Maximize your window to see this better)
2.2012E+00#1.0000E+00###3.2330E-01###2.7671E-06#2.3437E-02#1.4687E-05
that is one line...
2.0306E-03#9.9993E-01###2.8104E-02##-1.6147E-06#6.4133E-02#1.6365E-05
and that would be the second line I underlined what column I was looking to sort and I made bold the differences.
and now heres a bunch more just so you'd be able to see the sort better...
2.0327E-05#1.7712E+00#1.0000E+00###4.6113E-01###7.4498E-07#5.8594E-03#6.9177E-06
1.4080E-07#4.1622E-01#1.0000E+00###4.0306E-01##-5.3213E-08#9.7656E-04#5.3213E-08
7.5148E-05#2.2012E+00#1.0000E+00###3.2330E-01###2.7671E-06#2.3437E-02#1.4687E-05
1.0866E-04#2.0306E-03#9.9993E-01###2.8104E-02##-1.6147E-06#6.4133E-02#1.6365E-05
1.0730E-04#2.0164E-03#9.9993E-01###2.7961E-02##-1.5490E-06#6.3782E-02#1.6199E-05
1.0777E-04#2.0328E-03#9.9993E-01###2.8131E-02##-1.6138E-06#6.4148E-02#1.6369E-05
5.3249E-05#7.1753E-01#1.0000E+00###3.7768E-01##-2.3946E-07#2.0996E-02#5.9332E-06
1.3083E-05#1.2323E-01#1.0000E+00###2.7064E-01##-2.6606E-08#5.8594E-03#1.6496E-06
I hope this helps, thanks again for the help
The column I want to sort (underlined above) always stays the same length, and never gets shifted from any other column changing, say if a number gets bigger in another column. This will not effect the position of the other columns.
|
Last edited by nullptr : 06-25-04 at 10:26.
|

06-04-04, 19:21
|
|
Registered User
|
|
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
|
|
ok, how about :
Code:
#!/bin/ksh
c=10
nawk -v c=${c} '{a=sprintf("%f", $c);print a, $0}' file | sort -n | cut -d' ' -f2-
|
|

06-07-04, 11:55
|
|
Registered User
|
|
Join Date: Jun 2004
Posts: 20
|
|
|
slight bug
That code is really close.
but there is a small problem...There are just over 2000 of those lines in the file Im sorting. The majority of those lines have a "1.0000E+00" but in the whole file there are roughly 100 that are "9.9999E-01" down to "9.9993E-01".
when I ran your code, it put a bunch of the "1.0000E+00" at the top...then it sorted correctly.
Just to be clear...this is what I saw:
<SO2#>#AURV-UG/M3#####285#m########200###0#2.7567E-02#2.0327E-05#1.7712E+00#1.0000E+00###4.6113E-01###7.4498E-07#5.8594E-03#6.9177E-06
...a bunch like this with the 1.0000E+00
then it jumps to 9.9993E-01 like so:
<AM25>#AURV-UG/M3#####195#m########300###0#1.1366E-01#1.0730E-04#2.0164E-03#9.9993E-01###2.7961E-02##-1.5490E-06#6.3782E-02#1.6199E-05
then it sorts correctly from that point on. Keep in mind not all of the 1.0000E+00 are at the top...there are some below as well when the sorting gets up to that number.
The range of that column is actually 9.9993E-01 to 1.0000E+00
Something I noticed was how you are converting all the numbers in the file to their orginal form then sorting the perticular column...but there are some columns which have an exponent like 3.2145E-23...would this mess up the results in any way?
Thank you very much for all your help
|
|

06-07-04, 12:31
|
|
Registered User
|
|
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
|
|
I think you problem is with the 'first' field: sometimes what's supposed to be a single field contain a space:
<NH3 > AURV-UG/M3
and sometimes it does not:
<AMTO> AURV-UG/M3
Having an extra spaces 'screws up' the numbering of the following columns that nawk relies on.
You can either make your input file 'well-behaved' making sure that the first field is "contigious" [no embeded space]
OR
do it "on the fly" and then recreate the original format.
Try the code below for the second alternative:
Code:
#!/bin/ksh
c=10
sed -e 's/ >/#>/g' file | nawk -v c=${c} '{a=sprintf("%f", $c);print a " " $0}' | sort -n | cut -d' ' -f2- | sed -e 's/#>/ >/g'
As for the 'sorting' question... make sure you're sorting on the correct column [c=10] as you would expect.
|
|

06-07-04, 12:55
|
|
Registered User
|
|
Join Date: Jun 2004
Posts: 20
|
|
It works! Thats perfect!
There was no way I'd be able to do that on my own.
You've been a great help, thank you very much! 
|
|

06-07-04, 12:59
|
|
Registered User
|
|
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
|
|
I'm glad it worked out for you - you're welcome!
vlad
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|