Welcome to the dBforums forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact contact support.

If you prefer not to see double-underlined words and corresponding ads, place your cursor
here for ContentLink opt out.

Go Back  dBforums > Data Access, Manipulation & Batch Languages > Unix Shell Scripts > sorting SN Numbers

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 06-03-04, 10:56
nullptr nullptr is offline
Registered User
 
Join Date: Jun 2004
Posts: 20
sorting SN Numbers

Hello everyone,
I was just wondering if anyone would happen to know how to sort scientific notation in a text file.
I have numbers that are of the form:
1.0000E+00
9.9999E-01
1.0000E+01
1.0030E+01
9.9998E-01
1.0004E+03
6.0000E+00
9.9999E-02

(for anyone who is unclear: 9.999E-2 = 9.999 x 10^(-2) = 0.09999)
So as you can see sorting this is a bit more difficult than just doing a normal sort since there are exponents. so I figured why not just sort the column representing the exponents.
so this is what I tried:
sort +0.6 -0.11 infile.txt
and the output recieved was:
1.0000E+00
6.0000E+00
1.0000E+01
1.0030E+01
1.0004E+03
9.9998E-01
9.9999E-01
9.9999E-02
as you can see, it sorts the +'s and -'s first, and then the numbers after the + or - get put into accending order. Note how the accending order resets once it gets to "-".

The output I want would be:
1.0004E+03
1.0030E+01
1.0000E+01
6.0000E+00
1.0000E+00
9.9998E-01
9.9999E-01
9.9999E-02
where the exponents are ordered numerically correct. (if it was ordered from lowest to highest that would be fine too)
I know there is a sort -n infile.txt which sorts numerically, but it doesnt handle the +'s and -'s
In fact, if I replace the +'s with just spaces the -n does handle the negative values just fine. but this is going to be used on a Large file which has +'s and -'s in other areas so I cant just replace all the +'s with spaces...
Please keep in mind Im showing only one column of many columns that are similar to this one.

If someone could help, it would be greatly appreciated.
Thanks a lot,
nullpointer
Reply With Quote
  #2 (permalink)  
Old 06-03-04, 15:24
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
how about:

Code:
nawk '{printf("%f\n", $0)}' file.txt | sort -n | nawk '{printf("%.4E\n", $0)}'
Reply With Quote
  #3 (permalink)  
Old 06-03-04, 16:55
nullptr nullptr is offline
Registered User
 
Join Date: Jun 2004
Posts: 20
sorting

wow, thats great. It works perfectly. Thanks a lot.
Is there any type of command so I can run this set of commands on a specific column in a text file?
Ultimatly I have many columns of these numbers, and I want to sort all the rows by a certain column.

Your help is much appreciated.
Thanks
Reply With Quote
  #4 (permalink)  
Old 06-03-04, 17:06
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
the solution assumes SINGLE column of input - not 'embedded' numbers in a text file.
To do what I THINK you want might be a bit more difficult, but not by too much

Pls provide a sample file.
Reply With Quote
  #5 (permalink)  
Old 06-03-04, 17:28
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
Quote:
Originally Posted by nullptr
wow, thats great. It works perfectly. Thanks a lot.
Is there any type of command so I can run this set of commands on a specific column in a text file?
Ultimatly I have many columns of these numbers, and I want to sort all the rows by a certain column.

Your help is much appreciated.
Thanks

assuming a sample file like this:
a 1.0000E+00 a
b 9.9999E-01 b
c 1.0000E+01 c
e 1.0030E+01 e
f 9.9998E-01 f
g 1.0004E+03 g
h 6.0000E+00 h
i 9.9999E-02 i

and sorting on the SECOND column - here's the modified solution.

Code:
c=2 nawk -v c=${c}'{$c=sprintf("%f", $c);print}' file | sort -t ' ' -n -k ${c},${c} | nawk -v c=${c} '{$c=sprintf("%.4E", $c);print}'
Reply With Quote
  #6 (permalink)  
Old 06-04-04, 09:35
nullptr nullptr is offline
Registered User
 
Join Date: Jun 2004
Posts: 20
Quote:
Originally Posted by vgersh99
the solution assumes SINGLE column of input - not 'embedded' numbers in a text file.
To do what I THINK you want might be a bit more difficult, but not by too much

Pls provide a sample file.

Sorry, I should have been more specific. Ok so lets say I have a file that contains columns with numbers in Scientific Notation:

Col1 Col2 Col3 Col4
1.0000E+00 2.3333E-02 1.4500E-05 2.1233E-02
9.9999E-01 3.6286E+03 1.0023E+03 1.6643E-02
1.0000E+01 9.8754E+00 1.0234E-02 2.2845E-02
1.0030E+01 3.7896E-01 1.9934E+08 2.3443E-02
9.9998E-01 2.6541E-02 1.5386E+05 3.3333E-02
1.3434E+03 1.9999E+04 1.1275E+07 2.3333E-02
6.0000E+00 1.8441E-01 9.9999E-06 2.3333E-02
9.9999E-02 1.8441E+01 1.1111E-01 4.3333E-02

all the labels (ex. Col1, Col2...etc) should be above their corresponding column...
Note: Im not sure if it makes a difference but all columns are not seperated by just 1 space there are some with multiple spaces.

so how would I be able to sort this whole file by Col3?
Thanks so much for the help

Last edited by nullptr : 06-04-04 at 09:51.
Reply With Quote
  #7 (permalink)  
Old 06-04-04, 09:49
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
Have you tried the latest post with c=3 ?

Code:
c=3 nawk -v c=${c} '{$c=sprintf("%f", $c);print}' file | sort -t ' ' -n -k ${c},${c} | nawk -v c=${c} '{$c=sprintf("%.4E", $c);print}'
Reply With Quote
  #8 (permalink)  
Old 06-04-04, 10:10
nullptr nullptr is offline
Registered User
 
Join Date: Jun 2004
Posts: 20
Quote:
Originally Posted by vgersh99
Have you tried the latest post with c=3 ?

Code:
c=3 nawk -v c=${c} '{$c=sprintf("%f", $c);print}' file | sort -t ' ' -n -k ${c},${c} | nawk -v c=${c} '{$c=sprintf("%.4E", $c);print}'

yeah I tried that. It works for the test file. But when I try it on my actual file which has the same format except, there are more spaces between certain columns and much larger...what this code does is reduce all the spacing between the columns to 1 space. which kinda makes a mess of things.
Reply With Quote
  #9 (permalink)  
Old 06-04-04, 10:53
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
could you post a sample file with some rows with single spaces between columns AND some with multiple spaces, pls!

Also, are all your columns of a certain width? Maybe padded with extra spaces?
Reply With Quote
  #10 (permalink)  
Old 06-04-04, 11:45
nullptr nullptr is offline
Registered User
 
Join Date: Jun 2004
Posts: 20
sorry of the inconvienience. ok, since all my spaces get compressed to 1 space when I try to show you the spacing..Im going to be using "#" to represent how many spaces there are. (Maximize your window to see this better)
2.2012E+00#1.0000E+00###3.2330E-01###2.7671E-06#2.3437E-02#1.4687E-05
that is one line...
2.0306E-03#9.9993E-01###2.8104E-02##-1.6147E-06#6.4133E-02#1.6365E-05
and that would be the second line I underlined what column I was looking to sort and I made bold the differences.
and now heres a bunch more just so you'd be able to see the sort better...
2.0327E-05#1.7712E+00#1.0000E+00###4.6113E-01###7.4498E-07#5.8594E-03#6.9177E-06
1.4080E-07#4.1622E-01#1.0000E+00###4.0306E-01##-5.3213E-08#9.7656E-04#5.3213E-08
7.5148E-05#2.2012E+00#1.0000E+00###3.2330E-01###2.7671E-06#2.3437E-02#1.4687E-05
1.0866E-04#2.0306E-03#9.9993E-01###2.8104E-02##-1.6147E-06#6.4133E-02#1.6365E-05
1.0730E-04#2.0164E-03#9.9993E-01###2.7961E-02##-1.5490E-06#6.3782E-02#1.6199E-05
1.0777E-04#2.0328E-03#9.9993E-01###2.8131E-02##-1.6138E-06#6.4148E-02#1.6369E-05
5.3249E-05#7.1753E-01#1.0000E+00###3.7768E-01##-2.3946E-07#2.0996E-02#5.9332E-06
1.3083E-05#1.2323E-01#1.0000E+00###2.7064E-01##-2.6606E-08#5.8594E-03#1.6496E-06

I hope this helps, thanks again for the help
The column I want to sort (underlined above) always stays the same length, and never gets shifted from any other column changing, say if a number gets bigger in another column. This will not effect the position of the other columns.

Last edited by nullptr : 06-25-04 at 10:26.
Reply With Quote
  #11 (permalink)  
Old 06-04-04, 19:21
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
ok, how about:

Code:
#!/bin/ksh c=10 nawk -v c=${c} '{a=sprintf("%f", $c);print a, $0}' file | sort -n | cut -d' ' -f2-
Reply With Quote
  #12 (permalink)  
Old 06-07-04, 11:55
nullptr nullptr is offline
Registered User
 
Join Date: Jun 2004
Posts: 20
slight bug

That code is really close.

but there is a small problem...There are just over 2000 of those lines in the file Im sorting. The majority of those lines have a "1.0000E+00" but in the whole file there are roughly 100 that are "9.9999E-01" down to "9.9993E-01".
when I ran your code, it put a bunch of the "1.0000E+00" at the top...then it sorted correctly.
Just to be clear...this is what I saw:
<SO2#>#AURV-UG/M3#####285#m########200###0#2.7567E-02#2.0327E-05#1.7712E+00#1.0000E+00###4.6113E-01###7.4498E-07#5.8594E-03#6.9177E-06
...a bunch like this with the 1.0000E+00
then it jumps to 9.9993E-01 like so:
<AM25>#AURV-UG/M3#####195#m########300###0#1.1366E-01#1.0730E-04#2.0164E-03#9.9993E-01###2.7961E-02##-1.5490E-06#6.3782E-02#1.6199E-05
then it sorts correctly from that point on. Keep in mind not all of the 1.0000E+00 are at the top...there are some below as well when the sorting gets up to that number.
The range of that column is actually 9.9993E-01 to 1.0000E+00

Something I noticed was how you are converting all the numbers in the file to their orginal form then sorting the perticular column...but there are some columns which have an exponent like 3.2145E-23...would this mess up the results in any way?

Thank you very much for all your help
Reply With Quote
  #13 (permalink)  
Old 06-07-04, 12:31
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
I think you problem is with the 'first' field: sometimes what's supposed to be a single field contain a space:
<NH3 > AURV-UG/M3

and sometimes it does not:
<AMTO> AURV-UG/M3

Having an extra spaces 'screws up' the numbering of the following columns that nawk relies on.

You can either make your input file 'well-behaved' making sure that the first field is "contigious" [no embeded space]
OR
do it "on the fly" and then recreate the original format.

Try the code below for the second alternative:
Code:
#!/bin/ksh c=10 sed -e 's/ >/#>/g' file | nawk -v c=${c} '{a=sprintf("%f", $c);print a " " $0}' | sort -n | cut -d' ' -f2- | sed -e 's/#>/ >/g'

As for the 'sorting' question... make sure you're sorting on the correct column [c=10] as you would expect.
Reply With Quote
  #14 (permalink)  
Old 06-07-04, 12:55
nullptr nullptr is offline
Registered User
 
Join Date: Jun 2004
Posts: 20
Talking

It works! Thats perfect!
There was no way I'd be able to do that on my own.
You've been a great help, thank you very much!
Reply With Quote
  #15 (permalink)  
Old 06-07-04, 12:59
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 324
I'm glad it worked out for you - you're welcome!

vlad
Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On