01-30-09, 01:44 #1 Registered User
 Join Date: Jan 2009
 Posts: 4
Unanswered: Urgent: Unix Median and Average/Mean Problem
I urgently need help from a UNIX expert.
Median Problem:
GAAAAGAGGA
ATATTAGGTTTTTAC
TATATTTAACGCGAATGATT
Of the three sequences above, ATATTAGGTTTTTAC is the median, and its length is 15. How can I use a Unix script to automatically show that the length of the median is 15? What command line should I type?
Mean problem:
GAAAAGAGGA
ATATTAGGTTTTTAC
TATATTTAACGCGAATGATT
The average (mean) length of the sequences above is 15. How can I use a Unix script to automatically calculate that the mean is 15? What command line should I type? My senior advised me to use an awk command line, but I don't know how to write it. Any command line is fine, as long as it solves the problem. Thank you very much for any advice.

01-30-09, 05:11 #2 vaguely human
 Join Date: Jun 2007
 Location: London
 Posts: 2,527
Quote: My senior advised me to use awk command line, but I don't know how to type it out. No matter what command line used, as long as can solve this problem
Average is quite easy but median is a bit more difficult. I had to remind myself what the median was by looking it up on the web - I found about 5 forums with your question on! I think I'm reasonably close with the following:
Code:
#!/bin/sh

echo "GAAAAGAGGA
ATATTAGGTTTTTAC
TATATTTAACGCGAATGATT" > original_file.dat

# show average
cat original_file.dat | \
    awk ' BEGIN { total=0 }
          { total = total + length($0) }
          END { print "AVG=" total/NR }'

# get lengths and sort them
cat original_file.dat | \
    awk '{ print length($0) }' | \
    sort -n \
    > tmp.dat

# how many recs in file
RECS=`cat tmp.dat | wc -l`
RECS=`expr $RECS / 2`

if test `expr $RECS % 2` -eq 0
then
    # if record length is even then average two middle values
    RECS=`expr $RECS + 1`
    cat tmp.dat | head -$RECS | tail -2 |
        awk ' BEGIN { total=0 }
              { total = total + $0 }
              END { print "MEDIAN=" total/NR }'
else
    # else just use middle value
    echo "MEDIAN="`cat tmp.dat | head -$RECS | tail -1`
fi

exit
Mike

02-02-09, 20:22 #3 Registered User
Originally Posted by mike_bike_kite

02-03-09, 04:54 #4 vaguely human
Originally Posted by patrick chia
It's been 5 days since I gave you a complete solution to your problem. Something tells me I wouldn't even have got this note of thanks if it weren't for the fact that you can't get the program to work!
Originally Posted by patrick chia
Why can't it work? It works perfectly well for me when I try it with your data and with more complex examples. I assume you realise it's a program and not something you type out line by line.
Originally Posted by patrick chia
Unix generally supplies an error message that indicates the problem. I suggest you give me this. Is the tmp.dat file being created, and does it contain data? What does the $RECS variable contain after it is set? Are you passing your data through the program?
Originally Posted by patrick chia
No - the program given works perfectly well as it stands.
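The checks asked about above (is tmp.dat created, does it contain data, what does $RECS hold) could be run by hand roughly as follows. This is only a sketch under assumptions: it rebuilds tmp.dat from the sample data with the same length-and-sort step the program uses, and it reads the count with `wc -l < tmp.dat`, which is a small variation on the program's `cat tmp.dat | wc -l`.

```shell
#!/bin/sh
# Sample input from the thread, then the same length/sort step the program uses.
printf '%s\n' GAAAAGAGGA ATATTAGGTTTTTAC TATATTTAACGCGAATGATT > original_file.dat
awk '{ print length($0) }' original_file.dat | sort -n > tmp.dat

# 1. Was tmp.dat created, and is it non-empty?
#    (test -s is true only for an existing file with size > 0)
if test -s tmp.dat
then
    echo "tmp.dat exists and has data:"
    cat tmp.dat
else
    echo "tmp.dat is missing or empty"
fi

# 2. What does RECS hold once it is set?
RECS=`wc -l < tmp.dat`
echo "RECS=$RECS"
```

Assuming the program itself was saved in a file such as tmp.sh, running it as `sh -x tmp.sh` makes the shell print each command as it executes, which usually shows where things go wrong.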

02-03-09, 20:03 #5 Registered User
Originally Posted by mike_bike_kite
I tried it again the way you showed me, and it works now. But the median comes out as 10 instead of 15. Do you know what the problem is? The average is calculated correctly as 15.
Mike, do you have any suggestion for making the median come out as 15 instead of 10? I hope you can help me think of a more advanced command line that I can use to find the median and average of another, much larger file next time. Thank you very much for your help.

02-04-09, 05:28 #6 vaguely human
To get the median you need to get the length of each sequence, sort the lengths, then pick the middle value if there is an odd number of rows, or average the two middle values if the number is even. You're unlikely to be able to do this in a single line of script, hence I produced a small program for you. I've tried to improve the median part of the code and came up with this:
Code:
#!/bin/sh

echo "GAAAAGAGGA
ATATTAGGTTTTTAC
TATATTTAACGCGAATGATT" > original_file.dat

# show average
cat original_file.dat | \
    awk ' BEGIN { total=0 }
          { total = total + length($0) }
          END { print "AVG=" total/NR }'

# get lengths and sort them
cat original_file.dat | \
    awk '{ print length($0) }' | \
    sort -n \
    > tmp.dat

# how many recs in file
RECS=`cat tmp.dat | wc -l`
ODD_NUM=`expr $RECS % 2`
CUTOFF_POINT=`expr $RECS / 2`
CUTOFF_POINT=`expr $CUTOFF_POINT + 1`

if test $ODD_NUM -eq 0
then
    cat tmp.dat | \
        head -$CUTOFF_POINT | \
        tail -2 | \
        awk ' BEGIN { total=0 }
              { total = total + $0 }
              END { print "MEDIAN=" total/NR }'
else
    echo "MEDIAN="`cat tmp.dat | head -$CUTOFF_POINT | tail -1`
fi

exit
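For a larger file, the same steps (take the lengths, sort them numerically, then pick the middle value or average the two middle values) can also be sketched as two small shell functions. This is a minimal sketch, not the program from the thread: the names mean_len and median_len are hypothetical, and the median step holds all the sorted lengths in an awk array, so it assumes they fit in memory.

```shell
#!/bin/sh
# mean_len and median_len are hypothetical helper names, not from the thread.

# Mean line length: sum the length of every line, divide by the line count.
mean_len() {
    awk '{ total += length($0) }
         END { if (NR > 0) { print total / NR } }' "$1"
}

# Median line length: print each length, sort numerically, then take the
# middle value (odd count) or the average of the two middle values (even count).
median_len() {
    awk '{ print length($0) }' "$1" | sort -n | awk '
        { len[NR] = $1 }
        END {
            if (NR == 0) { exit }
            if (NR % 2) {
                print len[(NR + 1) / 2]
            } else {
                print (len[NR / 2] + len[NR / 2 + 1]) / 2
            }
        }'
}

# Sample data from the thread.
printf '%s\n' GAAAAGAGGA ATATTAGGTTTTTAC TATATTTAACGCGAATGATT > original_file.dat
echo "AVG=`mean_len original_file.dat`"
echo "MEDIAN=`median_len original_file.dat`"
```

With the three sample sequences (lengths 10, 15 and 20) this prints AVG=15 and MEDIAN=15. Any other file with one sequence per line can be passed as the argument instead.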

02-04-09, 19:55 #7 Registered User
Originally Posted by mike_bike_kite
Mike, I have tried the program you modified. The result just shows "MEDIAN=ATATTAGGTTTTTAC". Why does that happen? I expected the answer to be shown as "MEDIAN=15". The average part of the program works. Do you have any suggestion for solving this problem? Thank you very much for your advice. Have a nice day.

02-05-09, 04:16 #8 vaguely human
Originally Posted by patrick chia
I just copied the program above and pasted it into a file called tmp.sh
I then ran it and it produced the following:
Code:
# sh tmp.sh
AVG=15
MEDIAN=15
Code:
# sh tmp.sh
AVG=12
MEDIAN=12.5
Originally Posted by patrick chia
I suggest you simply paste your codes into Excel and use the functions that Excel provides to do what you need. Alternatively, you can ask your senior to show you how the above program is run. If you need to know Unix shell scripting for your course then I can recommend The Unix Programming Environment.