Quote:
|
Originally Posted by mike_bike_kite
To get the median, you need the length of each sequence. Sort those lengths, then pick the middle value if there is an odd number of rows, or average the two middle values if there is an even number. That's awkward to do in a single line of script, so I wrote a small program for you. I've tried to improve the median part of the code and came up with this:
Code:
#!/bin/sh
# Sample data: one DNA sequence per line
echo "GAAAAGAGGA
ATATTAGGTTTTTAC
TATATTTAACGCGAATGATT" > original_file.dat

# Show the average sequence length (awk starts total at 0)
awk '
{ total = total + length($0) }
END { print "AVG=" total/NR }' original_file.dat

# Write the length of each sequence, sorted numerically, to tmp.dat
awk '{ print length($0) }' original_file.dat | sort -n > tmp.dat

# How many records are in the file, and is that count odd or even?
RECS=`wc -l < tmp.dat`
ODD_NUM=`expr $RECS % 2`
# Row just past the midpoint: RECS/2 + 1 (integer division)
CUTOFF_POINT=`expr $RECS / 2 + 1`

if test $ODD_NUM -eq 0
then
  # Even count: average the two middle lengths (rows CUTOFF_POINT-1 and CUTOFF_POINT)
  head -n $CUTOFF_POINT tmp.dat | tail -n 2 | \
  awk '
  { total = total + $0 }
  END { print "MEDIAN=" total/NR }'
else
  # Odd count: the middle length is row CUTOFF_POINT
  echo "MEDIAN="`head -n $CUTOFF_POINT tmp.dat | tail -n 1`
fi
exit
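If the temporary file isn't wanted, the sorted lengths can also be piped straight into a second awk that keeps them in an array and picks the middle value(s) in its END block. This is a minimal sketch, assuming one sequence per line; `median_length` is a hypothetical function name used only for this example.

```shell
#!/bin/sh
# Sample data: one DNA sequence per line (same sequences as the script above).
printf '%s\n' GAAAAGAGGA ATATTAGGTTTTTAC TATATTTAACGCGAATGATT > original_file.dat

# median_length FILE  (hypothetical helper name for this sketch)
# Prints MEDIAN=<value> for the line lengths in FILE.
median_length() {
  awk '{ print length($0) }' "$1" |
    sort -n |
    awk '
      { len[NR] = $1 }              # sorted lengths, indexed from 1
      END {
        if (NR == 0) { print "MEDIAN=undefined (empty file)"; exit }
        mid = int((NR + 1) / 2)     # middle row if NR is odd, lower middle if even
        if (NR % 2 == 1)
          med = len[mid]
        else
          med = (len[mid] + len[mid + 1]) / 2
        print "MEDIAN=" med
      }'
}

median_length original_file.dat
```

With the three sample sequences (lengths 10, 15 and 20) this prints `MEDIAN=15`; with an even number of lines it averages the two middle lengths instead.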
You still haven't explained what you're doing with the DNA. Is it anything interesting?
|
Yup. I'm working on a DNA assignment that uses the UNIX command line to find the median length.
Mike, I tried the program you modified. The result just shows "MEDIAN=ATATTAGGTTTTTAC". Why does that happen? I expected the answer to be "MEDIAN=15". The average part works. Do you have any better suggestions for solving this problem? Thanks a lot for your advice. Have a nice day.
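The reported output, `MEDIAN=ATATTAGGTTTTTAC`, is exactly the second line of original_file.dat. With three records the odd-count branch runs `head -n 2 | tail -n 1`, so that output is what you would get if the median step read the sequence file itself instead of the sorted lengths in tmp.dat. That is a guess, since the modified script isn't shown. A quick sketch for checking it, assuming the same file names as the script above: rebuild tmp.dat and confirm it holds only numbers before the median step runs.

```shell
#!/bin/sh
# Same sample data as the script above: one DNA sequence per line.
printf '%s\n' GAAAAGAGGA ATATTAGGTTTTTAC TATATTTAACGCGAATGATT > original_file.dat

# Write one length per line, sorted numerically, into tmp.dat.
awk '{ print length($0) }' original_file.dat | sort -n > tmp.dat

# Sanity check: every line of tmp.dat should be a plain integer.
# grep -v selects lines that are NOT all digits; -q only reports whether any exist.
if grep -qv '^[0-9][0-9]*$' tmp.dat
then
  echo "tmp.dat has non-numeric lines; the median step would print a sequence" >&2
else
  echo "tmp.dat looks fine:"
  cat tmp.dat
fi
```

With the sample data, tmp.dat should contain 10, 15 and 20, one per line, and the median step should then print `MEDIAN=15`.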