To get the median you need the length of each sequence: sort the lengths, then pick the middle value if there is an odd number of rows, or average the two middle values if there is an even number. You're unlikely to manage that in a single line of script, so I wrote a small program for you. I've tried to improve the median part of the code and came up with this:
Code:
#!/bin/sh
# Sample data: one DNA sequence per line
echo "GAAAAGAGGA
ATATTAGGTTTTTAC
TATATTTAACGCGAATGATT" > original_file.dat

# show average length
awk '
BEGIN { total=0 }
{ total = total + length($0) }
END { print "AVG=" total/NR }' original_file.dat

# get the lengths and sort them numerically
awk '{ print length($0) }' original_file.dat | sort -n > tmp.dat

# how many recs in file
RECS=`wc -l < tmp.dat`
ODD_NUM=`expr $RECS % 2`
# first line past the midpoint: RECS/2 + 1
CUTOFF_POINT=`expr $RECS / 2 + 1`

if test $ODD_NUM -eq 0
then
  # even count: average the two middle values
  head -n $CUTOFF_POINT tmp.dat | \
  tail -n 2 | \
  awk '
  BEGIN { total=0 }
  { total = total + $0 }
  END { print "MEDIAN=" total/NR }'
else
  # odd count: the middle value is the last of the first CUTOFF_POINT lines
  echo "MEDIAN="`head -n $CUTOFF_POINT tmp.dat | tail -n 1`
fi

# clean up the temporary file
rm -f tmp.dat
exit
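If you'd rather not use a temporary file, the same sort-then-pick-the-middle approach can be done by letting a second awk store the sorted lengths in an array and pick the middle one(s) at the end. This is only a sketch: `len_stats` is a name I made up, and it assumes one sequence per line on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print AVG= and MEDIAN= of the line lengths read
# from stdin, using sort -n plus one awk pass (no temporary file).
len_stats() {
  awk '{ print length($0) }' | sort -n | awk '
  { len[NR] = $1; total += $1 }
  END {
    if (NR == 0) { print "no records"; exit 1 }
    print "AVG=" total / NR
    if (NR % 2) {
      # odd count: the single middle value
      median = len[(NR + 1) / 2]
    } else {
      # even count: mean of the two middle values
      median = (len[NR / 2] + len[NR / 2 + 1]) / 2
    }
    print "MEDIAN=" median
  }'
}

# Same sample sequences as above (lengths 10, 15 and 20)
printf '%s\n' GAAAAGAGGA ATATTAGGTTTTTAC TATATTTAACGCGAATGATT | len_stats
```

On the sample data that should print `AVG=15` and `MEDIAN=15`. The array holds one number per sequence, so memory grows with the number of rows, which shouldn't matter unless the file is huge.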
You still haven't explained what you're doing with the DNA. Is it anything interesting?