  1. #1
    Join Date
    Apr 2004
    Posts
    2

    how to count a specific letter or word in a text file

    I am new to shell scripting and this is killing me.

    Let's say I have a text file that reads:

    peter and paul went to the park and peter fell over paul and then they went home.

    Now, what could I use to count a specific word such as "the", which would return 3, or a specific letter such as "a", whether it is upper- or lower-case?

    Also, how could I count the number of sentences? Define a sentence as a string of words ending in . ! ? : or ;

    I have tried using grep and wc, but they just return the overall number of words on the lines the specific word appears on.

    Anyway, if you can help I would be very grateful, and if I work out the answer before you, I will post the reply myself.

    Thanks.

  2. #2
    Join Date
    Jan 2004
    Location
    Bordeaux, France
    Posts
    320
    Try the following three awk scripts:


    Code:
    #!/usr/bin/awk -f
    #Usage: count_word word=<word_to_count> input_file(s)
    #Assume a word is delimited by non-alphabetic characters
    NR==1 { re_word = "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)" }
    { count += gsub(re_word,"_") }
    END { print count }
    Code:
    #!/usr/bin/awk -f
    #Usage: count_char char=<char_to_count> input_file(s)
    NR==1 { char =substr(char,1,1) }
    { count += gsub(char,"_") }
    END { print count }
    Code:
    #!/usr/bin/awk -f
    #Usage: count_sentances input_file(s)
    { count += gsub(/.([.!?:;]+|$)/,"_") }
    END { print count }
    Jean-Pierre.

  3. #3
    Join Date
    Apr 2004
    Posts
    2
    Thanks a lot for the reply, but there are a few things I do not understand.

    I am not used to the awk command, so how do I go about implementing the scripts in my script?

    And also, where do I declare the word I am looking for, and where do I declare the text file I want to search?

    If you do not mind, could you give an example so I can work out how to implement the script in my Bourne shell script?

    Thanks for your time and help.

  4. #4
    Join Date
    Feb 2004
    Posts
    5

    Re: how to count a specific letter or word in a text file

    You can use grep -c:

    Code:
    sad@nezumi:~$ grep -c the /usr/share/doc/mozilla/release-1.5.html 
    3
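To see the difference between counting lines and counting occurrences, here is a small sketch using the sample sentence from the first post. It assumes a grep that supports -o (GNU and BSD grep do, but POSIX does not require it); sample.txt is just an illustrative file name.

```shell
#!/bin/sh
# Sample sentence from the question, written to an illustrative file
printf '%s\n' 'peter and paul went to the park and peter fell over paul and then they went home.' > sample.txt

# grep -c counts LINES that contain "the": 1 here, since the text is one line
grep -c 'the' sample.txt

# grep -o prints every match on its own line, so wc -l counts occurrences:
# 3 here, because the "the" inside "then" and "they" is matched too
grep -o 'the' sample.txt | wc -l

# -w keeps whole words only and -i ignores case: 1 here
grep -o -w -i 'the' sample.txt | wc -l
```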

  5. #5
    Join Date
    Jan 2004
    Location
    Bordeaux, France
    Posts
    320
    Create the three script files: count_word, count_char and count_sentances.
    Make them executable with chmod +x.

    In your script:
    Code:
    # Count word 'the' in input_file
    
    the_count=`count_word word="the" input_file`
    
    # Count char 'a' in input_file
    
    a_count=`count_char char='a' input_file`
    
    # Count sentences in input_file
    
    sentance_count=`count_sentances input_file`

    nezumi, grep -c counts lines, not words or sentences.
    Jean-Pierre.
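If you would rather not install three separate executable files, the same awk programs can also be embedded in the Bourne shell script itself. A minimal sketch under that assumption; the function names count_word_in, count_char_in and count_sentences_in and the file sample.txt are hypothetical. Values are passed with awk -v, which sets them before the BEGIN rule runs, and 'count + 0' makes each program print 0 instead of an empty line when nothing matches.

```shell
#!/bin/sh
# Hypothetical wrapper: count whole-word occurrences of WORD
# Usage: count_word_in WORD FILE...
count_word_in() {
    word=$1; shift
    awk -v word="$word" '
        BEGIN { re_word = "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)" }
        { count += gsub(re_word, "_") }
        END { print count + 0 }' "$@"
}

# Hypothetical wrapper: count occurrences of a single character (case-sensitive)
# Usage: count_char_in CHAR FILE...
count_char_in() {
    char=$1; shift
    awk -v char="$char" '
        BEGIN { char = substr(char, 1, 1) }
        { count += gsub(char, "_") }
        END { print count + 0 }' "$@"
}

# Hypothetical wrapper: count sentences ending in . ! ? : ; or end of line
# Usage: count_sentences_in FILE...
count_sentences_in() {
    awk '{ count += gsub(/.([.!?:;]+|$)/, "_") } END { print count + 0 }' "$@"
}

printf '%s\n' 'peter and paul went to the park and peter fell over paul and then they went home.' > sample.txt

the_count=`count_word_in the sample.txt`
a_count=`count_char_in a sample.txt`
sentence_count=`count_sentences_in sample.txt`
echo "the: $the_count  a: $a_count  sentences: $sentence_count"
# prints: the: 1  a: 6  sentences: 1
```

The word count is 1 rather than 3 because only whole words are counted, so "then" and "they" do not match.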

  6. #6
    Join Date
    Jan 2004
    Location
    Bordeaux, France
    Posts
    320
    Private message posted by laide
    Hi, thanks for the reply, but when I try to execute the command I keep getting the error message:

    ./name_of_file: count_word : command not found

    This is the script I entered; maybe there is a problem that I cannot see but you can:

    # count word 'the' in input_file

    the_count=`count_word word= "the word that i am looking for" input file`

    I'm not sure if this is right; could you please let me know?

    Thanks a lot, I am very grateful.
    1) Verify that the directory where you put the script is in your PATH.
    If it isn't, add the directory to the PATH or specify the full path when executing the script.

    2) The first line of the script specifies the full path of awk. Verify that this path is correct (which awk or which gawk).

    3) There must be no space after 'word=' and no space in the file name. With word= "the word ..." the shell passes an empty assignment 'word=' followed by "the word ..." as a separate argument, which awk then tries to open as an input file. Write, for example: count_word word="the" input_file
    Jean-Pierre.
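A sketch of both checks, assuming the count_word script from this thread. A temporary directory from mktemp stands in for wherever you keep your scripts (for example $HOME/bin), and sample.txt is an illustrative input file. command -v is the POSIX way to ask the shell where it finds a command, similar to which.

```shell
#!/bin/sh
# A temporary directory stands in for e.g. $HOME/bin (assumption)
bindir=$(mktemp -d)

# Install count_word (the awk script from this thread) and make it executable
cat > "$bindir/count_word" <<'EOF'
#!/usr/bin/awk -f
#Usage: count_word word=<word_to_count> input_file(s)
NR==1 { re_word = "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)" }
{ count += gsub(re_word,"_") }
END { print count }
EOF
chmod +x "$bindir/count_word"

printf '%s\n' 'peter and paul went to the park' > sample.txt

# 1a) Without changing PATH: call the script by its full path
"$bindir/count_word" word=the sample.txt

# 1b) Or add the directory to PATH (put these lines in ~/.profile to keep them)
PATH=$PATH:$bindir
export PATH
command -v count_word    # prints the full path the shell will use
count_word word=the sample.txt

# 2) Check that awk really is where the #! line says it is
command -v awk
```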

  7. #7
    Join Date
    Jan 2004
    Location
    Bordeaux, France
    Posts
    320
    Private message posted by laide
    Do I execute it like ./name_of_script_file

    or awk -f name_of_script_file?

    Thanks
    ./name

    The first line of the script (#!) tells the system which command to use to process the script:
    /usr/bin/awk -f
    Jean-Pierre.
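A small sketch showing that both ways of running the count_char script from this thread give the same result. sample.txt is an illustrative file name, and the #! line assumes awk lives in /usr/bin.

```shell
#!/bin/sh
# Write the count_char script from this thread to the current directory
cat > count_char <<'EOF'
#!/usr/bin/awk -f
#Usage: count_char char=<char_to_count> input_file(s)
NR==1 { char =substr(char,1,1) }
{ count += gsub(char,"_") }
END { print count }
EOF
printf '%s\n' 'peter and paul went to the park' > sample.txt

# 1) Run it as a command: it needs execute permission, and the system reads
#    the #! line and runs: /usr/bin/awk -f ./count_char char=a sample.txt
chmod +x count_char
./count_char char=a sample.txt

# 2) Run awk explicitly: no execute permission needed; for awk the #! line
#    is just a comment
awk -f count_char char=a sample.txt
```

Both commands print 3 (the "a" in "and", "paul" and "park").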

  8. #8
    Join Date
    Apr 2004
    Posts
    4
    The answers on this forum are so hard to understand.

    Here is a simple specific-letter count command that works for me. Create the file by typing cat > specific_word_count, paste the lines below, then press Ctrl-D:
    Code:
    echo "Enter the file name you would like to count:"   # enter file name
    read file
    echo "Enter the letter to be counted:"   # enter letter
    read letter
    echo "The number of letter(s) $letter is:"
    # tr -cd deletes every character except the chosen letter; wc -c counts what is left
    tr -cd "$letter" < "$file" | wc -c
    echo "Press Enter to continue"
    read key

    Copy and paste it and it should work (run it with sh specific_word_count).

    Now, could someone please post a simple-to-understand sentence count like the one above? I'm stuck with that as well.

  9. #9
    Join Date
    Jan 2004
    Location
    Bordeaux, France
    Posts
    320
    Sorry jayroc2k, but I consider my solution easier and more readable than yours.
    If you work with UNIX, awk is an essential tool to know.


    how to count a specific letter in a text file

    Assume that the script file is named 'count_char' and is executable (chmod +x count_char)

    Code:
    01 #!/usr/bin/awk -f
    02 #Usage: count_char char=<char_to_count> input_file(s)
    03 NR==1 { char =substr(char,1,1) }
    04 { count += gsub(char,"_") }
    05 END { print count }
    01 #!/usr/bin/awk -f
    If the first line of a script begins with the two characters `#!', the remainder of the line specifies an interpreter for the program. The interpreter for this script is awk (it may be nawk or gawk on your system).

    When you execute the script:
    count_char args...
    the system actually runs:
    /usr/bin/awk -f count_char args...

    02 #Usage: count_char char=<char_to_count> input_file(s)

    This comment line shows the calling syntax for the script (if the file is not in a directory listed in PATH, specify its full or relative path).

    For example, if you want to count the letter 'a' in the file 'article.txt', you can do:

    count_char char="a" article.txt
    ./tools/count_char char=a article.txt
    a_count=`count_char char=a article.txt`

    The assignment 'char=a' defines and initializes the variable 'char' used in the awk script.
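One detail that explains line 03: an operand such as char=a is applied only when awk reaches it in the argument list, that is after the BEGIN rule has run and just before the following file is read. That is why the script sets things up with 'NR==1' rather than BEGIN. A sketch demonstrating this (sample.txt is an illustrative file); the -v option, by contrast, sets the variable before BEGIN runs.

```shell
#!/bin/sh
printf 'banana\n' > sample.txt

# char=a is an operand: it is applied after BEGIN, just before sample.txt is read
awk 'BEGIN { print "BEGIN sees [" char "]" }
     NR==1 { print "NR==1 sees [" char "]" }' char=a sample.txt
# prints:
# BEGIN sees []
# NR==1 sees [a]

# -v char=a is applied before BEGIN runs (no input is read: there is only a BEGIN rule)
awk -v char=a 'BEGIN { print "BEGIN sees [" char "]" }'
# prints: BEGIN sees [a]
```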

    03 NR==1 { char =substr(char,1,1) }
    An awk script is a series of "rules". Each rule specifies one pattern to search for, and one action to perform when that pattern is found.
    Syntactically, a rule consists of a pattern followed by an action.
    The action is enclosed in curly braces to separate it from the pattern.
    Rules are usually separated by newlines. Therefore, an `awk' program looks like this:

    PATTERN { ACTION }
    PATTERN { ACTION }
    ...

    If the PATTERN is omitted, the action applies to every line of the input file (see line 04).
    If the ACTION is omitted, the selected record is printed.
    The special pattern END specifies the action to execute after the last line of the last input file
    has been processed (see line 05).

    Line 03 is a rule whose action is executed when the first line of the first input file is read (pattern 'NR==1'). The variable NR is the number of input records awk has processed since the program started.

    The variable 'char' contains the letter to count.
    We keep only the first character of 'char'.

    04 { count += gsub(char,"_") }
    There is no PATTERN specified, so the ACTION is executed for all input records.
    The 'gsub(char,"_")' function call replaces every occurrence of 'char' in the input record with "_" and returns the number of substitutions made.
    The number of substitutions (which is the character count for the record) is accumulated in the 'count' variable.

    'count += gsub()' is the same as 'count = count + gsub()'.
    An uninitialized awk variable is zero in a numeric context, so 'count' needs no explicit initialization.

    05 END { print count }
    When all input records have been processed, the number of times the letter (variable 'char') appears in the input file(s) is printed (variable 'count').
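count_char is case-sensitive, while the original question asked to count a letter whether it is in caps or not. Also, because 'char' is used as a regular expression, a character such as '.' matches any character rather than a literal dot. A sketch of a variant addressing both points (the function name count_char_i is hypothetical): it lowercases both the line and the character with tolower() and counts with index(), which compares plain strings.

```shell
#!/bin/sh
# Hypothetical variant of count_char: case-insensitive, literal (non-regex) match
# Usage: count_char_i CHAR FILE...
count_char_i() {
    c=$1; shift
    awk -v char="$c" '
        BEGIN { char = tolower(substr(char, 1, 1)) }
        {
            line = tolower($0)
            # index() finds a plain substring, so "." is a literal dot;
            # the char != "" test avoids looping forever on an empty argument
            while (char != "" && (i = index(line, char)) > 0) {
                count++
                line = substr(line, i + 1)
            }
        }
        END { print count + 0 }' "$@"
}

printf '%s\n' 'Peter And Paul went to the park.' > sample.txt
count_char_i a sample.txt    # 3 (the "A" in "And" is counted too)
count_char_i . sample.txt    # 1 (only the final dot)
```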


    how to count a specific word in a text file

    Assume that the script file is named 'count_word' and is executable (chmod +x count_word)

    Code:
    01 #!/usr/bin/awk -f
    02 #Usage: count_word word=<word_to_count> input_file(s)
    03 NR==1 { re_word = "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)" }
    04 { count += gsub(re_word,"_") }
    05 END { print count }
    02 #Usage: count_word word=<word_to_count> input_file(s)
    Comment line that specifies the script usage.
    For example, if you want to count the word 'the' in the file 'article.txt', you can do:

    count_word word="the" article.txt
    the_count=`count_word word=the article.txt`

    The assignment 'word="the"' defines and initializes the variable 'word' used in the awk script.

    03 NR==1 { re_word = "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)" }
    We assume that a word is delimited by non-alphabetic characters (alphabetic characters are the letters A to Z, upper and lower case, in the C locale).

    When the first record is read, we initialize the 're_word' variable, which will be used as the pattern that selects words. The pattern is a regular expression:
    [:alpha:] => alphabetic character
    [^[:alpha:]] => non-alphabetic character (a '^' at the start of a bracket expression means any character *except* those listed)
    ^ => beginning of record
    (^|[^[:alpha:]]) => beginning of record or non-alphabetic character
    $ => end of record
    ([^[:alpha:]]|$) => non-alphabetic character or end of record
    "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)" => the searched word delimited by non-alphabetic characters (or by the beginning or end of the record)

    04 { count += gsub(re_word,"_") }
    All occurrences of the word are replaced by "_" and the number of words is accumulated in the 'count' variable.
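Because the delimiting characters are part of each match, two occurrences separated by a single character (as in "the the") are counted only once, and the match is case-sensitive. A sketch of another approach that keeps the same definition of a word (the name count_word_f is hypothetical): turn every run of non-alphabetic characters into a blank, split the line into words, and compare each one with the searched word after lowercasing both.

```shell
#!/bin/sh
# Hypothetical alternative to count_word: whole words, case-insensitive
# Usage: count_word_f WORD FILE...
count_word_f() {
    w=$1; shift
    awk -v word="$w" '
        BEGIN { word = tolower(word) }
        {
            line = tolower($0)
            # every run of non-letters becomes one blank (a word separator)
            gsub(/[^[:alpha:]]+/, " ", line)
            # split() with " " uses the default blank-separated splitting
            n = split(line, words, " ")
            for (i = 1; i <= n; i++)
                if (words[i] == word) count++
        }
        END { print count + 0 }' "$@"
}

printf '%s\n' 'The the, THE theory.' > sample.txt
count_word_f the sample.txt    # 3 (count_word prints 1 for this line)
```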

    how to count sentences in a text file

    Assume that the script file is named 'count_sentances' and is executable (chmod +x count_sentances)

    Code:
    01 #!/usr/bin/awk -f
    02 #Usage: count_sentances input_file(s)
    03 { count += gsub(/.([.!?:;]+|$)/,"_") }
    04 END { print count }
    02 #Usage: count_sentances input_file(s)
    Comment line that specifies the script usage.
    For example, if you want to count the sentences in the file 'article.txt', you can do:

    count_sentances article.txt
    s_count=`count_sentances article.txt`
    03 { count += gsub(/.([.!?:;]+|$)/,"_") }
    A sentence is a sequence of characters ending in . ! ? : ; or at the end of the record.
    [.!?:;] => ending character
    [.!?:;]+ => one or more consecutive ending characters
    ([.!?:;]+|$) => one or more consecutive ending characters, or end of record
    . => any single character
    .([.!?:;]+|$) => a character followed by the end of a sentence
    /.([.!?:;]+|$)/ => regular expression

    If a sentence may be split over several records (lines), drop the end-of-record alternative so that only . ! ? : ; end a sentence:
    /.[.!?:;]+/

    Each sentence is replaced by "_" and the number of sentences is accumulated in the 'count' variable, which is printed by line 04.
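To see what the end-of-record alternative changes, here is a sketch that runs both regular expressions over an illustrative two-line file in which one sentence is split across the line break:

```shell
#!/bin/sh
printf '%s\n' 'Peter went to the park. Then he' 'went home!' > sample.txt

# With |$ : the end of every line also ends a sentence, so "Then he" counts -> 3
awk '{ count += gsub(/.([.!?:;]+|$)/, "_") } END { print count + 0 }' sample.txt

# Without it: only . ! ? : ; end a sentence -> 2
awk '{ count += gsub(/.[.!?:;]+/, "_") } END { print count + 0 }' sample.txt
```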


    Sorry for my very bad English
    Jean-Pierre.

  10. #10
    Join Date
    Apr 2004
    Posts
    4
    I tried it:

    01 #!/usr/bin/awk -f
    02 #Usage: count_char char=<char_to_count> input_file(s)
    03 NR==1 { char =substr(char,1,1) }
    04 { count += gsub(char,"_") }
    05 END { print count }

    I could not get it to work; it seems like one of those commands where you have to know a little more than sed, grep and cat to be able to use them.
    Mine looks cumbersome, but at least it works. I still can't get the sentence count. I am not sure how to make a file "executable" using chmod.

    Can you please write a sentence count that does not need an executable file, but can be run with:
    sh sentence_count

  11. #11
    Join Date
    Jan 2004
    Location
    Bordeaux, France
    Posts
    320
    jayroc2k, the numbers in the first column of the code are only there for the explanations. Don't put them in your script file.

    You can also run count_sentances like this:

    awk -f count_sentances input_file

    In that case, the script file doesn't need to be executable.

    Note: to make a file executable (for everybody): chmod +x file


    Another solution (with the same awk script):

    Code:
    #!/bin/sh
    
    awk '
    { count += gsub(/.([.!?:;]+|$)/,"_") }
    END { print count }
    ' "$@"
    Execute it with:

    sh count_sentances input_file(s)


    Another way to do the job:

    Code:
    tr -d '@\012' < input_file | \
    sed -e '$s/$/./' -e 's/[.!?:;]\+/@/g' | \
    tr '@' '\012' | \
    wc -l
    Last edited by aigles; 04-15-04 at 10:58.
    Jean-Pierre.

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    14,914
    Originally posted by aigles
    If you work with UNIX, awk is an essential tool to know.
    I feel the same way about Perl that you feel about awk. I still have gawk, and both used and loved it for years on several platforms. Once I made the switch to Perl, I've never really looked back.

    Are you familiar with both awk and Perl? If so, why do you choose awk over Perl, other than the fact that awk is more succinct because it is much more special purpose?

    -PatP

  13. #13
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    14,914
    Originally posted by aigles
    Another way to do the work :

    Code:
    tr -d '@\012' < input_file | \
    sed -e '$s/$/./' -e 's/[.!?:;]\+/@/g' | \
    tr '@' '\012' | \
    wc -l
    Won't numbers with embedded "." characters trip this up?

    -PatP
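Numbers such as "3.14" do trip up both the awk script and the tr/sed pipeline, since every "." counts as a terminator. One possible refinement, sketched on the assumption that a real sentence terminator is always followed by white space or the end of the line (count_sentences_strict is a hypothetical name): require a non-blank character before the terminator and a blank or the end of the line after it. Abbreviations such as "e.g. " are still counted as sentence ends.

```shell
#!/bin/sh
# Hypothetical refinement of count_sentances: a terminator only ends a
# sentence when it follows a non-blank character and is followed by a blank
# or the end of the line; a line that does not end with a terminator still
# ends a sentence (the trailing "|$" alternative, as in the original).
count_sentences_strict() {
    awk '{ count += gsub(/[^[:space:]]([.!?:;]+([[:space:]]|$)|$)/, "_") }
         END { print count + 0 }' "$@"
}

printf '%s\n' 'Pi is about 3.14. It has 2.5 million fans!' > sample.txt
count_sentences_strict sample.txt    # 2 (the original count_sentances gives 4)
```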

  14. #14
    Join Date
    Jun 2002
    Location
    UK
    Posts
    525
    I feel the same way about Perl that you feel about awk. I still have gawk, and both used and loved it for years on several platforms. Once I made the switch to Perl, I've never really looked back.

    Are you familiar with both awk and Perl? If so, why do you choose awk over Perl, other than the fact that awk is more succinct because it is much more special purpose?

    -PatP
    I don't follow your argument here. I often hear the likes of "Why use Awk when you can use Sed?", which is the converse of what you are trying to say. The argument being that Sed is more efficient but can be difficult to read, whereas Awk is not quite so efficient but is far easier to read.

    Perl is great for developing applications, but you have the overhead of the runtime compilation, which makes it nonsensical to use for simplistic tasks. You might as well say "why use Awk when you can use Java?", but nobody in their right mind would attempt to perform this task with Java rather than Awk if they are familiar with both.

    You would also have to consider the coding practices of your particular organisation. I can safely say that every organisation working on Unix platforms will have resources capable of writing/understanding Sed and Awk scripts. With Perl, it is a different story, and I have worked in at least 2 organisations where maverick programmers have gone off and developed in Perl, leaving the organisation with a headache after they have left!

    Just my 2 cents.

    Damian
    Last edited by Damian Ibbotson; 04-16-04 at 05:34.

  15. #15
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
