  #1 (permalink)  
Old
Registered User
 
Join Date: Apr 2004
Posts: 2
how to count a specific letter or word in a text file

I am new to shell scripting and this is killing me.

Let's say I have a text file which reads:

peter and paul went to the park and peter fell over paul and then they went home.

Now, what could I use to count a specific word such as "the", which would return 3, or a specific letter such as "a", whether it is in caps or not?

Also, how could I count the number of sentences? Define a sentence as a string of characters ending in . ! ? : ;

I have tried using grep and wc, but that just returns the overall number of words on the lines where the specific word appears.

Anyway, if you can help I would be very grateful, and if I work out the answer before you do I will post the reply myself.

Thanks.
  #2 (permalink)  
Old
Registered User
 
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 320
Try the following three awk scripts:


Code:
#!/usr/bin/awk -f
# Usage: count_word word=<word_to_count> input_file(s)
# Assumes a word is delimited by non-alphabetic characters
NR==1 { re_word = "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)" }
{ count += gsub(re_word,"_") }
END { print count }
Code:
#!/usr/bin/awk -f
#Usage: count_char char=<char_to_count> input_file(s)
NR==1 { char =substr(char,1,1) }
{ count += gsub(char,"_") }
END { print count }
Code:
#!/usr/bin/awk -f
#Usage: count_sentances input_file(s)
{ count += gsub(/.([.!?:;]+|$)/,"_") }
END { print count }
__________________
Jean-Pierre.
  #3 (permalink)  
Old
Registered User
 
Join Date: Apr 2004
Posts: 2
Thanks a lot for the reply, but there are a few things I do not understand.

I am not used to the awk command, so how do I go about implementing the scripts in my script?

Also, where do I declare the word I am looking for, and where do I declare the text file I want to search?

If you do not mind, could you give an example so I can work out how to implement the script in my Bourne shell script?

Thanks for your time and help.
  #4 (permalink)  
Old
Registered User
 
Join Date: Feb 2004
Posts: 5
Re: how to count a specific letter or word in a text file

You can use grep -c:

Code:
sad@nezumi:~$ grep -c the /usr/share/doc/mozilla/release-1.5.html 
3
  #5 (permalink)  
Old
Registered User
 
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 320
Create the three script files: count_word, count_char and count_sentances.
Make them executable with chmod +x.

In your script:
Code:
# Count word 'the' in input_file

the_count=`count_word word="the" input_file`

# Count char 'a' in input_file

a_count=`count_char char='a' input_file`

# Count sentences in input_file

sentance_count=`count_sentances input_file`

nezumi, grep -c counts lines, not words or sentences.
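
If you would rather not create separate script files, a rough sketch of the same idea inside a single Bourne shell script might look like this. The helper name count_matches and the temporary sample file are assumptions made up for illustration, not part of the scripts above. The awk program uses the same gsub-based counting as count_char, so it counts plain regular-expression matches ('the' also matches inside 'then' and 'they').

```shell
#!/bin/sh
# Sketch only: count_matches is a hypothetical helper, not one of the thread's scripts.
count_matches() {
    # $1 = regular expression, remaining arguments = input file(s)
    re=$1
    shift
    awk -v re="$re" '
        { count += gsub(re, "_") }   # gsub returns the number of replacements
        END { print count + 0 }      # + 0 prints 0 instead of an empty line
    ' "$@"
}

# Example input: the sentence from the first post, written to a temporary file.
sample=$(mktemp)
echo 'peter and paul went to the park and peter fell over paul and then they went home.' > "$sample"

the_count=$(count_matches 'the' "$sample")
a_count=$(count_matches '[aA]' "$sample")
echo "the: $the_count  a: $a_count"    # prints: the: 3  a: 6
rm -f "$sample"
```

Passing the pattern with -v keeps the awk program itself in single quotes, so the shell does not expand anything inside it.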
__________________
Jean-Pierre.
  #6 (permalink)  
Old
Registered User
 
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 320
Quote:
Private message posted by laide
Hi, thanks for the reply, but when I try to execute the command I keep getting the error message:

./name_of_file: count_word : command not found

This is the script I entered; maybe there is a problem that I cannot see but you can:

# count word 'the' in input_file

the_count=`count_word word= "the word that i am looking for" input file`

I am not sure if this is right, could you please let me know?

Thanks a lot, I am very grateful.
1) Verify that the directory where you put the script is in the PATH.
If it is not, add the directory to the PATH or specify the full path when executing the script.

2) The first line of the script specifies the full path of awk. Verify that the path is correct (which awk or which gawk).
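
The two checks above could be done from the shell roughly as follows. This is only a sketch: script_dir is a hypothetical location, so replace it with the directory that actually holds count_word.

```shell
#!/bin/sh
# Sketch only: script_dir is an assumed location for the count_* scripts.
script_dir="$HOME/bin"

# 1) Is the script directory in PATH? If not, add it for this shell session.
case ":$PATH:" in
    *":$script_dir:"*) echo "$script_dir is already in PATH" ;;
    *) PATH="$PATH:$script_dir"; export PATH ;;
esac

# 2) Where is awk? The path printed here is what belongs after #! in the scripts.
awk_path=$(command -v awk)
echo "awk found at: ${awk_path:-not found}"
```

If the path printed in step 2 is not /usr/bin/awk, edit the first line of each script to match.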
__________________
Jean-Pierre.
  #7 (permalink)  
Old
Registered User
 
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 320
Quote:
Private message posted by laide
Do I execute it like ./name of script file

or awk -f name of script file?

Thanks
./name

The first line of the script (#!) tells the shell the path of the command to use to process the script:
/usr/bin/awk -f
__________________
Jean-Pierre.
  #8 (permalink)  
Old
Registered User
 
Join Date: Apr 2004
Posts: 4
The answers on this forum are so hard to understand.

Here is a simple specific letter count command that works for me:
cat > specific_word_count
echo "enter the file-name you would like to count" #enter file name
read file
echo "Enter the letter to be counted:" #enter letter
read letter
new=$letter@
echo "the number of letter(s) $letter are"
sed "s/[ ]$letter[ , .]/ $new/g" "$file" | tr '@' '\012' | grep "$letter" | wc -l
echo "enter any key to continue"
read key

Copy and paste it and it should work.

Now please, someone post a simple-to-understand sentence count like the one above. I'm stuck with that as well.
  #9 (permalink)  
Old
Registered User
 
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 320
Sorry jayroc2k, but I think my solution is easier and more readable than yours.
If you work with UNIX, awk is an essential tool to know.


how to count a specific letter in a text file

Assume that the script file is named 'count_char' and is executable (chmod +x count_char)

Code:
01 #!/usr/bin/awk -f
02 #Usage: count_char char=<char_to_count> input_file(s)
03 NR==1 { char =substr(char,1,1) }
04 { count += gsub(char,"_") }
05 END { print count }
01 #!/usr/bin/awk -f
If the first line of a script begins with the two characters `#!', the remainder of the line specifies an interpreter for the program. The interpreter for this script is awk (perhaps nawk or gawk on your system).

When you execute the script:
count_char args...
the shell runs:
/usr/bin/awk -f count_char args...

02 #Usage: count_char char=<char_to_count> input_file(s)

The comment line shows the calling syntax for this script (if the file is not in the PATH, specify the full or relative path of the file).

For example, if you want to count the letter 'a' in the file 'article.txt', you can do:

count_char char="a" article.txt
./tools/count_char char=a article.txt
a_count=`count_char char=a article.txt`

The assignment 'char=a' defines and initializes the variable 'char' that will be used in the awk script.

03 NR==1 { char =substr(char,1,1) }
An awk script consists of a series of "rules". Each rule specifies one pattern to search for, and one action to perform when that pattern is found.
Syntactically, a rule consists of a pattern followed by an action.
The action is enclosed in curly braces to separate it from the pattern.
Rules are usually separated by newlines. Therefore, an `awk' program looks like this:

PATTERN { ACTION }
PATTERN { ACTION }
...

If the PATTERN is omitted, the action applies to every line of the input file (see line 04).
If the ACTION is omitted, the selected record is printed.
The special pattern END specifies the action to execute when the last line of the last input file
has been processed (see line 05).
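
To see each kind of rule in action, here is a small sketch run on three lines of made-up input (the input and the messages are only for illustration):

```shell
#!/bin/sh
# Sketch only: one awk program showing each kind of rule described above.
out=$(printf 'one\ntwo\nthree\n' | awk '
NR == 1 { print "first record: " $0 }  # pattern and action
/t/                                     # pattern only: the matching record is printed
        { n++ }                         # action only: runs for every record
END     { print n " records read" }    # runs after the last record
')
echo "$out"
```

This should print "first record: one", then the two records containing a "t" ("two" and "three"), then "3 records read".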

Line 03 is a rule that specifies the action to execute when the first line of the first input file is read. The variable NR is the number of input records 'awk' has processed since the beginning of the program's execution, so the pattern 'NR==1' selects the first record.

The variable 'char' contains the letter to count.
We keep only the first character of 'char'.

04 { count += gsub(char,"_") }
There is no PATTERN specified, so the ACTION is executed for all input records.
The 'gsub(char,"_")' function call replaces every occurrence of the character 'char' with "_" in the input record and returns the number of substitutions made.
The number of substitutions (which is the character count for the record) is accumulated in the 'count' variable.

'count += gsub()' is the same as 'count = count + gsub()'.
The 'count' variable is initialized to zero the first time it is used.

05 END { print count }
When all input records have been processed, the number of times the letter (variable 'char') appears in the input file(s) is printed (variable 'count').
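
The original question also asked for the count "whether it is in caps or not". One possible variation, sketched below, folds everything to lower case and compares character by character instead of using gsub, so a character such as "." is not treated as a regular expression. The function name count_letter is an assumption for this example and is not one of the scripts above.

```shell
#!/bin/sh
# Sketch only: count_letter is a hypothetical helper, not one of the thread's scripts.
count_letter() {
    # $1 = letter to count, remaining arguments = input file(s) (stdin if none)
    letter=$1
    shift
    awk -v letter="$letter" '
        BEGIN { letter = tolower(substr(letter, 1, 1)) }  # keep the first character only
        {
            line = tolower($0)                  # fold the record to lower case
            n = length(line)
            for (i = 1; i <= n; i++)            # plain comparison, no regular expression
                if (substr(line, i, 1) == letter) count++
        }
        END { print count + 0 }                 # + 0 prints 0 when nothing was found
    ' "$@"
}

printf 'A man, a plan, a canal: Panama.\n' | count_letter a    # prints 10
```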


how to count a specific word in a text file

Assume that the script file is named 'count_word' and is executable (chmod +x count_word)

Code:
01 #!/usr/bin/awk -f
02 #Usage: count_word word=<word_to_count> input_file(s)
03 NR==1 { re_word = "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)" }
04 { count += gsub(re_word,"_") }
05 END { print count }
02 #Usage: count_word word=<word_to_count> input_file(s)
Comment line that specifies the script usage.
For example, if you want to count the word 'the' in the file 'article.txt', you can do:

count_word word="the" article.txt
the_count=`count_word word=the article.txt`

The assignment 'word="the"' defines and initializes the variable 'word' that will be used in the awk script.

03 NR==1 { re_word = "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)" }
We assume that a word is delimited by non-alphabetic characters (alphabetic characters are A to Z, upper and lower case).

When the first record is read, we initialize the 're_word' variable, which will be used as a pattern to select words. The pattern is a regular expression:
[:alpha:] => alphabetic character
[^[:alpha:]] => non-alphabetic character ('^' means any character *except*)
^ => beginning of record
(^|[^[:alpha:]]) => beginning of record or non-alphabetic character
$ => end of record
([^[:alpha:]]|$) => non-alphabetic character or end of record
"(^|[^[:alpha:]])" word "([^[:alpha:]]|$)" => searched word delimited by non-alphabetic characters (or beginning or end of record).

04 { count += gsub(re_word,"_") }
All the occurrences of the word are replaced by _ and the number of words is accumulated in the 'count' variable.
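
One thing to watch with this pattern: each match also consumes the delimiter characters around the word, so as far as I can tell two occurrences separated by a single character (for example "the the") are counted only once. A possible alternative, sketched here under that assumption, splits each record on runs of non-letters and compares the pieces. The function name and the use of [A-Za-z] in place of [:alpha:] are choices made for this example only.

```shell
#!/bin/sh
# Sketch only: a hypothetical alternative to count_word that splits instead of substituting.
count_word() {
    # $1 = word to count, remaining arguments = input file(s) (stdin if none)
    word=$1
    shift
    awk -v word="$word" '
        {
            # Split the record on runs of non-letters, then compare each piece.
            n = split($0, piece, /[^A-Za-z]+/)
            for (i = 1; i <= n; i++)
                if (piece[i] == word) count++
        }
        END { print count + 0 }
    ' "$@"
}

printf 'the the then, they: the\n' | count_word the    # prints 3
```

On the sentence from the first post this gives 1 for "the", because "then" and "they" are different words.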

how to count sentences in a text file

Assume that the script file is named 'count_sentances' and is executable (chmod +x count_sentances)

Code:
01 #!/usr/bin/awk -f
02 #Usage: count_sentances input_file(s)
03 { count += gsub(/.([.!?:;]+|$)/,"_") }
04 END { print count }
02 #Usage: count_sentances input_file(s)
Comment line that specifies the script usage.
For example, if you want to count the sentences in the file 'article.txt', you can do:

count_sentances article.txt
s_count=`count_sentances article.txt`
03 { count += gsub(/.([.!?:;]+|$)/,"_") }
A sentence is a sequence of characters ending in . ! ? : ; or at the end of the record.
[.!?:;] => ending character
[.!?:;]+ => one or more consecutive ending characters
([.!?:;]+|$) => one or more consecutive ending characters or end of record
. => single character
.([.!?:;]+|$) => a character followed by the end of a sentence
/.([.!?:;]+|$)/ => regular expression

If you consider that a sentence may be split over several records, you can simplify the regular expression:
/.[.!?:;]+/

The sentences are replaced by "_" and the number of sentences is accumulated in the 'count' variable, which will be printed by line 04.
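
To compare the two regular expressions, here is a sketch that runs both on a small made-up sample in which one sentence continues onto the next line. The function names and the sample text are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical wrappers around the two patterns.
count_sentences() {
    # End of record also ends a sentence.
    awk '{ count += gsub(/.([.!?:;]+|$)/, "_") } END { print count + 0 }' "$@"
}
count_sentences_split() {
    # A sentence may continue on the next record.
    awk '{ count += gsub(/.[.!?:;]+/, "_") } END { print count + 0 }' "$@"
}

sample='Peter fell over. Did Paul laugh? No!
They went home; it was late
and everyone slept.'

printf '%s\n' "$sample" | count_sentences        # prints 6
printf '%s\n' "$sample" | count_sentences_split  # prints 5
```

The difference comes from the second line: "it was late" has no ending character, so only the first pattern counts it as a sentence of its own.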


Sorry for my very bad English
__________________
Jean-Pierre.
  #10 (permalink)  
Old
Registered User
 
Join Date: Apr 2004
Posts: 4
I tried it:

01 #!/usr/bin/awk -f
02 #Usage: count_char char=<char_to_count> input_file(s)
03 NR==1 { char =substr(char,1,1) }
04 { count += gsub(char,"_") }
05 END { print count }

I could not get it to work. It seems like one of those commands where you have to know a little bit more than sed, grep and cat to be able to implement them.
Mine looks cumbersome, but at least it works. I still can't get the sentence count. I am not sure how to make a file "executable" using chmod.

Can you please write a sentence count that does not need an executable file to run, but can be run with
sh sentence_count
  #11 (permalink)  
Old
Registered User
 
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 320
jayroc2k, the numbers in the first column of the code are just for the explanations. Don't put them in your script file.

You can also execute count_sentances like this:

awk -f count_sentances input_file

In that case, the script file doesn't need to be executable.

Note: If you want to make a file executable (for everybody): chmod +x file


Another solution (with the same awk script):

Code:
#!/usr/bin/sh

awk '
{ count += gsub(/.([.!?:;]+|$)/,"_") }
END { print count }
' "$@"
Execute it with:

sh count_sentances input_file(s)


Another way to do the work :

Code:
tr -d '@\012' < input_file | \
sed -e '$s/$/./' -e 's/[.!?:;]\+/@/g' | \
tr '@' '\012' | \
wc -l
__________________
Jean-Pierre.

Last edited by aigles; 04-15-04 at 10:58.
  #12 (permalink)  
Old
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 14,445
Quote:
Originally posted by aigles
If you work with UNIX, awk is an essential tool to know.
I feel the same way about Perl that you feel about awk. I still have gawk, and both used and loved it for years on several platforms. Once I made the switch to Perl, I've never really looked back.

Are you familiar with both awk and Perl? If so, why do you choose awk over Perl, other than the fact that awk is more succinct because it is much more special purpose?

-PatP
  #13 (permalink)  
Old
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 14,445
Quote:
Originally posted by aigles
Another way to do the work :

Code:
tr -d '@\012' < input_file | \
sed -e '$s/$/./' -e 's/[.!?:;]\+/@/g' | \
tr '@' '\012' | \
wc -l
Won't numbers with embedded "." characters trip this up?

-PatP
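
A quick way to check this concern, as a sketch only: any approach that treats every run of . ! ? : ; as a boundary will also split at a decimal point. Below, count_loose uses the awk pattern from earlier in the thread, and count_strict is a hypothetical variant (not from the thread) that counts a run of ending characters only when it is followed by a space or the end of the record.

```shell
#!/bin/sh
# Sketch only: count_loose and count_strict are hypothetical names for this comparison.
count_loose() {
    awk '{ count += gsub(/.([.!?:;]+|$)/, "_") } END { print count + 0 }' "$@"
}
count_strict() {
    # An ending run counts only when followed by a space or end of record.
    awk '{ count += gsub(/[.!?:;]+( |$)/, "_") } END { print count + 0 }' "$@"
}

text='Pi is about 3.14. Version 1.5 is out.'
printf '%s\n' "$text" | count_loose     # prints 4
printf '%s\n' "$text" | count_strict    # prints 2
```

The strict variant still has blind spots (it counts abbreviations such as "e.g. " and misses a sentence that ends without punctuation), so treat it as a starting point.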
  #14 (permalink)  
Old
Padawan
 
Join Date: Jun 2002
Location: UK
Posts: 525
Quote:
I feel the same way about Perl that you feel about awk. I still have gawk, and both used and loved it for years on several platforms. Once I made the switch to Perl, I've never really looked back.

Are you familiar with both awk and Perl? If so, why do you choose awk over Perl, other than the fact that awk is more succinct because it is much more special purpose?

-PatP
I don't follow your argument here. I often hear the likes of "Why use Awk when you can use Sed?", which is the converse of what you are trying to say. The argument being that Sed is more efficient but can be difficult to read, whereas Awk is not quite so efficient but is far easier to read.

Perl is great for developing applications, but you have the overhead of the runtime compilation, which makes it nonsensical to use for simplistic tasks. You might as well say "why use Awk when you can use Java?", but nobody in their right mind would attempt to perform this task with Java over Awk if they are familiar with both.

You would also have to consider the coding practices of your particular organisation. I can safely say that every organisation working on Unix platforms will have resource capable of writing/understanding Sed and Awk scripts. With Perl, it is a different story and I have worked in at least 2 organisations where maverick programmers have gone off and developed in Perl, leaving the organisation with a headache after they have left!

Just my 2 cents.

Damian

Last edited by Damian Ibbotson; 04-16-04 at 05:34.