[python] How to count the number of words in a sentence, ignoring numbers, punctuation and whitespace?

How would I go about counting the words in a sentence? I'm using Python.

For example, I might have the string:

string = "I     am having  a   very  nice  23!@$      day. "

That would be 7 words. I'm having trouble with the random amount of spaces after/before each word as well as when numbers or symbols are involved.

This question is related to python list text counter python-textprocessing

The answer is


Ok here is my version of doing this. I noticed that you want your output to be 7, which means you dont want to count special characters and numbers. So here is regex pattern:

re.findall("[a-zA-Z_]+", string)

Where [a-zA-Z_] means it will match any character beetwen a-z (lowercase) and A-Z (upper case).


About spaces. If you want to remove all extra spaces, just do:

string = string.rstrip().lstrip() # Remove all extra spaces at the start and at the end of the string
while "  " in string: # While  there are 2 spaces beetwen words in our string...
    string = string.replace("  ", " ") # ... replace them by one space!

How about using a simple loop to count the occurrences of number of spaces!?

_x000D_
_x000D_
txt = "Just an example here move along" _x000D_
count = 1_x000D_
for i in txt:_x000D_
if i == " ":_x000D_
   count += 1_x000D_
print(count)
_x000D_
_x000D_
_x000D_


This is a simple word counter using regex. The script includes a loop which you can terminate it when you're done.

#word counter using regex
import re
while True:
    string =raw_input("Enter the string: ")
    count = len(re.findall("[a-zA-Z_]+", string))
    if line == "Done": #command to terminate the loop
        break
    print (count)
print ("Terminated")

s = "I     am having  a   very  nice  23!@$      day. "
sum([i.strip(string.punctuation).isalpha() for i in s.split()])

The statement above will go through each chunk of text and remove punctuations before verifying if the chunk is really string of alphabets.


    def wordCount(mystring):  
        tempcount = 0  
        count = 1  

        try:  
            for character in mystring:  
                if character == " ":  
                    tempcount +=1  
                    if tempcount ==1:  
                        count +=1  

                    else:  
                        tempcount +=1
                 else:
                     tempcount=0

             return count  

         except Exception:  
             error = "Not a string"  
             return error  

    mystring = "I   am having   a    very nice 23!@$      day."           

    print(wordCount(mystring))  

output is 8


You can use regex.findall():

import re
line = " I am having a very nice day."
count = len(re.findall(r'\w+', line))
print (count)

import string 

sentence = "I     am having  a   very  nice  23!@$      day. "
# Remove all punctuations
sentence = sentence.translate(str.maketrans('', '', string.punctuation))
# Remove all numbers"
sentence = ''.join([word for word in sentence if not word.isdigit()])
count = 0;
for index in range(len(sentence)-1) :
    if sentence[index+1].isspace() and not sentence[index].isspace():
        count += 1 
print(count)

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to text

Difference between opening a file in binary vs text How do I center text vertically and horizontally in Flutter? How to `wget` a list of URLs in a text file? Convert txt to csv python script Reading local text file into a JavaScript array Python: How to increase/reduce the fontsize of x and y tick labels? How can I insert a line break into a <Text> component in React Native? How to split large text file in windows? Copy text from nano editor to shell Atom menu is missing. How do I re-enable

Examples related to counter

HTML/Javascript Button Click Counter How to sort Counter by value? - python How to count the number of words in a sentence, ignoring numbers, punctuation and whitespace? Counter increment in Bash loop not working Get loop counter/index using for…of syntax in JavaScript Count characters in textarea Counter in foreach loop in C# jQuery counter to count up to a target number How to count the frequency of the elements in an unordered list? Code for a simple JavaScript countdown timer?

Examples related to python-textprocessing

How to count the number of words in a sentence, ignoring numbers, punctuation and whitespace?