Count the number of all words in a string

Question

Is there a function to count the number of words in a string? For example:

    str1 <- "How many words are in this sentence"

to return a result of 7.

User · Answer

Try this function from the stringi package:

    > require(stringi)
    > s <- c("Lorem ipsum dolor sit amet, consectetur adipisicing elit.",
    +        "nibh augue, suscipit a, scelerisque sed, lacinia in, mi.",
    +        "Cras vel lorem. Etiam pellentesque aliquet tellus.",
    +        "")
    > stri_stats_latex(s)
        CharsWord CharsCmdEnvir    CharsWhite         Words          Cmds        Envirs
              133             0            30            24             0             0

User · Answer

    require(stringr)

    # Define a very simple function
    str_words <- function(sentence) {
      str_count(sentence, " ") + 1
    }

    # Check
    str_words("This is a sentence with six words")
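Counting spaces and adding one assumes exactly one space between words and at least one word per string. The sketch below is a base-R counterpart of str_count(sentence, " ") + 1 (the name count_by_spaces() is hypothetical, not from any package); it makes both edge cases visible.

```r
# Base-R counterpart of str_count(sentence, " ") + 1; the helper name is hypothetical.
count_by_spaces <- function(sentence) {
  hits <- gregexpr(" ", sentence, fixed = TRUE)
  # gregexpr() reports -1 when there is no match, so count positive positions only
  vapply(hits, function(x) sum(x > 0), integer(1)) + 1L
}

count_by_spaces("This is a sentence with six words")  # 6
count_by_spaces("two  spaces")                        # 3: a double space adds a "word"
count_by_spaces("")                                   # 1: an empty string still counts as one
```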

User · Answer

    str2 <- gsub(' {2,}', ' ', str1)
    length(strsplit(str2, ' ')[[1]])

The gsub(' {2,}', ' ', str1) call makes sure all words are separated by only one space, by replacing every occurrence of two or more spaces with a single space. The strsplit(str2, ' ') call splits the sentence at every space and returns the result in a list. The [[1]] grabs the vector of words out of that list, and length() counts how many words there are.

    > str1 <- "How many words are in this     sentence"
    > str2 <- gsub(' {2,}', ' ', str1)
    > str2
    [1] "How many words are in this sentence"
    > strsplit(str2, ' ')
    [[1]]
    [1] "How"      "many"     "words"    "are"      "in"       "this"     "sentence"
    > strsplit(str2, ' ')[[1]]
    [1] "How"      "many"     "words"    "are"      "in"       "this"     "sentence"
    > length(strsplit(str2, ' ')[[1]])
    [1] 7
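Because of the [[1]], the snippet above only looks at the first element of str1. A vectorised sketch of the same idea, assuming plain spaces are the only separators (the helper name count_split_words() is hypothetical), also trims leading and trailing spaces, since a leading space would otherwise produce an empty first token:

```r
# Vectorised version of the gsub()/strsplit() approach; the helper name is hypothetical.
count_split_words <- function(x) {
  x <- trimws(gsub(" {2,}", " ", x))        # collapse runs of spaces, trim the ends
  lengths(strsplit(x, " ", fixed = TRUE))   # one count per element of x
}

count_split_words(c("How many words are in this     sentence", "  padded  text  ", ""))
# 7 2 0
```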

User · Answer

I use the str_count() function from the stringr library with the escape sequence \w, which represents:

any ‘word’ character (letter, digit or underscore in the current locale: in UTF-8 mode only ASCII letters and digits are considered)

Example:

    > str_count("How many words are in this sentence", '\\w+')
    [1] 7

Of all the other 9 answers that I was able to test, only two (by Vincent Zoonekynd and by petermeissner) worked for all inputs presented here so far, but they also require stringr.

But only this solution works with all inputs presented so far, plus inputs such as "foo+bar+baz~spam+eggs" or "Combien de mots sont dans cette phrase ?".

Benchmark:

    library(stringr)

    questions <-
      c(
        "", "x", "x y", "x y!", "x y! z",
        "foo+bar+baz~spam+eggs",
        "one,   two three 4,,,, 5 6",
        "How many words are in this sentence",
        "How  many words    are in this   sentence",
        "Combien de mots sont dans cette phrase ?",
        "
        Day after day, day after day,
        We stuck, nor breath nor motion;
        "
      )

    answers <- c(0, 1, 2, 2, 3, 5, 6, 7, 7, 7, 12)

    score <- function(f) sum(unlist(lapply(questions, f)) == answers)

    funs <-
      c(
        function(s) sapply(gregexpr("\\W+", s), length) + 1,
        function(s) sapply(gregexpr("[[:alpha:]]+", s), function(x) sum(x > 0)),
        function(s) vapply(strsplit(s, "\\W+"), length, integer(1)),
        function(s) length(strsplit(gsub(' {2,}', ' ', s), ' ')[[1]]),
        function(s) length(str_match_all(s, "\\S+")[[1]]),
        function(s) str_count(s, "\\S+"),
        function(s) sapply(gregexpr("\\W+", s), function(x) sum(x > 0)) + 1,
        function(s) length(unlist(strsplit(s, " "))),
        function(s) sapply(strsplit(s, " "), length),
        function(s) str_count(s, "\\w+")
      )

    unlist(lapply(funs, score))

Output:

    6 10 10  8  9  9  7  6  6 11
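If stringr is not available, a rough base-R sketch of str_count(s, "\\w+") can use gregexpr() with perl = TRUE and count the positive match positions. It is checked here only against the ASCII inputs from the benchmark; treatment of non-ASCII letters may differ from stringr's ICU-based matching. The name count_w() is hypothetical.

```r
# Base-R sketch of str_count(s, "\\w+"); count_w() is a hypothetical helper name.
count_w <- function(s) {
  m <- gregexpr("\\w+", s, perl = TRUE)
  vapply(m, function(x) sum(x > 0), integer(1))  # -1 means "no match"
}

questions <- c("", "x", "x y", "x y!", "x y! z",
               "foo+bar+baz~spam+eggs",
               "one,   two three 4,,,, 5 6",
               "How many words are in this sentence",
               "How  many words    are in this   sentence",
               "Combien de mots sont dans cette phrase ?")
answers <- c(0, 1, 2, 2, 3, 5, 6, 7, 7, 7)

count_w(questions)                  # 0 1 2 2 3 5 6 7 7 7
all(count_w(questions) == answers)  # TRUE
```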

User · Answer

Use nchar(). If the vector of strings is called x:

    (nchar(x) - nchar(gsub(' ', '', x))) + 1

This finds the number of spaces and then adds one.
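The nchar() difference counts single space characters, so it works on a whole vector at once, but repeated spaces inflate the count and an empty string reports one word. A small sketch (count_by_nchar() is a hypothetical name):

```r
# Words = number of single spaces + 1; count_by_nchar() is a hypothetical helper name.
count_by_nchar <- function(x) {
  (nchar(x) - nchar(gsub(" ", "", x, fixed = TRUE))) + 1L
}

count_by_nchar(c("How many words are in this sentence", "How  many", ""))
# 7 3 1
```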

User · Answer

Solution 7 does not give the correct result when there is just one word. You should not just count the elements in gregexpr()'s result (which is -1 if there were no matches) but count the elements > 0. Ergo:

    sapply(gregexpr("\\W+", str1), function(x) sum(x > 0)) + 1
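The difference is easy to see on a few short strings: gregexpr() returns -1 when nothing matches, so length() counts a phantom match while sum(x > 0) does not. Note that the empty string still comes out as one word with the + 1 form.

```r
str1 <- c("", "x", "x y")
m <- gregexpr("\\W+", str1)

sapply(m, length) + 1                  # 2 2 2: the -1 of a non-match is counted
sapply(m, function(x) sum(x > 0)) + 1  # 1 1 2: only real separators are counted
```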

User · Answer

Try this:

    length(unlist(strsplit(str1, " ")))
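Because unlist() merges the pieces from every element, this returns one total for the whole vector rather than one count per string. A quick comparison with lengths(), using a made-up two-element input:

```r
str1 <- c("How many words", "are in this sentence")

length(unlist(strsplit(str1, " ")))  # 7: a single total for the whole vector
lengths(strsplit(str1, " "))         # 3 4: one count per element
```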

User · Answer

Also from the stringi package, the straightforward function stri_count_words():

    stringi::stri_count_words(str1)
    #[1] 7

User · Answer

The simplest way would be:

    require(stringr)
    str_count("one,   two three 4,,,, 5 6", "\\S+")

... counting all sequences of non-space characters (\\S+).

But what about a little function that also lets us decide which kind of words we would like to count, and which works on whole vectors as well?

    require(stringr)
    nwords <- function(string, pseudo = FALSE) {
      # pseudo = TRUE counts any run of non-space characters;
      # otherwise only runs of letters count as words
      pattern <- if (pseudo) "\\S+" else "[[:alpha:]]+"
      str_count(string, pattern)
    }

    nwords("one,   two three 4,,,, 5 6")
    # 3

    nwords("one,   two three 4,,,, 5 6", pseudo = TRUE)
    # 6
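The same switch between letter-only words and any non-space run can be sketched in base R without stringr. In this sketch nwords_base() is a hypothetical name, and gregexpr() with perl = TRUE stands in for str_count():

```r
# Base-R sketch of nwords(); nwords_base() is a hypothetical helper name.
nwords_base <- function(string, pseudo = FALSE) {
  pattern <- if (pseudo) "\\S+" else "[[:alpha:]]+"
  m <- gregexpr(pattern, string, perl = TRUE)
  vapply(m, function(x) sum(x > 0), integer(1))  # 0 for strings with no match
}

nwords_base("one,   two three 4,,,, 5 6")                 # 3
nwords_base("one,   two three 4,,,, 5 6", pseudo = TRUE)  # 6
nwords_base(c("x y", ""))                                 # 2 0
```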

User · Answer

Use the regular expression symbol \\W to match non-word characters, using + to indicate one or more in a row, along with gregexpr() to find all matches in a string. Words are the number of word separators plus 1.

    lengths(gregexpr("\\W+", str1)) + 1

This will fail with blank strings at the beginning or end of the character vector, when a "word" doesn't satisfy \\W's notion of non-word (one could work with other regular expressions, \\S+, [[:alpha:]], etc., but there will always be edge cases with a regex approach), etc. It is likely more efficient than strsplit() solutions, which allocate memory for each word. Regular expressions are described in ?regex.

Update: As noted in the comments and in a different answer by @Andri, the approach fails with zero- and one-word strings, and with trailing punctuation:

    > str1 = c("", "x", "x y", "x y!", "x y! z")
    > lengths(gregexpr("[A-z]\\W+", str1)) + 1L
    [1] 2 2 2 3 3

Many of the other answers also fail in these or similar (e.g., multiple spaces) cases. I think my answer's caveat about the 'notion of one word' in the original answer covers problems with punctuation (solution: choose a different regular expression, e.g., [[:space:]]+), but the zero- and one-word cases are a problem; @Andri's solution fails to distinguish between zero and one words. So, taking a 'positive' approach to finding words, one might use

    sapply(gregexpr("[[:alpha:]]+", str1), function(x) sum(x > 0))

leading to

    > sapply(gregexpr("[[:alpha:]]+", str1), function(x) sum(x > 0))
    [1] 0 1 2 2 3

Again, the regular expression might be refined for different notions of 'word'.

I like the use of gregexpr() because it's memory efficient. An alternative using strsplit() (like @user813966, but with a regular expression to delimit words) and making use of the original notion of delimiting words is

    > lengths(strsplit(str1, "\\W+"))
    [1] 0 1 2 2 3

This needs to allocate new memory for each word that is created, and for the intermediate list-of-words. This could be relatively expensive when the data is 'big', but it is probably effective and understandable for most purposes.
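When refining the regular expression for a particular notion of 'word', it can help to look at what a pattern actually matches before counting. regmatches() returns the matched substrings (character(0) when there is no match), so lengths() of its result gives the same per-element counts as the sum(x > 0) version:

```r
str1 <- c("", "x", "x y", "x y!", "x y! z")
m <- gregexpr("[[:alpha:]]+", str1)

regmatches(str1, m)           # the substrings treated as words, per element
lengths(regmatches(str1, m))  # 0 1 2 2 3
```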

User · Answer

You can use the strsplit() and sapply() functions:

    sapply(strsplit(str1, " "), length)
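Splitting on a single space turns every extra space into an empty token, and lengths() gives the same per-element counts as sapply(..., length) more directly. A small illustration with a made-up double-spaced input:

```r
str1 <- c("How many words are in this sentence", "x  y")

sapply(strsplit(str1, " "), length)  # 7 3: "x  y" splits into "x", "", "y"
lengths(strsplit(str1, " "))         # 7 3: same counts without sapply()
```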

User · Answer

    require(stringr)
    str_count(x, "\\w+")  # will be fine with double/triple spaces between words

All other answers have issues with more than one space between the words.

User · Answer

With the stringr package, one can also write a simple script that traverses a vector of strings, for example through a for loop.

Let's say

    df$text

contains a vector of strings that we are interested in analysing. First, we add additional columns to the existing data frame df as below:

    df$strings    <- as.integer(NA)
    df$characters <- as.integer(NA)

Then we run a for loop over the vector of strings as below:

    library(stringr)

    for (i in seq_len(nrow(df))) {
      df$strings[i]    <- str_count(df$text[i], '\\S+')  # counts the strings
      df$characters[i] <- str_count(df$text[i])          # counts the characters & spaces
    }

The resulting columns, strings and characters, will contain the counts of words and characters, and this is achieved in one go for a vector of strings.
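Since gregexpr() and nchar() work on whole vectors, the loop is not strictly needed. A base-R sketch of the same two columns, assuming a small made-up data frame with a text column:

```r
# Made-up example data; the column name text follows the answer above.
df <- data.frame(
  text = c("How many words are in this sentence", "  Day after day,  day after day  "),
  stringsAsFactors = FALSE
)

# Whole-column operations replace the for loop
df$strings    <- lengths(regmatches(df$text, gregexpr("\\S+", df$text)))  # runs of non-spaces
df$characters <- nchar(df$text)                                           # characters incl. spaces
df
```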

User · Answer

You can remove double spaces and count the number of " " in the string to get the count of words. Use stringr and rm_white() from qdapRegex:

    library(stringr)
    library(qdapRegex)

    str_count(rm_white(s), " ") + 1
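Without qdapRegex, a rough base-R stand-in collapses runs of whitespace and trims the ends before counting spaces. This assumes only whitespace needs cleaning (rm_white() may also adjust spacing around punctuation), and it returns 0 for strings that are empty after trimming, which the + 1 form alone would report as 1. The helper name count_after_squish() is hypothetical.

```r
# Base-R stand-in for str_count(rm_white(s), " ") + 1; the helper name is hypothetical.
count_after_squish <- function(s) {
  squished <- gsub("\\s+", " ", trimws(s))  # collapse whitespace runs, trim the ends
  spaces <- nchar(squished) - nchar(gsub(" ", "", squished, fixed = TRUE))
  ifelse(squished == "", 0L, spaces + 1L)   # an empty string has no words
}

count_after_squish(c("  How many   words  ", "", "   "))  # 3 0 0
```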

User · Answer

You can use the wc() function in the qdap library:

    > str1 <- "How many words are in this sentence"
    > wc(str1)
    [1] 7

User · Answer

You can use str_match_all() with a regular expression that identifies your words. The following works with initial, final and duplicated spaces.

    library(stringr)
    s <- "
      Day after day, day after day,
      We stuck, nor breath nor motion;
    "
    m <- str_match_all(s, "\\S+")  # Sequences of non-spaces
    length(m[[1]])
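A base-R counterpart of str_match_all() here is regmatches() applied to the output of gregexpr(); it likewise returns a list with one character vector of tokens per input string:

```r
s <- "
  Day after day, day after day,
  We stuck, nor breath nor motion;
"
m <- regmatches(s, gregexpr("\\S+", s))  # sequences of non-spaces
length(m[[1]])                           # 12
```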

User · Answer

I've found the following function and regex useful for word counts, especially in dealing with single vs. double hyphens: the former generally should not count as a word break (e.g., well-known, hi-fi), whereas a double hyphen is a punctuation delimiter that is not bounded by white space--such as for parenthetical remarks.

    txt <- "Don't you think e-mail is one word--and not two!"  # 10 words
    words <- function(txt) {
      length(attributes(gregexpr("(\\w|\\w\\-\\w|\\w\\'\\w)+", txt)[[1]])$match.length)
    }

    words(txt)  # 10 words

stringi is a useful package, but it over-counts words in this example due to the hyphen:

    stringi::stri_count_words(txt)  # 11 words
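The words() function above handles one string at a time and returns 1 for a string with no words, because gregexpr() then reports a single -1 match length. A vectorised sketch that counts 0 in that case is below. It is an adaptation, not the answer's original code: it uses perl = TRUE and lists the hyphen and apostrophe alternatives before a bare \w, so that with Perl's first-match alternation "e-mail" and "Don't" are each kept as one word. The helper name count_hyphen_words() is hypothetical.

```r
# Vectorised adaptation of words(); count_hyphen_words() is a hypothetical name.
count_hyphen_words <- function(txt) {
  # At each step, try "x-y" and "x'y" before a lone word character
  m <- gregexpr("(?:\\w-\\w|\\w'\\w|\\w)+", txt, perl = TRUE)
  vapply(m, function(x) sum(x > 0), integer(1))  # 0 when nothing matches
}

count_hyphen_words(c("Don't you think e-mail is one word--and not two!",
                     "well-known hi-fi", ""))
# 10 2 0
```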
