How to calculate the number of occurrences of a given character in each row of a column of strings?

Question

I have a data frame in which certain variables contain a text string. I wish to count the number of occurrences of a given character in each individual string.

Example:

    q.data <- data.frame(number = 1:3, string = c("greatgreat", "magic", "not"))

I wish to create a new column for q.data with the number of occurrences of "a" in string (i.e. c(2, 1, 0)).

The only convoluted approach I have managed is:

    string.counter <- function(strings, pattern) {
      counts <- NULL
      for (i in 1:length(strings)) {
        counts[i] <- length(attr(gregexpr(pattern, strings[i])[[1]], "match.length")[attr(gregexpr(pattern, strings[i])[[1]], "match.length") > 0])
      }
      return(counts)
    }

    string.counter(strings = q.data$string, pattern = "a")

      number     string number.of.a
    1      1 greatgreat           2
    2      2      magic           1
    3      3        not           0

User · Answer

A variation of https://stackoverflow.com/a/12430764/589165 is:

    > nchar(gsub("[^a]", "", q.data$string))
    [1] 2 1 0

User · Answer

    nchar(as.character(q.data$string)) - nchar(gsub("a", "", q.data$string))
    [1] 2 1 0

Notice that I coerce the factor variable to character before passing it to nchar. The regex functions appear to do that internally.

Here are benchmark results (with the test scaled up to 3000 rows):

    # benchmark() comes from the rbenchmark package; str_count() from stringr
    q.data <- q.data[rep(1:NROW(q.data), 1000), ]
    str(q.data)
    'data.frame':   3000 obs. of  3 variables:
     $ number     : int  1 2 3 1 2 3 1 2 3 1 ...
     $ string     : Factor w/ 3 levels "greatgreat","magic",..: 1 2 3 1 2 3 1 2 3 1 ...
     $ number.of.a: int  2 1 0 2 1 0 2 1 0 2 ...

    benchmark(
      Dason = { q.data$number.of.a <- str_count(as.character(q.data$string), "a") },
      Tim   = { resT <- sapply(as.character(q.data$string), function(x, letter = "a") {
                  sum(unlist(strsplit(x, split = "")) == letter) }) },
      DWin  = { resW <- nchar(as.character(q.data$string)) - nchar(gsub("a", "", q.data$string)) },
      Josh  = { x <- sapply(regmatches(q.data$string, gregexpr("g", q.data$string)), length) },
      replications = 100)
    #-----------------------
       test replications elapsed  relative user.self sys.self user.child sys.child
    1 Dason          100   4.173  9.959427     2.985    1.204          0         0
    3  DWin          100   0.419  1.000000     0.417    0.003          0         0
    4  Josh          100  18.635 44.474940    17.883    0.827          0         0
    2   Tim          100   3.705  8.842482     3.646    0.072          0         0
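A rough comparison of this kind can also be run with base R alone. The sketch below is an illustration rather than a reproduction of the benchmark above: it assumes a plain character vector (not a factor), uses system.time instead of a benchmarking package, and the names strings, res_nchar, res_split, t_nchar, and t_split are made up for the example. Timings will vary by machine, so none are shown.

```r
# Scaled-up test vector: 3000 strings, stored as character
strings <- rep(c("greatgreat", "magic", "not"), 1000)

# Approach 1: length difference after deleting every literal "a"
t_nchar <- system.time(
  res_nchar <- nchar(strings) - nchar(gsub("a", "", strings, fixed = TRUE))
)

# Approach 2: split each string into characters and count the matches
t_split <- system.time(
  res_split <- vapply(strsplit(strings, ""), function(ch) sum(ch == "a"), integer(1))
)

# The two approaches should agree before their timings are compared
stopifnot(identical(res_nchar, res_split))

# Elapsed seconds for each approach
print(c(nchar_gsub = t_nchar[["elapsed"]], strsplit = t_split[["elapsed"]]))
```

Checking that the results are identical first guards against comparing the speed of two computations that do not actually return the same counts.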

User · Answer

Yet another base R option could be:

    lengths(lapply(q.data$string, grepRaw, pattern = "a", all = TRUE, fixed = TRUE))
    [1] 2 1 0

User · Answer

The following expression does the job and also works for symbols, not only letters.

The expression works as follows:

1. It uses lapply on column 2 of the data frame q.data to iterate over the rows of that column ("lapply(q.data[, 2], ...").

2. It applies to each row of column 2 the function "function(x) { sum('a' == strsplit(as.character(x), '')[[1]]) }". The function takes each row value of column 2 (x), converts it to character (in case it is a factor, for example), and splits the string on every character ("strsplit(as.character(x), '')"). The result is a vector holding each character of the string, for each row of column 2.

3. Each element of that vector is compared with the character to be counted, in this case "a" ("'a' =="). This returns a vector of TRUE and FALSE values, c(TRUE, FALSE, TRUE, ...), which is TRUE wherever the element matches the character to be counted.

4. The number of times "a" appears in the row is the sum of all the TRUE values in the vector ("sum(...)").

5. Finally, unlist unpacks the result of lapply so that it can be assigned to a new column in the data frame ("q.data$number.of.a <- unlist(...)").

    q.data$number.of.a <- unlist(lapply(q.data[, 2], function(x) { sum('a' == strsplit(as.character(x), '')[[1]]) }))

    > q.data
      number     string number.of.a
    1      1 greatgreat           2
    2      2      magic           1
    3      3        not           0
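The steps above can be traced on a single value. This is a minimal sketch in base R; the variable names chars, is_a, n_a, and n_qmark are illustrative and do not come from the answer.

```r
x <- "greatgreat"

# Step 2: split the string into a vector with one element per character
chars <- strsplit(as.character(x), "")[[1]]

# Step 3: compare every character with the one being counted
is_a <- chars == "a"

# Step 4: TRUE counts as 1 and FALSE as 0, so the sum is the number of matches
n_a <- sum(is_a)

# The comparison is literal, so symbols need no escaping
n_qmark <- sum(strsplit("a?b??", "")[[1]] == "?")

print(chars)
print(n_a)
print(n_qmark)
```

Because the comparison uses == rather than a regular expression, characters such as "?" or "." are matched literally.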

User · Answer

Another good option, using charToRaw:

    sum(charToRaw("abc.d.aa") == charToRaw("."))
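charToRaw works on one string at a time, so applying it to a whole column needs a loop. The sketch below is one possible way to do that; count_byte is a hypothetical helper name, and the approach assumes the character being counted is a single byte (it compares bytes, not characters, so it is not suitable for multi-byte characters).

```r
# Hypothetical helper: count a single-byte character in each element of x
count_byte <- function(x, ch) {
  target <- charToRaw(ch)
  stopifnot(length(target) == 1L)  # assumption: ch is exactly one byte
  vapply(as.character(x),
         function(s) sum(charToRaw(s) == target),
         integer(1),
         USE.NAMES = FALSE)
}

counts <- count_byte(c("greatgreat", "magic", "not"), "a")
print(counts)
```

vapply is used here instead of sapply so that the result is guaranteed to be an integer vector, even for an empty input.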

User · Answer

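The easiest and cleanest way, IMHO, is to count the match positions returned by gregexpr. Note that gregexpr returns -1 for a string with no match, so a plain lengths(gregexpr("a", q.data$string)) would report 1 rather than 0 for "not". Counting only the positive positions avoids that:

    q.data$number.of.a <- sapply(gregexpr("a", q.data$string), function(m) sum(m > 0))

    #   number     string number.of.a
    # 1      1 greatgreat           2
    # 2      2      magic           1
    # 3      3        not           0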
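To see why the no-match case needs care with gregexpr, the following sketch compares three ways of turning its result into counts. The variable names s, m, naive, safe, and via_regmatches are illustrative only.

```r
s <- c("greatgreat", "magic", "not")
m <- gregexpr("a", s, fixed = TRUE)

# gregexpr reports "no match" as the single value -1, which still has length 1
naive <- lengths(m)

# Count only real match positions (they are always >= 1)
safe <- vapply(m, function(pos) sum(pos > 0), integer(1))

# regmatches drops the -1 placeholder, so lengths() is correct here
via_regmatches <- lengths(regmatches(s, m))

print(rbind(naive, safe, via_regmatches))
```

The last two rows agree with each other, while the first over-counts any string that contains no match.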

User · Answer

The question below has been moved here, but it seems this page doesn't directly answer Farah El's question: "How to find number 1s in 101 in R". So I'll write an answer here, just in case.

    library(magrittr)
    library(stringr)  # provides str_count()

    n %>%  # n is the number you'd like to inspect
      as.character() %>%
      str_count(pattern = "1")

https://stackoverflow.com/users/8931457/farah-el
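The same digit count can be done without any packages. The sketch below is one base R possibility, not part of the answer above; the names n, digits, and ones are illustrative. It uses format with scientific = FALSE because as.character can switch to scientific notation for some doubles (for example 1e+05), which would change the digits being counted.

```r
n <- c(101, 100000, 11011)

# Convert to plain digit strings, avoiding scientific notation and padding
digits <- format(n, scientific = FALSE, trim = TRUE)

# Count the "1" digits as the length lost when they are removed
ones <- nchar(digits) - nchar(gsub("1", "", digits, fixed = TRUE))

print(data.frame(n = digits, ones = ones))
```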

User · Answer

The stringi package provides the functions stri_count and stri_count_fixed, which are very fast.

    stringi::stri_count(q.data$string, fixed = "a")
    # [1] 2 1 0

Benchmark

Compared to the fastest approach from @42-'s answer and to the equivalent function from the stringr package, for a vector with 30,000 elements:

    library(microbenchmark)
    library(stringr)   # for str_count()
    library(ggplot2)   # for autoplot()

    benchmark <- microbenchmark(
      stringi = stringi::stri_count(test.data$string, fixed = "a"),
      baseR   = nchar(test.data$string) - nchar(gsub("a", "", test.data$string, fixed = TRUE)),
      stringr = str_count(test.data$string, "a")
    )

    autoplot(benchmark)

Data

    q.data <- data.frame(number = 1:3, string = c("greatgreat", "magic", "not"), stringsAsFactors = FALSE)
    test.data <- q.data[rep(1:NROW(q.data), 10000), ]

User · Answer

You could just use string division:

    require(roperators)
    my_strings <- c('apple', 'banana', 'pear', 'melon')
    my_strings %s/% 'a'

This will give you 1, 3, 1, 0. You can also use string division with regular expressions and whole words.

User · Answer

    s <- "aababacababaaathhhhhslsls jsjsjjsaa ghhaalll"
    p <- "a"
    s2 <- gsub(p, "", s)
    numOcc <- nchar(s) - nchar(s2)

It may not be the most efficient approach, but it serves my purpose.
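The length-difference idea extends to patterns longer than one character if the difference is divided by the pattern length. The sketch below assumes a literal (non-regex) pattern and counts non-overlapping occurrences; count_substring is a hypothetical helper name, not something defined in this thread.

```r
# Hypothetical helper: count non-overlapping literal occurrences of pattern
count_substring <- function(x, pattern) {
  x <- as.character(x)  # also accepts factors
  removed <- nchar(x) - nchar(gsub(pattern, "", x, fixed = TRUE))
  removed / nchar(pattern)
}

n_an <- count_substring("banana", "an")
n_a  <- count_substring(c("greatgreat", "magic", "not"), "a")

print(n_an)
print(n_a)
```

With fixed = TRUE the pattern is matched literally, so a pattern such as "." counts periods instead of matching every character.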

User · Answer

I'm sure someone can do better, but this works:

    sapply(as.character(q.data$string), function(x, letter = "a") {
      sum(unlist(strsplit(x, split = "")) == letter)
    })
    greatgreat      magic        not
             2          1          0

or in a function:

    countLetter <- function(charvec, letter) {
      sapply(charvec, function(x, letter) {
        sum(unlist(strsplit(x, split = "")) == letter)
      }, letter = letter)
    }
    countLetter(as.character(q.data$string), "a")

User · Answer

If you don't want to leave base R, here's a fairly succinct and expressive possibility:

    x <- q.data$string
    lengths(regmatches(x, gregexpr("a", x)))
    # [1] 2 1 0

User · Answer

The stringr package provides the str_count function, which seems to do what you're interested in:

    # Load your example data
    q.data <- data.frame(number = 1:3, string = c("greatgreat", "magic", "not"), stringsAsFactors = FALSE)
    library(stringr)

    # Count the number of 'a's in each element of string
    q.data$number.of.a <- str_count(q.data$string, "a")
    q.data
    #   number     string number.of.a
    # 1      1 greatgreat           2
    # 2      2      magic           1
    # 3      3        not           0
