How to calculate the number of occurrences of a given character in each row of a column of strings?


I have a data.frame in which certain variables contain a text string. I wish to count the number of occurrences of a given character in each individual string.

Example:

q.data <- data.frame(number=1:3, string=c("greatgreat", "magic", "not"))

I wish to create a new column with the number of occurrences of "a" in each string (i.e. c(2,1,0)).

The only convoluted approach I have managed is:

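string.counter <- function(strings, pattern) {
  counts <- integer(length(strings))
  for (i in seq_along(strings)) {
    # match.length is -1 when the pattern does not occur in the string
    match.lengths <- attr(gregexpr(pattern, strings[i])[[1]], "match.length")
    counts[i] <- length(match.lengths[match.lengths > 0])
  }
  counts
}

string.counter(strings = q.data$string, pattern = "a")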

 number     string number.of.a
1      1 greatgreat           2
2      2      magic           1
3      3        not           0

This question is tagged with regex r dataframe

~ Asked on 2012-09-14 15:17:55

The Best Answer is


The stringr package provides the str_count function, which seems to do what you're interested in:

library(stringr)

# Load your example data
q.data <- data.frame(number=1:3, string=c("greatgreat", "magic", "not"), stringsAsFactors = FALSE)

# Count the number of 'a's in each element of string
q.data$number.of.a <- str_count(q.data$string, "a")
q.data
#  number     string number.of.a
#1      1 greatgreat           2
#2      2      magic           1
#3      3        not           0

~ Answered on 2012-09-14 15:25:40


If you don't want to leave base R, here's a fairly succinct and expressive possibility:

x <- q.data$string
lengths(regmatches(x, gregexpr("a", x)))
# [1] 2 1 0
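
One caveat worth sketching: gregexpr treats its pattern as a regular expression by default, so counting a character such as "." that way would match every character in the string. A minimal sketch of a wrapper that counts literal characters instead is below. The name count_char and the extra "a.b.c" test string are hypothetical, not part of the original question or answers, and lengths() assumes R 3.2.0 or later.

```r
# Hypothetical helper (not from the answers above): count literal
# occurrences of `char` in each element of `x`, using base R only.
count_char <- function(x, char) {
  x <- as.character(x)  # also accepts factor columns
  # fixed = TRUE treats `char` as a literal string, not a regex
  lengths(regmatches(x, gregexpr(char, x, fixed = TRUE)))
}

x <- c("greatgreat", "magic", "not", "a.b.c")
count_char(x, "a")
# [1] 2 1 0 1
count_char(x, ".")
# [1] 0 0 0 2
```

With the example data this could be used as q.data$number.of.a <- count_char(q.data$string, "a").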

~ Answered on 2012-09-14 15:44:03
