I am trying to use grep
to test whether a vector of strings are present in an another vector or not, and to output the values that are present (the matching patterns).
I have a data frame like this:
FirstName Letter
Alex A1
Alex A6
Alex A7
Bob A1
Chris A9
Chris A6
I have a vector of strings patterns to be found in the "Letter" columns, for example: c("A1", "A9", "A6")
.
I would like to check whether the any of the strings in the pattern vector is present in the "Letter" column. If they are, I would like the output of unique values.
The problem is, I don't know how to use grep
with multiple patterns. I tried:
matches <- unique (
grep("A1| A9 | A6", myfile$Letter, value=TRUE, fixed=TRUE)
)
But it gives me 0 matches which is not true, any suggestions?
Good answers, however don't forget about filter()
from dplyr:
patterns <- c("A1", "A9", "A6")
>your_df
FirstName Letter
1 Alex A1
2 Alex A6
3 Alex A7
4 Bob A1
5 Chris A9
6 Chris A6
result <- filter(your_df, grepl(paste(patterns, collapse="|"), Letter))
>result
FirstName Letter
1 Alex A1
2 Alex A6
3 Bob A1
4 Chris A9
5 Chris A6
Have you tried the match()
or charmatch()
functions?
Example use:
match(c("A1", "A9", "A6"), myfile$Letter)
Using the sapply
patterns <- c("A1", "A9", "A6")
df <- data.frame(name=c("A","Ale","Al","lex","x"),Letters=c("A1","A2","A9","A1","A9"))
name Letters
1 A A1
2 Ale A2
3 Al A9
4 lex A1
5 x A9
df[unlist(sapply(patterns, grep, df$Letters, USE.NAMES = F)), ]
name Letters
1 A A1
4 lex A1
3 Al A9
5 x A9
Not sure whether this answer has already appeared...
For the particular pattern in the question, you can just do it with a single grep()
call,
grep("A[169]", myfile$Letter)
Based on Brian Digg's post, here are two helpful functions for filtering lists:
#Returns all items in a list that are not contained in toMatch
#toMatch can be a single item or a list of items
exclude <- function (theList, toMatch){
return(setdiff(theList,include(theList,toMatch)))
}
#Returns all items in a list that ARE contained in toMatch
#toMatch can be a single item or a list of items
include <- function (theList, toMatch){
matches <- unique (grep(paste(toMatch,collapse="|"),
theList, value=TRUE))
return(matches)
}
This should work:
grep(pattern = 'A1|A9|A6', x = myfile$Letter)
Or even more simply:
library(data.table)
myfile$Letter %like% 'A1|A9|A6'
Take away the spaces. So do:
matches <- unique(grep("A1|A9|A6", myfile$Letter, value=TRUE, fixed=TRUE))
In addition to @Marek's comment about not including fixed==TRUE
, you also need to not have the spaces in your regular expression. It should be "A1|A9|A6"
.
You also mention that there are lots of patterns. Assuming that they are in a vector
toMatch <- c("A1", "A9", "A6")
Then you can create your regular expression directly using paste
and collapse = "|"
.
matches <- unique (grep(paste(toMatch,collapse="|"),
myfile$Letter, value=TRUE))
To add to Brian Diggs answer.
another way using grepl will return a data frame containing all your values.
toMatch <- myfile$Letter
matches <- myfile[grepl(paste(toMatch, collapse="|"), myfile$Letter), ]
matches
Letter Firstname
1 A1 Alex
2 A6 Alex
4 A1 Bob
5 A9 Chris
6 A6 Chris
Maybe a bit cleaner... maybe?
I suggest writing a little script and doing multiple searches with Grep. I've never found a way to search for multiple patterns, and believe me, I've looked!
Like so, your shell file, with an embedded string:
#!/bin/bash
grep *A6* "Alex A1 Alex A6 Alex A7 Bob A1 Chris A9 Chris A6";
grep *A7* "Alex A1 Alex A6 Alex A7 Bob A1 Chris A9 Chris A6";
grep *A8* "Alex A1 Alex A6 Alex A7 Bob A1 Chris A9 Chris A6";
Then run by typing myshell.sh.
If you want to be able to pass in the string on the command line, do it like this, with a shell argument--this is bash notation btw:
#!/bin/bash
$stingtomatch = "${1}";
grep *A6* "${stingtomatch}";
grep *A7* "${stingtomatch}";
grep *A8* "${stingtomatch}";
And so forth.
If there are a lot of patterns to match, you can put it in a for loop.
Source: Stackoverflow.com