How to test if a string contains one of the substrings in a list, in pandas?

Question

Is there any function that would be the equivalent of a combination of df.isin() and df[col].str.contains()?

For example, say I have the series

    s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])

and I want to find all places where s contains any of ['og', 'at']. I would want to get everything but 'pet'.

I have a solution, but it's rather inelegant:

    searchfor = ['og', 'at']
    found = [s.str.contains(x) for x in searchfor]
    result = pd.DataFrame(found)
    result.any()

Is there a better way to do this?

User · Accepted Answer

One option is just to use the regex | character to try to match each of the substrings in the words in your Series s (still using str.contains).

You can construct the regex by joining the words in searchfor with |:

    >>> searchfor = ['og', 'at']
    >>> s[s.str.contains('|'.join(searchfor))]
    0    cat
    1    hat
    2    dog
    3    fog
    dtype: object

As @AndyHayden noted in the comments below, take care if your substrings have special characters such as $ and ^ which you want to match literally. These characters have specific meanings in the context of regular expressions and will affect the matching.

You can make your list of substrings safer by escaping non-alphanumeric characters with re.escape:

    >>> import re
    >>> matches = ['$money', 'x^y']
    >>> safe_matches = [re.escape(m) for m in matches]
    >>> safe_matches
    ['\\$money', 'x\\^y']

The strings in this new list will match each character literally when used with str.contains.
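To put the two steps together (escape each substring, then join with |), here is a minimal sketch. The helper name contains_any and the sample series are made up for illustration and are not part of pandas or of the answer above.

```python
import re

import pandas as pd


def contains_any(series, substrings):
    """Hypothetical helper: boolean mask, True where a value contains
    any of the substrings, each matched literally."""
    # Escape regex metacharacters, then OR the alternatives together.
    pattern = "|".join(re.escape(sub) for sub in substrings)
    return series.str.contains(pattern, regex=True)


prices = pd.Series(["$money", "x^y", "money", "xy"])
matches = ["$money", "x^y"]

# With escaping, '$' and '^' are treated as ordinary characters.
mask = contains_any(prices, matches)

# Without escaping, '$' and '^' act as anchors and nothing matches here.
unescaped = prices.str.contains("|".join(matches))

print(prices[mask])
```

Note that an empty list of substrings would produce an empty pattern, which matches every string, so you may want to handle that case separately.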

User · Answer

Here is a one line lambda that also works:

    df["TrueFalse"] = df['col1'].apply(lambda x: 1 if any(i in x for i in searchfor) else 0)

Input:

    searchfor = ['og', 'at']

    df = pd.DataFrame([('cat', 1000.0), ('hat', 2000000.0), ('dog', 1000.0), ('fog', 330000.0), ('pet', 330000.0)], columns=['col1', 'col2'])

       col1  col2
    0   cat  1000.0
    1   hat  2000000.0
    2   dog  1000.0
    3   fog  330000.0
    4   pet  330000.0

Apply Lambda:

    df["TrueFalse"] = df['col1'].apply(lambda x: 1 if any(i in x for i in searchfor) else 0)

Output:

       col1  col2       TrueFalse
    0   cat  1000.0     1
    1   hat  2000000.0  1
    2   dog  1000.0     1
    3   fog  330000.0   1
    4   pet  330000.0   0
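One thing to watch for with this approach: if col1 contains missing values, the test i in x raises a TypeError, because the value is no longer a string. A possible variant is sketched below. The function name has_any and the extra None row are assumptions added for illustration. Since Python's in does a plain substring test rather than a regex match, special characters need no escaping here.

```python
import pandas as pd

searchfor = ["og", "at"]

# Same col1 values as above, plus one missing value (added for illustration).
df = pd.DataFrame({"col1": ["cat", "hat", "dog", "fog", "pet", None]})


def has_any(value, needles=searchfor):
    """Hypothetical helper: True if value is a string containing any needle."""
    # Non-string values (None, NaN) are treated as "no match".
    return isinstance(value, str) and any(n in value for n in needles)


# apply() yields booleans; astype(int) gives the 1/0 flags used above.
df["TrueFalse"] = df["col1"].apply(has_any).astype(int)

print(df)
```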

User · Answer

You can use str.contains alone with a regex pattern using OR (|):

    s[s.str.contains('og|at')]

Or you could add the series to a dataframe then use str.contains:

    df = pd.DataFrame(s)
    df[s.str.contains('og|at')]

Output:

    0    cat
    1    hat
    2    dog
    3    fog
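If the data has mixed case or missing values, str.contains accepts the case and na parameters, which can help. A small sketch follows; the sample series is invented for illustration.

```python
import pandas as pd

# Mixed-case values and one missing entry (made up for this example).
s = pd.Series(["Cat", "hat", "DOG", "fog", "pet", None])

# case=False ignores letter case; na=False treats missing values as
# "no match", so the result is a clean boolean mask usable for indexing.
mask = s.str.contains("og|at", case=False, na=False)

print(s[mask])
```

Without na=False, the missing entry would carry through into the mask as a missing value, and indexing with such a mask can raise an error.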
