Filter dataframe rows if value in column is in a set list of values

Question

I have a Python pandas DataFrame rpt   rpt  lt class  pandas core frame DataFrame  gt  MultiIndex  47518 entries    000002    20120331   to   603366    20091231   Data columns  STK ID                    47518  non-null values STK Name                  47518  non-null values RPT Date                  47518  non-null values sales                     47518  non-null values   I can filter the rows whose stock id is  600809  like this  rpt rpt  STK ID       600809     lt class  pandas core frame DataFrame  gt  MultiIndex  25 entries    600809    20120331   to   600809    20060331   Data columns  STK ID                    25  non-null values STK Name                  25  non-null values RPT Date                  25  non-null values sales                     25  non-null values   and I want to get all the rows of some stocks together  such as   600809   600141   600329    That means I want a syntax like this    stk list     600809   600141   600329    rst   rpt rpt  STK ID   in stk list    this does not works in pandas    Since pandas not accept above command  how to achieve the target

User · Accepted Answer

Use the isin method    rpt rpt  STK ID   isin stk list

User · Answer

You can also achieve similar results by using  query  and     eg   df   pd DataFrame   A    1  2  3    B     a    b    f     df   pd DataFrame   A     5 6 3 4    B     1 2 3  5    list of values    3 6  result  df query  A in  list of values   result    A  B 1  6  2 2  3  3

User · Answer

You can use query  i e    b   df query  a  gt  1  amp  a  lt  5

User · Answer

Slicing data with pandas Given a dataframe like this      RPT Date  STK ID STK Name  sales 0 1980-01-01       0   Arthur      0 1 1980-01-02       1    Beate      4 2 1980-01-03       2    Cecil      2 3 1980-01-04       3     Dana      8 4 1980-01-05       4     Eric      4 5 1980-01-06       5    Fidel      5 6 1980-01-07       6   George      4 7 1980-01-08       7     Hans      7 8 1980-01-09       8   Ingrid      7 9 1980-01-10       9    Jones      4  There are multiple ways of selecting or slicing the data  Using  isin The most obvious is the  isin feature  You can create a mask that gives you a series of True False statements  which can be applied to a dataframe like this  mask   df  STK ID   isin  4  2  6    mask 0    False 1    False 2     True 3    False 4     True 5    False 6     True 7    False 8    False 9    False Name  STK ID  dtype  bool  df mask      RPT Date  STK ID STK Name  sales 2 1980-01-03       2    Cecil      2 4 1980-01-05       4     Eric      4 6 1980-01-07       6   George      4  Masking is the ad-hoc solution to the problem  but does not always perform well in terms of speed and memory  With indexing By setting the index to the STK ID column  we can use the pandas builtin slicing object  loc df set index  STK ID   inplace True           RPT Date STK Name  sales STK ID                            0      1980-01-01   Arthur      0 1      1980-01-02    Beate      4 2      1980-01-03    Cecil      2 3      1980-01-04     Dana      8 4      1980-01-05     Eric      4 5      1980-01-06    Fidel      5 6      1980-01-07   George      4 7      1980-01-08     Hans      7 8      1980-01-09   Ingrid      7 9      1980-01-10    Jones      4  df loc  4  2  6            RPT Date STK Name  sales STK ID                            4      1980-01-05     Eric      4 2      1980-01-03    Cecil      2 6      1980-01-07   George      4  This is the fast way of doing it  even if the indexing can take a little while  it saves time if you want to do multiple queries like this  Merging dataframes This can also be done by merging dataframes  This would fit more for a scenario where you have a lot more data than in these examples  stkid df   pd DataFrame   quot STK ID quot    4 2 6    df merge stkid df  on  STK ID      STK ID   RPT Date STK Name  sales 0       2 1980-01-03    Cecil      2 1       4 1980-01-05     Eric      4 2       6 1980-01-07   George      4  Note All the above methods work even if there are multiple rows with the same  STK ID

User · Answer

You can also directly query your DataFrame for this information   rpt query  STK ID in  600809 600141 600329      Or similarly search for ranges   rpt query  60000  lt  STK ID  lt  70000

User · Answer

isin   is ideal if you have a list of exact matches  but if you have a list of partial matches or substrings to look for  you can filter using the str contains method and regular expressions   For example  if we want to return a DataFrame where all of the stock IDs which begin with  600  and then are followed by any three digits    gt  gt  gt  rpt rpt  STK ID   str contains r  600 0-9  3          means start of string       STK ID                                             0-9  3  means any three digits       600809                                              means end of string       600141             600329                            Suppose now we have a list of strings which we want the values in  STK ID  to end with  e g   endstrings     01     02     05      We can join these strings with the regex  or  character   and pass the string to str contains to filter the DataFrame     gt  gt  gt  rpt rpt  STK ID   str contains     join endstrings         STK ID             155905             633101             210302                            Finally  contains can ignore case  by setting case False   allowing you to be more general when specifying the strings you want to match   For example   str contains  pandas   case False    would match PANDAS  PanDAs  paNdAs123  and so on

User · Answer

you can also use ranges by using   b   df  df  a    gt  1   amp   df  a    lt  5

[python] Filter dataframe rows if value in column is in a set list of values

Examples related to python

Examples related to pandas

Examples related to dataframe