How to select rows in a DataFrame between two values in Python Pandas

Question

I am trying to modify a DataFrame df to only contain rows for which the values in the column closing_price are between 99 and 101, and I am trying to do this with the code below. However, I get the error

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

and I am wondering if there is a way to do this without using loops.

    df = df[(99 <= df['closing_price'] <= 101)]

User · Accepted Answer

You should use () to group your boolean vector to remove ambiguity.

    df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]
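A minimal self-contained sketch of this approach, using a small made-up DataFrame (the prices are illustrative only):

```python
import pandas as pd

# Small illustrative DataFrame; the prices are made up.
df = pd.DataFrame({"closing_price": [97, 99, 100, 101, 103]})

# Each comparison yields a boolean Series; the parentheses make sure both
# comparisons are evaluated before they are combined with the element-wise &.
mask = (df["closing_price"] >= 99) & (df["closing_price"] <= 101)
filtered = df[mask]

print(filtered["closing_price"].tolist())  # [99, 100, 101]
```

The original row labels (1, 2, 3) are kept in filtered; call reset_index(drop=True) if fresh labels are wanted.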

User · Answer

You can also use the .between() method:

    emp = pd.read_csv("C:\\py\\programs\\pandas_2\\pandas\\employees.csv")
    emp[emp["Salary"].between(60000, 61000)]

User · Answer

There is a nicer alternative: use the query() method:

    In [58]: df = pd.DataFrame({'closing_price': np.random.randint(95, 105, 10)})

    In [59]: df
    Out[59]:
       closing_price
    0            104
    1             99
    2             98
    3             95
    4            103
    5            101
    6            101
    7             99
    8             95
    9             96

    In [60]: df.query('99 <= closing_price <= 101')
    Out[60]:
       closing_price
    1             99
    5            101
    6            101
    7             99

UPDATE: answering the comment "I like the syntax here but fell down when trying to combine with an expression: df.query('(mean + 2 * sd) <= closing_price <= (mean + 2 * sd)')"

    In [161]: qry = "(closing_price.mean() - 2*closing_price.std())" +\
         ...:       " <= closing_price <= " + \
         ...:       "(closing_price.mean() + 2*closing_price.std())"
         ...:

    In [162]: df.query(qry)
    Out[162]:
       closing_price
    0             97
    1            101
    2             97
    3             95
    4            100
    5             99
    6            100
    7            101
    8             99
    9             95

User · Answer

    newdf = df.query('closing_price.mean() <= closing_price <= closing_price.std()')

or

    mean = df['closing_price'].mean()
    std = df['closing_price'].std()

    newdf = df.query('@mean <= closing_price <= @std')
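If the goal is the mean ± 2 standard deviations band from the comment on the query() answer, a sketch along these lines computes the bounds from the column first and references them with @ inside the query string. The random data and the names lower and upper are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Illustrative data only.
rng = np.random.default_rng(0)
df = pd.DataFrame({"closing_price": rng.integers(95, 105, 10)})

# Compute the bounds once, outside the query string, from the DataFrame column.
mean = df["closing_price"].mean()
std = df["closing_price"].std()
lower, upper = mean - 2 * std, mean + 2 * std

# Inside query(), the @ prefix refers to local Python variables.
within = df.query("@lower <= closing_price <= @upper")

# Same rows as the equivalent between() mask.
assert within.equals(df[df["closing_price"].between(lower, upper)])
```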

User · Answer

Instead of this:

    df = df[(99 <= df['closing_price'] <= 101)]

You should use this:

    df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]

We have to use NumPy's bitwise logic operators |, &, ~, ^ for compounding queries. Also, the parentheses are important for operator precedence.

For more info, see Comparisons, Masks, and Boolean Logic.
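To see why the parentheses matter, here is a small sketch with a made-up Series. Because & binds more tightly than >= and <=, the unparenthesized version parses as a chained comparison, which calls bool() on a Series and raises the same ValueError as in the question:

```python
import pandas as pd

s = pd.Series([97, 99, 100, 101, 103], name="closing_price")

# Without parentheses this parses as s >= (99 & s) <= 101, a chained
# comparison that needs bool() of a Series and therefore raises ValueError.
error_message = ""
try:
    s[s >= 99 & s <= 101]
except ValueError as exc:
    error_message = str(exc)

# With parentheses, each comparison is evaluated first and then combined.
result = s[(s >= 99) & (s <= 101)]
print(result.tolist())  # [99, 100, 101]
```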

User · Answer

Consider also Series.between():

    df = df[df['closing_price'].between(99, 101)]
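Series.between() includes both bounds by default. In pandas 1.3 and later the inclusive parameter also accepts "both", "neither", "left" or "right" (older versions only take True/False), as this small sketch with made-up values shows:

```python
import pandas as pd

df = pd.DataFrame({"closing_price": [98, 99, 100, 101, 102]})
col = df["closing_price"]

both = df[col.between(99, 101)]                          # 99 <= x <= 101 (default)
neither = df[col.between(99, 101, inclusive="neither")]  # 99 < x < 101
left = df[col.between(99, 101, inclusive="left")]        # 99 <= x < 101

print(both["closing_price"].tolist())     # [99, 100, 101]
print(neither["closing_price"].tolist())  # [100]
print(left["closing_price"].tolist())     # [99, 100]
```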

User · Answer

If you're dealing with multiple values and multiple inputs, you could also set up an apply function like this. In this case, it filters a DataFrame for GPS locations that fall within certain ranges:

    def filter_values(lat, lon):
        if abs(lat - 33.77) < .01 and abs(lon - -118.16) < .01:
            return True
        elif abs(lat - 37.79) < .01 and abs(lon - -122.39) < .01:
            return True
        else:
            return False


    df = df[df.apply(lambda x: filter_values(x['lat'], x['lon']), axis=1)]
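A row-wise apply() calls the Python function once per row, which can get slow on large frames. A vectorized alternative (a different technique from the apply() answer, sketched here with made-up coordinates and the same hard-coded centers and 0.01 tolerance) builds one boolean mask per target location and combines them with |:

```python
import pandas as pd

# Made-up coordinates for illustration.
df = pd.DataFrame({
    "lat": [33.775, 37.79, 40.0, 33.90],
    "lon": [-118.165, -122.385, -74.0, -118.16],
})

# Each target is (center_lat, center_lon); the tolerance mirrors filter_values().
targets = [(33.77, -118.16), (37.79, -122.39)]
tol = 0.01

# Start with "no row matches" and OR in the rows near each target.
mask = pd.Series(False, index=df.index)
for lat0, lon0 in targets:
    near = ((df["lat"] - lat0).abs() < tol) & ((df["lon"] - lon0).abs() < tol)
    mask |= near

filtered = df[mask]
print(filtered.index.tolist())  # [0, 1]
```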

User · Answer

If one has to call pd.Series.between(l, r) repeatedly (for different bounds l and r), a lot of work is repeated unnecessarily. In this case, it's beneficial to sort the frame/series once and then use pd.Series.searchsorted(). I measured a speedup of up to 25x, see below.

    def between_indices(x, lower, upper, inclusive=True):
        """
        Returns indices i, j such that x.iloc[i:j] holds exactly the values
        with lower <= value <= upper (strict inequalities if inclusive=False),
        under the assumption that x is sorted.
        """
        i = x.searchsorted(lower, side="left" if inclusive else "right")
        j = x.searchsorted(upper, side="right" if inclusive else "left")
        return i, j

    # Sort x once before repeated calls of between()
    x = x.sort_values().reset_index(drop=True)
    # x = x.sort_values(ignore_index=True)  # for pandas >= 1.0
    ret1 = between_indices(x, lower=0.1, upper=0.9)
    ret2 = between_indices(x, lower=0.2, upper=0.8)
    ret3 = ...

Benchmark

Measure repeated evaluations (n_reps=100) of pd.Series.between() as well as the method based on pd.Series.searchsorted(), for different arguments lower and upper. On my MacBook Pro 2015 with Python v3.8.0 and Pandas v1.0.3, the code below results in the following output:

    # pd.Series.searchsorted()
    # 5.87 ms ± 321 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    # pd.Series.between(lower, upper)
    # 155 ms ± 6.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    # Logical expressions: (x>=lower) & (x<=upper)
    # 153 ms ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Code:

    import numpy as np
    import pandas as pd

    def between_indices(x, lower, upper, inclusive=True):
        # Assumption: x is sorted.
        i = x.searchsorted(lower, side="left" if inclusive else "right")
        j = x.searchsorted(upper, side="right" if inclusive else "left")
        return i, j

    def between_fast(x, lower, upper, inclusive=True):
        """
        Equivalent to pd.Series.between() under the assumption that x is sorted.
        """
        i, j = between_indices(x, lower, upper, inclusive)
        if True:
            return x.iloc[i:j]
        else:
            # Mask creation is slow.
            mask = np.zeros_like(x, dtype=bool)
            mask[i:j] = True
            mask = pd.Series(mask, index=x.index)
            return x[mask]

    def between(x, lower, upper, inclusive=True):
        # pandas >= 1.3 expects "both"/"neither" here (booleans were removed in
        # pandas 2.0); older versions such as 1.0.3 take True/False directly.
        mask = x.between(lower, upper, inclusive="both" if inclusive else "neither")
        return x[mask]

    def between_expr(x, lower, upper, inclusive=True):
        if inclusive:
            mask = (x >= lower) & (x <= upper)
        else:
            mask = (x > lower) & (x < upper)
        return x[mask]

    def benchmark(func, x, lowers, uppers):
        for l, u in zip(lowers, uppers):
            func(x, lower=l, upper=u)

    n_samples = 1000
    n_reps = 100
    x = pd.Series(np.random.randn(n_samples))
    # Sort the Series.
    # For pandas >= 1.0:
    # x = x.sort_values(ignore_index=True)
    x = x.sort_values().reset_index(drop=True)

    # Assert equivalence of different methods.
    assert between_fast(x, 0, 1, True).equals(between(x, 0, 1, True))
    assert between_expr(x, 0, 1, True).equals(between(x, 0, 1, True))
    assert between_fast(x, 0, 1, False).equals(between(x, 0, 1, False))
    assert between_expr(x, 0, 1, False).equals(between(x, 0, 1, False))

    # Benchmark repeated evaluations of between().
    uppers = np.linspace(0, 3, n_reps)
    lowers = -uppers
    # %timeit is an IPython/Jupyter magic.
    %timeit benchmark(between_fast, x, lowers, uppers)
    %timeit benchmark(between, x, lowers, uppers)
    %timeit benchmark(between_expr, x, lowers, uppers)
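The benchmark works on a single Series. To pull whole DataFrame rows with the same idea, one option (a sketch, not part of the original answer; rows_between is a hypothetical helper name and the data is random) is to sort the frame by the column once and then slice the positions found with searchsorted on that sorted column:

```python
import numpy as np
import pandas as pd


def rows_between(sorted_df, column, lower, upper):
    """Hypothetical helper: rows with lower <= sorted_df[column] <= upper.

    Assumes sorted_df is already sorted ascending by `column`.
    """
    values = sorted_df[column]
    i = values.searchsorted(lower, side="left")
    j = values.searchsorted(upper, side="right")
    return sorted_df.iloc[i:j]


rng = np.random.default_rng(42)
df = pd.DataFrame({
    "closing_price": rng.integers(90, 110, 1000),
    "volume": rng.integers(1, 500, 1000),
})

# Sort once, then reuse the sorted frame for many different bounds.
df_sorted = df.sort_values("closing_price")

subset = rows_between(df_sorted, "closing_price", 99, 101)

# Same rows as the boolean-mask approach, up to row order.
expected = df[df["closing_price"].between(99, 101)]
assert subset.sort_index().equals(expected)
```

Sorting costs O(n log n) once; each later lookup is two binary searches plus a positional slice, which is where the gain for repeated queries comes from.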
