Selection with .loc in Python

Question

I saw this code in someone's IPython notebook, and I'm very confused as to how it works. As far as I understood, pd.loc[] is used as a location-based indexer where the format is:

    df.loc[index, column_name]

However, in this case, the first index seems to be a series of boolean values. Could someone please explain to me how this selection works? I tried to read through the documentation but I couldn't figure out an explanation. Thanks!

    iris_data.loc[iris_data['class'] == 'versicolor', 'class'] = 'Iris-versicolor'

User · Accepted Answer

pd.DataFrame.loc can take one or two indexers. For the rest of the post, I'll represent the first indexer as i and the second indexer as j.

If only one indexer is provided, it applies to the index of the dataframe and the missing indexer is assumed to represent all columns. So the following two examples are equivalent:

    df.loc[i]
    df.loc[i, :]

where : is used to represent all columns.

If both indexers are present, i references index values and j references column values.

Now we can focus on what types of values i and j can assume. Let's use the following dataframe df as our example:

    df = pd.DataFrame([[1, 2], [3, 4]], index=['A', 'B'], columns=['X', 'Y'])

loc has been written such that i and j can be:

1. Scalars that should be values in the respective index objects.

    df.loc['A', 'Y']

    2

2. Arrays whose elements are also members of the respective index object (notice that the order of the array I pass to loc is respected).

    df.loc[['B', 'A'], 'X']

    B    3
    A    1
    Name: X, dtype: int64

Notice the dimensionality of the return object when passing arrays. When i is an array, as it was above, loc returns an object whose index holds those values. In this case, because j was a scalar, loc returned a pd.Series object. We could have made it return a dataframe by passing an array for both i and j, and the array for j could have been a single-element array.

    df.loc[['B', 'A'], ['X']]

       X
    B  3
    A  1

3. Boolean arrays whose elements are True or False and whose length matches the length of the respective index. In this case, loc simply grabs the rows (or columns) in which the boolean array is True.

    df.loc[[True, False], ['X']]

       X
    A  1

In addition to accepting these indexers, loc also lets you make assignments. Now we can break down the line of code you provided:

    iris_data.loc[iris_data['class'] == 'versicolor', 'class'] = 'Iris-versicolor'

1. iris_data['class'] == 'versicolor' returns a boolean array.
2. 'class' is a scalar that represents a value in the columns object.
3. iris_data.loc[iris_data['class'] == 'versicolor', 'class'] returns a pd.Series object consisting of the 'class' column for all rows where 'class' is 'versicolor'.
4. When used with an assignment operator,

    iris_data.loc[iris_data['class'] == 'versicolor', 'class'] = 'Iris-versicolor'

we assign 'Iris-versicolor' to all elements in column 'class' where 'class' was 'versicolor'.
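The breakdown of the assignment can be reproduced on a small stand-in frame. This is only a sketch: the notebook's real iris_data is not shown in the question, so the three-row frame below is a hypothetical substitute with just a 'class' column.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's iris_data (assumption: the real
# frame has more rows and columns; only 'class' matters for this line).
iris_data = pd.DataFrame({"class": ["versicolor", "Iris-setosa", "versicolor"]})

# Step 1: the comparison yields a boolean Series aligned with the row index.
mask = iris_data["class"] == "versicolor"

# Step 2: used as a selection, .loc returns the 'class' values of the rows
# where the mask is True (a Series, because the column indexer is a scalar).
selected = iris_data.loc[mask, "class"]

# Step 3: used as an assignment target, the same expression overwrites
# exactly those cells and leaves the other rows untouched.
iris_data.loc[mask, "class"] = "Iris-versicolor"

print(mask.tolist())                # [True, False, True]
print(selected.tolist())            # ['versicolor', 'versicolor']
print(iris_data["class"].tolist())  # ['Iris-versicolor', 'Iris-setosa', 'Iris-versicolor']
```

Naming the mask first is purely for readability; the one-line form in the question does the same thing.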

User · Answer

This is using dataframes from the pandas package. The "index" part can be either a single index, a list of indices, or a list of booleans. This is described in the documentation: https://pandas.pydata.org/pandas-docs/stable/indexing.html

So the index part specifies a subset of the rows to pull out, and the (optional) column_name specifies the column you want to work with from that subset of the dataframe. So if you want to update the 'class' column, but only in rows where the class is currently set to 'versicolor', you might do something like what you list in the question:

    iris_data.loc[iris_data['class'] == 'versicolor', 'class'] = 'Iris-versicolor'

User · Answer

It's pandas label-based selection, as explained here: https://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label

The boolean array is basically a selection method using a mask.

User · Answer

Whenever slicing (a:n) can be used, it can be replaced by fancy indexing (e.g. [a,b,c,...,n]). Fancy indexing is nothing more than explicitly listing all the index values instead of specifying only the limits.

Whenever fancy indexing can be used, it can be replaced by a list of Boolean values (a mask) the same size as the index. The value is True for index values that would have been included in the fancy index, and False for those that would have been excluded. It's another way of listing some index values, but one that is easy to automate in NumPy and Pandas, e.g. by a logical comparison (as in your case).

The second replacement is the one used in your example. In:

    iris_data.loc[iris_data['class'] == 'versicolor', 'class'] = 'Iris-versicolor'

the mask

    iris_data['class'] == 'versicolor'

is a replacement for a long and silly fancy index, which would be the list of row numbers where the class column (a Series) has the value versicolor.

Whether a Boolean mask appears within an .iloc or .loc indexer (e.g. df.loc[mask]) or directly as the index (e.g. df[mask]) depends on whether a slice is allowed as a direct index. Such cases are shown in the indexer cheat-sheet (image: Pandas indexers loc and iloc cheat-sheet).
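The chain of replacements described here (slice, then explicit list, then Boolean mask) can be checked on a toy frame. This is a minimal sketch; the frame, its labels, and the variable names are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical four-row frame with string labels as its index.
df = pd.DataFrame({"X": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# Slice: with .loc, a label slice includes both endpoints.
by_slice = df.loc["b":"c"]

# Fancy index: the same rows listed explicitly.
by_list = df.loc[["b", "c"]]

# Boolean mask: one True/False per row, same length as the index.
mask = [False, True, True, False]
by_mask = df.loc[mask]

# The mask is usually produced by a logical comparison rather than typed out.
by_comparison = df.loc[(df["X"] >= 20) & (df["X"] <= 30)]

# A mask is also accepted directly inside plain [] on a DataFrame.
by_plain = df[mask]

print(by_slice.equals(by_list), by_list.equals(by_mask))
print(by_mask.equals(by_comparison), by_mask.equals(by_plain))
```

All five selections return the rows labelled b and c, which is why the mask in the question can stand in for an explicit list of rows.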

User · Answer

It's a pandas DataFrame, and the code uses the label-based selection tool df.loc, which takes two inputs: one for the rows and one for the columns. The row input selects all rows where the value stored in the column class is versicolor, and the column input selects the column with the label class. The assignment then writes the value Iris-versicolor to those cells. So, basically, it replaces every cell of column class that holds versicolor with Iris-versicolor.
