pandas loc vs. iloc vs. at vs. iat?

Question

I recently began branching out from my safe place (R) into Python and am a bit confused by cell localization/selection in pandas. I've read the documentation, but I'm struggling to understand the practical implications of the various localization/selection options. Is there a reason why I should ever use .loc or .iloc over .at and .iat, or vice versa? In what situations should I use which method?

Note: future readers, be aware that this question is old and was written before pandas v0.20, when there used to exist a function called .ix. This method was later split into two, loc and iloc, to make the distinction between positional and label-based indexing explicit. Please beware that ix was discontinued due to inconsistent behavior and being hard to grok, and no longer exists in current versions of pandas (>= 1.0).

User · Answer

Let's start with this small df:

    import pandas as pd
    import time as tm
    import numpy as np
    n = 10
    a = np.arange(0, n**2)
    df = pd.DataFrame(a.reshape(n, n))

So we have:

    df
    Out[25]:
        0   1   2   3   4   5   6   7   8   9
    0   0   1   2   3   4   5   6   7   8   9
    1  10  11  12  13  14  15  16  17  18  19
    2  20  21  22  23  24  25  26  27  28  29
    3  30  31  32  33  34  35  36  37  38  39
    4  40  41  42  43  44  45  46  47  48  49
    5  50  51  52  53  54  55  56  57  58  59
    6  60  61  62  63  64  65  66  67  68  69
    7  70  71  72  73  74  75  76  77  78  79
    8  80  81  82  83  84  85  86  87  88  89
    9  90  91  92  93  94  95  96  97  98  99

With this we have:

    df.iloc[3, 3]
    Out[33]: 33

    df.iat[3, 3]
    Out[34]: 33

    df.iloc[:3, :3]
    Out[35]:
        0   1   2
    0   0   1   2
    1  10  11  12
    2  20  21  22

    df.iat[:3, :3]
    Traceback (most recent call last):
        ... [omitted] ...
    ValueError: At based indexing on an integer index can only have integer indexers

Thus we cannot use .iat for a subset; there we must use .iloc.

But let's try both to select from a larger df and check the speed:

    # -*- coding: utf-8 -*-
    """
    Created on Wed Feb  7 09:58:39 2018

    @author: Fabio Pomi
    """

    import pandas as pd
    import time as tm
    import numpy as np
    n = 1000
    a = np.arange(0, n**2)
    df = pd.DataFrame(a.reshape(n, n))
    t1 = tm.time()
    for j in df.index:
        for i in df.columns:
            a = df.iloc[j, i]
    t2 = tm.time()
    for j in df.index:
        for i in df.columns:
            a = df.iat[j, i]
    t3 = tm.time()
    # the names "loc" and "at" below hold the .iloc and .iat timings
    loc = t2 - t1
    at = t3 - t2
    prc = loc / at * 100
    print('\nloc:%f at:%f prc:%f' % (loc, at, prc))

    loc:10.485600 at:7.395423 prc:141.784987

So with .iloc we can manage subsets and with .iat only a single scalar, but .iat is faster than .iloc.
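The comparison above can be repeated with a small, hedged sketch that uses `timeit` instead of manual timestamps. The helper name `read_all` and the reduced size `n = 100` are illustrative choices, not part of the original answer, and the timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 100  # smaller than the 1000 used above so the sketch runs quickly
df = pd.DataFrame(np.arange(n ** 2).reshape(n, n))


def read_all(accessor):
    """Read every cell one at a time through a positional accessor (hypothetical helper)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += accessor[i, j]
    return total


# Both accessors return the same scalars, so the sums must agree.
sum_iloc = read_all(df.iloc)
sum_iat = read_all(df.iat)

t_iloc = timeit.timeit(lambda: read_all(df.iloc), number=1)
t_iat = timeit.timeit(lambda: read_all(df.iat), number=1)
print(f"iloc: {t_iloc:.3f}s  iat: {t_iat:.3f}s")

# .iloc accepts slices and returns a sub-DataFrame.
subset = df.iloc[:3, :3]

# .iat only accepts one integer position per axis; a slice is rejected.
try:
    df.iat[:3, :3]
    iat_accepts_slice = True
except (ValueError, TypeError, KeyError):
    iat_accepts_slice = False
```

The sketch only checks that both accessors agree on values and that `.iat` refuses slices; which one is faster, and by how much, is left to the printed timings.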

User · Answer

There are two primary ways that pandas makes selections from a DataFrame:

- By Label
- By Integer Location

The documentation uses the term position for referring to integer location. I do not like this terminology as I feel it is confusing. Integer location is more descriptive and is exactly what .iloc stands for. The key word here is INTEGER: you must use integers when selecting by integer location.

Before showing the summary, let's all make sure that .ix is deprecated and ambiguous and should never be used.

There are three primary indexers for pandas. We have the indexing operator itself (the brackets []), .loc, and .iloc. Let's summarize them:

- [] primarily selects subsets of columns, but can select rows as well. It cannot simultaneously select rows and columns.
- .loc selects subsets of rows and columns by label only.
- .iloc selects subsets of rows and columns by integer location only.

I almost never use .at or .iat, as they add no additional functionality and give just a small performance increase. I would discourage their use unless you have a very time-sensitive application. Regardless, here is their summary:

- .at selects a single scalar value in the DataFrame by label only.
- .iat selects a single scalar value in the DataFrame by integer location only.

In addition to selection by label and integer location, boolean selection, also known as boolean indexing, exists.

Examples explaining .loc, .iloc, boolean selection, and .at and .iat are shown below.

We will first focus on the differences between .loc and .iloc. Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each row. Let's take a look at a sample DataFrame:

    df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                       'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                       'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                       'height': [165, 70, 120, 80, 180, 172, 150],
                       'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                       'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                       },
                      index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

When the DataFrame is displayed, the words in bold are the labels. The labels age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina and Cornelia, are used as labels for the rows. Collectively, these row labels are known as the index.

The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follows its name to make its selections.

.loc selects data only by labels

We will first talk about the .loc indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length (number of rows) of the DataFrame.

There are many different inputs you can use for .loc. Three of them are:

- A string
- A list of strings
- Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.

    df.loc['Penelope']

This returns the row of data as a Series:

    age           4
    color     white
    food      Apple
    height       80
    score       3.3
    state        AL
    Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

    df.loc[['Cornelia', 'Jane', 'Dean']]

This returns a DataFrame with the rows in the order specified in the list.

Selecting multiple rows with .loc with slice notation

Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined but defaults to 1.

    df.loc['Aaron':'Dean']

Complex slices can be taken in the same manner as Python lists.

.iloc selects data only by integer location

Let's now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left, beginning at 0.

There are many different inputs you can use for .iloc. Three of them are:

- An integer
- A list of integers
- Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

    df.iloc[4]

This returns the 5th row (integer location 4) as a Series:

    age           32
    color       gray
    food      Cheese
    height       180
    score        1.8
    state         AK
    Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

    df.iloc[[2, -2]]

This returns a DataFrame of the third and second-to-last rows.

Selecting multiple rows with .iloc with slice notation

    df.iloc[:5:3]

Simultaneous selection of rows and columns with .loc and .iloc

One excellent ability of both .loc and .iloc is that they can select both rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

For example, we can select rows Jane and Dean with just the columns height, score and state like this:

    df.loc[['Jane', 'Dean'], 'height':]

This uses a list of labels for the rows and slice notation for the columns.

We can naturally do similar operations with .iloc using only integers.

    df.iloc[[1, 4], 2]

    Nick      Lamb
    Dean    Cheese
    Name: food, dtype: object

Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer location, which was useful but confusing and ambiguous at times, and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to convert so that both selections are labels or both are integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

    col_names = df.columns[[2, 4]]
    df.loc[['Nick', 'Cornelia'], col_names]

Or alternatively, convert the index labels to integers with the get_loc index method.

    labels = ['Nick', 'Cornelia']
    index_ints = [df.index.get_loc(label) for label in labels]
    df.iloc[index_ints, [2, 4]]

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:

    df.loc[df['age'] > 30, ['food', 'score']]

You can replicate this with .iloc, but you cannot pass it a boolean Series. You must convert the boolean Series into a numpy array like this:

    df.iloc[(df['age'] > 30).values, [2, 4]]

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

    df.loc[:, 'color':'score':2]

The indexing operator, [], can select rows and columns too, but not simultaneously

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

    df['food']

    Jane          Steak
    Nick           Lamb
    Aaron         Mango
    Penelope      Apple
    Dean         Cheese
    Christina     Melon
    Cornelia      Beans
    Name: food, dtype: object

Using a list selects multiple columns:

    df[['food', 'score']]

What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.

    df['Penelope':'Christina']  # slice rows by label

    df[2:6:2]  # slice rows by integer location

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

    df[3:5, 'color']
    TypeError: unhashable type: 'slice'

Selection by .at and .iat

Selection with .at is nearly identical to .loc, but it only selects a single "cell" in your DataFrame. We usually refer to this cell as a scalar value. To use .at, pass it both a row and column label separated by a comma.

    df.at['Christina', 'color']
    'black'

Selection with .iat is nearly identical to .iloc, but it only selects a single scalar value. You must pass it an integer for both the row and column locations.

    df.iat[2, 5]
    'FL'
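As a compact recap of the rules above, here is a minimal sketch that rebuilds the sample DataFrame from this answer and exercises each selection style once. The intermediate variable names (`by_label`, `over_30`, `row_positions`, and so on) are illustrative and not from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "age": [30, 2, 12, 4, 32, 33, 69],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
    },
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# Label slices include the stop label; integer slices exclude the stop position.
by_label = df.loc["Aaron":"Dean"]      # Aaron, Penelope, Dean
by_position = df.iloc[2:4]             # Aaron, Penelope

# Boolean selection: .loc takes the boolean Series directly,
# while .iloc needs the underlying NumPy array.
over_30 = df["age"] > 30
loc_result = df.loc[over_30, ["food", "score"]]
iloc_result = df.iloc[over_30.to_numpy(), [2, 4]]

# Mixed label/position selection: translate one side so both agree.
row_positions = [df.index.get_loc(label) for label in ["Nick", "Cornelia"]]
mixed = df.iloc[row_positions, [2, 4]]

# Scalar access by label and by integer location.
color = df.at["Christina", "color"]
state = df.iat[2, 5]
```

The sketch uses `.to_numpy()` where the answer uses `.values`; both hand `.iloc` a plain boolean array.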

User · Answer

Updated for pandas 0.20, given that ix is deprecated.

This demonstrates not only how to use loc, iloc, at, iat and set_value, but also how to accomplish mixed positional/label-based indexing.

loc - label based
Allows you to pass 1-D arrays as indexers. Arrays can be either slices (subsets) of the index or columns, or they can be boolean arrays which are equal in length to the index or columns.

Special note: when a scalar indexer is passed, loc can assign a new index or column value that didn't exist before.

    # label based, but we can use position values
    # to get the labels from the index object
    df.loc[df.index[2], 'ColName'] = 3

    df.loc[df.index[1:3], 'ColName'] = 3

iloc - position based
Similar to loc, except with positions rather than index values. However, you cannot assign new columns or indices.

    # position based, but we can get the position
    # from the columns object via the `get_loc` method
    df.iloc[2, df.columns.get_loc('ColName')] = 3

    df.iloc[2, 4] = 3

    df.iloc[:3, 2:4] = 3

at - label based
Works very similarly to loc for scalar indexers. Cannot operate on array indexers. Can assign new indices and columns.

Advantage over loc: it is faster.
Disadvantage: you can't use arrays for indexers.

    # label based, but we can use position values
    # to get the labels from the index object
    df.at[df.index[2], 'ColName'] = 3

    df.at['C', 'ColName'] = 3

iat - position based
Works similarly to iloc. Cannot work with array indexers. Cannot assign new indices and columns.

Advantage over iloc: it is faster.
Disadvantage: you can't use arrays for indexers.

    # position based, but we can get the position
    # from the columns object via the `get_loc` method
    df.iat[2, df.columns.get_loc('ColName')] = 3

set_value - label based
Works very similarly to loc for scalar indexers. Cannot operate on array indexers. Can assign new indices and columns. Note that set_value was deprecated in pandas 0.21 and removed in 1.0; .at and .iat are the replacements.

Advantage: super fast, because there is very little overhead.
Disadvantage: there is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.

    # label based, but we can use position values
    # to get the labels from the index object
    df.set_value(df.index[2], 'ColName', 3)

set_value with takable=True - position based
Works similarly to iloc. Cannot work with array indexers. Cannot assign new indices and columns.

Advantage: super fast, because there is very little overhead.
Disadvantage: there is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.

    # position based, but we can get the position
    # from the columns object via the `get_loc` method
    df.set_value(2, df.columns.get_loc('ColName'), 3, takable=True)
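The enlargement rules described above (label-based setters can add new rows or columns, position-based setters cannot) can be checked with a short sketch. The frame contents, the `Other` column and the helper `succeeds` are hypothetical names made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"ColName": [0, 0, 0]}, index=["A", "B", "C"])

# Label-based setters can enlarge the frame with a label that did not exist.
df.loc["D", "ColName"] = 3   # new row via .loc
df.at["A", "Other"] = 1.5    # new column via .at


def succeeds(setter):
    """Return True if the setter runs without an indexing error (hypothetical helper)."""
    try:
        setter()
    except (IndexError, KeyError, ValueError):
        return False
    return True


def set_with_iloc():
    # Position 10 is outside the 4-row frame.
    df.iloc[10, 0] = 3


def set_with_iat():
    df.iat[10, 0] = 3


# Position-based setters refuse to write outside the existing shape.
iloc_enlarged = succeeds(set_with_iloc)
iat_enlarged = succeeds(set_with_iat)
```

After running this, the frame has the extra row `D` and the extra column `Other`, while both out-of-bounds positional writes are rejected and leave the shape unchanged.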

User · Answer

    df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [54, 67, 89]}, index=[100, 200, 300])

    df
         A   B
    100  a  54
    200  b  67
    300  c  89

    In [19]:
    df.loc[100]

    Out[19]:
    A     a
    B    54
    Name: 100, dtype: object

    In [20]:
    df.iloc[0]

    Out[20]:
    A     a
    B    54
    Name: 100, dtype: object

    In [24]:
    df2 = df.set_index([df.index, 'A'])
    df2

    Out[24]:
            B
        A
    100 a  54
    200 b  67
    300 c  89

    In [25]:
    # originally df2.ix[100, 'a']; .ix no longer exists in pandas >= 1.0
    df2.loc[(100, 'a')]

    Out[25]:
    B    54
    Name: (100, a), dtype: int64
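The point of the integer index 100, 200, 300 in this answer is that labels and positions no longer coincide. A minimal sketch of what that implies, using the same frame; the flag names `label_zero_found` and `position_100_found` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"], "B": [54, 67, 89]}, index=[100, 200, 300])

# Label 100 and integer location 0 refer to the same row.
first_by_label = df.loc[100]
first_by_position = df.iloc[0]

# Swapping them fails: there is no label 0 and no position 100.
try:
    df.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

try:
    df.iloc[100]
    position_100_found = True
except IndexError:
    position_100_found = False

# With a two-level index, .loc takes a tuple of labels for the row.
df2 = df.set_index([df.index, "A"])
value = df2.loc[(100, "a"), "B"]
```

Here `.loc` with the tuple `(100, "a")` plays the role that `.ix` played in older pandas versions for this lookup.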

User · Answer

loc: only works on labels (index)
iloc: works on position
at: gets scalar values. It's a very fast loc
iat: gets scalar values. It's a very fast iloc

Also, at and iat are meant to access a scalar, that is, a single element in the dataframe, while loc and iloc are meant to access several elements at the same time, potentially to perform vectorized operations.

http://pyciencia.blogspot.com/2015/05/obtener-y-filtrar-datos-de-un-dataframe.html
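To illustrate the split between vectorized and scalar access, here is a small sketch. The frame and its `price` and `qty` columns are invented for the example and do not come from the answer.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the two access styles.
df = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]},
    index=["x", "y", "z"],
)

# Vectorized: .loc selects every matching row at once and updates them together.
df.loc[df["qty"] >= 2, "price"] *= 2

# Scalar: .at and .iat read or write exactly one cell.
df.at["x", "price"] = 11.0
last_qty = df.iat[2, 1]
```

The `.loc` line touches two rows in a single statement, whereas `.at` and `.iat` each address one cell by label or by integer location.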
