How are iloc and loc different?

Question

Can someone explain how these two methods of slicing are different? I've seen the docs, and I've seen these answers, but I still find myself unable to understand how they differ. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that these two work?

    df.loc[:5]
    df.iloc[:5]

Can someone present three cases where the distinction in use is clearer?

Once upon a time, I also wanted to know how these two functions differ from df.ix[:5], but ix has been removed from pandas 1.0, so I don't care anymore.

User · Answer

In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead prefer integer location, as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER: .iloc needs INTEGERS.

See my extremely detailed blog series on subset selection for more.

.ix is deprecated and ambiguous and should never be used

Because .ix is deprecated (and, as the question notes, removed in pandas 1.0), we will only focus on the differences between .loc and .iloc.

Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each row. Let's take a look at a sample DataFrame:

    import pandas as pd

    df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                       'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                       'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                       'height': [165, 70, 120, 80, 180, 172, 150],
                       'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                       'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                       },
                      index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

When the DataFrame is displayed, all the words in bold are the labels. The labels age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina and Cornelia, are used for the index.

The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follows its name to make its selections.

.loc selects data only by labels

We will first talk about the .loc indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length of the DataFrame.

There are three different inputs you can use for .loc:

- A string
- A list of strings
- Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.

    df.loc['Penelope']

This returns the row of data as a Series:

    age           4
    color     white
    food      Apple
    height       80
    score       3.3
    state        AL
    Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

    df.loc[['Cornelia', 'Jane', 'Dean']]

This returns a DataFrame with the rows in the order specified in the list.

Selecting multiple rows with .loc with slice notation

Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined, so it defaults to 1.

    df.loc['Aaron':'Dean']

Complex slices can be taken in the same manner as with Python lists.

.iloc selects data only by integer location

Let's now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left, beginning at 0.

There are three different inputs you can use for .iloc:

- An integer
- A list of integers
- Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

    df.iloc[4]

This returns the 5th row (integer location 4) as a Series:

    age           32
    color       gray
    food      Cheese
    height       180
    score        1.8
    state         AK
    Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

    df.iloc[[2, -2]]

This returns a DataFrame of the third and second-to-last rows.

Selecting multiple rows with .iloc with slice notation

    df.iloc[:5:3]

This returns the rows at integer locations 0 and 3 (Jane and Penelope).

Simultaneous selection of rows and columns with .loc and .iloc

One excellent feature of both .loc and .iloc is their ability to select rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

For example, we can select rows Jane and Dean with just the columns height, score and state like this:

    df.loc[['Jane', 'Dean'], 'height':]

This uses a list of labels for the rows and slice notation for the columns.

We can naturally do similar operations with .iloc using only integers.

    df.iloc[[1, 4], 2]

    Nick      Lamb
    Dean    Cheese
    Name: food, dtype: object

Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer location, which was useful but confusing and ambiguous at times, and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to express both selections as labels or both as integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

    col_names = df.columns[[2, 4]]
    df.loc[['Nick', 'Cornelia'], col_names]

Or alternatively, convert the index labels to integers with the get_loc index method.

    labels = ['Nick', 'Cornelia']
    index_ints = [df.index.get_loc(label) for label in labels]
    df.iloc[index_ints, [2, 4]]

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:

    df.loc[df['age'] > 30, ['food', 'score']]

You can replicate this with .iloc, but you cannot pass it a boolean Series. You must convert the boolean Series into a numpy array like this:

    df.iloc[(df['age'] > 30).values, [2, 4]]

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

    df.loc[:, 'color':'score':2]

The indexing operator, [], can select rows and columns too but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

    df['food']

    Jane          Steak
    Nick           Lamb
    Aaron         Mango
    Penelope      Apple
    Dean         Cheese
    Christina     Melon
    Cornelia      Beans
    Name: food, dtype: object

Using a list selects multiple columns:

    df[['food', 'score']]

What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.

    df['Penelope':'Christina']  # slice rows by label

    df[2:6:2]  # slice rows by integer location

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

    df[3:5, 'color']
    TypeError: unhashable type: 'slice'

Newer pandas versions may raise a different exception here, but the selection fails either way.
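The slicing and boolean-selection rules described above can be checked with a short script. This is only a sketch that rebuilds the sample DataFrame from this answer; the variable names (by_label, by_position, older_loc, older_iloc) are made up for illustration, and `.to_numpy()` is used in place of `.values` as an equivalent way to get the boolean array.

```python
import pandas as pd

# Sample DataFrame from the answer above.
df = pd.DataFrame(
    {
        "age": [30, 2, 12, 4, 32, 33, 69],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
    },
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# Label slice: both endpoints are included.
by_label = df.loc["Aaron":"Dean"]

# Integer slice: positions 2, 3 and 4 (the stop position 5 is excluded).
by_position = df.iloc[2:5]

# Boolean selection, once by label and once by integer location.
older_loc = df.loc[df["age"] > 30, ["food", "score"]]
older_iloc = df.iloc[(df["age"] > 30).to_numpy(), [2, 4]]

print(list(by_label.index))          # ['Aaron', 'Penelope', 'Dean']
print(by_label.equals(by_position))  # True
print(list(older_loc.index))         # ['Dean', 'Christina', 'Cornelia']
print(older_loc.equals(older_iloc))  # True
```

The same three rows come back from the label slice and the integer slice only because the stop label is included while the stop position is not.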

User · Answer

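DataFrame.loc[]: select rows by index value (label).
DataFrame.iloc[]: select rows by row number (integer position).

Example:

Select the first 5 rows of a table (df1 is your dataframe):

    df1.iloc[:5]

Select the rows labeled A and B of a table (df1 is your dataframe):

    df1.loc[['A', 'B']]

Note the list inside the brackets: df1.loc['A', 'B'] would instead select row A and column B.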
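As a small illustration of the point above, the sketch below uses a hypothetical df1 whose row labels are A, B, C and whose column labels are A, B (the data values are invented). It shows that a list of labels selects rows, while two comma-separated labels are read as a row label and a column label.

```python
import pandas as pd

# Hypothetical frame: row labels A..C, column labels A..B.
df1 = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["A", "B", "C"],
)

rows_ab = df1.loc[["A", "B"]]  # list of labels -> rows 'A' and 'B'
cell_ab = df1.loc["A", "B"]    # row 'A', column 'B' -> a single value
first_two = df1.iloc[:2]       # first two rows by position

print(rows_ab.equals(first_two))  # True
print(cell_ab)                    # 4
```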

User · Answer

Label vs. Location

The main distinction between the two methods is:

- loc gets rows (and/or columns) with particular labels.
- iloc gets rows (and/or columns) at integer locations.

To demonstrate, consider a series s of characters with a non-monotonic integer index:

    >>> import pandas as pd
    >>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
    >>> s
    49    a
    48    b
    47    c
    0     d
    1     e
    2     f

    >>> s.loc[0]     # value at index label 0
    'd'

    >>> s.iloc[0]    # value at index location 0
    'a'

    >>> s.loc[0:1]   # rows at index labels between 0 and 1 (inclusive)
    0    d
    1    e

    >>> s.iloc[0:1]  # rows at index location between 0 and 1 (exclusive)
    49    a

Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

| Object (obj) | Description | s.loc[obj] | s.iloc[obj] |
|---|---|---|---|
| 0 | single item | Value at index label 0 (the string 'd') | Value at index location 0 (the string 'a') |
| 0:1 | slice | Two rows (labels 0 and 1) | One row (first row at location 0) |
| 1:47 | slice with out-of-bounds end | Zero rows (empty Series) | Five rows (location 1 onwards) |
| 1:47:-1 | slice with negative step | Three rows (labels 1 back to 47) | Zero rows (empty Series) |
| [2, 0] | integer list | Two rows with given labels | Two rows with given locations |
| s > 'e' | Bool series (indicating which values have the property) | One row (containing 'f') | NotImplementedError |
| (s > 'e').values | Bool array | One row (containing 'f') | Same as loc |
| 999 | int object not in index | KeyError | IndexError (out of bounds) |
| -1 | int object not in index | KeyError | Returns last value in s |
| lambda x: x.index[3] | callable applied to series (here returning the item at position 3 of the index) | s.loc[s.index[3]] | s.iloc[s.index[3]] |

loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples.

Here's a Series where the index contains string objects:

    >>> s2 = pd.Series(s.index, index=s.values)
    >>> s2
    a    49
    b    48
    c    47
    d     0
    e     1
    f     2

Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:

    >>> s2.loc['c':'e']  # all rows lying between 'c' and 'e' (inclusive)
    c    47
    d     0
    e     1

For DateTime indexes, we don't need to pass the exact date/time to fetch by label. For example:

    >>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M'))
    >>> s3
    2021-01-31 16:41:31.879768    a
    2021-02-28 16:41:31.879768    b
    2021-03-31 16:41:31.879768    c
    2021-04-30 16:41:31.879768    d
    2021-05-31 16:41:31.879768    e

Then to fetch the row(s) for March/April 2021 we only need:

    >>> s3.loc['2021-03':'2021-04']
    2021-03-31 16:41:31.879768    c
    2021-04-30 16:41:31.879768    d

Rows and Columns

loc and iloc work the same way with DataFrames as they do with Series. It's useful to note that both methods can address columns and rows together.

When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.

Consider the DataFrame defined below:

    >>> import numpy as np
    >>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
    ...                   index=list('abcde'),
    ...                   columns=['x', 'y', 'z', 8, 9])
    >>> df
        x   y   z   8   9
    a   0   1   2   3   4
    b   5   6   7   8   9
    c  10  11  12  13  14
    d  15  16  17  18  19
    e  20  21  22  23  24

Then for example:

    >>> df.loc['c':, :'z']  # rows 'c' and onwards AND columns up to 'z'
        x   y   z
    c  10  11  12
    d  15  16  17
    e  20  21  22

    >>> df.iloc[:, 3]       # all rows, but only the column at index location 3
    a     3
    b     8
    c    13
    d    18
    e    23

Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.

For example, take the same DataFrame df again. How best to slice the rows up to and including 'c' and take the first four columns?

We can achieve this result using iloc and the help of another method:

    >>> df.iloc[:df.index.get_loc('c') + 1, :4]
        x   y   z   8
    a   0   1   2   3
    b   5   6   7   8
    c  10  11  12  13

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
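For comparison, here is a sketch of two other ways to get the same mixed label/position selection. These alternatives are not from the answer itself and the variable names are made up. One converts the column positions to labels and uses loc, the other chains loc and iloc.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(25).reshape(5, 5),
    index=list("abcde"),
    columns=["x", "y", "z", 8, 9],
)

# Approach from the answer: translate the label to a position, then use iloc.
via_get_loc = df.iloc[: df.index.get_loc("c") + 1, :4]

# Alternative 1: translate positions to labels, then use loc.
via_labels = df.loc[:"c", df.columns[:4]]

# Alternative 2: chain the two indexers (fine for reading data,
# but best avoided when assigning values).
via_chain = df.loc[:"c"].iloc[:, :4]

print(via_get_loc.equals(via_labels))  # True
print(via_get_loc.equals(via_chain))   # True
```

The loc-based version needs no "+ 1" because label slices already include their endpoint.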

User · Answer

.loc and .iloc are used for indexing, i.e., to pull out portions of data. In essence, the difference is that .loc allows label-based indexing, while .iloc allows position-based indexing.

If you get confused by .loc and .iloc, keep in mind that .iloc is based on the index (starting with i) position, while .loc is based on the label (starting with l).

.loc

.loc is supposed to be based on the index labels and not the positions, so it is analogous to Python dictionary-based indexing. However, it can accept boolean arrays, slices, and a list of labels (none of which work with a Python dictionary).

.iloc

.iloc does the lookup based on index position, i.e., pandas behaves similarly to a Python list. pandas will raise an IndexError if there is no index at that location.

Examples

The following examples are presented to illustrate the differences between .iloc and .loc. Let's consider the following series:

    >>> s = pd.Series([11, 9], index=["1990", "1993"], name="Magic Numbers")
    >>> s
    1990    11
    1993     9
    Name: Magic Numbers, dtype: int64

.iloc Examples

    >>> s.iloc[0]
    11
    >>> s.iloc[-1]
    9
    >>> s.iloc[4]
    Traceback (most recent call last):
        ...
    IndexError: single positional indexer is out-of-bounds
    >>> s.iloc[0:3]  # slice
    1990    11
    1993     9
    Name: Magic Numbers, dtype: int64
    >>> s.iloc[[0, 1]]  # list
    1990    11
    1993     9
    Name: Magic Numbers, dtype: int64

.loc Examples

    >>> s.loc['1990']
    11
    >>> s.loc['1970']
    Traceback (most recent call last):
        ...
    KeyError: 'the label [1970] is not in the [index]'
    >>> mask = s > 9
    >>> s.loc[mask]
    1990    11
    Name: Magic Numbers, dtype: int64
    >>> s.loc['1990':]  # slice
    1990    11
    1993     9
    Name: Magic Numbers, dtype: int64

Because s has string index values, .loc will fail when indexing with an integer:

    >>> s.loc[0]
    Traceback (most recent call last):
        ...
    KeyError: 0
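The dictionary and list analogy above can be made concrete with a small sketch. The helper error_name is hypothetical and written only for this illustration. It reports which exception a lookup raises, so the pandas behavior can be put side by side with the built-in containers.

```python
import pandas as pd

s = pd.Series([11, 9], index=["1990", "1993"], name="Magic Numbers")

as_dict = s.to_dict()  # {'1990': 11, '1993': 9}
as_list = s.tolist()   # [11, 9]

# Label lookup mirrors dict lookup; positional lookup mirrors list lookup.
same_by_label = s.loc["1993"] == as_dict["1993"]
same_by_position = s.iloc[-1] == as_list[-1]


def error_name(lookup):
    """Return the name of the exception raised by `lookup`, or None."""
    try:
        lookup()
    except Exception as exc:
        return type(exc).__name__
    return None


print(same_by_label, same_by_position)      # True True
print(error_name(lambda: s.loc["1970"]))    # KeyError
print(error_name(lambda: as_dict["1970"]))  # KeyError
print(error_name(lambda: s.iloc[4]))        # IndexError
print(error_name(lambda: as_list[4]))       # IndexError
```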

User · Answer

iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

    df.iloc[0]

or the last five rows by doing

    df.iloc[-5:]

You can also use it on the columns. This retrieves the 3rd column:

    df.iloc[:, 2]    # the : in the first position indicates all rows

You can combine them to get intersections of rows and columns:

    df.iloc[:3, :3]  # the upper-left 3 x 3 entries (assuming df has 3+ rows and columns)

On the other hand, .loc uses named indices. Let's set up a data frame with strings as row and column labels:

    df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

Then we can get the first row by

    df.loc['a']            # equivalent to df.iloc[0]

and the last two rows of the 'date' column by

    df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]

and so on. Now, it's probably worth pointing out that the default row and column indices for a DataFrame are integers from 0, and in this case iloc and loc work in almost the same way. The one difference is that label slices include their endpoint, so with a default index df.loc[:5] returns six rows (labels 0 through 5) while df.iloc[:5] returns five. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.

Also, you can do column retrieval just by using the data frame's __getitem__:

    df['time']    # equivalent to df.loc[:, 'time']

Now suppose you want to mix position and named indexing, that is, indexing using positions on rows and names on columns (to clarify, I mean select from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix used to come in, although it has since been removed from pandas (as of 1.0):

    df.ix[:2, 'time']    # the first two rows of the 'time' column

I think it's also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

    b = [True, False, True]
    df.loc[b]

This will return the 1st and 3rd rows of df. It is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:

    df.loc[b, 'name'] = 'Mary', 'John'
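Two points from this answer can be checked in a short sketch: how loc and iloc slices behave on a default integer index, and how the mixed selection df.ix[:2, 'time'] could be written without .ix. The frame nums and the placeholder cell values are invented for illustration. The two rewrites are common idioms rather than anything prescribed by the answer.

```python
import pandas as pd

# Default RangeIndex: labels and positions coincide, but slices still differ.
nums = pd.DataFrame({"value": range(10)})
print(len(nums.loc[:5]))   # 6, the label 5 is included
print(len(nums.iloc[:5]))  # 5, the position 5 is excluded

# The frame from the answer, filled with placeholder values.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["a", "b", "c"],
    columns=["time", "date", "name"],
)

# Two ways to select the first two rows of the 'time' column without .ix:
by_position = df.iloc[:2, df.columns.get_loc("time")]  # everything as positions
by_label = df.loc[df.index[:2], "time"]                # everything as labels

print(by_position.equals(by_label))  # True
print(by_position.tolist())          # [1, 4]
```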
