What does axis in pandas mean

Question

Here is my code to generate a dataframe   import pandas as pd import numpy as np  dff   pd DataFrame np random randn 1 2  columns list  AB      then I got the dataframe    ------------ --------- --------                  A         B        ------------ --------- ---------        0       0 626386  1 52325   ------------ --------- --------    When I type the commmand    dff mean axis 1    I got    0    1 074821 dtype  float64   According to the reference of pandas  axis 1 stands for columns and I expect the result of the command to be  A    0 626386 B    1 523255 dtype  float64   So here is my question  what does axis in pandas mean

User · Answer

Let s look at the table from Wiki  This is an IMF estimate of GDP from 2010 to 2019 for top ten countries     1  Axis 1 will act for each row on all the columns  If you want to calculate the average  mean  GDP for EACH countries over the decade  2010-2019   you need to do  df mean axis 1   For example  if you want to calculate mean GDP of United States from 2010 to 2019  df loc  United States   2010   2019   mean axis 1   2  Axis 0 will act for each column on all the rows If I want to calculate the average  mean  GDP for EACH year for all countries  you need to do  df mean axis 0   For example  if you want to calculate mean GDP of the year 2015 for United States  China  Japan  Germany and India  df loc  United States   India   2015   mean axis 0   Note  The above code will work only after setting  Country or dependent territory   column as the Index  using set index method

User · Answer

This is based on  Safak s answer  The best way to understand the axes in pandas numpy is to create a 3d array and check the result of the sum function along the 3 different axes    a   np ones  3 5 7     a will be       array    1   1   1   1   1   1   1         1   1   1   1   1   1   1         1   1   1   1   1   1   1         1   1   1   1   1   1   1         1   1   1   1   1   1   1           1   1   1   1   1   1   1         1   1   1   1   1   1   1         1   1   1   1   1   1   1         1   1   1   1   1   1   1         1   1   1   1   1   1   1           1   1   1   1   1   1   1         1   1   1   1   1   1   1         1   1   1   1   1   1   1         1   1   1   1   1   1   1         1   1   1   1   1   1   1        Now check out the sum of elements of the array along each of the axes    x0   np sum a axis 0   x1   np sum a axis 1   x2   np sum a axis 2    will give you the following results      x0      array   3   3   3   3   3   3   3             3   3   3   3   3   3   3             3   3   3   3   3   3   3             3   3   3   3   3   3   3             3   3   3   3   3   3   3         x1       array   5   5   5   5   5   5   5        5   5   5   5   5   5   5        5   5   5   5   5   5   5        x2      array   7   7   7   7   7             7   7   7   7   7             7   7   7   7   7

User · Answer

axis refers to the dimension of the array  in the case of pd DataFrames axis 0 is the dimension that points downwards and axis 1 the one that points to the right    Example  Think of an ndarray with shape  3 5 7       a   np ones  3 5 7     a is a 3 dimensional ndarray  i e  it has 3 axes   axes  is plural of  axis    The configuration of a will look like 3 slices of bread where each slice is of dimension 5-by-7  a 0      will refer to the 0-th slice  a 1      will refer to the 1-st slice etc   a sum axis 0  will apply sum   along the 0-th axis of a  You will add all the slices and end up with one slice of shape  5 7    a sum axis 0  is equivalent to   b   np zeros  5 7   for i in range 5       for j in range 7           b i j     a   i j  sum     b and a sum axis 0  will both look like this  array    3    3    3    3    3    3    3             3    3    3    3    3    3    3             3    3    3    3    3    3    3             3    3    3    3    3    3    3             3    3    3    3    3    3    3       In a pd DataFrame  axes work the same way as in numpy arrays  axis 0 will apply sum   or any other reduction function for each column    N B  In  zhangxaochen s answer  I find the phrases  along the rows  and  along the columns  slightly confusing  axis 0 should refer to  along each column   and axis 1  along each row

User · Answer

The easiest way for me to understand is to talk about whether you are calculating a statistic for each column  axis   0  or each row  axis   1   If you calculate a statistic  say a mean  with axis   0 you will get that statistic for each column  So if each observation is a row and each variable is in a column  you would get the mean of each variable  If you set axis   1 then you will calculate your statistic for each row  In our example  you would get the mean for each observation across all of your variables  perhaps you want the average of related measures    axis   0  by column   column-wise   along the rows  axis   1  by row   row-wise   along the columns

User · Answer

The designer of pandas  Wes McKinney  used to work intensively on finance data  Think of columns as stock names and index as daily prices  You can then guess what the default behavior is  i e   axis 0  with respect to this finance data  axis 1 can be simply thought as  the other direction    For example  the statistics functions  such as mean    sum    describe    count   all default to column-wise because it makes more sense to do them for each stock  sort index by   also defaults to column  fillna method  ffill   will fill along column because it is the same stock  dropna   defaults to row because you probably just want to discard the price on that day instead of throw away all prices of that stock    Similarly  the square brackets indexing refers to the columns since it s more common to pick a stock instead of picking a day

User · Answer

It specifies the axis along which the means are computed  By default axis 0  This is consistent with the numpy mean usage when axis is specified explicitly  in numpy mean  axis  None by default  which computes the mean value over the flattened array    in which axis 0 along the rows  namely  index in pandas   and axis 1 along the columns  For added clarity  one may choose to specify axis  index   instead of axis 0  or axis  columns   instead of axis 1     ------------ --------- --------                  A         B        ------------ --------- ---------        0       0 626386  1 52325 ----axis 1----- gt   ------------ --------- --------                                          axis 0

User · Answer

I will explicitly avoid using  row-wise  or  along the columns   since people may interpret them in exactly the wrong way   Analogy first  Intuitively  you would expect that pandas DataFrame drop axis  column   drops a column from N columns and gives you  N - 1  columns  So you can pay NO attention to rows for now  and remove word  row  from your English dictionary   Vice versa  drop axis  row   works on rows   In the same way  sum axis  column   works on multiple columns and gives you 1 column  Similarly  sum axis  row   results in 1 row  This is consistent with its simplest form of definition  reducing a list of numbers to a single number   In general  with axis column  you see columns  work on columns  and get columns  Forget rows   With axis row  change perspective and work on rows   0 and 1 are just aliases for  row  and  column   It s the convention of matrix indexing

User · Answer

Let s visualize  you gonna remember always      In Pandas    axis 0 means along  indexes   It s a row-wise operation     Suppose  to perform concat   operation on dataframe1  amp  dataframe2  we will take dataframe1  amp  take out 1st row from dataframe1 and place into the new DF  then we take out another row from dataframe1 and put into new DF  we repeat this process until we reach to the bottom of dataframe1  Then  we do the same process for dataframe2   Basically  stacking dataframe2 on top of dataframe1 or vice a versa   E g making a pile of books on a table or floor   axis 1 means along  columns   It s a column-wise operation    Suppose  to perform concat   operation on dataframe1  amp  dataframe2  we will take out the 1st complete column a k a 1st series  of dataframe1 and place into new DF  then we take out the second column of dataframe1 and keep adjacent to it  sideways   we have to repeat this operation until all columns are finished  Then  we repeat the same process on dataframe2  Basically   stacking dataframe2 sideways   E g arranging books on a bookshelf      More to it  since arrays are better representations to represent a nested n-dimensional structure compared to matrices  so below can help you more to visualize how axis plays an important role when you generalize to more than one dimension  Also  you can actually print write draw visualize any n-dim array but  writing or visualizing the same in a matrix representation 3-dim  is impossible on a paper more than 3-dimensions

User · Answer

I think there is an another way to understand it    For a np array if we want eliminate columns we use axis   1  if we want eliminate rows  we use axis   0   np mean np array np ones shape  3 5 10    axis   0  shape    5 10  np mean np array np ones shape  3 5 10    axis   1  shape    3 10  np mean np array np ones shape  3 5 10    axis    0 1   shape    10     For pandas object  axis   0 stands for row-wise operation and axis   1 stands for column-wise operation  This is different from numpy by definition  we can check definitions from numpy doc and pandas doc

User · Answer

Arrays are designed with so-called axis 0 and rows positioned vertically versus axis 1 and columns positioned horizontally  Axis refers to the dimension of the array

User · Answer

I m a newbie to pandas  But this is how I understand axis in pandas     Axis   Constant     Varying      Direction    0     Column            Row          Downwards       1           Row          Column     Towards Right --     So to compute mean of a column  that particular column should be constant but the rows under that can change  varying  so it is axis 0   Similarly  to compute mean of a row  that particular row is constant but it can traverse through different columns  varying   axis 1

User · Answer

axis   0 means up to down  axis   1 means left to right  sums key    lang sets key  iloc   1   sum axis 0    Given example is taking sum of all the data in column    key

User · Answer

These answers do help explain this  but it still isn t perfectly intuitive for a non-programmer  i e  someone like me who is learning Python for the first time in context of data science coursework   I still find using the terms  along  or  for each  wrt to rows and columns to be confusing   What makes more sense to me is to say it this way    Axis 0 will act on all the ROWS in each COLUMN  Axis 1 will act on all the COLUMNS in each ROW   So a mean on axis 0 will be the mean of all the rows in each column  and a mean on axis 1 will be a mean of all the columns in each row   Ultimately this is saying the same thing as  zhangxaochen and  Michael  but in a way that is easier for me to internalize

User · Answer

axis 1  It will give the sum row wise keepdims True will maintain the 2D dimension  Hope it helps you

User · Answer

Say for example  if you use df shape then you will get a tuple containing the number of rows  amp  columns in the data frame as the output  In  10   movies df shape Out 10    1000  11   In the example above  there are 1000 rows  amp  11 columns in the movies data frame where  row  is mentioned in the index 0 position  amp   column  in the index 1 position of the tuple  Hence  axis 1  denotes column  amp   axis 0  denotes row  Credits  Github

User · Answer

Many answers here helped me a lot   In case you get confused by the different behaviours of axis in Python and MARGIN in R  like in the apply function   you may find a blog post that I wrote of interest  https   accio github io programming 2020 05 19 numpy-pandas-axis html    In essence    Their behaviours are  intriguingly  easier to understand with three-dimensional array than with two-dimensional arrays  In Python packages numpy and pandas  the axis parameter in sum actually specifies numpy to calculate the mean of all values that can be fetched in the form of array 0  0       i       0  where i iterates through all possible values  The process is repeated with the position of i fixed and the indices of other dimensions vary one after the other  from the most far-right element   The result is a n-1-dimensional array  In R  the MARGINS parameter let the apply function calculate the mean of all values that can be fetched in the form of array         i         where i iterates through all possible values  The process is not repeated when all i values have been iterated  Therefore  the result is a simple vector

User · Answer

My thinking   Axis   n  where n   0  1  etc  means that the matrix is collapsed  folded  along that axis  So in a 2D matrix  when you collapse along 0  rows   you are really operating on one column at a time  Similarly for higher order matrices   This is not the same as the normal reference to a dimension in a matrix  where 0 -  row and 1 -  column  Similarly for other dimensions in an N dimension array

User · Answer

one of easy ways to remember axis 1  columns   vs axis 0  rows  is the output you expect     if you expect an output for each row you use axis  columns    on the other hand if you want an output for each column you use axis  rows

User · Answer

I have been trying to figure out the axis for the last hour as well  The language in all the above answers  and also the documentation is not at all helpful   To answer the question as I understand it now  in Pandas  axis   1 or 0 means which axis headers do you want to keep constant when applying the function   Note  When I say headers  I mean index names  Expanding your example    ------------ --------- --------                  A         B        ------------ --------- ---------        X       0 626386  1 52325   ------------ --------- --------         Y       0 626386  1 52325   ------------ --------- --------    For axis 1 columns   We keep columns headers constant and apply the mean function by changing data  To demonstrate  we keep the columns headers constant as    ------------ --------- --------                  A         B         Now we populate one set of A and B values and then find the mean                 0 626386  1 52325      Then we populate next set of A and B values and find the mean                 0 626386  1 52325    Similarly  for axis rows  we keep row headers constant  and keep changing the data  To demonstrate  first fix the row headers    ------------         X        ------------         Y        ------------    Now populate first set of X and Y values and then find the mean   ------------ ---------         X       0 626386  ------------ ---------         Y       0 626386  ------------ ---------    Then populate the next set of X and Y values and then find the mean    ------------ ---------         X       1 52325    ------------ ---------         Y       1 52325    ------------ ---------    In summary    When axis columns  you fix the column headers and change data  which will come from the different rows   When axis rows  you fix the row headers and change data  which will come from the different columns

User · Answer

I understand this way    Say if your operation requires traversing from left to right right to left in a dataframe  you are apparently merging columns ie  you are operating on various columns  This is axis  1  Example  df   pd DataFrame np arange 12  reshape 3 4  columns   A    B    C    D    print df     A  B   C   D 0  0  1   2   3 1  4  5   6   7 2  8  9  10  11   df mean axis 1   0    1 5 1    5 5 2    9 5 dtype  float64  df drop   A   B   axis 1 inplace True       C   D 0   2   3 1   6   7 2  10  11      Point to note here is we are operating on columns   Similarly  if your operation requires traversing from top to bottom bottom to top in a dataframe  you are merging rows  This is axis 0

User · Answer

Axis in view of programming is the position in the shape tuple  Here is an example   import numpy as np  a np arange 120  reshape 2 3 4 5   a shape Out 3    2  3  4  5   np sum a axis 0  shape Out 4    3  4  5   np sum a axis 1  shape Out 5    2  4  5   np sum a axis 2  shape Out 6    2  3  5   np sum a axis 3  shape Out 7    2  3  4    Mean on the axis will cause that dimension to be removed   Referring to the original question  the dff shape is  1 2   Using axis 1 will change the shape to  1

User · Answer

The problem with using axis  properly is for its use for 2 main different cases   For computing an accumulated value  or rearranging  e  g  sorting  data  For manipulating   quot playing quot  with  entities  e  g  dataframes    The main idea behind this answer is that for avoiding the confusion  we select either a number  or a name for specifying the particular axis  whichever is more clear  intuitive  and descriptive  Pandas is based on NumPy  which is based on mathematics  particularly on n-dimensional matrices  Here is an image for common use of axes  names in math in the 3-dimensional space   This picture is for memorizing the axes  ordinal numbers only   0 for x-axis  1 for y-axis  and 2 for z-axis   The z-axis is only for panels  for dataframes we will restrict our interest to the green-colored  2-dimensional basic plane with x-axis  0  vertical   and y-axis  1  horizontal    It s all for numbers as potential values of axis  parameter  The names of axes are  index   you may use the alias  rows   and  columns   and for this explanation it is NOT important the relation between these names and ordinal numbers  of axes    as  everybody knows what the words  quot rows quot  and  quot columns quot  mean  and everybody here     I suppose     knows what the word  quot index quot  in pandas means   And now  my recommendation   If you want to compute an accumulated value  you may compute it from values located along axis 0  or along axis 1      use axis 0  or axis 1   Similarly  if you want to rearrange values  use the axis number of the axis  along which are located data for rearranging  e g  for sorting    If you want to manipulate  e g  concatenate  entities  e g  dataframes      use axis  index   synonym  axis  rows   or axis  columns  to specify the resulting change     index  rows  or columns  respectively   For concatenating  you will obtain either a longer index    more rows   or more columns  respectively

[python] What does axis in pandas mean?

Examples related to python

Examples related to pandas

Examples related to numpy

Examples related to dataframe