[python] How to calculate the sum of all columns of a 2D numpy array (efficiently)

Let's say I have the following 2D numpy array consisting of four rows and three columns:

>>> a = numpy.arange(12).reshape(4,3)
>>> print(a)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

What would be an efficient way to generate a 1D array that contains the sum of each column (like [18, 22, 26])? Can this be done without looping over the columns?

The answer is


Use numpy.sum. For your case, it is

col_sums = a.sum(axis=0)  # avoid naming the result "sum", which would shadow the built-in

Other alternatives for summing the columns are

numpy.einsum('ij->j', a)

and

numpy.dot(a.T, numpy.ones(a.shape[0]))
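As a quick sanity check, the three expressions can be compared on the array from the question. This is only a minimal sketch, and the variable names are chosen for illustration. One thing worth noting is that the dot variant returns floats, because numpy.ones produces a float64 array by default.

```python
import numpy

a = numpy.arange(12).reshape(4, 3)

by_sum = a.sum(axis=0)                           # reduce over the rows
by_einsum = numpy.einsum('ij->j', a)             # sum over the row index i, keep j
by_dot = numpy.dot(a.T, numpy.ones(a.shape[0]))  # float result: ones() is float64

# All three give one value per column.
assert numpy.array_equal(by_sum, by_einsum)
assert numpy.allclose(by_sum, by_dot)
```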

If the number of rows and columns is of the same order of magnitude, all three approaches are roughly equally fast:

[Figure: runtime of numpy_sum, einsum, and dot_ones plotted against len(a)]

If there are only a few columns, however, both the einsum and the dot solution significantly outperform numpy's sum (note the log-scale):

[Figure: runtime of numpy_sum, einsum, and dot_ones plotted against len(a), log-log axes]


Code to reproduce the plots:

import numpy
import perfplot


# Note: the kernels below reduce along axis=1 (one sum per row of the
# input), whereas the snippets above reduce along axis=0.
def numpy_sum(a):
    return numpy.sum(a, axis=1)


def einsum(a):
    return numpy.einsum('ij->i', a)


def dot_ones(a):
    return numpy.dot(a, numpy.ones(a.shape[1]))


perfplot.save(
    "out1.png",
    # setup=lambda n: numpy.random.rand(n, n),  # square input
    setup=lambda n: numpy.random.rand(n, 3),  # many rows, only three columns
    n_range=[2**k for k in range(15)],
    kernels=[numpy_sum, einsum, dot_ones],
    logx=True,
    logy=True,
    xlabel='len(a)',
)
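If perfplot is not installed, a rough comparison can also be made with the standard-library timeit module. The following is only a sketch under stated assumptions: the array size (10000 x 3) and the repeat count are arbitrary choices, the dictionary names are hypothetical, and the timings will differ between machines and NumPy versions. Unlike the kernels above, it times the column-sum (axis=0) forms from the start of this answer.

```python
import timeit

import numpy

a = numpy.random.rand(10000, 3)  # arbitrary size: many rows, few columns

# Column-sum variants, each wrapped in a zero-argument callable for timeit.
candidates = {
    "sum": lambda: a.sum(axis=0),
    "einsum": lambda: numpy.einsum('ij->j', a),
    "dot": lambda: numpy.dot(a.T, numpy.ones(a.shape[0])),
}

timings = {}
for name, func in candidates.items():
    # Total wall-clock seconds for 1000 calls of this variant.
    timings[name] = timeit.timeit(func, number=1000)
    print(f"{name:>6}: {timings[name]:.4f} s")
```

A single run like this is noisy, so it is best treated as a rough indication and repeated a few times before drawing conclusions.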

a.sum(0)

should solve the problem. Since a is a 2D np.array, this returns the sum of each column. axis=0 is the dimension that points downwards and axis=1 the one that points to the right.
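To make the axis convention concrete, here is a small sketch using the array from the question (the names col_sums and row_sums are just for illustration). Summing over axis 0 collapses the rows and leaves one value per column, while summing over axis 1 does the opposite.

```python
import numpy

a = numpy.arange(12).reshape(4, 3)  # 4 rows, 3 columns

col_sums = a.sum(axis=0)  # collapse the rows: one sum per column
row_sums = a.sum(axis=1)  # collapse the columns: one sum per row

print(col_sums)  # [18 22 26]
print(row_sums)  # [ 3 12 21 30]
```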


The NumPy sum function takes an optional axis argument that specifies the axis along which the sum is performed:

>>> a = numpy.arange(12).reshape(4,3)
>>> a.sum(0)
array([18, 22, 26])

Or, equivalently:

>>> numpy.sum(a, 0)
array([18, 22, 26])

Use the axis argument:

>>> numpy.sum(a, axis=0)
array([18, 22, 26])