Let's say I have the following 2D numpy array consisting of four rows and three columns:
>>> a = numpy.arange(12).reshape(4,3)
>>> print(a)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
What would be an efficient way to generate a 1D array that contains the sum of each column (like [18, 22, 26])? Can this be done without looping through all the columns?
Use numpy.sum. For your case, it is
col_sums = a.sum(axis=0)
Other alternatives for summing the columns are
numpy.einsum('ij->j', a)
and
numpy.dot(a.T, numpy.ones(a.shape[0]))
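As a quick sanity check, the three approaches can be compared on the array from the question. This is only a minimal sketch; the variable names (by_sum, by_einsum, by_dot) are illustrative and not part of the original answer.

```python
import numpy

a = numpy.arange(12).reshape(4, 3)

# Reduce over the row axis (axis 0), leaving one sum per column.
by_sum = a.sum(axis=0)

# 'ij->j' keeps index j (columns) and sums over the dropped index i (rows).
by_einsum = numpy.einsum('ij->j', a)

# Multiplying a.T (shape 3x4) by a vector of four ones adds up each column.
# numpy.ones defaults to float64, so this result is a float array.
by_dot = numpy.dot(a.T, numpy.ones(a.shape[0]))

print(by_sum)     # [18 22 26]
print(by_einsum)  # [18 22 26]
print(by_dot)     # [18. 22. 26.]
```

Note that the dot variant returns floats even for integer input, which may matter if the result is later used for indexing or exact integer comparison.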
If the numbers of rows and columns are of the same order of magnitude, all three approaches are roughly equally fast.
If there are only a few columns, however, both the einsum and the dot solutions significantly outperform numpy's sum (note the log scale of the plots).
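Code to reproduce the plots:
import numpy
import perfplot


# Note: as written, these kernels reduce along axis 1 (one sum per row).
# For column sums, use axis=0, 'ij->j' and a.T as in the snippets above.
def numpy_sum(a):
    return numpy.sum(a, axis=1)


def einsum(a):
    return numpy.einsum('ij->i', a)


def dot_ones(a):
    return numpy.dot(a, numpy.ones(a.shape[1]))


perfplot.save(
    "out1.png",
    # Square case (rows and columns of the same order of magnitude):
    # setup=lambda n: numpy.random.rand(n, n),
    # Few-columns case: n rows, 3 columns.
    setup=lambda n: numpy.random.rand(n, 3),
    n_range=[2**k for k in range(15)],
    kernels=[numpy_sum, einsum, dot_ones],
    logx=True,
    logy=True,
    xlabel='len(a)',
)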
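If perfplot is not installed, a rough comparison can also be made with the standard library's timeit module. The following is a sketch under stated assumptions: the array size (10,000 rows, 3 columns), the repeat counts, and the timings dictionary are arbitrary choices for illustration, and the kernels are the column-sum variants from the answer text. Timings will vary with the machine and the NumPy build, so no expected numbers are given.

```python
import timeit

import numpy


def numpy_sum(a):
    return a.sum(axis=0)


def einsum(a):
    return numpy.einsum('ij->j', a)


def dot_ones(a):
    return numpy.dot(a.T, numpy.ones(a.shape[0]))


# Assumed test case: many rows, few columns.
a = numpy.random.rand(10_000, 3)

timings = {}
for kernel in (numpy_sum, einsum, dot_ones):
    # Best of 5 repeats of 200 calls each, converted to seconds per call.
    best = min(timeit.repeat(lambda: kernel(a), number=200, repeat=5))
    timings[kernel.__name__] = best / 200

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; a single run can be misleading.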
a.sum(0) should solve the problem. a is a 2D np.array, and you will get the sum of each column. axis=0 is the dimension that points downwards and axis=1 the one that points to the right.
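To make the axis convention concrete, here is a small sketch on the array from the question. The names column_sums and row_sums are illustrative only.

```python
import numpy

a = numpy.arange(12).reshape(4, 3)

# axis=0 collapses the rows ("downwards"): one value per column, shape (3,)
column_sums = a.sum(axis=0)

# axis=1 collapses the columns ("to the right"): one value per row, shape (4,)
row_sums = a.sum(axis=1)

print(column_sums)  # [18 22 26]
print(row_sums)     # [ 3 12 21 30]
```

In other words, the axis you pass is the one that disappears from the result's shape.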
The NumPy sum function takes an optional axis argument that specifies the axis along which the sum is performed:
>>> a = numpy.arange(12).reshape(4,3)
>>> a.sum(0)
array([18, 22, 26])
Or, equivalently:
>>> numpy.sum(a, 0)
array([18, 22, 26])
Use the axis argument:
>>> numpy.sum(a, axis=0)
array([18, 22, 26])
Source: Stackoverflow.com