I recently moved to Python 3.5 and noticed the new matrix multiplication operator (@) sometimes behaves differently from the numpy dot operator. In example, for 3d arrays:
import numpy as np
a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b # Python 3.5+
d = np.dot(a, b)
The @
operator returns an array of shape:
c.shape
(8, 13, 13)
while the np.dot()
function returns:
d.shape
(8, 13, 8, 13)
How can I reproduce the same result with numpy dot? Are there any other significant differences?
This question is related to
python
numpy
matrix-multiplication
python-3.5
My experience with MATMUL and DOT
I was constantly getting "ValueError: Shape of passed values is (200, 1), indices imply (200, 3)" when trying to use MATMUL. I wanted a quick workaround and found DOT to deliver the same functionality. I don't get any error using DOT. I get the correct answer
with MATMUL
X.shape
>>>(200, 3)
type(X)
>>>pandas.core.frame.DataFrame
w
>>>array([0.37454012, 0.95071431, 0.73199394])
YY = np.matmul(X,w)
>>> ValueError: Shape of passed values is (200, 1), indices imply (200, 3)"
with DOT
YY = np.dot(X,w)
# no error message
YY
>>>array([ 2.59206877, 1.06842193, 2.18533396, 2.11366346, 0.28505879, …
YY.shape
>>> (200, )
Here is a comparison with np.einsum
to show how the indices are projected
np.allclose(np.einsum('ijk,ijk->ijk', a,b), a*b) # True
np.allclose(np.einsum('ijk,ikl->ijl', a,b), a@b) # True
np.allclose(np.einsum('ijk,lkm->ijlm',a,b), a.dot(b)) # True
In mathematics, I think the dot in numpy makes more sense
dot(a,b)_{i,j,k,a,b,c} =
since it gives the dot product when a and b are vectors, or the matrix multiplication when a and b are matrices
As for matmul operation in numpy, it consists of parts of dot result, and it can be defined as
So, you can see that matmul(a,b) returns an array with a small shape, which has smaller memory consumption and make more sense in applications. In particular, combining with broadcasting, you can get
matmul(a,b)_{i,j,k,l} =
for example.
From the above two definitions, you can see the requirements to use those two operations. Assume a.shape=(s1,s2,s3,s4) and b.shape=(t1,t2,t3,t4)
To use dot(a,b) you need
To use matmul(a,b) you need
Use the following piece of code to convince yourself.
import numpy as np
for it in xrange(10000):
a = np.random.rand(5,6,2,4)
b = np.random.rand(6,4,3)
c = np.matmul(a,b)
d = np.dot(a,b)
#print 'c shape: ', c.shape,'d shape:', d.shape
for i in range(5):
for j in range(6):
for k in range(2):
for l in range(3):
if not c[i,j,k,l] == d[i,j,k,j,l]:
print it,i,j,k,l,c[i,j,k,l]==d[i,j,k,j,l] #you will not see them
Just FYI, @
and its numpy equivalents dot
and matmul
are all equally fast. (Plot created with perfplot, a project of mine.)
Code to reproduce the plot:
import perfplot
import numpy
def setup(n):
A = numpy.random.rand(n, n)
x = numpy.random.rand(n)
return A, x
def at(data):
A, x = data
return A @ x
def numpy_dot(data):
A, x = data
return numpy.dot(A, x)
def numpy_matmul(data):
A, x = data
return numpy.matmul(A, x)
perfplot.show(
setup=setup,
kernels=[at, numpy_dot, numpy_matmul],
n_range=[2 ** k for k in range(15)],
)
The answer by @ajcr explains how the dot
and matmul
(invoked by the @
symbol) differ. By looking at a simple example, one clearly sees how the two behave differently when operating on 'stacks of matricies' or tensors.
To clarify the differences take a 4x4 array and return the dot
product and matmul
product with a 3x4x2 'stack of matricies' or tensor.
import numpy as np
fourbyfour = np.array([
[1,2,3,4],
[3,2,1,4],
[5,4,6,7],
[11,12,13,14]
])
threebyfourbytwo = np.array([
[[2,3],[11,9],[32,21],[28,17]],
[[2,3],[1,9],[3,21],[28,7]],
[[2,3],[1,9],[3,21],[28,7]],
])
print('4x4*3x4x2 dot:\n {}\n'.format(np.dot(fourbyfour,threebyfourbytwo)))
print('4x4*3x4x2 matmul:\n {}\n'.format(np.matmul(fourbyfour,threebyfourbytwo)))
The products of each operation appear below. Notice how the dot product is,
...a sum product over the last axis of a and the second-to-last of b
and how the matrix product is formed by broadcasting the matrix together.
4x4*3x4x2 dot:
[[[232 152]
[125 112]
[125 112]]
[[172 116]
[123 76]
[123 76]]
[[442 296]
[228 226]
[228 226]]
[[962 652]
[465 512]
[465 512]]]
4x4*3x4x2 matmul:
[[[232 152]
[172 116]
[442 296]
[962 652]]
[[125 112]
[123 76]
[228 226]
[465 512]]
[[125 112]
[123 76]
[228 226]
[465 512]]]
Source: Stackoverflow.com