I found out a curious thing today and was wondering if somebody could shed some light on what the difference is here?
import numpy as np
A = np.arange(12).reshape(4,3)
for a in A:
    a = a + 1
B = np.arange(12).reshape(4,3)
for b in B:
    b += 1
After running each for loop, A has not changed, but B has had one added to each element. I actually use the B version to write to an initialized NumPy array within a for loop.
The difference is that one modifies the data structure itself (the in-place operation b += 1), while the other just reassigns the variable (a = a + 1).
Just for completeness: x += y does not always perform an in-place operation. There are (at least) three exceptions:
If x doesn't implement an __iadd__ method, then the x += y statement is just shorthand for x = x + y. This would be the case if x was something like an int.
If __iadd__ returns NotImplemented, Python falls back to x = x + y.
The __iadd__ method could theoretically be implemented to not work in place. It'd be really weird to do that, though.
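The first two exceptions can be checked with a small sketch. The Fallback class below is a hypothetical example written only for this illustration; it is not part of Python or NumPy.

```python
# int defines no __iadd__, so "n += 1" is just "n = n + 1".
print(hasattr(int, "__iadd__"))  # False

n = 1000
n_alias = n
n += 1  # rebinds n to a new int; n_alias still refers to the old one


class Fallback:
    """Hypothetical class whose __iadd__ always defers to __add__."""

    def __init__(self, value):
        self.value = value

    def __iadd__(self, other):
        # Returning NotImplemented makes Python try __add__ instead.
        return NotImplemented

    def __add__(self, other):
        return Fallback(self.value + other)


x = Fallback(1)
x_alias = x
x += 1  # falls back to x = x + 1, so x is now a new object
print(x.value, x_alias.value, x is x_alias)  # 2 1 False
```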
As it happens, your bs are numpy.ndarrays, which implement __iadd__ and return themselves, so your second loop modifies the original array in place.
You can read more on this in the Python documentation of "Emulating Numeric Types".
These [__i*__] methods are called to implement the augmented arithmetic assignments (+=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=). These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self). If a specific method is not defined, the augmented assignment falls back to the normal methods. For instance, if x is an instance of a class with an __iadd__() method, x += y is equivalent to x = x.__iadd__(y). Otherwise, x.__add__(y) and y.__radd__(x) are considered, as with the evaluation of x + y. In certain situations, augmented assignment can result in unexpected errors (see Why does a_tuple[i] += ["item"] raise an exception when the addition works?), but this behavior is in fact part of the data model.
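The tuple case mentioned in that quote follows directly from x += y being x = x.__iadd__(y). A minimal sketch (the variable names are only for illustration):

```python
a_tuple = (["foo"], "bar")
error = None

try:
    # Step 1: list.__iadd__ extends the list in place (this succeeds).
    # Step 2: the result is assigned back to a_tuple[0] (this fails,
    #         because tuples do not support item assignment).
    a_tuple[0] += ["item"]
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError
print(a_tuple)               # (['foo', 'item'], 'bar')
```

So the exception is raised, yet the list inside the tuple has already been modified.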
The short form (a += 1) has the option to modify a in place, instead of creating a new object representing the sum and rebinding it to the same name (a = a + 1). So the short form (a += 1) can be much more efficient, as it doesn't necessarily need to make a copy of a, unlike a = a + 1.
Also, even if they output the same result, note that they are different because they are separate operators: + and +=.
First off: the variables a and b in the loops refer to numpy.ndarray objects.
In the first loop, a = a + 1 is evaluated as follows: the __add__(self, other) function of numpy.ndarray is called. This creates a new object and hence A is not modified. Afterwards, the variable a is set to refer to the result.
In the second loop, no new object is created. The statement b += 1 calls the __iadd__(self, other) function of numpy.ndarray, which modifies, in place, the ndarray object that b refers to. Because b is a view onto a row of B, B is modified.
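One way to see this for yourself is a rough check with np.shares_memory and an identity test. This sketch uses A[0] to stand in for the loop variable, since iterating over A yields the same kind of row view:

```python
import numpy as np

A = np.arange(12).reshape(4, 3)
row = A[0]                      # a view onto the first row of A

new_row = row + 1               # __add__: allocates a fresh array
print(np.shares_memory(new_row, A))  # False

row_before = row
row += 1                        # __iadd__: writes into the existing buffer
print(row is row_before)             # True
print(np.shares_memory(row, A))      # True
print(A[0])                          # [1 2 3]
```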
In the first example, you are reassigning the variable a, while in the second one you are modifying the data in place, using the += operator.
See the section about 7.2.1. Augmented assignment statements:
An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.
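The "similar, but not exactly equal" part is easiest to see with a second name bound to the same object. A small sketch using plain lists (the names alias and alias_y are just for illustration):

```python
x = [1, 2]
alias = x
x += [3]          # list.__iadd__ extends the existing list
print(alias)      # [1, 2, 3]
print(alias is x) # True

y = [1, 2]
alias_y = y
y = y + [3]       # list.__add__ builds a new list, then y is rebound
print(alias_y)    # [1, 2]
print(alias_y is y)  # False
```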
The += operator calls __iadd__. This function makes the change in place, and only after its execution is the result assigned back to the name you are "applying" the += to.
__add__, on the other hand, takes the parameters and returns their sum (without modifying them).
As already pointed out, b += 1 updates b in place, while a = a + 1 computes a + 1 and then assigns the name a to the result (now a does not refer to a row of A anymore).
To understand the += operator properly, though, we also need to understand the concept of mutable versus immutable objects. Consider what happens when we leave out the .reshape:
C = np.arange(12)
for c in C:
    c += 1
print(C) # [ 0 1 2 3 4 5 6 7 8 9 10 11]
We see that C is not updated, meaning that c += 1 and c = c + 1 are equivalent. This is because now C is a 1D array (C.ndim == 1), and so when iterating over C, each integer element is pulled out (as a NumPy integer scalar) and assigned to c.
Now in Python, integers are immutable, and so are NumPy's integer scalars, meaning that in-place updates are not allowed, effectively transforming c += 1 into c = c + 1, where c now refers to a new integer, not coupled to C in any way. When you loop over the reshaped arrays, whole rows (np.ndarrays) are assigned to b (and a) at a time. These are mutable objects, meaning that you are allowed to stick in new integers at will, which is what happens when you do b += 1.
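A short sketch of the 1D case, together with one possible way to update the array anyway (index-based assignment is just one option I picked for illustration):

```python
import numpy as np

C = np.arange(12)

# Iterating a 1D integer array yields NumPy integer scalars, not views.
first = next(iter(C))
print(isinstance(first, np.integer))  # True

for c in C:
    c += 1          # rebinds c to a new scalar; C is untouched
unchanged = C.copy()

# Assigning through an index goes via C.__setitem__ and does modify C.
for i in range(len(C)):
    C[i] += 1

print(unchanged[:3], C[:3])  # [0 1 2] [1 2 3]
```

In practice you would skip the loop entirely and write C += 1 on the whole array.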
It should be mentioned that though + and += are meant to be related as described above (and very much usually are), any type can implement them any way it wants by defining the __add__ and __iadd__ methods, respectively.
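As a minimal sketch of a type defining both methods in the conventional way, here is a hypothetical Tally class (made up for this example) whose __add__ returns a new object and whose __iadd__ mutates and returns self:

```python
class Tally:
    """Hypothetical mutable counter, used only for illustration."""

    def __init__(self, total=0):
        self.total = total

    def __add__(self, other):
        # Out-of-place: build and return a new Tally.
        return Tally(self.total + other)

    def __iadd__(self, other):
        # In-place: mutate this object and hand it back.
        self.total += other
        return self


t = Tally(1)
alias = t
t += 1                      # mutates the shared object
print(alias.total, t is alias)  # 2 True

u = t + 1                   # new object; t is left alone
print(u.total, t.total, u is t)  # 3 2 False
```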
A key issue here is that this loop iterates over the rows (1st dimension) of B:
In [258]: B
Out[258]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
In [259]: for b in B:
     ...:     print(b,'=>',end='')
     ...:     b += 1
     ...:     print(b)
     ...:
[0 1 2] =>[1 2 3]
[3 4 5] =>[4 5 6]
[6 7 8] =>[7 8 9]
[ 9 10 11] =>[10 11 12]
Thus the += is acting on a mutable object, an array.
This is implied in the other answers, but easily missed if your focus is on the a = a+1 reassignment.
I could also make an in-place change to b with [:] indexing, or even something fancier, b[1:]=0:
In [260]: for b in B:
     ...:     print(b,'=>',end='')
     ...:     b[:] = b * 2
     ...:     print(b)
     ...:
[1 2 3] =>[2 4 6]
[4 5 6] =>[ 8 10 12]
[7 8 9] =>[14 16 18]
[10 11 12] =>[20 22 24]
Of course, with a 2d array like B we usually don't need to iterate over the rows. Many operations that work on a single row of B also work on the whole thing: B += 1, B[1:] = 0, etc.
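For example, a quick sketch applying both whole-array operations to a fresh B:

```python
import numpy as np

B = np.arange(12).reshape(4, 3)

B += 1       # in-place add on every element, no Python-level loop
B[1:] = 0    # slice assignment: zero out every row after the first

print(B)
# [[1 2 3]
#  [0 0 0]
#  [0 0 0]
#  [0 0 0]]
```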
Source: Stackoverflow.com