Sometimes it is useful to "clone" a row or column vector to a matrix. By cloning I mean converting a row vector such as
[1, 2, 3]
into a matrix
[[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]
or a column vector such as
[[1],
[2],
[3]]
into
[[1, 1, 1],
[2, 2, 2],
[3, 3, 3]]
In MATLAB or Octave this is done pretty easily:
x = [1, 2, 3]
a = ones(3, 1) * x
a =
1 2 3
1 2 3
1 2 3
b = (x') * ones(1, 3)
b =
1 1 1
2 2 2
3 3 3
I want to do the same in NumPy, but without success:
In [14]: x = array([1, 2, 3])
In [14]: ones((3, 1)) * x
Out[14]:
array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])
# so far so good
In [16]: x.transpose() * ones((1, 3))
Out[16]: array([[ 1., 2., 3.]])
# DAMN
# I end up with
In [17]: (ones((3, 1)) * x).transpose()
Out[17]:
array([[ 1., 1., 1.],
[ 2., 2., 2.],
[ 3., 3., 3.]])
Why didn't the first method (In [16]) work? Is there a more elegant way to achieve this in Python?
To answer the actual question, now that nearly a dozen approaches to working around a solution have been posted: x.transpose reverses the shape of x. One of the interesting side-effects is that if x.ndim == 1, the transpose does nothing.
This is especially confusing for people coming from MATLAB, where all arrays implicitly have at least two dimensions. The correct way to transpose a 1D numpy array is not x.transpose() or x.T, but rather
x[:, None]
or
x.reshape(-1, 1)
From here, you can multiply by a matrix of ones, or use any of the other suggested approaches, as long as you respect the (subtle) differences between MATLAB and numpy.
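As a minimal sketch of the point above (the variable names col and b are only illustrative), the following shows that transposing a 1D array leaves its shape unchanged, and that adding an axis first makes the MATLAB-style multiplication behave as expected:

```python
import numpy as np

x = np.array([1, 2, 3])

# Transposing a 1-D array is a no-op: the shape stays (3,).
shape_after_transpose = x.T.shape

# Add a trailing axis to get a true column of shape (3, 1).
col = x[:, None]          # equivalent to x.reshape(-1, 1)

# MATLAB-style cloning: a (3, 1) column times a (1, 3) row of ones.
b = col * np.ones((1, 3))
print(b)
```

The result is a float array because np.ones defaults to float64.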
If you have a pandas dataframe and want to preserve the dtypes, even the categoricals, this is a fast way to do it:
import numpy as np
import pandas as pd
df = pd.DataFrame({1: [1, 2, 3], 2: [4, 5, 6]})
number_repeats = 50
new_df = df.reindex(np.tile(df.index, number_repeats))
Let:
>>> n = 1000
>>> x = np.arange(n)
>>> reps = 10000
Zero-cost allocations
A view does not take any additional memory. Thus, these declarations are instantaneous:
# New axis
x[np.newaxis, ...]
# Broadcast to specific shape
np.broadcast_to(x, (reps, n))
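To see why these are views rather than copies, here is a small sketch with a deliberately tiny array (the sizes and variable names are assumptions for illustration). It inspects the memory sharing, strides, and writeability of the result of np.broadcast_to:

```python
import numpy as np

x = np.arange(4)
view = np.broadcast_to(x, (3, 4))

# The broadcast result shares x's buffer instead of copying it.
shares = np.shares_memory(view, x)

# A stride of 0 along axis 0 means every row points at the same data.
row_stride = view.strides[0]

# The view is read-only; take a copy if a writable array is needed.
writable = view.flags.writeable
owned = view.copy()
```

One consequence worth keeping in mind: because the broadcast view is read-only, code that needs to modify the replicated rows independently has to pay for a real copy.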
Forced allocation
If you want to force the contents to reside in memory:
>>> %timeit np.array(np.broadcast_to(x, (reps, n)))
10.2 ms ± 62.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit np.repeat(x[np.newaxis, :], reps, axis=0)
9.88 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit np.tile(x, (reps, 1))
9.97 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
All three methods are roughly the same speed.
Computation
>>> a = np.arange(reps * n).reshape(reps, n)
>>> x_tiled = np.tile(x, (reps, 1))
>>> %timeit np.broadcast_to(x, (reps, n)) * a
17.1 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit x[np.newaxis, :] * a
17.5 ms ± 300 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit x_tiled * a
17.6 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
All three methods are roughly the same speed.
Conclusion
If you want to replicate before a computation, consider using one of the "zero-cost allocation" methods. You won't suffer the performance penalty of "forced allocation".
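A quick sanity check, sketched with small stand-in sizes rather than the ones timed above, suggests that the broadcast and tiled versions give identical results, so the choice between them is about memory rather than correctness:

```python
import numpy as np

n, reps = 5, 3                      # small stand-ins for the sizes above
x = np.arange(n)
a = np.arange(reps * n).reshape(reps, n)

# Broadcasting: x is treated as shape (1, n) and stretched across rows.
via_broadcast = x[np.newaxis, :] * a

# Explicit replication: x is first copied into a full (reps, n) array.
via_tile = np.tile(x, (reps, 1)) * a

same = np.array_equal(via_broadcast, via_tile)
```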
Use numpy.tile:
>>> tile(array([1,2,3]), (3, 1))
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
or for repeating columns:
>>> tile(array([[1,2,3]]).transpose(), (1, 3))
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
One clean solution is to use NumPy's outer-product function with a vector of ones:
np.outer(np.ones(n), x)
gives n repeating rows. Switch the argument order to get repeating columns. To get an equal number of rows and columns you might do
np.outer(np.ones_like(x), x)
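A short sketch of both argument orders, assuming a 3-element x and a hypothetical copy count n = 2:

```python
import numpy as np

x = np.array([1, 2, 3])
n = 2  # hypothetical number of copies

rows = np.outer(np.ones(n), x)   # shape (n, 3): x repeated as rows
cols = np.outer(x, np.ones(n))   # shape (3, n): x repeated as columns
```

Note that np.ones(n) is float64, so both results are float arrays. The np.ones_like(x) variant keeps the dtype of x.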
I think adding an axis with newaxis and then using tile or repeat is the best and fastest option in NumPy.
I ran the following comparison:
import numpy as np
b = np.random.randn(1000)
In [105]: %timeit c = np.tile(b[:, np.newaxis], (1,100))
1000 loops, best of 3: 354 µs per loop
In [106]: %timeit c = np.repeat(b[:, np.newaxis], 100, axis=1)
1000 loops, best of 3: 347 µs per loop
In [107]: %timeit c = np.array([b,]*100).transpose()
100 loops, best of 3: 5.56 ms per loop
The tile and repeat versions are about 15 times faster than building a list of copies and transposing it.
First note that with numpy's broadcasting operations it's usually not necessary to duplicate rows and columns. See this and this for descriptions.
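As an illustration of that point, here is a minimal sketch (the arrays m, row, and col are made up for the example) in which a row vector and a column vector are combined with a matrix without ever being duplicated explicitly:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

# row is broadcast down the rows; no repeated copy is built.
plus_row = m + row
# col is broadcast across the columns.
plus_col = m + col
```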
But if you do need explicit copies, repeat and newaxis are probably the best way:
In [12]: x = array([1,2,3])
In [13]: repeat(x[:,newaxis], 3, 1)
Out[13]:
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
In [14]: repeat(x[newaxis,:], 3, 0)
Out[14]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
This example is for a row vector, but applying this to a column vector is hopefully obvious. repeat seems to spell this well, but you can also do it via multiplication as in your example
In [15]: x = array([[1, 2, 3]]) # note the double brackets
In [16]: (ones((3,1))*x).transpose()
Out[16]:
array([[ 1., 1., 1.],
[ 2., 2., 2.],
[ 3., 3., 3.]])
import numpy as np
x=np.array([1,2,3])
y=np.multiply(np.ones((len(x),len(x))),x).T
print(y)
yields:
[[ 1. 1. 1.]
[ 2. 2. 2.]
[ 3. 3. 3.]]
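You can use
np.tile(x, 3).reshape((3, 3))
tile generates the repetitions of the vector, and reshape gives them the shape you want. The reshape target must contain exactly as many elements as the tiled array, so a 3-element x tiled 3 times reshapes to (3, 3).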
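A minimal sketch of the tile-then-reshape idea, assuming a 1D x and a hypothetical row count n_rows; deriving the shape from x.size keeps the element counts consistent:

```python
import numpy as np

x = np.array([1, 2, 3])
n_rows = 4  # hypothetical number of copies

# tile yields a flat array of length n_rows * x.size ...
flat = np.tile(x, n_rows)
# ... so the reshape target must have exactly that many elements.
rows = flat.reshape((n_rows, x.size))
# Transposing the 2-D result gives x repeated as columns.
cols = rows.T
```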
Source: Stackoverflow.com