[python] Add numpy array as column to Pandas data frame

I have a Pandas data frame object of shape (X,Y) that looks like this:

[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]

and a numpy sparse matrix (CSC) of shape (X,Z) that looks something like this

[[0, 1, 0],
[0, 0, 1],
[1, 0, 0]]

How can I add the content from the matrix to the data frame in a new named column such that the data frame will end up like this:

[[1, 2, 3, [0, 1, 0]],
[4, 5, 6, [0, 0, 1]],
[7, 8, 9, [1, 0, 0]]]

Notice the data frame now has shape (X, Y+1) and rows from the matrix are elements in the data frame.

This question is related to python numpy pandas

The answer is


import numpy as np
import pandas as pd
import scipy.sparse as sparse

df = pd.DataFrame(np.arange(1,10).reshape(3,3))
arr = sparse.coo_matrix(([1,1,1], ([0,1,2], [1,2,0])), shape=(3,3))
df['newcol'] = arr.toarray().tolist()
print(df)

yields

   0  1  2     newcol
0  1  2  3  [0, 1, 0]
1  4  5  6  [0, 0, 1]
2  7  8  9  [1, 0, 0]

Similar questions with python tag:

Similar questions with numpy tag:

Similar questions with pandas tag: