[python] Python pandas insert list into a cell

I have a list 'abc' and a dataframe 'df':

abc = ['foo', 'bar']
df =
    A  B
0  12  NaN
1  23  NaN

I want to insert the list into cell 1B, so I want this result:

    A  B
0  12  NaN
1  23  ['foo', 'bar']

Ho can I do that?

1) If I use this:

df.ix[1,'B'] = abc

I get the following error message:

ValueError: Must have equal len keys and value when setting with an iterable

because it tries to insert the list (that has two elements) into a row / column but not into a cell.

2) If I use this:

df.ix[1,'B'] = [abc]

then it inserts a list that has only one element that is the 'abc' list ( [['foo', 'bar']] ).

3) If I use this:

df.ix[1,'B'] = ', '.join(abc)

then it inserts a string: ( foo, bar ) but not a list.

4) If I use this:

df.ix[1,'B'] = [', '.join(abc)]

then it inserts a list but it has only one element ( ['foo, bar'] ) but not two as I want ( ['foo', 'bar'] ).

Thanks for help!


EDIT

My new dataframe and the old list:

abc = ['foo', 'bar']
df2 =
    A    B         C
0  12  NaN      'bla'
1  23  NaN  'bla bla'

Another dataframe:

df3 =
    A    B         C                    D
0  12  NaN      'bla'  ['item1', 'item2']
1  23  NaN  'bla bla'        [11, 12, 13]

I want insert the 'abc' list into df2.loc[1,'B'] and/or df3.loc[1,'B'].

If the dataframe has columns only with integer values and/or NaN values and/or list values then inserting a list into a cell works perfectly. If the dataframe has columns only with string values and/or NaN values and/or list values then inserting a list into a cell works perfectly. But if the dataframe has columns with integer and string values and other columns then the error message appears if I use this: df2.loc[1,'B'] = abc or df3.loc[1,'B'] = abc.

Another dataframe:

df4 =
          A     B
0      'bla'  NaN
1  'bla bla'  NaN

These inserts work perfectly: df.loc[1,'B'] = abc or df4.loc[1,'B'] = abc.

This question is related to python list pandas insert dataframe

The answer is


Quick work around

Simply enclose the list within a new list, as done for col2 in the data frame below. The reason it works is that python takes the outer list (of lists) and converts it into a column as if it were containing normal scalar items, which is lists in our case and not normal scalars.

mydict={'col1':[1,2,3],'col2':[[1, 4], [2, 5], [3, 6]]}
data=pd.DataFrame(mydict)
data


   col1     col2
0   1       [1, 4]
1   2       [2, 5]
2   3       [3, 6]

Since set_value has been deprecated since version 0.21.0, you should now use at. It can insert a list into a cell without raising a ValueError as loc does. I think this is because at always refers to a single value, while loc can refer to values as well as rows and columns.

df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

df.at[1, 'B'] = ['m', 'n']

df =
    A   B
0   1   x
1   2   [m, n]
2   3   z

You also need to make sure the column you are inserting into has dtype=object. For example

>>> df = pd.DataFrame(data={'A': [1, 2, 3], 'B': [1,2,3]})
>>> df.dtypes
A    int64
B    int64
dtype: object

>>> df.at[1, 'B'] = [1, 2, 3]
ValueError: setting an array element with a sequence

>>> df['B'] = df['B'].astype('object')
>>> df.at[1, 'B'] = [1, 2, 3]
>>> df
   A          B
0  1          1
1  2  [1, 2, 3]
2  3          3

As mentionned in this post pandas: how to store a list in a dataframe?; the dtypes in the dataframe may influence the results, as well as calling a dataframe or not to be assigned to.


Pandas >= 0.21

set_value has been deprecated. You can now use DataFrame.at to set by label, and DataFrame.iat to set by integer position.

Setting Cell Values with at/iat

# Setup
df = pd.DataFrame({'A': [12, 23], 'B': [['a', 'b'], ['c', 'd']]})
df

    A       B
0  12  [a, b]
1  23  [c, d]

df.dtypes

A     int64
B    object
dtype: object

If you want to set a value in second row of the "B" to some new list, use DataFrane.at:

df.at[1, 'B'] = ['m', 'n']
df

    A       B
0  12  [a, b]
1  23  [m, n]

You can also set by integer position using DataFrame.iat

df.iat[1, df.columns.get_loc('B')] = ['m', 'n']
df

    A       B
0  12  [a, b]
1  23  [m, n]

What if I get ValueError: setting an array element with a sequence?

I'll try to reproduce this with:

df

    A   B
0  12 NaN
1  23 NaN

df.dtypes

A      int64
B    float64
dtype: object

df.at[1, 'B'] = ['m', 'n']
# ValueError: setting an array element with a sequence.

This is because of a your object is of float64 dtype, whereas lists are objects, so there's a mismatch there. What you would have to do in this situation is to convert the column to object first.

df['B'] = df['B'].astype(object)
df.dtypes

A     int64
B    object
dtype: object

Then, it works:

df.at[1, 'B'] = ['m', 'n']
df

    A       B
0  12     NaN
1  23  [m, n]

Possible, But Hacky

Even more wacky, I've found you can hack through DataFrame.loc to achieve something similar if you pass nested lists.

df.loc[1, 'B'] = [['m'], ['n'], ['o'], ['p']]
df

    A             B
0  12        [a, b]
1  23  [m, n, o, p]

You can read more about why this works here.


Also getting

ValueError: Must have equal len keys and value when setting with an iterable,

using .at rather than .loc did not make any difference in my case, but enforcing the datatype of the dataframe column did the trick:

df['B'] = df['B'].astype(object)

Then I could set lists, numpy array and all sorts of things as single cell values in my dataframes.


I've got a solution that's pretty simple to implement.

Make a temporary class just to wrap the list object and later call the value from the class.

Here's a practical example:

  1. Let's say you want to insert list object into the dataframe.
df = pd.DataFrame([
    {'a': 1},
    {'a': 2},
    {'a': 3},
])

df.loc[:, 'b'] = [
    [1,2,4,2,], 
    [1,2,], 
    [4,5,6]
] # This works. Because the list has the same length as the rows of the dataframe

df.loc[:, 'c'] = [1,2,4,5,3] # This does not work. 

>>> ValueError: Must have equal len keys and value when setting with an iterable

## To force pandas to have list as value in each cell, wrap the list with a temporary class.

class Fake(object):
    def __init__(self, li_obj):
        self.obj = li_obj

df.loc[:, 'c'] = Fake([1,2,5,3,5,7,]) # This works. 

df.c = df.c.apply(lambda x: x.obj) # Now extract the value from the class. This works. 

Creating a fake class to do this might look like a hassle but it can have some practical applications. For an example you can use this with apply when the return value is list.

Pandas would normally refuse to insert list into a cell but if you use this method, you can force the insert.


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to insert

How to insert current datetime in postgresql insert query How to add element in Python to the end of list using list.insert? Python pandas insert list into a cell Field 'id' doesn't have a default value? Insert a row to pandas dataframe Insert at first position of a list in Python How can INSERT INTO a table 300 times within a loop in SQL? How to refresh or show immediately in datagridview after inserting? Insert node at a certain position in a linked list C++ select from one table, insert into another table oracle sql query

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe