[python] Pandas: change data type of Series to String

I use Pandas 'ver 0.12.0' with Python 2.7 and have a dataframe as below:

df = pd.DataFrame({'id' : [123,512,'zhub1', 12354.3, 129, 753, 295, 610],
                    'colour': ['black', 'white','white','white',
                            'black', 'black', 'white', 'white'],
                    'shape': ['round', 'triangular', 'triangular','triangular','square',
                                        'triangular','round','triangular']
                    },  columns= ['id','colour', 'shape'])

The id Series consists of some integers and strings. Its dtype by default is object. I want to convert all contents of id to strings. I tried astype(str), which produces the output below.

df['id'].astype(str)
0    1
1    5
2    z
3    1
4    1
5    7
6    2
7    6

1) How can I convert all elements of id to String?

2) I will eventually use id for indexing for dataframes. Would having String indices in a dataframe slow things down, compared to having an integer index?

This question is related to python pandas series

The answer is


A new answer to reflect the most current practices: as of version 1.0.1, neither astype('str') nor astype(str) work.

As per the documentation, a Series can be converted to the string datatype in the following ways:

df['id'] = df['id'].astype("string")

df['id'] = pandas.Series(df['id'], dtype="string")

df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype)

Personally none of the above worked for me. What did:

new_str = [str(x) for x in old_obj][0]

For me it worked:

 df['id'].convert_dtypes()

see the documentation here:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html


You can use:

df.loc[:,'id'] = df.loc[:, 'id'].astype(str)

This is why they recommend this solution: Pandas doc

TD;LR

To reflect some of the answers:

df['id'] = df['id'].astype("string")

This will break on the given example because it will try to convert to StringArray which can not handle any number in the 'string'.

df['id']= df['id'].astype(str)

For me this solution throw some warning:

> SettingWithCopyWarning:  
> A value is trying to be set on a copy of a
> slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

Your problem can easily be solved by converting it to the object first. After it is converted to object, just use "astype" to convert it to str.

obj = lambda x:x[1:]
df['id']=df['id'].apply(obj).astype('str')

You must assign it, like this:-

df['id']= df['id'].astype(str)

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to series

Display/Print one column from a DataFrame of Series in Pandas Python Pandas iterate over rows and access column names Extract values in Pandas value_counts() Convert pandas data frame to series Is it possible to append Series to rows of DataFrame without making a list first? assigning column names to a pandas series Convert pandas Series to DataFrame Get first element of Series without knowing the index Pandas: change data type of Series to String Conditional Replace Pandas