When I read a CSV file into a pandas DataFrame, each column is cast to its own dtype. One of my columns was converted to object. I want to perform string operations on this column, such as splitting the values and creating a list, but no such operation seems possible because its dtype is object. Can anyone please tell me how to convert all the items of a column to strings instead of objects?
I tried several approaches, including astype, str(), and to_string, but nothing worked.
a=lambda x: str(x).split(',')
df['column'].apply(a)
df['column'].astype(str)
Not answering the question directly, but it might help someone else.
I have a column called Volume, containing both - (invalid/NaN) and numbers formatted with , as the thousands separator.
df['Volume'] = df['Volume'].astype('str')
df['Volume'] = df['Volume'].str.replace(',', '')
df['Volume'] = pd.to_numeric(df['Volume'], errors='coerce')
The cast to string is needed first so that str.replace can be applied to every value.
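For anyone who wants to try those three steps end to end, here is a minimal sketch. The sample values are made up for illustration, and regex=False is an extra I added so the comma is treated as a literal character.

```python
import pandas as pd

# Hypothetical sample data: "-" marks an invalid value, and numbers use ","
# as a thousands separator. The last entry is a real int to show mixed types.
df = pd.DataFrame({"Volume": ["1,234", "-", "56,789", 42]})

# 1. Cast everything to str so the .str accessor works on every value.
df["Volume"] = df["Volume"].astype("str")

# 2. Remove the thousands separators (literal match, not a regex).
df["Volume"] = df["Volume"].str.replace(",", "", regex=False)

# 3. Parse to numbers; anything unparseable such as "-" becomes NaN.
df["Volume"] = pd.to_numeric(df["Volume"], errors="coerce")

print(df)
```

Without the cast in step 1, the .str methods return NaN for values that are not strings, so the real integer in the last row would be lost.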
Since strings have variable length, they are stored as object dtype by default. If you want to store them as a fixed-width string type, you can do something like this. Note that S is a NumPy byte-string dtype, so on Python 3 the values come back as bytes (b'...') rather than str.
df['column'] = df['column'].astype('|S80')  # maximum length is set to 80 bytes
Or alternatively:
df['column'] = df['column'].astype('|S')  # length defaults to the longest value it encounters
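As a different option that this answer does not mention, pandas 1.0 and later also ship a dedicated "string" extension dtype. The sketch below assumes such a version is installed and uses made-up data.

```python
import pandas as pd

# Hypothetical data, forced to object dtype to mimic the column in the question.
df = pd.DataFrame({"column": pd.Series(["a,b", "c,d,e"], dtype=object)})

# Convert to the dedicated pandas string dtype (assumes pandas >= 1.0).
converted = df["column"].astype("string")
print(converted.dtype)

# The .str accessor is available on the converted column.
parts = converted.str.split(",")
print(parts)
```

Unlike the S variants, this keeps the values as text rather than bytes.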
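You could try using the df['column'].str accessor and then call any string method on it. It works directly on object columns that hold strings, so no conversion is needed first. The pandas documentation lists the available methods, such as split.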
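A small sketch of that suggestion, using made-up data with dtype=object to mimic what read_csv typically gives for a text column:

```python
import pandas as pd

# Hypothetical data; None stands in for a missing value.
df = pd.DataFrame({"column": pd.Series(["a,b,c", "d,e", None], dtype=object)})

# Split each value into a list; missing values stay missing.
lists = df["column"].str.split(",")

# expand=True spreads the pieces over separate columns instead.
wide = df["column"].str.split(",", expand=True)

print(lists)
print(wide)
```

If the column mixes strings with other types, casting with astype(str) first makes sure every value gets split.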
Source: Stackoverflow.com