I'm starting to tear my hair out with this - so I hope someone can help. I have a pandas DataFrame that was created from an Excel spreadsheet using openpyxl. The resulting DataFrame looks like:
print image_name_data
     id           image_name
0  1001  1001_mar2014_report
1  1002  1002_mar2014_report
2  1003  1003_mar2014_report
[3 rows x 2 columns]
…with the following datatypes:
print image_name_data.dtypes
id            float64
image_name     object
dtype: object
The issue is that the numbers in the id column are, in fact, identification numbers and I need to treat them as strings. I've tried converting the id column to strings using:
image_name_data['id'] = image_name_data['id'].astype('str')
This seems a bit ugly, but it does produce a column of dtype 'object' rather than 'float64':
print image_name_data.dtypes
id            object
image_name    object
dtype: object
However, the strings that are created have a decimal point, as shown:
print image_name_data
       id           image_name
0  1001.0  1001_mar2014_report
1  1002.0  1002_mar2014_report
2  1003.0  1003_mar2014_report
[3 rows x 2 columns]
How can I convert a float64 column in a pandas DataFrame to a string with a given format (in this case, for example, '%10.0f')?
If you can reload the data, you might be able to use the dtype argument:
pd.read_csv(..., dtype={'COL_NAME':'str'})
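A minimal sketch of that idea, assuming the data can be read again from its source. The inline CSV text and `io.StringIO` are illustrative stand-ins for a real file, and the column names are taken from the question. Since the original data came from Excel, `pd.read_excel` may be the relevant reader there; recent pandas versions accept a `dtype` argument on it as well.

```python
import io
import pandas as pd

# Illustrative stand-in for the real file (assumption: same columns as in the question)
csv_text = (
    "id,image_name\n"
    "1001,1001_mar2014_report\n"
    "1002,1002_mar2014_report\n"
    "1003,1003_mar2014_report\n"
)

# Ask pandas to keep 'id' as text instead of inferring a numeric dtype
df = pd.read_csv(io.StringIO(csv_text), dtype={'id': str})
print(df['id'].tolist())
```

Because the column is never parsed as a number, no decimal point is introduced in the first place.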
I'm putting this in a new answer because comments don't allow line breaks or code blocks. I assume you want those NaNs to turn into blank strings? I couldn't find a nice way to do this, only this ugly method:
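import pandas as pd
s = pd.Series([1001., 1002., None])
a = s.loc[s.isnull()].fillna('')  # NaN rows become empty strings
b = s.loc[s.notnull()].astype(int).astype(str)  # float -> int -> str drops the '.0'
result = pd.concat([a, b]).sort_index()  # concat puts the blank rows first; restore row order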
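For the formatting the question asks about, one alternative sketch is to map a format string over the column and return a blank for missing values. The helper name `fmt_id` is hypothetical and only for illustration. Note that '%10.0f' pads the result to a width of 10 characters, so '%.0f' is probably closer to what an ID column needs.

```python
import pandas as pd

s = pd.Series([1001., 1002., None])

def fmt_id(x, fmt='%.0f'):
    # Hypothetical helper: blank for missing values, otherwise apply the format string
    return '' if pd.isnull(x) else fmt % x

# No padding: '1001', '1002', ''
result = s.map(fmt_id)

# The format from the question, which right-aligns each value in 10 characters
padded = s.map(lambda x: fmt_id(x, '%10.0f'))
```

This keeps the original row order and avoids the split and concat step, at the cost of a Python-level function call per row.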
Source: Stackoverflow.com