How can one modify the format for the output from a groupby operation in pandas that produces scientific notation for very large numbers?
I know how to do string formatting in python but I'm at a loss when it comes to applying it here.
df1.groupby('dept')['data1'].sum()
dept
value1 1.192433e+08
value2 1.293066e+08
value3 1.077142e+08
This suppresses the scientific notation if I convert to string but now I'm just wondering how to string format and add decimals.
sum_sales_dept.astype(str)
This question is related to
python
pandas
floating-point
scientific-notation
number-formatting
I had multiple dataframes with different floating point, so thx to Allans idea made dynamic length.
pd.set_option('display.float_format', lambda x: f'%.{len(str(x%1))-2}f' % x)
The minus of this is that if You have last 0 in float, it will cut it. So it will be not 0.000070, but 0.00007.
Setting a fixed number of decimal places globally is often a bad idea since it is unlikely that it will be an appropriate number of decimal places for all of your various data that you will display regardless of magnitude. Instead, try this which will give you scientific notation only for large and very small values (and adds a thousands separator unless you omit the ","):
pd.set_option('display.float_format', lambda x: '%,g' % x)
Or to almost completely suppress scientific notation without losing precision, try this:
pd.set_option('display.float_format', str)
You can use round function just to suppress scientific notation for specific dataframe:
df1.round(4)
or you can suppress is globally by:
pd.options.display.float_format = '{:.4f}'.format
If you want to style the output of a data frame in a jupyter notebook cell, you can set the display style on a per-dataframe basis:
df = pd.DataFrame({'A': np.random.randn(4)*1e7})
df.style.format("{:.1f}")
See the documentation here.
If you would like to use the values, say as part of csvfile csv.writer, the numbers can be formatted before creating a list:
df['label'].apply(lambda x: '%.17f' % x).values.tolist()
Here is another way of doing it, similar to Dan Allan's answer but without the lambda function:
>>> pd.options.display.float_format = '{:.2f}'.format
>>> Series(np.random.randn(3))
0 0.41
1 0.99
2 0.10
or
>>> pd.set_option('display.float_format', '{:.2f}'.format)
Source: Stackoverflow.com