[python] Binning column with python pandas

You can use pandas.cut:

bins = [0, 1, 5, 10, 25, 50, 100]
df['binned'] = pd.cut(df['percentage'], bins)
print (df)
   percentage     binned
0       46.50   (25, 50]
1       44.20   (25, 50]
2      100.00  (50, 100]
3       42.12   (25, 50]

bins = [0, 1, 5, 10, 25, 50, 100]
labels = [1,2,3,4,5,6]
df['binned'] = pd.cut(df['percentage'], bins=bins, labels=labels)
print (df)
   percentage binned
0       46.50      5
1       44.20      5
2      100.00      6
3       42.12      5

Or numpy.searchsorted:

bins = [0, 1, 5, 10, 25, 50, 100]
df['binned'] = np.searchsorted(bins, df['percentage'].values)
print (df)
   percentage  binned
0       46.50       5
1       44.20       5
2      100.00       6
3       42.12       5

...and then value_counts or groupby and aggregate size:

s = pd.cut(df['percentage'], bins=bins).value_counts()
print (s)
(25, 50]     3
(50, 100]    1
(10, 25]     0
(5, 10]      0
(1, 5]       0
(0, 1]       0
Name: percentage, dtype: int64

s = df.groupby(pd.cut(df['percentage'], bins=bins)).size()
print (s)
percentage
(0, 1]       0
(1, 5]       0
(5, 10]      0
(10, 25]     0
(25, 50]     3
(50, 100]    1
dtype: int64

By default cut return categorical.

Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data, operations in categorical.

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to numpy

Unable to allocate array with shape and data type How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? Numpy, multiply array with scalar TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array Could not install packages due to a "Environment error :[error 13]: permission denied : 'usr/local/bin/f2py'" Pytorch tensor to numpy array Numpy Resize/Rescale Image what does numpy ndarray shape do? How to round a numpy array? numpy array TypeError: only integer scalar arrays can be converted to a scalar index

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to binning

Binning column with python pandas Histogram using gnuplot?