In data analysis, continuous data is often discretized, or separated into “bins”. Suppose you have a list of people and their ages and you want to group them into discrete age buckets.

import pandas as pd
df = pd.DataFrame([['Alex', 10], ['Bob', 16], ['Clarke', 26], ['James', 24], ['John', 69]], columns=['name', 'age'])

Let’s divide these into bins of 0 to 14, 15 to 24, 25 to 64, and finally 65 to 100. To do so, use the cut function in pandas.

df['binned'] = pd.cut(x=df['age'], bins=[0, 14, 24, 64, 100])

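The result is a categorical column. It contains a categories array specifying the distinct bins, along with a label for each age in the codes attribute. On a Series, both are reached through the .cat accessor: df['binned'].cat.categories and df['binned'].cat.codes.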
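To make the categories and codes concrete, here is a minimal, self-contained sketch. The column name "name" is an assumption made for illustration; the ages and bin edges are the ones used above.

```python
import pandas as pd

# Column name "name" is assumed for illustration; "age" matches the text.
df = pd.DataFrame(
    [["Alex", 10], ["Bob", 16], ["Clarke", 26], ["James", 24], ["John", 69]],
    columns=["name", "age"],
)

# Default right=True: each bin excludes its left edge and includes its right edge.
df["binned"] = pd.cut(x=df["age"], bins=[0, 14, 24, 64, 100])

# The distinct bins, stored as intervals.
categories = df["binned"].cat.categories

# For each row, the integer position of its bin within `categories`.
codes = df["binned"].cat.codes.tolist()

print(df)
print(categories)
print(codes)
```

Printing the frame shows each age next to its interval, so you can check the assignment row by row.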

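In the interval notation of the output, a parenthesis means that side is open (the edge is excluded), while a square bracket means it is closed (the edge is included). By default the right side is closed, so 14 falls into (0, 14]. You can change which side is closed by passing right=False.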
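A small sketch of the difference, using the same ages and edges. The variable names are chosen here for illustration. The age 24 sits exactly on a bin edge, so it is the value that changes bins.

```python
import pandas as pd

ages = pd.Series([10, 16, 26, 24, 69])
bins = [0, 14, 24, 64, 100]

# Right-closed bins (the default): 24 is included in (14, 24].
right_closed = pd.cut(ages, bins=bins)

# Left-closed bins: 24 is now the included left edge of [24, 64).
left_closed = pd.cut(ages, bins=bins, right=False)

print(right_closed[3])
print(left_closed[3])
```

Only values that land exactly on an edge are affected, so the choice matters most for integer data such as ages.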

You can also supply your own bin names by passing a list or array to the labels option. The list needs one label per bin, that is, one fewer than the number of bin edges.

category = ['Child', 'Young', 'Adults', 'Senior']

df['category'] = pd.cut(x=df['age'], bins=[0, 14, 24, 64, 100], labels=category)
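As a self-contained sketch of the labels option: the example below names the bins and, as a variation, passes labels=False to get plain integer bin positions. Indexing the Series by person name is an assumption made here for readability.

```python
import pandas as pd

# Names as the index are an illustrative choice, not required by pd.cut.
ages = pd.Series(
    [10, 16, 26, 24, 69],
    index=["Alex", "Bob", "Clarke", "James", "John"],
)
bins = [0, 14, 24, 64, 100]
category = ["Child", "Young", "Adults", "Senior"]

# Four bins, four labels: each interval is replaced by its name.
labeled = pd.cut(ages, bins=bins, labels=category)

# labels=False returns the integer position of each bin instead of a name.
positions = pd.cut(ages, bins=bins, labels=False)

print(labeled)
print(positions)
```

Integer positions can be handy when the binned value is only an intermediate step, for example as a lookup key.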

If you pass an integer number of bins to cut instead of explicit bin edges, it will compute equal-length bins based on the minimum and maximum values in the data. Consider the case of some evenly spaced data chopped into four bins.

data = [0,10,20,30,40,50,60,70,80,90,100]
pd.cut(data, 4, precision=0)
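To see which edges were computed, one option is the retbins=True argument, which makes cut return the edges alongside the binned values. The sketch below uses the same data; the variable names are illustrative.

```python
import pandas as pd

data = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# retbins=True returns a (binned values, bin edges) pair.
binned, edges = pd.cut(data, 4, retbins=True)

# For list input the result is a Categorical, which exposes .codes directly.
codes = binned.codes.tolist()

print(edges)
print(codes)
```

The lowest edge comes out slightly below the minimum of the data. pandas extends the range a little so that the minimum value itself falls inside the first right-closed bin.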

Count Bins

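Note that calling value_counts() on the result of pandas.cut gives the bin counts. The Series method is used here because the top-level pd.value_counts function is deprecated in recent pandas versions.

df['binned'].value_counts()
# Alternative, grouped by the labeled column: df.groupby(df['category']).size()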
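A self-contained sketch of the count, assuming the same ages and bin edges as earlier. By default value_counts orders by frequency, so sort_index() is added here to list the bins in their natural order instead.

```python
import pandas as pd

ages = pd.Series([10, 16, 26, 24, 69])
binned = pd.cut(ages, bins=[0, 14, 24, 64, 100])

# Count the values per bin, then order the result by bin rather than by count.
counts = binned.value_counts().sort_index()

print(counts)
```

Because the result is categorical, bins that contain no values still appear in the output with a count of zero.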
