Discretization, Binning, and Counting in a Pandas Column

In data analysis, continuous data is often discretized, or separated into "bins". Suppose you have a list of people and their ages, and you want to group them into discrete age buckets.

data = [['Alex',10],['Bob',16],['Clarke',26],['James',24],['John',69]]
df = pd.DataFrame(data, columns=['name','age'])  # assumes: import pandas as pd

Let's divide these into bins of 0 to 14, 15 to 24, 25 to 64, and finally 65 to 100. To do so, use the cut function in pandas.

df['binned']=pd.cut(x=df['age'], bins=[0,14,24,64,100])

The new column is categorical. Its .cat.categories attribute holds the distinct bins, and its .cat.codes attribute holds, for each age, the integer position of the bin it falls into.
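To make the categories and codes concrete, here is a minimal sketch that rebuilds the example frame and inspects the binned column. The column names 'name' and 'age' are assumptions; the original only shows the raw list.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 16], ['Clarke', 26], ['James', 24], ['John', 69]]
df = pd.DataFrame(data, columns=['name', 'age'])  # column names are assumed

df['binned'] = pd.cut(x=df['age'], bins=[0, 14, 24, 64, 100])

# On a Series, the categorical details are reached through the .cat accessor
categories = df['binned'].cat.categories  # the four interval bins
codes = df['binned'].cat.codes            # index of the bin for each row

print(df)
print(categories)
print(codes.tolist())  # [0, 1, 2, 1, 3]
```

Bob (16) and James (24) share code 1 because both ages fall in the (14, 24] bin.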

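Each bin is shown in interval notation, for example (14, 24]. A parenthesis means that the side is open (the edge is excluded), while a square bracket means it is closed (the edge is included). By default the right side is closed; you can close the left side instead by passing right=False.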
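The effect of right=False is easiest to see on a value that sits exactly on a bin edge. The sketch below uses the same ages and bin edges as above; the variable names are only for illustration.

```python
import pandas as pd

ages = pd.Series([10, 16, 26, 24, 69])
bins = [0, 14, 24, 64, 100]

right_closed = pd.cut(ages, bins=bins)              # default: bins like (14, 24]
left_closed = pd.cut(ages, bins=bins, right=False)  # bins like [14, 24)

# Age 24 lies on an edge, so it changes bins when the closed side flips
print(right_closed.iloc[3])  # (14, 24]
print(left_closed.iloc[3])   # [24, 64)
```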

You can also supply your own bin names by passing a list or array to the labels option.

category = ['Child', 'Young', 'Adults', 'Senior']

df['category']=pd.cut(x=df['age'], bins=[0,14,24,64,100],labels=category)
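As a quick check of the labelled version, the following sketch runs the same call on the example data and prints the label assigned to each person. The 'name' column is an assumed addition. Note that the labels list must have exactly one entry per bin, that is, one fewer than the number of bin edges.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alex', 'Bob', 'Clarke', 'James', 'John'],  # assumed column name
    'age': [10, 16, 26, 24, 69],
})

category = ['Child', 'Young', 'Adults', 'Senior']  # 4 labels for 5 bin edges
df['category'] = pd.cut(x=df['age'], bins=[0, 14, 24, 64, 100], labels=category)

print(df['category'].tolist())
# ['Child', 'Young', 'Adults', 'Young', 'Senior']
```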

If you pass an integer number of bins to cut instead of explicit bin edges, it will compute equal-length bins based on the minimum and maximum values in the data. Consider the case of some evenly spaced data chopped into four bins.

data = [0,10,20,30,40,50,60,70,80,90,100]
pd.cut(data, 4,precision=0)
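To see which edges pandas actually computes for an integer bin count, one option is to ask cut for them with retbins=True. This is a small sketch on the same data; labels=False is used here only so the result is a plain array of bin indices that is easy to read.

```python
import pandas as pd

data = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# retbins=True additionally returns the computed bin edges;
# labels=False returns the integer bin index for each value
bin_index, edges = pd.cut(data, 4, labels=False, retbins=True)

print(edges)               # five edges spanning the data range
print(bin_index.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
```

The lowest edge comes out slightly below the minimum value, so that the minimum itself (0 here) is still included in the first right-closed bin.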

Count Bins

Calling value_counts() on the binned column gives the number of rows in each bin produced by pandas.cut.

df['binned'].value_counts()  # the top-level pd.value_counts() is deprecated since pandas 2.1
# Alternative: df.groupby('category').size()
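Both counting approaches can be compared side by side. The sketch below assumes the same example frame as before (the 'name' column is an assumption) and passes observed=False to groupby so that empty bins, if there were any, would still be reported.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alex', 'Bob', 'Clarke', 'James', 'John'],  # assumed column name
    'age': [10, 16, 26, 24, 69],
})
df['binned'] = pd.cut(x=df['age'], bins=[0, 14, 24, 64, 100])

# sort=False skips the default sorting by count
counts = df['binned'].value_counts(sort=False)

# Grouping on the categorical column yields the same per-bin counts
grouped = df.groupby('binned', observed=False).size()

print(counts)
print(grouped)
```

With this data, the (14, 24] bin holds two people and every other bin holds one.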
