For data analysis, continuous data is often discretized or otherwise separated into “bins”. Suppose you have a list of people and their ages, and you want to group them into discrete age buckets.
data = [['Alex', 10], ['Bob', 16], ['Clarke', 26], ['James', 24], ['John', 69]]
Let’s divide these into bins of 0 to 14, 15 to 24, 25 to 64, and finally 65 to 100. To do so, use the cut function in pandas.
The result of cut is a categorical object. It contains a categories array specifying the distinct category names, along with a labeling for the ages data in the codes attribute.
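As a minimal sketch of the step described above, the snippet below passes a plain list of ages to pd.cut and inspects the result. The variable names ages, bin_edges and cats are chosen here for illustration; the bin edges are the ones used later in this article.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 16], ['Clarke', 26], ['James', 24], ['John', 69]]
ages = [age for _, age in data]  # plain list of ages (illustrative name)

bin_edges = [0, 14, 24, 64, 100]  # edges of the four age buckets
cats = pd.cut(ages, bins=bin_edges)

print(cats)             # one interval per age
print(cats.categories)  # the four distinct intervals
print(cats.codes)       # integer position of each age's interval
```

When cut is given a DataFrame column instead of a list, it returns a Series, and the same information is reachable through the .cat accessor (for example .cat.codes).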
In the interval notation, a parenthesis means that the side is open, while a square bracket means it is closed (inclusive). You can change which side is closed by passing right=False.
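To see the effect of the closed side, here is a small sketch that bins the same ages twice, once with the default and once with right=False. The age 24 sits exactly on a bin edge, so it lands in a different bucket in each case. The variable names are illustrative.

```python
import pandas as pd

ages = [10, 16, 26, 24, 69]
bin_edges = [0, 14, 24, 64, 100]

# Default: each bin includes its right edge and excludes its left edge.
right_closed = pd.cut(ages, bins=bin_edges)

# right=False: each bin includes its left edge and excludes its right edge.
left_closed = pd.cut(ages, bins=bin_edges, right=False)

print(right_closed[3])  # interval holding age 24 with the default
print(left_closed[3])   # interval holding age 24 with right=False
```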
You can also supply your own bin names by passing a list or array to the labels option.
category = ['Child', 'Young', 'Adults', 'Senior']
df['category'] = pd.cut(x=df['age'], bins=[0, 14, 24, 64, 100], labels=category)
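The labeling step assumes a DataFrame named df with an age column. A self-contained version might look like the sketch below, where the column names name and age are assumptions made for this example.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 16], ['Clarke', 26], ['James', 24], ['John', 69]]
# Column names are assumed here; only 'age' is used by cut.
df = pd.DataFrame(data, columns=['name', 'age'])

category = ['Child', 'Young', 'Adults', 'Senior']
df['category'] = pd.cut(x=df['age'], bins=[0, 14, 24, 64, 100], labels=category)

print(df)
```

The labels list must have one entry per bin, so four labels go with the five bin edges.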
If you pass an integer number of bins to cut instead of explicit bin edges, it will compute equal-width bins based on the minimum and maximum values in the data. Consider the case of some evenly spaced data chopped into four bins.
data = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
pd.cut(data, 4, precision=0)
pd.value_counts(bins) gives the bin counts for the result of cut.
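As a rough sketch of counting how many values fall in each bin, the snippet below wraps the result of cut in a Series and calls its value_counts method. The Series method is used here because the top-level pd.value_counts function is deprecated in recent pandas releases; the names cats and counts are illustrative.

```python
import pandas as pd

data = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Four equal-width bins spanning the minimum to the maximum of the data.
cats = pd.cut(data, 4, precision=0)

# sort=False asks for the counts in bin order rather than by frequency.
counts = pd.Series(cats).value_counts(sort=False)

print(counts)
```

Because the bins have equal width rather than equal population, the counts need not be identical across bins.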