In this tutorial, we’re going to look at the difference between micro and macro averages and calculate both of them for precision and recall. Let’s start with the micro average.
You can think of micro as meaning small: to understand small things, you need to look through a microscope. The same distinction appears in economics, where microeconomics studies individual units. For example, it examines a company’s prices for individual goods and its market structure. In the same spirit, micro averaging works at the level of individual samples.
Macro deals with aggregates or totals: macroeconomics studies the economy as a whole. There, we examine economy-wide phenomena such as the unemployment rate, economic growth, price levels, and gross domestic product (GDP). Likewise, macro averaging works at the level of whole classes.
Some quick examples: suppose that during a recession Jonson loses his job. If we want to see how this decrease in income affects his consumption, this topic would be studied in microeconomics, because we’re looking at the actions of one individual.
On the other hand, suppose that during that same recession the unemployment rate increased to 11 percent. This overall unemployment rate and its effects on aggregate demand would be studied in macroeconomics.
The difference between macro and micro averaging is that macro weighs each class equally, whereas micro weighs each sample equally. If you have an equal number of samples for each class, macro and micro recall will give the same score, although precision and F1 can still differ when the model predicts some classes more often than others.
A macro-average computes the metric independently for each class and then takes the average, hence treating all classes equally, whereas a micro-average aggregates the contributions of all classes (their true positives, false positives, and false negatives) before computing the metric.
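To make the two definitions concrete, here is a minimal plain-Python sketch. The helper names (`confusion_counts`, `macro_precision`, `micro_precision`) and the label lists are hypothetical illustrations, not part of any library; in practice, scikit-learn’s `precision_score` with `average="macro"` or `average="micro"` does the same job. A class that is never predicted gets precision 0.0 here, an assumption that mirrors scikit-learn’s default behaviour (which also emits a warning).

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Return the sorted labels plus per-class TP, FP and FN counters."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample is not p
            fn[t] += 1  # the sample is class t, but the model missed it
    return labels, tp, fp, fn


def macro_precision(y_true, y_pred):
    labels, tp, fp, _ = confusion_counts(y_true, y_pred)
    # Precision per class first (0.0 for a class that is never predicted),
    # then an unweighted mean: every class counts equally.
    per_class = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in labels]
    return sum(per_class) / len(per_class)


def micro_precision(y_true, y_pred):
    labels, tp, fp, _ = confusion_counts(y_true, y_pred)
    # Pool the counts of all classes first, then compute one precision:
    # every prediction counts equally.
    total_tp = sum(tp[c] for c in labels)
    total_fp = sum(fp[c] for c in labels)
    return total_tp / (total_tp + total_fp)


# Hypothetical labels in which class "b" dominates.
y_true = ["a", "a", "b", "b", "b", "b", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "b", "b", "a", "c", "c", "b"]
print(round(macro_precision(y_true, y_pred), 3))  # 0.556
print(round(micro_precision(y_true, y_pred), 3))  # 0.6
```

Here classes "a" and "c" each have precision 0.5 and class "b" has 4/6, so the macro-average is 5/9, while the micro-average pools 6 correct predictions out of 10.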
Let’s imagine you have a multi-class classification system with three imbalanced classes and the following numbers:
Precision Micro Averages
Micro-average precision divides the sum of all true positives by the sum of all true positives plus the sum of all false positives. In other words, you divide the number of correct predictions by the total number of predictions. A micro-average will compute:
$$Pr_{micro} = \frac{TP_A + TP_B + TP_C}{(TP_A + TP_B + TP_C) + (FP_A + FP_B + FP_C)}$$
Precision Macro Averages
You can see that PrA = 0.71, PrB = 0.1, and PrC = 0.57. A macro-average will then compute:
$$Pr_{macro} = \frac{Pr_A + Pr_B + Pr_C}{3} = \frac{0.71 + 0.1 + 0.57}{3} \approx 0.46$$
These are quite different values for precision. Intuitively, in the macro-average, the “good” precision of classes A and C (0.71 and 0.57) keeps the overall precision at a “decent” level (about 0.46). While this is technically true (across classes, the average precision is about 0.46), it is a bit misleading, since a large number of examples are not properly classified. These examples predominantly correspond to class B, so they only contribute 1/3 towards the average in spite of constituting 90% of your data.
The micro-average adequately captures this class imbalance and brings the overall precision down to 0.22, more in line with the precision of the dominating class B (0.1).
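The same effect can be reproduced from per-class counts. The sketch below uses hypothetical true-positive and false-positive counts (they are not the numbers from the table above) in which class B dominates the predictions and has poor precision, so the micro-average is pulled down towards B while the macro-average is not.

```python
# Hypothetical per-class counts (not the table from the text): class B
# dominates the predictions and has poor precision.
counts = {
    "A": {"tp": 8, "fp": 2},
    "B": {"tp": 10, "fp": 90},
    "C": {"tp": 6, "fp": 4},
}


def precision(tp, fp):
    # Share of the predictions for a class that were correct.
    return tp / (tp + fp) if tp + fp else 0.0


# Macro: per-class precision first, then an unweighted mean over classes.
per_class = {c: precision(v["tp"], v["fp"]) for c, v in counts.items()}
macro = sum(per_class.values()) / len(per_class)

# Micro: pool TP and FP over all classes, then compute one precision.
total_tp = sum(v["tp"] for v in counts.values())
total_fp = sum(v["fp"] for v in counts.values())
micro = precision(total_tp, total_fp)

print(per_class)  # {'A': 0.8, 'B': 0.1, 'C': 0.6}
print(round(macro, 3), round(micro, 3))  # 0.5 0.2
```

Class B accounts for 100 of the 120 predictions, so it dominates the pooled ratio, while in the macro-average it is just one of three equally weighted terms.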
Recall Micro Averages
Now, the average recall using the micro-average method is:
$$Re_{micro} = \frac{TP_A + TP_B + TP_C}{(TP_A + TP_B + TP_C) + (FN_A + FN_B + FN_C)}$$
Here we use false negatives instead of false positives.
Recall Macro Averages
The method is straightforward: just take the average of the recall of each class. For example, the macro-average recall of the given example is:
$$Re_{macro} = \frac{Re_A + Re_B + Re_C}{3}$$
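Both recall averages can be sketched in a few lines of plain Python. The per-class true-positive and false-negative counts below are hypothetical (the table from the text is not reproduced); the only difference from the precision case is that false negatives replace false positives in the denominator.

```python
# Hypothetical per-class counts (not the table from the text).
counts = {
    "A": {"tp": 8, "fn": 2},
    "B": {"tp": 10, "fn": 30},
    "C": {"tp": 6, "fn": 14},
}


def recall(tp, fn):
    # Share of the class's actual samples that the model found.
    return tp / (tp + fn) if tp + fn else 0.0


per_class = {c: recall(v["tp"], v["fn"]) for c, v in counts.items()}
# Macro: average the per-class recalls, each class weighted equally.
macro_recall = sum(per_class.values()) / len(per_class)

# Micro: pool TP and FN over all classes, then compute one recall.
total_tp = sum(v["tp"] for v in counts.values())
total_fn = sum(v["fn"] for v in counts.values())
micro_recall = recall(total_tp, total_fn)

print(per_class)  # {'A': 0.8, 'B': 0.25, 'C': 0.3}
print(round(macro_recall, 3), round(micro_recall, 3))  # 0.45 0.343
```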
F1 Micro and Macro Averages
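For each class, the F1-score is the harmonic mean of that class’s precision and recall. The micro-averaged F1 is the harmonic mean of the micro-averaged precision and recall, while the macro-averaged F1 (as scikit-learn computes it) is the plain average of the per-class F1-scores. For example, in binary classification, suppose we get an F1-score of 0.7 for class 1 and 0.5 for class 2. Using macro averaging, we’d simply average those two scores to get an overall score for the classifier of 0.6, and this would be the same no matter how the samples are distributed between the two classes.
If you instead weight each class by its share of the samples, the distribution matters. For example, if class 1 made up 80% of your data, the formula would be 0.7 × 80% + 0.5 × 20%, which equals 0.66: each sample is weighed equally, so the score reflects the data imbalance. If class 1 made up 50% of your data, the formula would shift to 0.7 × 50% + 0.5 × 50%, which is 0.6, the same as the result from macro averaging. This sample-weighted average conveys the intuition behind micro averaging (scikit-learn calls it the “weighted” average), but it is not identical to micro-F1, which is computed from the true positives, false positives, and false negatives pooled over all classes.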
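The 80/20 calculation above weights each class’s F1 by its share of the samples, which is what scikit-learn calls the “weighted” average. Micro-F1, by contrast, is computed from true positives, false positives, and false negatives pooled across classes, so the two need not match. The minimal plain-Python sketch below first repeats the arithmetic from the text and then uses hypothetical confusion counts (an assumption, chosen only to match the 80/20 split) to compare macro, weighted, and micro F1. The helper `f1_from_counts` is a hypothetical name.

```python
def f1_from_counts(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Part 1: the arithmetic from the text (per-class F1 of 0.7 and 0.5, 80/20 split).
f1_scores = [0.7, 0.5]
shares = [0.8, 0.2]
macro_f1 = sum(f1_scores) / len(f1_scores)
weighted_f1 = sum(f * s for f, s in zip(f1_scores, shares))
print(round(macro_f1, 2), round(weighted_f1, 2))  # 0.6 0.66

# Part 2: hypothetical single-label counts for 100 samples (80 in class 1, 20 in class 2).
# Class 1: 70 correct, 10 predicted as class 2; class 2: 12 correct, 8 predicted as class 1.
counts = {1: {"tp": 70, "fp": 8, "fn": 10}, 2: {"tp": 12, "fp": 10, "fn": 8}}
per_class = {c: f1_from_counts(**v) for c, v in counts.items()}
support = {c: v["tp"] + v["fn"] for c, v in counts.items()}

macro = sum(per_class.values()) / len(per_class)
weighted = sum(per_class[c] * support[c] for c in counts) / sum(support.values())
# Micro: pool TP, FP and FN over both classes before computing F1.
micro = f1_from_counts(
    sum(v["tp"] for v in counts.values()),
    sum(v["fp"] for v in counts.values()),
    sum(v["fn"] for v in counts.values()),
)
print(round(macro, 3), round(weighted, 3), round(micro, 3))  # 0.729 0.823 0.82
```

With these counts the weighted F1 (0.823) is close to, but not the same as, the micro F1 (0.82), and the micro F1 equals the accuracy of 82 correct out of 100.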
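If your data were perfectly balanced, the weighted calculation above would give the same result as macro averaging. For the metrics themselves, macro and micro recall coincide on balanced data, but precision, and therefore F1, can still differ when the model predicts some classes more often than others.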
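To check what perfect balance does and does not guarantee, here is a minimal plain-Python sketch with hypothetical, perfectly balanced labels (two samples per class). The helper `per_class_counts` is a hypothetical name. Macro and micro recall come out identical because every class has the same support, but macro and micro precision differ because the model predicts class 0 more often than the others.

```python
def per_class_counts(y_true, y_pred, labels):
    """Count true positives, false positives and false negatives per class."""
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    return tp, fp, fn


labels = [0, 1, 2]
# Perfectly balanced ground truth: two samples per class (hypothetical data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 2, 0]  # the model over-predicts class 0
tp, fp, fn = per_class_counts(y_true, y_pred, labels)

macro_recall = sum(tp[c] / (tp[c] + fn[c]) for c in labels) / len(labels)
micro_recall = sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))
macro_precision = sum(tp[c] / (tp[c] + fp[c]) for c in labels) / len(labels)
micro_precision = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))

print(round(macro_recall, 3), round(micro_recall, 3))        # 0.667 0.667
print(round(macro_precision, 3), round(micro_precision, 3))  # 0.833 0.667
```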
Micro-averaged precision and micro-averaged recall are both equal to the accuracy when each data point is assigned to exactly one class. Micro-averaged metrics can differ from the overall accuracy when the classification is multi-label or when some classes are excluded in the multi-class case.
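A small check of this claim in plain Python, using hypothetical single-label predictions. The helper `pooled_counts` (a hypothetical name) computes TP, FP, and FN class by class and then sums them. With every class included, each wrong prediction is one false positive for the predicted class and one false negative for the true class, so micro precision, micro recall, and accuracy coincide. Leaving a class out, similar in spirit to passing a subset of classes through scikit-learn’s `labels` argument, breaks the equality.

```python
def pooled_counts(y_true, y_pred, labels):
    """Sum per-class TP, FP and FN over the classes in `labels` (single-label data)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for c in labels for t, p in pairs if t == c and p == c)
    fp = sum(1 for c in labels for t, p in pairs if t != c and p == c)
    fn = sum(1 for c in labels for t, p in pairs if t == c and p != c)
    return tp, fp, fn


# Hypothetical single-label predictions: each sample gets exactly one class.
y_true = ["cat", "dog", "dog", "bird", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "bird", "dog", "dog"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# All classes included: micro precision, micro recall and accuracy coincide.
tp, fp, fn = pooled_counts(y_true, y_pred, ["bird", "cat", "dog"])
micro_p_all, micro_r_all = tp / (tp + fp), tp / (tp + fn)
print(round(micro_p_all, 3), round(micro_r_all, 3), round(accuracy, 3))  # 0.667 0.667 0.667

# "bird" excluded: the pooled counts change and no longer match accuracy.
tp, fp, fn = pooled_counts(y_true, y_pred, ["cat", "dog"])
micro_p_sub, micro_r_sub = tp / (tp + fp), tp / (tp + fn)
print(round(micro_p_sub, 3), round(micro_r_sub, 3))  # 0.6 0.6
```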
If the large class performs better than the small ones, you would expect the micro average to be higher than the macro average.
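A quick illustration with hypothetical counts, the mirror image of the class B example earlier: here the large class A is predicted often and accurately, so it lifts the micro-average well above the macro-average. The function name `micro_macro_precision` is hypothetical, and the sketch assumes every class receives at least one prediction (otherwise the per-class division would fail).

```python
def micro_macro_precision(counts):
    """counts maps class -> (TP, FP); returns (macro, micro) precision.

    Assumes every class has at least one prediction (TP + FP > 0).
    """
    per_class = [tp / (tp + fp) for tp, fp in counts.values()]
    macro = sum(per_class) / len(per_class)
    total_tp = sum(tp for tp, _ in counts.values())
    total_fp = sum(fp for _, fp in counts.values())
    return macro, total_tp / (total_tp + total_fp)


# Hypothetical counts: the large class A is predicted often and accurately.
macro, micro = micro_macro_precision({"A": (90, 10), "B": (3, 7), "C": (5, 5)})
print(round(macro, 3), round(micro, 3))  # 0.567 0.817
```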
“Is micro-average preferable if the classes are imbalanced?” It depends on the objective. If you care about overall performance on the data and don’t favor any class, ‘micro’ is just fine. However, if, say, class A is rare but very important, ‘macro’ should be a better choice because it treats each class equally. ‘Micro’ is better if we care more about overall accuracy: ‘micro’ is closer to accuracy, while ‘macro’ can differ noticeably because it is not dominated by the prevalent class.
In a multi-class classification setup, micro-average is preferable if you suspect there might be a class imbalance and you want every sample to count equally.