A pre-trained model is a saved model that was previously trained on a large dataset. You can use the pre-trained model as it is or use transfer learning to customize this model to a specific task.

In most pre-trained models, the first few layers learn very simple and generic features that generalize to almost all types of images, while the last few layers are more specialized. As you go deeper, the features become increasingly specific to the dataset on which the model was trained.

A model trained on a large dataset serves as a generic model of the visual world, and you can reuse its learned feature maps without starting from scratch.

In this tutorial, you will learn how to modify a pre-trained model in two ways: Feature Extraction and Fine-Tuning.

Feature Extraction: The final classification layer of the pre-trained model is specific to the original classification task, and consequently specific to the set of classes on which the model was trained. You simply add a new classifier layer, which will be trained from scratch, on top of the pre-trained model so that you can repurpose the previously learned feature maps for your dataset.

You do not need to retrain the entire model. The base convolutional network already contains features that are generically useful for classifying images. 

In this step, you freeze the convolutional base created in the previous step and use it as a feature extractor. Additionally, you add a classifier on top of it and train only that top-level classifier.
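As a minimal sketch of this idea, you can strip the classification head off a frozen resnet18 and use the remaining backbone purely to produce feature vectors. The dummy batch and variable names here are placeholders, not part of the walkthrough below.

import torch
from torchvision import models

backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()      # drop the 1000-way head; keep the convolutional base
backbone.eval()
for param in backbone.parameters():
    param.requires_grad = False        # freeze the base

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)   # placeholder batch of 4 RGB images
    features = backbone(images)            # feature vectors of shape (4, 512)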

Fine-Tuning: One way to increase performance is to fine-tune the weights of the top layers of the pre-trained model alongside the training of the classifier you added. The training process will force the weights to be tuned from generic feature maps to features associated specifically with the dataset.

Unfreeze a few of the top layers of a frozen model base and jointly train both the newly-added classifier layers and the last layers of the base model. This allows us to “fine-tune” the higher-order feature representations in the base model in order to make them more relevant for the specific task.

The goal of fine-tuning is to adapt these specialized features to the new dataset rather than to overwrite the generic learning.
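For example, with resnet18 you might unfreeze only the last residual block (layer4) and train it together with the new classifier head at a small learning rate. This is a rough sketch; the layer choice, class count, and hyperparameters are assumptions.

import torch.nn as nn
import torch.optim as optim
from torchvision import models

ft_model = models.resnet18(pretrained=True)
for param in ft_model.parameters():
    param.requires_grad = False                       # freeze the whole base first

ft_model.fc = nn.Linear(ft_model.fc.in_features, 10)  # new head is trainable by default
for param in ft_model.layer4.parameters():
    param.requires_grad = True                        # unfreeze the top residual block

# A small learning rate helps avoid destroying the pre-trained weights.
optimizer = optim.SGD(
    [p for p in ft_model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9
)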

Compose the model

Load the pre-trained base model and pre-trained weights.

from torchvision import models
model = models.resnet18(pretrained=True)

We create the base model from resnet18, which is pre-trained on the ImageNet dataset, a large dataset consisting of 1.4M images and 1000 classes. ImageNet is a research training dataset with a wide variety of categories. This base of knowledge will help us classify images from our own, more specific dataset.
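You can inspect the final layer of the loaded network; for resnet18 it is a fully connected layer that maps 512 features to the 1,000 ImageNet classes.

print(model.fc)
# Linear(in_features=512, out_features=1000, bias=True)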

Modify Final Layer

Here, we use resnet18 on our own dataset, which has 10 classes, so we want to change the output size of the last fully connected (FC) layer to 10. How do we change that last FC layer?

Since all of these models have been pre-trained on ImageNet, they all have output layers of size 1000, one node for each class. The goal here is to reshape the last layer to have the same number of outputs as the number of classes in the dataset.

import torch.nn as nn

num_classes = 10
num_ftrs = model.fc.in_features      # 512 for resnet18
model.fc = nn.Linear(num_ftrs, num_classes)

The final layer of a CNN model, which is often an FC layer, has the same number of nodes as the number of output classes in the dataset. 
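A quick sanity check with a placeholder batch confirms that the reshaped head now produces one score per class.

import torch

dummy = torch.randn(2, 3, 224, 224)   # placeholder batch of two RGB images
outputs = model(dummy)
print(outputs.shape)                  # torch.Size([2, 10])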

Since each model architecture is different, there is no boilerplate finetuning code that will work in all scenarios. Rather, you must look at the existing architecture and make custom adjustments for each model.

Freezing and Unfreezing 

In feature extraction, you only train the last layer of the pre-trained model; the weights of the pre-trained network are not updated during training. Freezing a layer by setting requires_grad = False prevents its weights from being updated during training.

# Freeze every parameter of the pre-trained network
for param in model.parameters():
    param.requires_grad = False

When we load a pre-trained model, all of the parameters have requires_grad=True, which is fine if we are training from scratch or fine-tuning. However, when we are feature extracting and only want to compute gradients for the newly initialized layer, we set the requires_grad attribute of all the other parameters to False so they do not require gradients.

Freezing them avoids destroying any of the information they contain during future training rounds. A newly constructed layer has requires_grad=True by default, so as long as you replace the final layer after freezing the base, you don’t need to enable it manually.
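You can verify which parameters will actually be updated by listing those that still require gradients. Assuming the base is frozen and the new head is left trainable, only the head's weight and bias should appear.

for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)    # expect only fc.weight and fc.bias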

Don’t forget to retrain the last layer, though. At the moment its weights are just random numbers, so you must train them. You can do the retraining the same way as for a normal model; the only difference is that the weights of the other layers won’t change because their requires_grad is set to False. If you have enough data to do full training, you can simply set requires_grad back to True for those layers, but it will take much longer to train.
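Here is a minimal sketch of that retraining step, passing only the trainable parameters to the optimizer. The loss, optimizer settings, and placeholder batch are assumptions standing in for a real data loader and training loop.

import torch
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
params_to_update = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(params_to_update, lr=0.001, momentum=0.9)

model.train()
images = torch.randn(8, 3, 224, 224)           # placeholder batch of 8 images
labels = torch.randint(0, num_classes, (8,))   # placeholder labels
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()       # gradients flow only into the unfrozen parameters
optimizer.step()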
