Once you have trained a deep learning model in PyTorch, you can use it to make predictions on new data instances. In this tutorial, you will discover how to fine-tune a convolutional neural network and make predictions with the finalized model using the PyTorch Python library. After completing this tutorial, you will know:

  • How to finalize a model in order to make it ready for making predictions.
  • How to make class and probability predictions for classification problems in PyTorch.

Before you can make predictions, you must train a final model. You may already have trained several models on your data while experimenting; the sections below train the one used for prediction.

Load Data

In this tutorial, we will practice on the dog breed identification problem from Kaggle. In this competition, 120 different breeds of dogs have to be recognized. The dataset for this competition is a subset of the ImageNet dataset. ImageNet images vary in both height and width, so they must be resized to a common shape before being fed to the model.

Kaggle Dog Dataset
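from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader

batch_size = 64

# Hold out 10% of the labeled images for validation
train_df, test_df = train_test_split(label_df, test_size=0.1, random_state=0)
print(train_df.shape, test_df.shape)

# Create dataloaders from datasets
train_set = DogDataset(train_df, transform=train_transformer)
val_set = DogDataset(test_df, transform=val_transformer)

train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=True)

# Number of training images, used to average the loss and accuracy per epoch
dataset_sizes = len(train_set)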

The dataset is divided into a training set and a held-out set (used here for validation), which contain 9199 and 1023 JPEG images with three RGB (color) channels, respectively. The dataset covers 120 breeds of dogs, such as Labradors, Poodles, Dachshunds, Samoyeds, Huskies, Chihuahuas, and Yorkshire Terriers.
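The DogDataset class used above is not shown in this tutorial. The following is a minimal sketch of what such a class might look like, assuming a DataFrame with hypothetical 'path' and 'label' columns (the real column names and file layout may differ). The two generated images at the end are stand-ins for the real JPEG files.

```python
import os
import tempfile

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class DogDataset(Dataset):
    """Hypothetical sketch: expects a DataFrame with 'path' and 'label' columns."""

    def __init__(self, df, transform=None):
        # Reset the index so positions 0..len-1 are valid after a random split
        self.df = df.reset_index(drop=True)
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(row["path"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, int(row["label"])


# Tiny demo with two generated images (stand-ins for the real JPEG files)
demo_dir = tempfile.mkdtemp()
rows = []
for i, color in enumerate(["red", "blue"]):
    path = os.path.join(demo_dir, f"dog_{i}.jpg")
    Image.new("RGB", (32, 24), color).save(path)
    rows.append({"path": path, "label": i})

demo_set = DogDataset(pd.DataFrame(rows))
demo_image, demo_label = demo_set[1]
print(len(demo_set), demo_image.size, demo_label)
```

In practice, a transform ending in ToTensor() would be passed so that a DataLoader can stack the samples into batches.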

Fine-tuning Pretrained Model

The dataset for this competition is a subset of the ImageNet dataset. Therefore, we can take a model pre-trained on the full ImageNet dataset and use it to extract image features, which are then fed into a small custom output network.

import torch
import torch.nn.functional as F
import torchvision

# Use ResNet-50 as the base model and stack a small trainable head on top
class Model(torch.nn.Module):
    def __init__(self, base_model, base_out_features, num_classes):
        super().__init__()
        self.base_model = base_model
        self.linear1 = torch.nn.Linear(base_out_features, 512)
        self.output = torch.nn.Linear(512, num_classes)

    def forward(self, x):
        x = F.relu(self.base_model(x))
        x = F.relu(self.linear1(x))
        x = self.output(x)  # raw class scores (logits)
        return x

# Newer torchvision releases deprecate pretrained=True in favor of the weights argument
resNet = torchvision.models.resnet50(pretrained=True)

High-level APIs of deep learning frameworks provide a wide range of models pre-trained on the ImageNet dataset. Here, we choose a ResNet-50 model and reuse it as a feature extractor. Note that the code in this tutorial keeps the original 1000-way output layer of ResNet-50 and treats its 1000 outputs as the extracted features (hence base_out_features=resNet.fc.out_features). On top of these features we stack a small custom output network that can be trained, consisting of two fully connected layers.

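# Freeze the pre-trained weights so that only the new layers are trained
for param in resNet.parameters():
    param.requires_grad = False

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model(base_model=resNet, base_out_features=resNet.fc.out_features, num_classes=120)
model = model.to(device)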

Freezing the base model's parameters (requires_grad=False) reduces training time and the memory needed for storing gradients, because gradients are computed only for the new layers. Recall that we standardized images using the means and standard deviations of the three RGB channels for the full ImageNet dataset. This is consistent with the standardization applied when the model was pre-trained on ImageNet.
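The effect of freezing can be checked on a small stand-in network. The sketch below uses a hypothetical two-layer base instead of ResNet-50 (to avoid a download) and counts trainable and frozen parameters, then confirms that no gradient is stored for the frozen part.

```python
import torch

# Stand-in for the pre-trained base model (hypothetical, much smaller)
base = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())
for param in base.parameters():
    param.requires_grad = False  # freeze the base

head = torch.nn.Linear(4, 3)  # new trainable layer
model_demo = torch.nn.Sequential(base, head)

trainable = sum(p.numel() for p in model_demo.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model_demo.parameters() if not p.requires_grad)
print(trainable, frozen)

# Backpropagate once: only the head receives gradients
loss = model_demo(torch.randn(2, 8)).sum()
loss.backward()
print(base[0].weight.grad is None, head.weight.grad is not None)
```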

Training the model

Now, let’s write a general function to train a model.

import copy
import time

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        model.train()  # Set model to training mode    
        running_loss = 0.0
        running_corrects = 0

        # Iterate over data.
        for inputs, labels in train_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward pass; gradients are tracked because we are training
            with torch.set_grad_enabled(True):
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)

                # backward pass and parameter update
                loss.backward()
                optimizer.step()

            # statistics
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels)
            
        scheduler.step()

        epoch_loss = running_loss / dataset_sizes
        epoch_acc = running_corrects.double() / dataset_sizes

        print(f' Training Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

        # keep a copy of the weights with the best training accuracy so far
        if epoch_acc > best_acc:
            best_acc = epoch_acc
            best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best training Acc: {best_acc:.4f}')

    # load best model weights
    model.load_state_dict(best_model_wts)
    
    return model

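from torch.optim import lr_scheduler

# Loss function and optimizer (only the unfrozen parameters are optimized)
loss_function = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam([param for param in model.parameters() if param.requires_grad], lr=0.0003)

# Multiply the learning rate by 0.1 every 7 epochs
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)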

model = train_model(model, loss_function, optimizer,scheduler,num_epochs=20)
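To see what the StepLR settings above do over 20 epochs, the sketch below records the learning rate at the start of each epoch. It uses a single dummy parameter instead of the real model, which is an assumption made only to keep the example small.

```python
import torch
from torch.optim import lr_scheduler

# A dummy parameter stands in for the model's trainable weights
dummy = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.Adam([dummy], lr=0.0003)
sched = lr_scheduler.StepLR(opt, step_size=7, gamma=0.1)

lrs = []
for epoch in range(20):
    lrs.append(opt.param_groups[0]["lr"])  # learning rate used in this epoch
    opt.step()    # would follow loss.backward() in real training
    sched.step()  # called once per epoch, as in train_model

for epoch, lr in enumerate(lrs):
    print(epoch, f"{lr:.1e}")
```

The learning rate stays at 3e-4 for epochs 0 to 6, drops to 3e-5 for epochs 7 to 13, and to 3e-6 from epoch 14 onward.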

Now that training is complete, our model is ready to classify some images. Given a batch of images, we will compare the predictions from the model (first line of text output) with their actual labels (second line of text output).

Save and Load Model

After finalizing, you may want to save the model to a file, e.g., via the PyTorch torch.save API. Once saved, you can load the model at any time and use it to make predictions.

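# Specify a path
PATH = "entire_model.pt"
# Save the entire model object (pickled)
torch.save(model, PATH)

# Load; the Model class definition must be available when unpickling.
# In PyTorch 2.6 and later, torch.load defaults to weights_only=True, in which case
# a whole pickled model needs torch.load(PATH, weights_only=False).
model = torch.load(PATH)
model.eval()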

Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
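An alternative to pickling the whole model is to save only its state_dict and load it into a freshly constructed model. The sketch below illustrates this with a hypothetical TinyNet that contains a dropout layer; it is not the tutorial's model. It also shows that repeated forward passes agree once eval() has been called.

```python
import os
import tempfile

import torch


class TinyNet(torch.nn.Module):
    """Hypothetical small model with dropout, used only for illustration."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)
        self.drop = torch.nn.Dropout(p=0.5)

    def forward(self, x):
        return self.drop(self.fc(x))


net = TinyNet()
weights_path = os.path.join(tempfile.mkdtemp(), "tiny_weights.pt")

# Save only the parameters, not the pickled model object
torch.save(net.state_dict(), weights_path)

# Rebuild the architecture, then load the saved parameters into it
restored = TinyNet()
restored.load_state_dict(torch.load(weights_path))
restored.eval()  # dropout becomes a no-op in evaluation mode
net.eval()

x = torch.randn(3, 4)
with torch.no_grad():
    out1 = restored(x)
    out2 = restored(x)
    out_original = net(x)

print(torch.allclose(out1, out2), torch.allclose(out1, out_original))
```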

Predict Batch of Images

After loading the trained model, we can predict the probability of each output class. Normally, we use the class with the highest predicted probability as the output class. The prediction is correct if it matches the actual class (label).

In PyTorch, you run inference by calling the model directly, e.g., model(x), which invokes __call__() and in turn forward(). If you need NumPy array values instead of tensors after your model call, move the tensor to the CPU and call tensor.numpy(); a tensor that still tracks gradients must be detached first with tensor.detach().

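inputs, classes = next(iter(val_loader))

model = model.to(device)
inputs = inputs.to(device)

# Disable gradient tracking during inference
with torch.no_grad():
    outputs = model(inputs)
_, preds = torch.max(outputs, 1)
preds = preds.cpu().numpy()
classes = classes.numpy()
print(preds)    # predicted class indices
print(classes)  # actual class indices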

In the code above, we determine the index of the maximum score in each row of the outputs tensor. We do that with torch.max, which, when given a dimension, returns both the maximum values along that dimension and the indices where they occur.
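A small example makes the behavior of torch.max concrete. The scores below are made-up numbers for two samples and three classes.

```python
import torch

# Made-up scores: 2 samples (rows), 3 classes (columns)
scores = torch.tensor([[0.1, 2.5, -1.0],
                       [3.0, 0.2, 0.7]])

# Reduce over dimension 1 (the class dimension)
values, indices = torch.max(scores, 1)
print(values)   # tensor([2.5000, 3.0000])
print(indices)  # tensor([1, 0])
```

The indices are the predicted class labels; the values are the corresponding raw scores.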

PyTorch Predict batch of Images

Computation is done in batches. Passing a whole batch through the model in a single call is more efficient than calling the model on one input at a time, so for large numbers of inputs, iterate over a DataLoader and process one batch per call.
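Batch-wise prediction over a whole DataLoader can be sketched as follows. The evaluate function and the toy data are illustrative assumptions, not part of the original tutorial; with the real model you would pass model, val_loader, and device instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return the classification accuracy of model over all batches in loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Toy check: an identity "model" whose outputs are the inputs themselves,
# with labels defined as the argmax of each input row
torch.manual_seed(0)
x = torch.randn(10, 3)
y = x.argmax(dim=1)
toy_loader = DataLoader(TensorDataset(x, y), batch_size=4)

acc = evaluate(torch.nn.Identity(), toy_loader, torch.device("cpu"))
print(acc)
```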

Predict Single Image

In the following sections, we will focus on writing the inference code for a single image. This involves two parts: first, we prepare the image so that it can be fed to ResNet, and then we write the code to get the actual prediction from the model.

Preparing the image

The ResNet model expects a 3-channel RGB image of size 224 x 224. We will also normalize the image tensor with the required mean and standard deviation values.

We will use transforms from the torchvision library to build a transform pipeline that converts our images as required.

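import io

from PIL import Image
from torchvision import transforms

def transform_image(image_bytes):
    my_transforms = transforms.Compose([transforms.Resize(255),
                                        transforms.CenterCrop(224),
                                        transforms.ToTensor(),
                                        transforms.Normalize(
                                            [0.485, 0.456, 0.406],
                                            [0.229, 0.224, 0.225])])
    # Convert to RGB so grayscale or RGBA images also yield 3 channels
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    # Add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
    return my_transforms(image).unsqueeze(0)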

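The above function takes image data in bytes, applies the series of transforms, and returns a tensor of shape (1, 3, 224, 224), where the leading dimension is the batch dimension added by unsqueeze(0).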

Prediction

Now we will use the fine-tuned ResNet-50 model to predict the image class. We will load the saved model and run inference on a single image. You can use the same approach for your own models.

import matplotlib.pyplot as plt

model = torch.load(PATH)
model.eval()

def get_prediction(image_bytes):
    tensor = transform_image(image_bytes=image_bytes)
    tensor = tensor.to(device)
    # Call the model directly and skip gradient tracking during inference
    with torch.no_grad():
        output = model(tensor)

    # Convert raw scores to probabilities and pick the most likely class
    probs = torch.nn.functional.softmax(output, dim=1)
    conf, classes = torch.max(probs, 1)
    # index_to_breed maps a class index to its breed name
    return conf.item(), index_to_breed[classes.item()]

image_path = "/content/test/06b3a4da7b96404349e51551bf611551.jpg"
image = plt.imread(image_path)
plt.imshow(image)

with open(image_path, 'rb') as f:
    image_bytes = f.read()

conf, y_pre = get_prediction(image_bytes=image_bytes)
print(y_pre, ' at confidence score:{0:.2f}'.format(conf))

Confidence Score

We also use torch.nn.functional.softmax to normalize the outputs so that each value lies in the range [0, 1] and the values sum to 1. That gives us something roughly akin to the confidence that the model has in its prediction. In this case, the model is 89% certain that it knows what it's looking at.
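The following sketch applies softmax to made-up scores for a single image with three classes, then reads off the confidence and the two most likely classes with torch.topk. The numbers are illustrative only.

```python
import torch

# Made-up raw scores (logits) for one image and three classes
logits = torch.tensor([[2.0, 1.0, 0.1]])

probs = torch.nn.functional.softmax(logits, dim=1)
conf, idx = torch.max(probs, 1)

# The two most likely classes and their probabilities
top_probs, top_idx = torch.topk(probs, k=2, dim=1)

print(probs)
print(f"class {idx.item()} at confidence {conf.item():.2f}")
print(top_idx.tolist(), top_probs.tolist())
```

Keep in mind that a high softmax value reflects the model's relative preference among the classes it knows, not a guarantee that the prediction is correct.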

PyTorch Predict single Image
