Before training a machine learning model, it’s common practice to normalize the data first, which often gives faster convergence and sometimes better results. Normalization also makes training less sensitive to the scale of the individual features, so the learned coefficients are not dominated by whichever feature happens to have the largest values.

Neural networks usually work with floating-point tensors as their input. They tend to train best when the input values lie roughly between 0 and 1 or between -1 and 1; this is a consequence of how their building blocks are defined.

For image data, the typical approach is to cast the tensor to floating point and normalize the pixel values. Casting to floating point is easy, but normalization is trickier, because it depends on which range of the input we decide should map to 0 to 1 or to -1 to 1.
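As a minimal sketch of that step, assume an 8-bit image whose pixel values run from 0 to 255. The tensor `img` below is random, hypothetical data rather than a real image. Casting to float and dividing by 255 maps the pixels to the range 0 to 1:

```python
import torch

# Hypothetical 8-bit RGB image: 3 channels, 4x4 pixels, values in 0..255
torch.manual_seed(0)
img = torch.randint(0, 256, (3, 4, 4), dtype=torch.uint8)

# Cast to floating point, then rescale so 0 -> 0.0 and 255 -> 1.0
img_float = img.float() / 255.0

# Optionally shift to the range -1 to 1 instead
img_signed = img_float * 2.0 - 1.0

print(img_float.dtype, img_float.min().item(), img_float.max().item())
```

Dividing by 255 uses the known range of the data type rather than the minimum and maximum of this particular image, so every image is scaled in the same way.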

In this article, you’ll normalize data in PyTorch, first with plain tensor operations and then with scikit-learn. When you normalize data, you change its scale. Data is commonly rescaled to fall between 0 and 1 because many machine learning algorithms perform better or converge faster when the different features are on a similar, small scale.

You can normalize data to the range 0 to 1 with the formula:

 (data - np.min(data)) / (np.max(data) - np.min(data))

The snippet below applies this formula in PyTorch. It stores the values in a tensor and defines a function that normalizes them using the minimum and maximum value of the whole tensor.

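import torch

# Sample data: 7 rows (samples), 5 columns (features)
X = [[110, 200, 354, 123, 134],
     [231, 132, 123, 120, 800],
     [436, 123, 543, 310, 400],
     [140, 753, 234, 234, 900],
     [510, 313, 456, 214, 200],
     [653, 739, 322, 565, 400],
     [310, 223, 453, 531, 880]]

# Create a floating-point tensor so the result is not tied to integer types
x_data = torch.tensor(X, dtype=torch.float32)


def NormalizeTensor(data):
    # Min-max scaling with the global minimum and maximum of the tensor
    return (data - torch.min(data)) / (torch.max(data) - torch.min(data))

scaled_x = NormalizeTensor(x_data)

print(scaled_x)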

The output shows that all the values are in the range 0 to 1. The minimum value in the tensor (110) is normalized to 0 and the maximum value (900) is normalized to 1. All the other values fall between 0 and 1.

tensor([[0.0000, 0.1139, 0.3089, 0.0165, 0.0304],
        [0.1532, 0.0278, 0.0165, 0.0127, 0.8734],
        [0.4127, 0.0165, 0.5481, 0.2532, 0.3671],
        [0.0380, 0.8139, 0.1570, 0.1570, 1.0000],
        [0.5063, 0.2570, 0.4380, 0.1316, 0.1139],
        [0.6873, 0.7962, 0.2684, 0.5759, 0.3671],
        [0.2532, 0.1430, 0.4342, 0.5329, 0.9747]])

This process is called normalization: the attributes are rescaled to the range 0 to 1. It helps optimization algorithms such as gradient descent, which sits at the core of many learning algorithms, to behave well during training.
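The function above uses one global minimum and maximum for the whole tensor. When each column is a separate feature with its own scale, you may prefer to normalize column by column. A minimal sketch, assuming a small hypothetical dataset and a helper name (`normalize_per_column`) chosen for illustration:

```python
import torch

# Small hypothetical dataset: 3 samples (rows), 2 features (columns)
data = torch.tensor([[1.0, 100.0],
                     [2.0, 300.0],
                     [5.0, 500.0]])

def normalize_per_column(t):
    # dim=0 reduces over the rows, giving one min and one max per column
    col_min = t.min(dim=0).values
    col_max = t.max(dim=0).values
    return (t - col_min) / (col_max - col_min)

scaled = normalize_per_column(data)
print(scaled)
```

Each column now spans 0 to 1 on its own. Note that this sketch does not guard against a constant column, where the maximum equals the minimum and the division would be by zero.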

Using scikit-learn MinMaxScaler

Several libraries can perform normalization for you. One of them is scikit-learn (sklearn). It has a scaler object called MinMaxScaler, which rescales each feature (column) independently using that column’s minimum and maximum. This differs from the function above, which uses a single minimum and maximum for the whole tensor.

Use the snippet below to normalize the data with MinMaxScaler. The scaler works on NumPy arrays, so the tensor is converted with .numpy() first.

from sklearn.preprocessing import MinMaxScaler

# Scale every column to the range 0 to 1; the result is a NumPy array
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(x_data.numpy())
print(scaled)
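Because fit_transform returns a NumPy array rather than a tensor, you usually need to convert the result back before feeding it to a PyTorch model. A minimal sketch, using a small hypothetical tensor `x` as a stand-in for the data:

```python
import torch
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the tensor being scaled
x = torch.tensor([[110.0, 200.0],
                  [231.0, 132.0],
                  [436.0, 123.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_np = scaler.fit_transform(x.numpy())  # NumPy array, scaled per column

# Convert back to a float32 tensor for use in a PyTorch model
scaled_t = torch.from_numpy(scaled_np).float()

print(type(scaled_np).__name__, scaled_t.dtype)
```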


When you scale the training data, you must scale the test data in the same way. The training and test sets generally have different minimum and maximum values, but the test data must be scaled with the minimum and maximum of the training set. In practice, this means fitting the scaler on the training data only and reusing it to transform the test data.
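A minimal sketch of this rule, assuming two small hypothetical arrays named `train` and `test`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical train and test splits with two features each
train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
test = np.array([[2.0, 20.0], [6.0, 60.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # learns min and max from train only
test_scaled = scaler.transform(test)        # reuses the training min and max

print(test_scaled)
```

The second test row lies above the training maximum, so it is scaled to a value greater than 1. That is expected: the test set is measured on the scale of the training set, not rescaled to fill the range 0 to 1 itself.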

When working with images, it is good practice to compute the mean and standard deviation over all the training data in advance, and then normalize every image by subtracting that fixed mean and dividing by that fixed standard deviation.
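As a rough sketch of this idea, assume the training images are stacked in one tensor of shape (N, C, H, W). The data below is random and hypothetical, and `standardize` is a helper name chosen for illustration:

```python
import torch

# Hypothetical training set: 8 RGB images of 4x4 pixels, values in 0..1
torch.manual_seed(0)
train_images = torch.rand(8, 3, 4, 4)

# One mean and one standard deviation per channel, computed over
# the batch, height, and width dimensions
mean = train_images.mean(dim=(0, 2, 3))
std = train_images.std(dim=(0, 2, 3))

def standardize(images):
    # Reshape to (1, C, 1, 1) so the statistics broadcast per channel
    return (images - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)

train_norm = standardize(train_images)
print(train_norm.mean(dim=(0, 2, 3)), train_norm.std(dim=(0, 2, 3)))
```

The same `mean` and `std` would then be applied to validation and test images, in line with the rule of reusing training statistics. For real datasets that do not fit in memory, the statistics would have to be accumulated batch by batch, which this sketch does not cover.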
