Quantitative data is the measurement of something, whether loan amount, monthly income, monthly sales, or student scores. The natural way to represent these quantities is numerically (e.g., 29 students, $500 income).

In this tutorial, we will cover strategies for transforming raw numerical data into features purpose-built for machine learning algorithms.

Many machine learning algorithms perform better when their features have been rescaled or standardized. Without rescaling, gradient-based methods can take longer to converge to a solution.

If the data is not scaled, algorithms that rely on distances or on the raw magnitude of feature values (for example, k-nearest neighbors, k-means, or support vector machines) can be dominated by the variables with large values and largely ignore the variables with small values.

For example, the variables in the personal loan data have very different units and magnitudes. Some variables have relatively small values (e.g., number of dependents), while others have very large values (e.g., loan amount).

Scaling prevents variables with large values from dominating the algorithm (a large raw magnitude can trick it into treating a feature as more important than it is), and it can make the computations more numerically stable and faster to converge.
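To make the domination effect concrete, here is a minimal sketch with three hypothetical applicants described by two features: number of dependents and loan amount. The values are invented for illustration (they are not rows from the Kaggle data), and the helper `euclidean` is a name introduced here. The sketch checks which applicant is nearest to the first one before and after min-max scaling, the way a distance-based method such as k-nearest neighbors would.

```python
import numpy as np

# Hypothetical applicants: (number of dependents, loan amount).
# Illustrative values only, not taken from the Kaggle loan dataset.
a = np.array([0.0, 120.0])
b = np.array([3.0, 125.0])
c = np.array([0.0, 160.0])


def euclidean(p, q):
    """Straight-line distance between two feature vectors (hypothetical helper)."""
    return float(np.sqrt(np.sum((p - q) ** 2)))


# Raw units: the loan-amount gap dominates, so b looks closest to a
print("raw:    a-b =", euclidean(a, b), " a-c =", euclidean(a, c))

# Min-max scale each column using the min and max of these three rows
X = np.vstack([a, b, c])
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Scaled: both features now span [0, 1], so c ends up closest to a
print("scaled: a-b =", euclidean(X_scaled[0], X_scaled[1]),
      " a-c =", euclidean(X_scaled[0], X_scaled[2]))
```

Before scaling, applicant b looks closest to a even though it differs by the largest amount in dependents, because the loan-amount gap dominates the distance. After scaling, both features contribute on a comparable footing and the nearest neighbor changes.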

Dataset

In this tutorial, we will use the Loan dataset from Kaggle. It has 615 rows and 13 columns.

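import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# apply a seaborn theme to all plots in this tutorial
sns.set_theme(palette="rainbow", style="darkgrid")

# load the loan training data (adjust the path to where your CSV is stored)
df = pd.read_csv("/content/train_data.csv")
df.head(5)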
[Figure: Scale DataFrame (output of df.head(5))]

Scikit-learn’s MinMaxScaler

Rescaling is a common preprocessing task in machine learning. Many algorithms assume that all features are on the same scale, typically 0 to 1 or –1 to 1.
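Scikit-learn's MinMaxScaler targets the 0 to 1 range by default, and its `feature_range` parameter selects another interval such as –1 to 1. A minimal sketch on a single made-up column (the values are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One made-up numeric feature; scikit-learn expects a 2-D array (n_samples, n_features)
x = np.array([[10.0], [20.0], [30.0], [50.0]])

# Default feature_range=(0, 1)
x_01 = MinMaxScaler().fit_transform(x)

# Same data rescaled into [-1, 1]
x_11 = MinMaxScaler(feature_range=(-1, 1)).fit_transform(x)

print(x_01.ravel())
print(x_11.ravel())
```

The relative spacing of the values is identical in both outputs; only the target interval changes.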

Min-max scaling applies to numeric features (categorical variables need to be encoded first) and brings them into a common range, usually 0 to 1. It squashes or stretches each feature so that multiple variables end up on the same scale.

There are several rescaling techniques, but one of the simplest is called min-max scaling. Min-max scaling uses the minimum and maximum values of a feature to rescale values to within a range. Specifically, min-max calculates: 

$$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

where x is the feature vector, x_i is an individual element of the feature x, and x_i' is the rescaled element. In our example, we apply MinMaxScaler to the four numeric income and loan columns, and we can see from the outputted DataFrame that they have been successfully rescaled to between 0 and 1:

# import scikit-learn's MinMaxScaler, which rescales each feature to a range (default 0 to 1)
from sklearn.preprocessing import MinMaxScaler

num_cols = ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term']
df1 = df.copy()  # unscaled copy, used in the before/after plot

# set_output(transform='pandas') returns a DataFrame instead of a NumPy array (scikit-learn >= 1.2)
scaler = MinMaxScaler().set_output(transform='pandas')
df[num_cols] = scaler.fit_transform(df[num_cols])
[Figure: Scale Pandas DataFrame Using MinMaxScaler]

scikit-learn’s MinMaxScaler offers two options to rescale a feature. One option is to use fit to calculate the minimum and maximum values of the feature, and then use transform to rescale the feature. 

The second option is to use fit_transform, which does both operations at once. There is no mathematical difference between the two options, but keeping them separate is often useful in practice: you can fit the scaler once, for example on the training data, and then apply the same minimum and maximum to other data, such as a test set or new observations.
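A minimal sketch of keeping fit and transform separate: the scaler learns the minimum and maximum from a training split only and then reuses them on a held-out split, so no information from the test data leaks into the preprocessing. The data here is a synthetic single feature generated with NumPy, not the loan dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic single feature (e.g. an income-like column); not real loan data
rng = np.random.default_rng(0)
X = rng.uniform(1_000, 80_000, size=(100, 1))

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = MinMaxScaler()
scaler.fit(X_train)                       # learn min and max from the training split only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse the training min and max on unseen data

# Map scaled values back to the original units
X_test_restored = scaler.inverse_transform(X_test_scaled)

print(scaler.data_min_, scaler.data_max_)
print(X_test_scaled.min(), X_test_scaled.max())
```

Because the test split is scaled with the training minimum and maximum, some test values can fall slightly outside [0, 1]; the `clip=True` option of MinMaxScaler (available in recent scikit-learn versions) clamps them if that matters.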

# plotting the scatterplot of before and after MinMax Scaling
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.title("Scatterplot Before Min Max Scaling", fontsize=16)
sns.scatterplot(data = df1, color="blue")
plt.subplot(1,2,2)
plt.title("Scatterplot After Min Max Scaling", fontsize=16)
sns.scatterplot(data = df, color="red")
plt.tight_layout()
plt.show()
[Figure: Pandas DataFrame Scaling Plot]
