Category Archives: Pandas
Find Correlation between features and target using the correlation matrix.
You can evaluate the relationship between each feature and the target using correlation, and then select the features that have the strongest relationship with the target variable. Related methods use the correlation between features to remove redundant variables.
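As a minimal sketch of this idea, the snippet below ranks features by the absolute value of their correlation with the target. The toy data, the column names (`feature_a`, `feature_b`, `feature_c`, `target`) and the 0.3 cutoff are assumptions made for illustration, not values from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "feature_a": rng.normal(size=n),
    "feature_b": rng.normal(size=n),
    "feature_c": rng.normal(size=n),
})
# The target depends strongly on feature_a, weakly on feature_b,
# and not at all on feature_c.
df["target"] = (
    3 * df["feature_a"]
    - 0.5 * df["feature_b"]
    + rng.normal(scale=0.5, size=n)
)

# Pearson correlation matrix of all numeric columns.
corr_matrix = df.corr()

# Correlation of each feature with the target, ranked by strength.
target_corr = corr_matrix["target"].drop("target")
ranked = target_corr.abs().sort_values(ascending=False)

# Keep features above an arbitrary (assumed) cutoff.
selected = ranked[ranked > 0.3].index.tolist()
print(ranked)
print(selected)
```

Pearson correlation only captures linear relationships, so a feature with a low score here may still be useful to a non-linear model.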
Plot two overlay Histograms on single chart with Pandas and Matplotlib.
We’ll look at a real-world example with data that I’ll load from a CSV file, and we’ll learn how to draw overlapping histograms using Pandas and Matplotlib. Before we set up the chart, let’s look at what the end result will be.
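One way to overlay two histograms is sketched below. The post's CSV file is not shown here, so the example builds a small in-memory CSV with two hypothetical columns, `group_a` and `group_b`, and reads it back with `pd.read_csv`.

```python
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the CSV file: hypothetical columns with made-up values.
rng = np.random.default_rng(1)
csv_text = pd.DataFrame({
    "group_a": rng.normal(50, 10, 300),
    "group_b": rng.normal(65, 8, 300),
}).to_csv(index=False)
df = pd.read_csv(io.StringIO(csv_text))

# Shared bin edges so the two histograms are directly comparable.
bins = np.linspace(df.min().min(), df.max().max(), 31)

fig, ax = plt.subplots()
# alpha < 1 keeps the histogram underneath visible where they overlap.
counts_a, _, _ = ax.hist(df["group_a"], bins=bins, alpha=0.5, label="group_a")
counts_b, _, _ = ax.hist(df["group_b"], bins=bins, alpha=0.5, label="group_b")
ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or save the chart. Using the same `bins` array for both calls keeps the bars aligned; letting each call pick its own bins tends to make the overlay harder to read.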
How to Scale Data into the 0-1 range using Min-Max Normalization.
Min-max scaling is the simplest rescaling method. It rescales each feature to the range [0, 1] or [−1, 1]. Because the values are confined to a bounded range, we end up with smaller standard deviations, which can suppress the effect of outliers.
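The usual min-max formula is x' = (x − min) / (max − min), applied to each column separately. A minimal Pandas sketch follows; the DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight_kg": [50.0, 60.0, 65.0, 80.0, 100.0],
})

# Column-wise min-max scaling into [0, 1].
col_min = df.min()
col_max = df.max()
scaled_01 = (df - col_min) / (col_max - col_min)

# The same data mapped into [-1, 1] instead.
scaled_pm1 = 2 * scaled_01 - 1

print(scaled_01)
```

A column whose values are all identical has max − min equal to zero, so this sketch would produce NaN for it; such columns need to be handled separately.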
Detect and Remove Outliers from Pandas DataFrame
The Z-score rescales and centers (standardizes) the data, and we then look for data points that are too far from zero (the center). Data points far from zero are treated as outliers. In most cases a threshold of 3 is used: if a Z-score is greater than 3 or less than -3, that data point is identified as an outlier.
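The rule above can be sketched in a few lines of Pandas. The data is synthetic, with three outliers injected by hand, and the column name `value` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: mostly normal values plus three hand-injected outliers.
rng = np.random.default_rng(42)
values = rng.normal(loc=100, scale=5, size=500)
values[:3] = [160.0, 30.0, 155.0]
df = pd.DataFrame({"value": values})

# Z-score: subtract the mean and divide by the standard deviation.
z = (df["value"] - df["value"].mean()) / df["value"].std(ddof=0)

# Keep rows whose Z-score lies within the threshold of 3 on either side.
threshold = 3
mask = z.abs() <= threshold
cleaned = df[mask]
outliers = df[~mask]

print(len(df), len(cleaned))
print(outliers)
```

Note that the mean and standard deviation are themselves pulled by the outliers, so on small samples an extreme point can inflate the standard deviation enough to hide itself from this check.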