It’s good practice to normalize the dataset so that each channel has zero mean and unitary standard deviation, keeping the data in the same range means it’s more likely that neurons have nonzero gradients.

In working with images, it is good practice to compute the mean and standard deviation on all the training data in advance and then subtract and divide by these fixed, precomputed quantities.

In this tutorial, we compute the mean and standard deviation of all input image datasets, so that the output has zero mean and unit standard deviation across each channel:

```import numpy as np
import cv2

from pathlib import Path

imageFilesDir = Path(r'/content/hymenoptera_data')
files = list(imageFilesDir.rglob('*.jpg'))

len(files)

mean = np.array([0.,0.,0.])
stdTemp = np.array([0.,0.,0.])
std = np.array([0.,0.,0.])

numSamples = len(files)

for i in range(numSamples):
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
im = im.astype(float) / 255.

for j in range(3):
mean[j] += np.mean(im[:,:,j])

mean = (mean/numSamples)

print(mean) #0.51775225 0.47745317 0.35173384]
```

Since the std can’t be calculated by simply finding it for each image and averaging as the mean can be, to get the std we first calculate the overall mean in a first run then run it again to get the std.

```for i in range(numSamples):
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
im = im.astype(float) / 255.
for j in range(3):
stdTemp[j] += ((im[:,:,j] - mean[j])**2).sum()/(im.shape*im.shape)

std = np.sqrt(stdTemp/numSamples)

print(std) #[0.28075549 0.25811162 0.28913701]
```

We can normalize the data by subtracting the mean and dividing it by the standard deviation, which helps with the learning process.

Normalizing each channel so that it has the same distribution will ensure that channel information can be mixed and updated through gradient descent using the same learning rate.

The normalization absolutely helps get the network trained, but you could make an argument that it’s not strictly needed to optimize the parameters for this particular problem.