Generally, we want to always preprocess some standard types of preprocessing on data before training a neural network like, take your original data and to zero mean them or normalize that or normalized by the standard deviation.
Image normalization in general, standardize the inputs to your network as much as possible, so that learning is more stable by reducing variability across the training data. In terms of normalization of the data, that all features are in the same range so that they contribute equally.
In this tutorial, we propose a method to enhance image recognition performance through image normalization.
Before you start any, you will need a set of images you want to normalize. You can use an archive of creative-commons licensed flower photos from Google.
data_root = tf.keras.utils.get_file( 'flower_photos','https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz', untar=True)
This example assumes that you have the cv2 Python library installed. The example below loads the image and converts it into a NumPy array.
all_images = list(data_dir.glob('*/*')) all_images = [str(path) for path in all_images] random.shuffle(all_images) data_size=len(all_images) channels = 3 IMG_SIZE=150 dataset = np.ndarray(shape=(len(all_images), IMG_SIZE, IMG_SIZE,channels), dtype=np.float32) i = 0 for _file in all_images: image = cv2.imread(_file, 1) image=cv2.resize(image, (IMG_SIZE, IMG_SIZE)) dataset[i] = image i += 1
We do this for the entire training set, once before we start training. We don’t do this per batch, so we have a good sample, an empirical mean. If you take it per batch you should be getting the same values for the mean. It’s more efficient and easier just do this once at the beginning. You might not even have to really take it over the entire training data. You could also just sample enough training images to get a good estimate of your mean.
Normalize Image Array
Inputs with large integer values can disrupt or slow down the learning process. As such it is good practice to normalize the pixel values so that each pixel value has a value between 0 and 1.This can be achieved by dividing all pixel values by the largest pixel value(255). This is performed across all channels.
Zero centering means that you process your image so that the mean of your image lies on the zero. Mathematically this can be done by calculating the mean in your images and subtracting each image item with that mean.
The mean and standard deviation required to standardize pixel values can be calculated from the pixel values in each image only (sample-wise) or across the entire training dataset (feature-wise).
NumPy allows us to specify the dimensions over which a statistic like the mean, min, and max are calculated via the “axis” argument. In this example, we set this to (0,1,2) for the width and height dimensions, which leaves the third dimension or channels. The result is three mean, min, or max for each of the three-channel arrays.
mean = dataset.mean(axis=(0,1,2)) std = dataset.std(axis=(0,1,2)) print(mean, std)
what is the mean taken over? The mean is taking over all of your training images. So, you’ll take all of your training images and just compute the mean of all of those. The mean value will depend on the intensity distribution in the image.
We do zero-center by just of substracting a per-channel mean, instead of having an entire mean image. This is just because it turns out that it was similar enough across the whole image, it didn’t make such a big difference to subtract the mean image vs a per-channel value. This is easier to just pass around and deal with.
dataset[..., 0] -= mean dataset[..., 1] -= mean dataset[..., 2] -= mean
We compute the mean from the training image and then subtract that from each image that we’re passing through the network. Then we apply this exact same mean to the test data.
The distribution of image pixel values often follows a Normal or Gaussian distribution. There may be a benefit in transforming the distribution of image pixel values to be a standard Gaussian. It’s centering the image pixel values on zero and normalizing the values by the standard deviation. The result is a standard Gaussian of pixel values with a mean of 0.0 and a standard deviation of 1.0.
dataset[..., 0] /= std dataset[..., 1] /= std dataset[..., 2] /= std
You might also do more complicated things, like PCA or whitening but again with images, we typically just stick with the zero means, we don’t do some of these more complicated pre-processing. One reason for this (with images) we don’t really want to take all of our input pixel values and project this onto a lower-dimensional space of new kinds of features that we’re dealing with. We typically just want to apply a convolutional network spatially and have our spatial structure over the original image.