VGG experiments with the depth of the Convolutional Network for image recognition, increasing depth using very small (3×3) convolution filters in all layers. In this tutorial, we present the details of the VGG16 network configuration and the image augmentation used for training and evaluation.

VGG16 Architecture

VGG16 ConvNet configurations differ from earlier architectures: rather than using relatively large convolutional filters in the first Conv. layers (e.g. 11×11 with stride 4, or 7×7 with stride 2), VGG uses very small 3×3 filters throughout the whole net, which are convolved with the input at every pixel (with stride 1). For instance, a stack of three 3×3 Conv. layers replaces a single 7×7 layer, covering the same receptive field with fewer parameters. All ConvNet layers are designed using the same principles.
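To see why, compare the parameter counts of the two options. The minimal sketch below (assuming C input and C output channels, biases ignored) shows that three stacked 3×3 layers cover a 7×7 receptive field with roughly 45% fewer weights:

# Weights of three stacked 3x3 conv layers vs. one 7x7 layer,
# assuming C input channels and C output channels (biases ignored).
C = 512
params_3x3_stack = 3 * (3 * 3 * C * C)   # 27 * C^2
params_7x7_single = 7 * 7 * C * C        # 49 * C^2
print(params_3x3_stack, params_7x7_single)  # 7077888 vs. 12845056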

VGG16 ConvNet

During training, the input to our ConvNet is a fixed-size 224 × 224 × 3 RGB image.

import glob
import pathlib

import numpy as np
import tensorflow as tf
from tensorflow import keras

IMG_SIZE = 224  # fixed ConvNet input size

input_shape = [IMG_SIZE, IMG_SIZE, 3]
img_input = keras.layers.Input(shape=input_shape)

# Block 1
x = keras.layers.Conv2D(64, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block1_conv1')(img_input)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dropout(0.3)(x)
x = keras.layers.Conv2D(64, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block1_conv2')(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)


# Block 2
x = keras.layers.Conv2D(128, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block2_conv1')(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dropout(0.3)(x)

x = keras.layers.Conv2D(128, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block2_conv2')(x)

x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dropout(0.3)(x)
x = keras.layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

# Block 3
x = keras.layers.Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv1')(x)
x = keras.layers.BatchNormalization()(x)                      
x = keras.layers.Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv2')(x)
x = keras.layers.BatchNormalization()(x)                      
x = keras.layers.Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv3')(x)
x = keras.layers.BatchNormalization()(x)  
x = keras.layers.Dropout(0.3)(x)

x = keras.layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

# Block 4
x = keras.layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv1')(x)
x = keras.layers.BatchNormalization()(x)                      
x = keras.layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv2')(x)
x = keras.layers.BatchNormalization()(x)                      
x = keras.layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv3')(x)
x = keras.layers.BatchNormalization()(x)   
x = keras.layers.Dropout(0.3)(x)

x = keras.layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)


# Block 5
x = keras.layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv1')(x)
x = keras.layers.BatchNormalization()(x)                      
x = keras.layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv2')(x)
x = keras.layers.BatchNormalization()(x)                      
x = keras.layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv3')(x)
x = keras.layers.BatchNormalization()(x) 
x = keras.layers.Dropout(0.3)(x)

x = keras.layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)


# Classification block
# GlobalAveragePooling2D already yields a (batch, channels) tensor,
# so no separate Flatten layer is needed.
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(4096, activation='relu', name='fc1')(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(4096, activation='relu', name='fc2')(x)
x = keras.layers.Dropout(0.5)(x)
# CLASS_NAMES is built in the "Prepare Dataset" section below.
x = keras.layers.Dense(len(CLASS_NAMES), activation='softmax', name='predictions')(x)


model = keras.models.Model(img_input, x, name='vgg16')
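
With the model assembled, it is worth printing a summary to verify the layer shapes and parameter counts before training:

model.summary()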

Convolutional Layer

The number of channels is small, starting from 64 in the first layer and then increasing by a factor of 2 after each max-pooling layer until it reaches 512.

The image is passed through a stack of convolutional layers, where VGG uses 3×3 filters which are the smallest size to capture the notion of left/right, up/down, and center.

The convolution stride is fixed to 1 pixel.

The padding of the Conv. layer input is "same", which preserves the spatial resolution after convolution.

Pooling Layer

Spatial pooling is carried out by five max-pooling layers, which follow some of the Conv. layers (not every Conv. layer is followed by max-pooling). Max-pooling is performed over a 2 × 2 pixel window, with stride 2.
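
A quick sketch makes the effect of these choices concrete: a "same"-padded 3×3 convolution with stride 1 keeps the spatial size, while each 2×2 stride-2 max-pool halves it, so the feature maps shrink 224 → 112 → 56 → 28 → 14 → 7 across the five blocks:

# Spatial size through the five blocks: 'same' 3x3 convs preserve it,
# each 2x2 stride-2 max-pool halves it.
size = 224
for block in range(5):
    size = size // 2  # effect of the block's max-pool
    print(f"after block {block + 1}: {size} x {size}")
# after block 5: 7 x 7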

Fully Connected Layer

A stack of convolutional layers is followed by three Fully-Connected (FC) layers: the first two have 4096 channels each, and the final layer is the soft-max layer. Dropout regularisation with a rate of 0.5 is applied after each of the first two fully-connected layers.
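
Note that the code above departs from the original paper by inserting global average pooling before the FC head, which shrinks the input of fc1 from 7 × 7 × 512 down to 512. A rough weight-count comparison (biases ignored):

# fc1 weights with Flatten (original VGG) vs. GlobalAveragePooling2D (code above)
flatten_params = 7 * 7 * 512 * 4096   # ~102.8M weights
gap_params = 512 * 4096               # ~2.1M weights
print(flatten_params, gap_params)     # 102760448 vs. 2097152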

Prepare Dataset

In this tutorial, we use the Fruits 360 dataset, which contains images of fruits and vegetables. The dataset is available for download from GitHub.

training_path = pathlib.Path("fruits-360_dataset/fruits-360/Training/")
testing_path = pathlib.Path("fruits-360_dataset/fruits-360/Test/")

CLASS_NAMES = np.array([item.name for item in testing_path.glob('*')])
CLASS_NAMES

train_images = glob.glob(str(training_path / '*/*.jpg'))
test_images = glob.glob(str(testing_path / '*/*.jpg'))
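
As a quick sanity check on the file lists and class labels (the exact counts depend on the dataset version you downloaded):

# Illustrative spot check; counts vary by dataset version.
print(len(CLASS_NAMES), "classes")
print(len(train_images), "training images,", len(test_images), "test images")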

Image Augmentation

To obtain the fixed-size 224×224 ConvNet input images, they were randomly cropped from rescaled training images. To further augment the training set, the crops underwent random horizontal flipping. Training image rescaling is explained below.

VGG_MEAN = [123.68, 116.78, 103.94]

def train_augment(image, label):
  crop_image = tf.image.random_crop(image, [IMG_SIZE, IMG_SIZE, 3])

  flip_image = tf.image.random_flip_left_right(crop_image)

  means = tf.reshape(tf.constant(VGG_MEAN), [1, 1, 3])

  centered_image = flip_image - means

  # Return the mean-centered image, not the un-centered crop.
  return centered_image, tf.cast(label, tf.float32)

We also subtract the mean RGB value from each pixel.

def validation_augment(image, label):
  crop_image = tf.image.resize_with_crop_or_pad(image, IMG_SIZE, IMG_SIZE)

  means = tf.reshape(tf.constant(VGG_MEAN), [1, 1, 3])

  centered_image = crop_image - means

  # Return the mean-centered image, not the un-centered crop.
  return centered_image, tf.cast(label, tf.float32)

Since the fully-convolutional network is applied over the whole image, there is no need to sample multiple crops at test time, which is less efficient as it requires network re-computation for each crop.

def parse_image(filename):
  parts = tf.strings.split(filename, '/')

  # One-hot label: True where the directory name matches a class name.
  label = parts[-2] == CLASS_NAMES

  image = tf.io.read_file(filename)

  image = tf.image.decode_jpeg(image, channels=3)

  # Keep pixel values in [0, 255] so the VGG mean subtraction above applies.
  image = tf.cast(image, tf.float32)

  # Rescale so the smallest side is 256 (the VGG training scale S);
  # it must be larger than IMG_SIZE for random cropping to succeed.
  smallest_side = 256.0
  height, width = tf.shape(image)[0], tf.shape(image)[1]
  height = tf.cast(height, tf.float32)
  width = tf.cast(width, tf.float32)
  scale = tf.cond(tf.greater(height, width),
                  lambda: smallest_side / width,
                  lambda: smallest_side / height)

  new_height = tf.cast(height * scale, tf.int32)
  new_width = tf.cast(width * scale, tf.int32)

  image = tf.image.resize(image, [new_height, new_width])
  return image, label
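
To verify the parsing step on a single file before building the full pipeline (an illustrative spot check, assuming the file lists above have been built):

# Spot-check one training file: shape after rescaling and the decoded label.
image, label = parse_image(train_images[0])
print(image.shape)  # smallest side should be 256
print(CLASS_NAMES[tf.argmax(tf.cast(label, tf.int32)).numpy()])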


AUTOTUNE = tf.data.experimental.AUTOTUNE
BATCH_SIZE = 32  # assumed value; adjust to your hardware

def create_dataset(file_list, is_training=False):

  ds = tf.data.Dataset.from_tensor_slices(file_list)

  # Shuffle the filenames (training only); the full list fits in memory.
  if is_training:
    ds = ds.shuffle(buffer_size=len(file_list))

  ds = ds.map(parse_image, num_parallel_calls=AUTOTUNE)

  if is_training:
    ds = ds.map(train_augment, num_parallel_calls=AUTOTUNE)
  else:
    ds = ds.map(validation_augment, num_parallel_calls=AUTOTUNE)

  ds = ds.repeat()

  ds = ds.batch(BATCH_SIZE)

  ds = ds.prefetch(buffer_size=AUTOTUNE)

  return ds
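
Pulling one batch through the pipeline is a cheap way to verify shapes and dtypes before committing to a full training run:

# Inspect one batch: images should be (BATCH_SIZE, 224, 224, 3) float32,
# labels (BATCH_SIZE, num_classes).
sample_ds = create_dataset(train_images, is_training=True)
images, labels = next(iter(sample_ds))
print(images.shape, images.dtype)
print(labels.shape, labels.dtype)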

Compile Model

You must compile the model before training it. Since there are 120 classes, and the labels are one-hot encoded, use the categorical_crossentropy loss.

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

Train Model

Train the model by passing the Dataset object to the model’s fit function. Set the number of epochs and, because the datasets repeat indefinitely, the number of steps per epoch.

train_ds = create_dataset(train_images, is_training=True)
test_ds = create_dataset(test_images)

# Steps are required because the datasets repeat indefinitely.
steps_per_epoch = len(train_images) // BATCH_SIZE
validation_steps = len(test_images) // BATCH_SIZE

history = model.fit(train_ds,
                    epochs=10, 
                    steps_per_epoch=steps_per_epoch,
                    validation_steps=validation_steps,
                    validation_data=test_ds)
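
The returned history object records the per-epoch metrics; a quick plot (assuming matplotlib is available, and noting that recent TensorFlow versions use the keys 'accuracy'/'val_accuracy' while older releases used 'acc') helps spot over- or under-fitting:

import matplotlib.pyplot as plt

# Plot training vs. validation accuracy per epoch.
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()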

Evaluate Model

Let’s see how the model performs. Two values are returned: the loss (a number that represents our error; lower values are better) and the accuracy.

loss, accuracy = model.evaluate(test_ds, steps=validation_steps)

print("loss: {:.2f}".format(loss))
print("accuracy: {:.2f}".format(accuracy))

Related Post

How to fine-tune the pre-trained VGG model in TensorFlow Keras

How to use a saved Keras model to Predict Text from scratch

How to Predict Images Using Trained Keras Model

Run this code in Google Colab

References

Very Deep Convolutional Networks for Large-Scale Image Recognition