I won’t go into too much detail about their background and how they work. I am using TensorFlow as a Machine Learning framework. In case you are not familiar with TensorFlow, make sure to check out my recent post getting started with TensorFlow.


The Kaggle Dog vs Cat dataset consists of 25,000 color images of dogs and cats that we use for training. Each image is a different size of pixel intensities, represented as [0, 255] integer values in RGB color space.


You need to convert the data to native TFRecord format. Google provide a single script for converting Image data to TFRecord format.
When the script finishes you will find 2 shards for the training and validation files in the DATA_DIR. The files will match the patterns train-?????-of-00002 and validation-?????-of-00002, respectively.

Convolution neural network Architecture

We use three types of layers to build ConvNet architectures: Convolutional Layer, Pooling Layer, and Fully-Connected Layer. We will stack these layers to form a full ConvNet architecture.

Building the CNN for Image Classifier

You’re inputting an image which is 252x252x3 it’s an RGB image and trying to recognize either Dog or Cat. Let’s build a neural network to do this.
What’s gonna use in this post is inspired and similar to one of the classic neural networks called LeNet-5.
convolution neural network architecture
252x252x3 input image that is the first layer uses a 32,5x5 filter stride of 1 and same padding. Next, apply max pooling of parameter, filter 2x2 and strides=2.This should reduce the height and width of the representation by a factor of 2. so 252x252x32 now become 126x126x32.The number of channels remains the same. we are going to call this max pooling 1.
Next given 126x126x32 volume and apply another convolution layer to it.Use a filter size this 5×5 and stride 1 and 64 filters this time. So now you end up with a 126x126x64 volume called conv2. Then in this network do max pooling with a Filter:2×2 and Strides:2 and the 126X126X64 this will the half the height and width(63X63X64).

Dense Layer

Next, we want to add a dense layer (with 1,024 neurons and ReLU activation) to our CNN to perform classification on the features extracted by the convolution/pooling layers.
Before we connect the layer, we’ll flatten our feature map (max pooling 2) to shape [batch_size, features], so that our tensor has only two dimensions:
63x63x64=254016 so let’s now fatten output to a 254016x1 dimensional vector we also think of this a flattened result into just a set of neurons.

Logits Layer

You have 1024 real numbers that you can feed to a softmax unit. If you’re trying to do classifying images like either dog or cat then this would be a softmax with 2 outputs so this is a reasonably typical example of what a convolutional network looks like.

"""Model function for CNN."""
def cnn_model_fn(features, labels, mode):
    # Input Layer
    input_layer = tf.reshape(features["image"], [-1, _DEFAULT_IMAGE_SIZE, _DEFAULT_IMAGE_SIZE, 3])
    # Convolutional Layer #1
    conv1 = tf.layers.conv2d(
        kernel_size=[5, 5],
    # Pooling Layer #1
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
    # Convolutional Layer #2 and Pooling Layer #2
    conv2 = tf.layers.conv2d(
        kernel_size=[5, 5],
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
    # Dense Layer
    pool2_flat = tf.reshape(pool2, [-1, 126 * 126 * 64])
    dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)
    # Logits Layer
    logits = tf.layers.dense(inputs=dropout, units=2)

Generate Predictions

The logits layer of our model returns our predictions as raw values in a [batch_size, 2]-dimensional tensor. Let’s convert these raw values into two different formats that our model function can return:

  • The predicted class for each example: Dog or Cat

Our predicted class is the element in the corresponding row of the logits tensor with the highest raw value. We can find the index of this element using the
tf.argmax function:

The input argument specifies the tensor from which to extract maximum values—here logits. The axisargument specifies the axis of the input tensor along which to find the greatest value. Here, we want to find the largest value along the dimension with index of 1, which corresponds to our predictions (recall that our logits tensor has shape [batch_size, 2]).

We can derive probabilities from our logits layer by applying softmax activation using tf.nn.softmax:

def cnn_model_fn(features, labels, mode):
    predictions = {
        # Generate predictions (for PREDICT and EVAL mode)
        "classes": tf.argmax(input=logits, axis=1),
        # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
        # `logging_hook`.
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

Calculate Loss

That measures how closely the model’s predictions match the target classes. For classification problems, cross entropy is typically used as the loss metric. The following code calculates cross entropy when the model runs in either TRAIN or EVAL mode:

#Calculate Loss (for both TRAIN and EVAL modes)
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=2)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)

Training Operation

we defined loss for the model as the softmax cross-entropy of the logits layer and our labels. Let’s configure our model to optimize this loss value during training. We’ll use a learning rate of 0.001 and stochastic gradient descent as the optimization algorithm:

 if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

Add evaluation metrics

Define eval_metric_ops dict in EVAL mode as follows:

    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(
            labels=labels, predictions=predictions["classes"])}
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

Load Training and Test Data

Convert whatever data you have into a TFRecordes supported format.This approach makes it easier to mix and match data sets. The recommended format for TensorFlow is an TFRecords file containing tf.train.Example protocol buffers  which contain Features as a field.
To read a file of TFRecords, use tf.TFRecordReader with the tf.parse_single_example decoder. The parse_single_example op decodes the example protocol buffers into tensors.

def get_file_lists(data_dir):
    import glob
    train_list = glob.glob(data_dir + '/' + 'train-*')
    valid_list = glob.glob(data_dir + '/' + 'validation-*')
    if len(train_list) == 0 and \
                    len(valid_list) == 0:
        raise IOError('No files found at specified path!')
    return train_list, valid_list
def parse_record(raw_record, is_training):
    """Parse an ImageNet record from `value`."""
    keys_to_features = {
            tf.FixedLenFeature((), tf.string, default_value=''),
            tf.FixedLenFeature((), tf.string, default_value='jpeg'),
            tf.FixedLenFeature([], dtype=tf.int64, default_value=-1),
            tf.FixedLenFeature([], dtype=tf.string, default_value=''),
    parsed = tf.parse_single_example(raw_record, keys_to_features)
    image = tf.image.decode_image(
        tf.reshape(parsed['image/encoded'], shape=[]),
    # Note that tf.image.convert_image_dtype scales the image data to [0, 1).
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    image = vgg_preprocessing.preprocess_image(
    label = tf.cast(
        tf.reshape(parsed['image/class/label'], shape=[]),
    return {"image": image}, label

Train a model with a different image size.

The simplest solution is to artificially resize your images to 252×252 pixels. See Images section for many resizing, cropping and padding methods. Note that the entire model architecture is predicated on a 252x252 image, thus if you wish to change the input image size, then you may need to redesign the entire model architecture.

Fused decode and crop

If inputs are JPEG images that also require cropping, use fused tf.image.decode_and_crop_jpeg to speed up preprocessing. tf.image.decode_and_crop_jpeg only decodes the part of the image within the crop window. This significantly speeds up the process if the crop window is much smaller than the full image. For image data, this approach could speed up the input pipeline by up to 30%.

Create input functions

You must create input functions to supply data for training, evaluating, and prediction

def input_fn(is_training, filenames, batch_size, num_epochs=1, num_parallel_calls=1):
    dataset = tf.data.TFRecordDataset(filenames)
    if is_training:
        dataset = dataset.shuffle(buffer_size=1500)
    dataset = dataset.map(lambda value: parse_record(value, is_training),
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(num_epochs)
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels
def train_input_fn(file_path):
    return input_fn(True, file_path, 100, None, 10)
def validation_input_fn(file_path):
    return input_fn(False, file_path, 50, 1, 1)

The Dataset API can handle a lot of common cases for you. Using the Dataset API, you can easily read in records from a large collection of files in parallel and join them into a single stream.

Create the Estimator

Next, let’s create an Estimator a TensorFlow class for performing high-level model training, evaluation, and inference for our model. Add the following code to main():

classifier = tf.estimator.Estimator(model_fn=cnn_model_fn, model_dir="/tmp/convnet_model")

The model_fn argument specifies the model function to use for training, evaluation, and prediction; we pass it the cnn_model_fn that we have created.The model_dir argument specifies the directory where model data (checkpoints) will be saved (here, we specify the temp directory /tmp/convnet_model, but feel free to change to another directory of your choice).

Set Up a Logging Hook

CNN can take time to train, let’s set up some logging so we can track progress during training. We can use TensorFlow’s tf.train.SessionRunHook to create a tf.train.LoggingTensorHook that will log the probability values from the softmax layer of our CNN. Add the following to main().

  # Set up logging for predictions
  tensors_to_log = {"probabilities": "softmax_tensor"}
  logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=50)

We store a dict of the tensors we want to log in tensors_to_log. Each key is a label of our choice that will be printed in the log output, and the corresponding label is the name of a Tensor in the TensorFlow graph. Here, our probabilities can be found in softmax_tensor, the name we gave our softmax operation earlier when we generated the probabilities in cnn_model_fn.
Next, we create the LoggingTensorHook, passing tensors_to_log to the tensors argument. We set every_n_iter=50, which specifies that probabilities should be logged after every 50 steps of training.

Train the Model

Now we’re ready to train our model, which we can do by creating train_input_fn ans calling train() on mnist_classifier. Add the following to main()

classifier.train(input_fn=lambda: train_input_fn(train_list), steps=10, hooks=[logging_hook])

Evaluate the Model

Once training is complete, we want to evaluate our model to determine its accuracy on the test set. We call the evaluate method, which evaluates the metrics we specified in eval_metric_ops argument in the cnn_model_fn. Add the following to main()

evalution = classifier.evaluate(input_fn=lambda: validation_input_fn(valid_list))

Run the Model

We’ve coded the CNN model function, Estimator, and the training/evaluation logic; now run the python script.
Training CNN is quite computationally intensive. Estimated completion time of python script will vary depending on your processor.To train more quickly, you can decrease the number of steps passed to train(), but note that this will affect accuracy.

Download this project from GitHub

Related Post

Convolutional Neural Network with Batch Normalization
Convert a directory of images to TFRecords
Deep learning model for Car Price prediction using TensorFlow
Importance of Batch Normalization in TensorFlow