I won’t go into too much detail about their background and how they work. I am using TensorFlow as a Machine Learning framework. In case you are not familiar with TensorFlow, make sure to check out my recent post on getting started with TensorFlow.
Dataset
The Kaggle Dogs vs. Cats dataset consists of 25,000 color images of dogs and cats that we use for training. The images vary in size, and each pixel is stored as integer intensities in the range [0, 255] in RGB color space.
TFRecords
You need to convert the data to TensorFlow’s native TFRecord format. Google provides a script for converting image data to TFRecord format.
When the script finishes, you will find the training and validation data in DATA_DIR, split into 2 shards each. The files match the patterns train-?????-of-00002 and validation-?????-of-00002, respectively.
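The shard names follow a fixed index-of-total pattern with five-digit numbers. As a small plain-Python illustration (the helper shard_names is hypothetical and not part of the conversion script), the sketch below builds names for a given number of shards and checks them against the glob patterns above with fnmatch.

```python
import fnmatch


def shard_names(prefix, num_shards):
    """Hypothetical helper: build names like 'train-00000-of-00002'."""
    return ["%s-%05d-of-%05d" % (prefix, i, num_shards) for i in range(num_shards)]


train_shards = shard_names("train", 2)
valid_shards = shard_names("validation", 2)
print(train_shards)  # ['train-00000-of-00002', 'train-00001-of-00002']

# Each generated name matches the pattern described above.
assert all(fnmatch.fnmatch(n, "train-?????-of-00002") for n in train_shards)
assert all(fnmatch.fnmatch(n, "validation-?????-of-00002") for n in valid_shards)
```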
Convolutional Neural Network Architecture
We use three types of layers to build ConvNet architectures: Convolutional Layer, Pooling Layer, and Fully-Connected Layer. We will stack these layers to form a full ConvNet architecture.
Building the CNN Image Classifier
We input a 252x252x3 RGB image and try to recognize whether it shows a dog or a cat. Let’s build a neural network to do this.
The network used in this post is inspired by, and similar to, one of the classic neural networks, LeNet-5.
The first layer takes the 252x252x3 input image and applies 32 filters of size 5x5 with a stride of 1 and same padding, producing a 252x252x32 volume (conv1). Next, we apply max pooling with a 2x2 filter and a stride of 2. This halves the height and width of the representation, so 252x252x32 becomes 126x126x32, while the number of channels stays the same. We call this layer max pooling 1.
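To see why a 2x2, stride-2 max pool halves the height and width while leaving the channel count alone, here is a plain-Python sketch of the operation on a single channel (in the network it is applied to each of the 32 channels independently). The function name and input values are made up for illustration.

```python
def max_pool_2x2(grid):
    """2x2 max pooling with stride 2 on one channel (a list of rows).

    A trailing odd row or column is dropped, as with 'valid' padding.
    """
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]) - 1, 2)]
        for r in range(0, len(grid) - 1, 2)
    ]


# A hypothetical 4x4 single-channel feature map
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [7, 1, 3, 8],
]
print(max_pool_2x2(feature_map))  # [[6, 2], [7, 9]]
```

Each output value keeps only the strongest activation in its 2x2 window, which is why the spatial size shrinks by 2 in each direction.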
Next, we take the 126x126x32 volume and apply another convolutional layer, this time with 64 filters of size 5x5 and a stride of 1. This gives a 126x126x64 volume, called conv2. We then apply max pooling again with a 2x2 filter and a stride of 2, which halves the height and width of the 126x126x64 volume to 63x63x64 (max pooling 2).
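The shape bookkeeping above can be checked mechanically. This plain-Python sketch (helper names are hypothetical) applies the standard output-size formulas: with same padding a convolution keeps the spatial size at stride 1, and tf.layers max pooling uses valid padding by default, giving floor((size - 2) / 2) + 1.

```python
import math


def conv_same(shape, filters, stride=1):
    """'same' padding: height/width become ceil(size / stride); depth = filters."""
    height, width, _ = shape
    return math.ceil(height / stride), math.ceil(width / stride), filters


def max_pool_valid(shape, pool=2, stride=2):
    """'valid' pooling: floor((size - pool) / stride) + 1; channels unchanged."""
    height, width, channels = shape
    return (height - pool) // stride + 1, (width - pool) // stride + 1, channels


input_shape = (252, 252, 3)
conv1 = conv_same(input_shape, filters=32)
pool1 = max_pool_valid(conv1)
conv2 = conv_same(pool1, filters=64)
pool2 = max_pool_valid(conv2)

for name, shape in [("conv1", conv1), ("pool1", pool1), ("conv2", conv2), ("pool2", pool2)]:
    print(name, shape)
# conv1 (252, 252, 32)
# pool1 (126, 126, 32)
# conv2 (126, 126, 64)
# pool2 (63, 63, 64)
```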
Dense Layer
Next, we want to add a dense layer (with 1,024 neurons and ReLU activation) to our CNN to perform classification on the features extracted by the convolution/pooling layers.
Before we connect the layer, we’ll flatten our feature map (max pooling 2) to shape [batch_size, features], so that our tensor has only two dimensions. Since 63 × 63 × 64 = 254,016, each example becomes a 254,016-dimensional vector; you can think of this flattened result as a layer of 254,016 neurons.
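Flattening also shows where most of the model’s weights live. The plain-Python count below (helper names are hypothetical; it assumes every layer has a bias, which is the tf.layers default) finds about 260 million parameters in total, more than 99.9% of them in the 254,016 x 1,024 dense layer, roughly 1 GB as float32.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


flat_size = 63 * 63 * 64  # flattened output of max pooling 2
params = {
    "conv1": conv_params(5, 5, 3, 32),
    "conv2": conv_params(5, 5, 32, 64),
    "dense": dense_params(flat_size, 1024),
    "logits": dense_params(1024, 2),
}
total = sum(params.values())

for name, count in params.items():
    print(f"{name:>6}: {count:,}")
print(f" total: {total:,}")
print(f"dense share: {params['dense'] / total:.4%}")
```

This is also why changing the input size affects the model so much: the dense layer’s weight count scales with the flattened feature-map size.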
Logits Layer
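The dense layer produces 1,024 real numbers per image. During training, dropout (rate 0.4) is applied to them, and they then feed the logits layer, a dense layer with one output per class. Since we are classifying each image as either a dog or a cat, this layer has 2 outputs, and applying softmax to them gives the class probabilities. This is a reasonably typical example of what a convolutional network looks like.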
```python
import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.estimator)

_DEFAULT_IMAGE_SIZE = 252
_NUM_CHANNELS = 3
_NUM_CLASSES = 2


def cnn_model_fn(features, labels, mode):
    """Model function for CNN."""
    # Input Layer: [batch_size, 252, 252, 3]
    input_layer = tf.reshape(
        features["image"],
        [-1, _DEFAULT_IMAGE_SIZE, _DEFAULT_IMAGE_SIZE, _NUM_CHANNELS])

    # Convolutional Layer #1: 32 5x5 filters -> [batch_size, 252, 252, 32]
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=32,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)

    # Pooling Layer #1 -> [batch_size, 126, 126, 32]
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

    # Convolutional Layer #2: 64 5x5 filters -> [batch_size, 126, 126, 64]
    conv2 = tf.layers.conv2d(
        inputs=pool1,
        filters=64,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)

    # Pooling Layer #2 -> [batch_size, 63, 63, 64]
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

    # Dense Layer: pool2 is 63x63x64, so flatten to [batch_size, 63 * 63 * 64]
    pool2_flat = tf.reshape(pool2, [-1, 63 * 63 * 64])
    dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
    # Dropout is only active in TRAIN mode
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

    # Logits Layer: one raw score per class -> [batch_size, 2]
    logits = tf.layers.dense(inputs=dropout, units=_NUM_CLASSES)
    ...
```
Generate Predictions
The logits layer of our model returns our predictions as raw values in a [batch_size, 2]-dimensional tensor. Let’s convert these raw values into two different formats that our model function can return:
- The predicted class for each example: Dog or Cat
- The probabilities for each possible class for each example
Our predicted class is the element in the corresponding row of the logits tensor with the highest raw value. We can find the index of this element using tf.argmax(input=logits, axis=1).
We can derive probabilities from our logits layer by applying softmax activation with tf.nn.softmax(logits, name="softmax_tensor"). Naming the op softmax_tensor lets us refer to it later when we set up logging.
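The two conversions can be checked in plain Python on a hypothetical batch of three logit rows (which column means dog and which means cat depends on the label indices stored in the TFRecords). Because softmax is monotonic, the highest probability always sits at the argmax index, so the class can be read directly from the logits.

```python
import math


def softmax(row):
    """Numerically stable softmax: subtract the row maximum before exponentiating."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    """Index of the largest raw value in one row of logits."""
    return max(range(len(row)), key=lambda i: row[i])


# Hypothetical logits for a batch of 3 images: [batch_size, 2]
logits = [[2.0, 0.5], [-1.0, 1.0], [0.3, 0.1]]

classes = [argmax(row) for row in logits]
probabilities = [softmax(row) for row in logits]

print(classes)  # [0, 1, 0]
for p in probabilities:
    print([round(x, 3) for x in p])
```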
We define the loss for the model as the softmax cross-entropy of the logits layer and our labels. Let’s configure our model to optimize this loss value during training, using a learning rate of 0.001 and stochastic gradient descent as the optimization algorithm:
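```python
    # (inside cnn_model_fn)
    # Softmax cross-entropy between the integer labels and the logits
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # Configure the training op (TRAIN mode): SGD with learning rate 0.001
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
```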
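The loss and the update rule are easy to check by hand. The sketch below is plain Python, not TensorFlow: it computes the batch-mean softmax cross-entropy for integer labels (with the default weights of 1, this mean is what tf.losses.sparse_softmax_cross_entropy reports) and takes one gradient-descent step with learning rate 0.001 on a hypothetical 2-feature linear layer, using the standard gradient softmax(logits) - one_hot(label). The toy data and helper names are made up.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_softmax_cross_entropy(labels, logits):
    """Batch mean of -log(softmax(logits)[label]) with integer labels."""
    losses = [-math.log(softmax(row)[y]) for y, row in zip(labels, logits)]
    return sum(losses) / len(losses)


def linear(W, b, x):
    """logits = W x + b for a single example."""
    return [sum(w * xj for w, xj in zip(w_row, x)) + bi for w_row, bi in zip(W, b)]


def sgd_step(W, b, xs, ys, learning_rate=0.001):
    """One gradient-descent step; d(loss)/d(logits) = softmax(logits) - one_hot(label)."""
    n = len(xs)
    grad_W = [[0.0] * len(W[0]) for _ in W]
    grad_b = [0.0] * len(b)
    for x, y in zip(xs, ys):
        p = softmax(linear(W, b, x))
        for i in range(len(b)):
            d = (p[i] - (1.0 if i == y else 0.0)) / n
            grad_b[i] += d
            for j, xj in enumerate(x):
                grad_W[i][j] += d * xj
    new_W = [[w - learning_rate * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad_W)]
    new_b = [bi - learning_rate * g for bi, g in zip(b, grad_b)]
    return new_W, new_b


# Hypothetical 2-feature, 2-class toy problem starting from zero weights
xs, ys = [[1.0, 0.0], [0.0, 1.0]], [0, 1]
W, b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]

loss_before = sparse_softmax_cross_entropy(ys, [linear(W, b, x) for x in xs])
W, b = sgd_step(W, b, xs, ys)
loss_after = sparse_softmax_cross_entropy(ys, [linear(W, b, x) for x in xs])
print(loss_before, loss_after)  # loss_after is slightly below log(2)
```

With zero weights both classes get probability 0.5, so the starting loss is log(2); a learning rate as small as 0.001 moves it only slightly per step, which is one reason this model needs many training steps.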
Add evaluation metrics
Define the eval_metric_ops dict in EVAL mode as follows:
```python
    # (inside cnn_model_fn) Add evaluation metrics for EVAL mode
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(
            labels=labels, predictions=predictions["classes"])}
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
```
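tf.metrics.accuracy is a streaming metric: it returns an (accuracy, update_op) pair and accumulates a running total of correct predictions and a count across all evaluation batches, so the reported number covers the whole validation set rather than just the last batch. A plain-Python analogue of that bookkeeping (the class name is hypothetical) looks like this:

```python
class StreamingAccuracy:
    """Running accuracy over batches: a total of correct predictions and a count."""

    def __init__(self):
        self.total = 0.0
        self.count = 0.0

    def update(self, labels, predictions):
        """Fold one batch into the running totals and return the current value."""
        self.total += sum(1.0 for y, p in zip(labels, predictions) if y == p)
        self.count += len(labels)
        return self.result()

    def result(self):
        return self.total / self.count if self.count else 0.0


metric = StreamingAccuracy()
metric.update([0, 1, 1, 0], [0, 1, 0, 0])  # batch 1: 3 of 4 correct
metric.update([1, 1], [1, 1])              # batch 2: 2 of 2 correct
print(metric.result())  # 0.8333333333333334
```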
Load Training and Test Data
Convert whatever data you have into a supported format; this approach makes it easier to mix and match datasets. The recommended format for TensorFlow is a TFRecords file containing tf.train.Example protocol buffers, which contain Features as a field.
To read a file of TFRecords, use tf.data.TFRecordDataset (as the input function below does) or the older tf.TFRecordReader, together with the tf.parse_single_example decoder. The parse_single_example op decodes the example protocol buffers into tensors.
```python
import glob
import os

import tensorflow as tf
import vgg_preprocessing  # assumed: the VGG preprocessing module from the TensorFlow models repo


def get_file_lists(data_dir):
    """Return the training and validation TFRecord shards found in data_dir."""
    train_list = glob.glob(os.path.join(data_dir, 'train-*'))
    valid_list = glob.glob(os.path.join(data_dir, 'validation-*'))
    # Training needs train_list and evaluation needs valid_list, so fail if either is empty.
    if not train_list or not valid_list:
        raise IOError('No files found at specified path!')
    return train_list, valid_list


def parse_record(raw_record, is_training):
    """Parse a Dogs vs. Cats record from `raw_record`."""
    keys_to_features = {
        'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
        'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
        'image/class/label': tf.FixedLenFeature([], dtype=tf.int64, default_value=-1),
        'image/class/text': tf.FixedLenFeature([], dtype=tf.string, default_value=''),
    }
    parsed = tf.parse_single_example(raw_record, keys_to_features)

    # Decode the encoded image bytes into a 3-channel uint8 tensor.
    image = tf.image.decode_image(
        tf.reshape(parsed['image/encoded'], shape=[]), _NUM_CHANNELS)

    # Note that tf.image.convert_image_dtype scales the image data to [0, 1).
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)

    # Resize and crop to 252x252 with the VGG preprocessing routine.
    image = vgg_preprocessing.preprocess_image(
        image=image,
        output_height=_DEFAULT_IMAGE_SIZE,
        output_width=_DEFAULT_IMAGE_SIZE,
        is_training=is_training)

    label = tf.cast(
        tf.reshape(parsed['image/class/label'], shape=[]),
        dtype=tf.int32)

    return {"image": image}, label
```
Train a model with a different image size.
The simplest solution is to resize your images to 252×252 pixels. See the Images section of the TensorFlow API documentation (tf.image) for resizing, cropping and padding methods. Note that the entire model architecture is predicated on a 252x252 image, so if you want to change the input image size, you may need to redesign the architecture; for example, the size of the flattened feature map that feeds the dense layer changes with the input size.
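One common way to reach a fixed 252×252 input without distorting the animals is to scale the shorter side to 252 and then crop the centre. The arithmetic is sketched below in plain Python; the helper names are hypothetical, and this is one option among the tf.image resizing methods, not necessarily what vgg_preprocessing does internally.

```python
def aspect_preserving_size(height, width, smallest_side):
    """Scale so the shorter side equals smallest_side, keeping the aspect ratio."""
    scale = smallest_side / min(height, width)
    return round(height * scale), round(width * scale)


def center_crop_offsets(height, width, crop_height, crop_width):
    """Top-left corner (y, x) of a crop window centred in the image."""
    return (height - crop_height) // 2, (width - crop_width) // 2


# A hypothetical 375x500 (height x width) photo
new_h, new_w = aspect_preserving_size(375, 500, 252)
top, left = center_crop_offsets(new_h, new_w, 252, 252)
print((new_h, new_w), (top, left))  # (252, 336) (0, 42)
```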
Fused decode and crop
If inputs are JPEG images that also require cropping, use the fused tf.image.decode_and_crop_jpeg to speed up preprocessing. tf.image.decode_and_crop_jpeg only decodes the part of the image within the crop window, which significantly speeds up the process if the crop window is much smaller than the full image. For image data, this approach could speed up the input pipeline by up to 30%.
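tf.image.decode_and_crop_jpeg takes its crop window as [crop_y, crop_x, crop_height, crop_width]. The plain-Python sketch below (hypothetical helpers) computes a centred window and the fraction of the image’s pixels that fall inside it, which is roughly the share of the pixel output the fused op has to produce.

```python
def central_crop_window(image_h, image_w, crop_h, crop_w):
    """[crop_y, crop_x, crop_height, crop_width], the order decode_and_crop_jpeg expects."""
    crop_y = (image_h - crop_h) // 2
    crop_x = (image_w - crop_w) // 2
    return [crop_y, crop_x, crop_h, crop_w]


def decoded_fraction(image_h, image_w, window):
    """Fraction of the full image's pixels that fall inside the crop window."""
    _, _, crop_h, crop_w = window
    return (crop_h * crop_w) / (image_h * image_w)


window = central_crop_window(500, 500, 252, 252)
print(window)                                      # [124, 124, 252, 252]
print(round(decoded_fraction(500, 500, window), 3))  # 0.254
```

In a TensorFlow 1.x pipeline, such a window would be passed as an int32 tensor to the crop_window argument of tf.image.decode_and_crop_jpeg, with the image dimensions obtained, for example, from tf.image.extract_jpeg_shape.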
Create input functions
You must create input functions to supply data for training, evaluation, and prediction.
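```python
def input_fn(is_training, filenames, batch_size, num_epochs=1, num_parallel_calls=1):
    """Build a batched (features, labels) pipeline from TFRecord shards."""
    dataset = tf.data.TFRecordDataset(filenames)

    if is_training:
        # Shuffle the compact serialized records; a second shuffle after map()
        # would buffer decoded float images (about 762 KB each at 252x252x3).
        dataset = dataset.shuffle(buffer_size=1500)

    # Decode and preprocess records in parallel.
    dataset = dataset.map(lambda value: parse_record(value, is_training),
                          num_parallel_calls=num_parallel_calls)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(num_epochs)  # num_epochs=None repeats indefinitely

    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels


def train_input_fn(file_path):
    # batch_size=100, repeat indefinitely, 10 parallel decode calls
    return input_fn(True, file_path, 100, None, 10)


def validation_input_fn(file_path):
    # batch_size=50, a single pass over the validation shards
    return input_fn(False, file_path, 50, 1, 1)
```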
The Dataset API can handle a lot of common cases for you. Using the Dataset API, you can easily read in records from a large collection of files in parallel and join them into a single stream.
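Dataset.shuffle does not shuffle the whole file: it keeps a buffer of buffer_size elements and emits a random one from it each time, refilling the freed slot from the input, so the first element out is always among the first buffer_size records. The plain-Python sketch below mimics that buffering and the batching step to make the behaviour visible; it illustrates the semantics and is not tf.data’s implementation.

```python
import random


def buffered_shuffle(items, buffer_size, rng):
    """Emit a random element from a fixed-size buffer, refilling it from the input."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    rng.shuffle(buffer)  # input exhausted: drain what is left in random order
    yield from buffer


def batch(items, batch_size):
    """Group consecutive items; the final batch may be smaller."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


rng = random.Random(0)
batches = list(batch(buffered_shuffle(range(10), buffer_size=3, rng=rng), batch_size=4))
print(batches)  # batches of sizes 4, 4 and 2; the first record is one of 0, 1, 2
```

With a buffer of 1,500 records, as in input_fn, the mixing is local to roughly that window of the input; a buffer at least as large as the dataset is needed for a full uniform shuffle.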
Create the Estimator
Next, let’s create an Estimator (a TensorFlow class for performing high-level model training, evaluation, and inference) for our model. Add the following code to main():
```python
classifier = tf.estimator.Estimator(model_fn=cnn_model_fn, model_dir="/tmp/convnet_model")
```
The model_fn argument specifies the model function to use for training, evaluation, and prediction; we pass it the cnn_model_fn that we have created. The model_dir argument specifies the directory where model data (checkpoints) will be saved (here, we specify the temp directory /tmp/convnet_model, but feel free to change it to another directory of your choice).
Set Up a Logging Hook
CNNs can take a while to train, so let’s set up some logging to track progress during training. We can use tf.train.LoggingTensorHook, one of TensorFlow’s tf.train.SessionRunHook implementations, to log the probability values from the softmax layer of our CNN. Add the following to main():
```python
# Set up logging for predictions
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=50)
```
We store a dict of the tensors we want to log in tensors_to_log. Each key is a label of our choice that will be printed in the log output, and the corresponding value is the name of a Tensor in the TensorFlow graph. Here, our probabilities can be found in softmax_tensor, the name we gave our softmax operation earlier when we generated the probabilities in cnn_model_fn.
Next, we create the LoggingTensorHook, passing tensors_to_log to the tensors argument. We set every_n_iter=50, which specifies that probabilities should be logged after every 50 steps of training.
Train the Model
Now we’re ready to train our model, which we can do by calling train() on classifier with train_input_fn as the input function (train_list is the list of training shards returned by get_file_lists()). Add the following to main():
```python
classifier.train(
    input_fn=lambda: train_input_fn(train_list),
    steps=10,
    hooks=[logging_hook])
```
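Note that steps counts batches, not epochs: with train_input_fn’s batch size of 100, steps=10 processes only 1,000 images. The plain-Python sketch below converts between the two; the training-set size is an assumption, since how many of the 25,000 images the conversion script placed in the validation shards is not stated here.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Training steps (batches) needed to see every example once."""
    return math.ceil(num_examples / batch_size)


def epochs_covered(steps, batch_size, num_examples):
    """How much of one epoch a given number of steps processes."""
    return steps * batch_size / num_examples


# Assumption: all 25,000 images are in the training shards; substitute your own split.
num_train = 25000
print(steps_per_epoch(num_train, 100))     # 250
print(epochs_covered(10, 100, num_train))  # 0.04
```

At these settings, steps=10 covers about 4% of one epoch, so it mainly checks that the pipeline runs; reaching useful accuracy typically takes considerably more steps.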
Evaluate the Model
Once training is complete, we want to evaluate our model to determine its accuracy on the validation set. We call the evaluate() method, which evaluates the metrics we specified in eval_metric_ops in cnn_model_fn. Add the following to main():
```python
evaluation = classifier.evaluate(input_fn=lambda: validation_input_fn(valid_list))
print(evaluation)  # dict of metrics, including "accuracy" and "loss"
```
Run the Model
We’ve coded the CNN model function, the Estimator, and the training/evaluation logic; now run the Python script.
Training a CNN is quite computationally intensive, and how long the script takes to complete will vary depending on your processor. To train more quickly, you can decrease the number of steps passed to train(), but note that this will affect accuracy.
Download this project from GitHub