Handwritten digit recognition is a classic problem in machine learning. It has been around for more than 20 years; if you visit the MNIST website, you will find two decades of published scientific papers on it.

In this tutorial we build a simple convolutional neural network for recognizing handwritten digits. We build the model using the TensorFlow Keras high-level API. Instead of training it directly with Keras, we convert it to a TensorFlow Estimator and train it as one, which gives us better support for distributed training.

Building the CNN MNIST Classifier


Convolutional networks were designed specifically for 2D data where shape or locality information matters. Let's build a model to classify the images in the MNIST dataset using the following CNN architecture.

Figure: TensorFlow Keras MNIST CNN architecture

  1. Conv Layer #1: Applies 32 3×3 filters, with ReLU activation, followed by batch normalization.
  2. Conv Layer #2: Applies 32 3×3 filters, with ReLU activation, followed by batch normalization.
  3. Pooling Layer #1: Performs max pooling with a 2×2 filter, followed by dropout with a rate of 0.20.
  4. Conv Layer #3: Applies 64 3×3 filters, with ReLU activation, followed by batch normalization.
  5. Conv Layer #4: Applies 64 3×3 filters, with ReLU activation, followed by batch normalization.
  6. Pooling Layer #2: Again, performs max pooling with a 2×2 filter, followed by dropout with a rate of 0.30.
  7. Conv Layer #5: Applies 128 3×3 filters, with ReLU activation, followed by batch normalization.
  8. Pooling Layer #3: Again, performs max pooling with a 2×2 filter, followed by dropout with a rate of 0.40.
  9. Dense Layer #1: 200 neurons on the 1,152 flattened features (3×3×128), with batch normalization and a dropout rate of 0.50.
  10. Dense Layer #2 (Logits Layer): 10 neurons with softmax activation, one for each digit target class (0–9).

import numpy as np
import tensorflow as tf

# Build the model with the Keras Sequential API.
model = tf.keras.models.Sequential()

# Conv block 1: two 32-filter conv layers, each followed by batch norm.
model.add(
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation=tf.nn.relu, input_shape=(28, 28, 1), padding='same'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation=tf.nn.relu, padding='same'))
model.add(tf.keras.layers.BatchNormalization())

model.add(tf.keras.layers.MaxPool2D((2, 2)))
model.add(tf.keras.layers.Dropout(0.20))

model.add(tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu, padding='same'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu, padding='same'))
model.add(tf.keras.layers.BatchNormalization())

model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.30))

model.add(tf.keras.layers.Conv2D(128, (3, 3), activation=tf.nn.relu, padding='same'))
model.add(tf.keras.layers.BatchNormalization())

model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.40))

model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(200, activation=tf.nn.relu))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(0.50))

model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax, name="output"))


model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

model.summary()
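
As a sanity check on the summary output: with padding='same', the conv layers preserve the spatial dimensions and each 2×2 max pool halves them (the last one floors 7 down to 3), so the Flatten layer should emit 3×3×128 = 1,152 features feeding the 200-neuron dense layer. A quick sketch of that arithmetic:

# Spatial dims with padding='same' and three 2x2 max-pools:
# 28x28 -> 14x14 -> 7x7 -> 3x3 (integer floor on the last pool)
# Flattened feature count feeding Dense(200):
print(3 * 3 * 128)  # 1152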

Converting the Keras Model to a TensorFlow Estimator


The typical reason to export a Keras model, or at least convert it to an Estimator, is better support for distributed training: you get distribution and GPU scaling essentially for free, with no additional work.

mnist_estimator = tf.keras.estimator.model_to_estimator(keras_model=model, model_dir="check-point")
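
One practical detail: the input functions we build below must key the feature dict by the name of the model's input layer. For a Sequential model whose first layer is a Conv2D this is typically conv2d_input, but it is safest to check rather than guess (a minimal sketch):

# The Estimator's input_fn keys features by the Keras input layer name;
# print it to confirm (typically ['conv2d_input'] for this model).
print(model.input_names)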

Load Training and Test Data


First, let’s load our training and test data.

mnist = tf.contrib.learn.datasets.load_dataset("mnist")

train_data = mnist.train.images.reshape(mnist.train.images.shape[0], 28, 28, 1)  # Returns np.array
train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
eval_data = mnist.test.images.reshape(mnist.test.images.shape[0], 28, 28, 1)  # Returns np.array
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

num_class = 10

train_labels = tf.keras.utils.to_categorical(train_labels, num_class)
eval_labels = tf.keras.utils.to_categorical(eval_labels, num_class)

We store the training feature data (55,000 images, reshaped to 28×28×1) and training labels as numpy arrays in train_data and train_labels, respectively. Similarly, we store the evaluation feature data (10,000 images) and evaluation labels in eval_data and eval_labels. Because the model ends in a 10-way softmax trained with categorical cross-entropy, we also one-hot encode both label arrays with to_categorical.
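
A quick shape check (assuming the standard 55,000/10,000 train/test split that load_dataset returns):

print(train_data.shape)    # (55000, 28, 28, 1)
print(train_labels.shape)  # (55000, 10)
print(eval_data.shape)     # (10000, 28, 28, 1)
print(eval_labels.shape)   # (10000, 10)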

Train Model


Now we’re ready to train our model, which we can do by creating train_input_fn and calling train() on mnist_estimator.

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"conv2d_input": train_data},
    y=train_labels,
    batch_size=100,
    num_epochs=None,
    shuffle=True)

mnist_estimator.train(
    input_fn=train_input_fn,
    steps=10000)

In the numpy_input_fn call, we pass the training feature data to x as a dict (keyed by the input layer name) and the labels to y. We set batch_size to 100, which means the model trains on minibatches of 100 examples at each step. num_epochs=None means the model trains until the specified number of steps is reached. We also set shuffle=True to shuffle the training data. In the train call, we set steps=10000.
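
To relate steps to epochs, a quick back-of-the-envelope check (assuming the 55,000-image training set):

# 10,000 steps x 100 examples/step = 1,000,000 examples seen,
# i.e. roughly 1,000,000 / 55,000 ~= 18 passes over the training set.
print(10000 * 100 / train_data.shape[0])  # ~18.2 epochs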

Evaluate Model


Once training is complete, we want to evaluate our model to determine its accuracy on the MNIST test set. We call the evaluate method, which computes the loss and accuracy metrics we specified in model.compile.

eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"conv2d_input": eval_data},
    y=eval_labels,
    num_epochs=1,
    shuffle=False
)

eval_results = mnist_estimator.evaluate(input_fn=eval_input_fn)
print(eval_results)

To create eval_input_fn, we set num_epochs=1, so that the model evaluates the metrics over one epoch of data and returns the result. We also set shuffle=False to iterate through the data sequentially.
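
The returned eval_results is a dict mapping metric names to values; the exact keys vary by TensorFlow version, but it typically contains the compiled loss and accuracy plus the global step. A hedged sketch of pulling out the accuracy:

# Key names are version-dependent: 'acc' in some TF 1.x releases, 'accuracy' in others.
accuracy = eval_results.get('acc', eval_results.get('accuracy'))
print('test accuracy:', accuracy)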

Evaluate the trained model

Predictions (Inference) from the Trained Model


We now have a trained model that reaches 99.53% (0.9953) accuracy on the evaluation set. We can now use it to predict handwritten digits from unlabeled data. As with training and evaluation, we make predictions using a single function call.

predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"conv2d_input": eval_data[1:45]},
    num_epochs=1,
    shuffle=False
)

import matplotlib.pyplot as plt

predict_result = list(mnist_estimator.predict(input_fn=predict_input_fn))

pos = 1
for img, lbl, predict_lbl in zip(eval_data[1:45], eval_labels[1:45], predict_result):
    # Each prediction dict is keyed by the output layer name ("output");
    # take the argmax over the 10 softmax probabilities.
    output = np.argmax(predict_lbl.get('output'), axis=None)
    lbl = np.argmax(lbl, axis=None)
    plt.subplot(4, 11, pos)  # 4x11 grid for the 44 images
    plt.imshow(img.reshape(28, 28))
    plt.axis('off')
    if output == lbl:
        plt.title(output)
    else:
        # Show "predicted/actual" in red for misclassified digits.
        plt.title("{}/{}".format(output, lbl), color='#ff0000')
    pos += 1

plt.show()

Figure: MNIST prediction results

Download from GitHub