Handwritten digit recognition is a classic problem in machine learning. It has been studied for over twenty years; if you visit the MNIST website, you will find two decades of published research on it.
In this tutorial we build a simple convolutional neural network for recognizing handwritten digits. We define the model using the TensorFlow Keras high-level API, but instead of training it with Keras, we convert it to a TensorFlow Estimator and train it that way, which gives us better support for distributed training.
Building a CNN MNIST Classifier
Convolutional networks were invented specifically for 2D data in which shape or locality information matters. Let's build a model to classify the images in the MNIST dataset using the following CNN architecture:
- Conv Layer #1: Applies 32 3×3 filters with ReLU activation, followed by batch normalization.
- Conv Layer #2: Applies 32 3×3 filters with ReLU activation, followed by batch normalization.
- Pooling Layer #1: Performs max pooling with a 2×2 filter, followed by dropout with a rate of 0.20.
- Conv Layer #3: Applies 64 3×3 filters with ReLU activation, followed by batch normalization.
- Conv Layer #4: Applies 64 3×3 filters with ReLU activation, followed by batch normalization.
- Pooling Layer #2: Again performs max pooling with a 2×2 filter, followed by dropout with a rate of 0.30.
- Conv Layer #5: Applies 128 3×3 filters with ReLU activation, followed by batch normalization.
- Pooling Layer #3: Again performs max pooling with a 2×2 filter, followed by dropout with a rate of 0.40.
- Dense Layer #1: 200 neurons fed by the 1,152 flattened features (3×3×128), with a dropout rate of 0.50.
- Dense Layer #2 (Output Layer): 10 neurons with softmax activation, one for each digit class (0–9).
```python
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation=tf.nn.relu,
                                 input_shape=(28, 28, 1), padding='same'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation=tf.nn.relu, padding='same'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.MaxPool2D((2, 2)))
model.add(tf.keras.layers.Dropout(0.20))
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu, padding='same'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu, padding='same'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.30))
model.add(tf.keras.layers.Conv2D(128, (3, 3), activation=tf.nn.relu, padding='same'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.40))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(200, activation=tf.nn.relu))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(0.50))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax, name="output"))

model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])
model.summary()
```
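To double-check the shape bookkeeping from the list above: the `padding='same'` convolutions preserve spatial size, and each 2×2 max pool halves it (flooring odd sizes), so the feature maps shrink 28 → 14 → 7 → 3, and `Flatten` emits 3 × 3 × 128 = 1,152 features into `Dense(200)`. A quick way to verify this yourself is to print each layer's output shape:

```python
# Print every layer's output shape to confirm the 28 -> 14 -> 7 -> 3
# progression and the 1,152-wide flatten; model.summary() above shows
# the same information in tabular form.
for layer in model.layers:
    print(layer.name, layer.output_shape)
```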
Converting the Keras Model to a TensorFlow Estimator
The typical reason to export a Keras model, or at least convert it to an Estimator, is better support for distributed training: you get distribution across machines and GPU scaling essentially for free, with no additional code.
```python
mnist_estimator = tf.keras.estimator.model_to_estimator(keras_model=model,
                                                        model_dir="check-point")
```
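One detail worth noting before we feed data to the estimator: the input functions below key their feature dicts by the name of the model's input layer, which Keras auto-generates. A quick sketch to look that name up on the model we just built (for a `Sequential` model whose first layer is a `Conv2D`, it is typically `conv2d_input`, which is what we assume in the input functions below):

```python
# The estimator expects feature-dict keys that match the Keras input names.
print(model.input_names)  # e.g. ['conv2d_input']
```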
Load Training and Test Data
First, let’s load our training and test data.
```python
import numpy as np

# Load MNIST; the images come back as flat float32 vectors scaled to [0, 1].
mnist = tf.contrib.learn.datasets.load_dataset("mnist")
train_data = mnist.train.images.reshape(mnist.train.images.shape[0], 28, 28, 1)  # np.array
train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
eval_data = mnist.test.images.reshape(mnist.test.images.shape[0], 28, 28, 1)  # np.array
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

# One-hot encode the labels to match the 10-way softmax output.
num_class = 10
train_labels = tf.keras.utils.to_categorical(train_labels, num_class)
eval_labels = tf.keras.utils.to_categorical(eval_labels, num_class)
```
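`tf.contrib.learn.datasets` is deprecated, so if it is unavailable in your TensorFlow build, here is a hedged equivalent using `tf.keras.datasets.mnist` instead. Note that it returns raw uint8 pixels in [0, 255], so we scale them to [0, 1] ourselves:

```python
# Alternative loader (not the tutorial's original path): tf.keras.datasets
# ships the same MNIST images as uint8 arrays, so normalize and reshape.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
train_data = x_train.reshape(-1, 28, 28, 1).astype(np.float32) / 255.0
eval_data = x_test.reshape(-1, 28, 28, 1).astype(np.float32) / 255.0
train_labels = tf.keras.utils.to_categorical(y_train, 10)
eval_labels = tf.keras.utils.to_categorical(y_test, 10)
```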
We store the training feature data and training labels as NumPy arrays in `train_data` and `train_labels`, respectively. Similarly, we store the evaluation feature data (10,000 images) and evaluation labels in `eval_data` and `eval_labels`.
Train Model
Now we're ready to train our model, which we can do by creating `train_input_fn` and calling `train()` on `mnist_estimator`.
```python
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"conv2d_input": train_data},  # key must match the model's input name
    y=train_labels,
    batch_size=100,
    num_epochs=None,
    shuffle=True)

mnist_estimator.train(input_fn=train_input_fn, steps=10000)
```
In the `numpy_input_fn` call, we pass the training feature data to `x` as a dict (keyed by the input layer name) and the labels to `y`. We set a `batch_size` of 100, which means the model trains on minibatches of 100 examples at each step. `num_epochs=None` means the model trains until the specified number of steps is reached. We also set `shuffle=True` to shuffle the training data. In the `train` call, we set `steps=10000`.
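To relate steps to epochs: with the 55,000-image MNIST training split and a batch size of 100, one epoch is 550 steps, so 10,000 steps works out to roughly 18 epochs. A quick back-of-the-envelope check:

```python
# Rough steps <-> epochs arithmetic for MNIST's 55,000-image training split.
steps_per_epoch = 55000 // 100      # 550 steps per epoch
epochs = 10000 / steps_per_epoch    # ~18.2 epochs for steps=10000
print(steps_per_epoch, round(epochs, 1))
```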
Evaluate Model
Once training is complete, we want to evaluate our model to determine its accuracy on the MNIST test set. We call the `evaluate` method, which computes the loss and the metrics we specified when compiling the model.
```python
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"conv2d_input": eval_data},
    y=eval_labels,
    num_epochs=1,
    shuffle=False)

eval_results = mnist_estimator.evaluate(input_fn=eval_input_fn)
print(eval_results)
```
To create `eval_input_fn`, we set `num_epochs=1`, so that the model evaluates the metrics over exactly one pass of the data and returns the result. We also set `shuffle=False` to iterate through the data sequentially.
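Since distributed training was our motivation for converting to an Estimator, it is worth noting the idiomatic entry point for that workflow: `tf.estimator.train_and_evaluate`, which interleaves training and evaluation and runs unchanged in distributed settings. A minimal sketch using the two input functions we already defined:

```python
# Sketch: the distributed-friendly training loop for Estimators. With a
# TF_CONFIG environment variable describing the cluster, this same call
# runs distributed training with no code changes.
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)
tf.estimator.train_and_evaluate(mnist_estimator, train_spec, eval_spec)
```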
Making Predictions (Inference) with the Trained Model
We now have a trained model that reaches an evaluation accuracy of about 0.9953 (99.53%). We can now use it to predict handwritten digits from unlabeled images. As with training and evaluation, we make predictions with a single function call.
```python
import matplotlib.pyplot as plt

predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"conv2d_input": eval_data[1:45]},
    num_epochs=1,
    shuffle=False)

predict_result = list(mnist_estimator.predict(input_fn=predict_input_fn))

pos = 1
for img, lbl, predict_lbl in zip(eval_data[1:45], eval_labels[1:45], predict_result):
    output = np.argmax(predict_lbl.get('output'), axis=None)
    lbl = np.argmax(lbl, axis=None)
    plt.subplot(4, 11, pos)
    plt.imshow(img.reshape(28, 28))
    plt.axis('off')
    if output == lbl:
        plt.title(output)
    else:
        # Show "predicted/true" in red for misclassified digits.
        plt.title(str(output) + "/" + str(lbl), color='#ff0000')
    pos += 1
plt.show()
```
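Each element yielded by `predict` is a dict keyed by the model's output layer name (here `"output"`, the name we gave the final Dense layer) and containing the 10-way softmax vector for one image; `np.argmax` turns that vector into the predicted digit. For example:

```python
# Inspect a single prediction: a dict keyed by the output layer name,
# holding the softmax probabilities over the 10 digit classes.
first = predict_result[0]
print(first['output'].shape)       # (10,)
print(np.argmax(first['output']))  # the predicted digit
```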