Feeding your own data set into the CNN model in TensorFlow

I’m assuming you already know a fair bit about Neural Networks and Convolutional Neural Networks, as I won’t go into too much detail about their background or how they work. I am using TensorFlow as the machine learning framework. In case you are not familiar with TensorFlow, make sure to check out my recent post on getting started with TensorFlow.


The Kaggle Dogs vs. Cats dataset consists of 25,000 color images of dogs and cats that we are supposed to use for training. Each image is a different size, with pixel intensities represented as [0, 255] integer values in RGB color space.


Before you run the training script for the first time, you will need to convert the data to the native TFRecord format. The TFRecord format consists of a set of sharded files where each entry is a serialized tf.Example proto. Each tf.Example proto contains the image (JPEG encoded) as well as metadata such as the label, height, width, and number of channels. Google provides a single script for converting image data to TFRecord format.

When the script finishes you will find 2 shards for the training and validation files in the DATA_DIR. The files will match the patterns train-?????-of-00002 and validation-?????-of-00002, respectively.

Convolutional Neural Network Architecture

A ConvNet is a sequence of layers, and every layer of a ConvNet transforms one volume of activations to another through a differentiable function. We use three main types of layers to build ConvNet architectures: Convolutional Layer, Pooling Layer, and Fully-Connected Layer. We will stack these layers to form a full ConvNet architecture.

Building the CNN for the Image Classifier

You need to know the building blocks before you can build a full convolutional neural network. Let’s look at an example: say you’re inputting a 252x252x3 RGB image and trying to recognize either a dog or a cat. Let’s build a neural network to do this.
The network used in this post is inspired by, and actually quite similar to, one of the classic neural networks, LeNet-5. What I show here isn’t exactly LeNet-5, but many of the parameter choices were inspired by it.
(Figure: convolutional neural network architecture)
Start with the 252x252x3 input image. Let’s say the first layer uses 32 5x5 filters with a stride of 1 and same padding, so the output of this layer has the same height and width as the input: 252x252x32. Call this layer conv1. Next, let’s apply a pooling layer; I’m going to apply max pooling here with a 2x2 filter and a stride of 2. This reduces the height and width of the representation by a factor of 2, so 252x252x32 becomes 126x126x32. The number of channels remains the same. We’ll call this layer max pooling 1.
Next, given the 126x126x32 volume, let’s apply another convolutional layer, this time with 64 5x5 filters and a stride of 1, so you end up with a 126x126x64 volume called conv2. Then let’s do max pooling again with a 2x2 filter and a stride of 2 on the 126x126x64 volume, which will halve the height and width to 63x63x64.

Dense Layer

Next, we want to add a dense layer (with 1,024 neurons and ReLU activation) to our CNN to perform classification on the features extracted by the convolution/pooling layers. Before we connect the layer, we’ll flatten our feature map (max pooling 2) to shape [batch_size, features], so that our tensor has only two dimensions:
63x63x64 = 254,016, so let’s now flatten the output into a 254016x1 dimensional vector; we can also think of this flattened result as just a set of neurons. We then take these 254,016 units and build the next layer with 1,024 units, so this is actually our first fully connected layer. I’m going to call it FC2: 254,016 units densely connected to 1,024 units. This fully connected layer is just like a layer in a standard neural network, with a weight matrix (call it W3) of dimension 1024x254016. It’s called fully connected because each of the 254,016 input units is connected to each of the 1,024 output units. You also have a bias parameter that is 1024-dimensional, because there are 1,024 outputs.

Logits Layer

Finally, you now have 1,024 real numbers that you can feed to a softmax unit. If you’re trying to classify images as either dog or cat, this would be a softmax with 2 outputs. So this is a reasonably typical example of what a convolutional network looks like.

Generate Predictions

The logits layer of our model returns our predictions as raw values in a [batch_size, 2]-dimensional tensor. Let’s convert these raw values into two different formats that our model function can return:

  • The predicted class for each example: Dog or Cat
  • The probabilities for each possible class for each example

Our predicted class is the element in the corresponding row of the logits tensor with the highest raw value. We can find the index of this element using the tf.argmax function:

The input argument specifies the tensor from which to extract maximum values, here logits. The axis argument specifies the axis of the input tensor along which to find the greatest value. Here, we want to find the largest value along the dimension with index 1, which corresponds to our predictions (recall that our logits tensor has shape [batch_size, 2]).

We can derive probabilities from our logits layer by applying softmax activation using tf.nn.softmax:

Calculate Loss

For training and evaluation, we need to define a loss function that measures how closely the model’s predictions match the target classes. For classification problems, cross entropy is typically used as the loss metric. The following code calculates cross entropy when the model runs in either TRAIN or EVAL mode:

Training Operation

We defined the loss for the model as the softmax cross-entropy of the logits layer and our labels. Let’s configure our model to optimize this loss value during training. We’ll use a learning rate of 0.001 and stochastic gradient descent as the optimization algorithm:

Add evaluation metrics

Define eval_metric_ops dict in EVAL mode as follows:

Load Training and Test Data

Convert whatever data you have into a TFRecords supported format. This approach makes it easier to mix and match data sets. The recommended format for TensorFlow is a TFRecords file containing tf.train.Example protocol buffers, which contain Features as a field.

To read a file of TFRecords, use tf.TFRecordReader with the tf.parse_single_example decoder. The parse_single_example op decodes the example protocol buffers into tensors.

Train a model with a different image size.

The simplest solution is to artificially resize your images to 252x252 pixels. See the Images section for many resizing, cropping, and padding methods. Note that the entire model architecture is predicated on a 252x252 image, so if you wish to change the input image size, you may need to redesign the entire model architecture.

Fused decode and crop

If inputs are JPEG images that also require cropping, use fused tf.image.decode_and_crop_jpeg to speed up preprocessing. tf.image.decode_and_crop_jpeg only decodes the part of the image within the crop window. This significantly speeds up the process if the crop window is much smaller than the full image. For image data, this approach could speed up the input pipeline by up to 30%.

Create input functions

You must create input functions to supply data for training, evaluating, and prediction. An input function is a function that returns the following two-element tuple:

  • “features” – A Python dictionary in which:
    • Each key is the name of a feature.
    • Each value is an array containing all of that feature’s values.
  • “label” – An array containing the values of the label for every example.

The Dataset API can handle a lot of common cases for you. Using the Dataset API, you can easily read in records from a large collection of files in parallel and join them into a single stream.

Create the Estimator

Next, let’s create an Estimator (a TensorFlow class for performing high-level model training, evaluation, and inference) for our model. Add the following code to main():
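A sketch of the Estimator creation; the stub model function here only stands in for the full cnn_model_fn built above:

```python
import tensorflow as tf  # assumes TensorFlow 1.x

def cnn_model_fn(features, labels, mode):
    """Stub standing in for the full CNN model function described above."""
    logits = tf.layers.dense(tf.reshape(features["x"], [-1, 4]), units=2)
    predictions = {"classes": tf.argmax(logits, axis=1)}
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

# Create the Estimator, checkpointing to /tmp/convnet_model.
mnist_classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn, model_dir="/tmp/convnet_model")
```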

The model_fn argument specifies the model function to use for training, evaluation, and prediction; we pass it the cnn_model_fn that we have created. The model_dir argument specifies the directory where model data (checkpoints) will be saved (here, we specify the temp directory /tmp/convnet_model, but feel free to change it to another directory of your choice).

Set Up a Logging Hook

Since CNNs can take a while to train, let’s set up some logging so we can track progress during training. We can use TensorFlow’s tf.train.SessionRunHook to create a tf.train.LoggingTensorHook that will log the probability values from the softmax layer of our CNN. Add the following to main().

We store a dict of the tensors we want to log in tensors_to_log. Each key is a label of our choice that will be printed in the log output, and the corresponding value is the name of a Tensor in the TensorFlow graph. Here, our probabilities can be found in softmax_tensor, the name we gave our softmax operation earlier when we generated the probabilities in cnn_model_fn.

Next, we create the LoggingTensorHook, passing tensors_to_log to the tensors argument. We set every_n_iter=50, which specifies that probabilities should be logged after every 50 steps of training.

Train the Model

Now we’re ready to train our model, which we can do by creating train_input_fn and calling train() on mnist_classifier. Add the following to main().

Evaluate the Model

Once training is complete, we want to evaluate our model to determine its accuracy on the test set. We call the evaluate method, which evaluates the metrics we specified in eval_metric_ops argument in the cnn_model_fn. Add the following to main()

Run the Model

We’ve coded the CNN model function, Estimator, and the training/evaluation logic; now run the python script.

Training CNNs is quite computationally intensive. The estimated completion time of the script will vary depending on your processor. To train more quickly, you can decrease the number of steps passed to train(), but note that this will affect accuracy.

Download this project from GitHub


Convert a directory of images to TFRecords

In this post, I’ll show you how you can convert the dataset into a TFRecord file so you can fine-tune the model.

Before you run the training script for the first time, you will need to convert the image data to the native TFRecord format. The TFRecord format consists of a set of sharded files where each entry is a serialized tf.Example proto. Each tf.Example proto contains the image as well as metadata such as label and bounding box information.

The TFRecord file format is a simple record-oriented binary format that many TensorFlow applications use for training data. It is the recommended file format for TensorFlow. Binary files are sometimes easier to use because you don’t have to specify different directories for images and annotations. When you store your data in a binary file, you have your data in one block of memory, compared to storing each image and annotation separately. Opening a file is a considerably time-consuming operation, especially if you use an HDD. Overall, by using binary files you make the data easier to distribute and better aligned for efficient reading.

This native TensorFlow file format allows you to shuffle, batch, and split datasets with its own functions. Most of the batch operations aren’t done directly on images; rather, the images are first converted into a single TFRecord file.

Convert images into a TFRecord

Before you start any training, you’ll need a set of images to teach the model about the new classes you want to recognize. When you are working with an image dataset, what is the first thing you do? Split it into training and validation sets.

Here’s an example, which assumes you have a folder containing class-named subfolders, each full of images for each label. The example folder animal_photos should have a structure like this:
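Assuming two classes named cat and dog (hypothetical names; substitute your own labels), the layout would look like:

```
animal_photos/
├── cat/
│   ├── 1.jpg
│   └── 2.jpg
└── dog/
    ├── 1.jpg
    └── 2.jpg
```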

The subfolder names are important since they define what label is applied to each image, but the filenames themselves don’t matter. The label for each image is taken from the name of the subfolder it’s in.

The list of valid labels is held in a labels file. The code assumes that the file contains entries like the following:
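For the hypothetical cat/dog layout above, the labels file would simply be:

```
cat
dog
```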

where each line corresponds to a label. The script maps each label contained in the file to an integer corresponding to the line number, starting from 0.

Code Organization

The code for this tutorial resides in data/build_image_data.py. Change the train_directory path, which contains the training image data; the validation_directory path, which contains the validation image data; the output_directory, which will contain the TFRecord files after you run the script; and the labels_file, which contains the list of valid labels.

This TensorFlow script converts the training and evaluation data into a sharded data set consisting of TFRecord files, where we have selected 1024 and 128 shards for each data set. Each record within the TFRecord file is a serialized Example proto.
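A typical invocation might look like the following. The paths are placeholders, and the shard counts here are illustrative; check build_image_data.py itself for the exact set of flags it accepts:

```
python data/build_image_data.py \
  --train_directory=/tmp/animal_photos/train \
  --validation_directory=/tmp/animal_photos/validation \
  --output_directory=/tmp/data \
  --labels_file=/tmp/labels.txt \
  --train_shards=2 \
  --validation_shards=2
```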


Deep learning model for Car Price prediction using TensorFlow

Before AI, image search looked at the metadata around the images; now, with AI, the computer looks at the images themselves. If you search for, say, Tiger, it looks at all the images in the world, and whenever it sees one that has a tiger in it, it returns it. AI is now going to be everywhere, and machine learning is everywhere. What makes this image search work, in reality, is TensorFlow. Once you’ve written one of these models in TensorFlow, you can deploy it anywhere, including mobile. TensorFlow is truly what enables apps to classify images.

TensorFlow gives you distribution out of the box, so you can run it in the cloud if you need to. It works on all of the hardware you need it to work on. It’s fast and it’s flexible, and what I’m going to show you today is that it’s also super easy to get started.

(Figure: the TensorFlow programming environment)

The generic thing people used to say is that TensorFlow is pretty low level: you’re thinking about multiplying matrices, adding vectors together, that kind of thing. What TensorFlow built on top of this are libraries that help you do more complex things more easily. TensorFlow built a library of layers to help you build models, and it built training infrastructure that helps you actually train a model, evaluate it, and put it into production. This you can do with Keras or with Estimators. Finally, it built models in a box, and those are really full, complete machine learning algorithms: all you have to do is instantiate one and go. That’s mostly what I’m going to talk about today.

Usually when you talk about your first model in TensorFlow, it’s something simple, like fitting a line to a bunch of points. But nobody is actually interested in fitting a line to a bunch of points, distributed; it doesn’t really happen all that much in reality. So we’re not going to do that. I’m going to show you instead how to handle a variety of features, and then train and evaluate different types of models. We do that on a data set of cars. So the first model today will be about predicting the price of a car from a bunch of features, information about the car.


The next thing I can do with TensorBoard is look at the model that was created, at the lower levels of the model, at what we call the graph. TensorFlow works by generating a graph, and this graph is then shipped to all of the distributed workers it has and executed there. You don’t have to worry about this too much, but it’s awfully useful to be able to inspect this graph when you’re debugging or something like that.

Launching TensorBoard from Python

To run TensorBoard, use the following code. logdir points to the directory where the FileWriter serialized its data. Once TensorBoard is running, navigate your web browser to localhost:6006 to view the TensorBoard.

Data Set

The first thing to do is download the dataset. We’re using pandas to read the CSV file. This is easy for small datasets, but it may not be the best choice for large and complex datasets.

The CSV file does not have a header, so we have to fill in the column names. We also have to specify the dtypes.
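A sketch of that read, using a couple of hypothetical header-less rows inline so the example is self-contained (the real cars file has many more columns):

```python
import io
import pandas as pd

# Hypothetical header-less rows: fuel type, number of cylinders, price.
raw = io.StringIO("gas,4,13950\ndiesel,6,23875\n")

COLUMN_NAMES = ["fuel-type", "num-of-cylinders", "price"]
DTYPES = {"fuel-type": str, "num-of-cylinders": int, "price": float}

# header=None because the file has no header row; names fills in the columns.
df = pd.read_csv(raw, header=None, names=COLUMN_NAMES, dtype=DTYPES)
```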

The training set contains the examples that we’ll use to train the model; the test set contains the examples that we’ll use to evaluate the trained model’s effectiveness.

The training set and test set started out as a single data set. Then, we split the examples, with the majority going into the training set and the remainder going into the test set. Adding examples to the training set usually builds a better model; however, adding more examples to the test set enables us to better gauge the model’s effectiveness. Regardless of the split, the examples in the test set must be separate from the examples in the training set. Otherwise, you can’t accurately determine the model’s effectiveness.

Feature Columns

Feature Columns are the intermediaries between raw data and Estimators. Feature columns are very rich, enabling you to transform a diverse range of raw data into formats that Estimators can use, allowing easy experimentation.

Every neuron in a neural network performs multiplication and addition operations on weights and input data. Real-life input data often contains non-numerical (categorical) data. For example, consider a fuel-type feature that can contain the following two non-numerical values:

  • gas
  • diesel

ML models generally represent categorical values as simple vectors in which a 1 represents the presence of a value and a 0 represents the absence of a value. For example, when fuel_type is set to diesel, an ML model would usually represent fuel_type as [1,0], meaning:

  • 0: gas is absent
  • 1: diesel is present

So, although raw data can be numerical or categorical, an ML model represents all features as numbers.

Numeric column

The price predictor calls the tf.feature_column.numeric_column function for numeric input features:
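For example (the column names here are illustrative picks from the cars dataset):

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# Numeric features pass through unchanged; one column per raw numeric input.
curb_weight = tf.feature_column.numeric_column("curb-weight")
highway_mpg = tf.feature_column.numeric_column("highway-mpg")
```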

Categorical column

We cannot input strings directly to a model. Instead, we must first map strings to numeric or categorical values. Categorical vocabulary columns provide a good way to represent strings as a one-hot vector.
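For the fuel-type feature above, that looks like:

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# Map the known string values to a one-hot representation.
fuel_type = tf.feature_column.categorical_column_with_vocabulary_list(
    key="fuel-type", vocabulary_list=["gas", "diesel"])
```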

Hashed Column

Sometimes the number of categories can be so big that it’s not possible to have an individual category for each vocabulary word or integer, because that would consume too much memory. For these cases, we can instead turn the question around and ask, “How many categories am I willing to have for my input?” In fact, the tf.feature_column.categorical_column_with_hash_bucket function enables you to specify the number of categories.
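A sketch ("make" is an illustrative column name; the bucket count is an assumption you tune to your data):

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# Hash each string into one of hash_bucket_size categories instead of
# keeping a full vocabulary.
make = tf.feature_column.categorical_column_with_hash_bucket(
    key="make", hash_bucket_size=50)
```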

Create Input Functions

I still have to give it some input data. TensorFlow has off-the-shelf input pipelines for most formats, or for many formats. In particular, in this example I’m using input from pandas, so I’m going to read input from a pandas data frame.

What I’m telling it here is that I want to use batches of 64, so each iteration of the algorithm will use 64 input data pieces. I’m going to shuffle the input, which is always a good thing to do when you’re training; please always shuffle the input. num_epochs=None means to cycle through the data indefinitely: if you’re done with the data, just go through it again.

Instantiate an Estimator

We specify what kind of machine learning algorithm we want to apply to predict the car price. In my case, I’m going to start with linear regression, which is about the simplest way to learn something, and all I have to do is tell it: you’re going to use these input features that I’ve just declared.
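That instantiation is a one-liner (just one illustrative numeric column here; pass the full list you declared):

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# The feature columns declared earlier; one illustrative numeric column.
feature_columns = [tf.feature_column.numeric_column("curb-weight")]

model = tf.estimator.LinearRegressor(feature_columns=feature_columns)
```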

Train, Evaluate, and Predict

Now that we have an Estimator object, we can call methods to do the following:

  • Train the model.
  • Evaluate the trained model.
  • Use the trained model to make predictions.

Train the model

Train the model by calling the Estimator’s train method as follows:

The steps argument tells the method to stop training after a number of training steps.

Evaluate the trained model

Now that the model has been trained, we can get some statistics on its performance. The following code block evaluates the accuracy of the trained model on the test data:

Unlike our call to the train method, we did not pass the steps argument to evaluate. Our eval_input_fn only yields a single epoch of data.

Making predictions (inferring) from the trained model

We now have a trained model that produces good evaluation results. We can now use the trained model to predict the price of a car based on some unlabeled measurements. As with training and evaluation, we make predictions using a single function call:

Deep Neural Network

We obviously have to change the name of the class that we’re using. Then we’ll also have to adapt the inputs to something that this new model can use. In this case, a DNN model can’t use categorical features directly; we have to do something to them. The two things you can typically do to a categorical feature to make it work with a deep neural network are to embed it, or to transform it into what’s called a one-hot, or indicator, column. So we do this by simply saying: make me an embedding, and make the cylinders an indicator column, because there are not so many values there. Usually this is fairly complicated stuff and you would have to write a lot of code.
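With feature columns, both adaptations are one-liners (the base column names, vocabulary, and embedding dimension here are illustrative):

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# Categorical base columns (illustrative names for the cars dataset).
make = tf.feature_column.categorical_column_with_hash_bucket(
    "make", hash_bucket_size=50)
cylinders = tf.feature_column.categorical_column_with_vocabulary_list(
    "num-of-cylinders", ["two", "three", "four", "five", "six", "eight"])

# A DNN can't consume categorical columns directly, so either embed them...
make_emb = tf.feature_column.embedding_column(make, dimension=8)
# ...or one-hot them when there are only a few distinct values.
cylinders_ind = tf.feature_column.indicator_column(cylinders)
```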

Also, most of these more complicated models have hyperparameters. In this case, for the DNN, we basically tell it: make me a three-layer neural network with layer sizes 50, 30, and 10. That’s really all you need; this is a very high-level interface.
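That instantiation looks like this (the feature columns are illustrative; pass the embedded/indicator columns built above):

```python
import tensorflow as tf  # assumes TensorFlow 1.x

feature_columns = [
    tf.feature_column.numeric_column("curb-weight"),
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "fuel-type", ["gas", "diesel"])),
]

# Three hidden layers of sizes 50, 30, and 10.
model = tf.estimator.DNNRegressor(
    feature_columns=feature_columns, hidden_units=[50, 30, 10])
```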


TensorFlow ships implementations of complete machine learning models, and you can get started with them extremely quickly. They come with all of the integrations: TensorBoard visualization, serving in production, different hardware, different use cases. They obviously work in distributed settings; we use them in data centers. You can use them on your home computer network if that’s what you’d like, or in flocks of mobile devices. Everything is possible. They run on all kinds of different hardware: they will run on TPU, and they also always run on GPU and CPU.

Download this project from GitHub
