In this post, I’ll show you how you can convert the dataset into a TFRecord file so you can fine-tune the model.
Before you run the training script for the first time, you will need to convert the image data to the native TFRecord format. The TFRecord format consists of a set of sharded files where each entry (image) is a serialized tf.Example proto. Each tf.Example proto contains the image as well as metadata such as label and bounding box information.
The TFRecord file format is a simple record-oriented binary format that many TensorFlow applications use for training data. It is the default file format for TensorFlow.
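To make the record layout concrete, here is a minimal sketch of packing one image and its label into a tf.Example, writing it to a TFRecord file, and reading it back. It assumes TensorFlow 2.x with eager execution. The feature keys image/encoded and image/class/label, the helper names, and the placeholder bytes are illustrative assumptions, not something this post prescribes.

```python
import os
import tempfile

import tensorflow as tf


def make_example(image_bytes, label):
    """Pack encoded image bytes and an integer label into a tf.train.Example."""
    feature = {
        # Feature keys are assumptions chosen for illustration.
        "image/encoded": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        "image/class/label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def write_and_read_back(image_bytes, label):
    """Write one record to a temporary TFRecord file, then parse it again."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "sample.tfrecord")
        with tf.io.TFRecordWriter(path) as writer:
            writer.write(make_example(image_bytes, label).SerializeToString())

        # Each element of the dataset is one serialized tf.Example.
        records = [r.numpy() for r in tf.data.TFRecordDataset(path)]

    parsed = tf.train.Example.FromString(records[0])
    features = parsed.features.feature
    return (features["image/encoded"].bytes_list.value[0],
            features["image/class/label"].int64_list.value[0])


# Placeholder bytes stand in for the contents of a real JPEG file.
restored_bytes, restored_label = write_and_read_back(b"fake-jpeg-bytes", 1)
```

With a real dataset, image_bytes would be the raw contents of an image file read in binary mode, and the label would be the integer assigned to that image's class.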
Advantages Of Binary Format
Binary files are sometimes easier to use because you don’t have to specify different directories for images and annotations. When you store your data in a binary file, it sits in one contiguous block instead of being spread across a separate file for each image and annotation. Opening a file is a time-consuming operation, especially on an HDD, so reading from a few large files is cheaper than opening thousands of small ones. Overall, binary files make the data easier to distribute and better aligned for efficient reading. TensorFlow also provides its own functions to shuffle, batch and split datasets stored in this format. Most batch operations are not run directly on image files; instead, the images are first converted into a TFRecord file.
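As a rough illustration of the shuffle and batch functions mentioned above, the sketch below writes ten toy records and reads them back through the tf.data API. It assumes TensorFlow 2.x. The file name, the single "label" feature, the buffer size and the batch size are arbitrary choices made for the example.

```python
import os
import tempfile

import tensorflow as tf

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "toy.tfrecord")

# Write ten toy records; each holds only an integer label.
with tf.io.TFRecordWriter(path) as writer:
    for label in range(10):
        example = tf.train.Example(features=tf.train.Features(feature={
            "label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[label])),
        }))
        writer.write(example.SerializeToString())


def parse(serialized):
    """Decode one serialized tf.Example into its integer label."""
    spec = {"label": tf.io.FixedLenFeature([], tf.int64)}
    return tf.io.parse_single_example(serialized, spec)["label"]


# Read the file, decode every record, shuffle, and group into batches of 4.
dataset = (tf.data.TFRecordDataset(path)
           .map(parse)
           .shuffle(buffer_size=10, seed=0)
           .batch(4))

batches = [batch.numpy().tolist() for batch in dataset]
```

Ten records with a batch size of 4 yield two full batches and one final batch of two labels. The order inside the batches depends on the shuffle.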
Convert images into a TFRecord
Before you start any training, you’ll need a set of images to teach the model about the new classes you want to recognize. When you work with an image dataset, the first thing to do is split it into training and validation sets. Here’s an example, which assumes you have a folder containing class-named subfolders, each full of images for that label. The example folder animal_photos should have a structure like this:
~/animal_photos/dog/photo1.jpg
~/animal_photos/dog/photo2.jpg
...
~/animal_photos/cat/anotherphoto77.jpg
...
~/animal_photos/cat/somepicture.jpg
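The train/validation split mentioned before the listing could be sketched like this. It is only one possible approach: the function name split_by_class, the 20% validation fraction and the fixed seed are assumptions made for illustration, and the split is done per class so that each label appears in both sets.

```python
import os
import random


def split_by_class(image_root, validation_fraction=0.2, seed=0):
    """Split the files under each class subfolder into train and validation lists."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for class_name in sorted(os.listdir(image_root)):
        class_dir = os.path.join(image_root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)
        n_val = int(len(files) * validation_fraction)
        paths = [os.path.join(class_dir, name) for name in files]
        validation.extend(paths[:n_val])
        train.extend(paths[n_val:])
    return train, validation
```

Calling split_by_class(os.path.expanduser("~/animal_photos")) would return two lists of file paths, which you could then copy or link into separate training and validation folders.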
The subfolder names are important since they define what label is applied to each image: the label is taken from the name of the subfolder the image is in. The filenames themselves don’t matter. The list of valid labels is held in a labels file. The code assumes that the file contains entries such as:
dog
cat
where each line corresponds to a label. The script maps each label contained in the file to an integer corresponding to its line number, starting from 0.
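The label mapping described here can be sketched in a few lines. This is not the script’s actual code: the helper names read_labels and find_image_files are hypothetical, and skipping blank lines in the labels file is an assumption.

```python
import os


def read_labels(labels_file):
    """Map each label to its zero-based position in the labels file."""
    with open(labels_file) as f:
        # Assumption: blank lines are ignored rather than treated as labels.
        labels = [line.strip() for line in f if line.strip()]
    return {label: index for index, label in enumerate(labels)}


def find_image_files(data_dir, label_to_index):
    """Return (path, label_text, label_index) for every file in each label's subfolder."""
    entries = []
    for label, index in label_to_index.items():
        class_dir = os.path.join(data_dir, label)
        for name in sorted(os.listdir(class_dir)):
            entries.append((os.path.join(class_dir, name), label, index))
    return entries
```

With the labels file shown above, read_labels would map dog to 0 and cat to 1, and every image under the dog subfolder would be paired with the integer 0.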
Code Organization
The code for this tutorial resides in data/build_image_data.py. Before running it, change four settings: train_directory, the path that contains the training image data; validation_directory, the path that contains the validation image data; output_directory, where the TFRecord files are written when you run the Python script; and labels_file, the file that holds the list of valid labels. This TensorFlow script converts the training and evaluation data into a sharded data set consisting of TFRecord files:
train_directory/train-00000-of-01024
train_directory/train-00001-of-01024
...
train_directory/train-01023-of-01024

and

validation_directory/validation-00000-of-00128
validation_directory/validation-00001-of-00128
...
validation_directory/validation-00127-of-00128
where we have selected 1024 shards for the training set and 128 shards for the validation set. Each record within a TFRecord file is a serialized Example proto.
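The shard naming pattern and the way files are divided among shards can be sketched as follows. The helper names are hypothetical, and the even, contiguous split is an assumption about one reasonable way to distribute the files, not a description of what the script does internally.

```python
def shard_filename(name, shard, num_shards):
    """Build a shard name such as train-00000-of-01024."""
    return "%s-%05d-of-%05d" % (name, shard, num_shards)


def shard_ranges(num_files, num_shards):
    """Split indices 0..num_files-1 into num_shards contiguous, near-equal ranges."""
    bounds = [num_files * i // num_shards for i in range(num_shards + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_shards)]


first = shard_filename("train", 0, 1024)          # "train-00000-of-01024"
last = shard_filename("validation", 127, 128)     # "validation-00127-of-00128"
ranges = shard_ranges(10, 3)                      # [(0, 3), (3, 6), (6, 10)]
```

Each (start, end) pair gives the slice of the file list that would go into one shard, so shard i would hold files[start:end] for the i-th range.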