What can we do if an image contains multiple types of objects? Let’s understand the concept of multi-label image classification with an example. Check out the image below:
How many objects did you identify? There are many: a zoo, animals, trees, a park, and so on. The image belongs to more than one class, which makes this a multi-label image classification problem. In a multi-label problem, there is no constraint on how many classes an instance can be assigned to.
In this tutorial, you will discover how to develop a convolutional neural network to classify satellite images of the Amazon forest.
In this tutorial, we use imagery from Planet, which operates the world’s largest constellation of Earth-imaging satellites. Considerable research has been devoted to tracking changes in forests. You can download the entire dataset from Kaggle. To do so, you first need a Kaggle account, and then you need to accept the competition rules.
You do not need to download all of the files. The specific files required for this tutorial are as follows:
- train-jpg.tar.7z – files for the training/test set.
- train_v2.csv – a list of training file names and their labels; the labels are space-delimited.
Inspecting the train_v2.csv file, you will see that each row maps an image file name in the training dataset to its class labels, separated by spaces.
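To make the format concrete, here is a minimal sketch that parses rows in the same `image_name,tags` layout with Python’s standard `csv` module. The two rows are made up for illustration and are not copied from the dataset; `parse_tags` is a hypothetical helper.

```python
import csv
import io

# Hypothetical rows in the train_v2.csv layout: a header, then image_name,tags
sample = "image_name,tags\ntrain_a,clear primary\ntrain_b,haze primary road\n"

def parse_tags(text):
    """Return a dict mapping each image name to its list of labels."""
    reader = csv.DictReader(io.StringIO(text))
    # The tags column is space-delimited, so str.split() yields one label per item
    return {row["image_name"]: row["tags"].split() for row in reader}

mapping = parse_tags(sample)
print(mapping)  # {'train_a': ['clear', 'primary'], 'train_b': ['haze', 'primary', 'road']}
```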
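```python
from pandas import read_csv

# Load the mapping from image names to space-delimited tags
image_label_mapping = read_csv('train_v2.csv')
print(image_label_mapping.head(5))

image_name = image_label_mapping['image_name']
labels = image_label_mapping['tags']
TARGET_SIZE = 255  # images are resized to TARGET_SIZE x TARGET_SIZE
```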
The images are small square chips of satellite imagery of the Amazon rainforest in Brazil, taken from space and labeled with 17 classes such as “road”, “primary”, or “clear”.
Next, we prepare satellite photos and labels of the Amazon tropical rainforest for modeling.
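```python
from tensorflow.keras.preprocessing.text import Tokenizer

# filters=' ' stops the default filter list from stripping the underscores
# in tags such as "partly_cloudy" and "slash_burn"
tokenizer = Tokenizer(filters=' ')
tokenizer.fit_on_texts(labels)
label_seq = tokenizer.texts_to_sequences(labels)
label_length = len(tokenizer.word_index) + 1  # +1 for the reserved index 0
print(tokenizer.word_index)
```
Here, we use Keras’ Tokenizer class to turn the tags into integer sequences. Next, we one-hot encode each tag with Keras’ to_categorical function and sum the one-hot rows for each image, which yields a multi-hot vector. The Tokenizer reserves index 0, so we drop that first column, leaving one entry per class. Finally, we split the image paths and labels into training and test sets.
```python
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# One-hot encode each tag, sum the rows into a multi-hot vector,
# and drop column 0 (reserved by the Tokenizer)
labels = [to_categorical(label, num_classes=label_length, dtype='float32').sum(axis=0)[1:]
          for label in label_seq]

img_folder = 'train-png/'  # folder of converted PNG images
image_paths = [img_folder + img + ".png" for img in image_name]

x_train, x_test, y_train, y_test = train_test_split(
    image_paths, labels, test_size=0.2, random_state=1)
```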
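The Keras calls above can be hard to picture, so here is a pure-Python sketch of the same sum-of-one-hot-rows idea. The four-word vocabulary is hypothetical (the real `tokenizer.word_index` has 17 entries, ordered by tag frequency), and `one_hot` and `multi_hot` are illustrative helpers, not Keras functions.

```python
# Hypothetical vocabulary mimicking Tokenizer.word_index: indices start at 1,
# because the Tokenizer reserves index 0.
word_index = {"primary": 1, "clear": 2, "agriculture": 3, "road": 4}
num_classes = len(word_index) + 1  # plus the reserved index 0

def one_hot(index, n):
    """One-hot row of length n with a 1 at position index."""
    row = [0.0] * n
    row[index] = 1.0
    return row

def multi_hot(tags):
    """Sum the one-hot rows for each tag, then drop the unused column 0."""
    seq = [word_index[t] for t in tags.split()]
    summed = [sum(col) for col in zip(*(one_hot(i, num_classes) for i in seq))]
    return summed[1:]

print(multi_hot("clear primary road"))  # [1.0, 1.0, 0.0, 1.0]
```

Because each tag appears at most once per image, every column of the sum is 0 or 1, which is exactly the binary target vector the model needs.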
Multi-label classification is the problem of finding a model that maps inputs x to binary vectors y (assigning a value of 0 or 1 for each label in y).
TensorFlow detects the colorspace incorrectly for this dataset, or the colorspace information encoded in the images is incorrect. TensorFlow does not seem to offer a way to enforce a colorspace when decoding images, so the easiest workaround is probably to “fix” the images by converting them to RGB and re-saving them as PNG files.
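```python
import os
from PIL import Image
from tqdm import tqdm

# Assumes train-jpg.tar.7z has been extracted to train-jpg/
os.makedirs('train-png', exist_ok=True)

# Re-save every training JPEG as an RGB PNG
for filename in tqdm(os.listdir('train-jpg')):
    im = Image.open(os.path.join('train-jpg', filename))
    stem = os.path.splitext(filename)[0]  # 'train_0.jpg' -> 'train_0'
    im.convert('RGB').save(os.path.join('train-png', stem + '.png'), "PNG", optimize=True)
```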
Create a Model
In this tutorial, we will keep things simple and use transfer learning with MobileNet V2. We create the base model from the MobileNet V2 model developed at Google and pre-trained on the ImageNet dataset.
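```python
import tensorflow as tf

IMG_SHAPE = (TARGET_SIZE, TARGET_SIZE, 3)

# Load MobileNetV2 without its ImageNet classification head.
# 255 is not one of the pretrained input sizes, so Keras warns and loads
# the 224x224 weights; the convolutional weights still apply to 255x255 inputs.
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
```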
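As a rough check on what the frozen base outputs, the sketch below estimates the spatial size of MobileNetV2’s final feature map for a 255×255 input. It assumes the network downsamples through five stride-2 stages that each round up (ceil(n / 2)) and that the last layer has 1280 channels, as with the default width multiplier of 1.0. `feature_map_size` is a hypothetical helper; treat the result as an estimate and confirm it with `base_model.output_shape` in tf.keras.

```python
import math

def feature_map_size(n, stride2_stages=5):
    """Spatial size after repeated stride-2 stages that round up."""
    for _ in range(stride2_stages):
        n = math.ceil(n / 2)
    return n

side = feature_map_size(255)
print((side, side, 1280))     # (8, 8, 1280)
print(feature_map_size(224))  # 7
```

Under these assumptions, GlobalAveragePooling2D averages over the 8×8 grid, leaving one 1280-dimensional feature vector per image for the classifier.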
We freeze the convolutional base created in the previous step and use it as a feature extractor, then add a classifier on top of it and train only that top-level classifier.
```python
# Freeze all layers of the convolutional base
base_model.trainable = False
```
Add a classification head
Each image can have more than one label, so the class probabilities must be independent of one another rather than summing to 1 as they would with softmax. We therefore use a sigmoid activation on the output layer, with one unit per class; each unit predicts the probability of its own class independently.
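A small pure-Python comparison, using made-up logits, shows why sigmoid suits this task: each sigmoid output depends only on its own logit, so several classes can exceed 0.5 at once, whereas softmax makes the classes compete for a total probability of 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, -3.0]  # hypothetical scores for three classes
independent = [sigmoid(z) for z in logits]
competing = softmax(logits)

print([round(p, 3) for p in independent])  # [0.881, 0.818, 0.047]: two classes above 0.5
print(round(sum(competing), 6))            # 1.0
```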
```python
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    # One sigmoid unit per class (17 tags)
    tf.keras.layers.Dense(17, activation='sigmoid')
])
```
Compile the model
Because each of the N labels is predicted independently, the task becomes N separate binary classification problems (one per label), so we use the binary_crossentropy loss.
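For one image with N labels, binary cross-entropy averages the N per-label log losses. The pure-Python sketch below shows that computation on a hypothetical three-label example. Clipping probabilities away from 0 and 1 mirrors what Keras does internally, but the exact constant used here (1e-7) is an assumption of this sketch.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over labels of -[y*log(p) + (1-y)*log(1-p)] for one example."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical example: labels 1 and 3 present, label 2 absent
loss = binary_crossentropy([1, 0, 1], [0.9, 0.2, 0.6])
print(round(loss, 4))  # 0.2798
```

Each label contributes its own term, so a wrong prediction on one tag does not change how the others are scored.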
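```python
# Compile the model built above; learning_rate replaces the deprecated lr.
# With binary_crossentropy, 'accuracy' is computed per label (binary accuracy).
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```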
Train the model
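```python
import numpy as np
import tensorflow as tf

def read_image(path, label):
    # Read and decode a PNG, resize it, and scale pixels to [0, 1]
    # (MobileNetV2's own preprocess_input would scale to [-1, 1] instead)
    img_raw = tf.io.read_file(path)
    image = tf.image.decode_png(img_raw, channels=3)
    img_final = tf.image.resize(image, [TARGET_SIZE, TARGET_SIZE])
    img_final = img_final / 255.0
    return img_final, label

def get_dataset(x, y, batch_size=32):
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    # Shuffle the (path, label) pairs before decoding, so the shuffle
    # buffer holds file names rather than 4000 decoded images
    dataset = dataset.shuffle(buffer_size=4000)
    dataset = dataset.map(read_image)
    dataset = dataset.repeat()
    dataset = dataset.batch(batch_size)
    return dataset

steps_per_epoch = int(len(x_train) / 32)
validation_step = int(len(x_test) / 32)
train_ds = get_dataset(x_train, np.float32(y_train))
test_ds = get_dataset(x_test, np.float32(y_test))
```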
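The fit method uses the steps_per_epoch argument, which is the number of training steps (batches) the model runs before it moves on to the next epoch. Because get_dataset repeats the data indefinitely, steps_per_epoch, together with validation_steps for the validation set, is what marks the end of each epoch.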
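The arithmetic behind these step counts is easy to check by hand. The sketch below uses a hypothetical training-set size of 1000 images: `int()` truncates, so one epoch covers 992 images, and because the dataset repeats, the remaining 8 are simply seen at the start of the next epoch. Rounding up with `math.ceil` is an alternative that makes each epoch cover at least one full pass.

```python
import math

batch_size = 32
num_train = 1000  # hypothetical number of training images

truncated = int(num_train / batch_size)       # as in the original code: 31 full batches
covering = math.ceil(num_train / batch_size)  # at least one full pass per epoch

print(truncated, covering)  # 31 32
```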
```python
history = model.fit(train_ds,
                    epochs=50,
                    steps_per_epoch=steps_per_epoch,
                    validation_steps=validation_step,
                    validation_data=test_ds)
```
We can use our model to make predictions on new images. The model assumes that new images are in color and have been resized to 255×255-pixel squares; read_image handles both by decoding to three channels and resizing.
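```python
import numpy as np

def tags_mapping(one_hot_encoding):
    # Round each probability to 0 or 1 and look up the tag for every 1;
    # index i + 1 is used because the Tokenizer reserves index 0.
    # ravel() also accepts a (1, 17) batch from model.predict.
    values = np.round(np.asarray(one_hot_encoding)).ravel()
    tags = [tokenizer.index_word[i + 1] for i in range(len(values)) if values[i] == 1.0]
    return tags
```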
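Rounding each probability, as tags_mapping does, amounts to keeping the tags whose probability is strictly greater than 0.5. The sketch below makes that threshold an explicit parameter, which lets you trade precision for recall. The `index_word` dictionary and the probabilities are hypothetical stand-ins, and `decode_tags` is an illustrative helper, not part of Keras.

```python
# Hypothetical stand-in for tokenizer.index_word (index 0 is reserved, so
# column i of the output vector corresponds to index i + 1).
index_word = {1: "primary", 2: "clear", 3: "agriculture", 4: "road"}

def decode_tags(probabilities, threshold=0.5):
    """Return the tag for every column whose probability exceeds the threshold."""
    return [index_word[i + 1] for i, p in enumerate(probabilities) if p > threshold]

probs = [0.97, 0.81, 0.35, 0.52]  # hypothetical sigmoid outputs
print(decode_tags(probs))                 # ['primary', 'clear', 'road']
print(decode_tags(probs, threshold=0.6))  # ['primary', 'clear']
```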
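```python
import matplotlib.pyplot as plt

img_id = 0  # index of the training image to inspect
img, label = read_image(x_train[img_id], y_train[img_id])
img = tf.expand_dims(img, axis=0)  # add a batch dimension
prediction = model.predict(img, steps=1)
prediction_tags = tags_mapping(prediction[0])
original_tags = tags_mapping(label)
print('Predicted:', prediction_tags)
print('Original:', original_tags)

image = plt.imread(x_train[img_id])
plt.imshow(image)
plt.show()
```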