What can we do if an image contains multiple types of objects? Let’s understand the concept of multi-label image classification with an example. Check out the image below:

How many objects did you identify? There are many: a zoo, animals, trees, parks, and so on. The image belongs to more than one class, so this is a multi-label image classification problem. In a multi-label problem, there is no constraint on how many classes an instance can be assigned to.
In this tutorial, you will discover how to develop a convolutional neural network to classify satellite images of the Amazon forest.
Download Dataset
In this tutorial, we use imagery from Planet, which operates the world’s largest constellation of Earth-imaging satellites. Considerable research has already been devoted to tracking changes in forests. You can download the entire dataset from Kaggle. To download it, you first need a Kaggle account, and then you need to accept the competition rules.
You do not need to download all of the files. The specific files required for this tutorial are as follows:
- train-jpg.tar.7z – the images, which this tutorial splits into training and test sets.
- train_v2.csv – a list of training file names and their labels; the labels are space-delimited.
Inspecting the train_v2.csv file, you will see each JPG file name in the training dataset mapped to its class labels, which are separated by spaces.
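from pandas import read_csv

# Load the mapping from image names to space-delimited tags
image_label_mapping = read_csv('train_v2.csv')
image_label_mapping.head(5)
image_name = image_label_mapping['image_name']
labels = image_label_mapping['tags']
# Images are resized to TARGET_SIZE x TARGET_SIZE before entering the model
TARGET_SIZE = 255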

The images are small square tiles of satellite imagery of the Amazon rainforest in Brazil, each labeled with one or more of 17 classes, such as “road”, “primary”, or “clear”.
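To get a feel for the label format, the sketch below parses a few rows laid out like train_v2.csv (an image_name column and a space-delimited tags column) and counts how often each tag occurs. The three sample rows are made up for illustration and are not taken from the dataset, and count_tags is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the train_v2.csv layout (not real dataset entries).
SAMPLE_CSV = """image_name,tags
train_0,haze primary
train_1,agriculture clear primary water
train_2,clear primary
"""

def count_tags(csv_text):
    """Count how many images carry each space-delimited tag."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts.update(row["tags"].split(" "))
    return counts

tag_counts = count_tags(SAMPLE_CSV)
print(tag_counts.most_common())
```

Running the same count on the real file is a quick way to see which tags are common and which are rare before training.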

Prepare Dataset
Next, we prepare satellite photos and labels of the Amazon tropical rainforest for modeling.
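from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

# Split tags on spaces only, so underscores inside tag names are preserved
tokenizer = Tokenizer(filters=' ')
tokenizer.fit_on_texts(labels)
label_seq = tokenizer.texts_to_sequences(labels)
# Token indices start at 1, so reserve one extra slot for the unused index 0
label_length = len(tokenizer.word_index) + 1
print(tokenizer.word_index)
Here, we use Keras’ Tokenizer class to map each tag to an integer index. Next, we one-hot encode every tag with Keras’ to_categorical function and sum the resulting vectors for each image, which gives one multi-hot vector per image. The [1:] slice drops the unused index 0.
labels = [to_categorical(label, num_classes=label_length, dtype='float32').sum(axis=0)[1:] for label in label_seq]
# The PNG copies of the images are written to train-png/ in the conversion step
img_folder = 'train-png/'
image_paths = [img_folder + img + ".png" for img in image_name]
# Hold out 20% of the images for testing
x_train, x_test, y_train, y_test = train_test_split(image_paths, labels, test_size=0.2, random_state=1)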
Multi-label classification is the problem of finding a model that maps inputs x to binary vectors y (assigning a value of 0 or 1 for each label in y).
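The encoding can also be sketched without Keras, which makes the shape of the target vectors easier to see. This is a minimal illustration, not the tutorial’s pipeline: build_vocabulary and multi_hot are hypothetical helpers, the sample tag strings are made up, and the columns are ordered alphabetically here, whereas the Keras Tokenizer orders tags by frequency.

```python
def build_vocabulary(tag_strings):
    """Map each distinct tag to a column index, in sorted order."""
    tags = sorted({tag for s in tag_strings for tag in s.split()})
    return {tag: i for i, tag in enumerate(tags)}

def multi_hot(tag_string, vocabulary):
    """Return a 0/1 vector with a 1 for every tag present in tag_string."""
    vector = [0.0] * len(vocabulary)
    for tag in tag_string.split():
        vector[vocabulary[tag]] = 1.0
    return vector

# Hypothetical tag strings in the same space-delimited format as the CSV.
samples = ["haze primary", "agriculture clear primary water", "clear primary"]
vocab = build_vocabulary(samples)
encoded = [multi_hot(s, vocab) for s in samples]

for tags, vector in zip(samples, encoded):
    print(tags, "->", vector)
```

Each vector has one position per tag in the vocabulary, and any number of positions can be 1 at the same time. That is what separates a multi-label target from a one-hot multi-class target.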
TensorFlow detects the colorspace incorrectly for this dataset, or the colorspace information encoded in the images is incorrect. TensorFlow does not seem to let us enforce a colorspace while decoding images, so probably the easiest way is to “fix” the images by re-saving them as RGB PNG files.
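from os import listdir
from PIL import Image
from tqdm import tqdm
for filename in tqdm(listdir('train-jpg')):  # the train-png folder must already exist
    im = Image.open('train-jpg/' + filename)
    im.convert('RGB').save('train-png/' + filename.split('.')[0] + '.png', "PNG", optimize=True)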
Create a Model
In this tutorial, we will keep things simple and use transfer learning with MobileNet V2. We will create the base model from the MobileNet V2 model developed at Google and pre-trained on the ImageNet dataset.
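import tensorflow as tf

IMG_SHAPE = (TARGET_SIZE, TARGET_SIZE, 3)
# include_top=False drops the ImageNet classifier so we can attach our own head
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')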
We will freeze the convolutional base created from the previous step and use that as a feature extractor, add a classifier on top of it, and train the top-level classifier.
base_model.trainable = False
Add a classification head
A single image can have more than one label, so the predicted probabilities must be independent of each other. We therefore use the sigmoid activation function on an output layer with one unit per class, which predicts the probability of each class independently.
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(17, activation='sigmoid')
])
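To see why sigmoid suits a multi-label head, the sketch below applies sigmoid and softmax to the same raw scores in plain Python. The logits are made-up values, not outputs of the model above.

```python
import math

def sigmoid(logits):
    """Squash each score independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax(logits):
    """Turn scores into a distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes; the first two are both strongly present.
logits = [2.0, 2.0, -3.0]
sig = sigmoid(logits)
soft = softmax(logits)

print("sigmoid:", [round(p, 3) for p in sig])
print("softmax:", [round(p, 3) for p in soft])
```

With sigmoid, both of the first two classes can score above 0.5 at once. With softmax, the classes compete for a total probability of 1, so two equally strong classes each end up below 0.5. This is why softmax fits single-label problems and sigmoid fits multi-label ones.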
Compile the model
We treat the task as N independent binary classification problems, one per class, so we will use the binary_crossentropy loss.
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
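For intuition, binary cross-entropy for one image can be worked out by hand: each label contributes -(y·log(p) + (1-y)·log(1-p)), and the contributions are averaged over the labels. The function below is a plain-Python sketch of that formula, not the Keras implementation. The clipping value eps is an assumption chosen to avoid log(0), and the example vectors are made up.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over labels of -(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from 0 and 1
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical target with three labels, and two candidate predictions.
y_true = [1.0, 0.0, 1.0]
confident_right = [0.9, 0.1, 0.8]
confident_wrong = [0.1, 0.9, 0.2]

loss_right = binary_crossentropy(y_true, confident_right)
loss_wrong = binary_crossentropy(y_true, confident_wrong)
print(loss_right, loss_wrong)
```

Each label is scored separately, so a correct prediction for one tag is rewarded regardless of how the other tags are predicted.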
Train the model
Use the Datasets API to scale to large datasets. Pass a tf.data.Dataset instance to the fit method:
import numpy as np

def read_image(path, label):
    # Load a PNG from disk, resize it, and scale pixel values to [0, 1]
    img_raw = tf.io.read_file(path)
    image = tf.image.decode_png(img_raw, channels=3)
    img_final = tf.image.resize(image, [TARGET_SIZE, TARGET_SIZE])
    img_final = img_final / 255.0
    return img_final, label

def get_dataset(x, y, batch_size=32):
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    dataset = dataset.map(read_image)
    dataset = dataset.shuffle(buffer_size=4000)
    # Repeat indefinitely; fit() relies on steps_per_epoch to delimit epochs
    dataset = dataset.repeat()
    dataset = dataset.batch(batch_size)
    return dataset

# Number of full batches of 32 in each split
steps_per_epoch = len(x_train) // 32
validation_step = len(x_test) // 32
train_ds = get_dataset(x_train, np.float32(y_train))
test_ds = get_dataset(x_test, np.float32(y_test))
The fit method uses the steps_per_epoch argument, which is the number of training steps the model runs before it moves to the next epoch.
history = model.fit(train_ds,
                    epochs=50,
                    steps_per_epoch=steps_per_epoch,
                    validation_steps=validation_step,
                    validation_data=test_ds)
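Because the datasets repeat indefinitely, fit cannot tell on its own where an epoch ends, so the step counts have to be computed from the dataset sizes. The arithmetic is sketched below with made-up sizes; steps_for is a hypothetical helper and the sample counts are not the sizes of this dataset.

```python
def steps_for(num_samples, batch_size):
    """Number of full batches that fit into one pass over the data."""
    return num_samples // batch_size

# Hypothetical split sizes, with the tutorial's batch size of 32.
num_train, num_val, batch_size = 800, 200, 32

train_steps = steps_for(num_train, batch_size)
val_steps = steps_for(num_val, batch_size)
leftover_val = num_val - val_steps * batch_size

print(train_steps, val_steps, leftover_val)
```

With integer division, samples that do not fill a complete batch are left out of a given pass. Since the dataset is shuffled and repeated, they still show up in later epochs.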
Predict Image
We can use our model to make predictions on new images. The model expects color images that are square with a size of 255×255; the read_image function resizes its input accordingly.
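import matplotlib.pyplot as plt

def tags_mapping(one_hot_encoding):
    # Round each probability to 0 or 1 and look up the tag for every 1
    values = one_hot_encoding.round()
    tags = [tokenizer.index_word[i + 1] for i in range(len(values)) if values[i] == 1.0]
    return tags

img_id = 0  # example index; any valid index into x_train works
img, label = read_image(x_train[img_id], y_train[img_id])
img = tf.expand_dims(img, axis=0)  # add a batch dimension
prediction = model.predict(img, steps=1)[0]
prediction_tags = tags_mapping(prediction)
original_tags = tags_mapping(label)
print('Predicted:', prediction_tags)
print('Original:', original_tags)
image = plt.imread(x_train[img_id])
plt.imshow(image)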
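Rounding the probabilities is roughly the same as applying a fixed threshold of 0.5. As a sketch of a more flexible variant, the helper below takes the threshold as a parameter. The function name, tag list, and probabilities are made up for illustration and do not come from the trained model.

```python
def tags_above_threshold(probabilities, index_to_tag, threshold=0.5):
    """Return the tags whose predicted probability reaches the threshold."""
    return [index_to_tag[i] for i, p in enumerate(probabilities) if p >= threshold]

# Hypothetical tag order and per-tag probabilities for one image.
index_to_tag = ["primary", "clear", "road", "water"]
probs = [0.97, 0.81, 0.42, 0.08]

default_tags = tags_above_threshold(probs, index_to_tag)
lenient_tags = tags_above_threshold(probs, index_to_tag, threshold=0.4)

print(default_tags)
print(lenient_tags)
```

Lowering the threshold makes the model report more tags, at the cost of more false positives. A suitable value would normally be picked on held-out data.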
