If you observe that the model is overfitting, you can increase the dropout rate. Conversely, you should try decreasing the dropout rate if the model underfits the training set.
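To make the knob concrete, here is a minimal pure-Python sketch of inverted dropout (the variant most frameworks use): each unit is zeroed with probability `rate`, and survivors are scaled by `1/(1-rate)` so the expected activation is unchanged. The function and values below are illustrative, not any library's implementation.

```python
import random

def dropout(activations, rate, training=True, seed=None):
    """Inverted dropout: zero each unit with probability `rate`,
    scale survivors by 1/(1-rate) so the expected sum is unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [1.0] * 1000
light = dropout(acts, rate=0.2, seed=0)   # drops roughly 200 units
heavy = dropout(acts, rate=0.5, seed=0)   # drops roughly 500 units
```

Raising the rate zeroes more units per forward pass, which regularizes more aggressively; at inference time (`training=False`) dropout is a no-op.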
All training images are located in a single folder, and the target labels are in a CSV file. It should be possible to supply a list of labels instead of inferring the classes from the directory structure: we have a list of labels whose entries correspond one-to-one to the files in the directory.
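One way to build that mapping is to read the CSV into a filename-to-label dictionary and then look up each file in the image folder. The column names (`filename`, `label`) and file names below are hypothetical; the block writes its own toy CSV so it is self-contained.

```python
import csv
import os
import tempfile

# Hypothetical CSV: one row per image, columns `filename` and `label`.
rows = [("cat_001.png", "cat"), ("dog_001.png", "dog")]
csv_path = os.path.join(tempfile.mkdtemp(), "labels.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "label"])
    writer.writerows(rows)

def load_labels(path):
    """Map each image filename to its target label, instead of
    inferring the class from the directory structure."""
    with open(path, newline="") as f:
        return {r["filename"]: r["label"] for r in csv.DictReader(f)}

labels = load_labels(csv_path)
```

The resulting dictionary can then be zipped against a sorted listing of the image folder to produce (path, label) pairs for a data loader.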
The batch size is the number of samples propagated through the network before the model parameters are updated.
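Since each batch triggers one parameter update, the batch size also determines how many updates an epoch performs. A small sketch of that arithmetic:

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """One update per batch, so an epoch over `num_samples` examples
    performs ceil(num_samples / batch_size) updates (the last batch
    may be partial)."""
    return math.ceil(num_samples / batch_size)

# 1000 samples at batch size 32: 31 full batches plus one batch of 8.
n_updates = updates_per_epoch(1000, 32)
```

Here `n_updates` is 32, because the final 8 leftover samples still form one (smaller) batch.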
The Keras library provides a way to calculate standard metrics when training and evaluating deep learning models. In Keras, metrics are passed as a list to the `metrics` argument during the compile stage; you can track several metrics by listing them, separated by commas.
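For example, a compile call might look like `model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy', 'mae'])`. To show what such metrics actually compute, here is a pure-Python sketch of two of them, gathered in a list the way Keras gathers its metrics (the function names are my own, not the Keras implementations):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the integer labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Several metrics tracked at once, mirroring Keras's metrics list.
metrics = [accuracy, mean_absolute_error]
y_true, y_pred = [0, 1, 2, 1], [0, 1, 1, 1]
results = {m.__name__: m(y_true, y_pred) for m in metrics}
```

Each metric is evaluated on every batch during `fit` and reported per epoch; here `results` holds an accuracy of 0.75 and a mean absolute error of 0.25.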
Saturated neurons can kill gradients: when the input to a sigmoid is very positive or very negative, the local gradient is nearly zero. Sigmoid outputs are also not zero-centered, so we get inefficient gradient updates. The third problem is the exponential function itself, which is a little computationally expensive.
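The saturation problem is easy to see numerically: the sigmoid's derivative is `sigmoid(x) * (1 - sigmoid(x))`, which peaks at 0.25 and collapses toward zero for large inputs of either sign. A small sketch:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); at most 0.25 at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

# At saturation the local gradient is almost zero, so little signal
# flows backward through the neuron.
grad_center = sigmoid_grad(0.0)    # 0.25, the maximum
grad_saturated = sigmoid_grad(10.0)  # on the order of 4.5e-05
```

Note also that `sigmoid(x)` is always positive, which is exactly the not-zero-centered property mentioned above.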
One advantage of using sparse categorical cross-entropy is that it saves both memory and computation, because it uses a single integer for a class rather than a whole one-hot vector.
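The two losses compute the same number; the sparse form simply indexes the predicted probability of the true class instead of multiplying against a one-hot vector. A minimal sketch with made-up probabilities:

```python
import math

probs = [0.1, 0.7, 0.2]  # softmax output over 3 classes (illustrative)

# Categorical cross-entropy: the label is a full one-hot vector.
one_hot = [0, 1, 0]
cce = -sum(t * math.log(p) for t, p in zip(one_hot, probs))

# Sparse categorical cross-entropy: the label is a single integer,
# so we just index the predicted probability of the true class.
label = 1
scce = -math.log(probs[label])
```

Both evaluate to `-log(0.7)`; with thousands of classes, storing one integer per sample instead of a vector of that length is where the savings come from.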
Forcing your network to learn redundant representations might sound inefficient, but in practice it makes the model more robust and prevents overfitting. It also makes your network act as if it were taking the consensus of an ensemble of networks.
As you go deeper in a convolutional neural network, you typically start with larger images (e.g., 32x32x3); the height and width gradually shrink with depth, whereas the number of channels generally increases. You see this general trend in many convolutional neural networks.
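The shrinking spatial dimensions follow directly from the standard convolution output-size formula, `floor((n + 2p - f) / s) + 1`. The sketch below traces a 32x32x3 input through three unpadded 5x5 convolutions; the filter counts are illustrative choices, not from any particular architecture.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolution:
    floor((n + 2*pad - f) / stride) + 1."""
    return (n + 2 * pad - f) // stride + 1

# Height/width shrink layer by layer while the channel count grows.
h = w = 32
shapes = []
for filters in (16, 32, 64):  # illustrative filter counts per layer
    h = conv_output_size(h, f=5)
    w = conv_output_size(w, f=5)
    shapes.append((h, w, filters))
```

Here `shapes` works out to (28, 28, 16), (24, 24, 32), (20, 20, 64): spatial extent down, channels up, exactly the trend described above.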
It is sometimes desirable to use a separate penalty with a different coefficient for each layer of the network.
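As a sketch of what a per-layer penalty means, the weight-decay term becomes a sum over layers of a layer-specific coefficient times that layer's squared weights. The function, weights, and coefficients below are toy values for illustration; in Keras the same effect comes from giving each layer its own `kernel_regularizer`, e.g. `regularizers.l2(1e-2)`.

```python
def l2_penalty(layer_weights, layer_coeffs):
    """Weight-decay term with a separate coefficient per layer:
    sum over layers l of lambda_l * sum(w**2 for w in layer l)."""
    return sum(
        lam * sum(w * w for w in weights)
        for weights, lam in zip(layer_weights, layer_coeffs)
    )

weights = [[0.5, -0.5], [1.0, 2.0]]  # two layers' weights (toy values)
coeffs = [1e-2, 1e-3]                # stronger penalty on the first layer
penalty = l2_penalty(weights, coeffs)
```

This total is added to the data loss, so layers given a larger coefficient are pushed harder toward small weights.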
A macro-average computes the metric independently for each class and then takes the average, treating all classes equally, whereas a micro-average aggregates the contributions of all classes and computes the metric from the pooled counts.