Applying machine learning is a highly empirical and intuitive process in which you just have to train a lot of models to find one that works really well. One thing that makes it more difficult is that deep learning works best with big data, and training on large datasets is slow.

Steps Per Epoch

steps_per_epoch is the number of batches of samples to train on in one epoch. It is used to declare one epoch finished and start the next. It is useful if you have a huge dataset or if you are generating random data augmentations on the fly, i.e. an effectively infinite dataset. If you have a training set of fixed size, you can ignore it.
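For example, here is a minimal sketch of an effectively infinite input pipeline (x_train, y_train, and model are assumed to already exist as NumPy image arrays and a compiled Keras model; the augment function is just a placeholder for whatever random augmentation you apply):

import tensorflow as tf

BATCH_SIZE = 32

# Hypothetical on-the-fly augmentation applied to every image.
def augment(image, label):
    image = tf.image.random_flip_left_right(image)
    return image, label

train_batches = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
                 .shuffle(10000)
                 .map(augment)
                 .batch(BATCH_SIZE)
                 .repeat())  # repeat() makes the stream effectively infinite

# Without steps_per_epoch, Keras would never reach the end of this dataset,
# so we tell it how many batches make up one epoch.
model.fit(train_batches,
          epochs=5,
          steps_per_epoch=len(x_train) // BATCH_SIZE)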

Validation Steps

validation_steps is similar to steps_per_epoch, but it applies to the validation data instead of the training data. If you have a validation dataset of fixed size, you can ignore it.

It is only relevant if validation_data is provided and is a tf.data dataset object. If validation_steps is specified and only part of the dataset will be consumed, the evaluation will start from the beginning of the dataset at each epoch. This ensures that the same validation samples are used every time.
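As a rough sketch (reusing the hypothetical train_batches from above, with val_images and val_labels as assumed NumPy arrays), only the first validation_steps batches are evaluated, and they are the same batches every epoch:

import tensorflow as tf

BATCH_SIZE = 32

val_batches = (tf.data.Dataset.from_tensor_slices((val_images, val_labels))
               .batch(BATCH_SIZE))

# Evaluate only the first 100 validation batches; evaluation restarts from the
# beginning of val_batches each epoch, so the samples are identical every time.
model.fit(train_batches,
          epochs=5,
          steps_per_epoch=1000,
          validation_data=val_batches,
          validation_steps=100)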

Calculate steps_per_epoch and validation_steps

By default, both parameters are None, which means the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined.

If the input data is a tf.data dataset object and steps_per_epoch is None, the epoch will run until the input dataset is exhausted. This argument is not supported with array inputs.
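For example (train_batches, x_train, y_train, and model are assumed to already exist), a finite, non-repeated tf.data dataset needs no steps_per_epoch at all:

# Finite tf.data.Dataset with no .repeat(): each epoch runs until it is exhausted.
model.fit(train_batches, epochs=5)

# NumPy array inputs do not support steps_per_epoch; Keras derives the number
# of steps from the array length and batch_size.
model.fit(x_train, y_train, epochs=5, batch_size=32)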

If you want your model to pass through all of your training data exactly once in each epoch, you should set steps_per_epoch equal to the number of batches, like this:

import math

BATCH_SIZE = 32
TRAINING_SIZE = 40000
VALIDATION_SIZE = 10000

# We take the ceiling because we do not drop the remainder of the batch
compute_steps_per_epoch = lambda x: int(math.ceil(1. * x / BATCH_SIZE))

steps_per_epoch = compute_steps_per_epoch(TRAINING_SIZE)
val_steps = compute_steps_per_epoch(VALIDATION_SIZE)

history = model.fit(x=train_batches,
                    epochs=5,
                    steps_per_epoch=steps_per_epoch,
                    callbacks=[model_checkpoint],
                    validation_data=val_batches,
                    validation_steps=val_steps,
                    shuffle=True)  # shuffle is ignored for tf.data inputs

As the equation above shows, the larger the batch_size, the lower the steps_per_epoch.
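For example, with the 40,000 training samples above, a batch_size of 32 gives ceil(40000 / 32) = 1250 steps per epoch, while a batch_size of 128 gives ceil(40000 / 128) = 313 steps per epoch.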

Validation Split

validation_split is the fraction of the training data to be used as validation data. It is a float between 0 and 1. The model will evaluate the loss and any model metrics on this data at the end of each epoch.

When you set validation_split, the validation data is always fixed and taken from the end (the bottom) of the training dataset. The split is designed so that the model trains on the same portion of the data in every epoch: the data is split before any shuffling, and shuffling is applied only to the remaining training portion. The model will not train on the held-out fraction.
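A minimal sketch of how it is used (x_train, y_train, and model are assumed to be existing NumPy arrays and a compiled Keras model):

# The last 20% of x_train/y_train is held out for validation; the split happens
# before shuffling, so the same samples are held out every epoch.
history = model.fit(x_train, y_train,
                    batch_size=32,
                    epochs=5,
                    validation_split=0.2,
                    shuffle=True)  # shuffling applies only to the remaining 80%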

Problem with Validation Split

However, for some datasets the last instances are not representative, particularly if the dataset is grouped by class; in that case the class distribution of your validation data will be skewed. For this reason, I always like to use the sklearn function train_test_split, which is a better method because it draws the test/validation data randomly from the dataset.

from sklearn.model_selection import train_test_split

# Randomly hold out 20% of the data for validation.
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=42)
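The randomly held-out arrays can then be passed to model.fit through validation_data instead of validation_split, for example:

history = model.fit(X_train, Y_train,
                    batch_size=32,
                    epochs=5,
                    validation_data=(X_val, Y_val))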

The validation_split argument is not supported when x is a dataset, generator, or keras.utils.Sequence instance.
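If your input is a tf.data dataset, a common workaround is to carve the validation set out of the dataset yourself. The sketch below assumes an already batched dataset named full_dataset (shuffle it once, before the split, so the two subsets do not overlap):

VAL_BATCHES = 100

# Reserve the first 100 batches for validation and train on the rest.
val_batches = full_dataset.take(VAL_BATCHES)
train_batches = full_dataset.skip(VAL_BATCHES)

history = model.fit(train_batches,
                    epochs=5,
                    validation_data=val_batches)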

Related Post

Keras Early Stopping Monitor Options (Validation vs Training Loss)