Data loading is one of the first steps in building a Deep Learning pipeline or training a model. In this post, we will learn how to iterate over the DataLoader using the iter() and next() functions to get a batch of data from the DataLoader.

To retrieve the next value from an iterator, we can use the next() function. We cannot call next() directly on a DataLoader; we first need to turn the DataLoader into an iterator and then call next() on that. To make the DataLoader iterable in this way, we pass it as the argument to the built-in iter() function.

The DataLoader is an iterable that steps through all our available data and returns it in the form of batches. For example, if we have a dataset of 32 images and we batch the data with a size of 4, the DataLoader processes the data and returns 8 batches of 4 images each.
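The arithmetic above can be sketched with a small stand-in dataset (random tensors rather than the image data used later in this post): 32 samples with batch_size=4 yield 8 batches.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 32 toy samples, each with 3 features, plus a binary label per sample.
dataset = TensorDataset(torch.randn(32, 3), torch.randint(0, 2, (32,)))
loader = DataLoader(dataset, batch_size=4)

print(len(dataset))  # 32 samples
print(len(loader))   # 8 batches
for batch_inputs, batch_labels in loader:
    print(batch_inputs.shape)  # torch.Size([4, 3])
    break
```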

The Dataset class is an abstract class representing the dataset. It allows us to treat the dataset as an object of a class, rather than a loose set of data and labels. A Dataset returns an [input, label] pair every time it is indexed.
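A minimal sketch of a custom Dataset subclass (the data here is made up for illustration): __getitem__ returns an [input, label] pair, and __len__ reports the dataset size.

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    def __init__(self, n=8):
        self.inputs = torch.arange(n, dtype=torch.float32)
        self.labels = (self.inputs % 2).long()

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # Each call returns one (input, label) pair.
        return self.inputs[idx], self.labels[idx]

ds = ToyDataset()
x, y = ds[3]
print(x.item(), y.item())  # 3.0 1
```

A DataLoader wraps any such Dataset and handles the batching for us.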

!wget https://download.pytorch.org/tutorial/hymenoptera_data.zip
!unzip -q hymenoptera_data.zip

import os
import torch
from torchvision import datasets, transforms

data_dir = '/content/hymenoptera_data/'
data_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
image_dataset = datasets.ImageFolder(os.path.join(data_dir, 'train'), transform=data_transforms)
dataloader = torch.utils.data.DataLoader(image_dataset, batch_size=4, shuffle=True, num_workers=2)

To access an individual batch from the DataLoader, we first pass the DataLoader object to Python’s iter() built-in function, which returns an object representing a stream of data.

An iterator object allows you to traverse all the elements of a DataLoader, regardless of its specific implementation. An iterator is an object representing a stream of data. You can create an iterator object by applying the iter() built-in function to an iterable.

iterator = iter(dataloader)

With the stream of data, we can use Python's built-in next() function to get the next data element in the stream. From this, we expect to get back a batch of samples.

inputs, classes = next(iterator)
print(len(inputs))  # 4

This lets us get the next element in a sequence without keeping the entire dataset in memory.

You can use an iterator to manually loop over the iterable it came from: repeatedly passing the iterator to the built-in next() function returns successive items in the stream. Once you consume an item from an iterator, it is gone. When no more data is available, a StopIteration exception is raised.
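A sketch of that exhaustion behavior, using a tiny in-memory dataset rather than the image data above: after the last batch is consumed, next() raises StopIteration.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 8 samples batched by 4 -> the iterator yields exactly 2 batches.
loader = DataLoader(TensorDataset(torch.randn(8, 2)), batch_size=4)
it = iter(loader)

next(it)  # first batch of 4
next(it)  # second (and last) batch of 4
try:
    next(it)
except StopIteration:
    print("DataLoader exhausted")
```

This is the same mechanism a for loop uses under the hood: it calls next() until StopIteration is raised, then stops silently.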
