Data loading is one of the first steps in building a deep learning pipeline or training a model. In this post, we will learn how to iterate over a DataLoader using the iter() and next() functions to get a batch of data from it.
To retrieve the next value from an iterator, we use the built-in next() function. We cannot call next() directly on a DataLoader, because a DataLoader is an iterable, not an iterator. We first pass the DataLoader to iter(), which returns an iterator, and then call next() on that iterator.
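The difference between an iterable and an iterator can be seen without PyTorch. The sketch below uses a hypothetical `ToyLoader` class (not part of any library) that, like a DataLoader, defines `__iter__` but not `__next__`. It is only meant to illustrate why next() needs iter() first.

```python
class ToyLoader:
    """Hypothetical stand-in for a DataLoader: iterable, but not an iterator."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        # Each call returns a fresh iterator over the stored items.
        return iter(self.items)


loader = ToyLoader(["batch_0", "batch_1"])

# Calling next() on the iterable itself raises a TypeError.
try:
    next(loader)
    direct_call_worked = True
except TypeError:
    direct_call_worked = False

# Wrapping the iterable with iter() gives an object that next() accepts.
iterator = iter(loader)
first = next(iterator)
```

Here `direct_call_worked` ends up False, while `first` holds the first item of the stream.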
The DataLoader is a class that iterates over a dataset and returns its samples in batches. For example, if we have a dataset of 32 images and choose a batch size of 4, the DataLoader returns 8 batches of 4 images each.
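The batch arithmetic can be checked with a small pure-Python sketch. The `make_batches` helper and the image names are hypothetical and only mimic the grouping step. A real DataLoader also handles shuffling, collation into tensors, and worker processes.

```python
def make_batches(samples, batch_size):
    """Yield consecutive slices of `samples`, each at most `batch_size` long."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# 32 placeholder "images" grouped into batches of 4.
images = [f"image_{i}" for i in range(32)]
batches = list(make_batches(images, batch_size=4))

num_batches = len(batches)
batch_sizes = {len(batch) for batch in batches}
```

When the dataset size is not a multiple of the batch size, the last slice is shorter than the rest. PyTorch's DataLoader exposes a `drop_last` argument to discard such an incomplete batch.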
The Dataset class is an abstract class representing the dataset. It lets us treat the dataset as a single object rather than as separate collections of data and labels. For a labeled dataset such as the one used below, indexing the Dataset returns an (input, label) pair.
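As a rough illustration of that idea, here is a minimal sketch of a dataset object written in plain Python. `ToyDataset` and the file names are hypothetical. The sketch follows the same `__len__` / `__getitem__` protocol that map-style PyTorch datasets use, but it does not subclass `torch.utils.data.Dataset` and loads no real images.

```python
class ToyDataset:
    """Hypothetical dataset that pairs each input with its label."""

    def __init__(self, inputs, labels):
        if len(inputs) != len(labels):
            raise ValueError("inputs and labels must have the same length")
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.inputs)

    def __getitem__(self, index):
        # Indexing returns one (input, label) pair.
        return self.inputs[index], self.labels[index]


dataset = ToyDataset(["ant_01.jpg", "bee_01.jpg"], [0, 1])
sample, label = dataset[1]
```

A loader only needs these two methods to know how many samples exist and how to fetch any one of them by index.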
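    # Notebook shell commands (e.g. in Colab): download and unpack the dataset.
    # The unzip step assumes the working directory is /content.
    !wget https://download.pytorch.org/tutorial/hymenoptera_data.zip
    !unzip -q hymenoptera_data.zip

    import os
    import torch
    from torchvision import datasets, transforms

    data_dir = '/content/hymenoptera_data/'

    # Augment and normalize the training images.
    data_transforms = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    # ImageFolder infers the class labels from the subdirectory names.
    image_dataset = datasets.ImageFolder(os.path.join(data_dir, 'train'),
                                         transform=data_transforms)
    dataloader = torch.utils.data.DataLoader(image_dataset, batch_size=4,
                                             shuffle=True, num_workers=2)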
To access an individual batch from the DataLoader, we first pass the DataLoader object to Python’s iter() built-in function, which returns an object representing a stream of data.
An iterator object lets you traverse all the elements of a DataLoader, regardless of its specific implementation. An iterator is an object representing a stream of data, and you create one by applying the built-in iter() function to an iterable.
With the stream of data, we can use Python's built-in next() function to get the next element in the stream. Here, we expect that element to be a batch of samples.
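    iterator = iter(dataloader)        # turn the DataLoader into an iterator
    inputs, classes = next(iterator)   # fetch one batch of images and labels
    print(len(inputs))  # 4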
Because batches are produced one at a time, we can get the next element in the sequence without keeping the entire dataset in memory.
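The lazy behaviour can be sketched with a plain Python generator. The `lazy_batches` function and the `produced` list are hypothetical names used only to record which batches have actually been built. This is an analogy for how a stream yields data on demand, not a description of DataLoader internals.

```python
produced = []  # records which batch indices have actually been built


def lazy_batches(num_batches, batch_size):
    """Build each batch only when the consumer asks for it."""
    for b in range(num_batches):
        batch = list(range(b * batch_size, (b + 1) * batch_size))
        produced.append(b)
        yield batch


stream = lazy_batches(num_batches=1000, batch_size=4)

# Only the two requested batches are built; the other 998 never exist.
first = next(stream)
second = next(stream)
```

After the two calls, `produced` contains only the indices 0 and 1, even though the stream could yield 1000 batches.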
You can loop over an iterator manually: repeatedly passing the iterator to the built-in next() function returns successive items in the stream. Once you have consumed an item from an iterator, it is gone. When no more data is available, a StopIteration exception is raised.
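A short sketch of this behaviour, using a plain list of placeholder strings in place of a DataLoader (the batch names are made up for illustration):

```python
batches = ["batch_0", "batch_1", "batch_2"]
iterator = iter(batches)

# Manually drain the iterator until it signals that it is exhausted.
consumed = []
while True:
    try:
        consumed.append(next(iterator))
    except StopIteration:
        break

# The iterator is now empty; passing a default to next() avoids the exception.
leftover = next(iterator, None)

# Calling iter() on the iterable again starts a fresh pass from the beginning.
restarted = next(iter(batches))
```

The same pattern applies to a DataLoader: once an iterator created from it is exhausted, calling iter() on the DataLoader again starts a new pass over the data.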