feed-dict is the slowest way to feed data into a TensorFlow model. The TensorFlow Dataset API is the recommended way to feed data into your models: it also ensures that the GPU never has to wait for new data to arrive.

The Dataset API is a high-level TensorFlow API that provides a streamlined, efficient way to build data input pipelines: reading data from CSV files, text files, or NumPy arrays, then transforming, shuffling, and batching it. All of this is automatically optimized and parallelized to provide efficient consumption of data.
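Before turning to CSV files, the stages mentioned above can be sketched on a small in-memory dataset. This is a minimal illustration, not tied to the sales data used later; the values and buffer sizes are arbitrary.

```python
import tensorflow as tf

# Build a small in-memory dataset to illustrate the pipeline stages.
ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])
ds = ds.map(lambda x: x * 2)       # transform each element
ds = ds.shuffle(buffer_size=6)     # shuffle with an in-memory buffer
ds = ds.batch(2)                   # group elements into batches of 2
ds = ds.prefetch(tf.data.experimental.AUTOTUNE)  # overlap preprocessing with consumption

for batch in ds:
    print(batch.numpy())
```

Each call returns a new `Dataset`, so the stages chain naturally; `prefetch` is what keeps the accelerator from waiting on the input pipeline.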

In this tutorial, we will see how to create an input pipeline from a CSV file.

CSV is a popular format for storing tabular data, and the Dataset API provides a class to extract records from one or more CSV files. Given one or more filenames and a list of defaults, a CsvDataset produces, per CSV record, a tuple of elements whose types correspond to the types of the defaults provided.

import tensorflow as tf

csv_path = "/home/manu/PycharmProjects/dataset/data/sales.csv"

def sales_map(name, platform, year):
    return {'Name': name, "Platform": platform, "Year": year}

sales_dataset = tf.data.experimental.CsvDataset(
    filenames=csv_path,
    record_defaults=[tf.string, tf.string, tf.constant([1900], dtype=tf.int32)],
    select_cols=[1, 2, 3],
    field_delim=",",
    header=True)

sales_dataset = sales_dataset.filter(lambda name, platform, year: year >= 1990)
sales_dataset = sales_dataset.map(map_func=sales_map)
sales_dataset = sales_dataset.batch(1)

for data in sales_dataset:
    tf.print(data)  # Prints: {'Name': ["Wii Sports"], 'Platform': ["Wii"], 'Year': [2006]}
    break

  • record_defaults: A list of default values, one per selected column of CSV data. Each item in the list is either a valid CSV dtype (float32, float64, int32, int64, string) or a Tensor object with one of those types. A Tensor default is used to fill in missing fields.
  • select_cols: A sorted list of column indices to select from the input data. If specified, only this subset of columns will be parsed. Defaults to parsing all columns.
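To see record_defaults filling in a missing field, here is a minimal sketch using a small hypothetical CSV file written to a temporary directory (the file name and column names are made up for illustration). The second column's default is a Tensor, so an empty field is replaced rather than raising an error.

```python
import os
import tempfile
import tensorflow as tf

# Hypothetical two-column file; bob's score field is empty.
path = os.path.join(tempfile.mkdtemp(), "scores.csv")
with open(path, "w") as f:
    f.write("name,score\nalice,10\nbob,\n")

ds = tf.data.experimental.CsvDataset(
    filenames=path,
    # A dtype alone means the column is required; a Tensor supplies
    # the value used when the field is missing.
    record_defaults=[tf.string, tf.constant([0], dtype=tf.int32)],
    header=True)

for name, score in ds:
    print(name.numpy(), score.numpy())  # bob's missing score becomes 0
```

Had the second default been just `tf.int32`, the empty field in bob's row would have produced an error at read time.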