In this post you will learn how to handle a variety of features, and then train and evaluate different types of models. We do that on a data set of cars.

Launching TensorBoard from Python

To run TensorBoard, use the following code. logdir points to the directory where the FileWriter serialized its data. Once TensorBoard is running, navigate your web browser to localhost:6006 to view it.

import tempfile
import subprocess

log_dir = tempfile.mkdtemp()
print("tensorboard-dir", log_dir)

# Stop any TensorBoard instance that is already running, then launch a
# fresh one pointed at our log directory.
subprocess.Popen(['pkill', '-f', 'tensorboard'])
subprocess.Popen(['tensorboard', '--logdir', log_dir])

Data Set


The first thing to do is download the dataset; then we use pandas to read the CSV file.
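The code below expects a local file named imports-85.data (the UCI Automobile Data Set). If you don't have it yet, here is one way to fetch it; the URL is the dataset's usual location on the UCI server, so adjust it if the file has moved:

import tensorflow as tf

URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "autos/imports-85.data")

# Downloads the file once (cached under ~/.keras/datasets) and returns
# its local path; pass that path to pd.read_csv instead of the bare name.
path = tf.keras.utils.get_file(URL.split("/")[-1], URL)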

import numpy as np
import pandas as pd
import tensorflow as tf

The CSV file does not have a header, so we have to fill in column names. We also have to specify dtypes.

names = [
    'symboling',
    'normalized-losses',
    'make',
    'fuel-type',
    'aspiration',
    'num-of-doors',
    'body-style',
    'drive-wheels',
    'engine-location',
    'wheel-base',
    'length',
    'width',
    'height',
    'curb-weight',
    'engine-type',
    'num-of-cylinders',
    'engine-size',
    'fuel-system',
    'bore',
    'stroke',
    'compression-ratio',
    'horsepower',
    'peak-rpm',
    'city-mpg',
    'highway-mpg',
    'price',
]

dtypes = {
    'symboling': np.int32,
    'normalized-losses': np.float32,
    'make': str,
    'fuel-type': str,
    'aspiration': str,
    'num-of-doors': str,
    'body-style': str,
    'drive-wheels': str,
    'engine-location': str,
    'wheel-base': np.float32,
    'length': np.float32,
    'width': np.float32,
    'height': np.float32,
    'curb-weight': np.float32,
    'engine-type': str,
    'num-of-cylinders': str,
    'engine-size': np.float32,
    'fuel-system': str,
    'bore': np.float32,
    'stroke': np.float32,
    'compression-ratio': np.float32,
    'horsepower': np.float32,
    'peak-rpm': np.float32,
    'city-mpg': np.float32,
    'highway-mpg': np.float32,
    'price': np.float32,
}


def raw_dataframe():
    df = pd.read_csv('imports-85.data', names=names, dtype=dtypes, na_values="?")
    return df

def load_data(y_name="price", train_fraction=0.7, seed=None):
    # Load the raw data columns.
    data = raw_dataframe()

    # Delete rows with unknowns
    data = data.dropna()

    # Seed the random number generator used for the train/test split.
    np.random.seed(seed)

    # Split the data into train/test subsets.
    x_train = data.sample(frac=train_fraction, random_state=seed)
    x_test = data.drop(x_train.index)

    # Extract the label from the features DataFrame.
    y_train = x_train.pop(y_name)
    y_test = x_test.pop(y_name)

    return (x_train, y_train), (x_test, y_test)

The training set contains the examples that we’ll use to train the model; the test set contains the examples that we’ll use to evaluate the trained model’s effectiveness.

The training set and test set started out as a single data set. Then, we split the examples, with the majority going into the training set and the remainder going into the test set.

Adding examples to the training set usually builds a better model; however, adding more examples to the test set enables us to better gauge the model’s effectiveness.

Regardless of the split, the examples in the test set must be separate from the examples in the training set. Otherwise, you can’t accurately determine the model’s effectiveness.
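As a quick sanity check, we can load the data and compare the sizes of the two subsets. The exact counts depend on how many rows survive dropna, so treat the output as illustrative:

(train_x, train_y), (test_x, test_y) = load_data(seed=0)

# With train_fraction=0.7, roughly 70% of the remaining rows end up in
# the training set and the other 30% in the test set.
print("train examples:", len(train_x))
print("test examples:", len(test_x))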

Feature Columns


Feature columns enable you to transform raw data into formats that Estimators can use, allowing easy experimentation.

The price predictor calls the tf.feature_column.numeric_column function for numeric input features:

feature_columns = [
    ...
    tf.feature_column.indicator_column(num_of_cylinders),
    tf.feature_column.indicator_column(fuel_system),

    tf.feature_column.numeric_column('symboling'),
    ...
]

Categorical Column

We cannot input strings directly to a model. Instead, we must first map strings to numeric or categorical values. Categorical vocabulary columns provide a good way to represent strings as a one-hot vector.

feature_columns = [
    tf.feature_column.categorical_column_with_vocabulary_list('fuel-type', vocabulary_list=['diesel', 'gas']),
    tf.feature_column.categorical_column_with_vocabulary_list('aspiration', vocabulary_list=['std', 'turbo']),
    tf.feature_column.categorical_column_with_vocabulary_list('num-of-doors', vocabulary_list=['two', 'four']),
    tf.feature_column.categorical_column_with_vocabulary_list('body-style', vocabulary_list=['hardtop', 'wagon', 'sedan', 'hatchback', 'convertible']),
    ...
]

Hashed Column

The number of categories can be so big that it’s not possible to have individual categories for each vocabulary word or integer because that would consume too much memory. For these cases, we can instead turn the question around and ask, “How many categories am I willing to have for my input?” In fact, the tf.feature_column.categorical_column_with_hash_bucket function enables you to specify the number of categories.

make = tf.feature_column.categorical_column_with_hash_bucket('make', 50)
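To get a feel for what the hashed column does, the sketch below hashes a few make strings into 50 buckets with tf.string_to_hash_bucket_fast, which, as far as I know, is the same hashing the feature column applies internally:

import tensorflow as tf

makes = tf.constant(["audi", "bmw", "volvo", "audi"])
bucket_ids = tf.string_to_hash_bucket_fast(makes, num_buckets=50)

with tf.Session() as sess:
    # Equal strings always land in the same bucket; different strings may
    # collide, and the model simply has to live with those collisions.
    print(sess.run(bucket_ids))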

Create Input Functions


TensorFlow has off-the-shelf input pipelines for many formats. In this example we use input from pandas, reading the data from a pandas DataFrame.

(train_x, train_y), (test_x, test_y) = automobile_data.load_data()

train_y /= args.price_norm_factor
test_y /= args.price_norm_factor

# Make input function for training:
#   num_epochs=None -> will cycle through input data forever
#   shuffle=True -> randomize order of input data
training_input_fn = tf.estimator.inputs.pandas_input_fn(x=train_x, y=train_y, batch_size=64,
                                                        shuffle=True, num_epochs=None)

# Make input function for evaluation:
#   shuffle=False -> do not randomize input data
eval_input_fn = tf.estimator.inputs.pandas_input_fn(x=test_x, y=test_y, batch_size=64, shuffle=False)

Here we use a batch size of 64, so each iteration of the algorithm processes 64 examples.

We also shuffle the input, which is always a good thing to do when training. num_epochs=None means the input function cycles through the data indefinitely: whenever it runs out of data, it simply starts over.
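If you are curious what an input function actually produces, calling it returns a (features, labels) pair, where features is a dict mapping column names to tensors. A tiny sketch (the tensors are fed from an input queue, so this only shows their structure):

features, labels = training_input_fn()

# `features` is a dict of Tensors keyed by column name; `labels` is a
# Tensor of prices. Both should have a batch dimension of 64.
print(features['horsepower'])
print(labels)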

Instantiate an Estimator


Next we specify what kind of machine learning algorithm we want to apply to predicting the car price. We start with linear regression, which is about the simplest way to learn something; all we have to do is tell it to use the input features we just declared.

model = tf.estimator.LinearRegressor(feature_columns=automobile_data.features_columns(), model_dir=log_dir)

Train, Evaluate, and Predict


Now that we have an Estimator object, we can call methods to do the following:

  • Train the model.
  • Evaluate the trained model.
  • Use the trained model to make predictions.

Train the model

Train the model by calling the Estimator’s train method as follows:

model.train(input_fn=training_input_fn, steps=args.train_steps)

The steps argument tells the method to stop training after the given number of training steps.
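Note that steps are not epochs: with a batch size of 64, one full pass over the training data takes roughly len(train_x) / 64 steps. A quick back-of-the-envelope sketch (train_steps here is just an example value; the post reads it from args.train_steps):

batch_size = 64
train_steps = 1000  # example value

# Approximate number of steps in one full pass (epoch) over the data.
steps_per_epoch = len(train_x) // batch_size
print("steps per epoch:", steps_per_epoch)
print("epochs covered by {} steps: {:.1f}".format(train_steps, float(train_steps) / steps_per_epoch))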

Evaluate the trained model

Now that the model has been trained, we can get some statistics on its performance. The following code block evaluates the accuracy of the trained model on the test data:

# Evaluate how the model performs on data it has not yet seen.
eval_result = model.evaluate(input_fn=eval_input_fn)

# The evaluation returns a Python dictionary. The "average_loss" key holds the
# Mean Squared Error (MSE).
average_loss = eval_result["average_loss"]

# Convert MSE to Root Mean Square Error (RMSE).
print("\n" + 80 * "*")
print("\nRMS error for the test set: ${:.0f}".format(args.price_norm_factor * average_loss ** 0.5))

Unlike our call to the train method, we did not pass the steps argument to evaluate. Our eval_input_fn only yields a single epoch of data.
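evaluate also reports more than just average_loss; printing the whole dictionary is a handy way to see everything it returns (the exact keys can vary by Estimator and TensorFlow version):

# Inspect every metric the Estimator reported, e.g. "average_loss",
# "loss", and "global_step" for a LinearRegressor.
for key, value in sorted(eval_result.items()):
    print("{}: {}".format(key, value))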

Making predictions (inferring) from the trained model

We now have a trained model that produces good evaluation results. We can use it to predict the price of a car based on some unlabeled measurements. As with training and evaluation, we make predictions using a single function call:

df = test_x[:1]
predict_input_fn = tf.estimator.inputs.pandas_input_fn(x=df, shuffle=False)

predict_results = model.predict(input_fn=predict_input_fn)

# Print the prediction results.
print("\nPrediction results:")
for prediction in predict_results:
    print(args.price_norm_factor * prediction['predictions'])

Deep Neural Network


model = tf.estimator.DNNRegressor(hidden_units=[50, 30, 10], feature_columns=automobile_data.features_columns(),
                                  model_dir=log_dir)

We obviously have to change the name of the class we're using. We also have to adapt the inputs to something this new model can use.

A DNN model can't use categorical features directly; we have to transform them first. Typically there are two ways to make a categorical feature work with a deep neural network: embed it, or turn it into what's called a one-hot or indicator column. So we simply ask for an embedding of the make column, and turn the cylinders column into an indicator column because it doesn't have many distinct values. Usually this is fairly complicated stuff that requires a lot of code; with feature columns it is a single wrapper call.

# Use the same categorical columns as in `linear_regression_categorical`
body_style_vocab = ["hardtop", "wagon", "sedan", "hatchback", "convertible"]
body_style_column = tf.feature_column.categorical_column_with_vocabulary_list(
    key="body-style", vocabulary_list=body_style_vocab)
make_column = tf.feature_column.categorical_column_with_hash_bucket(
    key="make", hash_bucket_size=50)

feature_columns = [
    ...
    # Since this is a DNN model, categorical columns must be converted from
    # sparse to dense. Wrap them in an `indicator_column` to create a
    # one-hot vector from the input.
    tf.feature_column.indicator_column(body_style_column),
    # Or use an `embedding_column` to create a trainable vector for each
    # index.
    tf.feature_column.embedding_column(make_column, dimension=3),
]

Most of these more complicated models have hyperparameters. In the DNN's case we simply ask for a three-layer neural network with layer sizes 50, 30, and 10, and that is really all you need; this is a very high-level interface.
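Everything else stays the same: we can reuse the input functions we built earlier to train and evaluate the DNN. A short sketch, assuming the training_input_fn and eval_input_fn defined above:

# Train and evaluate the DNN exactly like the linear model.
model.train(input_fn=training_input_fn, steps=args.train_steps)

eval_result = model.evaluate(input_fn=eval_input_fn)
average_loss = eval_result["average_loss"]
print("RMS error for the test set: ${:.0f}".format(
    args.price_norm_factor * average_loss ** 0.5))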

Conclusion

TensorFlow Estimators are implementations of complete machine learning models. You can get started with them extremely quickly, and they come with all the integrations: TensorBoard visualization, support for serving in production, different hardware, and different use cases.

Download this project from GitHub
