The Wide & Deep neural network architecture was introduced in a 2016 paper by Heng-Tze Cheng et al. It is a nonsequential neural network that connects all or part of the inputs directly to the output layer:

Wide & Deep model in Keras

This architecture makes it possible for the neural network to learn both deep patterns (using the deep path) and simple rules (through the short path). In contrast, a regular MLP forces all the data to flow through the full stack of layers, so simple patterns in the data may end up being distorted by this sequence of transformations.
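To make the two paths concrete, here is a minimal NumPy sketch of a single forward pass through a Wide & Deep network. The weights are random and the sizes (8 input features, two hidden layers of 30 units, a batch of 4 instances) are assumptions chosen for illustration. It is not the Keras implementation, only the arithmetic such a model performs.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

n_features, n_hidden = 8, 30          # illustrative sizes (assumed)
X = rng.normal(size=(4, n_features))  # a batch of 4 made-up instances

# Deep path: two hidden layers with ReLU activations.
W1, b1 = rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_hidden)), np.zeros(n_hidden)
deep = relu(relu(X @ W1 + b1) @ W2 + b2)

# Wide path: the raw inputs skip the hidden layers entirely
# and are joined with the deep path's output.
concat = np.concatenate([X, deep], axis=1)

# Output layer: one linear neuron that sees both paths.
W_out, b_out = rng.normal(size=(n_features + n_hidden, 1)), np.zeros(1)
y_hat = concat @ W_out + b_out

print(concat.shape, y_hat.shape)  # (4, 38) (4, 1)
```

The first 8 columns of the concatenated array are the untouched inputs, which is what lets the output neuron learn simple rules directly from them.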


Let’s use the California housing problem and tackle it using a regression neural network. For simplicity, we will use Scikit-Learn’s fetch_california_housing() function to load the data. This version of the dataset is simple to work with: it contains only numerical features (there is no ocean_proximity feature) and has no missing values.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import tensorflow as tf

housing = fetch_california_housing()

After loading the data, we split it into a training set, a validation set, and a test set, and we scale all the features:

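X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full)
scaler = StandardScaler().fit(X_train)  # fit the scaler on the training set only
X_train, X_valid, X_test = (scaler.transform(X) for X in (X_train, X_valid, X_test))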
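As a minimal sketch of what the scaling step does, the snippet below standardizes features by hand with NumPy. It assumes, as is usual, that the mean and standard deviation are computed on the training set only and then reused for the other sets. The arrays are made-up stand-ins for the housing data; Scikit-Learn's StandardScaler performs essentially the same computation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up stand-ins for the real splits: 3 features on very different scales.
scales = np.array([1.0, 50.0, 1000.0])
X_train_demo = rng.normal(loc=5.0, size=(200, 3)) * scales
X_valid_demo = rng.normal(loc=5.0, size=(50, 3)) * scales

# Statistics come from the training set only...
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)

# ...and are reused as-is for the validation (and test) set.
X_train_scaled = (X_train_demo - mean) / std
X_valid_scaled = (X_valid_demo - mean) / std

print(X_train_scaled.mean(axis=0).round(6))  # close to 0 for every feature
print(X_train_scaled.std(axis=0).round(6))   # close to 1 for every feature
```

The validation set will not have exactly zero mean and unit variance after this transformation, which is expected: it is scaled with the training statistics so that no information leaks from held-out data.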

Create Model

Let’s build such a neural network to tackle the California housing problem. First, we need to create an Input object. This is a specification of the kind of input the model will get, including its shape and dtype. A model may actually have multiple inputs. Next, we create two Dense layers with 30 neurons each, using the ReLU activation function. Each layer is called like a function as soon as it is created: the first hidden layer is called on the input, and the second hidden layer takes the output of the first.

input_ = tf.keras.layers.Input(shape=X_train.shape[1:])
hidden1 = tf.keras.layers.Dense(30, activation="relu")(input_)
hidden2 = tf.keras.layers.Dense(30, activation="relu")(hidden1)

Concatenate Layer

Next, we create a Concatenate layer, and once again we immediately use it like a function, to concatenate the input and the output of the second hidden layer. Alternatively, you can use the tf.keras.layers.concatenate() function, which creates a Concatenate layer and immediately calls it with the given inputs.

concat = tf.keras.layers.Concatenate()([input_, hidden2])
output = tf.keras.layers.Dense(1)(concat)
model = tf.keras.Model(inputs=[input_], outputs=[output])

The output layer has a single neuron, since we only want to predict a single value, and it uses no activation function. The loss function, specified when the model is compiled, is the mean squared error. Since the dataset is quite noisy, we use just two hidden layers with a modest number of neurons, to avoid overfitting.
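As a quick sanity check on the architecture, the arithmetic below counts its trainable parameters, assuming the 8 input features of the Scikit-Learn California housing dataset. A Dense layer with n inputs and u units has n * u weights plus u biases. The helper dense_params is a hypothetical name introduced here for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 8  # assumed: number of columns in housing.data

hidden1 = dense_params(n_features, 30)
hidden2 = dense_params(30, 30)
# The output neuron sees the 8 raw inputs and the 30 outputs of hidden2.
output = dense_params(n_features + 30, 1)

total = hidden1 + hidden2 + output
print(hidden1, hidden2, output, total)  # 270 930 39 1239
```

If the model is built as described, calling model.summary() should report the same per-layer counts, with the Concatenate layer contributing no parameters of its own.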

Train Model

Once you have built the Keras model, you must compile the model, train it, evaluate it, and use it to make predictions.

model.compile(loss="mean_squared_error", optimizer="sgd")
history = model.fit(X_train, y_train, epochs=20,
                    validation_data=(X_valid, y_valid))
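For reference, the mean squared error used as the loss is the average of the squared differences between predictions and targets. A minimal NumPy sketch with made-up values (not results from the model) is:

```python
import numpy as np

y_true = np.array([2.0, 1.5, 3.0, 0.5])  # made-up targets
y_pred = np.array([2.5, 1.0, 2.0, 0.5])  # made-up predictions

# Average of the squared errors.
mse = np.mean((y_pred - y_true) ** 2)
# The square root is expressed in the same units as the target.
rmse = np.sqrt(mse)

print(mse)  # 0.375
```

Because errors are squared, the single error of 1.0 contributes four times as much to the loss as each error of 0.5.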

Multiple inputs using Concatenate Layer

What if you want to send a subset of the features through the wide path and a different subset (possibly overlapping) through the deep path?

Handling multiple inputs in Keras

In this case, one solution is to use multiple inputs. For example, suppose we want to send five features through the wide path (features 0 to 4), and six features through the deep path (features 2 to 7):

input_A = tf.keras.layers.Input(shape=[5], name="wide_input")
input_B = tf.keras.layers.Input(shape=[6], name="deep_input")
hidden1 = tf.keras.layers.Dense(30, activation="relu")(input_B)
hidden2 = tf.keras.layers.Dense(30, activation="relu")(hidden1)
concat = tf.keras.layers.concatenate([input_A, hidden2])
output = tf.keras.layers.Dense(1, name="output")(concat)
model = tf.keras.Model(inputs=[input_A, input_B], outputs=[output])

Train Multiple inputs Model

Now we can compile the model as usual, but when we call the fit() method, instead of passing a single input matrix X_train, we must pass a pair of matrices (X_train_A, X_train_B): one per input. The same is true for X_valid, and also for X_test and X_new when you call evaluate() or predict():

model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))

X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:]
X_valid_A, X_valid_B = X_valid[:, :5], X_valid[:, 2:]
X_test_A, X_test_B = X_test[:, :5], X_test[:, 2:]
X_new_A, X_new_B = X_test_A[:3], X_test_B[:3]

history = model.fit((X_train_A, X_train_B), y_train, epochs=20,
                    validation_data=((X_valid_A, X_valid_B), y_valid))

mse_test = model.evaluate((X_test_A, X_test_B), y_test)
y_pred = model.predict((X_new_A, X_new_B))
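The column slices used to build the two inputs can be checked with a small NumPy sketch. The matrix below is made up (each entry equals its column index) and assumes 8 feature columns, as in the housing data.

```python
import numpy as np

# Made-up matrix with 8 columns; each entry equals its column index.
X_demo = np.tile(np.arange(8), (3, 1))

X_wide = X_demo[:, :5]  # columns 0, 1, 2, 3, 4
X_deep = X_demo[:, 2:]  # columns 2, 3, 4, 5, 6, 7

# Columns that are sent through both paths.
shared = sorted(set(X_wide[0].tolist()) & set(X_deep[0].tolist()))
print(X_wide.shape, X_deep.shape, shared)  # (3, 5) (3, 6) [2, 3, 4]
```

The resulting widths, 5 and 6, match the shape arguments given to the two Input objects, and features 2 to 4 flow through both the wide and the deep path.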
