Training a machine learning model often requires a huge amount of data, and in many datasets most of the values are zeros. For example, imagine a matrix where the columns are every movie, the rows are every user, and the values are how many times a user has watched that particular movie.

This matrix would have thousands of columns and millions of rows. However, since most users do not watch most movies, most of the elements would be zero.

In a sparse matrix, most elements are 0. SciPy sparse matrices store only nonzero elements and assume all other values will be zero, leading to significant computational savings.

Sparse matrices are a memory-efficient way to represent data composed mostly of 0s. In this tutorial, we use SciPy to convert a NumPy array into a sparse matrix.

Let’s first create a simple NumPy array with very few nonzero values.

import numpy as np
from scipy import sparse

# NumPy matrix
matrix = np.array([[5, 0],
                   [0, 0],
                   [0, 8]])

# compressed sparse row (CSR) matrix
matrix_sparse = sparse.csr_matrix(matrix)

We created a NumPy array with two nonzero values, then converted it into a compressed sparse row (CSR) matrix, which is no longer a NumPy array. If we view the sparse matrix, we can see that only the nonzero values are stored:

[Figure: SciPy sparse matrix]

In a compressed sparse row (CSR) matrix, (0, 0) and (2, 1) are the (zero-indexed) row and column indices of the nonzero values 5 and 8, respectively.
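The stored entries can also be inspected programmatically. The following is a minimal sketch that rebuilds the same small matrix and reads back its nonzero positions and values; the variable names `stored_count`, `rows`, `cols`, and `values` are only illustrative.

```python
import numpy as np
from scipy import sparse

# Same small matrix as above
matrix = np.array([[5, 0],
                   [0, 0],
                   [0, 8]])
matrix_sparse = sparse.csr_matrix(matrix)

# Number of explicitly stored values
stored_count = matrix_sparse.nnz

# Row and column indices of the nonzero values
rows, cols = matrix_sparse.nonzero()

# The stored values themselves
values = matrix_sparse.data

for r, c, v in zip(rows, cols, values):
    print(f"({r}, {c}) -> {v}")
# (0, 0) -> 5
# (2, 1) -> 8
```

The zeros never appear in `values`; they are implied by the matrix shape.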

We can see the advantage of sparse matrices if we create a much larger matrix with many more zero elements and then compare this larger matrix with our original sparse matrix:

# Larger numpy matrix
matrix2 = np.array([[0, 0, 0, 0, 0, 0, 8, 0, 0, 0],
                    [0, 0, 0, 0, 6, 0, 0, 0, 0, 0],
                    [4, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

matrix_sparse2 = sparse.csr_matrix(matrix2)

[Figure: Sparse matrix from NumPy]

As we can see, although the larger matrix contains 27 zero elements, its sparse representation stores only the three nonzero values (8, 6, and 4) along with their positions. That is, adding zero elements does not increase the number of values the sparse matrix stores.
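To make the memory savings concrete, here is a rough sketch that compares the bytes used by a dense array with the bytes used by the three arrays a CSR matrix keeps internally (`data`, `indices`, and `indptr`). The 1,000 x 1,000 size and the names `dense`, `dense_bytes`, and `sparse_bytes` are assumptions chosen for illustration, and the exact byte counts can vary with platform and SciPy version.

```python
import numpy as np
from scipy import sparse

# Hypothetical example: a 1,000 x 1,000 matrix with only three nonzero values
dense = np.zeros((1000, 1000), dtype=np.int64)
dense[0, 6] = 8
dense[1, 4] = 6
dense[2, 0] = 4

sparse_version = sparse.csr_matrix(dense)

# Memory used by the dense array
dense_bytes = dense.nbytes

# Memory used by the CSR arrays: values, column indices, and row pointers
sparse_bytes = (sparse_version.data.nbytes
                + sparse_version.indices.nbytes
                + sparse_version.indptr.nbytes)

print("stored values:", sparse_version.nnz)
print("dense bytes:  ", dense_bytes)
print("sparse bytes: ", sparse_bytes)
```

Note that CSR keeps one row pointer per row (plus one), so its footprint still grows slightly with the number of rows, even though the zeros themselves are never stored.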

There are many different types of sparse matrices, such as compressed sparse columns, lists of lists, and dictionaries of keys. An explanation of the different types and their implications is outside the scope of this tutorial.
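Without going into their trade-offs, the sketch below shows one way to convert the CSR matrix from earlier into the other formats mentioned, using SciPy's `tocsc()`, `tolil()`, and `todok()` conversion methods. It only demonstrates that every format represents the same data.

```python
import numpy as np
from scipy import sparse

matrix = np.array([[5, 0],
                   [0, 0],
                   [0, 8]])

csr = sparse.csr_matrix(matrix)  # compressed sparse row
csc = csr.tocsc()                # compressed sparse column
lil = csr.tolil()                # list of lists
dok = csr.todok()                # dictionary of keys

for m in (csr, csc, lil, dok):
    # Each format stores the same two nonzero values
    print(m.format, m.nnz)
    # Converting back to a dense array recovers the original matrix
    assert np.array_equal(m.toarray(), matrix)
```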
