SciPy Sparse Matrix
In numerical analysis, a sparse matrix is a matrix in which most of the elements are zero. Conversely, if most elements are non-zero, the matrix is called dense.
Large sparse matrices frequently appear when solving linear models in science and engineering.
In the image above, the matrix on the left is sparse, containing many zero elements, while the matrix on the right is dense, with most elements being non-zero.
Consider a simple example:
The sparse matrix above contains only 9 non-zero elements and 26 zero elements, 35 in total. Its sparsity (the fraction of zero elements) is 26/35 ≈ 74%, and its density (the fraction of non-zero elements) is 9/35 ≈ 26%.
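Sparsity and density can be computed for any dense NumPy array, and the two always add up to 1. The following is a minimal sketch. The helper name `sparsity_and_density` and the 3×3 sample matrix are illustrative choices and are not part of SciPy.

```python
import numpy as np

def sparsity_and_density(arr):
    """Return (sparsity, density) of a dense array as fractions in [0, 1]."""
    total = arr.size
    nonzero = np.count_nonzero(arr)  # number of non-zero entries
    density = nonzero / total
    return 1.0 - density, density

# Hypothetical sample matrix: 3 non-zero entries out of 9.
sample = np.array([[0, 0, 0],
                   [0, 0, 1],
                   [1, 0, 2]])
sp, de = sparsity_and_density(sample)
print(f"sparsity={sp:.0%}, density={de:.0%}")  # sparsity=67%, density=33%

# The 35-element example: 26 zeros and 9 non-zeros.
print(f"{26 / 35:.0%} {9 / 35:.0%}")  # 74% 26%
```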
SciPy's scipy.sparse module provides functions for handling sparse matrices.
We primarily use the following two types of sparse matrices:
- CSC - Compressed Sparse Column, compressed by column.
- CSR - Compressed Sparse Row, compressed by row.
In this section, we mainly use CSR matrices.
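CSR and CSC both store a matrix as three arrays, which SciPy exposes as `data`, `indices` and `indptr`. The only difference between them is whether the compression runs along rows or along columns. The sketch below prints these arrays for the same small 3×3 matrix in both formats; the matrix was chosen only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

arr = np.array([[0, 0, 0],
                [0, 0, 1],
                [1, 0, 2]])

csr = csr_matrix(arr)
# CSR: indptr[i]:indptr[i+1] selects the stored values of row i;
# indices holds the column of each stored value.
print(csr.data, csr.indices, csr.indptr)  # [1 1 2] [2 0 2] [0 0 1 3]

csc = csc_matrix(arr)
# CSC: the same idea compressed by column; indices holds row numbers.
print(csc.data, csc.indices, csc.indptr)  # [1 1 2] [2 1 2] [0 1 1 3]

# Reading row 2 back out of the CSR arrays.
start, end = csr.indptr[2], csr.indptr[3]
print(csr.indices[start:end], csr.data[start:end])  # [0 2] [1 2]
```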
CSR Matrix
We can create a CSR matrix by passing an array to the scipy.sparse.csr_matrix() function.
Example
Creating a CSR matrix.
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])
print(csr_matrix(arr))
The above code outputs:
(0, 5) 1
(0, 6) 1
(0, 8) 2
Result Analysis:
- First line: There is a value of 1 at the sixth position (index 5) in the first row (index 0).
- Second line: There is a value of 1 at the seventh position (index 6) in the first row (index 0).
- Third line: There is a value of 2 at the ninth position (index 8) in the first row (index 0).
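The same matrix can also be built directly from the (row, column, value) triplets listed in the output, and `toarray()` turns a sparse matrix back into a dense NumPy array. The sketch below does both. Note that a 1-D input becomes a single-row matrix. The variable names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])
mat = csr_matrix(arr)
print(mat.shape)  # (1, 9): the 1-D input is treated as one row

# Build the same matrix from (value, (row, column)) triplets,
# i.e. exactly the entries listed in the printed output above.
values = np.array([1, 1, 2])
rows = np.array([0, 0, 0])
cols = np.array([5, 6, 8])
same = csr_matrix((values, (rows, cols)), shape=(1, 9))

# toarray() converts back to a dense NumPy array.
print(same.toarray())  # [[0 0 0 0 0 1 1 0 2]]
```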
CSR Matrix Methods
The data attribute holds the values stored in the matrix. A matrix built from a dense array does not store its zeros, so here data contains only the non-zero elements:
Example
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
print(csr_matrix(arr).data)
The above code outputs:
[1 1 2]
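`data` is a flat array, so on its own it does not show where each value is located. One way to pair each value with its position is to convert the matrix to COO (coordinate) format with `tocoo()`. The resulting `row`, `col` and `data` arrays line up element by element, as this sketch shows.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)

# Convert to COO format to see each stored value next to its coordinates.
coo = mat.tocoo()
for r, c, v in zip(coo.row, coo.col, coo.data):
    print(f"row {r}, column {c}: {v}")
# row 1, column 2: 1
# row 2, column 0: 1
# row 2, column 2: 2
```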
Use the count_nonzero() method to count the total number of non-zero elements:
Example
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
print(csr_matrix(arr).count_nonzero())
The above code outputs:
3
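`count_nonzero()` can give a different result from the `nnz` attribute when the matrix stores explicit zeros. `nnz` counts every stored entry, while `count_nonzero()` counts only entries whose value is non-zero. In the sketch below, an explicit zero is created by writing into `data` directly; this is done purely for demonstration.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)

# Overwrite one stored value with 0: it stays in the structure
# as an "explicit zero".
mat.data[0] = 0

print(mat.nnz)              # 3 -> stored entries, zeros included
print(mat.count_nonzero())  # 2 -> entries whose value is actually non-zero
```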
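Use the eliminate_zeros() method to remove explicitly stored zero entries from the matrix. The matrix is modified in place. A matrix built from a dense array stores no zeros, so in this example the output matches the original matrix: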
Example
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)
mat.eliminate_zeros()
print(mat)
The above code outputs:
(1, 2) 1
(2, 0) 1
(2, 2) 2
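For `eliminate_zeros()` to have a visible effect, the matrix must contain a stored zero. The sketch below creates one by assigning to `data`, which is done for demonstration only, and then checks `nnz` before and after the call.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)
mat.data[0] = 0        # entry (1, 2) is now an explicitly stored zero
print(mat.nnz)         # 3

mat.eliminate_zeros()  # drops stored zeros in place
print(mat.nnz)         # 2
print(mat.toarray())
# [[0 0 0]
#  [0 0 0]
#  [1 0 2]]
```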
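Use the sum_duplicates() method to merge duplicate entries, meaning several values stored for the same row and column, by adding them together. The matrix is modified in place. A matrix built from a dense array has no duplicates, so in this example the output is unchanged: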
Example
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)
mat.sum_duplicates()
print(mat)
The above code outputs:
(1, 2) 1
(2, 0) 1
(2, 2) 2
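Duplicates cannot arise from a dense array, but they can when a CSR matrix is built directly from its `data`, `indices` and `indptr` arrays. The sketch below lists column 0 twice in the same row. The specific values are arbitrary.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 1x3 CSR matrix built directly from (data, indices, indptr).
# Column 0 appears twice, so position (0, 0) has two stored values.
data = np.array([1, 2, 3])
indices = np.array([0, 0, 2])
indptr = np.array([0, 3])
mat = csr_matrix((data, indices, indptr), shape=(1, 3))
n_before = mat.nnz
print(n_before)       # 3 stored entries

mat.sum_duplicates()  # merges duplicates in place by adding them
print(mat.nnz)        # 2
print(mat.toarray())  # [[3 0 3]]
```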
Convert CSR to CSC using the tocsc() method:
Example
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
newarr = csr_matrix(arr).tocsc()
print(newarr)
The above code outputs:
(2, 0) 1
(1, 2) 1
(2, 2) 2
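The entries print in a different order because CSC stores the matrix column by column: column 0 comes first, then column 2. The matrix itself is unchanged. The sketch below checks this and also shows column slicing, an operation the CSC layout is generally suited for. The column index used is arbitrary.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
csr = csr_matrix(arr)
csc = csr.tocsc()

# Same matrix, different storage order.
print(np.array_equal(csr.toarray(), csc.toarray()))  # True
print(csc.format)                                   # csc

# Extract column 2 as a dense column vector.
print(csc[:, 2].toarray())
# [[0]
#  [1]
#  [2]]

# tocsr() converts back.
print(csc.tocsr().format)  # csr
```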