SciPy Spatial Data
Spatial data, also known as geometric data, is used to represent information about an object's location, shape, size, and distribution, such as points on a coordinate system.
SciPy processes spatial data through the scipy.spatial
module, which can determine if a point is within a boundary, find the nearest point to a given point, and locate all points within a specified distance.
Triangulation
Triangulation in trigonometry and geometry is a method of determining a distance to a point by measuring angles from known reference points on a fixed baseline.
Polygon triangulation involves dividing a polygon into triangles, which can then be used to calculate the area of the polygon.
A well-known fact in topology states that any surface can be triangulated.
Suppose a surface has a triangulation. If we denote the total number of vertices of all triangles as ( p ) (with shared vertices counted only once), the number of edges as ( a ), and the number of triangles as ( n ), then ( e = p - a + n ) is a topological invariant of the surface. This means ( e ) always yields the same value regardless of the triangulation. ( e ) is known as the Euler characteristic.
The Delaunay() triangulation is a method for triangulating a set of points.
Example
Creating triangles from given points:
import numpy as np
from scipy.spatial import Delaunay
import matplotlib.pyplot as plt
points = np.array([
[2, 4],
[3, 4],
[3, 0],
[2, 2],
[4, 1]
])
simplices = Delaunay(points).simplices # Indices of vertices in triangles
plt.triplot(points[:, 0], points[:, 1], simplices)
plt.scatter(points[:, 0], points[:, 1], color='r')
plt.show()
Note: The IDs of the triangle vertices are stored in the simplices
attribute of the triangulation object.
Convex Hull
The convex hull (Convex Hull) is a concept in computational geometry.
In a real vector space ( V ), for a given set ( X ), the intersection ( S ) of all convex sets containing ( X ) is called the convex hull of ( X ). The convex hull of ( X ) can be constructed from the convex combinations of all points in ( X ).
We can use the ConvexHull() method to create a convex hull.
Example
Creating a convex hull from given points:
import numpy as np
from scipy.spatial import ConvexHull
import matplotlib.pyplot as plt
points = np.array([
[2, 4],
[3, 4],
[3, 0],
[2, 2],
[4, 1],
[1, 2],
[5, 0],
[3, 1],
[1, 2],
[0, 2]
])
hull = ConvexHull(points)
hull_points = hull.simplices
plt.scatter(points[:,0], points[:,1])
for simplex in hull_points:
plt.plot(points[simplex,0], points[simplex,1], 'k-')
plt.show()
K-D Tree
A kd-tree (short for k-dimensional tree) is a tree data structure used for organizing points in a k-dimensional space, particularly for efficient searches such as range searches and nearest neighbor searches.
The KDTree() method returns a KDTree object.
The query() method returns the nearest distance and position.
Example
Finding the nearest distance to (1,1):
from scipy.spatial import KDTree
points = [(1, -1), (2, 3), (-2, 3), (2, -3)]
kdtree = KDTree(points)
res = kdtree.query((1, 1))
print(res)
Output:
(2.0, 0)
Distance Matrix
In mathematics, a distance matrix is a matrix (two-dimensional array) containing the distances between points. Given ( N ) points in Euclidean space, the distance matrix is an ( N \times N ) symmetric matrix with non-negative real numbers as elements. Distance matrices are similar to adjacency matrices but include the distance information between connected points.
For example, consider the following 2D points ( a ) to ( f ), with Euclidean distances between them:
The distance matrix for these points can be visualized as a heatmap, where black represents zero distance and white represents the maximum distance.
In bioinformatics, distance matrices are used to represent protein structures independent of a coordinate system and the distances between sequences in sequence space. These representations are used in structural alignment, sequence alignment, and determining protein structures through NMR, X-ray, and crystallography.
Euclidean Distance
In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space. Using this distance, Euclidean space becomes a metric space. The associated norm is called the Euclidean norm. Older literature refers to it as the Pythagorean metric.
The Euclidean metric (also known as the Euclidean distance) is a commonly used distance definition, representing the actual distance between two points in m-dimensional space or the natural length of a vector (the distance from the point to the origin). In two and three-dimensional spaces, the Euclidean distance is the actual distance between two points.
Example
Calculating the Euclidean distance between two points:
from scipy.spatial.distance import euclidean
p1 = (1, 0)
p2 = (10, 2)
res = euclidean(p1, p2)
print(res)
Output:
9.21954445729
Manhattan Distance
The Manhattan distance, also known as taxicab geometry or rectilinear distance, is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates.
The Manhattan distance can only move in four directions: up, down, left, and right, and is the shortest distance between two points in this context.
Manhattan distance versus Euclidean distance: Red, blue, and yellow lines represent Manhattan distances of 12, while the green line represents an Euclidean distance of ( 6 \times \sqrt{2} \approx 8.48 ).
Example
Calculating the Manhattan distance between two points:
from scipy.spatial.distance import cityblock
p1 = (1, 0)
p2 = (10, 2)
res = cityblock(p1, p2)
print(res)
Output:
11
Cosine Distance
Cosine distance, also known as cosine similarity, measures the similarity between two vectors by calculating the cosine of the angle between them.
The cosine of a 0-degree angle is 1, and the cosine of any other angle is not greater than 1, with a minimum value of -1.
Example
Calculating the cosine distance between two points:
from scipy.spatial.distance import cosine
p1 = (1, 0)
p2 = (10, 2)
res = cosine(p1, p2)
print(res)
Output:
0.019419324309079777
Hamming Distance
In information theory, the Hamming distance between two equal-length strings is the number of positions at which the corresponding symbols are different. In other words, it is the minimum number of substitutions required to change one string into the other.
The Hamming weight is the Hamming distance of a string to the zero string of the same length; for a binary string, it is the number of 1s in the string. For example, the Hamming weight of "11101" is 4.
Example
Calculating the Hamming distance between two points:
from scipy.spatial.distance import hamming
p1 = (True, False, True)
p2 = (False, True, True)
res = hamming(p1, p2)
print(res)
Output:
0.666666666667