To compute pairwise distances quickly in Python, you can use the `scipy.spatial.distance.pdist` function. It efficiently computes the distances between all pairs of observations (rows) in a dataset.
Here’s an example:
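```python
from scipy.spatial.distance import pdist

# Sample data: 3 observations (rows), each with 3 features
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# Compute pairwise Euclidean distances (the default metric)
distances = pdist(data)
print(distances)  # pairs (0, 1), (0, 2), (1, 2): sqrt(27), sqrt(108), sqrt(27)
```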
In this example, the sample dataset is the `data` list, where each inner list is one observation. `pdist` is called with `data` as its input and computes the distance between every pair of observations. The result is returned as a condensed distance matrix (a 1-D NumPy array).
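To make clear what `pdist` computes, the sketch below reproduces the example with a plain double loop over pairs `i < j` and checks that both give the same values in the same order. The `naive_pairwise` helper is hypothetical, written only for this comparison; it is much slower than `pdist` on real data.

```python
import math

import numpy as np
from scipy.spatial.distance import pdist


def naive_pairwise(rows):
    """Hypothetical reference helper: Euclidean distance for every pair i < j,
    visited row by row, which is the order pdist uses for its output."""
    out = []
    n = len(rows)
    for i in range(n):
        for j in range(i + 1, n):
            out.append(math.dist(rows[i], rows[j]))
    return np.array(out)


data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

fast = pdist(data)
slow = naive_pairwise(data)
print(np.allclose(fast, slow))  # True
```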
The `pdist` function supports many distance metrics, such as Euclidean (`'euclidean'`), Manhattan (`'cityblock'`), and cosine (`'cosine'`) distance. By default it computes the Euclidean distance; to use another metric, pass its name through the `metric` parameter, e.g. `pdist(data, metric='cityblock')`.
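Here is a short sketch of the `metric` parameter on the same sample data. It uses built-in metric names, passes the Minkowski order through the `p` keyword, and also shows that a Python callable taking two 1-D arrays is accepted (the lambda here is an illustrative Chebyshev distance, compared against the built-in `'chebyshev'` metric).

```python
import numpy as np
from scipy.spatial.distance import pdist

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]], dtype=float)

euclidean = pdist(data)                      # default: metric='euclidean'
manhattan = pdist(data, metric='cityblock')  # sum of absolute differences
cosine = pdist(data, metric='cosine')        # 1 - cosine similarity

# Minkowski distance with an explicit order p (p=1 is Manhattan, p=2 is Euclidean)
minkowski3 = pdist(data, metric='minkowski', p=3)

# A Python callable also works, but it is invoked once per pair, so it is slower
chebyshev_custom = pdist(data, lambda u, v: np.max(np.abs(u - v)))

print(manhattan)         # [ 9. 18.  9.]
print(chebyshev_custom)  # [3. 6. 3.]
```

Prefer the built-in metric names when one fits, since a callable gives up the speed of the compiled implementations.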
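`pdist` is fast because, for the built-in metrics, its distance loops run in compiled code rather than in the Python interpreter, and because it stores each distance only once. It still computes all n(n-1)/2 pairwise distances for n observations, so time and memory grow quadratically with the size of the dataset: it is much faster than an equivalent Python loop, but very large datasets can still be expensive to process.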
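For n observations, the condensed result always holds n(n-1)/2 distances, so its size grows quadratically with n. The sketch below checks this for 1,000 randomly generated observations (the random data is purely illustrative) and compares the memory footprint with the equivalent square matrix.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
n = 1000
X = rng.random((n, 3))  # 1,000 illustrative observations with 3 features each

condensed = pdist(X)
square = squareform(condensed)

# One float64 per unordered pair: n * (n - 1) / 2 entries
print(condensed.shape)   # (499500,)
print(condensed.nbytes)  # 3996000 (8 bytes per float64)

# The square form stores every pair twice plus the zero diagonal
print(square.shape)      # (1000, 1000)
print(square.nbytes)     # 8000000
```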
Note that `pdist` returns the distances in condensed form: a one-dimensional array holding the upper triangle of the pairwise distance matrix (excluding the all-zero diagonal), read row by row. For n observations it has n(n-1)/2 entries. If you need the full square matrix, use the `scipy.spatial.distance.squareform` function to convert the condensed distances into an n-by-n matrix.
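The sketch below converts the example's condensed result to square form and back, and shows how a pair (i, j) maps to a position in the condensed array. The `condensed_index` helper is hypothetical, derived here from the row-by-row upper-triangle layout; it is not a SciPy function.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

condensed = pdist(data)
square = squareform(condensed)  # 3 x 3 symmetric matrix with zeros on the diagonal
back = squareform(square)       # converting back yields the condensed form again


def condensed_index(i, j, n):
    """Hypothetical helper: position of pair (i, j) in a condensed matrix
    built from n observations (upper triangle, row by row, no diagonal)."""
    if i == j:
        raise ValueError("the diagonal is not stored in condensed form")
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


n = len(data)
print(square[0, 2] == condensed[condensed_index(0, 2, n)])  # True
print(np.array_equal(back, condensed))                      # True
```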