How to Compute Pairwise Distances Quickly in Python?


To compute pairwise distances quickly in Python, you can use the scipy.spatial.distance.pdist function from SciPy. It takes a two-dimensional array with one observation per row and returns the distance between every pair of observations.

Here’s an example:

from scipy.spatial.distance import pdist

# Sample data
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# Compute pairwise distances
distances = pdist(data)

print(distances)

In this example, data is a list of three observations with three values each. Calling pdist on it computes the distance between every pair of observations and returns the result as a condensed distance matrix, which for three observations holds three values.
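To see which pair each entry of the condensed result belongs to, a minimal sketch like the following may help. It assumes the same sample data and uses itertools.combinations (not part of the original example) to list the pairs in the order pdist stores them, that is, all (i, j) with i < j, row by row.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

# Same sample data as above, as a float array
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]], dtype=float)

distances = pdist(data)

# Pairs (i, j) with i < j, in the same order as the condensed result:
# (0, 1), (0, 2), (1, 2) for three observations
pairs = list(combinations(range(len(data)), 2))

for (i, j), d in zip(pairs, distances):
    print(f"observations {i} and {j}: {d:.4f}")
```

Each printed line pairs one entry of distances with the two row indices it was computed from.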

The pdist function supports various distance metrics, such as Euclidean distance ('euclidean'), Manhattan distance (named 'cityblock' in SciPy), and cosine distance ('cosine'). By default, it computes the Euclidean distance. You can choose a different metric by passing its name as the metric parameter.
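As a short sketch of switching metrics, assuming the same sample data, the metric parameter can be passed as a string. Note that SciPy's name for the Manhattan distance is "cityblock".

```python
import numpy as np
from scipy.spatial.distance import pdist

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]], dtype=float)

euclidean = pdist(data)                      # default metric is "euclidean"
manhattan = pdist(data, metric="cityblock")  # sum of absolute differences
cosine = pdist(data, metric="cosine")        # 1 - cosine similarity

print(euclidean)
print(manhattan)  # distances 9, 18 and 9 for this data
print(cosine)
```

All three calls return a condensed array of the same length; only the values differ.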

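The built-in metrics of pdist are implemented in compiled code, so it is typically much faster than an equivalent double loop written in pure Python. It still computes every pair, however, so time and memory grow quadratically with the number of observations: n observations produce n(n-1)/2 distances. The condensed output stores each pair only once, which takes roughly half the memory of a full square matrix.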
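To make the effect of dataset size concrete, the sketch below compares the number of distances pdist returns with the number of entries in a full square matrix. The random points, the seed, and the size n = 1000 are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical data: 1000 random points in 3 dimensions
rng = np.random.default_rng(0)
n = 1000
points = rng.random((n, 3))

condensed = pdist(points)

# One distance per unordered pair of observations
expected_len = n * (n - 1) // 2

print("condensed entries:", condensed.shape[0])
print("n * (n - 1) / 2:  ", expected_len)
print("square entries:   ", n * n)
```

For much larger n, even the condensed array may not fit in memory, so it is worth estimating n(n-1)/2 before calling pdist.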

Note that the pdist function returns the distances in a condensed form: a one-dimensional array holding the part of the pairwise distance matrix above the diagonal, since the diagonal is all zeros and the matrix is symmetric. If you need the full square matrix, you can use the scipy.spatial.distance.squareform function to convert the condensed distances to a square matrix.
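A minimal sketch of the conversion, assuming the same sample data as before, could look like this. Passing a square distance matrix back to squareform returns the condensed form again.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]], dtype=float)

distances = pdist(data)

# Condensed vector -> symmetric square matrix with a zero diagonal
square = squareform(distances)
print(square)

# square[i, j] is the distance between observations i and j
print(square[0, 2])

# Square matrix -> condensed vector again
condensed_again = squareform(square)
```

The square form is easier to index by observation, at the cost of storing every distance twice.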
