[python] How can the Euclidean distance be calculated with NumPy?

I have two points in 3D:

(xa, ya, za)
(xb, yb, zb)

And I want to calculate the distance:

dist = sqrt((xa-xb)^2 + (ya-yb)^2 + (za-zb)^2)

What's the best way to do this with NumPy, or with Python in general? I have:

import numpy
a = numpy.array((xa, ya, za))
b = numpy.array((xb, yb, zb))

Since Python 3.8

Since Python 3.8, the math module includes the function math.dist(); see https://docs.python.org/3.8/library/math.html#math.dist.

math.dist(p1, p2)
Return the Euclidean distance between two points p1 and p2, each given as a sequence (or iterable) of coordinates.

import math
print( math.dist( (0,0),   (1,1)   )) # sqrt(2) -> 1.4142
print( math.dist( (0,0,0), (1,1,1) )) # sqrt(3) -> 1.7321

I want to expound on the simple answer with various performance notes. np.linalg.norm will perhaps do more than you need:

dist = numpy.linalg.norm(a-b)

Firstly, this function is designed to work over an array of points and return all of the distances, e.g. to compare the distance from pA to a set of points sP:

sP = np.array(points)  # shape (n, 3): one point per row
pA = point
distances = np.linalg.norm(sP - pA, ord=2, axis=1)  # 'distances' is an ndarray
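
For example, a minimal self-contained sketch (with made-up coordinates) of that vectorised usage:

import numpy as np

points = np.array([[0., 0., 0.], [1., 1., 1.], [3., 4., 0.]])
pA = np.array([1., 0., 0.])
# One norm per row: the distance from pA to each point.
print(np.linalg.norm(points - pA, axis=1))  # [1.         1.41421356 4.47213595]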

Remember several things:

  • Python function calls are expensive.
  • [Regular] Python doesn't cache name lookups.

So

def distance(pointA, pointB):
    dist = np.linalg.norm(pointA - pointB)
    return dist

isn't as innocent as it looks.

>>> dis.dis(distance)
  2           0 LOAD_GLOBAL              0 (np)
              2 LOAD_ATTR                1 (linalg)
              4 LOAD_ATTR                2 (norm)
              6 LOAD_FAST                0 (pointA)
              8 LOAD_FAST                1 (pointB)
             10 BINARY_SUBTRACT
             12 CALL_FUNCTION            1
             14 STORE_FAST               2 (dist)

  3          16 LOAD_FAST                2 (dist)
             18 RETURN_VALUE

Firstly, every time we call it we have to do a global lookup for "np", an attribute lookup for "linalg" and another for "norm", and the overhead of merely calling the function can equate to dozens of Python instructions.

Lastly, we waste two operations storing the result and then reloading it for the return...

First pass at improvement: make the lookup faster, skip the store

def distance(pointA, pointB, _norm=np.linalg.norm):
    return _norm(pointA - pointB)

We get the far more streamlined:

>>> dis.dis(distance)
  2           0 LOAD_FAST                2 (_norm)
              2 LOAD_FAST                0 (pointA)
              4 LOAD_FAST                1 (pointB)
              6 BINARY_SUBTRACT
              8 CALL_FUNCTION            1
             10 RETURN_VALUE

The function call overhead still amounts to some work, though. And you'll want to do benchmarks to determine whether you might be better off doing the math yourself:

def distance(pointA, pointB):
    return (
        ((pointA.x - pointB.x) ** 2) +
        ((pointA.y - pointB.y) ** 2) +
        ((pointA.z - pointB.z) ** 2)
    ) ** 0.5  # fast sqrt

On some platforms, **0.5 is faster than math.sqrt. Your mileage may vary.
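
If you want to check this on your own machine, a quick timeit comparison along these lines (a sketch; the absolute numbers will differ by platform and Python version) will settle it:

import math
import timeit

setup = "import math; x = 2.345"
print(timeit.timeit("x ** 0.5", setup=setup))      # exponent operator
print(timeit.timeit("math.sqrt(x)", setup=setup))  # math.sqrt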

Advanced performance notes

Why are you calculating distance? If the sole purpose is to display it,

 print("The target is %.2fm away" % (distance(a, b)))

move along. But if you're comparing distances, doing range checks, etc., I'd like to add some useful performance observations.

Let’s take two cases: sorting by distance or culling a list to items that meet a range constraint.

# Ultra naive implementations. Hold onto your hat.

def sort_things_by_distance(origin, things):
    things.sort(key=lambda thing: distance(origin, thing))
    return things

def in_range(origin, range, things):
    things_in_range = []
    for thing in things:
        if distance(origin, thing) <= range:
            things_in_range.append(thing)
    return things_in_range

The first thing we need to remember is that we are using Pythagoras to calculate the distance (dist = sqrt(x^2 + y^2 + z^2)) so we're making a lot of sqrt calls. Math 101:

dist = root ( x^2 + y^2 + z^2 )
:.
dist^2 = x^2 + y^2 + z^2
and
sq(N) < sq(M) iff M > N
and
sq(N) > sq(M) iff N > M
and
sq(N) = sq(M) iff N == M

In short: until we actually require the distance in a unit of X rather than X^2, we can eliminate the hardest part of the calculations.

# Still naive, but much faster.

def distance_sq(left, right):
    """ Returns the square of the distance between left and right. """
    return (
        ((left.x - right.x) ** 2) +
        ((left.y - right.y) ** 2) +
        ((left.z - right.z) ** 2)
    )

def sort_things_by_distance(origin, things):
    things.sort(key=lambda thing: distance_sq(origin, thing))
    return things

def in_range(origin, range, things):
    things_in_range = []

    # Remember that sqrt(N)**2 == N, so if we square
    # range, we don't need to root the distances.
    range_sq = range**2

    for thing in things:
        if distance_sq(origin, thing) <= range_sq:
            things_in_range.append(thing)
    return things_in_range

Great, both functions no longer do any expensive square roots. That'll be much faster. We can also improve in_range by converting it to a generator:

def in_range(origin, range, things):
    range_sq = range**2
    yield from (thing for thing in things
                if distance_sq(origin, thing) <= range_sq)

This especially has benefits if you are doing something like:

if any(in_range(origin, max_dist, things)):
    ...

But if the very next thing you are going to do requires a distance,

for nearby in in_range(origin, walking_distance, hotdog_stands):
    print("%s %.2fm" % (nearby.name, distance(origin, nearby)))

consider yielding tuples:

def in_range_with_dist_sq(origin, range, things):
    range_sq = range**2
    for thing in things:
        dist_sq = distance_sq(origin, thing)
        if dist_sq <= range_sq: yield (thing, dist_sq)

This can be especially useful if you might chain range checks ('find things that are near X and within Nm of Y'), since you don't have to calculate the distance again.
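
A minimal sketch of that kind of chaining (the names X, Y and the ranges here are made up for illustration):

def near_x_and_y(X, Y, range_x, range_y, things):
    range_y_sq = range_y ** 2
    # Reuse the squared distance from X that the first filter already computed.
    for thing, dist_sq_from_x in in_range_with_dist_sq(X, range_x, things):
        if distance_sq(Y, thing) <= range_y_sq:
            yield thing, dist_sq_from_x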

But what if we're searching a really large list of things and we anticipate a lot of them not being worth consideration?

There is actually a very simple optimization:

def in_range_all_the_things(origin, range, things):
    range_sq = range**2
    for thing in things:
        dist_sq = (origin.x - thing.x) ** 2
        if dist_sq <= range_sq:
            dist_sq += (origin.y - thing.y) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.z - thing.z) ** 2
                if dist_sq <= range_sq:
                    yield thing

Whether this is useful will depend on the size of 'things'.

def in_range_all_the_things(origin, range, things):
    range_sq = range**2
    if len(things) >= 4096:
        for thing in things:
            dist_sq = (origin.x - thing.x) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.y - thing.y) ** 2
                if dist_sq <= range_sq:
                    dist_sq += (origin.z - thing.z) ** 2
                    if dist_sq <= range_sq:
                        yield thing
    elif len(things) > 32:
        for thing in things:
            dist_sq = (origin.x - thing.x) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.y - thing.y) ** 2 + (origin.z - thing.z) ** 2
                if dist_sq <= range_sq:
                    yield thing
    else:
        # Small list: the early-out bookkeeping isn't worth it; just
        # compute the squared distance and range-check it.
        for thing in things:
            if distance_sq(origin, thing) <= range_sq:
                yield thing

And again, consider yielding the dist_sq. Our hotdog example then becomes:

# Chaining generators
info = in_range_with_dist_sq(origin, walking_distance, hotdog_stands)
info = ((stand, dist_sq**0.5) for stand, dist_sq in info)
for stand, dist in info:
    print("%s %.2fm" % (stand, dist))

import numpy as np
from scipy.spatial import distance

input_arr = np.array([[0, 3, 0], [2, 0, 0], [0, 1, 3], [0, 1, 2], [-1, 0, 1], [1, 1, 1]])
test_case = np.array([0, 0, 0])
dst = []
for i in range(6):
    temp = distance.euclidean(test_case, input_arr[i])
    dst.append(temp)
print(dst)

import numpy as np
# any two Python lists as two points
a = [0, 0]
b = [3, 4]

First convert the lists to NumPy arrays:

print(np.linalg.norm(np.array(a) - np.array(b)))

Or work directly from the Python lists with np.subtract:

print(np.linalg.norm(np.subtract(a, b)))


The other answers work for floating point numbers, but they do not correctly compute the distance for integer dtypes, which are subject to overflow and underflow. Note that even scipy.spatial.distance.euclidean has this issue:

>>> a1 = np.array([1], dtype='uint8')
>>> a2 = np.array([2], dtype='uint8')
>>> a1 - a2
array([255], dtype=uint8)
>>> np.linalg.norm(a1 - a2)
255.0
>>> from scipy.spatial import distance
>>> distance.euclidean(a1, a2)
255.0

This is common, since many image libraries represent an image as an ndarray with dtype="uint8". This means that if you have a greyscale image which consists of very dark grey pixels (say all the pixels have colour #000001) and you're diffing it against a black image (#000000), you can end up with x - y consisting of 255 in all cells, which registers as the two images being very far apart from each other. For unsigned integer types (e.g. uint8), you can safely compute the distance in NumPy as:

np.linalg.norm(np.maximum(x, y) - np.minimum(x, y))

For signed integer types, you can cast to a float first:

np.linalg.norm(x.astype("float") - y.astype("float"))

For image data specifically, you can use opencv's norm method:

import cv2
cv2.norm(x, y, cv2.NORM_L2)

import math

dist = math.hypot(math.hypot(xa-xb, ya-yb), za-zb)
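
Note that since Python 3.8, math.hypot accepts any number of arguments, so the nesting isn't needed (same variable names assumed):

dist = math.hypot(xa - xb, ya - yb, za - zb)  # Python 3.8+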

You can easily use the formula

distance = np.sqrt(np.sum(np.square(a-b)))

which simply applies Pythagoras' theorem to calculate the distance, by adding the squares of Δx, Δy and Δz and taking the square root of the result.


Another instance of this problem solving method:

def dist(x, y):
    return numpy.sqrt(numpy.sum((x - y)**2))

a = numpy.array((xa,ya,za))
b = numpy.array((xb,yb,zb))
dist_a_b = dist(a,b)

A nice one-liner:

dist = numpy.linalg.norm(a-b)

However, if speed is a concern, I would recommend experimenting on your machine. I've found that using the math library's sqrt with the ** operator for the square is much faster on my machine than the one-liner NumPy solution.

I ran my tests using this simple program:

#!/usr/bin/python
import math
import numpy
from random import uniform

def fastest_calc_dist(p1,p2):
    return math.sqrt((p2[0] - p1[0]) ** 2 +
                     (p2[1] - p1[1]) ** 2 +
                     (p2[2] - p1[2]) ** 2)

def math_calc_dist(p1,p2):
    return math.sqrt(math.pow((p2[0] - p1[0]), 2) +
                     math.pow((p2[1] - p1[1]), 2) +
                     math.pow((p2[2] - p1[2]), 2))

def numpy_calc_dist(p1,p2):
    return numpy.linalg.norm(numpy.array(p1)-numpy.array(p2))

TOTAL_LOCATIONS = 1000

p1 = dict()
p2 = dict()
for i in range(0, TOTAL_LOCATIONS):
    p1[i] = (uniform(0,1000),uniform(0,1000),uniform(0,1000))
    p2[i] = (uniform(0,1000),uniform(0,1000),uniform(0,1000))

total_dist = 0
for i in range(0, TOTAL_LOCATIONS):
    for j in range(0, TOTAL_LOCATIONS):
        dist = fastest_calc_dist(p1[i], p2[j]) #change this line for testing
        total_dist += dist

print(total_dist)

On my machine, math_calc_dist runs much faster than numpy_calc_dist: 1.5 seconds versus 23.5 seconds.

To get a measurable difference between fastest_calc_dist and math_calc_dist I had to up TOTAL_LOCATIONS to 6000. Then fastest_calc_dist takes ~50 seconds while math_calc_dist takes ~60 seconds.

You can also experiment with numpy.sqrt and numpy.square though both were slower than the math alternatives on my machine.
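
Such a variant could look roughly like this (a sketch in the same style as the functions above, not the exact code benchmarked):

def numpy_sqrt_calc_dist(p1, p2):
    d = numpy.array(p1) - numpy.array(p2)
    return numpy.sqrt(numpy.sum(numpy.square(d)))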

My tests were run with Python 2.6.6.


With Python 3.8, it's very easy.

https://docs.python.org/3/library/math.html#math.dist

math.dist(p, q)

Return the Euclidean distance between two points p and q, each given as a sequence (or iterable) of coordinates. The two points must have the same dimension.

Roughly equivalent to:

sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))


I found a 'dist' function in matplotlib.mlab, but I don't think it's handy enough. (Note that matplotlib.mlab.dist has since been deprecated and removed from recent Matplotlib releases.)

I'm posting it here just for reference.

import numpy as np
from matplotlib import mlab

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

# Distance between a and b
dis = mlab.dist(a, b)

For anyone interested in computing multiple distances at once, I've done a little comparison using perfplot (a small project of mine).

The first piece of advice is to organize your data such that the arrays have dimension (3, n) (and are C-contiguous, obviously). If adding happens in the contiguous first dimension, things are faster, and it doesn't matter too much if you use sqrt-sum with axis=0, linalg.norm with axis=0, or

a_min_b = a - b
numpy.sqrt(numpy.einsum('ij,ij->j', a_min_b, a_min_b))

which is, by a slight margin, the fastest variant. (That actually holds true for just one row as well.)

The variants where you sum up over the second axis, axis=1, are all substantially slower.

[Plot: runtime of each variant vs. number of points, generated by perfplot]

Code to reproduce the plot:

import numpy
import perfplot
from scipy.spatial import distance


def linalg_norm(data):
    a, b = data[0]
    return numpy.linalg.norm(a - b, axis=1)


def linalg_norm_T(data):
    a, b = data[1]
    return numpy.linalg.norm(a - b, axis=0)


def sqrt_sum(data):
    a, b = data[0]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=1))


def sqrt_sum_T(data):
    a, b = data[1]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=0))


def scipy_distance(data):
    a, b = data[0]
    return list(map(distance.euclidean, a, b))


def sqrt_einsum(data):
    a, b = data[0]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->i", a_min_b, a_min_b))


def sqrt_einsum_T(data):
    a, b = data[1]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->j", a_min_b, a_min_b))


def setup(n):
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    out0 = numpy.array([a, b])
    out1 = numpy.array([a.T, b.T])
    return out0, out1


perfplot.save(
    "norm.png",
    setup=setup,
    n_range=[2 ** k for k in range(22)],
    kernels=[
        linalg_norm,
        linalg_norm_T,
        scipy_distance,
        sqrt_sum,
        sqrt_sum_T,
        sqrt_einsum,
        sqrt_einsum_T,
    ],
    xlabel="len(x), len(y)",
)

Here's some concise code for the Euclidean distance, given two points represented as lists:

def distance(v1, v2):
    return sum((x - y)**2 for x, y in zip(v1, v2)) ** 0.5

Having a and b as you defined them, you can also use:

distance = np.sqrt(np.sum((a-b)**2))

You can just subtract the vectors and then take the inner product of the result with itself.

Following your example,

a = numpy.array((xa, ya, za))
b = numpy.array((xb, yb, zb))

tmp = a - b
sum_squared = numpy.dot(tmp.T, tmp)
result = numpy.sqrt(sum_squared)

There's a function for that in SciPy: scipy.spatial.distance.euclidean.

Example:

from scipy.spatial import distance
a = (1, 2, 3)
b = (4, 5, 6)
dst = distance.euclidean(a, b)

First find the difference of the two arrays. Then square it element-wise with NumPy's multiply, sum the result, and finally take the square root of that sum.

import numpy as np

def findEuclideanDistance(a, b):
    euclidean_distance = a - b
    euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
    euclidean_distance = np.sqrt(euclidean_distance)
    return euclidean_distance

I like np.dot (dot product):

import numpy as np

a = np.array((xa, ya, za))
b = np.array((xb, yb, zb))

distance = (np.dot(a - b, a - b)) ** .5

Starting with Python 3.8, the math module directly provides the dist function, which returns the Euclidean distance between two points (given as tuples or lists of coordinates):

from math import dist

dist((1, 2, 6), (-2, 3, 2)) # 5.0990195135927845

And if you're working with lists:

dist([1, 2, 6], [-2, 3, 2]) # 5.0990195135927845

It can be done like the following. I don't know how fast it is, but it's not using NumPy.

from math import sqrt
a = (1, 2, 3) # Data point 1
b = (4, 5, 6) # Data point 2
print(sqrt(sum((x - y)**2 for x, y in zip(a, b))))

Calculate the Euclidean distance for multidimensional space:

import math

x = [1, 2, 6]
y = [-2, 3, 2]

dist = math.sqrt(sum((xi - yi)**2 for xi, yi in zip(x, y)))
print(dist)  # 5.0990195135927845