Most efficient way to map function over numpy array

Question

What is the most efficient way to map a function over a numpy array? The way I've been doing it in my current project is as follows:

    import numpy as np

    x = np.array([1, 2, 3, 4, 5])

    # Obtain array of square of each element in x
    squarer = lambda t: t ** 2
    squares = np.array([squarer(xi) for xi in x])

However, this seems like it is probably very inefficient, since I am using a list comprehension to construct the new array as a Python list before converting it back to a numpy array.

Can we do better?

User · Answer

All the answers above compare well, but consider the case where you need a custom function for mapping, you have a numpy.ndarray, and you need to retain the shape of the array.

I have compared just two approaches, and both retain the shape of the ndarray. I used an array with 1 million entries for the comparison. Here I use the square function, which is also built into numpy and gives a great performance boost; I only needed some function for the test, so you can substitute a function of your choice.

    import numpy, time

    def timeit():
        y = numpy.arange(1000000)

        # list comprehension, then building a numpy array
        now = time.time()
        numpy.array([x * x for x in y.reshape(-1)]).reshape(y.shape)
        print(time.time() - now)

        # numpy.fromiter with a generator expression
        now = time.time()
        numpy.fromiter((x * x for x in y.reshape(-1)), y.dtype).reshape(y.shape)
        print(time.time() - now)

        # built-in numpy function
        now = time.time()
        numpy.square(y)
        print(time.time() - now)

Output:

    >>> timeit()
    1.162431240081787     # list comprehension and then building numpy array
    1.0775556564331055    # from numpy.fromiter
    0.002948284149169922  # using inbuilt function

Here you can clearly see that numpy.fromiter does somewhat better than the simple approach, and that if a built-in function is available you should use it.
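The shape-preserving pattern from the timing code can be wrapped in a small helper. This is only a sketch: the name `map_keep_shape` and its `dtype` parameter are made up for illustration and are not part of numpy.

```python
import numpy as np


def map_keep_shape(func, arr, dtype=None):
    """Apply a scalar Python function to every element, keeping the shape.

    Hypothetical helper: the output dtype defaults to the input dtype,
    which is only right when func returns values of that same type.
    """
    arr = np.asarray(arr)
    out_dtype = arr.dtype if dtype is None else dtype
    # Flatten, map element by element, then restore the original shape.
    flat = np.fromiter((func(v) for v in arr.ravel()),
                       dtype=out_dtype, count=arr.size)
    return flat.reshape(arr.shape)


grid = np.arange(6).reshape(2, 3)
squared = map_keep_shape(lambda v: v * v, grid)
print(squared)
```

Passing `count=arr.size` lets `np.fromiter` allocate the output once instead of growing it while it consumes the generator. The loop itself still runs in Python, so this does not close the gap to `numpy.square`.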

User · Answer

How about using numpy.vectorize?

    import numpy as np
    x = np.array([1, 2, 3, 4, 5])
    squarer = lambda t: t ** 2
    vfunc = np.vectorize(squarer)
    vfunc(x)

Output:

    array([ 1,  4,  9, 16, 25])
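One detail worth knowing: unless told otherwise, `np.vectorize` determines the output dtype by calling the function on the first element of the input. A minimal sketch of pinning the dtype with the `otypes` argument instead (the variable names here are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
squarer = lambda t: t ** 2

# otypes fixes the output dtype up front, so it no longer depends on
# whatever the function happens to return for the first element.
vfunc_float = np.vectorize(squarer, otypes=[float])
result = vfunc_float(x)
print(result, result.dtype)
```

This changes only the result type. The function is still called once per element from Python, so it is a convenience rather than a speed-up.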

User · Answer

There are numexpr, numba and cython around; the goal of this answer is to take these possibilities into consideration.

But first let's state the obvious: no matter how you map a Python function onto a numpy array, it stays a Python function. That means that for every evaluation the numpy-array element must be converted to a Python object (e.g. a Float), and all calculations are done with Python objects, which means paying the overhead of the interpreter, dynamic dispatch and immutable objects.

So which machinery is used to actually loop through the array doesn't play a big role because of the overhead mentioned above: it stays much slower than using numpy's built-in functionality.

Let's take a look at the following example:

    # numpy-functionality
    def f(x):
        return x+2*x*x+4*x*x*x

    # python-function as ufunc
    import numpy as np
    vf=np.vectorize(f)
    vf.__name__="vf"

np.vectorize is picked as a representative of the pure-Python function class of approaches. Using perfplot (see code in the appendix of this answer) we get the following running times:

[plot]

We can see that the numpy approach is 10x-100x faster than the pure-Python version. The decrease of performance for bigger array sizes is probably because the data no longer fits the cache.

It is also worth mentioning that vectorize uses a lot of memory, so often memory usage is the bottleneck (see related SO-question). Also note that numpy's documentation on np.vectorize states that it is "provided primarily for convenience, not for performance".

Other tools should be used when performance is desired. Besides writing a C extension from scratch, there are the following possibilities.

One often hears that numpy performance is as good as it gets, because it is pure C under the hood. Yet there is a lot of room for improvement!

The vectorized numpy version uses a lot of additional memory and memory accesses. The numexpr library tries to tile the numpy arrays and thus get better cache utilization:

    # less cache misses than numpy-functionality
    import numexpr as ne
    def ne_f(x):
        return ne.evaluate("x+2*x*x+4*x*x*x")

Leads to the following comparison:

[plot]

I cannot explain everything in the plot above: we can see a bigger overhead for the numexpr library at the beginning, but because it utilizes the cache better it is about 10 times faster for bigger arrays!

Another approach is to jit-compile the function and thus get a real pure-C ufunc. This is numba's approach:

    # runtime generated C-function as ufunc
    import numba as nb
    @nb.vectorize(target="cpu")
    def nb_vf(x):
        return x+2*x*x+4*x*x*x

It is 10 times faster than the original numpy approach:

[plot]

However, the task is embarrassingly parallelizable, thus we could also use prange in order to calculate the loop in parallel:

    @nb.njit(parallel=True)
    def nb_par_jitf(x):
        y=np.empty(x.shape)
        for i in nb.prange(len(x)):
            y[i]=x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
        return y

As expected, the parallel function is slower for smaller inputs, but faster (almost a factor of 2) for larger sizes:

[plot]

While numba specializes in optimizing operations with numpy arrays, Cython is a more general tool. It is more complicated to extract the same performance as with numba; often it comes down to llvm (numba) vs. the local compiler (gcc/MSVC):

    %%cython -c=/openmp -a
    import numpy as np
    import cython

    #single core:
    @cython.boundscheck(False)
    @cython.wraparound(False)
    def cy_f(double[::1] x):
        y_out=np.empty(len(x))
        cdef Py_ssize_t i
        cdef double[::1] y=y_out
        for i in range(len(x)):
            y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
        return y_out

    #parallel:
    from cython.parallel import prange
    @cython.boundscheck(False)
    @cython.wraparound(False)
    def cy_par_f(double[::1] x):
        y_out=np.empty(len(x))
        cdef double[::1] y=y_out
        cdef Py_ssize_t i
        cdef Py_ssize_t n = len(x)
        for i in prange(n, nogil=True):
            y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
        return y_out

Cython results in somewhat slower functions:

[plot]

Conclusion

Obviously, testing only one function doesn't prove anything. Also one should keep in mind that for the chosen function example, the bandwidth of the memory was the bottleneck for sizes larger than 10^5 elements, thus we had the same performance for numba, numexpr and cython in this region.

In the end, the ultimate answer depends on the type of function, hardware, Python distribution and other factors. For example, the Anaconda distribution uses Intel's VML for numpy's functions and thus easily outperforms numba (unless it uses SVML, see this SO-post) for transcendental functions like exp, sin, cos and similar; see e.g. the following SO-post.

Yet from this investigation and from my experience so far, I would state that numba seems to be the easiest tool with the best performance as long as no transcendental functions are involved.

Plotting running times with the perfplot package:

    import perfplot
    perfplot.show(
        setup=lambda n: np.random.rand(n),
        n_range=[2**k for k in range(0,24)],
        kernels=[
            f,
            vf,
            ne_f,
            nb_vf, nb_par_jitf,
            cy_f, cy_par_f,
            ],
        logx=True,
        logy=True,
        xlabel='len(x)'
        )

User · Answer

I believe in newer versions of numpy (I use 1.13) you can simply call the function by passing the numpy array to the function that you wrote for the scalar type. It will automatically apply the function call to each element of the numpy array and return you another numpy array.

    >>> import numpy as np
    >>> squarer = lambda t: t ** 2
    >>> x = np.array([1, 2, 3, 4, 5])
    >>> squarer(x)
    array([ 1,  4,  9, 16, 25])
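A caveat, sketched here with made-up function names: the direct call works because `**` (like the other arithmetic operators and numpy's own functions) is defined elementwise for arrays. A scalar function built on something that only accepts Python scalars, such as `math.sqrt`, does not get this behaviour for free.

```python
import math

import numpy as np

x = np.array([1.0, 4.0, 9.0])


def np_root(t):
    # np.sqrt is a ufunc, so it accepts scalars and arrays alike.
    return np.sqrt(t)


def math_root(t):
    # math.sqrt needs a single Python float; a multi-element array
    # cannot be converted to one.
    return math.sqrt(t)


print(np_root(x))

try:
    math_root(x)
except TypeError as exc:
    print("math.sqrt on an array failed:", exc)
```

When the scalar function can be rewritten with numpy equivalents, the direct call keeps working; otherwise one of the looping approaches from the other answers is needed.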

User · Answer

TL;DR

As noted by @user2357112, a "direct" method of applying the function is always the fastest and simplest way to map a function over NumPy arrays:

    import numpy as np
    x = np.array([1, 2, 3, 4, 5])
    f = lambda x: x ** 2
    squares = f(x)

Generally avoid np.vectorize, as it does not perform well, and has (or had) a number of issues. If you are handling other data types, you may want to investigate the other methods shown below.

Comparison of methods

Here are some simple tests to compare four methods of mapping a function, in this example using Python 3.6 and NumPy 1.15.4. First, the set-up functions for testing:

    import timeit
    import numpy as np

    f = lambda x: x ** 2
    vf = np.vectorize(f)

    def test_array(x, n):
        t = timeit.timeit(
            'np.array([f(xi) for xi in x])',
            'from __main__ import np, x, f', number=n)
        print('array: {0:.3f}'.format(t))

    def test_fromiter(x, n):
        t = timeit.timeit(
            'np.fromiter((f(xi) for xi in x), x.dtype, count=len(x))',
            'from __main__ import np, x, f', number=n)
        print('fromiter: {0:.3f}'.format(t))

    def test_direct(x, n):
        t = timeit.timeit(
            'f(x)',
            'from __main__ import x, f', number=n)
        print('direct: {0:.3f}'.format(t))

    def test_vectorized(x, n):
        t = timeit.timeit(
            'vf(x)',
            'from __main__ import x, vf', number=n)
        print('vectorized: {0:.3f}'.format(t))

Testing with five elements (sorted from fastest to slowest):

    x = np.array([1, 2, 3, 4, 5])
    n = 100000
    test_direct(x, n)      # 0.265
    test_fromiter(x, n)    # 0.479
    test_array(x, n)       # 0.865
    test_vectorized(x, n)  # 2.906

With 100s of elements:

    x = np.arange(100)
    n = 10000
    test_direct(x, n)      # 0.030
    test_array(x, n)       # 0.501
    test_vectorized(x, n)  # 0.670
    test_fromiter(x, n)    # 0.883

And with 1000s of array elements or more:

    x = np.arange(1000)
    n = 1000
    test_direct(x, n)      # 0.007
    test_fromiter(x, n)    # 0.479
    test_array(x, n)       # 0.516
    test_vectorized(x, n)  # 0.945

Different versions of Python/NumPy and compiler optimization will have different results, so do a similar test for your environment.

User · Answer

It seems no one has mentioned a built-in factory method for producing a ufunc in the numpy package: np.frompyfunc, which I have tested against np.vectorize and which outperformed it by about 20~30%. Of course it will not perform as well as prescribed C code or even numba (which I have not tested), but it can be a better alternative than np.vectorize.

    f = lambda x, y: x * y
    f_arr = np.frompyfunc(f, 2, 1)
    vf = np.vectorize(f)
    arr = np.linspace(0, 1, 10000)

    %timeit f_arr(arr, arr) # 307ms
    %timeit vf(arr, arr) # 450ms

I have also tested larger samples, and the improvement is proportional. See the documentation also here.
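One thing to watch for with `np.frompyfunc`: the ufunc it builds returns arrays of dtype `object`, so a cast is usually wanted before doing further numeric work. A small sketch with illustrative variable names:

```python
import numpy as np

# nin=2 inputs, nout=1 output
multiply = np.frompyfunc(lambda a, b: a * b, 2, 1)

arr = np.linspace(0, 1, 5)
raw = multiply(arr, arr)      # dtype is object, not float64
result = raw.astype(float)    # explicit cast back to a numeric dtype

print(raw.dtype, result.dtype)
print(result)
```

The cast adds a pass over the data, so it is worth including in any timing comparison against `np.vectorize`.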

User · Answer

As mentioned in this post, just use generator expressions like so:

    numpy.fromiter((<some_func>(x) for x in <something>), <dtype>, <size of something>)
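Filled in with concrete values, the template could look like the following. The function `cube` and the input `values` are placeholders chosen for illustration.

```python
import numpy as np


def cube(v):
    return v ** 3


values = range(5)

# dtype is required by fromiter; count is optional but lets numpy
# allocate the output array once instead of growing it.
result = np.fromiter((cube(v) for v in values),
                     dtype=np.int64, count=len(values))
print(result)
```

`np.fromiter` always produces a 1-D array, so a multidimensional input has to be flattened first and reshaped afterwards.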

User · Answer

Use numpy.fromfunction(function, shape, **kwargs)

See "https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfunction.html"
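Note that `numpy.fromfunction` calls the function with arrays of index coordinates, not with the values of an existing array. Mapping over existing data therefore needs an extra indexing step. A hedged sketch, where `data` is an illustrative array:

```python
import numpy as np

# The function receives index arrays i and j, one per axis.
index_sum = np.fromfunction(lambda i, j: i + j, (2, 3), dtype=int)
print(index_sum)

# To involve existing values, index into the array inside the function.
data = np.array([1, 2, 3, 4, 5])
squares = np.fromfunction(lambda i: data[i] ** 2, data.shape, dtype=int)
print(squares)
```

The function is called once with whole index arrays, so it must itself be written in terms of array operations. An arbitrary scalar-only Python function will not work here without one of the other approaches.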

User · Answer

Edit: the original answer was misleading; np.sqrt was applied directly to the array, just with a small overhead.

In multidimensional cases where you want to apply a built-in function that operates on a 1d array, numpy.apply_along_axis is a good choice, also for more complex function compositions from numpy and scipy.

Previous misleading statement:

Adding the method

    def along_axis(x):
        return np.apply_along_axis(f, 0, x)

to the perfplot code gives performance results close to np.sqrt.
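To illustrate the multidimensional case, here is a small sketch of `np.apply_along_axis` with functions that operate on 1-D slices. The matrix `m` and both lambdas are made up for the example.

```python
import numpy as np

m = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# axis=0: the function receives each column as a 1-D array and
# returns a 1-D array of the same length.
centered = np.apply_along_axis(lambda col: col - col.mean(), 0, m)

# axis=1: the function receives each row and returns a scalar,
# so that axis is collapsed in the result.
spans = np.apply_along_axis(lambda row: row.max() - row.min(), 1, m)

print(centered)
print(spans)
```

The slices are iterated in Python, so this is a convenience for functions that genuinely need a whole 1-D slice rather than a way to speed up elementwise work.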

User · Answer

    squares = squarer(x)

Arithmetic operations on arrays are automatically applied elementwise, with efficient C-level loops that avoid all the interpreter overhead that would apply to a Python-level loop or comprehension.

Most of the functions you'd want to apply to a NumPy array elementwise will just work, though some may need changes. For example, if doesn't work elementwise. You'd want to convert those to use constructs like numpy.where:

    def using_if(x):
        if x < 5:
            return x
        else:
            return x**2

becomes

    def using_where(x):
        return numpy.where(x < 5, x, x**2)
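A runnable version of that comparison, as a sketch, shows what actually happens when the `if` variant receives an array:

```python
import numpy as np


def using_if(x):
    # Works for scalars only: an array has no single truth value.
    if x < 5:
        return x
    return x ** 2


def using_where(x):
    # Elementwise selection between the two candidate results.
    return np.where(x < 5, x, x ** 2)


x = np.arange(8)

try:
    using_if(x)
except ValueError as exc:
    print("if on an array failed:", exc)

result = using_where(x)
print(result)
```

Keep in mind that `np.where` receives both candidate arrays already fully computed, so both "branches" are evaluated for every element. That matters when one of them is expensive or invalid for part of the input.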

User · Answer

I've tested all suggested methods plus np.array(list(map(f, x))) with perfplot (a small project of mine).

Message #1: If you can use numpy's native functions, do that.

If the function you're trying to vectorize already is vectorized (like the x**2 example in the original post), using that is much faster than anything else (note the log scale):

[plot]

If you actually need vectorization, it doesn't really matter much which variant you use.

[plot]

Code to reproduce the plots:

    import numpy as np
    import perfplot
    import math


    def f(x):
        # return math.sqrt(x)
        return np.sqrt(x)


    vf = np.vectorize(f)


    def array_for(x):
        return np.array([f(xi) for xi in x])


    def array_map(x):
        return np.array(list(map(f, x)))


    def fromiter(x):
        return np.fromiter((f(xi) for xi in x), x.dtype)


    def vectorize(x):
        return np.vectorize(f)(x)


    def vectorize_without_init(x):
        return vf(x)


    perfplot.show(
        setup=np.random.rand,
        n_range=[2 ** k for k in range(20)],
        kernels=[f, array_for, array_map, fromiter, vectorize, vectorize_without_init],
        xlabel="len(x)",
    )
