[python] Shared-memory objects in multiprocessing

This is the intended use case for Ray, which is a library for parallel and distributed Python. Under the hood, it serializes objects using the Apache Arrow data layout (which is a zero-copy format) and stores them in a shared-memory object store so they can be accessed by multiple processes without creating copies.

The code would look like the following.

import numpy as np
import ray

ray.init()

@ray.remote
def func(array, param):
    # Do stuff.
    return 1

array = np.ones(10**6)
# Store the array in the shared memory object store once
# so it is not copied multiple times.
array_id = ray.put(array)

result_ids = [func.remote(array_id, i) for i in range(4)]
output = ray.get(result_ids)

If you don't call ray.put then the array will still be stored in shared memory, but that will be done once per invocation of func, which is not what you want.

Note that this will work not only for arrays but also for objects that contain arrays, e.g., dictionaries mapping ints to arrays as below.

You can compare the performance of serialization in Ray versus pickle by running the following in IPython.

import numpy as np
import pickle
import ray

ray.init()

x = {i: np.ones(10**7) for i in range(20)}

# Time Ray.
%time x_id = ray.put(x)  # 2.4s
%time new_x = ray.get(x_id)  # 0.00073s

# Time pickle.
%time serialized = pickle.dumps(x)  # 2.6s
%time deserialized = pickle.loads(serialized)  # 1.9s

Serialization with Ray is only slightly faster than pickle, but deserialization is 1000x faster because of the use of shared memory (this number will of course depend on the object).

See the Ray documentation. You can read more about fast serialization using Ray and Arrow. Note I'm one of the Ray developers.

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to numpy

Unable to allocate array with shape and data type How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? Numpy, multiply array with scalar TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array Could not install packages due to a "Environment error :[error 13]: permission denied : 'usr/local/bin/f2py'" Pytorch tensor to numpy array Numpy Resize/Rescale Image what does numpy ndarray shape do? How to round a numpy array? numpy array TypeError: only integer scalar arrays can be converted to a scalar index

Examples related to parallel-processing

Custom thread pool in Java 8 parallel stream How to do parallel programming in Python? Should I always use a parallel stream when possible? How to create threads in nodejs Shared-memory objects in multiprocessing How do I parallelize a simple Python loop? No ConcurrentList<T> in .Net 4.0? What are the differences between stateless and stateful systems, and how do they impact parallelism? How to abort a Task like aborting a Thread (Thread.Abort method)? Can Powershell Run Commands in Parallel?

Examples related to multiprocessing

Passing multiple parameters to pool.map() function in Python Dead simple example of using Multiprocessing Queue, Pool and Locking Using multiprocessing.Process with a maximum number of simultaneous processes Multiprocessing a for loop? RuntimeError on windows trying python multiprocessing How to use multiprocessing queue in Python? Shared-memory objects in multiprocessing Python multiprocessing PicklingError: Can't pickle <type 'function'> multiprocessing.Pool: When to use apply, apply_async or map? How to troubleshoot an "AttributeError: __exit__" in multiproccesing in Python?

Examples related to shared-memory

Shared-memory objects in multiprocessing How to use shared memory with Linux in C Delete all SYSTEM V shared memory and semaphores on UNIX-like systems