Simple Python Parallelism

Let’s say you have a function that’s slow and computationally intensive. It’s too complex to run on the GPU (with its thousands of simple cores), but the single core Python uses isn’t enough. Your machine has eight cores and you want to use all of them.

Well, how do you use them? Python’s GIL makes it difficult. Luckily, much more knowledgeable people have spent much more time getting around the GIL and have written many modules to tackle this problem. But looking at the docs for some of those modules, you get lost in threads, processes and coroutines (or at least I do).

Even the multiprocessing package doesn’t seem to have an obvious solution. Its documentation isn’t quite clear, and it’s hard to see where the speedups lie. It turns out provides the speedup you’re looking for. So, let’s look at the documentation:

  In [4]: from multiprocessing import Pool

  In [5]: pool = Pool()

  In [6]: pool.<tab>
  pool.Process         pool.close           pool.join            pool.terminate
  pool.apply           pool.imap  
  pool.apply_async     pool.imap_unordered  pool.map_async

  In [6]:
  Type:        instancemethod
  String form: <bound method of <multiprocessing.pool.Pool object at 0x103855ed0>>
  File:        /Users/scott/anaconda/
  Definition:, func, iterable, chunksize=None)
  Docstring:   Equivalent of `map()` builtin

  In [7]: map?
  Type:        builtin_function_or_method
  String form: <built-in function map>
  Namespace:   Python builtin
  Docstring:
  map(function, sequence[, sequence, ...]) -> list

  Return a list of the results of applying the function to the items of
  the argument sequence(s). If more than one sequence is given, the
  function is called with an argument list consisting of the corresponding
  item of each sequence, substituting None for missing values when not all
  sequences have the same length.  If the function is None, return a list of
  the items of the sequence (or a list of tuples if more than one sequence).

  In [8]:

That sure seems close to what we’re looking for. We want to apply a function to a range of inputs. The docs are a tad lacking but we can rightfully assume that the multiprocessing module will speed up our code.

But you wonder how easy this is to use. The other toolboxes you’ve seen are a mess, and you (fairly) expect this toolbox to be the same.

  def f(x):
      # complicated processing
      return x + 1

  y_serial = []
  x = range(100)
  for i in x:
      y_serial += [f(i)]
  y_parallel =, x)
  # y_serial == y_parallel!

To be clear, this works for any list-like input. You can have a list of images you want to download, tuples of (x, y) coordinates or a bunch of floats for scientific data processing. Any type of element works.

Running this code, we find that the serial result exactly matches the parallel result even with incredibly nitpicky floats. To be safe, we should call pool.close(); pool.join() after running this code: it can’t really hurt, especially if our computation takes a loooong time.

But wait. We have a method to run a function in parallel that returns the exact same result. Let’s do what all programmers should do and make it easy to call.

  def easy_parallelize(f, sequence):
      # I didn't see gains with .dummy; you might
      from multiprocessing import Pool
      from numpy import asarray
      pool = Pool(processes=8)
      # from multiprocessing.dummy import Pool
      # pool = Pool(16)

      # f is applied to every item of sequence; results are guaranteed to be in order
      result =, sequence)
      cleaned = [x for x in result if x is not None]
      cleaned = asarray(cleaned)
      # not optimal but safe
      pool.close()
      pool.join()
      return cleaned

While this works in this idealized post, it doesn’t always work in the real world: your function might take more than one input. To get around that, we use functools’ partial.

  def parallel_attribute(f):
      def easy_parallelize(f, sequence):
          # I didn't see gains with .dummy; you might
          from multiprocessing import Pool
          from numpy import asarray
          pool = Pool(processes=8)
          # from multiprocessing.dummy import Pool
          # pool = Pool(16)

          # f is applied to every item of sequence; results are guaranteed to be in order
          result =, sequence)
          cleaned = [x for x in result if x is not None]
          cleaned = asarray(cleaned)
          # not optimal but safe
          pool.close()
          pool.join()
          return cleaned
      from functools import partial
      # this assumes f takes one argument, fairly easy with Python's global scope
      return partial(easy_parallelize, f)

There. Now we can use some_function.parallel = parallel_attribute(some_function). Then, if we modify our functions accordingly, we can see speedups from this!

Using Python’s global scope, it’s pretty easy to adapt our function. We really just write a wrapper that takes one argument.

  def some_function(x, y, z):
      # complicated computation
      return x + y + z

  def some_function_parallel(x):
      # x is what we want to parallelize over;
      # y and z are picked up from the global scope
      return some_function(x, y, z)

  some_function.parallel = parallel_attribute(some_function_parallel)

Running some test code in the real world on a 2012 MacBook Air, I saw speedups of about 2 times, which makes sense for its 2 cores. On a 2010 iMac, I saw speedups of about 4 times, again matching that machine’s 4 cores. Now go and test it on your own machine!

Author: Scott S.