Python Multiprocessing

Carl

OPINION:

Support for developers using parallelization directly in Python is poor.

If your problem cannot fit into a neat parallelization scheme, do not use Python to solve it.

Many libraries accomplish parallel tasks by deferring to native code, so that can be another option.

From AM session:

from multiprocessing import Pool
from time import perf_counter

if __name__ == '__main__':
    p = Pool()
    ulim = 1000000000
    start = perf_counter()
    sum(range(1, ulim))
    print("serial: {0:f}".format(perf_counter() - start))

    start = perf_counter()
    # clamp the last chunk so the parallel sum covers exactly range(1, ulim)
    input_list = [range(x, min(x + 1000, ulim)) for x in range(1, ulim, 1000)]
    sum(p.map(sum, input_list))
    print("parallel, inc. alloc.: {0:f}".format(perf_counter() - start))

    start = perf_counter()
    sum(p.map(sum, input_list))
    print("straight parallel: {0:f}".format(perf_counter() - start))

    p.close()
    p.join()

slightly more numbers than you had

Guess Run Times?

Try it and move on

How Parallelized?

a few fundamental approaches

Shared data containers that look a lot like atomic/synchronized values in Java

explicit shared state management via Manager

Process: looks basically like Thread in Java

Pool: very similar to the OpenMP approach: execute some loop structure in parallel and ignore the details of how it gets divvied up

Pool uses map and map_async for iterables, and apply and apply_async for single items

Explore Some Other Structures

Python + Multiprocessing vs. C++?

Alternative: Parallel Python Quickstart