Process start method
The multiprocessing package allows you to start processes using a few different methods: 'fork', 'spawn' or 'forkserver'. Threading is also available by using 'threading'. For detailed information on the multiprocessing contexts, please refer to the multiprocessing documentation and caveats section. In short:
- fork
  Copies the parent process such that the child process is effectively identical. This includes copying everything currently in memory. This is sometimes useful, but other times useless or even a serious bottleneck. fork enables the use of copy-on-write shared objects (see Shared objects).
- spawn
  Starts a fresh Python interpreter where only those resources necessary are inherited.
- forkserver
  First starts a server process (using 'spawn'). Whenever a new process is needed, the parent process requests the server to fork a new process.
- threading
  Starts child threads. Suffers from the Global Interpreter Lock (GIL), but works fine for I/O intensive tasks.
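As a rough illustration of the contexts described above, the following sketch uses only the standard library multiprocessing module, not the WorkerPool API. The helper name describe_start_methods is hypothetical and exists only for this example. Note that 'threading' is not a standard library start method, so it will not appear in the result.

```python
import multiprocessing as mp


def describe_start_methods():
    """Map each start method supported on this platform to its context class name.

    Hypothetical helper, for illustration only.
    """
    return {
        name: type(mp.get_context(name)).__name__
        for name in mp.get_all_start_methods()
    }


# A context is bound to one start method; requesting it by name does not
# change the process-wide default start method.
spawn_ctx = mp.get_context("spawn")
print(spawn_ctx.get_start_method())
print(describe_start_methods())
```

The printed mapping depends on the platform: 'spawn' should always be present, while 'fork' and 'forkserver' are only listed where the operating system supports them.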
For an overview of start method availability and defaults, please refer to the following table:
Start method | Available on Unix | Available on Windows |
---|---|---|
fork | Yes (default) | No |
spawn | Yes | Yes (default) |
forkserver | Yes | No |
threading | Yes | Yes |
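Because availability differs per platform, code that must run on both Unix and Windows may want to choose a start method at runtime. The sketch below is one possible approach using the standard library; the function name pick_start_method and its preference order are assumptions made for this example, not part of any library.

```python
import multiprocessing as mp


def pick_start_method(preferred=("fork", "forkserver", "spawn")):
    """Return the first start method in `preferred` that this platform supports.

    Hypothetical helper: the default preference order is an assumption.
    """
    available = mp.get_all_start_methods()
    for name in preferred:
        if name in available:
            return name
    raise ValueError(f"none of {preferred!r} is available; got {available!r}")


print(pick_start_method())
```

The returned string could then be passed on as the start_method argument, for example WorkerPool(n_jobs=2, start_method=pick_start_method()).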
Spawn and forkserver
When using spawn or forkserver as the start method, be aware that global variables (constants are fine) may have a different value in the child process than you expect, because the child does not inherit the parent's memory. You also have to import packages within the called function:
import os

from mpire import WorkerPool  # assumed: WorkerPool comes from the mpire package

def failing_job(folder, filename):
    return os.path.join(folder, filename)

# This will fail because 'os' is not copied to the child processes
with WorkerPool(n_jobs=2, start_method='spawn') as pool:
    pool.map(failing_job, [('folder', '0.p3'), ('folder', '1.p3')])

def working_job(folder, filename):
    # Import inside the function so the module is available in the child process
    import os
    return os.path.join(folder, filename)

# This will work
with WorkerPool(n_jobs=2, start_method='spawn') as pool:
    pool.map(working_job, [('folder', '0.p3'), ('folder', '1.p3')])
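The caveat about global variables can be sketched with the standard library alone. The example below is a minimal illustration under stated assumptions, not the WorkerPool implementation: the names SETTING, read_setting, value_seen_by_child and demo are hypothetical, and a plain multiprocessing pool stands in for WorkerPool.

```python
import multiprocessing as mp

SETTING = "default"  # module-level global, assigned when the module is imported


def read_setting(_):
    """Return SETTING as seen by the process that runs this function."""
    return SETTING


def value_seen_by_child(start_method):
    """Run read_setting in a single child process started with `start_method`."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=1) as pool:
        return pool.map(read_setting, [None])[0]


def demo():
    """Change the global in the parent, then ask one child per start method."""
    global SETTING
    SETTING = "changed in parent"
    return {
        name: value_seen_by_child(name)
        for name in mp.get_all_start_methods()
    }
```

To try it, save the code in a script file and call print(demo()) under an if __name__ == '__main__': guard, which spawn and forkserver require. A spawned child re-imports the module rather than copying the parent's memory, so it is expected to see the import-time value, whereas a forked child inherits the value changed in the parent.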
A lot of effort has been put into making the progress bar, dashboard, and nested pools (with multiple progress bars) work well with spawn and forkserver, so these features should work fine with either start method.