Maximum number of active tasks¶
When you have tasks that take up a lot of memory you can do a few things:
Limit the number of jobs (i.e., the number of tasks currently being available to the workers, tasks that are in the queue ready to be processed).
Limit the number of active tasks
The first option is the most obvious one to save memory when the processes themselves use up much memory. The second is convenient when the argument list takes up too much memory. For example, suppose you want to kick off an enormous amount of jobs (let’s say a billion) of which the arguments take up 1 KB per task (e.g., large strings), then that task queue would take up ~1 TB of memory!
In such cases, a good rule of thumb would be to have twice the amount of active chunks of tasks than there are jobs. This means that when all workers complete their task at the same time each would directly be able to continue with another task. When workers take on their new tasks the generator of tasks is iterated to the point that again there would be twice the amount of active chunks of tasks.
In MPIRE, the maximum number of active tasks by default is set to n_jobs * chunk_size * 2
, so you don’t have to
tweak it for memory optimization. If, for whatever reason, you want to change this behavior, you can do so by setting
the max_active_tasks
parameter:
with WorkerPool(n_jobs=4) as pool:
results = pool.map(task, range(int(1e300)), iterable_len=int(1e300),
chunk_size=int(1e5), max_tasks_active=4 * int(1e5))
Note
Setting the max_tasks_active
parameter to a value lower than n_jobs * chunk_size
can result in some workers
not being able to do anything.