Worker insights

Worker insights show how efficiently your multiprocessing setup runs by tracking worker start up time, waiting time, and time spent executing tasks. Tracking is disabled by default, but can be enabled by setting enable_insights to True:

from mpire import WorkerPool

with WorkerPool(n_jobs=4, enable_insights=True) as pool:
    pool.map(task, range(100))

The overhead is minimal and should not be noticeable, even for very small tasks. You can retrieve the tracking results using mpire.WorkerPool.get_insights(), or use mpire.WorkerPool.print_insights() to print them directly to the console:

import time

from mpire import WorkerPool

def sleep_and_square(x):
    # Sleep for x milliseconds to simulate work (for illustration purposes)
    time.sleep(x / 1000)
    return x * x

with WorkerPool(n_jobs=4, enable_insights=True) as pool:
    pool.map(sleep_and_square, range(100))
    insights = pool.get_insights()
    print(insights)

# Output:
{'n_completed_tasks': [28, 24, 24, 24],
 'total_start_up_time': '0:00:00.038',
 'total_init_time': '0:00:00',
 'total_waiting_time': '0:00:00.798',
 'total_working_time': '0:00:04.980',
 'total_exit_time': '0:00:00',
 'total_time': '0:00:05.816',
 'start_up_time': ['0:00:00.010', '0:00:00.008', '0:00:00.008', '0:00:00.011'],
 'start_up_time_mean': '0:00:00.009',
 'start_up_time_std': '0:00:00.001',
 'start_up_ratio': 0.006610452621805033,
 'init_time': ['0:00:00', '0:00:00', '0:00:00', '0:00:00'],
 'init_time_mean': '0:00:00',
 'init_time_std': '0:00:00',
 'init_ratio': 0.0,
 'waiting_time': ['0:00:00.309', '0:00:00.311', '0:00:00.165', '0:00:00.012'],
 'waiting_time_mean': '0:00:00.199',
 'waiting_time_std': '0:00:00.123',
 'waiting_ratio': 0.13722942739284952,
 'working_time': ['0:00:01.142', '0:00:01.135', '0:00:01.278', '0:00:01.423'],
 'working_time_mean': '0:00:01.245',
 'working_time_std': '0:00:00.117',
 'working_ratio': 0.8561601182661567,
 'exit_time': ['0:00:00', '0:00:00', '0:00:00', '0:00:00'],
 'exit_time_mean': '0:00:00',
 'exit_time_std': '0:00:00',
 'exit_ratio': 0.0,
 'top_5_max_task_durations': ['0:00:00.099', '0:00:00.098', '0:00:00.097', '0:00:00.096',
                              '0:00:00.095'],
 'top_5_max_task_args': ['Arg 0: 99', 'Arg 0: 98', 'Arg 0: 97', 'Arg 0: 96', 'Arg 0: 95']}

We specified 4 workers, so there are 4 entries in the n_completed_tasks, start_up_time, init_time, waiting_time, working_time, and exit_time containers. Per worker, they show the number of completed tasks, the total start up time, the total time spent in the worker_init function, the total time spent waiting for new tasks, the total time spent in the main task function, and the total time spent in the worker_exit function, respectively. The insights also contain the mean, standard deviation, and ratio of each tracked time. The ratio is the total time for that part (summed over all workers) divided by total_time. In general, the higher the working ratio, the more efficient your multiprocessing setup is. Of course, your setup might still not be optimal because the task itself is inefficient, but timing the task itself is beyond the scope of MPIRE.
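
The ratios can be reproduced, approximately, from the totals in the example output above. The following is a minimal sketch: to_seconds is a hypothetical helper, not part of MPIRE, and it assumes the time strings have the H:MM:SS[.fff] form shown in the output. Because the printed totals are rounded to milliseconds, the results only match the reported ratios to about three decimal places.

```python
def to_seconds(value):
    """Convert a time string such as '0:00:04.980' to seconds.

    Hypothetical helper: assumes the 'H:MM:SS[.fff]' format shown above.
    """
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Totals copied from the example output above
insights = {
    "total_start_up_time": "0:00:00.038",
    "total_init_time": "0:00:00",
    "total_waiting_time": "0:00:00.798",
    "total_working_time": "0:00:04.980",
    "total_exit_time": "0:00:00",
    "total_time": "0:00:05.816",
}

total = to_seconds(insights["total_time"])

# Each ratio is that part's total time divided by the overall total time
ratios = {
    part: to_seconds(insights[f"total_{part}_time"]) / total
    for part in ("start_up", "init", "waiting", "working", "exit")
}

for part, ratio in ratios.items():
    print(f"{part}_ratio is roughly {ratio:.3f}")
```

In this example the five parts add up to the total time, so the ratios sum to 1 and the working ratio directly tells you which share of the workers' time went into the task function.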

Additionally, the insights keep track of the top 5 tasks that took the longest to run. The data is split into two containers: one for the durations and one for the arguments that were passed to the task function. Both are sorted by task duration in descending order, so index 0 of the args list corresponds to index 0 of the duration list, and so on.
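
Because the two containers share the same ordering, they can be paired with zip. This is a small sketch that uses the values from the example output above rather than a live pool; the slowest_tasks name is only illustrative.

```python
# Values copied from the example output above
insights = {
    "top_5_max_task_durations": ["0:00:00.099", "0:00:00.098", "0:00:00.097",
                                 "0:00:00.096", "0:00:00.095"],
    "top_5_max_task_args": ["Arg 0: 99", "Arg 0: 98", "Arg 0: 97",
                            "Arg 0: 96", "Arg 0: 95"],
}

# Both lists are sorted by duration, so zip pairs each duration with its arguments
slowest_tasks = list(zip(insights["top_5_max_task_durations"],
                         insights["top_5_max_task_args"]))

for rank, (duration, args) in enumerate(slowest_tasks, start=1):
    print(f"{rank}. {duration} ({args})")
```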

When using the MPIRE Dashboard you can track these insights in real-time. See Dashboard for more information.

Note

When using imap or imap_unordered you can view the insights during execution. Simply call get_insights() or print_insights() inside your loop where you process the results.
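
As a rough sketch of the note above, the loop below prints interim insights while results are still coming in. The run_with_live_insights helper and its report_every parameter are hypothetical names introduced for illustration; only WorkerPool, imap, and print_insights come from MPIRE. The interim numbers are snapshots and may lag slightly behind the actual progress.

```python
import time


def sleep_and_square(x):
    # Sleep for x milliseconds to simulate work
    time.sleep(x / 1000)
    return x * x


def run_with_live_insights(n_jobs=4, n_tasks=100, report_every=25):
    """Collect imap results and print interim insights (hypothetical helper)."""
    # Imported here so that only this function requires MPIRE to be installed
    from mpire import WorkerPool

    results = []
    with WorkerPool(n_jobs=n_jobs, enable_insights=True) as pool:
        for idx, result in enumerate(pool.imap(sleep_and_square, range(n_tasks)), start=1):
            results.append(result)
            if idx % report_every == 0:
                # Interim snapshot; the numbers keep changing until all tasks finish
                print(f"After {idx} results:")
                pool.print_insights()
    return results
```

Call run_with_live_insights() to try it out. When your platform starts processes with the spawn method, place that call under an if __name__ == "__main__": guard.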