Here’s a simple example of using C++17 execution policies (sequential and parallel) to sum a vector with std::reduce.
#include <vector>
#include <numeric>   // std::reduce
#include <execution> // std::execution::seq, std::execution::par
We have to load the Threading Building Blocks (TBB) library, which does the actual parallelization under the hood:
#pragma cling load("libtbb.so.2")
// 10'000'007 copies of 0.1, so the mathematically exact sum is 1000000.7
const std::vector<double> v(10'000'007, 0.1);
%%timeit
std::reduce(std::execution::seq, v.cbegin(), v.cend());
Output
203 ms +- 46.3 ms per loop (mean +- std. dev. of 7 runs 10 loops each)
%%timeit
std::reduce(std::execution::par, v.cbegin(), v.cend());
Output
54.5 ms +- 1.62 ms per loop (mean +- std. dev. of 7 runs 10 loops each)
auto s = std::reduce(std::execution::par, v.cbegin(), v.cend());
s
Output
1000000.7