Function std.parallelism.TaskPool.parallel
Implements a parallel foreach loop over a range. This works by implicitly creating and submitting one Task to the TaskPool for each worker thread. A work unit is a set of consecutive elements of range to be processed by a worker thread between communication with any other thread. The number of elements processed per work unit is controlled by the workUnitSize parameter. Smaller work units provide better load balancing, but larger work units avoid the overhead of communicating with other threads frequently to fetch the next work unit. Larger work units also avoid false sharing in cases where the range is being modified. The less time a single iteration of the loop takes, the larger workUnitSize should be. For very expensive loop bodies, workUnitSize should be 1. An overload that chooses a default work unit size is also available.
Prototypes
ParallelForeach!R parallel(R)(R range, size_t workUnitSize);
ParallelForeach!R parallel(R)(R range);
Examples
import std.math : log;
import std.parallelism : taskPool;

// Find the logarithm of every number from 1 to
// 10_000_000 in parallel.
auto logs = new double[10_000_000];

// Parallel foreach works with or without an index
// variable. It can be iterated by ref if range.front
// returns by ref.

// Iterate over logs using work units of size 100.
foreach (i, ref elem; taskPool.parallel(logs, 100))
{
    elem = log(i + 1.0);
}

// Same thing, but use the default work unit size.
//
// Timings on an Athlon 64 X2 dual core machine:
//
// Parallel foreach: 388 milliseconds
// Regular foreach:  619 milliseconds
foreach (i, ref elem; taskPool.parallel(logs))
{
    elem = log(i + 1.0);
}
Notes
The memory usage of this implementation is guaranteed to be constant in range.length.
Breaking from a parallel foreach loop via a break, labeled break, labeled continue, return or goto statement throws a ParallelForeachError.
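As a minimal sketch of this restriction (the array contents and the commented-out statements are purely illustrative), the loop below compiles, but uncommenting either early exit would cause a ParallelForeachError to be thrown at run time:

import std.parallelism : taskPool;

auto data = new int[1_000];

foreach (i, ref x; taskPool.parallel(data))
{
    x = cast(int) i;

    // Any early exit from the loop body is illegal in a
    // parallel foreach and throws a ParallelForeachError:
    //
    // if (x == 42) break;   // not allowed
    // if (x == 42) return;  // not allowed
}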
In the case of non-random access ranges, parallel foreach buffers lazily to an array of size workUnitSize before executing the parallel portion of the loop. The exception is that, if a parallel foreach is executed over a range returned by asyncBuf or map, the copying is elided and the buffers are simply swapped. In this case workUnitSize is ignored and the work unit size is set to the buffer size of range.
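A minimal sketch of the asyncBuf case, assuming a file named numbers.txt with one number per line (the file name and the buffer size of 100 are illustrative assumptions):

import std.algorithm.iteration : map;
import std.conv : to;
import std.parallelism : taskPool;
import std.stdio : File;

void main()
{
    // Read lines in a background thread, 100 lines per buffer.
    // byLine reuses its internal buffer, so each line is
    // duplicated before asyncBuf stores it.
    auto lines = taskPool.asyncBuf(
        File("numbers.txt").byLine.map!(a => a.idup), 100);

    // Because `lines` comes from asyncBuf, parallel foreach
    // swaps buffers with it instead of copying; workUnitSize
    // is ignored and the asyncBuf buffer size is used instead.
    foreach (line; taskPool.parallel(lines))
    {
        auto value = to!double(line); // process each line
    }
}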
A memory barrier is guaranteed to be executed on exit from the loop, so that results produced by all threads are visible in the calling thread.
Exception Handling:
When at least one exception is thrown from inside a parallel foreach loop, the submission of additional Task objects is terminated as soon as possible, in a non-deterministic manner. All executing or enqueued work units are allowed to complete. Then, all exceptions that were thrown by any work unit are chained using Throwable.next and rethrown. The order of the exception chaining is non-deterministic.
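A minimal sketch of how the chained exceptions might be inspected in the calling thread (the failure condition and messages are illustrative assumptions):

import std.conv : to;
import std.parallelism : taskPool;
import std.stdio : writeln;

void main()
{
    auto data = new int[1_000];

    try
    {
        foreach (i, ref x; taskPool.parallel(data, 100))
        {
            // A hypothetical failure in some iterations.
            if (i % 400 == 0)
                throw new Exception("failed at index " ~ i.to!string);
            x = cast(int) i;
        }
    }
    catch (Exception e)
    {
        // One exception is rethrown in the calling thread; any
        // others thrown by other work units are reachable through
        // the Throwable.next chain, in non-deterministic order.
        for (Throwable t = e; t !is null; t = t.next)
            writeln(t.msg);
    }
}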