std.random.random_sample
- multiple declarations
- Function randomSample
- Struct RandomSample
Function randomSample
Selects a random subsample out of
, containing exactly r
elements. The order of elements is the same as in the original
range. The n
total
length of
must be known. If r
is
passed in, the total
total
number of sample is considered to be
. Otherwise, total
uses RandomSample
r.length
.
Prototypes
auto randomSample(Range)( Range r, size_t n, size_t total ) if (isInputRange!Range); auto randomSample(Range)( Range r, size_t n ) if (isInputRange!Range && hasLength!Range); auto randomSample(Range, UniformRNG)( Range r, size_t n, size_t total, UniformRNG rng ) if (isInputRange!Range && isUniformRNG!UniformRNG); auto randomSample(Range, UniformRNG)( Range r, size_t n, UniformRNG rng ) if (isInputRange!Range && hasLength!Range && isUniformRNG!UniformRNG);
Parameters
Name | Description |
---|---|
r | range to sample from |
n | number of elements to include in the sample;
must be less than or equal to the total number
of elements in and/or the parameter
(if provided) |
total | (semi-optional) number of elements of
from which to select the sample (counting from
the beginning); must be less than or equal to
the total number of elements in itself.
May be omitted if has the .length
property and the sample is to be drawn from
all elements of . |
rng | (optional) random number generator to use;
if not specified, defaults to
|
Returns
Range whose elements consist of a randomly selected subset of
the elements of
, in the same order as these elements
appear in r
itself. Will be a forward range if both r
and r
are forward ranges, an input range otherwise.
rng
implements Jeffrey Scott Vitter's Algorithm D
(see Vitter 1984, 1987), which selects a sample
of size RandomSample
in O(n
n
) steps and requiring O(n
) random variates,
regardless of the size of the data being sampled. The exception
to this is if traversing k elements on the input range is itself
an O(k) operation (e.g. when sampling lines from an input file),
in which case the sampling calculation will inevitably be of
O(total
).
RandomSample
will throw an exception if
is verifiably
less than the total
total
number of elements available in the input,
or if
.
n
> total
If no random number generator is passed to
, the
thread-global RNG randomSample
rndGen
will be used internally.
Example
int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]; // Print 5 random elements picked off from a foreach (e; randomSample(a, 5)) { writeln(e); }
WARNING: If an alternative RNG is desired, it is essential for this to be a new RNG seeded in an unpredictable manner. Passing it a RNG used elsewhere in the program will result in unintended correlations, due to the current implementation of RNGs as value types.
Example
int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]; foreach (e; randomSample(a, 5, Random(unpredictableSeed))) // correct! { writeln(e); } foreach (e; randomSample(a, 5, rndGen)) // DANGEROUS!! rndGen gets { // copied by value writeln(e); } foreach (e; randomSample(a, 5, rndGen)) // ... so this second random { // sample will select the same writeln(e); // values as the previous one. }
These issues will be resolved in a second-generation std.random
that
re-implements random number generators as reference types.
Struct RandomSample
Selects a random subsample out of r
, containing exactly n
elements. The order of elements is the same as in the original
range. The total length
of r
must be known. If total
is
passed in, the total number of sample is considered to be total
. Otherwise,
uses RandomSample
r.length
.
Properties
Name | Type | Description |
---|---|---|
empty
[get]
|
bool |
Range primitives. |
index
[get]
|
size_t |
Returns the index of the visited record.
|
length
[get]
|
size_t |
Range primitives. |
Methods
Name | Description |
---|---|
popFront
|
Range primitives. |
Parameters
Name | Description |
---|---|
r | range to sample from |
n | number of elements to include in the sample;
must be less than or equal to the total number
of elements in r and/or the parameter
total (if provided) |
total | (semi-optional) number of elements of r
from which to select the sample (counting from
the beginning); must be less than or equal to
the total number of elements in r itself.
May be omitted if r has the .length
property and the sample is to be drawn from
all elements of r . |
rng | (optional) random number generator to use;
if not specified, defaults to
|
Returns
Range whose elements consist of a randomly selected subset of
the elements of r
, in the same order as these elements
appear in r
itself. Will be a forward range if both r
and rng
are forward ranges, an input range otherwise.
implements Jeffrey Scott Vitter's Algorithm D
(see Vitter 1984, 1987), which selects a sample
of size RandomSample
n
in O(n) steps and requiring O(n) random variates,
regardless of the size of the data being sampled. The exception
to this is if traversing k elements on the input range is itself
an O(k) operation (e.g. when sampling lines from an input file),
in which case the sampling calculation will inevitably be of
O(total).
RandomSample
will throw an exception if total
is verifiably
less than the total number of elements available in the input,
or if n > total
.
If no random number generator is passed to
, the
thread-global RNG randomSample
rndGen
will be used internally.
Example
int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]; // Print 5 random elements picked off from a foreach (e; randomSample(a, 5)) { writeln(e); }
WARNING: If an alternative RNG is desired, it is essential for this to be a new RNG seeded in an unpredictable manner. Passing it a RNG used elsewhere in the program will result in unintended correlations, due to the current implementation of RNGs as value types.
Example
int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]; foreach (e; randomSample(a, 5, Random(unpredictableSeed))) // correct! { writeln(e); } foreach (e; randomSample(a, 5, rndGen)) // DANGEROUS!! rndGen gets { // copied by value writeln(e); } foreach (e; randomSample(a, 5, rndGen)) // ... so this second random { // sample will select the same writeln(e); // values as the previous one. }
These issues will be resolved in a second-generation std.random
that
re-implements random number generators as reference types.
Authors
Andrei Alexandrescu
Masahiro Nakagawa (Xorshift
random generator)
Joseph Rushton Wakeling (Algorithm D for random sampling)