View source code Display the source code in std/random.d from which this page was generated on github. Improve this page Quickly fork, edit online, and submit a pull request for this page. Requires a signed-in GitHub account. This works well for small changes. If you'd like to make larger changes you may want to consider using local clone. Page wiki View or edit the community-maintained wiki page associated with this page.

std.random.random_sample - multiple declarations

Function randomSample

Selects a random subsample out of r, containing exactly n elements. The order of elements is the same as in the original range. The total length of r must be known. If total is passed in, the total number of sample is considered to be total. Otherwise, RandomSample uses r.length.

Prototypes

auto randomSample(Range)(
  Range r,
  size_t n,
  size_t total
)
if (isInputRange!Range);

auto randomSample(Range)(
  Range r,
  size_t n
)
if (isInputRange!Range && hasLength!Range);

auto randomSample(Range, UniformRNG)(
  Range r,
  size_t n,
  size_t total,
  UniformRNG rng
)
if (isInputRange!Range && isUniformRNG!UniformRNG);

auto randomSample(Range, UniformRNG)(
  Range r,
  size_t n,
  UniformRNG rng
)
if (isInputRange!Range && hasLength!Range && isUniformRNG!UniformRNG);

Parameters

NameDescription
r range to sample from
n number of elements to include in the sample; must be less than or equal to the total number of elements in r and/or the parameter total (if provided)
total (semi-optional) number of elements of r from which to select the sample (counting from the beginning); must be less than or equal to the total number of elements in r itself. May be omitted if r has the .length property and the sample is to be drawn from all elements of r.
rng (optional) random number generator to use; if not specified, defaults to rndGen

Returns

Range whose elements consist of a randomly selected subset of the elements of r, in the same order as these elements appear in r itself. Will be a forward range if both r and rng are forward ranges, an input range otherwise.

RandomSample implements Jeffrey Scott Vitter's Algorithm D (see Vitter 1984, 1987), which selects a sample of size n in O(n) steps and requiring O(n) random variates, regardless of the size of the data being sampled. The exception to this is if traversing k elements on the input range is itself an O(k) operation (e.g. when sampling lines from an input file), in which case the sampling calculation will inevitably be of O(total).

RandomSample will throw an exception if total is verifiably less than the total number of elements available in the input, or if n > total.

If no random number generator is passed to randomSample, the thread-global RNG rndGen will be used internally.

Example

int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ];
// Print 5 random elements picked off from a
foreach (e; randomSample(a, 5))
{
    writeln(e);
}

WARNING: If an alternative RNG is desired, it is essential for this to be a new RNG seeded in an unpredictable manner. Passing it a RNG used elsewhere in the program will result in unintended correlations, due to the current implementation of RNGs as value types.

Example

int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ];
foreach (e; randomSample(a, 5, Random(unpredictableSeed)))  // correct!
{
    writeln(e);
}

foreach (e; randomSample(a, 5, rndGen))  // DANGEROUS!! rndGen gets
{                                        // copied by value
    writeln(e);
}

foreach (e; randomSample(a, 5, rndGen))  // ... so this second random
{                                        // sample will select the same
    writeln(e);                          // values as the previous one.
}

These issues will be resolved in a second-generation std.random that re-implements random number generators as reference types.

Struct RandomSample

Selects a random subsample out of r, containing exactly n elements. The order of elements is the same as in the original range. The total length of r must be known. If total is passed in, the total number of sample is considered to be total. Otherwise, RandomSample uses r.length.

Properties

Name Type Description
empty [get] bool Range primitives.
index [get] size_t Returns the index of the visited record.
length [get] size_t Range primitives.

Methods

Name Description
popFront Range primitives.

Parameters

NameDescription
r range to sample from
n number of elements to include in the sample; must be less than or equal to the total number of elements in r and/or the parameter total (if provided)
total (semi-optional) number of elements of r from which to select the sample (counting from the beginning); must be less than or equal to the total number of elements in r itself. May be omitted if r has the .length property and the sample is to be drawn from all elements of r.
rng (optional) random number generator to use; if not specified, defaults to rndGen

Returns

Range whose elements consist of a randomly selected subset of the elements of r, in the same order as these elements appear in r itself. Will be a forward range if both r and rng are forward ranges, an input range otherwise.

RandomSample implements Jeffrey Scott Vitter's Algorithm D (see Vitter 1984, 1987), which selects a sample of size n in O(n) steps and requiring O(n) random variates, regardless of the size of the data being sampled. The exception to this is if traversing k elements on the input range is itself an O(k) operation (e.g. when sampling lines from an input file), in which case the sampling calculation will inevitably be of O(total).

RandomSample will throw an exception if total is verifiably less than the total number of elements available in the input, or if n > total.

If no random number generator is passed to randomSample, the thread-global RNG rndGen will be used internally.

Example

int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ];
// Print 5 random elements picked off from a
foreach (e; randomSample(a, 5))
{
    writeln(e);
}

WARNING: If an alternative RNG is desired, it is essential for this to be a new RNG seeded in an unpredictable manner. Passing it a RNG used elsewhere in the program will result in unintended correlations, due to the current implementation of RNGs as value types.

Example

int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ];
foreach (e; randomSample(a, 5, Random(unpredictableSeed)))  // correct!
{
    writeln(e);
}

foreach (e; randomSample(a, 5, rndGen))  // DANGEROUS!! rndGen gets
{                                        // copied by value
    writeln(e);
}

foreach (e; randomSample(a, 5, rndGen))  // ... so this second random
{                                        // sample will select the same
    writeln(e);                          // values as the previous one.
}

These issues will be resolved in a second-generation std.random that re-implements random number generators as reference types.

Authors

Andrei Alexandrescu Masahiro Nakagawa (Xorshift random generator) Joseph Rushton Wakeling (Algorithm D for random sampling)

License

Boost License 1.0.

Comments