View source code Display the source code in std/algorithm/setops.d from which this page was generated on github. Improve this page Quickly fork, edit online, and submit a pull request for this page. Requires a signed-in GitHub account. This works well for small changes. If you'd like to make larger changes you may want to consider using local clone. Page wiki View or edit the community-maintained wiki page associated with this page.

Function std.algorithm.setops.largestPartialIntersection

Given a range of sorted forward ranges ror, copies to tgt the elements that are common to most ranges, along with their number of occurrences. All ranges in ror are assumed to be sorted by less. Only the most frequent tgt.length elements are returned.

Prototype

void largestPartialIntersection(alias less, RangeOfRanges, Range)(
  RangeOfRanges ror,
  Range tgt,
  SortOutput sorted = SortOutput.no
);

Example

// Figure which number can be found in most arrays of the set of
// arrays below.
double[][] a =
[
    [ 1, 4, 7, 8 ],
    [ 1, 7 ],
    [ 1, 7, 8],
    [ 4 ],
    [ 7 ],
];
auto b = new Tuple!(double, uint)[1];
largestPartialIntersection(a, b);
// First member is the item, second is the occurrence count
assert(b[0] == tuple(7.0, 4u));

7.0 is the correct answer because it occurs in 4 out of the 5 inputs, more than any other number. The second member of the resulting tuple is indeed 4 (recording the number of occurrences of 7.0). If more of the top-frequent numbers are needed, just create a larger tgt range. In the example above, creating b with length 2 yields tuple(1.0, 3u) in the second position.

The function largestPartialIntersection is useful for e.g. searching an inverted index for the documents most likely to contain some terms of interest. The complexity of the search is Ο(n * log(tgt.length)), where n is the sum of lengths of all input ranges. This approach is faster than keeping an associative array of the occurrences and then selecting its top items, and also requires less memory (largestPartialIntersection builds its result directly in tgt and requires no extra memory).

Warning

Because largestPartialIntersection does not allocate extra memory, it will leave ror modified. Namely, largestPartialIntersection assumes ownership of ror and discretionarily swaps and advances elements of it. If you want ror to preserve its contents after the call, you may want to pass a duplicate to largestPartialIntersection (and perhaps cache the duplicate in between calls).

Authors

Andrei Alexandrescu

License

Boost License 1.0.

Comments