Function std.numeric.gapWeightedSimilarityNormalized

The similarity computed by gapWeightedSimilarity has a drawback: it grows with the lengths of the two ranges, even when the ranges are not actually very similar. For example, the range ["Hello", "world"] becomes increasingly similar to the range ["Hello", "world", "world", "world", ...] as more instances of "world" are appended. To prevent that, gapWeightedSimilarityNormalized computes a normalized version of the similarity, defined as gapWeightedSimilarity(s, t, lambda) / sqrt(gapWeightedSimilarity(s, s, lambda) * gapWeightedSimilarity(t, t, lambda)). The function gapWeightedSimilarityNormalized (a so-called normalized kernel) is bounded in [0, 1], reaches 0 only for ranges that don't match in any position, and reaches 1 only for identical ranges.
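Written in math notation, the normalization divides the cross-similarity by the geometric mean of the two self-similarities. The symbol k below is shorthand introduced here for gapWeightedSimilarity (it is not a name used by the library), and the numbers are the ones asserted in the example on this page for s = ["Hello", "brave", "new", "world"], t = ["Hello", "new", "world"], and lambda = 1.

```latex
\[
\hat{k}_\lambda(s, t)
  = \frac{k_\lambda(s, t)}{\sqrt{k_\lambda(s, s)\, k_\lambda(t, t)}},
\qquad
\hat{k}_1(s, t)
  = \frac{7}{\sqrt{15 \cdot 7}}
  = \frac{7}{\sqrt{105}}
  \approx 0.683
\]
```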

The optional parameters sSelfSim and tSelfSim are meant for avoiding duplicate computation. Many applications may have already computed gapWeightedSimilarity(s, s, lambda) and/or gapWeightedSimilarity(t, t, lambda). In that case, they can be passed as sSelfSim and tSelfSim, respectively.
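As a rough illustration of the reuse described above, the sketch below computes each self-similarity once and passes it to every pairwise call. The helper pairwiseNormalized is hypothetical (it is not part of std.numeric), and the sketch assumes that values supplied for sSelfSim and tSelfSim are used as-is in place of a fresh computation, as the description suggests.

```d
import std.numeric : gapWeightedSimilarity, gapWeightedSimilarityNormalized;

/// Hypothetical helper (not part of std.numeric): normalized similarity
/// for every pair of documents, computing each self-similarity only once.
double[][] pairwiseNormalized(string[][] docs, double lambda)
{
    // One self-similarity per document, reused for every pair below.
    auto selfSim = new double[](docs.length);
    foreach (i, d; docs)
        selfSim[i] = gapWeightedSimilarity(d, d, lambda);

    auto result = new double[][](docs.length, docs.length);
    foreach (i, a; docs)
        foreach (j, b; docs)
            result[i][j] = gapWeightedSimilarityNormalized(
                a, b, lambda, selfSim[i], selfSim[j]);
    return result;
}

void main()
{
    import std.math : abs;

    string[][] docs = [
        ["Hello", "brave", "new", "world"],
        ["Hello", "new", "world"],
    ];
    auto m = pairwiseNormalized(docs, 1.0);

    // The cached call should agree with the call that omits the
    // optional arguments.
    auto direct = gapWeightedSimilarityNormalized(docs[0], docs[1], 1.0);
    assert(abs(m[0][1] - direct) < 1e-9);
}
```

For n documents this performs n self-similarity computations instead of recomputing two of them for each of the n * n pairs. Plain mutable double values are used for lambda and the cached results so that the deduced type F stays an unqualified floating-point type.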

Prototype

Select!(isFloatingPoint!F,F,double) gapWeightedSimilarityNormalized(alias comp, R1, R2, F)(
  R1 s,
  R2 t,
  F lambda,
  F sSelfSim = F.init,
  F tSelfSim = F.init
)
if (isRandomAccessRange!R1 && hasLength!R1 && isRandomAccessRange!R2 && hasLength!R2);

Example

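import std.math : approxEqual, sqrt;
import std.numeric : gapWeightedSimilarity, gapWeightedSimilarityNormalized;

string[] s = ["Hello", "brave", "new", "world"];
string[] t = ["Hello", "new", "world"];
// Self-similarities: the two factors under the square root.
assert(gapWeightedSimilarity(s, s, 1) == 15);
assert(gapWeightedSimilarity(t, t, 1) == 7);
// Cross-similarity: the numerator.
assert(gapWeightedSimilarity(s, t, 1) == 7);
// Normalized value: 7 / sqrt(15 * 7), roughly 0.68.
assert(approxEqual(gapWeightedSimilarityNormalized(s, t, 1),
                7.0 / sqrt(15.0 * 7), 0.01));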

Authors

Andrei Alexandrescu, Don Clugston, Robert Jacques

License

Boost License 1.0.
