Function std.numeric.gapWeightedSimilarityNormalized
The similarity computed by gapWeightedSimilarity has a drawback: it grows with the lengths of the two ranges, even when they are not actually very similar. For example, the range ["Hello", "world"] becomes increasingly similar to the range ["Hello", "world", "world", "world", ...] as more instances of "world" are appended. To prevent that, gapWeightedSimilarityNormalized computes a normalized version of the similarity:

gapWeightedSimilarity(s, t, lambda) / sqrt(gapWeightedSimilarity(s, s, lambda) * gapWeightedSimilarity(t, t, lambda))

The function gapWeightedSimilarityNormalized (a so-called normalized kernel) is bounded in [0, 1]. It reaches 0 only for ranges that have no matching elements at all, and 1 only for identical ranges.
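As an illustration of the growth problem, the following D sketch compares ["Hello", "world"] against ranges with more and more trailing "world" elements. It is not part of std.numeric: helloWorlds is a hypothetical helper, and the expected raw score 1 + 2 * copies assumes that, with lambda = 1, gapWeightedSimilarity counts pairs of matching subsequences (one pair for "Hello", copies pairs for "world", copies pairs for "Hello", "world"). That assumption is consistent with the counts 15, 7 and 7 in the Example section.

```d
// Sketch only; build with -unittest -main (there is no main function).
import std.array : array;
import std.numeric : gapWeightedSimilarity, gapWeightedSimilarityNormalized;
import std.range : chain, only, repeat;

/// Hypothetical helper: builds ["Hello", "world", ..., "world"]
/// with `copies` trailing "world" elements.
string[] helloWorlds(size_t copies)
{
    return chain(only("Hello"), "world".repeat(copies)).array;
}

unittest
{
    string[] s = ["Hello", "world"];
    double previous = double.infinity;
    foreach (copies; 1 .. 6)
    {
        string[] t = helloWorlds(copies);
        // Raw score (lambda = 1, assumed to count matching subsequence
        // pairs): 1 for "Hello", `copies` for "world", `copies` for
        // "Hello" "world". It keeps growing as t gets longer.
        assert(gapWeightedSimilarity(s, t, 1) == 1 + 2 * copies);
        // Normalized score: stays within [0, 1] and shrinks instead.
        double norm = gapWeightedSimilarityNormalized(s, t, 1);
        assert(norm >= 0 && norm <= 1 + 1e-9);
        assert(norm < previous);
        previous = norm;
    }
}
```

With one copy the two ranges are identical and the normalized score is 1; each additional "world" raises the raw score while the normalized score decreases.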
The optional parameters sSelfSim and tSelfSim exist to avoid duplicate computation. Many applications may have already computed gapWeightedSimilarity(s, s, lambda) and/or gapWeightedSimilarity(t, t, lambda); in that case, those values can be passed as sSelfSim and tSelfSim, respectively.
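One way to use sSelfSim and tSelfSim is to cache self-similarities when the same ranges are compared many times, for instance a fixed set of documents scored against several queries. The sketch below assumes that scenario; selfSimilarities and scoreAll are hypothetical helpers, not library functions, and lambda = 0.5 is an arbitrary choice. Both self-similarities are passed explicitly, so the function is expected (per the purpose stated above) to compute only the cross similarity.

```d
// Sketch only; build with -unittest -main (there is no main function).
import std.math : isClose, sqrt;
import std.numeric : gapWeightedSimilarity, gapWeightedSimilarityNormalized;

/// Hypothetical helper: self-similarity of each document, computed once
/// (for example when the documents are indexed).
double[] selfSimilarities(string[][] docs, double lambda)
{
    double[] result;
    foreach (doc; docs)
        result ~= gapWeightedSimilarity(doc, doc, lambda);
    return result;
}

/// Hypothetical helper: normalized scores of one query against every
/// document, reusing the cached document self-similarities.
double[] scoreAll(string[] query, string[][] docs, double[] docSelf,
                  double lambda)
{
    // Computed once per query, not once per (query, document) pair.
    double querySelf = gapWeightedSimilarity(query, query, lambda);
    double[] scores;
    foreach (i, doc; docs)
        scores ~= gapWeightedSimilarityNormalized(query, doc, lambda,
                                                  querySelf, docSelf[i]);
    return scores;
}

unittest
{
    string[][] docs = [["Hello", "brave", "new", "world"], ["Hello", "world"]];
    double[] cache = selfSimilarities(docs, 0.5); // computed once
    // Several queries reuse the same cache.
    foreach (query; [["Hello", "new", "world"], ["brave", "world"]])
        foreach (score; scoreAll(query, docs, cache, 0.5))
            assert(score >= 0 && score <= 1 + 1e-9);
    // A query identical to a document scores 1 against that document.
    assert(isClose(scoreAll(docs[1], docs, cache, 0.5)[1], 1.0));
}
```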
Prototype
Select!(isFloatingPoint!F,F,double) gapWeightedSimilarityNormalized(alias comp, R1, R2, F)( R1 s, R2 t, F lambda, F sSelfSim = F.init, F tSelfSim = F.init ) if (isRandomAccessRange!R1 && hasLength!R1 && isRandomAccessRange!R2 && hasLength!R2);
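The prototype also takes a comp template parameter, the predicate that decides whether two elements match. The sketch below assumes comp accepts any binary predicate returning bool (the usual convention for Phobos alias predicates) and uses std.uni.sicmp so that words match regardless of case; caseInsensitiveSimilarity is a hypothetical helper, not a library function.

```d
// Sketch only; build with -unittest -main (there is no main function).
import std.math : isClose;
import std.numeric : gapWeightedSimilarityNormalized;
import std.uni : sicmp;

/// Hypothetical helper: normalized similarity in which two words match
/// when they are equal ignoring case (predicate passed as `comp`).
double caseInsensitiveSimilarity(string[] s, string[] t)
{
    return gapWeightedSimilarityNormalized!((a, b) => sicmp(a, b) == 0)(s, t, 1);
}

unittest
{
    string[] s = ["Hello", "World"];
    string[] t = ["hello", "world"];
    // Default comparison (exact equality): no element of s matches t.
    assert(gapWeightedSimilarityNormalized(s, t, 1) == 0);
    // Case-insensitive comparison: the ranges match element by element.
    assert(isClose(caseInsensitiveSimilarity(s, t), 1.0));
}
```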
Example
import std.math : approxEqual, sqrt;
import std.numeric : gapWeightedSimilarity, gapWeightedSimilarityNormalized;

string[] s = ["Hello", "brave", "new", "world"];
string[] t = ["Hello", "new", "world"];
assert(gapWeightedSimilarity(s, s, 1) == 15);
assert(gapWeightedSimilarity(t, t, 1) == 7);
assert(gapWeightedSimilarity(s, t, 1) == 7);
assert(approxEqual(gapWeightedSimilarityNormalized(s, t, 1), 7.0 / sqrt(15.0 * 7), 0.01));
Authors
Andrei Alexandrescu, Don Clugston, Robert Jacques