Struct std.uni.InversionList
InversionList is a set of code points represented as an array of open-right [a, b)
intervals (see CodepointInterval above).
The name comes from the way the representation reads left to right.
For instance, a set of all values in [10, 50) and [80, 90),
plus the single value 60, looks like this:
10, 50, 60, 61, 80, 90
The way to read this is: start with negative, meaning that all numbers smaller than the next one are not present in this set (positive means the contrary). Then switch between positive and negative after each number passed, from left to right.
This way, negative spans until 10, then positive until 50, then negative until 60, then positive until 61, and so on. As seen, this provides space-efficient storage of highly redundant data that comes in long runs, a description that Unicode properties fit nicely. The technique itself can be seen as a variation on run-length encoding (RLE).
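As an illustration of the reading rule above, here is a minimal standalone sketch in D. It does not use std.uni, and the function name inInversionList is hypothetical. It assumes the boundaries are stored as a sorted array of uint and decides membership by counting how many boundaries are less than or equal to the value: an odd count means the value sits in a positive span.

```d
import std.range : assumeSorted;

/// Hypothetical helper (not part of std.uni): tests membership in a flat
/// inversion list. x is in the set iff an odd number of boundaries are <= x.
bool inInversionList(const(uint)[] bounds, uint x)
{
    // upperBound(x) yields the boundaries strictly greater than x (found by
    // binary search), so the remaining ones are the boundaries <= x.
    auto greater = bounds.assumeSorted.upperBound(x);
    immutable passed = bounds.length - greater.length;
    return passed % 2 == 1;
}

void main()
{
    // [10, 50), [60, 61), [80, 90) flattened into one sorted array.
    immutable uint[] bounds = [10, 50, 60, 61, 80, 90];
    assert(!inInversionList(bounds, 9));   // before 10: negative
    assert( inInversionList(bounds, 10));  // left end is included
    assert(!inInversionList(bounds, 50));  // right end is excluded
    assert( inInversionList(bounds, 60));  // the single value
    assert(!inInversionList(bounds, 61));
    assert( inInversionList(bounds, 89));
    assert(!inInversionList(bounds, 90));
}
```

Because the array is sorted, a lookup costs O(log n) in the number of boundaries, independent of how many code points each interval covers.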
Sets are value types (just like int), so they
are never aliased.
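A short sketch of what value semantics means in practice, using std.uni's CodepointSet and its add method: mutating a set after copying it leaves the copy unchanged. The variable names are illustrative only.

```d
import std.uni : CodepointSet;

void main()
{
    auto digits = CodepointSet('0', '9' + 1);
    auto copy = digits;          // a copy, not an alias
    digits.add('a', 'f' + 1);    // mutate the original in place
    assert(digits['a']);
    assert(!copy['a']);          // the copy is unaffected
    assert(copy['5']);
}
```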
Properties
Name | Type | Description |
---|---|---|
byCodepoint [get] | | A range that spans each code point in this set. |
byInterval [get] | | Gets a range that spans all of the intervals in this InversionList. |
empty [get] | bool | True if this set doesn't contain any code points. |
inverted [get] | | Obtains a set that is the inversion of this set. |
length [get] | size_t | Number of code points in this set. |
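The following sketch exercises each property from the table on a small hand-built set. It uses only std.uni, std.range and std.algorithm; the set contents are arbitrary.

```d
import std.algorithm.comparison : equal;
import std.range : walkLength;
import std.uni : CodepointSet;

void main()
{
    // [97, 100) and [120, 122): 'a', 'b', 'c' plus 'x', 'y'
    auto set = CodepointSet('a', 'd', 'x', 'z');

    // byInterval yields CodepointInterval values with .a and .b bounds
    auto first = set.byInterval.front;
    assert(first.a == 'a' && first.b == 'd');
    assert(set.byInterval.walkLength == 2);

    // byCodepoint walks every member in ascending order
    assert(set.byCodepoint.equal("abcxy"));

    assert(set.length == 5);       // counts code points, not intervals
    assert(!set.empty);

    CodepointSet none;             // default-initialized: no code points
    assert(none.empty);
    assert(none.length == 0);

    auto rest = set.inverted;      // every code point not in set
    assert(!rest['b'] && rest['z']);
}
```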
Methods
Name | Description |
---|---|
opIndex | Tests the presence of a code point in this set. |
toSourceCode | Generates a string with D source code of a unary function named funcName, taking a single dchar argument. If funcName is empty, the code is adjusted to be a lambda function. |
toString | Obtains a textual representation of this InversionList in the form of open-right intervals. |
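A sketch of the three methods on one set. The toString check relies on the "[a..b)" interval notation that std.uni prints; the text produced by toSourceCode is treated as opaque here, and the predicate name isLowerAscii is just an example.

```d
import std.algorithm.searching : canFind;
import std.conv : to;
import std.uni : CodepointSet;

void main()
{
    auto lowerAtoZ = CodepointSet('a', 'z' + 1);

    // opIndex: membership test by code point
    assert(lowerAtoZ['q']);
    assert(!lowerAtoZ['Q']);

    // toString: open-right intervals, used by to!string and format("%s")
    assert(lowerAtoZ.to!string == "[97..123)");

    // toSourceCode: D source of a unary predicate on dchar; the exact
    // text is an implementation detail, so only the name is checked.
    string src = lowerAtoZ.toSourceCode("isLowerAscii");
    assert(src.canFind("isLowerAscii"));
}
```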
Templates
Name | Description |
---|---|
add | Adds an interval [a, b) to this set. |
opBinary | Sets support natural syntax for set algebra, namely: union (\|), intersection (&), subtraction (-) and symmetric set difference (~). |
opBinaryRight | Tests the presence of code point ch in this set via the in operator, the same as opIndex. |
opOpAssign | The 'op=' versions of the above overloaded operators. |
opUnary | Obtains a set that is the inversion of this set. |
this | Constructs a set from another code point set of any type. |
this | Constructs a set from a forward range of code point intervals. |
this | Constructs a set from plain values of code point intervals. |
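The sketch below walks through the constructors and operators from the table. It assumes the default CodepointSet alias; the vowel and consonant sets are illustrative.

```d
import std.uni : CodepointInterval, CodepointSet;

void main()
{
    // Three ways to build the same set [10, 50) + [80, 90).
    auto fromValues = CodepointSet(10, 50, 80, 90);
    auto fromRange = CodepointSet([CodepointInterval(10, 50), CodepointInterval(80, 90)]);
    CodepointSet built;
    built.add(10, 50).add(80, 90);           // add returns the set, so calls chain
    assert(fromValues == fromRange && fromRange == built);

    auto lower = CodepointSet('a', 'z' + 1);
    auto vowels = CodepointSet('a', 'a' + 1, 'e', 'e' + 1, 'i', 'i' + 1,
                               'o', 'o' + 1, 'u', 'u' + 1);

    auto consonants = lower - vowels;        // subtraction
    assert('b' in consonants && !('e' in consonants));
    assert(consonants.length == 21);
    assert((lower & vowels) == vowels);      // intersection
    assert((consonants | vowels) == lower);  // union
    assert((lower ~ vowels) == consonants);  // symmetric difference

    auto notLower = !lower;                  // same as lower.inverted
    assert(notLower['A'] && !notLower['a']);

    auto acc = vowels;
    acc |= CodepointSet('y', 'y' + 1);       // op= forms update in place
    assert(acc['y'] && !vowels['y']);
}
```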
Example
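```d
import std.uni : CodepointSet;

void main()
{
    auto a = CodepointSet('a', 'z'+1);
    auto b = CodepointSet('A', 'Z'+1);
    auto c = a;   // c is an independent copy of a
    a = a | b;    // union of lowercase and uppercase ASCII letters
    assert(a == CodepointSet('A', 'Z'+1, 'a', 'z'+1));
    assert(a != c);   // c still holds only 'a'..'z'
}
```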
See also unicode
for simpler construction of sets
from predefined ones.
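A brief sketch of the unicode helper combined with a hand-made set: unicode.ASCII names the set as a member, while unicode("Cyrillic") takes the name as a run-time string. The particular sets chosen are illustrative.

```d
import std.uni : CodepointSet, unicode;

void main()
{
    auto ascii = unicode.ASCII;
    auto cyrillic = unicode("Cyrillic");
    assert(ascii['~'] && !ascii['Ж']);
    assert(cyrillic['Ж'] && !cyrillic['z']);

    // Predefined sets combine with hand-made ones like any other set.
    auto asciiNoDigits = ascii - CodepointSet('0', '9' + 1);
    assert(asciiNoDigits.length == 128 - 10);
}
```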
Memory usage is 8 bytes per contiguous interval in a set. Value semantics are achieved using the copy-on-write (COW) technique, so it is not safe to cast this type to shared.
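To make the "8 bytes per interval" figure concrete, the sketch below assumes those 8 bytes are the two 32-bit bounds of each interval. The helper approxPayloadBytes is hypothetical, not part of std.uni, and counts only the interval data; it ignores bookkeeping such as the COW reference count and any allocator overhead.

```d
import std.range : walkLength;
import std.uni : CodepointSet;

/// Hypothetical helper: payload size implied by "8 bytes per interval",
/// assuming two uint bounds per interval and ignoring all overhead.
size_t approxPayloadBytes(S)(S set)
{
    return set.byInterval.walkLength * 2 * uint.sizeof;
}

void main()
{
    // [10, 50), [60, 61), [80, 90): three intervals
    auto set = CodepointSet(10, 50, 60, 61, 80, 90);
    assert(approxPayloadBytes(set) == 24);
}
```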
Note
It's not recommended to rely on the template parameters
or the exact type of the current code point set in
std.uni.
The type and parameters may change when the standard
allocators design is finalized.
Use std.uni.isCodepointSet
with templates, or just stick with the default
alias CodepointSet
throughout the whole code base.
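Following the advice above, a generic function can accept any code point set by constraining its template parameter with isCodepointSet instead of naming an InversionList instantiation. countIn is a hypothetical helper used only for illustration.

```d
import std.uni : CodepointSet, isCodepointSet, unicode;

/// Hypothetical helper: counts how many code points of text fall in set.
/// The isCodepointSet constraint avoids depending on a concrete set type.
size_t countIn(Set)(Set set, string text)
    if (isCodepointSet!Set)
{
    size_t n;
    foreach (dchar c; text)   // decode UTF-8 into code points
        if (set[c])
            ++n;
    return n;
}

void main()
{
    static assert(isCodepointSet!CodepointSet);
    static assert(!isCodepointSet!int);

    auto digits = CodepointSet('0', '9' + 1);
    assert(countIn(digits, "r2-d2") == 2);
    assert(countIn(unicode.ASCII, "naïve") == 4);   // 'ï' is not ASCII
}
```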
Authors
Dmitry Olshansky