Module std.uni
			
The std.uni module provides an implementation of fundamental Unicode functionality. It does not include UTF encoding and decoding primitives;
    see std.utf for this functionality.
All primitives listed operate on Unicode characters and
    sets of characters. For functions which operate on ASCII characters
    and ignore Unicode characters, see std.ascii.
    For definitions of Unicode character, code point and other terms
    used throughout this module see the terminology section
    below.
    
The focus of this module is the core needs of developing Unicode-aware applications. To that effect it provides the following optimized primitives:
- Character classification by category and common properties:
        isAlpha, isWhite and others.
- Case-insensitive string comparison (sicmp, icmp).
- Converting text to any of the four normalization forms via normalize.
- Decoding (decodeGrapheme) and iteration (byGrapheme, graphemeStride) by user-perceived characters, that is, by Grapheme clusters.
- Decomposing and composing of individual character(s) according to canonical
        or compatibility rules, see compose and decompose, including the specific versions for Hangul syllables, composeJamo and decomposeHangul; a short sketch follows this list.
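A brief sketch of a few of these primitives in action (the characters and strings are chosen purely for illustration):

import std.uni;

void main()
{
    // classification by common properties
    assert(isAlpha('ж') && isAlpha('ψ'));
    assert(isWhite('\u00A0'));               // NO-BREAK SPACE counts as whitespace
    // case-insensitive comparison; 0 means equal ignoring case
    assert(sicmp("Привет", "ПРИВЕТ") == 0);
    assert(icmp("αβγ", "ΑΒΓ") == 0);
    // canonical composition of a base character and a combining mark
    assert(compose('A', '\u0308') == '\u00C4'); // U+0041 + U+0308 -> 'Ä'
}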
It's recognized that an application may need further enhancements and extensions, such as less commonly known algorithms, or tailoring existing ones for region specific needs. To help users with building any extra functionality beyond the core primitives, the module provides:
- CodepointSet, a type for easy manipulation of sets of characters. Besides the typical set algebra it provides an unusual feature: a D source code generator for detection of code points in this set. This is a boon for meta-programming parser frameworks, and is used internally to power classification in small sets like isWhite.
- A way to construct optimal packed multi-stage tables, also known as a
        special case of Trie.
        The functions codepointTrie and codepointSetTrie construct custom tries that map dchar to a value. The end result is a fast and predictable Ο(1) lookup that powers functions like isAlpha and combiningClass, but for user-defined data sets.
- A useful technique for Unicode-aware parsers that perform
        character classification of encoded code points
        is to avoid unnecessary decoding at all costs.
        utfMatcher provides an improvement over the usual workflow of decode-classify-process, combining the decoding and classification steps. By extracting the necessary bits directly from encoded code units, matchers achieve significant performance improvements. See MatcherConcept for the common interface of UTF matchers, and the sketch after this list.
- Generally useful building blocks for customized normalization:
        combiningClass for querying the combining class and allowedIn for testing the Quick_Check property of a given normalization form.
- Access to a large selection of commonly used sets of code points.
        Supported sets include Script,
        Block and General Category. The exact contents of a set can be
        observed in the CLDR utility, on the
        property index page
        of the Unicode website.
        See unicode for easy and (optionally) compile-time checked set queries.
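A small sketch of the matcher workflow described above (assuming the MatcherConcept interface, where match consumes a matched code point from the input):

import std.uni;

void main()
{
    // classify UTF-8 encoded code points without a separate decoding step
    auto numbers = utfMatcher!char(unicode.Number);
    string s = "2² = 4";
    assert(numbers.match(s));  // '2' is a Number; s is advanced past it
    assert(numbers.match(s));  // so is the superscript two
    assert(!numbers.match(s)); // the space is not; s is left unchanged
}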
Synopsis
import std.uni;
import std.algorithm.searching : find; // for the find() call below
void main()
{
    // initialize code point sets using script/block or property name
    // now 'set' contains code points from both scripts.
    auto set = unicode("Cyrillic") | unicode("Armenian");
    // same thing but simpler and checked at compile-time
    auto ascii = unicode.ASCII;
    auto currency = unicode.Currency_Symbol;
    // easy set ops
    auto a = set & ascii;
    assert(a.empty); // as it has no intersection with ascii
    a = set | ascii;
    auto b = currency - a; // subtract all ASCII, Cyrillic and Armenian
    // some properties of code point sets
    assert(b.length > 45); // 46 items in Unicode 6.1, even more in 6.2
    // testing presence of a code point in a set
    // is just fine, it is O(logN)
    assert(!b['$']);
    assert(!b['\u058F']); // Armenian dram sign
    assert(b['¥']);
    // building fast lookup tables, these guarantee O(1) complexity
    // 1-level Trie lookup table essentially a huge bit-set ~262Kb
    auto oneTrie = toTrie!1(b);
    // 2-level far more compact but typically slightly slower
    auto twoTrie = toTrie!2(b);
    // 3-level even smaller, and a bit slower yet
    auto threeTrie = toTrie!3(b);
    assert(oneTrie['£']);
    assert(twoTrie['£']);
    assert(threeTrie['£']);
    // build the trie with the most sensible trie level
    // and bind it as a functor
    auto cyrillicOrArmenian = toDelegate(set);
    auto balance = find!(cyrillicOrArmenian)("Hello ընկեր!");
    assert(balance == "ընկեր!");
    // compatible with bool delegate(dchar)
    bool delegate(dchar) bindIt = cyrillicOrArmenian;
    // Normalization
    string s = "Plain ascii (and not only), is always normalized!";
    assert(s is normalize(s)); // it is the same string
    string nonS = "A\u0308ffin"; // 'A' followed by combining diaeresis (U+0308)
    auto nS = normalize(nonS); // to NFC, the W3C endorsed standard
    assert(nS == "Äffin");
    assert(nS != nonS);
    string composed = "Äffin";
    assert(normalize!NFD(composed) == "A\u0308ffin");
    // to NFKD, compatibility decomposition useful for fuzzy matching/searching
    assert(normalize!NFKD("2¹⁰") == "210");
}
Terminology
The following is a list of important Unicode notions and definitions. Any conventions used specifically in this module alone are marked as such. The descriptions are based on the formal definition as found in chapter three of The Unicode Standard Core Specification.
Abstract character: A unit of information used for the organization, control, or representation of textual data. Note that:
- When representing data, the nature of that data is generally symbolic as opposed to some other kind of data (for example, visual).
- An abstract character has no concrete form and should not be confused with a glyph.
- An abstract character does not necessarily
        correspond to what a user thinks of as a “character”
         and should not be confused with a Grapheme.
- The abstract characters encoded (see Encoded character) are known as Unicode abstract characters.
- Abstract characters not directly encoded by the Unicode Standard can often be represented by the use of combining character sequences.
Canonical decomposition: The decomposition of a character or character sequence that results from recursively applying the canonical mappings found in the Unicode Character Database and those described in Conjoining Jamo Behavior (section 12 of Unicode Conformance).
Canonical composition: The precise definition of canonical composition is the algorithm specified in Unicode Conformance, section 11. Informally, it is the process that reverses the canonical decomposition, with the addition of certain rules that, for example, prevent legacy characters from appearing in the composed result.
Canonical equivalent: Two character sequences are said to be canonical equivalents if their full canonical decompositions are identical.
Character: Typically differs by context. For the purpose of this documentation the term character implies an encoded character, that is, a code point having an assigned abstract character (a symbolic meaning).
Code point: Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF (hex). Not all code points are assigned to encoded characters.
Code unit: The minimal bit combination that can represent a unit of encoded text for processing or interchange. Depending on the encoding this could be: 8-bit code units in UTF-8 (char),
    16-bit code units in UTF-16 (wchar),
    and 32-bit code units in UTF-32 (dchar).
    Note that in UTF-32 a code unit is a code point
    and is represented by the D dchar type.
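For example, the single code point U+00E9 (é) occupies a different number of code units depending on the encoding:

void main()
{
    string  s8  = "é";       // UTF-8
    wstring s16 = "é";       // UTF-16
    dstring s32 = "é";       // UTF-32
    assert(s8.length  == 2); // two char (8-bit) code units
    assert(s16.length == 1); // one wchar (16-bit) code unit
    assert(s32.length == 1); // one dchar code unit, i.e. one code point
}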
    
Combining character: A character with the General Category of Combining Mark (M).
- All characters with non-zero canonical combining class are combining characters, but the reverse is not the case: there are combining characters with a zero combining class.
- These characters are not normally used in isolation unless they are being described. They include such characters as accents, diacritics, Hebrew points, Arabic vowel signs, and Indic matras.
Combining class: A numerical value used by the Unicode Canonical Ordering Algorithm to determine which sequences of combining marks are to be considered canonically equivalent and which are not.
Compatibility decomposition: The decomposition of a character or character sequence that results from recursively applying both the compatibility mappings and the canonical mappings found in the Unicode Character Database, and those described in Conjoining Jamo Behavior, until no characters can be further decomposed.
Compatibility equivalent: Two character sequences are said to be compatibility equivalents if their full compatibility decompositions are identical.
Encoded character: An association (or mapping) between an abstract character and a code point.
Glyph: The actual, concrete image of a glyph representation having been rasterized or otherwise imaged onto some display surface.
Grapheme base: A character with the property Grapheme_Base, or any standard Korean syllable block.
Grapheme cluster: Defined as the text between grapheme boundaries as specified by Unicode Standard Annex #29, Unicode text segmentation. Important general properties of a grapheme cluster:
- The grapheme cluster represents a horizontally segmentable unit of text, consisting of some grapheme base (which may consist of a Korean syllable) together with any number of nonspacing marks applied to it.
- A grapheme cluster typically starts with a grapheme base and then extends across any subsequent sequence of nonspacing marks. A grapheme cluster is most directly relevant to text rendering and processes such as cursor placement and text selection in editing, but may also be relevant to comparison and searching.
- For many processes, a grapheme cluster behaves as if it was a single character with the same properties as its grapheme base. Effectively, nonspacing marks apply graphically to the base, but do not change its properties.
This module defines a number of primitives that work with graphemes:
        Grapheme, decodeGrapheme and graphemeStride.
        All of them use extended grapheme boundaries
        as defined in the aforementioned standard annex, as the following sketch shows.
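For instance (a sketch; the string below is 'a' followed by a combining diaeresis and then 'b'):

import std.uni;
import std.range : walkLength;

void main()
{
    string s = "a\u0308b";
    assert(s.walkLength == 3);            // three code points...
    assert(s.byGrapheme.walkLength == 2); // ...but only two user-perceived characters
    assert(graphemeStride(s, 0) == 3);    // the first grapheme spans 3 UTF-8 code units
    auto g = decodeGrapheme(s);           // consumes the first grapheme from s
    assert(g.length == 2 && g[0] == 'a' && g[1] == '\u0308');
}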
        
Nonspacing mark: A combining character with the General Category of Nonspacing Mark (Mn) or Enclosing Mark (Me).
Spacing mark: A combining character that is not a nonspacing mark.
Normalization
The concepts of canonical equivalent
     or compatibility equivalent
    characters in the Unicode Standard make it necessary to have a full, formal
    definition of equivalence for Unicode strings.
    String equivalence is determined by a process called normalization,
    whereby strings are converted into forms which are compared
    directly for identity. This is the primary goal of the normalization process;
    see the function normalize to convert a string into any of
    the four defined forms.
    
A very important attribute of the Unicode Normalization Forms is that they must remain stable between versions of the Unicode Standard. A Unicode string normalized to a particular Unicode Normalization Form in one version of the standard is guaranteed to remain in that Normalization Form for implementations of future versions of the standard.
The Unicode Standard specifies four normalization forms. Informally, two of these forms are defined by maximal decomposition of equivalent sequences, and two of these forms are defined by maximal composition of equivalent sequences.
- Normalization Form D (NFD): The canonical decomposition of a character sequence.
- Normalization Form KD (NFKD): The compatibility decomposition of a character sequence.
- Normalization Form C (NFC): The canonical composition of the canonical decomposition of a coded character sequence.
- Normalization Form KC (NFKC): The canonical composition of the compatibility decomposition of a character sequence.
The choice of the normalization form depends on the particular use case.
    NFC is the best form for general text, since it's more compatible with
    strings converted from legacy encodings. NFKC is the preferred form for
    identifiers, especially where there are security concerns. NFD and NFKD
    are the most useful for internal processing.
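A short illustration of the differences between the forms (escape sequences are used to make the code points explicit):

import std.uni;

void main()
{
    assert(normalize!NFD("\u00E9") == "e\u0301");     // é -> 'e' + combining acute
    assert(normalize!NFC("e\u0301") == "\u00E9");     // and back again
    assert(normalize!NFKD("\uFB01") == "fi");         // the 'fi' ligature decomposes
    assert(normalize!NFKC("2\u00B9\u2070") == "210"); // superscripts become plain digits
    // building blocks for customized normalization
    assert(combiningClass('\u0301') == 230);          // class of COMBINING ACUTE ACCENT
    assert(!allowedIn!NFC('\u0301'));                 // it is not Quick_Check=YES in NFC
}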
    
Construction of lookup tables
The Unicode standard describes a set of algorithms that depend on having the ability to quickly look up various properties of a code point. Given the codespace of about 1 million code points, it is not a trivial task to provide a space-efficient solution for the multitude of properties.
Common approaches such as hash tables or binary search over
     sorted code point intervals (as in InversionList) are insufficient.
     Hash tables have an enormous memory footprint and binary search
     over intervals is not fast enough for some heavy-duty algorithms.
     
The recommended solution (see Unicode Implementation Guidelines) is using multi-stage tables, that is, an implementation of the Trie data structure with integer keys and a fixed number of stages. For the remainder of the section this will be called a fixed trie. The following describes a particular implementation that is aimed at speed of access at the expense of ideal size savings.
Taking a 2-level Trie as an example, the principle of operation is as follows. Split the number of bits in a key (code point, 21 bits) into 2 components (e.g. 15 and 8). The first is the number of bits in the index of the trie and the other is the number of bits in each page of the trie. The layout of the trie is then an array of size 2^^bits-of-index followed by an array of memory chunks of size 2^^bits-of-page/bits-per-element.
The number of pages is variable (but not less than 1), unlike the number of entries in the index. The slots of the index all have to contain the number of a page that is present. The lookup is then just a couple of operations: slice off the upper bits, look up the index for these, take the page at that index and use the lower bits as an offset within this page.
        Assuming that pages are laid out consecutively
        in one array called pages, the pseudo-code is:
    
auto elemsPerPage = (2 ^^ bits_per_page) / Value.sizeOfInBits;
pages[index[n >> bits_per_page]][n & (elemsPerPage - 1)];
If elemsPerPage is a power of 2, the whole process is
    a handful of simple instructions and 2 array reads. Subsequent levels
    of the trie are introduced by recursing on this notion: the index array
    is treated as values, and the number of bits in the index is then again
    split into 2 parts, with pages over the 'current index' and a new 'upper index'.
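Concretely, with 8 bits per page (an illustrative split only), the 2-level lookup for U+0444 works out as follows:

void main()
{
    enum bitsPerPage = 8;
    dchar ch = '\u0444';                           // CYRILLIC SMALL LETTER EF
    size_t upper  = ch >> bitsPerPage;             // 0x04: slot in the index array
    size_t offset = ch & ((1 << bitsPerPage) - 1); // 0x44: offset within that page
    // the stored value would then be pages[index[upper]][offset]
    assert(upper == 0x04 && offset == 0x44);
}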
    
For completeness a level 1 trie is simply an array.
    The current implementation takes advantage of bit-packing values
    when the range is known to be limited in advance (such as bool).
    See also BitPacked for enforcing it manually.
    The major size advantage however comes from the fact
    that multiple identical pages on every level are merged by construction.
    
The process of constructing a trie is more involved and is hidden from
    the user in the form of the convenience functions codepointTrie,
    codepointSetTrie and the even more convenient toTrie.
    In general a set or built-in AA with dchar type
    can be turned into a trie. The trie object in this module
    is read-only (immutable); it's effectively frozen after construction.
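For instance, a sketch of building a custom trie over the Alphabetic set (the level sizes are chosen only for illustration; they must sum to the 21 bits of a code point):

import std.uni;

void main()
{
    auto set = unicode.Alphabetic;
    // explicit bits per level, most significant bits first (8 + 5 + 8 == 21)
    auto custom = codepointSetTrie!(8, 5, 8)(set);
    assert(custom['я'] && custom['z']);
    assert(!custom['$']);
    // or let toTrie pick a sensible configuration for a given number of levels
    auto quick = toTrie!2(set);
    assert(quick['я']);
}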
    
Unicode properties
This is a full list of Unicode properties accessible through unicode
    with specific helpers per category nested within. Consult the
    CLDR utility
    when in doubt about the contents of a particular set.
General category sets listed below are only accessible with the
    unicode shorthand accessor.
| Abb. | Long form | Abb. | Long form | Abb. | Long form | 
|---|---|---|---|---|---|
| L | Letter | Cn | Unassigned | Po | Other_Punctuation | 
| Ll | Lowercase_Letter | Co | Private_Use | Ps | Open_Punctuation | 
| Lm | Modifier_Letter | Cs | Surrogate | S | Symbol | 
| Lo | Other_Letter | N | Number | Sc | Currency_Symbol | 
| Lt | Titlecase_Letter | Nd | Decimal_Number | Sk | Modifier_Symbol | 
| Lu | Uppercase_Letter | Nl | Letter_Number | Sm | Math_Symbol | 
| M | Mark | No | Other_Number | So | Other_Symbol | 
| Mc | Spacing_Mark | P | Punctuation | Z | Separator | 
| Me | Enclosing_Mark | Pc | Connector_Punctuation | Zl | Line_Separator | 
| Mn | Nonspacing_Mark | Pd | Dash_Punctuation | Zp | Paragraph_Separator | 
| C | Other | Pe | Close_Punctuation | Zs | Space_Separator | 
| Cc | Control | Pf | Final_Punctuation | - | Any | 
| Cf | Format | Pi | Initial_Punctuation | - | ASCII | 
Sets for other commonly useful properties that are
    accessible with unicode:
| Name | Name | Name | 
|---|---|---|
| Alphabetic | Ideographic | Other_Uppercase | 
| ASCII_Hex_Digit | IDS_Binary_Operator | Pattern_Syntax | 
| Bidi_Control | ID_Start | Pattern_White_Space | 
| Cased | IDS_Trinary_Operator | Quotation_Mark | 
| Case_Ignorable | Join_Control | Radical | 
| Dash | Logical_Order_Exception | Soft_Dotted | 
| Default_Ignorable_Code_Point | Lowercase | STerm | 
| Deprecated | Math | Terminal_Punctuation | 
| Diacritic | Noncharacter_Code_Point | Unified_Ideograph | 
| Extender | Other_Alphabetic | Uppercase | 
| Grapheme_Base | Other_Default_Ignorable_Code_Point | Variation_Selector | 
| Grapheme_Extend | Other_Grapheme_Extend | White_Space | 
| Grapheme_Link | Other_ID_Continue | XID_Continue | 
| Hex_Digit | Other_ID_Start | XID_Start | 
| Hyphen | Other_Lowercase | |
| ID_Continue | Other_Math | | 
Below is the table of block names accepted by unicode.block.
    Note that the shorthand version unicode requires "In"
    to be prepended to the names of blocks so as to disambiguate
    scripts and blocks, as the example below shows.
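For example (a sketch; U+0500 lies in the Cyrillic Supplement block, so it belongs to the Cyrillic script but not to the Cyrillic block):

import std.uni;

void main()
{
    auto cyrScript = unicode.Cyrillic;        // the Cyrillic script
    auto cyrBlock  = unicode.InCyrillic;      // the Cyrillic block, via the "In" prefix
    auto sameBlock = unicode.block.Cyrillic;  // the same block through the explicit accessor
    assert(cyrScript['\u0500'] && !cyrBlock['\u0500']);
    assert(!sameBlock['\u0500']);
}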
| Name | Name | Name | 
|---|---|---|
| Aegean Numbers | Ethiopic Extended | Mongolian | 
| Alchemical Symbols | Ethiopic Extended-A | Musical Symbols | 
| Alphabetic Presentation Forms | Ethiopic Supplement | Myanmar | 
| Ancient Greek Musical Notation | General Punctuation | Myanmar Extended-A | 
| Ancient Greek Numbers | Geometric Shapes | New Tai Lue | 
| Ancient Symbols | Georgian | NKo | 
| Arabic | Georgian Supplement | Number Forms | 
| Arabic Extended-A | Glagolitic | Ogham | 
| Arabic Mathematical Alphabetic Symbols | Gothic | Ol Chiki | 
| Arabic Presentation Forms-A | Greek and Coptic | Old Italic | 
| Arabic Presentation Forms-B | Greek Extended | Old Persian | 
| Arabic Supplement | Gujarati | Old South Arabian | 
| Armenian | Gurmukhi | Old Turkic | 
| Arrows | Halfwidth and Fullwidth Forms | Optical Character Recognition | 
| Avestan | Hangul Compatibility Jamo | Oriya | 
| Balinese | Hangul Jamo | Osmanya | 
| Bamum | Hangul Jamo Extended-A | Phags-pa | 
| Bamum Supplement | Hangul Jamo Extended-B | Phaistos Disc | 
| Basic Latin | Hangul Syllables | Phoenician | 
| Batak | Hanunoo | Phonetic Extensions | 
| Bengali | Hebrew | Phonetic Extensions Supplement | 
| Block Elements | High Private Use Surrogates | Playing Cards | 
| Bopomofo | High Surrogates | Private Use Area | 
| Bopomofo Extended | Hiragana | Rejang | 
| Box Drawing | Ideographic Description Characters | Rumi Numeral Symbols | 
| Brahmi | Imperial Aramaic | Runic | 
| Braille Patterns | Inscriptional Pahlavi | Samaritan | 
| Buginese | Inscriptional Parthian | Saurashtra | 
| Buhid | IPA Extensions | Sharada | 
| Byzantine Musical Symbols | Javanese | Shavian | 
| Carian | Kaithi | Sinhala | 
| Chakma | Kana Supplement | Small Form Variants | 
| Cham | Kanbun | Sora Sompeng | 
| Cherokee | Kangxi Radicals | Spacing Modifier Letters | 
| CJK Compatibility | Kannada | Specials | 
| CJK Compatibility Forms | Katakana | Sundanese | 
| CJK Compatibility Ideographs | Katakana Phonetic Extensions | Sundanese Supplement | 
| CJK Compatibility Ideographs Supplement | Kayah Li | Superscripts and Subscripts | 
| CJK Radicals Supplement | Kharoshthi | Supplemental Arrows-A | 
| CJK Strokes | Khmer | Supplemental Arrows-B | 
| CJK Symbols and Punctuation | Khmer Symbols | Supplemental Mathematical Operators | 
| CJK Unified Ideographs | Lao | Supplemental Punctuation | 
| CJK Unified Ideographs Extension A | Latin-1 Supplement | Supplementary Private Use Area-A | 
| CJK Unified Ideographs Extension B | Latin Extended-A | Supplementary Private Use Area-B | 
| CJK Unified Ideographs Extension C | Latin Extended Additional | Syloti Nagri | 
| CJK Unified Ideographs Extension D | Latin Extended-B | Syriac | 
| Combining Diacritical Marks | Latin Extended-C | Tagalog | 
| Combining Diacritical Marks for Symbols | Latin Extended-D | Tagbanwa | 
| Combining Diacritical Marks Supplement | Lepcha | Tags | 
| Combining Half Marks | Letterlike Symbols | Tai Le | 
| Common Indic Number Forms | Limbu | Tai Tham | 
| Control Pictures | Linear B Ideograms | Tai Viet | 
| Coptic | Linear B Syllabary | Tai Xuan Jing Symbols | 
| Counting Rod Numerals | Lisu | Takri | 
| Cuneiform | Low Surrogates | Tamil | 
| Cuneiform Numbers and Punctuation | Lycian | Telugu | 
| Currency Symbols | Lydian | Thaana | 
| Cypriot Syllabary | Mahjong Tiles | Thai | 
| Cyrillic | Malayalam | Tibetan | 
| Cyrillic Extended-A | Mandaic | Tifinagh | 
| Cyrillic Extended-B | Mathematical Alphanumeric Symbols | Transport And Map Symbols | 
| Cyrillic Supplement | Mathematical Operators | Ugaritic | 
| Deseret | Meetei Mayek | Unified Canadian Aboriginal Syllabics | 
| Devanagari | Meetei Mayek Extensions | Unified Canadian Aboriginal Syllabics Extended | 
| Devanagari Extended | Meroitic Cursive | Vai | 
| Dingbats | Meroitic Hieroglyphs | Variation Selectors | 
| Domino Tiles | Miao | Variation Selectors Supplement | 
| Egyptian Hieroglyphs | Miscellaneous Mathematical Symbols-A | Vedic Extensions | 
| Emoticons | Miscellaneous Mathematical Symbols-B | Vertical Forms | 
| Enclosed Alphanumerics | Miscellaneous Symbols | Yijing Hexagram Symbols | 
| Enclosed Alphanumeric Supplement | Miscellaneous Symbols and Arrows | Yi Radicals | 
| Enclosed CJK Letters and Months | Miscellaneous Symbols And Pictographs | Yi Syllables | 
| Enclosed Ideographic Supplement | Miscellaneous Technical | |
| Ethiopic | Modifier Tone Letters | | 
Below is the table of script names accepted by unicode.script
    and by the shorthand version unicode:
| Name | Name | Name | 
|---|---|---|
| Arabic | Hanunoo | Old_Italic | 
| Armenian | Hebrew | Old_Persian | 
| Avestan | Hiragana | Old_South_Arabian | 
| Balinese | Imperial_Aramaic | Old_Turkic | 
| Bamum | Inherited | Oriya | 
| Batak | Inscriptional_Pahlavi | Osmanya | 
| Bengali | Inscriptional_Parthian | Phags_Pa | 
| Bopomofo | Javanese | Phoenician | 
| Brahmi | Kaithi | Rejang | 
| Braille | Kannada | Runic | 
| Buginese | Katakana | Samaritan | 
| Buhid | Kayah_Li | Saurashtra | 
| Canadian_Aboriginal | Kharoshthi | Sharada | 
| Carian | Khmer | Shavian | 
| Chakma | Lao | Sinhala | 
| Cham | Latin | Sora_Sompeng | 
| Cherokee | Lepcha | Sundanese | 
| Common | Limbu | Syloti_Nagri | 
| Coptic | Linear_B | Syriac | 
| Cuneiform | Lisu | Tagalog | 
| Cypriot | Lycian | Tagbanwa | 
| Cyrillic | Lydian | Tai_Le | 
| Deseret | Malayalam | Tai_Tham | 
| Devanagari | Mandaic | Tai_Viet | 
| Egyptian_Hieroglyphs | Meetei_Mayek | Takri | 
| Ethiopic | Meroitic_Cursive | Tamil | 
| Georgian | Meroitic_Hieroglyphs | Telugu | 
| Glagolitic | Miao | Thaana | 
| Gothic | Mongolian | Thai | 
| Greek | Myanmar | Tibetan | 
| Gujarati | New_Tai_Lue | Tifinagh | 
| Gurmukhi | Nko | Ugaritic | 
| Han | Ogham | Vai | 
| Hangul | Ol_Chiki | Yi | 
Below is the table of names accepted by unicode.hangulSyllableType, followed by a short usage sketch.
| Abb. | Long form | 
|---|---|
| L | Leading_Jamo | 
| LV | LV_Syllable | 
| LVT | LVT_Syllable | 
| T | Trailing_Jamo | 
| V | Vowel_Jamo | 
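A minimal sketch of looking these up (assuming the string-accepting form of the accessor):

import std.uni;

void main()
{
    auto leading = unicode.hangulSyllableType("L");
    auto vowel   = unicode.hangulSyllableType("V");
    assert(leading['\u1100']); // U+1100 HANGUL CHOSEONG KIYEOK is a leading jamo
    assert(vowel['\u1161']);   // U+1161 HANGUL JUNGSEONG A is a vowel jamo
}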
References
ASCII Table (Wikipedia), The Unicode Consortium, Unicode Normalization Forms, Unicode Text Segmentation, Unicode Implementation Guidelines, Unicode Conformance.
Trademarks
Unicode(tm) is a trademark of Unicode, Inc.
Standards
Functions
| Name | Description | 
|---|---|
| allowedIn | Tests if a dchar is always allowed (Quick_Check=YES) in the normalization form norm. | 
| asCapitalized | Capitalize input range or string, meaning convert the first character to upper case and subsequent characters to lower case. | 
| asLowerCase | Convert an input range or string to lower case. | 
| asUpperCase | Convert an input range or string to upper case. | 
| byCodePoint | Lazily transform a range of Grapheme back into a range of code points. | 
| byGrapheme | Iterate a string by grapheme. | 
| combiningClass | Returns the canonical combining class of a character. | 
| compose | Try to canonically compose 2 characters. Returns the composed character if they do compose and dchar.init otherwise. | 
| composeJamo | Try to compose a Hangul syllable out of a leading consonant, a vowel and an optional trailing consonant jamo. | 
| decodeGrapheme | Reads one full grapheme cluster from an input range of dchar. | 
| decompose | Returns a full Canonical (by default) or Compatibility decomposition of a character. If no decomposition is available, returns a Grapheme with the character itself. | 
| decomposeHangul | Decomposes a Hangul syllable. If the character is not a composed syllable, then this function returns a Grapheme containing only the character as is. | 
| graphemeStride | Computes the length of the grapheme cluster starting at a given index. Both the resulting length and the index are measured in code units. | 
| icmp | Does case-insensitive comparison of two strings or ranges of characters. | 
| isAlpha | Returns whether a character is a Unicode alphabetic character (general Unicode category: Alphabetic). | 
| isControl | Returns whether a character is a Unicode control character (general Unicode category: Cc). | 
| isFormat | Returns whether a character is a Unicode formatting character (general Unicode category: Cf). | 
| isGraphical | Returns whether a character is a Unicode graphical character (general Unicode category: L, M, N, P, S, Zs). | 
| isLower | Returns whether a character is a Unicode lowercase character. | 
| isMark | Returns whether a character is a Unicode mark (general Unicode category: Mn, Me, Mc). | 
| isNonCharacter | Returns whether a character is a Unicode non-character, i.e. a code point with no assigned abstract character (general Unicode category: Cn). | 
| isNumber | Returns whether a character is a Unicode numerical character (general Unicode category: Nd, Nl, No). | 
| isPrivateUse | Returns whether a character is a Unicode Private Use code point (general Unicode category: Co). | 
| isPunctuation | Returns whether a character is a Unicode punctuation character (general Unicode category: Pd, Ps, Pe, Pc, Po, Pi, Pf). | 
| isSpace | Returns whether a character is a Unicode space character (general Unicode category: Zs). | 
| isSurrogate | Returns whether a character is a Unicode surrogate code point (general Unicode category: Cs). | 
| isSurrogateHi | Returns whether a character is a Unicode high surrogate (lead surrogate). | 
| isSurrogateLo | Returns whether a character is a Unicode low surrogate (trail surrogate). | 
| isSymbol | Returns whether a character is a Unicode symbol character (general Unicode category: Sm, Sc, Sk, So). | 
| isUpper | Returns whether a character is a Unicode uppercase character. | 
| isWhite | Returns whether a character is a Unicode whitespace character (general Unicode category: part of C0 (tab, vertical tab, form feed, carriage return, and linefeed characters), Zs, Zl, Zp, and NEL (U+0085)). | 
| normalize | Returns string normalized to the chosen form.
    Form C is used by default. | 
| sicmp | Does basic case-insensitive comparison of two strings or ranges of characters. | 
| toDelegate | Builds a Trie with a typically optimal speed-size trade-off and wraps it into a delegate of type bool delegate(dchar). | 
| toLower | Returns a string which is identical to the input except that all of its characters are converted to lowercase (by performing Unicode lowercase mapping). If none of the characters were affected, then the input itself is returned. | 
| toLower | If a character is a Unicode uppercase character, then its lowercase equivalent is returned. Otherwise the character itself is returned. | 
| toLowerInPlace | Converts a string to lowercase (by performing Unicode lowercase mapping) in place. For a few characters the string length may increase after the transformation; in such a case the function reallocates exactly once. If the string does not have any uppercase characters, then it is unaltered. | 
| toTrie | Convenience function to construct optimal configurations for a packed Trie from any set of code points. | 
| toUpper | If a character is a Unicode lowercase character, then its uppercase equivalent is returned. Otherwise the character itself is returned. | 
| toUpper | Returns a string which is identical to the input except that all of its characters are converted to uppercase (by performing Unicode uppercase mapping). If none of the characters were affected, then the input itself is returned. | 
| toUpperInPlace | Converts a string to uppercase (by performing Unicode uppercase mapping) in place. For a few characters the string length may increase after the transformation; in such a case the function reallocates exactly once. If the string does not have any lowercase characters, then it is unaltered. | 
| utfMatcher | Constructs a matcher object to classify code points from a given set, for the UTF encoding that has Char as its code unit. | 
Structs
| Name | Description | 
|---|---|
| CodepointInterval | The recommended type of std.typecons.Tuple
    to represent [a, b) intervals of code points, as used in InversionList.
    Any interval type should pass the isIntegralPair trait. | 
| Grapheme | A structure designed to effectively pack the characters of a grapheme cluster. | 
| InversionList | A set of code points represented as a sorted array of open-right [a, b) intervals. | 
| MatcherConcept | Conceptual type that outlines the common properties of all UTF Matchers. | 
| unicode | A single entry point to look up Unicode code point sets by name or alias of
    a block, script or general category. | 
Enums
| Name | Description | 
|---|---|
| NormalizationForm | Enumeration type for normalization forms,
    passed as template parameter for functions like normalize. | 
| UnicodeDecomposition | Unicode character decomposition type. | 
Templates
| Name | Description | 
|---|---|
| codepointSetTrie | A shorthand for creating a custom multi-level fixed Trie
    from a CodepointSet. sizes are the numbers of bits per level,
    with the most significant bits used first. | 
| codepointTrie | A slightly more general tool for building a fixed Trie for Unicode data. | 
| isCodepointSet | Tests if T is some kind of a set of code points. Intended for template constraints. | 
Enum values
| Name | Type | Description | 
|---|---|---|
| isIntegralPair | Tests if T is a pair of integers that implicitly convert to V.
    The following code must compile for any pair T: | |
| isUtfMatcher | Tests if M is a UTF Matcher for ranges of Char. | |
| lineSep | Constant (0x2028) - line separator. | |
| nelSep | Constant (0x0085) - next line. | |
| NFC | Shorthand alias for the value indicating Normalization Form C. | |
| NFD | Shorthand alias for the value indicating Normalization Form D. | |
| NFKC | Shorthand alias for the value indicating Normalization Form KC. | |
| NFKD | Shorthand alias for the value indicating Normalization Form KD. | |
| paraSep | Constant (0x2029) - paragraph separator. | 
Aliases
| Name | Type | Description | 
|---|---|---|
| CodepointSet | InversionList!(std.uni.GcPolicy) | The recommended default type for a set of code points.
    For details, see the current implementation: InversionList. | 
| CodepointSetTrie | typeof(TrieBuilder!(bool,dchar,lastDchar+1,Prefix)(false). | Type of Trie generated by the codepointSetTrie function. | 
| CodepointTrie | typeof(TrieBuilder!(T,dchar,lastDchar+1,Prefix)(T. | Type of Trie as generated by the codepointTrie function. | 
Authors
Dmitry Olshansky