Module std.utf
Encode and decode UTF-8, UTF-16 and UTF-32 strings.
UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'.
See Also
Wikipedia
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1335
Functions
Name | Description |
---|---|
byCodeUnit | Iterates a range of char, wchar, or dchar by code unit. |
codeLength | Returns the number of code units that are required to encode str in a string whose character type is C. This is particularly useful when slicing one string with the length of another and the two string types use different character types. |
codeLength | Returns the number of code units that are required to encode the code point c when C is the character type used to encode it. |
count | Returns the total number of code points encoded in str. |
decode | Decodes and returns the code point starting at str[index]. index is advanced to one past the decoded code point. If the code point is not well-formed, then a UTFException is thrown and index remains unchanged. |
decodeFront | decodeFront is a variant of decode which specifically decodes the first code point. Unlike decode, decodeFront accepts any input range of code units (rather than just a string or random access range). It also takes the range by ref and pops off the elements as it decodes them. If numCodeUnits is passed in, it is set to the number of code units in the code point that was decoded. |
encode | Encodes c in str's encoding and appends it to str. |
encode | Encodes c into the static array buf and returns the actual length of the encoded character (a number between 1 and 4 for char[4] buffers and a number between 1 and 2 for wchar[2] buffers). |
encode | Encodes c in str's encoding and appends it to str. |
isValidDchar | Returns whether c is a valid UTF-32 character. |
stride | stride returns the length of the UTF-32 sequence starting at index in str. |
stride | stride returns the length of the UTF-16 sequence starting at index in str. |
stride | stride returns the length of the UTF-8 sequence starting at index in str. |
stride | stride returns the length of the UTF-16 sequence starting at index in str. |
strideBack | strideBack returns the length of the UTF-32 sequence ending one code unit before index in str. |
strideBack | strideBack returns the length of the UTF-16 sequence ending one code unit before index in str. |
strideBack | strideBack returns the length of the UTF-8 sequence ending one code unit before index in str. |
toUCSindex | Given index into str and assuming that index is at the start of a UTF sequence, toUCSindex determines the number of UCS characters up to index. So, index is the index of a code unit at the beginning of a code point, and the return value is how many code points into the string that code point is. |
toUTF16 | Encodes string s into UTF-16 and returns the encoded string. |
toUTF16z | toUTF16z is a convenience function for toUTFz!(const(wchar)*). |
toUTF32 | Encodes string s into UTF-32 and returns the encoded string. |
toUTF8 | Encodes string s into UTF-8 and returns the encoded string. |
toUTFindex | Given a UCS index n into str, returns the UTF index. So, n is how many code points into the string the code point is, and the array index of the code unit is returned. |
validate | Checks whether str is well-formed Unicode. |
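As a rough illustration of how several of the functions above relate to one another, the sketch below walks through one short string. It assumes a reasonably recent Phobos and a source file saved as UTF-8; the sample strings and local variable names are chosen for the example and are not part of the module.

```d
import std.utf;

void main()
{
    string s = "héllo";              // 'é' (U+00E9) occupies two UTF-8 code units

    // .length counts code units, count counts code points.
    assert(s.length == 6);
    assert(count(s) == 5);

    // stride and strideBack measure one encoded sequence, forwards or backwards.
    assert(stride(s, 1) == 2);       // sequence starting at index 1
    assert(strideBack(s, 3) == 2);   // sequence ending just before index 3

    // decode returns the code point and advances the index past it.
    size_t i = 1;
    dchar c = decode(s, i);
    assert(c == 'é');
    assert(i == 3);

    // decodeFront pops the decoded code units off the front of the range.
    string r = "éa";
    size_t units;
    assert(decodeFront(r, units) == 'é');
    assert(units == 2);
    assert(r == "a");

    // encode into a static buffer returns the number of code units written.
    char[4] buf;
    size_t n = encode(buf, '€');     // U+20AC needs three UTF-8 code units
    assert(n == 3);
    assert(buf[0 .. n] == "€");

    // encode can also append to a dynamic array.
    char[] arr;
    encode(arr, 'é');
    assert(arr == "é");

    // codeLength answers "how many code units would this need as type C?"
    assert(codeLength!char('€') == 3);
    assert(codeLength!wchar('€') == 1);
    assert(codeLength!wchar(s) == 5);

    // Converting between code unit indices and code point indices.
    assert(toUCSindex(s, 3) == 2);
    assert(toUTFindex(s, 2) == 3);

    // Whole-string conversions between encodings.
    wstring w = toUTF16(s);
    dstring d = toUTF32(s);
    assert(w.length == 5 && d.length == 5);
    assert(toUTF8(d) == s);

    // Surrogate code points are not valid UTF-32 characters.
    assert(isValidDchar('a'));
    assert(!isValidDchar(cast(dchar) 0xD800));
}
```

The distinction to keep in mind is that stride, strideBack, decode, and toUTFindex all deal in code unit indices, while count and toUCSindex report code points.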
Classes
Name | Description |
---|---|
UTFException | Exception thrown on errors in std.utf functions. |
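A minimal sketch of when UTFException surfaces, assuming a lone UTF-8 continuation byte (0x80) as the malformed input. The array bad is a made-up example value; assertThrown and assertNotThrown come from std.exception.

```d
import std.utf;
import std.exception : assertThrown, assertNotThrown;

void main()
{
    // 0x80 is a continuation byte with no lead byte, so this is malformed UTF-8.
    char[] bad = ['a', 'b', cast(char) 0x80];

    // validate throws on malformed input and returns normally otherwise.
    assertThrown!UTFException(validate(bad));
    assertNotThrown!UTFException(validate("héllo"));

    // decode throws as well, and leaves the index where it was.
    size_t i = 2;
    assertThrown!UTFException(decode(bad, i));
    assert(i == 2);
}
```

Because decode leaves the index unchanged on failure, a caller that catches the exception can decide for itself how far to skip ahead.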
Templates
Name | Description |
---|---|
byUTF | Iterate an input range of characters by char type C. |
toUTFz | Returns a C-style zero-terminated string equivalent to str. str must not contain embedded '\0's, as any C function will treat the first '\0' that it sees as the end of the string. If str.empty is true, then a string containing only '\0' is returned. |
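The two templates might be used roughly as follows. This is a sketch under the assumption of a recent Phobos and a UTF-8 source file; strlen from the C standard library stands in for an arbitrary C function that expects a zero-terminated string.

```d
import std.utf;
import std.algorithm.comparison : equal;
import core.stdc.string : strlen;

void main()
{
    // byUTF!C lazily re-encodes a range of characters as code units of type C.
    assert("héllo".byUTF!wchar.equal("héllo"w));
    assert("héllo"w.byUTF!char.equal("héllo".byCodeUnit));

    // toUTFz yields a pointer to a zero-terminated copy suitable for C APIs.
    const(char)* p = toUTFz!(const(char)*)("hello");
    assert(strlen(p) == 5);
    assert(p[5] == '\0');
}
```

byUTF does not allocate a new string; it produces the code units on demand, which is why the results are compared with equal rather than ==.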
Enum values
Name | Type | Description |
---|---|---|
replacementDchar | | Inserted in place of invalid UTF sequences. |
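To show where replacementDchar appears in practice, the sketch below feeds a deliberately malformed UTF-8 sequence through byUTF. It assumes a recent Phobos in which byUTF substitutes the replacement character instead of throwing; the input array is a made-up example.

```d
import std.utf;
import std.algorithm.comparison : equal;

void main()
{
    // replacementDchar is the Unicode replacement character U+FFFD.
    assert(replacementDchar == '\uFFFD');

    // A lone continuation byte (0x80) between two ASCII characters.
    char[] bad = ['a', cast(char) 0x80, 'b'];

    // The malformed byte is replaced rather than reported as an error.
    assert(bad.byUTF!dchar.equal("a\uFFFDb"d));
}
```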
Aliases
Name | Type | Description |
---|---|---|
byChar | | Iterate an input range of characters by char, wchar, or dchar. These aliases simply forward to byUTF with the corresponding C argument. |
byDchar | | Iterate an input range of characters by char, wchar, or dchar. These aliases simply forward to byUTF with the corresponding C argument. |
byWchar | | Iterate an input range of characters by char, wchar, or dchar. These aliases simply forward to byUTF with the corresponding C argument. |
UseReplacementDchar | Flag!("useReplacementDchar") | Whether or not to replace invalid UTF with replacementDchar. |
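A short sketch of the aliases and the flag, assuming a recent Phobos in which decode accepts UseReplacementDchar as its first template argument and Yes is available from std.typecons. The sample data is invented for the example.

```d
import std.utf;
import std.algorithm.comparison : equal;
import std.typecons : Yes;

void main()
{
    // Each alias re-encodes its input as the named code unit type.
    assert("héllo".byDchar.equal("héllo"d));
    assert("héllo".byWchar.equal("héllo"w.byCodeUnit));
    assert("héllo"d.byChar.equal("héllo".byCodeUnit));

    // With the flag set, decode substitutes replacementDchar instead of throwing.
    char[] bad = ['a', cast(char) 0x80, 'b'];
    size_t i = 1;
    dchar c = decode!(Yes.useReplacementDchar)(bad, i);
    assert(c == replacementDchar);
}
```

The default for the flag is to throw UTFException, so code that prefers substitution has to opt in explicitly.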
Authors
Walter Bright and Jonathan M Davis