Module std.encoding
Classes and functions for handling and transcoding between various encodings.
For cases where the encoding is known at compile-time, functions are provided for encoding and decoding arbitrary characters, for transcoding between strings of different types, and for validation and sanitization.
Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), and WINDOWS-1252.
- The type AsciiChar represents an ASCII character.
- The type AsciiString represents an ASCII string.
- The type Latin1Char represents an ISO-8859-1 character.
- The type Latin1String represents an ISO-8859-1 string.
- The type Windows1252Char represents a Windows-1252 character.
- The type Windows1252String represents a Windows-1252 string.
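As a minimal sketch of how these typed strings work with the compile-time functions, the following converts UTF-8 text to ISO-8859-1 and back using transcode. The helper name roundTripLatin1 is hypothetical and exists only for illustration; the input is assumed to be valid UTF-8 that is representable in ISO-8859-1.

```d
import std.encoding;

/// Converts UTF-8 text to ISO-8859-1 and back to UTF-8.
/// (roundTripLatin1 is a hypothetical helper, not part of std.encoding.)
string roundTripLatin1(string text)
{
    Latin1String latin1;
    transcode(text, latin1);   // UTF-8 -> ISO-8859-1

    string utf8;
    transcode(latin1, utf8);   // ISO-8859-1 -> UTF-8
    return utf8;
}

unittest
{
    Latin1String ls;
    transcode("caf\u00E9", ls);
    assert(ls.length == 4);               // one code unit per character
    assert(cast(ubyte) ls[3] == 0xE9);    // U+00E9 is the byte 0xE9 in ISO-8859-1
    assert(roundTripLatin1("caf\u00E9") == "caf\u00E9");
}
```

Because the source and target encodings are carried by the string types, the compiler selects the conversion; no encoding name is passed at run time.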
For cases where the encoding is not known at compile-time, but is known at run-time, the module provides the abstract class EncodingScheme and its subclasses. To construct a run-time encoder/decoder, write, for example:
auto e = EncodingScheme.create("utf-8");
This library supplies EncodingScheme subclasses for ASCII, ISO-8859-1 (also known as LATIN-1), WINDOWS-1252, UTF-8, and (on little-endian architectures) UTF-16LE and UTF-32LE; or (on big-endian architectures) UTF-16BE and UTF-32BE.
This library provides a mechanism whereby other modules may add EncodingScheme subclasses for any other encoding.
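The run-time interface might be used as sketched below, where the encoding is chosen by name and then queried per code point. The helper unitsNeeded is a hypothetical name introduced for this example, and the encoding names passed to it are assumed to be among those the library registers.

```d
import std.encoding;

/// Returns the number of code units (bytes) the named encoding needs for c,
/// or 0 if c cannot be represented in that encoding.
/// (unitsNeeded is a hypothetical helper, not part of std.encoding.)
size_t unitsNeeded(string encodingName, dchar c)
{
    EncodingScheme scheme = EncodingScheme.create(encodingName);
    // encodedLength requires an encodable code point, so check first.
    return scheme.canEncode(c) ? scheme.encodedLength(c) : 0;
}

unittest
{
    assert(unitsNeeded("utf-8", 'A') == 1);
    assert(unitsNeeded("utf-8", '\u20AC') == 3);         // euro sign
    assert(unitsNeeded("ascii", '\u20AC') == 0);         // not representable
    assert(unitsNeeded("windows-1252", '\u20AC') == 1);  // single byte 0x80
}
```

EncodingScheme methods work on ubyte buffers rather than typed strings, which is what lets a single code path serve any encoding selected at run time.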
Functions
Name | Description |
---|---|
canEncode | Returns true iff it is possible to represent the specified code point in the encoding. |
codePoints | Returns a foreachable struct which can bidirectionally iterate over all code points in a string. |
codeUnits | Returns a foreachable struct which can bidirectionally iterate over all code units in a code point. |
decode | Decodes a single code point. |
decodeReverse | Decodes a single code point from the end of a string. |
encode | Encodes the contents of the input string in units of type Tgt, writing the result to an output range. |
encode | Encodes a single code point to a delegate. |
encode | Encodes a single code point into an array. |
encode | Encodes a single code point. |
encodedLength | Returns the number of code units required to encode a single code point. |
encodingName | Returns the name of an encoding. |
firstSequence | Returns the length of the first encoded sequence. |
index | Returns the array index at which the (n+1)th code point begins. |
isValid | Returns true if the string is encoded correctly. |
isValidCodePoint | Returns true if c is a valid code point. |
isValidCodeUnit | Returns true if the code unit is legal. For example, the byte 0x80 would not be legal in ASCII, because ASCII code units must always be in the range 0x00 to 0x7F. |
lastSequence | Returns the length of the last encoded sequence. |
safeDecode | Decodes a single code point. The input does not have to be valid. |
sanitize | Sanitizes a string by replacing malformed code unit sequences with valid code unit sequences. The result is guaranteed to be valid for this encoding. |
transcode | Converts a string from one encoding to another. |
validLength | Returns the length of the longest possible substring, starting from the first code unit, which is validly encoded. |
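A few of the functions listed above can be combined as in the following sketch, which validates and sanitizes a UTF-8 string and queries individual code points. The byte 0xFF is used as the malformed input because it can never occur in well-formed UTF-8; the variable names are illustrative only.

```d
import std.encoding;

unittest
{
    // Validation: "hi" is well formed, the 0xFF byte after it is not.
    string good = "hello";
    string bad = "hi\xFFthere";
    assert(isValid(good));
    assert(!isValid(bad));
    assert(validLength(bad) == 2);   // only "hi" is validly encoded

    // Sanitization: the result is guaranteed to be valid for the encoding.
    string fixed = sanitize(bad);
    assert(isValid(fixed));

    // Per-code-point queries, with the encoding given as a template argument.
    assert(encodedLength!char('\u20AC') == 3);    // UTF-8 needs three code units
    assert(encodedLength!wchar('\u20AC') == 1);   // UTF-16 needs one
    assert(canEncode!AsciiChar('A'));
    assert(!canEncode!AsciiChar('\u00E9'));

    // Iterating over code points rather than code units.
    string word = "caf\u00E9";
    size_t count;
    foreach (c; codePoints(word))
        ++count;
    assert(count == 4);              // four code points
    assert(word.length == 5);        // but five UTF-8 code units
}
```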
Classes
Name | Description |
---|---|
EncodingException | The base class for exceptions thrown by this module |
EncodingScheme | Abstract base class of all encoding schemes |
EncodingSchemeASCII | EncodingScheme to handle ASCII |
EncodingSchemeLatin1 | EncodingScheme to handle Latin-1 |
EncodingSchemeUtf16Native | EncodingScheme to handle UTF-16 in native byte order |
EncodingSchemeUtf32Native | EncodingScheme to handle UTF-32 in native byte order |
EncodingSchemeUtf8 | EncodingScheme to handle UTF-8 |
EncodingSchemeWindows1252 | EncodingScheme to handle Windows-1252 |
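One place EncodingException can surface is EncodingScheme.create, which throws it when given a name that no registered scheme recognizes. The sketch below wraps that behavior in a hypothetical helper, isSupportedEncoding, which is not part of the module.

```d
import std.encoding;

/// Returns true if EncodingScheme.create recognizes the given name.
/// (isSupportedEncoding is a hypothetical helper, not part of std.encoding.)
bool isSupportedEncoding(string name)
{
    try
    {
        auto scheme = EncodingScheme.create(name);
        return scheme !is null;
    }
    catch (EncodingException e)
    {
        // Thrown for names that no registered scheme claims.
        return false;
    }
}

unittest
{
    assert(isSupportedEncoding("utf-8"));
    assert(isSupportedEncoding("ISO-8859-1"));
    assert(!isSupportedEncoding("no-such-encoding"));
}
```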
Enums
Name | Description |
---|---|
AsciiChar | Defines various character sets. |
Latin1Char | Defines a Latin1-encoded character. |
Windows1252Char | Defines a Windows1252-encoded character. |
Enum values
Name | Type | Description |
---|---|---|
INVALID_SEQUENCE | | Special value returned by |
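In Phobos, INVALID_SEQUENCE is the sentinel that safeDecode yields when it meets malformed input, so a decoding loop can compare against it instead of catching an exception. The following is a minimal sketch under that assumption; decodeUntilInvalid is a hypothetical helper name.

```d
import std.encoding;

/// Decodes UTF-8 input and returns the code points that precede the first
/// malformed sequence (or all of them if the input is valid).
/// (decodeUntilInvalid is a hypothetical helper, not part of std.encoding.)
dchar[] decodeUntilInvalid(const(char)[] s)
{
    dchar[] result;
    while (s.length > 0)
    {
        // safeDecode removes one sequence from the front of s.
        dchar c = safeDecode(s);
        if (c == INVALID_SEQUENCE)
            break;
        result ~= c;
    }
    return result;
}

unittest
{
    assert(decodeUntilInvalid("abc") == "abc"d);
    assert(decodeUntilInvalid("ab\xFFcd") == "ab"d);   // stops at the 0xFF byte
}
```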
Aliases
Name | Type | Description |
---|---|---|
AsciiString | immutable(AsciiChar)[] | Defines various character sets. |
Latin1String | immutable(Latin1Char)[] | Defines a Latin1-encoded string (as an array of immutable(Latin1Char)). |
Windows1252String | immutable(Windows1252Char)[] | Defines a Windows1252-encoded string (as an array of immutable(Windows1252Char)). |
Authors
Janice Caron