Module std.encoding

Classes and functions for handling and transcoding between various encodings.

For cases where the encoding is known at compile-time, functions are provided for arbitrary encoding and decoding of characters, arbitrary transcoding between strings of different type, as well as validation and sanitization.

Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250 and WINDOWS-1252.
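As a minimal sketch of the compile-time functions described above, the following transcodes between string types and then validates and sanitizes a malformed UTF-8 string. The helper names toLatin1 and fromLatin1 are hypothetical and exist only for this illustration; they are not part of the module.

```d
import std.encoding;

/// Hypothetical helper (not part of std.encoding): UTF-8 to Latin-1.
Latin1String toLatin1(string utf8)
{
    Latin1String result;
    transcode(utf8, result);   // target encoding is taken from result's type
    return result;
}

/// Hypothetical helper (not part of std.encoding): Latin-1 back to UTF-8.
string fromLatin1(Latin1String latin1)
{
    string result;
    transcode(latin1, result);
    return result;
}

void main()
{
    // UTF-8 to UTF-16.
    wstring ws;
    transcode("hello world", ws);
    assert(ws == "hello world"w);

    // U+00E9 needs two UTF-8 code units but only one Latin-1 code unit.
    Latin1String ls = toLatin1("caf\u00E9");
    assert(ls.length == 4);
    assert(fromLatin1(ls) == "caf\u00E9");

    // Validation and sanitization of a string with a malformed sequence.
    assert(isValid("\u20AC100"));
    string broken = "hello \xF0\x80world";
    assert(!isValid(broken));
    assert(isValid(sanitize(broken)));
}
```

The source and target encodings are selected purely by the element types of the arguments, which is why no encoding name appears anywhere in the calls.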

For cases where the encoding is not known at compile-time, but is known at run-time, the module provides the abstract class EncodingScheme and its subclasses. To construct a run-time encoder/decoder, pass the name of the encoding to EncodingScheme.create, for example:

auto e = EncodingScheme.create("utf-8");
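A run-time scheme works on raw ubyte buffers rather than typed strings. The sketch below shows one possible round trip through such a scheme; the helper encodeWith and the 8-byte scratch buffer are assumptions made for this illustration.

```d
import std.encoding;

/// Hypothetical helper (not part of std.encoding): encodes one code point
/// using a scheme that is looked up by name at run time.
immutable(ubyte)[] encodeWith(string schemeName, dchar c)
{
    EncodingScheme e = EncodingScheme.create(schemeName);
    ubyte[8] buffer;                      // assumed large enough for one code point
    size_t n = e.encode(c, buffer[]);     // returns the number of bytes written
    return buffer[0 .. n].idup;
}

void main()
{
    auto e = EncodingScheme.create("utf-8");
    assert(e.toString() == "UTF-8");

    // The euro sign occupies three bytes in UTF-8.
    const(ubyte)[] bytes = encodeWith("utf-8", '\u20AC');
    assert(bytes.length == 3);
    assert(e.isValid(bytes));

    // decode consumes the bytes it reads from the front of the slice.
    dchar c = e.decode(bytes);
    assert(c == '\u20AC');
    assert(bytes.length == 0);
}
```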

This library supplies EncodingScheme subclasses for ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250, WINDOWS-1252, UTF-8, and (on little-endian architectures) UTF-16LE and UTF-32LE; or (on big-endian architectures) UTF-16BE and UTF-32BE.

This library provides a mechanism whereby other modules may add EncodingScheme subclasses for any other encoding.

Functions

Name Description
canEncode Returns true iff it is possible to represent the specified codepoint in the encoding.
codePoints Returns a foreachable struct which can bidirectionally iterate over all code points in a string.
codeUnits Returns a foreachable struct which can bidirectionally iterate over all code units in a code point.
decode Decodes a single code point.
decodeReverse Decodes a single code point from the end of a string.
encode Encodes the contents of s in units of type Tgt, writing the result to an output range.
encode Encodes a single code point to a delegate.
encode Encodes a single code point into an array.
encode Encodes a single code point.
encodedLength Returns the number of code units required to encode a single code point.
encodingName Returns the name of an encoding.
firstSequence Returns the length of the first encoded sequence.
index Returns the array index at which the (n+1)th code point begins.
isValid Returns true if the string is encoded correctly.
isValidCodePoint Returns true if c is a valid code point.
isValidCodeUnit Returns true if the code unit is legal. For example, the byte 0x80 would not be legal in ASCII, because ASCII code units must always be in the range 0x00 to 0x7F.
lastSequence Returns the length of the last encoded sequence.
safeDecode Decodes a single code point. The input does not have to be valid.
sanitize Sanitizes a string by replacing malformed code unit sequences with valid code unit sequences. The result is guaranteed to be valid for this encoding.
transcode Convert a string from one encoding to another.
validLength Returns the length of the longest possible substring, starting from the first code unit, which is validly encoded.
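Several of the functions listed above operate on individual code points or on positions within a string. The following sketch exercises a handful of them on UTF-8 input; countCodePoints is a hypothetical helper written for this example only.

```d
import std.encoding;

/// Hypothetical helper (not part of std.encoding): counts the code points
/// in a valid UTF-8 string by iterating with codePoints.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; codePoints(s))
        ++n;
    return n;
}

void main()
{
    // Can a code point be represented, and how many code units does it need?
    assert(canEncode!Latin1Char('\u00A0'));
    assert(!canEncode!AsciiChar('\u00A0'));
    assert(encodedLength!char('\u20AC') == 3);

    // encode returns a new array; decode consumes from the front of the string.
    auto units = encode!char('\u20AC');
    assert(units.length == 3);
    string s = "\u20AC100";
    dchar first = decode(s);
    assert(first == '\u20AC');
    assert(s == "100");

    // Sequence lengths and positions, measured in code units.
    assert(firstSequence("\u20AC100") == 3);
    assert(lastSequence("100\u20AC") == 3);
    assert(index("\u20AC100", 1) == 3);
    assert(countCodePoints("\u20AC100") == 4);

    // Names and validity checks.
    assert(encodingName!char == "UTF-8");
    assert(!isValidCodeUnit(cast(AsciiChar) 0x80));
    assert(!isValidCodePoint(cast(dchar) 0x110000));
    assert(validLength("hello \xF0\x80world") == 6);
}
```

Where the target type cannot be deduced from the arguments (canEncode, encodedLength, the single-argument encode, encodingName), the encoding is passed explicitly as a template argument.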

Classes

Name Description
EncodingException The base class for exceptions thrown by this module
EncodingScheme Abstract base class of all encoding schemes
EncodingSchemeASCII EncodingScheme to handle ASCII
EncodingSchemeLatin1 EncodingScheme to handle Latin-1
EncodingSchemeLatin2 EncodingScheme to handle Latin-2
EncodingSchemeUtf16Native EncodingScheme to handle UTF-16 in native byte order
EncodingSchemeUtf32Native EncodingScheme to handle UTF-32 in native byte order
EncodingSchemeUtf8 EncodingScheme to handle UTF-8
EncodingSchemeWindows1250 EncodingScheme to handle Windows-1250
EncodingSchemeWindows1252 EncodingScheme to handle Windows-1252
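As an illustration of how EncodingException might be handled, the sketch below wraps EncodingScheme.create so that an unrecognized name yields null instead of an exception. The wrapper tryCreate and the name "no-such-encoding" are hypothetical, and the sketch assumes that create reports an unknown name by throwing EncodingException.

```d
import std.encoding;

/// Hypothetical helper (not part of std.encoding): returns null when the
/// requested encoding name is not recognized.
EncodingScheme tryCreate(string name)
{
    try
        return EncodingScheme.create(name);
    catch (EncodingException)
        return null;
}

void main()
{
    assert(tryCreate("utf-8") !is null);
    assert(tryCreate("no-such-encoding") is null);
}
```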

Enums

Name Description
AsciiChar Defines an ASCII-encoded character.
Latin1Char Defines a Latin1-encoded character.
Latin2Char Defines a Latin2-encoded character.
Windows1250Char Defines a Windows1250-encoded character.
Windows1252Char Defines a Windows1252-encoded character.

Enum values

Name Type Description
INVALID_SEQUENCE dchar Special value returned by safeDecode when the input is not validly encoded.
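A typical use of INVALID_SEQUENCE is to scan input that may be malformed. The following is a minimal sketch; countInvalid is a hypothetical helper, and it relies on safeDecode removing at least one code unit from the front of the string on every call.

```d
import std.encoding;

/// Hypothetical helper (not part of std.encoding): counts the malformed
/// sequences encountered while decoding possibly invalid UTF-8.
size_t countInvalid(string s)
{
    size_t bad;
    while (s.length != 0)
    {
        // safeDecode advances s past whatever it consumed, valid or not.
        if (safeDecode(s) == INVALID_SEQUENCE)
            ++bad;
    }
    return bad;
}

void main()
{
    assert(countInvalid("\u20AC100") == 0);
    // 0x80 is a continuation byte with no lead byte before it.
    assert(countInvalid("a\x80b") == 1);
}
```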

Aliases

Name Type Description
AsciiString immutable(AsciiChar)[] Defines an ASCII-encoded string (as an array of immutable(AsciiChar)).
Latin1String immutable(Latin1Char)[] Defines a Latin1-encoded string (as an array of immutable(Latin1Char)).
Latin2String immutable(Latin2Char)[] Defines a Latin2-encoded string (as an array of immutable(Latin2Char)).
Windows1250String immutable(Windows1250Char)[] Defines a Windows1250-encoded string (as an array of immutable(Windows1250Char)).
Windows1252String immutable(Windows1252Char)[] Defines a Windows1252-encoded string (as an array of immutable(Windows1252Char)).
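Because each alias has its own element type, the string type alone tells the template functions which encoding to use. The sketch below contrasts Windows-1252 with Latin-1 using the euro sign; toWindows1252 is a hypothetical helper introduced for this example.

```d
import std.encoding;

/// Hypothetical helper (not part of std.encoding): UTF-8 to Windows-1252.
Windows1252String toWindows1252(string utf8)
{
    Windows1252String result;
    transcode(utf8, result);
    return result;
}

void main()
{
    // The euro sign exists in Windows-1252 (as byte 0x80) but not in Latin-1.
    assert(canEncode!Windows1252Char('\u20AC'));
    assert(!canEncode!Latin1Char('\u20AC'));

    Windows1252String w = toWindows1252("\u20AC5");
    assert(w.length == 2);
    assert(cast(ubyte) w[0] == 0x80);
    assert(cast(ubyte) w[1] == 0x35);   // the digit '5'
}
```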

Authors

Janice Caron

License

Boost License 1.0.