fixUTFByteOrder

Convert the byte order of an array encoded in UTF-8, UTF-16 or UTF-32 to the system's endianness, in place.
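
As a rough illustration of the in-place conversion, the sketch below reverses the bytes of every 2- or 4-byte code unit when the data's byte order differs from the host's. `toHostOrder` and its parameters are hypothetical names for this example, not part of the library. It assumes the BOM has already been removed and that the length is a whole number of code units.

```d
import std.algorithm.mutation : reverse;
import std.system : Endian, endian;

/// Hypothetical helper: converts `data`, whose code units are `unitSize`
/// bytes wide and stored in byte order `order`, to host byte order in place.
void toHostOrder(ubyte[] data, size_t unitSize, Endian order)
{
    assert(unitSize == 1 || unitSize == 2 || unitSize == 4, "bad unit size");
    assert(data.length % unitSize == 0, "incomplete trailing code unit");

    // UTF-8 has no byte order, and data already in host order needs no work.
    if (unitSize == 1 || order == endian)
        return;

    // Reverse the bytes of each code unit in place.
    for (size_t i = 0; i < data.length; i += unitSize)
        reverse(data[i .. i + unitSize]);
}
```

Because only the bytes inside each code unit move, the operation needs no extra buffer and runs in time linear in the array length.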

Uses the UTF byte-order mark (BOM) to determine the UTF encoding. If there is no BOM at the beginning of the array, UTF-8 is assumed (which is compatible with ASCII). The BOM, if any, is removed from the result.
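
BOM detection can be sketched as a series of prefix checks. This is an illustrative sketch, not the library's code: `detectBom`, `BomInfo` and `Enc` are hypothetical names. One detail matters for any implementation: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so the longer marks must be tested first.

```d
import std.algorithm.searching : startsWith;
import std.system : Endian, endian;

// Hypothetical names for this sketch; not the library's own types.
enum Enc { utf8, utf16, utf32 }

struct BomInfo
{
    Enc encoding;      // encoding implied by the BOM (UTF-8 if none)
    Endian order;      // byte order implied by the BOM
    size_t bomLength;  // number of BOM bytes to skip (0 if none)
}

BomInfo detectBom(const(ubyte)[] data)
{
    // Test UTF-32 BOMs before UTF-16 ones: the UTF-32LE BOM (FF FE 00 00)
    // starts with the UTF-16LE BOM (FF FE) and would otherwise be misread.
    static immutable ubyte[] utf32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] utf32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] utf16le = [0xFF, 0xFE];
    static immutable ubyte[] utf16be = [0xFE, 0xFF];
    static immutable ubyte[] utf8bom = [0xEF, 0xBB, 0xBF];

    if (data.startsWith(utf32le)) return BomInfo(Enc.utf32, Endian.littleEndian, 4);
    if (data.startsWith(utf32be)) return BomInfo(Enc.utf32, Endian.bigEndian, 4);
    if (data.startsWith(utf16le)) return BomInfo(Enc.utf16, Endian.littleEndian, 2);
    if (data.startsWith(utf16be)) return BomInfo(Enc.utf16, Endian.bigEndian, 2);
    // UTF-8 has no byte order; this sketch reports the host's so nothing is swapped.
    if (data.startsWith(utf8bom)) return BomInfo(Enc.utf8, endian, 3);
    // No BOM: assume UTF-8 (ASCII-compatible), nothing to skip.
    return BomInfo(Enc.utf8, endian, 0);
}
```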

If the encoding is determined to be UTF-16 or UTF-32 and there are not enough bytes for the last code unit (i.e., if array.length is odd for UTF-16 or not divisible by 4 for UTF-32), the extra trailing bytes (1 for UTF-16, 1 to 3 for UTF-32) are stripped.
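
The number of stripped bytes is simply the remainder of the length modulo the code unit size. Because the UTF-16 and UTF-32 BOMs (2 and 4 bytes) are themselves whole code units, the remainder is the same whether it is taken over the whole buffer or over the data after the BOM. The helpers below are hypothetical and only illustrate the arithmetic: for the 3-byte buffer FE FF 61, 3 % 2 = 1 byte is stripped and the payload is empty.

```d
/// Hypothetical helper: how many trailing bytes do not form a complete
/// code unit of `unitSize` bytes (1 for UTF-8, 2 for UTF-16, 4 for UTF-32).
size_t incompleteTail(size_t length, size_t unitSize)
{
    assert(unitSize == 1 || unitSize == 2 || unitSize == 4);
    return length % unitSize;
}

/// Hypothetical helper: slices off the BOM and any incomplete trailing unit.
/// Assumes `bomLength` is a multiple of `unitSize` (true for UTF-16/32 BOMs).
ubyte[] payload(ubyte[] data, size_t bomLength, size_t unitSize)
{
    immutable tail = incompleteTail(data.length, unitSize);
    return data[bomLength .. $ - tail];
}
```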

Note that this function does not check whether the array is a valid UTF string. It only handles the BOM and the 1-, 2- or 4-byte code units.
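
Since no validation is done, a caller that needs it could check the result separately, for example with Phobos' `std.utf.validate`, which throws `UTFException` on malformed input. The wrapper below is a hypothetical sketch; it assumes the bytes are already in host byte order with the BOM removed, and that their length is a multiple of the code unit size.

```d
import std.utf : UTFException, validate;

/// Hypothetical helper: returns true if `bytes` (host byte order, no BOM)
/// hold well-formed UTF text whose code unit type is C.
bool isValidUtf(C)(const(ubyte)[] bytes)
    if (is(C == char) || is(C == wchar) || is(C == dchar))
{
    // Reinterpret the bytes as code units; length must be a multiple of C.sizeof.
    auto units = cast(const(C)[]) bytes;
    try
    {
        validate(units); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```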

auto fixUTFByteOrder(ubyte[] array) @safe @nogc pure nothrow

Parameters

ubyte[] array
    The array containing UTF data.

Return Value

Type: auto

A struct with the following members:

ubyte[] array
    A slice of the input array containing the data in system byte order, without the BOM and, for UTF-16/UTF-32, without any stripped bytes.
UTFEncoding encoding
    Encoding of the result (UTF-8, UTF-16 or UTF-32).
std.system.Endian endian
    Endianness of the original array.
uint bytesStripped
    Number of bytes stripped from a UTF-16/UTF-32 array, if any. This is non-zero only if array.length was not divisible by 2 (UTF-16) or 4 (UTF-32).
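
To show how the `array` and `encoding` members fit together, here is a hypothetical sketch that turns a host-order, BOM-free array into a UTF-8 D string. `Enc` stands in for the library's `UTFEncoding` and `toUtf8` is an illustrative name; neither is part of the library. The transcoding itself uses `std.conv.to`.

```d
import std.conv : to;

// Hypothetical stand-in for the result's UTFEncoding member.
enum Enc { utf8, utf16, utf32 }

/// Hypothetical helper: turns the `array` member of the result (host byte
/// order, BOM removed) into a UTF-8 D string, transcoding when needed.
string toUtf8(const(ubyte)[] array, Enc encoding)
{
    if (encoding == Enc.utf16)
        return (cast(const(wchar)[]) array).to!string;
    if (encoding == Enc.utf32)
        return (cast(const(dchar)[]) array).to!string;
    // UTF-8: the bytes are already chars; copy them into an immutable string.
    return (cast(const(char)[]) array).idup;
}
```

The casts reinterpret the bytes without copying, which only gives meaningful code units because the byte order has already been fixed.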

Complexity: O(array.length)

Examples

{
    ubyte[] s = [0xEF, 0xBB, 0xBF, 'a'];
    FixUTFByteOrderResult r = fixUTFByteOrder(s);
    assert(r.encoding == UTFEncoding.UTF_8);
    assert(r.array.length == 1);
    assert(r.array == ['a']);
    // UTF-8 has no byte order of its own; this value assumes a little-endian host
    assert(r.endian == Endian.littleEndian);
}

{
    ubyte[] s = ['a']; // no BOM, so UTF-8 is assumed
    FixUTFByteOrderResult r = fixUTFByteOrder(s);
    assert(r.encoding == UTFEncoding.UTF_8);
    assert(r.array.length == 1);
    assert(r.array == ['a']);
    assert(r.endian == Endian.bigEndian);
}

{
    // UTF-16BE BOM followed by a single byte: 'a' is not a complete
    // code unit, so it is stripped
    ubyte[] s = [0xFE, 0xFF, 'a'];
    FixUTFByteOrderResult r = fixUTFByteOrder(s);
    assert(r.encoding == UTFEncoding.UTF_16);
    assert(r.array.length == 0);
    assert(r.bytesStripped == 1);
    assert(r.endian == Endian.bigEndian);
}
