Params:
array = The array containing UTF-8, UTF-16 or UTF-32 data.
Returns: A struct with the following members:
- ubyte[] array: A slice of the input array containing the data in system byte order, without the BOM and, for UTF-16/UTF-32, without any stripped trailing bytes.
- UTFEncoding encoding: Encoding of the result (UTF-8, UTF-16 or UTF-32).
- std.system.Endian endian: Endianness of the original array.
- uint bytesStripped: Number of bytes stripped from a UTF-16/UTF-32 array, if any. This is non-zero only if array.length was not divisible by 2 (UTF-16) or 4 (UTF-32).
Complexity: O(array.length)
Examples:

```d
unittest
{
    import std.system : Endian;
    // fixUTFByteOrder, FixUTFByteOrderResult and UTFEncoding come from the
    // module that defines this function.

    {
        // UTF-8 BOM: the BOM is removed, leaving only 'a'.
        ubyte[] s = [0xEF, 0xBB, 0xBF, 'a'];
        FixUTFByteOrderResult r = fixUTFByteOrder(s);
        assert(r.encoding == UTFEncoding.UTF_8);
        assert(r.array.length == 1);
        assert(r.array == ['a']);
        assert(r.endian == Endian.littleEndian);
    }
    {
        // No BOM: UTF-8 is assumed.
        ubyte[] s = ['a'];
        FixUTFByteOrderResult r = fixUTFByteOrder(s);
        assert(r.encoding == UTFEncoding.UTF_8);
        assert(r.array.length == 1);
        assert(r.array == ['a']);
        assert(r.endian == Endian.bigEndian);
    }
    {
        // 'a' is stripped because it does not form a complete UTF-16 code unit.
        ubyte[] s = [0xFE, 0xFF, 'a'];
        FixUTFByteOrderResult r = fixUTFByteOrder(s);
        assert(r.encoding == UTFEncoding.UTF_16);
        assert(r.array.length == 0);
        assert(r.endian == Endian.bigEndian);
    }
}
```
Converts the byte order of an array encoded in UTF-8, UTF-16 or UTF-32 to the system endianness, in place.
Uses the UTF byte-order mark (BOM) to determine the encoding. If there is no BOM at the beginning of the array, UTF-8 is assumed (this is compatible with ASCII). The BOM, if any, is removed: the returned slice starts after it.
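BOM detection has one subtlety worth showing: the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE), so the longer UTF-32 marks must be tested first. The D sketch below illustrates this; the names `Encoding`, `BomInfo` and `detectBom` are hypothetical and are not this library's API.

```d
import std.algorithm.searching : startsWith;
import std.system : Endian;

// Hypothetical types for this sketch; not the library's API.
enum Encoding { utf8, utf16, utf32 }

struct BomInfo
{
    Encoding encoding; // UTF-8 when there is no BOM
    Endian endian;     // byte order named by the BOM
    size_t bomLength;  // bytes to skip at the start (0 without a BOM)
}

BomInfo detectBom(const(ubyte)[] data) @safe pure nothrow @nogc
{
    static immutable ubyte[] utf8Bom    = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] utf16leBom = [0xFF, 0xFE];
    static immutable ubyte[] utf16beBom = [0xFE, 0xFF];
    static immutable ubyte[] utf32leBom = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] utf32beBom = [0x00, 0x00, 0xFE, 0xFF];

    // The UTF-32LE BOM begins with the UTF-16LE BOM, so test UTF-32 first.
    if (data.startsWith(utf32leBom)) return BomInfo(Encoding.utf32, Endian.littleEndian, 4);
    if (data.startsWith(utf32beBom)) return BomInfo(Encoding.utf32, Endian.bigEndian, 4);
    if (data.startsWith(utf16leBom)) return BomInfo(Encoding.utf16, Endian.littleEndian, 2);
    if (data.startsWith(utf16beBom)) return BomInfo(Encoding.utf16, Endian.bigEndian, 2);
    // Byte order is irrelevant for UTF-8; Endian.init is only a placeholder.
    if (data.startsWith(utf8Bom)) return BomInfo(Encoding.utf8, Endian.init, 3);
    // No BOM: assume UTF-8, which is also valid for plain ASCII.
    return BomInfo(Encoding.utf8, Endian.init, 0);
}

void main()
{
    ubyte[] text = [0xFE, 0xFF, 0x00, 0x61]; // UTF-16BE BOM followed by 'a'
    const info = detectBom(text);
    assert(info.encoding == Encoding.utf16 && info.endian == Endian.bigEndian);
    const(ubyte)[] payload = text[info.bomLength .. $]; // data after the BOM
    assert(payload == [0x00, 0x61]);
}
```

Returning the BOM length rather than a new array keeps the sketch allocation-free: callers just slice past the mark.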
If the encoding is determined to be UTF-16 or UTF-32 and there are not enough bytes for the last code unit (i.e. if array.length is odd for UTF-16 or not divisible by 4 for UTF-32), the extra trailing bytes (1 for UTF-16, 1-3 for UTF-32) are stripped, and their count is reported in bytesStripped.
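The stripping rule is remainder arithmetic: the number of dropped bytes is length % 2 for UTF-16 and length % 4 for UTF-32. Because the BOM is 2 bytes for UTF-16 and 4 bytes for UTF-32, the remainder is the same whether it is computed on the whole array or on the data after the BOM. The helper `stripIncompleteUnit` below is a hypothetical sketch, not the library's implementation.

```d
// Hypothetical helper (not the library's API): returns `payload` without the
// trailing bytes that cannot form a whole code unit, and reports how many
// bytes were dropped through `stripped`.
ubyte[] stripIncompleteUnit(ubyte[] payload, size_t unitSize, out size_t stripped)
    @safe pure nothrow @nogc
{
    assert(unitSize == 1 || unitSize == 2 || unitSize == 4);
    // UTF-8 (unitSize 1) never has leftovers; UTF-16 leaves 0-1, UTF-32 0-3.
    stripped = payload.length % unitSize;
    return payload[0 .. $ - stripped];
}

void main()
{
    size_t stripped;

    // UTF-16 payload of 3 bytes: one byte is left over.
    ubyte[] utf16 = ['a', 0x00, 'b'];
    assert(stripIncompleteUnit(utf16, 2, stripped).length == 2 && stripped == 1);

    // UTF-32 payload of 7 bytes: three bytes are left over.
    ubyte[] utf32 = new ubyte[7];
    assert(stripIncompleteUnit(utf32, 4, stripped).length == 4 && stripped == 3);
}
```

The result is a slice of the input, so no bytes are copied; the stripped bytes simply fall outside it.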
Note that this function does not check whether the array is a valid UTF string. It only handles the BOM and the byte order of 1-, 2- or 4-byte code units.
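Converting byte order in place amounts to reversing the bytes of each 2- or 4-byte code unit whenever the data's byte order differs from the host's; 1-byte UTF-8 units need no work. The D sketch below uses Phobos `std.range.chunks` and `std.algorithm.mutation.reverse`; the helper name `toNativeOrder` and its parameters are hypothetical, and the library itself may swap units differently.

```d
import std.algorithm.mutation : reverse;
import std.range : chunks;
import std.system : Endian;

// Byte order of the machine this program was compiled for.
version (LittleEndian) enum Endian hostOrder = Endian.littleEndian;
else enum Endian hostOrder = Endian.bigEndian;

// Hypothetical helper (not the library's API): rewrite `data`, a sequence of
// `unitSize`-byte code units stored in `dataOrder`, into host byte order.
void toNativeOrder(ubyte[] data, size_t unitSize, Endian dataOrder) @safe
{
    assert(unitSize == 1 || unitSize == 2 || unitSize == 4);
    // UTF-8 units are single bytes, and matching orders need no work.
    if (unitSize == 1 || dataOrder == hostOrder)
        return;
    assert(data.length % unitSize == 0, "strip incomplete code units first");
    // Each chunk is a slice of `data`, so reversing it swaps bytes in place.
    foreach (unit; data.chunks(unitSize))
        reverse(unit);
}

void main()
{
    ubyte[] text = [0x00, 0x61, 0x00, 0x62]; // "ab" as UTF-16BE, BOM already removed
    toNativeOrder(text, 2, Endian.bigEndian);
    // Each byte pair can now be read as a host-order wchar.
    auto units = cast(const(wchar)[]) text;
    assert(units == "ab"w);
}
```

Reversing byte slices avoids reinterpreting the buffer as `wchar[]` or `dchar[]` during the swap, so the helper stays `@safe`; the cast in `main` is only for reading the result.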