
Re: [Public WebGL] String/ArrayBuffer encoding/decoding API (follow-up)



Joshua, thanks for the updates.

Florian, the TextEncoder/Decoder APIs were written as separate and
self-contained specifications in order to avoid making the typed array
spec -- and other specs -- much larger and more complex.

-Ken



On Wed, Jan 28, 2015 at 10:11 PM, Florian Bösch <pyalot@gmail.com> wrote:
> I find the semantics a bit awkward. Let's look at how Python does it:
>
> unicodestring = somearray.decode('utf-8')
> otherarray = unicodestring.encode('utf-8')
>
>
> Frequently what one has is an array (a file, a network protocol, what have
> you) that contains strings, so you'd do something along these lines:
>
> unicodestring = somearray[123:456].decode('utf-8')
>
>
> For decoding, with this specification it would work like this:
>
> var bytes = new Uint8Array(buffer, 123, 333);
> var decoder = new TextDecoder('utf-8');
> var unicodestring = decoder.decode(bytes);
>
>
> Or if you prefer a one-liner:
>
> var unicodestring = (new TextDecoder('utf-8')).decode(
>     new Uint8Array(buffer, 123, 333));
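A runnable version of the decoding example above, as a sketch assuming a runtime with global TextEncoder/TextDecoder (modern browsers, Node.js 11+). It first writes a sample string at byte 123 so there is something to decode; the sample string is arbitrary, and the view length comes from the encoded size rather than the hard-coded 333.

```javascript
// Sketch: write a UTF-8 string into a larger buffer at byte offset 123,
// then decode just that byte range, mirroring the Python slice
// somearray[123:456].decode('utf-8').
const buffer = new ArrayBuffer(1024);
const encoder = new TextEncoder();               // always produces UTF-8
const encoded = encoder.encode("Grüße, WebGL");  // Uint8Array of UTF-8 bytes

// Copy the encoded bytes into the buffer starting at byte 123.
new Uint8Array(buffer).set(encoded, 123);

// A view over exactly those bytes: Uint8Array(buffer, byteOffset, length).
const bytes = new Uint8Array(buffer, 123, encoded.length);
const decoded = new TextDecoder("utf-8").decode(bytes);

console.log(decoded);        // Grüße, WebGL
console.log(encoded.length); // 14: "ü" and "ß" each take two bytes in UTF-8
```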
>
> Encoding, where you put some piece of data into bytes, would go like this:
>
> result += unicodestring.encode('utf-8')
>
>
> But that's not quite a fair comparison, because typed arrays in JS don't have
> concatenation (why not?), so the Python equivalent using typed arrays
> (borrowing from ctypes) would be something like this (pseudocode):
>
> from ctypes import c_ubyte
> result = (c_ubyte * 1024)()
> result[123:456] = unicodestring.encode('utf-8')  # must be exactly 333 bytes
>
>
> In this specification this would work like this:
>
> var result = new Uint8Array(1024);
>
> result.set((new TextEncoder()).encode(unicodestring), 123);
>
> That's better, almost as concise as Python (though it still needs to
> allocate an encoder).
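For comparison, later revisions of the Encoding Standard (after this thread) added TextEncoder.encodeInto(), which writes UTF-8 straight into an existing Uint8Array and so avoids the temporary array that encode() returns. A minimal sketch, assuming a runtime that implements encodeInto (current browsers and Node.js do); the sample string is arbitrary.

```javascript
// Sketch: encode directly into a preallocated buffer at byte offset 123,
// without the temporary Uint8Array that TextEncoder.encode() returns.
// Requires a runtime that implements TextEncoder.encodeInto().
const result = new Uint8Array(1024);
const encoder = new TextEncoder();

// subarray() returns a view sharing memory with `result`, so the encoded
// bytes are written in place starting at result[123].
const target = result.subarray(123);
const { read, written } = encoder.encodeInto("hello, wörld", target);

// `read` counts UTF-16 code units consumed from the string;
// `written` counts UTF-8 bytes stored in `target`.
console.log(read, written); // 12 13

// Decode the written range back to check the round trip.
console.log(new TextDecoder().decode(result.subarray(123, 123 + written))); // hello, wörld
```

If the target is too small, encodeInto stops before the first character that would not fit, and `read` reports how much of the string was consumed.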
>
> Some changes to the API could simplify this nicely, so I propose:
>
> unicodestring.encode('utf-8') --> Uint8Array
> DataView.getString('utf-8', [offset, [length]]) --> string
> Uint8Array.getString('utf-8', [offset, [length]]) --> string
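The proposed methods can be approximated today on top of TextEncoder/TextDecoder. The sketch below uses free functions rather than patching String, DataView and Uint8Array prototypes; the names encodeString and getString are hypothetical and exist in no standard. It assumes the current Encoding Standard, where TextEncoder only produces UTF-8, so encodeString rejects any other encoding.

```javascript
// Hypothetical helpers that approximate the proposed API on top of the
// existing TextEncoder/TextDecoder. The names encodeString and getString
// are illustrative only; no such methods exist on String, DataView or
// Uint8Array.

// Proposed: unicodestring.encode('utf-8') --> Uint8Array
function encodeString(str, label = "utf-8") {
  // Resolve the label to its canonical name; the TextDecoder constructor
  // throws a RangeError for labels it does not recognize.
  const encoding = new TextDecoder(label).encoding;
  // TextEncoder only produces UTF-8 in the current Encoding Standard.
  if (encoding !== "utf-8") {
    throw new RangeError("encoding not supported for encode: " + encoding);
  }
  return new TextEncoder().encode(str);
}

// Proposed: view.getString('utf-8', [offset, [length]]) --> string
// Accepts any ArrayBufferView (DataView, Uint8Array, ...); offset and
// length are in bytes, relative to the start of the view.
function getString(view, label = "utf-8", offset = 0,
                   length = view.byteLength - offset) {
  const bytes = new Uint8Array(view.buffer, view.byteOffset + offset, length);
  return new TextDecoder(label).decode(bytes);
}

// Usage: write "WebGL" at byte 10 of a buffer, then read it back two ways.
const buffer = new ArrayBuffer(64);
new Uint8Array(buffer).set(encodeString("WebGL"), 10);
const view = new DataView(buffer);
console.log(getString(view, "utf-8", 10, 5));          // WebGL
console.log(getString(new Uint8Array(buffer, 10, 5))); // WebGL
```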
>
> With these, the examples from above would simplify to:
>
> unicodestring = new Uint8Array(buffer, 123, 333).getString('utf-8');
>
> unicodestring = someview.getString('utf-8', 123, 333);
>
> result.set(unicodestring.encode('utf-8'), 123);
>
>
> On a technical note, there's no reason to have TextDecoder/TextEncoder
> instances. String decoders/encoders are not stateful objects: they hold no
> resources, and they operate in one shot with at most 4 bytes of lookahead on
> a string of data. A switch that selects a code path based on the encoding
> name is entirely sufficient for all cases of converting bytes to strings and
> vice versa.
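For reference, the Encoding Standard's TextDecoder does have one use in which an instance carries state: with `{stream: true}`, it holds back an incomplete multi-byte sequence at the end of one chunk and completes it with the next call. A small sketch; the byte values are chosen for illustration, with "é" (0xC3 0xA9) split across two chunks.

```javascript
// Sketch: TextDecoder's streaming mode, where one instance keeps state
// across calls. "é" is two UTF-8 bytes (0xC3 0xA9); here they are split
// across two chunks, as can happen when reading from the network.
const chunk1 = new Uint8Array([0x63, 0x61, 0x66, 0xc3]); // "caf" + first byte of "é"
const chunk2 = new Uint8Array([0xa9]);                   // second byte of "é"

const decoder = new TextDecoder("utf-8");
// {stream: true} tells the decoder to buffer an incomplete trailing sequence.
let text = decoder.decode(chunk1, { stream: true });
text += decoder.decode(chunk2); // a final non-streaming call flushes the buffer
console.log(text); // café

// Decoding each chunk independently cannot reassemble the split
// character; each half becomes U+FFFD REPLACEMENT CHARACTER.
const oneShot = new TextDecoder().decode(chunk1) + new TextDecoder().decode(chunk2);
console.log(oneShot === "caf\uFFFD\uFFFD"); // true
```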
>
>
> On Thu, Jan 29, 2015 at 12:26 AM, Joshua Bell <jsbell@google.com> wrote:
>>
>> In the very distant past [1] there was discussion about APIs for
>> encoding/decoding string data from ArrayBuffers/DataViews. This resulted in
>> an API being defined as part of the Encoding Living Standard [2].
>>
>> Chrome, Firefox and Opera have been shipping this API for about a year
>> now. A polyfill is also available [3].
>>
>> The important stuff is interoperable. A few new attributes/flags have
>> been specified but not yet implemented in all browsers. Browsers have also
>> not all converged with the spec for handling every code point of all legacy
>> encodings identically, but we're working on it.
>>
>> There are links at the top of the spec for feedback, but since discussion
>> started here I wanted to close the loop.
>>
>> [1] https://www.khronos.org/webgl/public-mailing-list/archives/1111/msg00017.html
>> [2] https://encoding.spec.whatwg.org
>> [3] https://github.com/inexorabletash/text-encoding
>>
>
