Re: [Public WebGL] typed array convenience

On Tue, Oct 28, 2014 at 5:21 PM, Jukka Jylänki <jujjyl@gmail.com> wrote:
Being able to do typed array memcpys without allocating new typed array views would be very much preferred by Emscripten applications. A lot of Emscripten-generated garbage comes from this origin, and the impact of this to Firefox is being discussed at https://bugzilla.mozilla.org/show_bug.cgi?id=936168

Thanks for the link, lots of interesting insight there!
In Emscripten specifically to implement memcpy, we have observed typed array .set() to be slower than manually looping over bytes for small copies, so in Emscripten, short <4K element memcpys do a for loop over the data, and only larger ones do a typed array .set() call. I suspect Florian's example of very small ~1K range of copying won't get any faster if there are JS function calls involved.
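
A minimal sketch of the size-based strategy described above, assuming both arrays are Uint8Array views and using the roughly 4K cutoff mentioned in the message. The names hybridCopy and SMALL_COPY_LIMIT are hypothetical and are not Emscripten's actual implementation.

```javascript
// Hypothetical cutoff, taken from the "<4K element" figure quoted above.
var SMALL_COPY_LIMIT = 4096;

// Copy `count` elements from src (starting at srcOffset) into dst
// (starting at dstOffset). Like C memcpy, overlapping ranges are not handled.
function hybridCopy(dst, dstOffset, src, srcOffset, count) {
  if (count < SMALL_COPY_LIMIT) {
    // Small copy: a plain loop avoids creating a temporary view.
    for (var i = 0; i < count; i++) {
      dst[dstOffset + i] = src[srcOffset + i];
    }
  } else {
    // Large copy: subarray() allocates a new view object (no data copy),
    // which is the source of garbage discussed in this thread.
    dst.set(src.subarray(srcOffset, srcOffset + count), dstOffset);
  }
  return dst;
}
```

Where the right cutoff lies depends on the engine, so the constant is best treated as something to measure rather than a fixed value.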

Yep. I've mostly dealt with audio, where (due to the current script processing strategy of Web Audio API implementations) the buffer sizes are generally around 8k samples or more, and set() has been the most efficient way to deal with copying there. Even for zeroing an array, the fastest strategy has been to dst.set(new Float32Array(dst.length)), even with the GC costs. I've sometimes gone as far as to have empty arrays lying around for zeroing, and/or sending array buffers I know to be GC'd to an empty worker... Don't take my word for it though, it's been more than a year since I was last doing this sort of stuff actively.
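
A rough sketch of the "empty arrays lying around for zeroing" idea, assuming Float32Array buffers. The names zeroCache and zeroFloat32 are hypothetical, and the cached array must never be written to.

```javascript
// Hypothetical cache of an all-zero array, grown on demand and reused.
var zeroCache = new Float32Array(0);

// Zero every element of dst using set() on the cached zero array.
function zeroFloat32(dst) {
  if (zeroCache.length < dst.length) {
    // Newly created typed arrays are zero-initialized.
    zeroCache = new Float32Array(dst.length);
  }
  if (zeroCache.length === dst.length) {
    dst.set(zeroCache);
  } else {
    // The cache is larger than dst, so a temporary view is needed here.
    dst.set(zeroCache.subarray(0, dst.length));
  }
  return dst;
}
```

When all buffers share one size, as audio processing buffers often do, the subarray() branch is never taken and no per-call garbage is created.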

I think while it can be improved, set() is pretty good for memset, but FWIW in the case of filling arrays with arbitrary data, native implementations won't be of much help. I like to think of JS as a distributed system in terms of performance, where you want to pass small data in close proximity and larger data when the distance grows (where native implementations are far and JS implementations are close).

If you care about performance, eventually you'll probably want to use SIMD intrinsics for most of the array filling cases anyway (which will of course unfortunately be even harder to read than the currently fastest implementation), so adding a method specifically for the performance characteristics of that use case seems premature.

But to come back to my zeroing example, I think it would be super-handy to have a method for filling an array with a specific number, e.g. zero. This has been one of the biggest performance headaches I've had in implementing DSP stuff, where just iterating over a large array and setting all values to one specific value is severely slower than using set() on a pre-defined array. The Web Array Math [1] proposal defines other filling methods, such as filling with random or a range, but I think resetting the array should be something defined in the Typed Array spec itself.
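
To illustrate what such a fill method would do, here is a sketch built only on set() and subarray(). It is not the proposed API: fillWith is a hypothetical helper, and the seed size of 64 is an arbitrary assumption.

```javascript
// Fill dst with `value`: write a small seed with a loop, then repeatedly
// copy the already-filled prefix forward so each set() call can double
// the filled region.
function fillWith(dst, value) {
  var len = dst.length;
  if (len === 0) return dst;
  var seed = Math.min(len, 64); // arbitrary seed size (assumption)
  for (var i = 0; i < seed; i++) {
    dst[i] = value;
  }
  var filled = seed;
  while (filled < len) {
    var n = Math.min(filled, len - filled);
    // Source [0, n) and destination [filled, filled + n) never overlap.
    dst.set(dst.subarray(0, n), filled);
    filled += n;
  }
  return dst;
}
```

This still allocates one short-lived view per doubling step, which is the kind of overhead a built-in method could avoid.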

BTW, this discussion should probably take place in es-discuss, given that ES adopted typed arrays into its spec.

- Jussi

[1] http://opera-mage.github.io/webarraymath/

2014-10-28 16:56 GMT+02:00 Jussi Kalliokoski <jussi.kalliokoski@gmail.com>:
On Tue, Oct 28, 2014 at 4:19 PM, Florian Bösch <pyalot@gmail.com> wrote:

On Tue, Oct 28, 2014 at 2:42 PM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:
On Tue, Oct 28, 2014 at 3:15 PM, Mark Callow <khronos@callow.im> wrote:

On Oct 28, 2014, at 8:04 PM, Florian Bösch <pyalot@gmail.com> wrote:

I propose an addition to the typed array specification that introduces these methods like
  • argcpy(uint dstOffset, arguments...)
  • arrcpy(uint dstOffset, Array|TypedArrayView someArray)
dst.set(someArray, dstOffset);

dst.set(0, 1, 2, 3, 4) --> invalid argument error, so not a substitute for the argcpy suggestion.

Didn't try to be. :)
dst.set([1,2,3,4], dstOffset) is substantially slower than one-at-a-time assignment (it's the second slowest method, in fact). Updated jsperf: http://jsperf.com/typed-array-fast-filling ; attached screenshot: http://codeflow.org/pictures/typed-array-test.png (blue at the bottom is array.set).

That can only be improved by improving the implementations; adding a new method that does the exact same thing (except that the arguments are reversed) won't help. I doubt the arguments-based one would be any better, since I'm quite skeptical that arguments objects have better performance characteristics (in terms of memory, GC or allocation time) than arrays.
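
For reference, the three variants being compared can be written side by side. The argcpy below is a hypothetical userland stand-in for the proposed method, written with the arguments object; it says nothing about how a native version would perform.

```javascript
// Hypothetical stand-in for the proposed argcpy: copies the trailing
// arguments into dst starting at dstOffset.
function argcpy(dst, dstOffset) {
  for (var i = 2; i < arguments.length; i++) {
    dst[dstOffset + i - 2] = arguments[i];
  }
  return dst;
}

// 1. Arguments-based copy.
var viaArgs = argcpy(new Float32Array(8), 2, 1, 2, 3, 4);

// 2. set() with an array literal (allocates the temporary array).
var viaSet = new Float32Array(8);
viaSet.set([1, 2, 3, 4], 2);

// 3. One-at-a-time assignment.
var viaAssign = new Float32Array(8);
viaAssign[2] = 1;
viaAssign[3] = 2;
viaAssign[4] = 3;
viaAssign[5] = 4;
```

All three leave the destination in the same state; they differ only in what gets allocated along the way.
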
  • memcpy(uint dstByteOffset, TypedArray[View] origin, uint srcByteOffset, uint srcByteCount)
dst.set(src.subarray(srcOffset, srcOffset + srcByteCount), dstOffset);

In an online preprocessing step that copies memory from one array to another, you do not want to allocate an object a million times if you can avoid it. Likewise, you do not want to allocate GC'd objects frequently at runtime, for fairly obvious reasons.

Obviously. Maybe we could introduce optional arguments for srcOffset and srcEnd to set(). Or add a new method to avoid overloading costs.
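
A sketch of what set() with optional srcOffset and srcEnd might mean, written as a standalone function. The name setRange, the argument order, and the error behavior are all assumptions. Offsets here are element indices, as with subarray(), which matches the byte-based memcpy proposal only for byte-sized element types.

```javascript
// Hypothetical helper: copy src[srcOffset, srcEnd) into dst at dstOffset
// without allocating an intermediate view.
function setRange(dst, src, dstOffset, srcOffset, srcEnd) {
  if (dstOffset === undefined) dstOffset = 0;
  if (srcOffset === undefined) srcOffset = 0;
  if (srcEnd === undefined) srcEnd = src.length;
  if (srcOffset < 0 || srcOffset > srcEnd || srcEnd > src.length) {
    throw new RangeError("source range out of bounds");
  }
  var count = srcEnd - srcOffset;
  if (dstOffset < 0 || dstOffset + count > dst.length) {
    throw new RangeError("destination range out of bounds");
  }
  for (var i = 0; i < count; i++) {
    dst[dstOffset + i] = src[srcOffset + i];
  }
  return dst;
}
```

A native version could presumably do the same range checks and then a single block copy, which is where the benefit over the subarray() workaround would come from.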

- Jussi