
Re: [Public WebGL] Typed Array setter for partial arrays (and typed array performance)

Yep - DataView is useful for reading out headers and other things, but if you have content inside your messages it's not suitable.

Say I had a large command buffer in wire format - a bunch of commands (small, binary-serialized messages) with some data (large blobs) interspersed. Reading the commands is easy - new up a DataView on the source buffer and start reading them out. The trouble is the blobs - I don't want to put a view onto the source buffer for the blob, because then, if I keep the blob around, the entire source buffer stays in memory. If not careful, one could end up keeping thousands of potentially large buffers alive by using subarray. As apps get larger and pass typed arrays around to various pieces of code, this is likely to happen. Instead I have to allocate a new array and manually copy the values.
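In code, the pattern looks roughly like this (the buffer size, offsets, and names are invented for illustration):

```javascript
// Pretend this 1 MB buffer is the incoming wire-format command buffer.
const source = new Uint8Array(1024 * 1024);
const blobOffset = 128;
const blobLength = 4096;

// A subarray is a view: it shares the backing store, so keeping blobView
// alive keeps the entire 1 MB source buffer alive too.
const blobView = source.subarray(blobOffset, blobOffset + blobLength);

// The manual copy: allocate a new array and copy the values, so the
// source buffer can be garbage collected independently of the blob.
const blobCopy = new Uint8Array(blobLength);
blobCopy.set(blobView);
```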

So I guess what I really want is a slice, which is different from what Mark mentioned (which is essentially a memcpy). A proper slice would be ideal, as then you could slice the blob and create a whole bunch of new views on it, with the backing store now separate from the source. I absolutely love that typed arrays have the concept of views, as GC is a big perf problem in high-performance JS right now and being able to prevent extra garbage in most cases is great. That said, sometimes you just really need a copy.
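A slice like that can be sketched in user code today - sliceBuffer below is a made-up name, not anything from the spec - by copying a byte range into a brand-new ArrayBuffer, after which any number of views can hang off the result without pinning the source:

```javascript
// Hypothetical slice: copy [begin, begin + length) of a buffer into a
// new, independent backing store and return the fresh ArrayBuffer.
function sliceBuffer(buffer, begin, length) {
  const copy = new Uint8Array(length);
  copy.set(new Uint8Array(buffer, begin, length));
  return copy.buffer;
}

const src = new ArrayBuffer(1024);
const blob = sliceBuffer(src, 16, 64);   // fresh 64-byte ArrayBuffer

// New views share the slice's backing store, not the original buffer.
const floats = new Float32Array(blob);   // 64 bytes -> 16 floats
```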

A slice operation also makes sense from a performance perspective, as a single call into the runtime that allocates and copies in one step should always be faster than a separate new followed by a copy. Plus, if JS code can be faster than the system memcpy, the system memcpy must really suck ;)

All that said, when working with large blobs things like memcpy are really handy - having a nice generalized way to do that (in Mark's suggestion, an overload of set) would be a useful primitive. Of course, it's pretty easy to build function memcpy() {}, and if there really is no perf benefit then meh. Only one way to find out!
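For reference, the hand-rolled wrapper could be as simple as this (the argument order and element-based offsets are my own choice, not a proposed API):

```javascript
// Element-wise memcpy between array-like objects; offsets and count are
// in elements, not bytes.
function memcpy(dst, dstOffset, src, srcOffset, count) {
  for (let i = 0; i < count; i++) {
    dst[dstOffset + i] = src[srcOffset + i];
  }
  return dst;
}

const src = new Int16Array([10, 20, 30, 40, 50]);
const dst = new Int16Array(5);
memcpy(dst, 1, src, 2, 3);   // dst is now [0, 30, 40, 50, 0]
```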

Ben Vanik

On Thu, Apr 21, 2011 at 3:25 PM, Kenneth Russell <kbr@google.com> wrote:
Mark: the poor performance of the set(Array, offset) variant in
Chromium is surprising, and I would encourage you to file a bug about
it. However, in an optimizing JavaScript VM, writing the associated
JavaScript code to do these sorts of operations will be faster than
adding another built-in: in the former case the JIT has a
chance to generate specialized code, while in the latter case you're
calling into and out of the VM's runtime system (typically written in
C++), which can't perform the same sorts of optimizations.

I am not personally convinced that we should add this new entry point.
Especially in low-level APIs like typed arrays, every new one has a
high maintenance cost. Also, there are API changes lined up which will
address known performance bottlenecks on the web platform that I think
should take priority.

Ben: since it sounds like you are doing network I/O, should you be
using DataView instead of the typed array view types?
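For what it's worth, the main reason DataView fits network I/O is its explicit endianness control - a sketch (the packet layout here is invented):

```javascript
// DataView reads and writes multi-byte fields at arbitrary byte offsets
// with explicit byte order; passing false selects big-endian, i.e.
// network byte order, regardless of the host CPU's endianness.
const packet = new ArrayBuffer(6);
const view = new DataView(packet);
view.setUint16(0, 0xBEEF, false);   // 2-byte message type
view.setUint32(2, 1234, false);     // 4-byte sequence number

const type = view.getUint16(0, false);
const seq = view.getUint32(2, false);
```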


On Thu, Apr 21, 2011 at 11:57 AM, Ben Vanik <ben.vanik@gmail.com> wrote:
> I agree this method would be really helpful. If the offset/count values were
> in bytes, it would allow for some awesome uses of typed arrays.
> The primary use case that comes to mind is packed data transfer. I've been
> experimenting with loading geometry/textures/etc from data blobs, and being
> able to subset the data efficiently (without using views, as I want an
> actual copy for modification) would make things much nicer. The other side
> of this is using it to construct data blobs - saving off content from
> client->server that contains large typed array chunks would greatly benefit
> from the speed boost.
> And as a nodejs user, this would be a tremendously useful thing when using
> protocol buffers and other sorts of binary message formats where perf really
> matters or doing large file system manipulations.
> As for the microbenchmarks, I've noticed the same thing. I just whipped this
> one up yesterday for testing out some common patterns I need for image
> processing:
> http://jsperf.com/typed-arrays-vs-arrays
> The time for creating and initializing a JS array is two orders of magnitude
> longer than for typed arrays, but all other operations on JS arrays are 2x+
> faster than on typed arrays.
> I threw in CanvasPixelArray just to see if there were any
> special optimizations there - it's pretty much the same as Uint8Array (which
> makes sense).
> Here's a great blog post on perf comparison that just came
> out: http://blog.n01se.net/?p=248
> --
> Ben Vanik
> http://www.noxa.org
> 2011/4/21 Mark Callow <callow_mark@hicorp.co.jp>
>> Hi,
>> Because I keep needing to do it, I have become irritated by the lack of a
>> function in the typed array specification do copy the partial contents of a
>> JS source array, something like:
>>      void set(type[] array, optional unsigned long offset, unsigned long
>> srcOffset, unsigned long count)
>> The obvious answer is a wrapper but I suspected that a loop of, e.g,
>> i16array[i] = src[i] in JS would be slower than something internal to the
>> typed array implementation. I wrote a short test for this. The result on FF4
>> is as I expected:
>> Int16Array.set(2000 byte array) x 100 times took 1ms (400000000
>> bytes/second).
>> Int16Array[j] = data[j] 2000 x 100 times took 4ms (100000000
>> bytes/second).
>> Int16Array.set(200000 byte array) x 1 times took 1ms (400000000
>> bytes/second).
>> Int16Array[j] = data[j] 200000 x 1 times took 4ms (100000000
>> bytes/second).
>> To ensure the result is not influenced by smart optimizers the test is
>> repeated with a longer array and a single iteration. The bytes copied is the
>> same in each case.
>> The "wrapper" runs at one quarter the speed of a native implementation so
>> I think the above described set function is a badly needed addition to typed
>> arrays.
>> If the source is a typed array, one can always create another view or
>> subarray so it is not so important to add a new function for this case,
>> though there is the issue of the garbage that must then be collected.
>> When I ran the same test on Chromium 12.0.717.0 (79525) I got a very
>> surprising result:
>> Int16Array.set(2000 byte array) x 100 times took 49ms (8163265.306122449
>> bytes/second).
>> Int16Array[j] = data[j] 2000 x 100 times took 6ms (66666666.666666664
>> bytes/second).
>> Int16Array.set(200000 byte array) x 1 times took 38ms (10526315.789473685
>> bytes/second).
>> Int16Array[j] = data[j] 200000 x 1 times took 24ms (16666666.666666666
>> bytes/second).
>> It is surprising for 3 reasons:
>> 1. the overall poor performance;
>> 2. Int16Array.set taking longer than a JS loop setting individual elements;
>> 3. a single loop of 200,000 individual-element sets taking 4 times longer
>>    than a double loop of 100 x 2000.
>> The test is attached.
>> Regards
>> -Mark