
Re: [Public WebGL] Typed Array setter for partial arrays (and typed array performance)



That sounds great. I can't wait to see those changes!
The primary performance issues I've been dealing with are long GCs causing dropped frames and long processing times on streaming data loads. Being able to shift processing to worker threads without creating more garbage would really allow the quality bar to bump up a bit. Dropped frames = death!
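A rough sketch of that pattern, with hypothetical file names and assuming ArrayBuffers can be posted to Workers (part of the planned changes mentioned further down):

    // main.js (sketch): hand the raw network buffer to a worker for parsing
    var worker = new Worker('parse-worker.js');   // hypothetical file name
    worker.onmessage = function (e) {
      // The parsed result arrives here ready to upload; the heavy parsing
      // work happened off the main thread.
      var vertices = new Float32Array(e.data);
      // ... gl.bufferData(gl.ARRAY_BUFFER, vertices, gl.STATIC_DRAW); ...
    };
    worker.postMessage(wireBuffer);  // wireBuffer: ArrayBuffer from the network

    // parse-worker.js (sketch)
    onmessage = function (e) {
      var src = new Uint8Array(e.data);
      var out = new Float32Array(src.length >> 2);  // assumes length % 4 == 0
      new Uint8Array(out.buffer).set(src);          // one copy, off the main thread
      postMessage(out.buffer);
    };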

--
Ben Vanik
http://www.noxa.org


On Thu, Apr 21, 2011 at 4:12 PM, Kenneth Russell <kbr@google.com> wrote:
On Thu, Apr 21, 2011 at 3:58 PM, Ben Vanik <ben.vanik@gmail.com> wrote:
> Yep - DataView is useful for reading out headers and other things, but if
> you have content inside your messages, it's not suitable.
> Say I had a large command buffer in wire format - a bunch of commands
> (small, binary-serialized messages) with some data (large blobs)
> interspersed. Reading the commands is easy - new up a DataView on the
> source buffer and start reading them out. The trouble is the blobs - I
> don't want to put a view onto the source buffer for each blob, because
> then, if I keep the blob around, the entire source buffer stays in memory.

Presumably the blobs are some sort of binary data that are being
uploaded directly to the graphics card? In that case, it is essential
that the endianness of the data sent over the network match that of
the host. You can certainly optimize for this in your application and
server, but it is DataView, not the typed array views, that was
designed for this use case.
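For example, a header/blob split along those lines might look like this (a sketch with a made-up field layout; `buf` is the ArrayBuffer holding the wire data):

    var header = new DataView(buf);
    var command    = header.getUint16(0, true);   // explicit little-endian
    var blobOffset = header.getUint32(2, true);
    var blobLength = header.getUint32(6, true);

    // The blob goes to the GPU as-is, so its bytes must already be in host
    // byte order; copy it out so the big source buffer can be collected.
    var blob = new Uint8Array(blobLength);
    blob.set(new Uint8Array(buf, blobOffset, blobLength));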

> If you're not careful, you could end up keeping around thousands of
> potentially large buffers by using subarray. As apps get larger and pass
> typed arrays around to various pieces of code, this is likely to happen.
> Instead I have to allocate a new array and manually copy the values.
> So I guess what I really want is a slice, which is different from what
> Mark mentioned (which is essentially a memcpy). A proper slice would be
> ideal, as then you could slice the blob and create a whole bunch of new
> views on it, with the backing store now separate from the source. I
> absolutely love that typed arrays have the concept of views, as GC is a big
> perf problem in high-performance JS right now and being able to prevent
> extra garbage in most cases is great. That said, sometimes you just
> really need a copy.
> A slice operation would also make sense from a performance perspective,
> as a single call into the runtime to allocate and copy the data should
> always be faster than an allocation followed by a copy. Plus, if JS code
> can be faster than the system memcpy, the system memcpy must really suck ;)
> All that said, when working with large blobs, things like memcpy are really
> handy - having a nice generalized way to do that (in Mark's suggestion, an
> overload of set) would be a useful primitive. Of course, it's pretty easy to
> build function memcpy() {} yourself, and if there really is no perf benefit
> then meh. Only one way to find out!
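Such a copying slice is easy to sketch on top of the existing API (hypothetical helper and variable names; begin/end are element indices):

    // Unlike subarray(), this allocates a fresh backing store, so the
    // original (possibly huge) ArrayBuffer is free to be collected.
    function sliceTypedArray(src, begin, end) {
      var view = src.subarray(begin, end);          // a view, no copy yet
      var copy = new src.constructor(view.length);  // new backing store
      copy.set(view);                               // single internal copy
      return copy;
    }

    // Keep only the blob, not the whole command buffer (hypothetical names).
    var blob = sliceTypedArray(wireBytes, blobBegin, blobEnd);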

The planned Typed Array API changes to support efficient communication
with web workers include convenience methods to copy ArrayBuffers, and
possibly sub-portions of them. I think we should invest our time in
moving those changes forward.

-Ken

> --
> Ben Vanik
> http://www.noxa.org
>
>
> On Thu, Apr 21, 2011 at 3:25 PM, Kenneth Russell <kbr@google.com> wrote:
>>
>> Mark: the poor performance of the set(Array, offset) variant in
>> Chromium is surprising, and I would encourage you to file a bug about
>> it. However, in an optimizing JavaScript VM, writing the associated
>> JavaScript code to do these sorts of operations will be faster than
>> adding another built-in, because in the former case the JIT has a
>> chance to generate specialized code, and in the latter case you're
>> calling into and out of the VM's runtime system (typically written in
>> C++), which can't perform the same sorts of optimizations.
>>
>> I am not personally convinced that we should add this new entry point.
>> Especially in low-level APIs like typed arrays, every new one has a
>> high maintenance cost. Also, there are API changes lined up which will
>> address known performance bottlenecks on the web platform that I think
>> should take priority.
>>
>> Ben: since it sounds like you are doing network I/O, should you be
>> using DataView instead of the typed array view types?
>>
>> -Ken
>>
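The plain-JavaScript alternative being weighed here is essentially a loop like this (hypothetical name and argument order):

    // Copy `count` elements from src[srcOffset..] into dst[dstOffset..].
    // An optimizing VM can specialize this loop for the concrete types,
    // which is the trade-off described above.
    function setPartial(dst, dstOffset, src, srcOffset, count) {
      for (var i = 0; i < count; ++i) {
        dst[dstOffset + i] = src[srcOffset + i];
      }
    }
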
>> On Thu, Apr 21, 2011 at 11:57 AM, Ben Vanik <ben.vanik@gmail.com> wrote:
>> > I agree this method would be really helpful. If the offset/count values
>> > were in bytes, it would allow for some awesome uses of typed arrays.
>> > The primary use case that comes to mind is packed data transfer. I've
>> > been experimenting with loading geometry/textures/etc. from data blobs,
>> > and being able to subset the data efficiently (without using views, as I
>> > want an actual copy for modification) would make things much nicer. The
>> > other side of this is using it to construct data blobs - saving off
>> > content from client->server that contains large typed array chunks would
>> > greatly benefit from the speed boost.
>> > And as a nodejs user, this would be a tremendously useful thing when
>> > using protocol buffers and other sorts of binary message formats where
>> > perf really matters, or when doing large file-system manipulations.
>> > As for the microbenchmarks, I've noticed the same thing. I just whipped
>> > this one up yesterday for testing out some common patterns I need for
>> > image processing:
>> > http://jsperf.com/typed-arrays-vs-arrays
>> > The time for creating and initializing a JS array is two orders of
>> > magnitude longer than for typed arrays, but all other operations on JS
>> > arrays are 2x+ faster than on typed arrays.
>> > I threw in CanvasPixelArray just to see if there were any special
>> > optimizations there - it's pretty much the same as Uint8Array (which
>> > makes sense).
>> > Here's a great blog post on perf comparison that just came out:
>> > http://blog.n01se.net/?p=248
>> > --
>> > Ben Vanik
>> > http://www.noxa.org
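A minimal sketch of this kind of create-and-fill comparison (not the jsperf test itself; the element count is arbitrary):

    function timeFill(label, makeArray, n) {
      var start = Date.now();
      var a = makeArray(n);
      for (var i = 0; i < n; ++i) {
        a[i] = i;
      }
      console.log(label + ': ' + (Date.now() - start) + 'ms');
      return a;  // keep a reference so the work is not optimized away
    }

    timeFill('Array',        function (n) { return new Array(n); },        1e6);
    timeFill('Float32Array', function (n) { return new Float32Array(n); }, 1e6);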
>> >
>> >
>> > 2011/4/21 Mark Callow <callow_mark@hicorp.co.jp>
>> >>
>> >> Hi,
>> >>
>> >> Because I keep needing to do it, I have become irritated by the lack of
>> >> a function in the typed array specification to copy the partial contents
>> >> of a JS source array, something like:
>> >>
>> >>     void set(type[] array, optional unsigned long offset,
>> >>              unsigned long srcOffset, unsigned long count)
>> >>
>> >> The obvious answer is a wrapper, but I suspected that a loop of, e.g.,
>> >> i16array[i] = src[i] in JS would be slower than something internal to the
>> >> typed array implementation. I wrote a short test for this. The result on
>> >> FF4 is as I expected:
>> >>
>> >> Int16Array.set(2000 byte array) x 100 times took 1ms (400000000 bytes/second).
>> >>
>> >> Int16Array[j] = data[j] 2000 x 100 times took 4ms (100000000 bytes/second).
>> >>
>> >> Int16Array.set(200000 byte array) x 1 times took 1ms (400000000 bytes/second).
>> >>
>> >> Int16Array[j] = data[j] 200000 x 1 times took 4ms (100000000 bytes/second).
>> >>
>> >> To ensure the result is not influenced by smart optimizers, the test is
>> >> repeated with a longer array and a single iteration. The number of bytes
>> >> copied is the same in each case.
>> >>
>> >> The "wrapper" runs at one quarter the speed of a native implementation,
>> >> so I think the above-described set function is a badly needed addition to
>> >> typed arrays.
>> >>
>> >> If the source is a typed array, one can always create another view or
>> >> subarray so it is not so important to add a new function for this case,
>> >> though there is the issue of the garbage that must then be collected.
>> >>
>> >> When I ran the same test on Chromium 12.0.717.0 (79525) I got a very
>> >> surprising result:
>> >>
>> >> Int16Array.set(2000 byte array) x 100 times took 49ms (8163265.306122449 bytes/second).
>> >>
>> >> Int16Array[j] = data[j] 2000 x 100 times took 6ms (66666666.666666664 bytes/second).
>> >>
>> >> Int16Array.set(200000 byte array) x 1 times took 38ms (10526315.789473685 bytes/second).
>> >>
>> >> Int16Array[j] = data[j] 200000 x 1 times took 24ms (16666666.666666666 bytes/second).
>> >>
>> >> It is surprising for 3 reasons:
>> >>
>> >> - the overall poor performance
>> >> - the fact that Int16Array.set takes longer than a JS loop setting
>> >>   individual elements
>> >> - the fact that a single loop of 200,000 iterations setting individual
>> >>   elements took 4 times longer than a double loop of 100 x 2000.
>> >>
>> >> The test is attached.
>> >>
>> >> Regards
>> >>
>> >> -Mark
>> >>
>> >>
>> >
>> >
>
>
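
A comparison along the lines Mark describes can be sketched as follows (a sketch only, not the attached test; the sizes mirror the numbers quoted above):

    // Compare Int16Array.set(jsArray) against an element-by-element loop.
    function runCopyTest(elementCount, iterations) {
      var data = new Array(elementCount);
      for (var i = 0; i < elementCount; ++i) data[i] = i & 0x7fff;
      var dst = new Int16Array(elementCount);

      var t0 = Date.now();
      for (var n = 0; n < iterations; ++n) dst.set(data);
      var setMs = Date.now() - t0;

      var t1 = Date.now();
      for (var n = 0; n < iterations; ++n) {
        for (var j = 0; j < elementCount; ++j) dst[j] = data[j];
      }
      var loopMs = Date.now() - t1;

      console.log(elementCount * 2 * iterations + ' bytes: set ' + setMs +
                  'ms, loop ' + loopMs + 'ms');
    }

    runCopyTest(1000, 100);  // 2000-byte copies, 100 times
    runCopyTest(100000, 1);  // one 200000-byte copy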