
Re: [Public WebGL] WebGL2 and no mapBuffer/mapBufferRange



On Tue, Mar 17, 2015 at 2:39 PM, Jeff Gilbert <jgilbert@mozilla.com> wrote:
> Just warn on WRITE without INVALIDATE.
>
> Here's what the costs look like to me:
>
> MapBufferRange(READ) ~= GetBufferSubData:
> Both require a synchronous GL command followed by a copy.
>
> MapBufferRange(WRITE|INVALIDATE) ~= BufferSubData
> MapBufferRange can create a scratch shmem for writing via ArrayBuffer, send
> it across IPC on flush, and do Map+memcpy on the GL process. (1 copy)
> BufferSubData is at best from an ArrayBuffer which is already shmem, and is
> then a copy-on-write (ideally no-copy), but still needs to call
> BufferSubData or Map+memcpy on GL process. (2 copies, but only 1 copy if you
> have a heuristic which allocates shmem to ArrayBuffers)

This scenario is my main concern. Out-of-process GL will need at least
one extra copy compared with in-process GL.  Since this is likely on
the critical rendering path, the difference will create a huge perf gap
among implementations. IMHO, this is really bad for WebGL as a
standard.
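
To make the extra copy concrete, here is a rough model of the two upload
paths. It is only a sketch under stated assumptions: CopyCounter,
inProcessUpload and outOfProcessUpload are hypothetical names, a plain
Uint8Array stands in for GL buffer storage, and a fresh Uint8Array stands
in for the shmem scratch buffer. It does not describe any browser's code.

```typescript
// Hypothetical helper: counts the memcpy-style copies a path performs.
class CopyCounter {
  copies = 0;
  bytes = 0;

  copy(dst: Uint8Array, src: Uint8Array, dstOffset: number): void {
    dst.set(src, dstOffset);
    this.copies += 1;
    this.bytes += src.length;
  }
}

// In-process model (assumption): the mapped view aliases the GL buffer
// storage, so the application's writes land in place with no extra copy.
function inProcessUpload(
  glBuffer: Uint8Array,
  offset: number,
  length: number,
  fill: (view: Uint8Array) => void
): void {
  fill(glBuffer.subarray(offset, offset + length));
}

// Out-of-process model (assumption): the application writes into a scratch
// buffer, which is later copied into the GL buffer on the GL process side.
function outOfProcessUpload(
  glBuffer: Uint8Array,
  offset: number,
  length: number,
  fill: (view: Uint8Array) => void,
  counter: CopyCounter
): void {
  const scratch = new Uint8Array(length); // zero-initialized stand-in for shmem
  fill(scratch);
  counter.copy(glBuffer, scratch, offset); // the one extra copy
}
```

Both functions leave the same bytes in the buffer; only the second one
goes through the counter, which is the gap described above.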

>
> With UNSYNCHRONIZED, MapBufferRange can be 'sharper', but potentially more
> performant:
> READ|UNSYNC: Still synchronous, but lets the GL process use UNSYNCHRONIZED
> to prevent stalls.
> WRITE|INVAL|UNSYNC: Still async, but lets the GL process use UNSYNCHRONIZED
> to prevent stalls.

I don't think we can allow the UNSYNCHRONIZED bit to reach the underlying
GL. That leads to undefined behavior.
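
As an illustration only, a validation layer could mask the bit out before
forwarding the call, and also emit the WRITE-without-INVALIDATE warning
suggested at the top of the quoted mail. The bit values are the OpenGL ES
3.0 access flags for glMapBufferRange; sanitizeMapAccess and its result
shape are hypothetical names, not part of any spec or implementation.

```typescript
// OpenGL ES 3.0 access bits for glMapBufferRange (only the ones used here).
const MAP_WRITE_BIT = 0x0002;
const MAP_INVALIDATE_RANGE_BIT = 0x0004;
const MAP_INVALIDATE_BUFFER_BIT = 0x0008;
const MAP_UNSYNCHRONIZED_BIT = 0x0020;

// Hypothetical result type: the flags to forward plus any warnings raised.
interface SanitizedAccess {
  access: number;
  warnings: string[];
}

// Hypothetical sanitizer: strips UNSYNCHRONIZED so it never reaches the
// underlying GL, and warns when WRITE is requested without an INVALIDATE bit.
function sanitizeMapAccess(requested: number): SanitizedAccess {
  const warnings: string[] = [];
  let access = requested;

  if ((access & MAP_UNSYNCHRONIZED_BIT) !== 0) {
    access &= ~MAP_UNSYNCHRONIZED_BIT;
    warnings.push("MAP_UNSYNCHRONIZED_BIT ignored");
  }

  const invalidateBits = MAP_INVALIDATE_RANGE_BIT | MAP_INVALIDATE_BUFFER_BIT;
  const writes = (access & MAP_WRITE_BIT) !== 0;
  const invalidates = (access & invalidateBits) !== 0;
  if (writes && !invalidates) {
    warnings.push("MAP_WRITE_BIT without an INVALIDATE bit may be slow");
  }

  return { access, warnings };
}
```

Whether the bit should be silently dropped, as in this sketch, or rejected
with an error is a separate design choice that the sketch does not settle.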

>
> FLUSH_EXPLICIT lets out-of-process GL reduce the amount of data it needs to
> memcpy while not having to allocate many smaller chunks with BufferSubData.
> (Multiple discard+write ranges with the same single shmem scratch buffer)
>
> WRITE without INVAL is probably much slower than WRITE|INVAL even on
> in-process-GL implementations.

How? Suppose you map, write to part of the range, flush, write to
another part, flush, ... and then unmap.  Unless you change the
ArrayBuffer semantics to keep track of dirty/clean state for each
element, each flush has to write back the entire range.
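
The difference in bytes written back can be sketched as follows. This is
a toy cost model, not an implementation: FlushRange, mergeRanges and
bytesWrittenBack are hypothetical names, and the tracked case assumes the
implementation somehow knows which sub-ranges were touched (for example,
because each flush names an offset and length).

```typescript
// Hypothetical description of one flushed sub-range, in bytes.
interface FlushRange {
  offset: number;
  length: number;
}

// Merge overlapping or adjacent ranges so no byte is counted twice.
function mergeRanges(ranges: FlushRange[]): FlushRange[] {
  const sorted = ranges.slice().sort((a, b) => a.offset - b.offset);
  const merged: FlushRange[] = [];
  for (const r of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && r.offset <= last.offset + last.length) {
      const end = Math.max(last.offset + last.length, r.offset + r.length);
      last.length = end - last.offset;
    } else {
      merged.push({ offset: r.offset, length: r.length });
    }
  }
  return merged;
}

// Without dirty tracking, every flush writes back the whole mapped range.
// With tracking, only the union of the flushed sub-ranges is written back.
function bytesWrittenBack(
  mappedLength: number,
  flushes: FlushRange[],
  tracksDirtyRanges: boolean
): number {
  if (!tracksDirtyRanges) {
    return mappedLength * flushes.length;
  }
  return mergeRanges(flushes).reduce((sum, r) => sum + r.length, 0);
}
```

For a 1024-byte mapping with three small flushes, the untracked path
writes back 3072 bytes in this model, while the tracked path writes back
only the bytes that the flushes cover.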

>
> In light of these investigations, I'm increasingly against omitting
> MapBuffer from the WebGL 2 spec.
>
> On Tue, Mar 17, 2015 at 1:58 PM, Zhenyao Mo <zmo@chromium.org> wrote:
>>
>> On Tue, Mar 17, 2015 at 12:15 PM, Florian Bösch <pyalot@gmail.com> wrote:
>> > On Tue, Mar 17, 2015 at 6:35 PM, Zhenyao Mo <zmo@chromium.org> wrote:
>> >>
>> >> With the WRITE bit but without the INVALIDATE bit, out-of-process
>> >> implementations have to add the READ bit internally, read out the
>> >> whole buffer range, and send it to the js so that it can be
>> >> partially overwritten.  Otherwise, how can you write back at unmap
>> >> time, unless you keep track of which elements in the buffer range
>> >> have been written to and which remain untouched?
>> >
>> > You're thinking of one particular way to implement it. The way you're
>> > thinking of, is to synchronize the copy, and then flush the whole copy
>> > on unmap. That's why you're talking of readback for write.
>> >
>> > Readback for write makes no sense. An alternative to this necessarily
>> > slow and cumbersome idea, would be to transfer the data to be written,
>> > and write that data to the appropriate underlying mapped range. No
>> > readback, no huge in/out of process performance differences.
>> >
>> > The queue of writes to perform doesn't need to be individually
>> > transferred and performed either, an out of process implementation
>> > would be free to delay writes to the underlying till it's
>> > queued/collated enough writes into a block for IPC transfer.
>>
>> I am aware of this alternative, but it's not currently supported by
>> js.  Basically you return a writable buffer to js from MapBufferRange.
>> Now you need to keep track of which parts are dirty and which are
>> clean.  I am not saying it's impossible, but it requires new semantics
>> and optimization.
>>
>> I don't see how this can be better than BufferSubData in out-of-process
>> implementations.
>>
>> When we expose something in core WebGL, we expect it to be implemented
>> the same or similarly (semantics, perf, etc.) on various platforms (OSs,
>> browser vendors, GPUs). If that's not possible, I think extensions are
>> the better way, with the correct expectation that the feature (not just
>> its semantics, but its perf implications) may not be supported
>> everywhere.
>>
>> Otherwise, say a developer implements a game using MapBufferRange in
>> one browser, but users of another browser can't play it at all, even
>> on the same hardware, because it's too slow. That's not good for WebGL
>> at all.
>>
>> There are many other cases where we have had to hold back features that
>> developers would love, for example, certain compressed texture formats
>> that are not supported on iOS. It may seem frustrating, but in the long
>> run it is good for WebGL as a web standard.
>>
>> >
>> >>
>> >> For out-of-process implementations, using the invalidate bit makes a
>> >> huge perf difference.  The Map call doesn't have to wait for the
>> >> service side to return the buffer range; it can just allocate a
>> >> buffer, initialize it to 0, and allow js to write to it.  Otherwise,
>> >> it's a blocking call until the service side responds with the
>> >> readback buffer data.
>> >
>> > This argument is born again out of this flawed idea of how to do
>> > things, which you imply is a given, but it isn't; see above.
>
>
