
Re: [Public WebGL] WebGL2 and no mapBuffer/mapBufferRange



Just warn on WRITE without INVALIDATE.

Here's what the costs look like to me:

MapBufferRange(READ) ~= GetBufferSubData:
Both require a synchronous GL command followed by a copy.

MapBufferRange(WRITE|INVALIDATE) ~= BufferSubData:
MapBufferRange can create a scratch shmem for writing, exposed as an ArrayBuffer, send it across IPC on flush, and do Map+memcpy in the GL process. (1 copy)
BufferSubData at best starts from an ArrayBuffer which is already shmem, so the handoff is copy-on-write (ideally no copy), but it still needs to call BufferSubData or do Map+memcpy in the GL process. (2 copies, or only 1 copy if you have a heuristic which allocates shmem to ArrayBuffers)
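
To make the copy counting above concrete, here is a toy model in TypeScript. It is only a sketch: `CopyCounter`, `uploadViaMap`, and `uploadViaBufferSubData` are hypothetical names, plain `Uint8Array`s stand in for shmem and for the GL buffer, and no real IPC or GL call is made.

```typescript
// Toy model only; every name here is hypothetical and nothing talks to GL.
class CopyCounter {
  copies = 0;

  // Copy src into dst and count it as one memcpy.
  copy(dst: Uint8Array, src: Uint8Array): void {
    dst.set(src);
    this.copies += 1;
  }
}

// MapBufferRange(WRITE|INVALIDATE): the page writes straight into scratch shmem.
function uploadViaMap(
  glBuffer: Uint8Array,
  fill: (scratch: Uint8Array) => void,
  counter: CopyCounter,
): void {
  const scratchShmem = new Uint8Array(glBuffer.length); // handed out as the mapping
  fill(scratchShmem);                                    // written in place, not a copy
  counter.copy(glBuffer, scratchShmem);                  // GL process: Map + memcpy
}

// BufferSubData: the data already lives in an ArrayBuffer owned by the page.
function uploadViaBufferSubData(
  glBuffer: Uint8Array,
  data: Uint8Array,
  dataIsShmem: boolean,
  counter: CopyCounter,
): void {
  let shmem = data;
  if (!dataIsShmem) {
    shmem = new Uint8Array(data.length);
    counter.copy(shmem, data);                           // content process: ArrayBuffer to shmem
  }
  counter.copy(glBuffer, shmem);                         // GL process: BufferSubData or Map + memcpy
}
```

In this model the map path always counts one copy, while the BufferSubData path counts two unless `dataIsShmem` is true, which corresponds to the shmem-allocation heuristic mentioned above.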

With UNSYNCHRONIZED, MapBufferRange can be 'sharper', but potentially more performant:
READ|UNSYNC: Still synchronous, but lets the GL process use UNSYNCHRONIZED to prevent stalls.
WRITE|INVAL|UNSYNC: Still async, but lets the GL process use UNSYNCHRONIZED to prevent stalls.

FLUSH_EXPLICIT lets out-of-process GL reduce the amount of data it needs to memcpy while not having to allocate many smaller chunks with BufferSubData. (Multiple discard+write ranges with the same single shmem scratch buffer)
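
A minimal sketch of that FLUSH_EXPLICIT pattern, again as a model rather than a real implementation: `ScratchMapping`, `flushRange`, and `unmapInto` are hypothetical names, a `Uint8Array` stands in for both the shmem scratch buffer and the GL buffer, and flush offsets are taken relative to the start of the mapping, as with FlushMappedBufferRange in GL.

```typescript
// Hypothetical model of one scratch buffer with explicitly flushed ranges.
interface ByteRange {
  offset: number; // relative to the start of the mapping
  length: number;
}

class ScratchMapping {
  readonly scratch: Uint8Array;        // stands in for the single shmem scratch buffer
  private readonly mapOffset: number;  // where the mapping starts in the GL buffer
  private flushed: ByteRange[] = [];

  constructor(mapOffset: number, length: number) {
    this.mapOffset = mapOffset;
    this.scratch = new Uint8Array(length);
  }

  // Analogue of FlushMappedBufferRange: record a range as written.
  flushRange(offset: number, length: number): void {
    if (offset < 0 || length < 0 || offset + length > this.scratch.length) {
      throw new RangeError("flush range lies outside the mapping");
    }
    this.flushed.push({ offset, length });
  }

  // Analogue of unmap on the GL-process side: memcpy only the flushed ranges.
  // Returns the number of bytes copied.
  unmapInto(glBuffer: Uint8Array): number {
    let copied = 0;
    for (const r of this.flushed) {
      const src = this.scratch.subarray(r.offset, r.offset + r.length);
      glBuffer.set(src, this.mapOffset + r.offset);
      copied += r.length;
    }
    this.flushed = [];
    return copied;
  }
}
```

Mapping 32 bytes and flushing two small ranges copies only those ranges at unmap, and bytes that were never flushed keep whatever the GL buffer held before.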

WRITE without INVAL is probably much slower than WRITE|INVAL even on in-process-GL implementations.

In light of these investigations, I'm increasingly against omitting MapBuffer from the WebGL 2 spec.

On Tue, Mar 17, 2015 at 1:58 PM, Zhenyao Mo <zmo@chromium.org> wrote:
On Tue, Mar 17, 2015 at 12:15 PM, Florian Bösch <pyalot@gmail.com> wrote:
> On Tue, Mar 17, 2015 at 6:35 PM, Zhenyao Mo <zmo@chromium.org> wrote:
>>
>> With the WRITE bit but without the INVALIDATE bit, out-of-process
>> implementations have to append the READ bit internally, read out
>> the whole buffer range, and send it to JS so that it can be written to
>> partially.  Otherwise, how can you write back at unmap time?  Unless
>> you keep track of which elements in the buffer range have been written
>> to and which remain untouched.
>
> You're thinking of one particular way to implement it. The way you're
> thinking of is to synchronize the copy, and then flush the whole copy on
> unmap. That's why you're talking about readback for write.
>
> Readback for write makes no sense. An alternative to this necessarily slow
> and cumbersome idea would be to transfer the data to be written, and write
> that data to the appropriate underlying mapped range. No readback, no huge
> in/out of process performance differences.
>
> The queued writes don't need to be individually transferred and performed
> either. An out-of-process implementation would be free to delay writes to
> the underlying mapped range until it has queued/collated enough writes
> into a block for IPC transfer.

I am aware of this alternative, but that's not currently supported by
JS.  Basically, you return a writable buffer to JS from MapBufferRange.
Now you need to keep track of which parts are dirty and which are
clean.  I am not saying it's impossible, but it requires new semantics
and optimization.
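
For illustration, dirty tracking of this kind might look roughly like the sketch below. It assumes every write goes through a hypothetical `write()` method on a hypothetical `DirtyTrackedMapping` class, which is the kind of new semantics in question: writes made directly to a raw ArrayBuffer offer no such hook, so this does not show that the tracking is available today.

```typescript
// Hypothetical helper, not part of WebGL or any browser.
// It only knows about writes that go through write().
class DirtyTrackedMapping {
  readonly data: Uint8Array;
  // Sorted, non-overlapping [start, end) byte intervals that have been written.
  private dirty: Array<[number, number]> = [];

  constructor(length: number) {
    this.data = new Uint8Array(length);
  }

  write(offset: number, bytes: Uint8Array): void {
    this.data.set(bytes, offset);
    this.markDirty(offset, offset + bytes.length);
  }

  // Insert [start, end) and merge it with any overlapping or touching interval.
  private markDirty(start: number, end: number): void {
    if (start >= end) return;
    const merged: Array<[number, number]> = [];
    let s = start;
    let e = end;
    let placed = false;
    for (const [a, b] of this.dirty) {
      if (b < s) {
        merged.push([a, b]);               // entirely before the new interval
      } else if (a > e) {
        if (!placed) {
          merged.push([s, e]);
          placed = true;
        }
        merged.push([a, b]);               // entirely after the new interval
      } else {
        s = Math.min(s, a);                // overlaps or touches: absorb it
        e = Math.max(e, b);
      }
    }
    if (!placed) merged.push([s, e]);
    this.dirty = merged;
  }

  // Ranges that would need to be sent to the GL process at unmap time.
  dirtyRanges(): Array<[number, number]> {
    return this.dirty.map(([a, b]): [number, number] => [a, b]);
  }
}
```

Only the intervals returned by `dirtyRanges()` would have to cross IPC at unmap; everything else in the mapping is known to be untouched.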

I don't see how this can be better than BufferSubData in out-of-process
implementations.

When we expose something in core WebGL, we expect it to be implemented
the same or similarly (semantics, perf, etc.) across platforms (OSs,
browser vendors, GPUs). If that's not possible, I think extensions are
the better way, with the correct expectation that this (not just the
semantics, but the perf implications) may not be supported everywhere.

Otherwise, suppose a developer implements a game using MapBufferRange
in one browser, but users of another browser can't play it at all,
even on the same hardware, because it's too slow.  That's not good for
WebGL at all.

There are many other examples where we have to hold back features that
developers would love, for example, certain compressed texture formats
not supported on iOS, etc. It may seem frustrating, but in the long
run it is good for WebGL as a web standard.

>
>>
>> For out-of-process implementations, using the invalidate bit makes a
>> huge perf difference.  The Map call doesn't have to wait for the
>> service side to return the buffer range; it can just allocate a buffer,
>> initialize it to 0, and allow JS to write to it.  Otherwise, it's a
>> blocking call until the service side responds with the readback buffer
>> data.
>
> This argument is again born out of this flawed idea of how to do things,
> which you treat as a given, but it isn't; see above.