
Re: [Public WebGL] WebGL2 and no mapBuffer/mapBufferRange



On Tue, Mar 17, 2015 at 3:12 PM, Zhenyao Mo <zmo@chromium.org> wrote:
On Tue, Mar 17, 2015 at 2:39 PM, Jeff Gilbert <jgilbert@mozilla.com> wrote:
> Just warn on WRITE without INVALIDATE.
>
> Here's what the costs look like to me:
>
> MapBufferRange(READ) ~= GetBufferSubData:
> Both require a synchronous GL command followed by a copy.
>
> MapBufferRange(WRITE|INVALIDATE) ~= BufferSubData
> MapBufferRange can create a scratch shmem for writing via ArrayBuffer, send
> it across IPC on flush, and do Map+memcpy on the GL process. (1 copy)
> BufferSubData is at best from an ArrayBuffer which is already shmem, and is
> then a copy-on-write (ideally no-copy), but still needs to call
> BufferSubData or Map+memcpy on GL process. (2 copies, but only 1 copy if you
> have a heuristic which allocates shmem to ArrayBuffers)

This scenario is my main concern. Out-of-process GL will have at least
one extra copy compared with in-process GL.  Since it's likely on
the critical rendering path, this difference will create a huge perf gap
among implementations. IMHO, this is really bad for WebGL as a
standard.
Yet MapBufferRange(WRITE|INVALIDATE) generally needs one fewer copy than BufferSubData, even on out-of-process GL.
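
To make that copy accounting concrete, here is a toy Python model. Nothing in it is a real browser or GL API: `CopyCounter`, `buffer_sub_data`, and `map_write_invalidate` are hypothetical names. It counts only the bulk copies counted in this thread, and assumes an out-of-process GL where data has to sit in shmem before the GL process can consume it.

```python
class CopyCounter:
    """Counts bulk memory copies in this toy model."""

    def __init__(self):
        self.copies = 0

    def copy(self, data):
        self.copies += 1
        return bytes(data)


def buffer_sub_data(app_data, counter, array_buffer_is_shmem=False):
    # Copy 1 (skipped if the ArrayBuffer already lives in shmem):
    # application ArrayBuffer -> shmem for IPC.
    shmem = app_data if array_buffer_is_shmem else counter.copy(app_data)
    # Copy 2: shmem -> GL buffer on the GL process.
    return counter.copy(shmem)


def map_write_invalidate(length, fill, counter):
    # The application writes straight into a scratch shmem buffer,
    # so there is no separate staging copy.
    scratch = bytearray(length)
    fill(scratch)
    # Single copy: scratch shmem -> GL buffer on the GL process.
    return counter.copy(scratch)


def fill_with_sevens(buf):
    buf[:] = b"\x07" * len(buf)


plain, shmem_backed, mapped = CopyCounter(), CopyCounter(), CopyCounter()
buffer_sub_data(b"\x07" * 8, plain)
buffer_sub_data(b"\x07" * 8, shmem_backed, array_buffer_is_shmem=True)
map_write_invalidate(8, fill_with_sevens, mapped)
print(plain.copies, shmem_backed.copies, mapped.copies)  # prints: 2 1 1
```

In this model the mapped path never does worse than BufferSubData, and it matches it only when the ArrayBuffer is already shmem-backed.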

We are in the business of exposing a 'sharp tool' API. If something in particular is slow on some platforms, tell people about it and have them use alternate codepaths. Artificially limiting performance for implementations because of a quirk in one browser does not seem healthy for a performance-oriented API.

>
> With UNSYNCHRONIZED, MapBufferRange can be 'sharper', but potentially more
> performant:
> READ|UNSYNC: Still synchronous, but lets the GL process use UNSYNCHRONIZED
> to prevent stalls.
> WRITE|INVAL|UNSYNC: Still async, but lets the GL process use UNSYNCHRONIZED
> to prevent stalls.

I don't think we can allow the UNSYNCHRONIZED bit to reach the underlying
GL. That leads to undefined behavior.
Let's leave this for a later discussion then.

>
> FLUSH_EXPLICIT lets out-of-process GL reduce the amount of data it needs to
> memcpy while not having to allocate many smaller chunks with BufferSubData.
> (Multiple discard+write ranges with the same single shmem scratch buffer)
>
> WRITE without INVAL is probably much slower than WRITE|INVAL even on
> in-process-GL implementations.

How? Assume you map, write to some, flush, write to some other,
flush... unmap.  Unless you change the ArrayBuffer semantics to
keep track of dirty/clean states for each element, each
flush has to write back the entire range.
There is a command for flushing subranges (FlushMappedBufferRange). With WRITE|INVAL, out-of-process GL would likely create a scratch shmem buffer and thus control its contents. With buffer reuse (by the same context), clearing to zero isn't even required. Writes are made to this scratch buffer, and its flushed ranges are copied into the eventual mapped buffer on the GL process.
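
A minimal Python sketch of that scratch-buffer scheme follows. It is a toy model, not any browser's implementation: `ScratchMapping` and its methods are hypothetical names, a `bytearray` stands in for both the shmem scratch buffer and the buffer on the GL process, and unflushed bytes are simply left untouched here even though their contents would be undefined under INVALIDATE.

```python
class ScratchMapping:
    """Toy model of MapBufferRange(WRITE | INVALIDATE | FLUSH_EXPLICIT)
    for an out-of-process GL. All names are hypothetical."""

    def __init__(self, gl_buffer, offset, length):
        if offset < 0 or length < 0 or offset + length > len(gl_buffer):
            raise ValueError("mapped range out of bounds")
        self.gl_buffer = gl_buffer        # stands in for the GL-process buffer
        self.offset = offset
        self.scratch = bytearray(length)  # stands in for the shmem scratch
        self.flushed = []                 # (start, length), mapping-relative
        self.bytes_copied = 0

    def flush_range(self, start, length):
        # Models flushing a subrange: only records which bytes are dirty.
        if start < 0 or length < 0 or start + length > len(self.scratch):
            raise ValueError("flushed range out of bounds")
        self.flushed.append((start, length))

    def unmap(self):
        # On the GL process, copy only the flushed subranges.
        for start, length in self.flushed:
            dst = self.offset + start
            self.gl_buffer[dst:dst + length] = self.scratch[start:start + length]
            self.bytes_copied += length
        self.flushed.clear()


gl_buffer = bytearray(b"\xff" * 16)
mapping = ScratchMapping(gl_buffer, offset=4, length=8)
mapping.scratch[0:2] = b"\x01\x02"
mapping.flush_range(0, 2)
mapping.scratch[6:8] = b"\x03\x04"
mapping.flush_range(6, 2)
mapping.unmap()
print(mapping.bytes_copied)  # prints: 4
```

Only 4 of the 8 mapped bytes are copied, and no per-element dirty tracking on the ArrayBuffer is needed because the application names the dirty ranges itself.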