
Re: [Public WebGL] WebGL Best Practices


Actually there is https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_shader_subgroup.txt which adds Vulkan-style subgroup operations to OpenGL and OpenGL ES.

Sadly, I think it’s only available in NVIDIA's desktop drivers at this point, so from that perspective it might as well not exist.





-----Original Message-----
From: <owners-public_webgl@khronos.org> on behalf of "Kevin Rogovin (kevinrogovin@invisionapp.com)" <public_webgl@khronos.org>
Reply-To: Public WebGL <public_webgl@khronos.org>
Date: Thursday, May 28, 2020 at 11:21 PM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] WebGL Best Practices





As much as I hate the reality: branching can still be quite bad in shaders, depending on the nature of the branching. GPUs group vertices and fragments together. If the branching diverges within a group, typical GPU shader code will execute both branches for all elements of the group, masking the side effects wherever the branch condition is false. How vertices and fragments are grouped is very GPU-specific. However, for fragment shading you can count on a 2x2 fragment block ending up in the same group, so diverging within a 2x2 block is bad.

As far as I know, on NVIDIA and AMD GPUs fragments from different triangles can end up in the same group, whereas on Intel GPUs all fragments within a group come from the same triangle. This architectural difference can have significant consequences for uber-shaders, and can also cause significant performance differences for tiny triangles.

Another impact of uber-shading is on the compiled shader itself. Branching shaders have a -tendency- to increase register pressure in compilers, which in turn can decrease shader efficiency. For what it is worth, on a previous project I initially relied on uber-shading to reduce shader changes. Until the number of shader changes got quite large (over 1000 or so), I found that the uber-shader ran slower on the hardware/software I was targeting: Intel GPUs with the open-source drivers. Because of the Intel GPU design, I figured that the impact of uber-shading on other GPUs would have been even worse.

In GL 4.x, there is the GL_ARB_shader_ballot extension, which exposes some of this to the shader so that it can handle branching better itself. Sadly, this is not in GLES2 or GLES3 in any form, and thus not in WebGL1 or WebGL2.
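The masking behaviour described above can be illustrated with a toy cost model. This is a simplification assumed purely for illustration (real group sizes and scheduling vary by GPU, as noted); the functions `groupCost` and `totalCost` are hypothetical. A group pays for every branch that at least one of its lanes takes, so quads that mix both sides cost twice as much here as quads whose lanes agree.

```typescript
// Toy model of SIMD execution with per-lane masking (an illustrative
// assumption, not how any specific GPU schedules work).
// A group runs the "then" side if any lane takes it and the "else" side if any
// lane does not; masked-off lanes still occupy the hardware for that time.
function groupCost(predicates: boolean[], thenCost: number, elseCost: number): number {
  const anyThen = predicates.some((p) => p);
  const anyElse = predicates.some((p) => !p);
  return (anyThen ? thenCost : 0) + (anyElse ? elseCost : 0);
}

// Total cost when fragments are packed into fixed-size groups, e.g. 2x2 quads.
function totalCost(predicates: boolean[], groupSize: number, thenCost: number, elseCost: number): number {
  let total = 0;
  for (let i = 0; i < predicates.length; i += groupSize) {
    total += groupCost(predicates.slice(i, i + groupSize), thenCost, elseCost);
  }
  return total;
}

// Coherent: every lane of each quad takes the same branch.
const coherent = [true, true, true, true, false, false, false, false];
// Divergent: every quad mixes both branches.
const divergent = [true, false, true, false, true, false, true, false];

console.log(totalCost(coherent, 4, 10, 10)); // 20
console.log(totalCost(divergent, 4, 10, 10)); // 40
```

Both inputs take each branch equally often overall; only the grouping differs, which is why divergence within a 2x2 block matters more than the branch itself.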




On Fri, May 29, 2020 at 12:12 AM Andrew Varga (grizzly33@gmail.com) <public_webgl@khronos.org> wrote:


> Thank you, it will be interesting to see how that changes. One more reason for "serious" graphical applications to be written in Wasm, I guess.


> I have another performance-related question that I meant to ask: I think avoiding branching in shaders is an outdated best practice, and that with modern GPUs it is no longer necessary.

> My example would be a simple engine which renders 2 types of rectangles in multiple (many) instances, one type having a texture and one with just a solid color.

> A. You can implement this with 2 separate Programs, one using a texture and one not. There is no branching in the shader, but you need at least 2 draw calls to render all rectangles.

> B. Have a single Program with a uniform flag indicating whether there is a texture, and branch on that uniform. This still needs at least 2 draw calls, to update the uniform between them, but at least no Program switching occurs.

> C. Have a dummy texture, or an (instanced) attribute indicating whether there is a texture, so that both types of rectangles can be rendered with the same draw call, preferably using instancing.


> I assume option C would be preferred for this example, but I guess this depends on the type of "materials" you have in your scene, how much they differ, and how many of them you want to render. Does C make sense, and is there a standard way of going about this strategy?
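A minimal sketch of option C, assuming WebGL2 and assuming that all textured rectangles share one bound texture (for example an atlas; the thread does not specify this). Each instance carries a 0/1 flag, and the fragment shader selects with `mix()` instead of an `if`. The layout, `Rect`, `packInstances`, and the attribute and varying names are hypothetical, and a matching vertex shader (not shown) is assumed to forward `vUv`, `vColor` and a flat `vUseTexture`.

```typescript
// Hypothetical instance layout for option C: 9 floats per instance,
// x, y, w, h (clip space), r, g, b, a, useTexture (0 or 1).
interface Rect {
  x: number;
  y: number;
  w: number;
  h: number;
  color: [number, number, number, number];
  textured: boolean;
}

const FLOATS_PER_INSTANCE = 9;
const INSTANCE_STRIDE_BYTES = FLOATS_PER_INSTANCE * 4; // 36 bytes

// Pack all rectangles, textured or not, into one instance buffer.
function packInstances(rects: Rect[]): Float32Array {
  const data = new Float32Array(rects.length * FLOATS_PER_INSTANCE);
  rects.forEach((r, i) => {
    data.set([r.x, r.y, r.w, r.h, ...r.color, r.textured ? 1 : 0], i * FLOATS_PER_INSTANCE);
  });
  return data;
}

// Fragment shader (GLSL ES 3.00). The per-instance flag picks the textured or
// the solid color with mix(), so there is no branch to diverge on.
const FRAGMENT_SHADER = `#version 300 es
precision mediump float;
uniform sampler2D uTex;
in vec2 vUv;
in vec4 vColor;
flat in float vUseTexture;
out vec4 outColor;
void main() {
  vec4 texel = texture(uTex, vUv);
  outColor = mix(vColor, texel * vColor, vUseTexture);
}`;
```

On the WebGL side, the packed array would be uploaded with `gl.bufferData`, each per-instance attribute set up with `gl.vertexAttribDivisor(location, 1)` and a 36-byte stride, and everything drawn with a single `gl.drawArraysInstanced(gl.TRIANGLE_STRIP, 0, 4, rects.length)` over a 4-vertex unit quad. Solid rectangles still pay for the texture fetch, but there is no divergence; sampling unconditionally also avoids implicit-derivative lookups inside non-uniform control flow, where GLSL leaves derivatives undefined.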



> On Wed, May 27, 2020 at 5:56 AM Jukka Jylänki (jujjyl@gmail.com) <public_webgl@khronos.org> wrote:



>> Wasm indeed does have to call out to JS to perform the WebGL API calls. We were extremely fortunate that WebGL was modeled directly on native OpenGL; as a result, the Wasm<->JS function call boundary consists only of primitive ints and floats, so the interop calls between Wasm and JS are quite straightforward.


>> WebGPU, on the other hand, has not been modeled at a similarly low level but with high-level JS objects, and will have a *much* bigger Wasm<->JS call overhead than WebGL has.


>> Wasm modules do not (currently, at least) have a shortcut that fast-tracks WebGL API calls, although there have been discussions about enabling Wasm to import JS/DOM API references to be called directly. (I think the proposal is this one: https://github.com/WebAssembly/function-references/blob/master/proposals/function-references/Overview.md )
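To make the point about primitive ints and floats concrete, here is a sketch of one common way to keep the Wasm<->JS boundary numeric: JS keeps the WebGL objects in a table and hands Wasm integer handles. Emscripten's GL layer works along these general lines, but the names here (`HandleTable`, `makeImports`, `glGenBuffer`, `glBindBuffer`) and their single-handle signatures are simplified, hypothetical stand-ins, not Emscripten's actual code.

```typescript
// Hypothetical sketch: JS owns the WebGL objects and Wasm only ever sees
// small integer handles, so every imported call takes and returns numbers.
class HandleTable<T> {
  private objects: (T | null)[] = [null]; // handle 0 means "no object"

  add(obj: T): number {
    this.objects.push(obj);
    return this.objects.length - 1;
  }

  get(handle: number): T | null {
    return handle >= 0 && handle < this.objects.length ? this.objects[handle] : null;
  }
}

// Minimal subset of WebGL used below, declared locally so the sketch does not
// depend on DOM type definitions.
interface BufferGL<B> {
  createBuffer(): B | null;
  bindBuffer(target: number, buffer: B | null): void;
}

// Functions a Wasm module could import. Simplified: the real OpenGL ES
// glGenBuffers(n, buffers) writes its handles into an array in Wasm memory.
function makeImports<B>(gl: BufferGL<B>) {
  const buffers = new HandleTable<B>();
  return {
    glGenBuffer(): number {
      const buffer = gl.createBuffer();
      return buffer === null ? 0 : buffers.add(buffer);
    },
    glBindBuffer(target: number, handle: number): void {
      gl.bindBuffer(target, buffers.get(handle));
    },
  };
}
```

In a page, the returned object would be passed in the import object, e.g. `WebAssembly.instantiate(bytes, { env: makeImports(gl) })`; the `env` module name is an assumption that depends on how the Wasm module was built.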



>> On Wed, May 27, 2020 at 1:03 AM Andrew Varga (grizzly33@gmail.com) (public_webgl@khronos.org) wrote:

>> >

>> > Thank you for the detailed messages; this makes sense and is useful to know.

>> >

>> > Somewhat related to this, I've also been wondering how WebAssembly works with WebGL (and WebGPU). My assumption is that since WebGL is a web API, the Wasm module has to "call out" to JS (using the import object), which actually calls the WebGL API.

>> > (So, for example, Emscripten creates the Wasm module plus an import object that maps the OpenGL ES 2 API calls to the appropriate WebGL API calls implemented in JS?)

>> > If this is the case, a pure JS application might even use WebGL faster than a Wasm module, since it has less overhead, considering only the cost of calling the API.

>> > But is it, or will it become, possible for a Wasm module to shortcut this and execute WebGL commands more directly, at a lower level, avoiding some of the overhead of calling from JS and thus actually being faster?

>> >

>> > On Mon, May 25, 2020 at 9:19 PM Jeff Gilbert (jgilbert@mozilla.com) <public_webgl@khronos.org> wrote:

>> >>

>> >>

>> >> That's a great question! I would still consider that to be best practice.

>> >>

>> >> In some browsers more than others, calling into the browser (C++) from JS is relatively slow, whereas accessing a native JS cache is very, very fast.

>> >>

>> >> Additionally, most browsers have (or are moving to) a split content-process/GPU-process implementation of WebGL. In this case, moving caching into the browser layer would be more difficult than the caching a native JS app could choose to do, because the browser can't assume, for example, that all validation succeeds before we update the JS-accessible caches. It also means browsers need to maintain the correct state in two places, instead of just in the (privileged/trusted) GPU process.

>> >>

>> >> In some cases, asking the browser for state reflection means a round-trip between processes, which, on a busy machine, can be pretty slow.
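The state-reflection point can be sketched as a thin wrapper that records what the app sets and answers queries from JS memory, so the app never needs `gl.getParameter(gl.CURRENT_PROGRAM)` or `gl.getParameter(gl.VIEWPORT)`. It assumes every state change goes through the wrapper and that all calls pass validation, which is exactly the assumption the browser itself cannot make; `ShadowState` and its methods are hypothetical.

```typescript
// Hypothetical wrapper: shadow the state the app sets so it can be read back
// without asking the browser (possibly a cross-process round-trip).

// Minimal subset of the WebGL interface used here, declared locally so the
// sketch does not need DOM type definitions.
interface StateGL<P> {
  useProgram(program: P | null): void;
  viewport(x: number, y: number, width: number, height: number): void;
}

type Rect4 = readonly [number, number, number, number];

class ShadowState<P> {
  private gl: StateGL<P>;
  private program: P | null = null; // WebGL's initial current program is null
  private vp: Rect4;

  // The initial viewport must be supplied: WebGL starts with the viewport set
  // to the drawing-buffer size, which this wrapper cannot know on its own.
  constructor(gl: StateGL<P>, initialViewport: Rect4) {
    this.gl = gl;
    this.vp = initialViewport;
  }

  useProgram(program: P | null): void {
    this.gl.useProgram(program);
    this.program = program;
  }

  // Assumes valid arguments: a negative width or height makes the real call
  // fail with INVALID_VALUE, and this sketch would then hold the wrong value.
  viewport(x: number, y: number, width: number, height: number): void {
    this.gl.viewport(x, y, width, height);
    this.vp = [x, y, width, height];
  }

  // Read from JS memory instead of gl.getParameter(gl.CURRENT_PROGRAM).
  currentProgram(): P | null {
    return this.program;
  }

  // Read from JS memory instead of gl.getParameter(gl.VIEWPORT).
  currentViewport(): Rect4 {
    return this.vp;
  }
}
```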

>> >>

>> >> On Mon, May 25, 2020 at 9:09 AM Andrew Varga (grizzly33@gmail.com) <public_webgl@khronos.org> wrote:

>> >> >

>> >> > That's a really useful page!

>> >> >

>> >> > I'm curious about something that is not mentioned here but that I have come across many times, e.g. in Chapter 9, "Optimizing WebGL Usage", of "HTML5 Game Development Insights", and I think many engines follow this practice. It is basically to cache WebGL state in JavaScript and only issue WebGL commands when needed:

>> >> > - keep track of uniforms, but only update them right before a draw call, and only if their values have changed

>> >> > - cache all bindings (active texture unit, buffers, attributes)

>> >> > - cache blend states

>> >> > - etc.: basically, cache the entire WebGL state to reduce overhead
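The caching pattern in this list can be sketched roughly as follows: uniform writes are only recorded, then uploaded right before a draw if they differ from what was last sent, and redundant active-texture-unit and texture binds are skipped. This is a simplified, hypothetical `StateCache` covering just `uniform1f` and texture bindings; it assumes all GL calls go through it and, since WebGL uniform locations belong to a single program, keys uniforms by location object.

```typescript
// Minimal subset of WebGL used by the cache, declared locally so the sketch
// does not need DOM type definitions.
interface CacheGL<T, L> {
  activeTexture(unit: number): void;
  bindTexture(target: number, texture: T | null): void;
  uniform1f(location: L, x: number): void;
}

class StateCache<T, L> {
  private gl: CacheGL<T, L>;
  private activeUnit = -1; // treated as unknown, so the first bind sets it
  private boundTextures = new Map<string, T | null>(); // key: "unit:target"
  private pendingUniforms = new Map<L, number>(); // recorded since last flush
  private uploadedUniforms = new Map<L, number>(); // last value sent to GL

  constructor(gl: CacheGL<T, L>) {
    this.gl = gl;
  }

  // `unit` is the enum value, e.g. gl.TEXTURE0 + i.
  bindTexture(unit: number, target: number, texture: T | null): void {
    const key = unit + ":" + target;
    if (this.boundTextures.has(key) && this.boundTextures.get(key) === texture) {
      return; // already bound: skip the redundant call
    }
    if (this.activeUnit !== unit) {
      this.gl.activeTexture(unit);
      this.activeUnit = unit;
    }
    this.gl.bindTexture(target, texture);
    this.boundTextures.set(key, texture);
  }

  // Record only; nothing is sent to GL yet.
  setUniform1f(location: L, value: number): void {
    this.pendingUniforms.set(location, value);
  }

  // Call right before each draw, with the owning program current: upload only
  // the values that differ from what GL was last given.
  flushUniforms(): void {
    this.pendingUniforms.forEach((value, location) => {
      if (this.uploadedUniforms.get(location) !== value) {
        this.gl.uniform1f(location, value);
        this.uploadedUniforms.set(location, value);
      }
    });
    this.pendingUniforms.clear();
  }
}
```

A real engine would extend the same idea to buffers, vertex attributes, blend state and the other uniform types.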

>> >> >

>> >> > Is this still considered a good practice? I've also been wondering: if this always makes sense to do, why can't the browser implement it?

>> >> > And is WebGPU going to be different in this regard?

>> >> >

>> >> > Thanks,

>> >> > Andrew

>> >> >

>> >> > On Fri, Mar 27, 2020 at 6:30 PM Ken Russell (kbr@google.com) <public_webgl@khronos.org> wrote:

>> >> >>

>> >> >> The investigation is underway in http://crbug.com/1065012 . The native implementation of requestPostAnimationFrame, which is behind the --enable-experimental-web-platform-features flag in Chrome, doesn't exhibit this problem.

>> >> >>

>> >> >> For the time being, the rPAF polyfill can be used in Firefox, and seems to carry some benefit there.
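For reference, the kind of polyfill under discussion generally looks like the sketch below: from inside a `requestAnimationFrame` callback, defer the task with `setTimeout(cb, 0)` so it runs as a separate task after the current rendering update. Given the caveats about Chrome in this thread, treat it as illustrative only. Injecting the schedulers is a hypothetical design choice so the sketch can run outside a browser; in a page you would pass `requestAnimationFrame`.

```typescript
// Hypothetical requestPostAnimationFrame-style polyfill (a sketch, not the
// native implementation proposed in the WICG explainer).
type FrameScheduler = (callback: (time: number) => void) => number;
type TaskScheduler = (callback: () => void) => void;

function requestPostAnimationFrame(
  task: (frameTime: number) => void,
  raf: FrameScheduler,
  defer: TaskScheduler = (cb) => {
    setTimeout(cb, 0); // queue a new task; it runs after this rAF callback returns
  },
): void {
  raf((frameTime) => {
    defer(() => task(frameTime));
  });
}
```

In a browser this would be called as `requestPostAnimationFrame((t) => { /* start next frame's work */ }, requestAnimationFrame);`.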

>> >> >>

>> >> >>

>> >> >>

>> >> >> On Fri, Mar 27, 2020 at 9:09 AM Ashley Gullen (ashley@scirra.com) <public_webgl@khronos.org> wrote:

>> >> >>>

>> >> >>> Why doesn't it work well in Chrome? rPAF seems to intuitively make sense once explained. Do you have a link to the discussion?

>> >> >>>

>> >> >>> On Thu, 26 Mar 2020 at 19:52, Ken Russell (kbr@google.com) <public_webgl@khronos.org> wrote:

>> >> >>>>

>> >> >>>> Note that there's an active ongoing discussion about the requestPostAnimationFrame best practice. It seems that the polyfill works well in Firefox, but not in Chrome. Suggest holding off adding that to your applications for the moment.

>> >> >>>>

>> >> >>>> -Ken

>> >> >>>>

>> >> >>>>

>> >> >>>> On Thu, Mar 26, 2020 at 11:40 AM Kai Ninomiya (kainino@google.com) <public_webgl@khronos.org> wrote:

>> >> >>>>>

>> >> >>>>> The MDN doc links to an explainer:

>> >> >>>>> https://github.com/WICG/requestPostAnimationFrame/blob/master/explainer.md

>> >> >>>>> It's pretty small, but it's also not a very complex feature (implementation-wise). As explained there, the original motivation was to allow querying computed DOM properties at a time when it's guaranteed it won't force a relayout.

>> >> >>>>>

>> >> >>>>> On Thu, Mar 26, 2020 at 5:05 AM Won Chun (won@cbrebuild.com) <public_webgl@khronos.org> wrote:

>> >> >>>>>>

>> >> >>>>>> Are there more resources about requestPostAnimationFrame? First I've heard of rPAF.

>> >> >>>>>>

>> >> >>>>>> -Won

>> >> >>>>>>

>> >> >>>>>> On Wed, Mar 25, 2020, 4:11 PM Ken Russell (kbr@google.com) <public_webgl@khronos.org> wrote:

>> >> >>>>>>>

>> >> >>>>>>> [cross-posted to webgl-dev-list]

>> >> >>>>>>>

>> >> >>>>>>> Dear WebGL community:

>> >> >>>>>>>

>> >> >>>>>>> Mozilla, with contributions from the rest of the WebGL working group, has just revised the WebGL Best Practices document. It contains a significant number of tips for structuring WebGL applications well and attaining top performance. Please check it out!

>> >> >>>>>>>

>> >> >>>>>>> On behalf of the WebGL working group,

>> >> >>>>>>>

>> >> >>>>>>> -Ken

>> >> >>>>>>>

>> >>

>> >> -----------------------------------------------------------

>> >> You are currently subscribed to public_webgl@khronos.org.

>> >> To unsubscribe, send an email to majordomo@khronos.org with

>> >> the following command in the body of your email:

>> >> unsubscribe public_webgl

>> >> -----------------------------------------------------------

>> >>





