
RE: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved



Great work!

I filed an ANGLE bug for the recompilation message.

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Takahiro Aoyagi (taoyagi@mozilla.com)
Sent: Wednesday, May 15, 2019 11:25 PM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

Still WIP, but I confirmed that Three.js with the KHR_parallel_shader_compile extension, plus some optimizations, reduces the frame rate drop on application startup.

 

 

Regarding the recompilation message, I think hints of this kind are very helpful for optimizing a JS graphics engine. I actually had a hard time optimizing Three.js because the inside of WebGL is something of a black box to me.

 

 

On Tue, May 14, 2019 at 4:47 PM Chen, Jie A (jie.a.chen@intel.com) <public_webgl@khronos.org> wrote:

Kai, I'd like to do that. As far as I know, dynamic recompilation is not uncommon in ANGLE's D3D backend, so I am not sure whether the warning would fire too often. We should probably file a bug with the ANGLE team first.

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Kai Ninomiya (kainino@google.com)
Sent: Tuesday, May 14, 2019 1:51 AM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

Jasper, thanks for looking into that!

 

I think Chrome should print a performance warning if we compile a shader that ends up getting recompiled before ever being used.

 

Jie, would you be interested in looking into adding this warning?

 

From: Jasper St. Pierre (jstpierre@nvidia.com) <public_webgl@khronos.org>
Date: Sun, May 12, 2019 at 8:03 PM
To: Public WebGL

I should probably note that the cause of the long stalls on gl.useProgram was found. It turns out that ANGLE, at least on D3D11, will recompile the shader program for every set of vertex attributes that it is used with. The initial compileShader / linkProgram calls were made without any VAO or vertex attributes bound. Setting up a dummy VAO before compiling the shaders and linking the programs fixed this. It's unfortunate that WebGL does not expose ARB_vertex_attrib_binding, so setting up the dummy VAO was a bit clunky, but it wasn't too bad. Make sure you compile your shaders with the correct vertex attribute setup or you risk a dynamic shader recompilation.
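
A minimal sketch of what such a dummy VAO setup could look like, assuming a WebGL 2 context. The `VertexLayoutGL` and `AttribLayout` types and the `bindDummyVertexLayout` helper are hypothetical names for illustration, and using FLOAT for every attribute is also an assumption; the workaround actually used is not shown in this thread.

```typescript
// Minimal slice of the WebGL 2 API used here. A real WebGL2RenderingContext
// is expected to fit this shape (assumption; not checked in this sketch).
interface VertexLayoutGL {
  readonly ARRAY_BUFFER: number;
  readonly FLOAT: number;
  createVertexArray(): unknown;
  bindVertexArray(vao: unknown): void;
  createBuffer(): unknown;
  bindBuffer(target: number, buffer: unknown): void;
  enableVertexAttribArray(index: number): void;
  vertexAttribPointer(
    index: number, size: number, type: number,
    normalized: boolean, stride: number, offset: number): void;
}

// Hypothetical description of one attribute the shader will be drawn with.
interface AttribLayout {
  index: number; // attribute location
  size: number;  // component count, 1 to 4
}

// Hypothetical helper: binds a throwaway VAO whose enabled attributes mirror
// the layout used at draw time. Call it before compileShader / linkProgram.
function bindDummyVertexLayout(gl: VertexLayoutGL, layout: AttribLayout[]): unknown {
  const vao = gl.createVertexArray();
  gl.bindVertexArray(vao);
  // vertexAttribPointer reads the current ARRAY_BUFFER binding, so bind an
  // (empty) buffer first.
  gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
  layout.forEach((attrib) => {
    gl.enableVertexAttribArray(attrib.index);
    gl.vertexAttribPointer(attrib.index, attrib.size, gl.FLOAT, false, 0, 0);
  });
  return vao;
}
```

The layout passed in should match what the program is later drawn with; if it differs, the recompilation described above would presumably still occur.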


From: owners-public_webgl@khronos.org <owners-public_webgl@khronos.org> on behalf of Kai Ninomiya (kainino@google.com) <public_webgl@khronos.org>
Sent: Sunday, May 12, 2019 10:03:54 AM
To: public webgl
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

I think we will probably need to optimize checking LINK_STATUS (which would make useProgram faster too), so that when the COMPLETION_STATUS query result is sent to the client and cached, it sends the LINK_STATUS as well, and updates the already existing link status cache.

 

From: Takahiro Aoyagi (taoyagi@mozilla.com) <public_webgl@khronos.org>
Date: Sun, May 12, 2019, 2:14 AM
To: Public WebGL

Thanks for the comments, Chen and Kai.

 

Yeah, 8ms is very long to me. And if I compile+link two or more programs the time can be longer.

 

> I guess I’ve found the cause for gl.useProgram. WebGL implementation inserts an extra GL command to query LinkStatus before using it.

 

Will this be optimized (by not adding the extra GL command) at some point? Or is the extra command necessary?

 

> But do note that we cache that info, so it only takes extra time on the first useProgram.

 

I confirmed that on my test.

 

 

For now, I'll try not to switch to a program that hasn't yet been used with gl.useProgram() while at least one program is still being compiled and linked.

 

 

On Sun, May 12, 2019 at 2:54 PM Kai Ninomiya (kainino@google.com) <public_webgl@khronos.org> wrote:

8ms is an extraordinarily long time in terms of frame rendering; at 60 fps it is roughly half of the 16.7ms frame budget.

 

But do note that we cache that info, so it only takes extra time on the first useProgram.

 

From: Chen, Jie A (jie.a.chen@intel.com) <public_webgl@khronos.org>
Date: Sat, May 11, 2019, 8:07 PM
To: Public WebGL

I guess I’ve found the cause for gl.useProgram. WebGL implementation inserts an extra GL command to query LinkStatus before using it.

https://cs.chromium.org/chromium/src/third_party/blink/renderer/modules/webgl/webgl_rendering_context_base.cc?sq=package:chromium&g=0&l=6350

 

Anyway, I think you don't need to worry too much about this. Usually it doesn't take very long to drain the command buffer. I tried your test case, and it took only about 8ms on my laptop.

 

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Chen, Jie A (jie.a.chen@intel.com)
Sent: Sunday, May 12, 2019 9:44 AM
To: Public WebGL <public_webgl@khronos.org>
Subject: RE: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

Mostly they are gl.get*-style commands. You can search for ‘WaitForCmd’ in https://cs.chromium.org/chromium/src/gpu/command_buffer/client/gles2_implementation_impl_autogen.h and https://cs.chromium.org/chromium/src/gpu/command_buffer/client/gles2_implementation.cc to find them.

But I don’t think gl.useProgram needs to wait. Most likely it has a different cause. I will look into it later.

 

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Takahiro Aoyagi (taoyagi@mozilla.com)
Sent: Sunday, May 12, 2019 2:27 AM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

Hi Chen,

 

Thanks for your explanation.

 

Which WebGL calls need to wait for all queued GL commands? In my tests, not only gl.get*() but also gl.useProgram() seems to wait.

 

 

Does that mean I shouldn't call any gl.get*(), and also shouldn't switch programs, until compilation and linking of all programs are done?

 

I haven't tried yet but could draw calls wait, too? If so, I shouldn't render anything until then?

 

 

Interesting question, Takahiro.

My speculation is that the problem is related to Chrome's multi-process architecture. As you may know, WebGL calls run on the main thread of the renderer process, and Chrome needs to send these WebGL commands to the GPU process for actual execution. So in your case, all Chrome actually does for the calls in makeProgram2 is put the GL commands into an IPC buffer. But later, when you read a program parameter back, it has to wait for the execution of all previously issued GL commands, including those from makeProgram2.
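
The behaviour described above can be illustrated with a toy model. This is not Chrome's actual command buffer; `ToyCommandBuffer` and its methods are hypothetical names, and the only point is that queued commands cost nothing at issue time while a read-back pays for everything queued before it.

```typescript
// Toy model only: commands are queued cheaply, and any query that returns a
// value to the caller must first execute everything queued before it.
class ToyCommandBuffer {
  private pending: Array<() => void> = [];
  executed = 0;

  // Models calls such as compileShader / linkProgram: queue and return.
  enqueue(command: () => void): void {
    this.pending.push(command);
  }

  // Models gl.get*-style calls: drain the queue, then read the result.
  query<T>(read: () => T): T {
    let command = this.pending.shift();
    while (command !== undefined) {
      command();
      this.executed += 1;
      command = this.pending.shift();
    }
    return read();
  }
}
```

In this model the time spent in `query` grows with the amount of queued work, which would match the observation that the first read-back after issuing compile and link calls is the slow one.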

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Takahiro Aoyagi (taoyagi@mozilla.com)
Sent: Saturday, May 11, 2019 6:58 AM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

 

 


 

 

Hello again. My app now shows getProgramParameter completing quickly, but it is spending a lot of time in useProgram. I am checking COMPLETION_STATUS_KHR on the GL program beforehand and it is returning true. I’m guessing there’s some additional expensive work that still needs to happen at useProgram time. Is this a bug? Is there any way to debug or avoid this? I’m unsure if there will be an easy repro.

 

Again, profiler screenshot attached.

 

From: owners-public_webgl@khronos.org <owners-public_webgl@khronos.org> On Behalf Of Takahiro Aoyagi (taoyagi@mozilla.com)
Sent: Friday, May 10, 2019 12:22 AM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

 


 

 

 

 

Hi Takahiro,

The current implementation of the COMPLETION_STATUS_KHR query in Chromium incurs an expensive round-trip to the GPU thread. This may contribute to the wait time in your case. I am trying to optimize this. Could you please file a Chromium bug? I will look into it once I’ve finished the optimization.

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Takahiro Aoyagi (taoyagi@mozilla.com)
Sent: Friday, April 26, 2019 6:20 PM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Basically, ANGLE’s GL backend is simply a wrapper around the existing GL implementations. In this sense glCompileShader() is still implementation specific. Essentially, ANGLE just calls glCompileShader and checks GL_COMPILE_STATUS.

 

 

Thank you, that’s important information to know. Changing my shader compilation pipeline to wait for shader completion status is a lot more difficult, and based on some initial tests it’s also more expensive than not doing it: the latency of getShaderParameter is too high.

 

That said, in a lot of GL implementations, glCompileShader() is practically a no-op, and will only do an AST parse when glGetShaderiv() or glGetShaderInfoLog() is called (which is basically only for debugging / error checking). glLinkProgram() does full compilation of all programs, as full-PSO optimization is important. I don’t know how common this is, but multiple GL implementations I have seen do it. It might make sense for ANGLE to adopt this model as well.

 

 

Absolutely. Ideally there would be no stall for linkProgram. I tried, but it’s too difficult to achieve this in ANGLE.

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Jeff Gilbert (jgilbert@mozilla.com)
Sent: Thursday, April 25, 2019 9:33 AM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

 

linkProgram may stall on compileShader. To avoid the stall, you should make sure the COMPLETION_STATUS_KHR of every attached shader is true before calling linkProgram. In this sense, the first workflow has less chance of running into a stall.
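
A minimal sketch of that check, assuming `ext` is the object returned by `gl.getExtension('KHR_parallel_shader_compile')`. The interfaces and the `linkIfShadersComplete` helper are hypothetical names for illustration; only `getShaderParameter`, `linkProgram` and `COMPLETION_STATUS_KHR` come from WebGL and the extension.

```typescript
// Minimal slice of the WebGL API used here. A real WebGL context is expected
// to fit this shape (assumption; not checked in this sketch).
interface ShaderStatusGL {
  getShaderParameter(shader: unknown, pname: number): unknown;
  linkProgram(program: unknown): void;
}

// Shape of the KHR_parallel_shader_compile extension object.
interface ParallelShaderCompileExt {
  readonly COMPLETION_STATUS_KHR: number;
}

// Hypothetical helper: issues linkProgram only when every attached shader
// reports completion. Returns true if the link was issued, false if the
// caller should try again later (for example on the next frame).
function linkIfShadersComplete(
  gl: ShaderStatusGL,
  ext: ParallelShaderCompileExt,
  program: unknown,
  shaders: unknown[],
): boolean {
  const allComplete = shaders.every(
    (shader) => gl.getShaderParameter(shader, ext.COMPLETION_STATUS_KHR) === true,
  );
  if (allComplete) {
    gl.linkProgram(program);
  }
  return allComplete;
}
```

Whether this pays off depends on the implementation: a later reply in this thread reports that the latency of the getShaderParameter query itself was too high for this to be a win.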

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Fernando Serrano García (fserrano@mozilla.com)
Sent: Wednesday, April 24, 2019 9:38 PM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

 

fwiw three.js introduced a "debug" parameter (false by default) to determine if there will be a status check and call all the get* (currently in dev):

 

 

 

 

 

 

 

 

 

 

 

 

This is all extremely interesting, and documentation of these best practices should probably be more publicly available (perhaps through a non-normative note in the spec). I'm thinking the language in the extension spec could be made clearer about what its actual purpose is. It currently gives the impression that enabling it is what allows for parallel compilation and that this should save time, but given what Jeff and Kai have said, it sounds like the purpose is to avoid CPU stalls, and that it might even take longer in wall time since you're only checking once a frame. Maybe some background information could be added along the lines of "Normally applications must wait on get*Parameter or a draw call for program linking to complete; this extension provides a mechanism for querying completion status without stalling..."
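
The once-per-frame check mentioned above could be sketched as follows, assuming `ext` is the object returned by `gl.getExtension('KHR_parallel_shader_compile')`. The interfaces and the `pollUntilLinked` helper are hypothetical names; the frame scheduler is passed in as a parameter so the sketch does not depend on a browser.

```typescript
// Minimal slice of the WebGL API used here. A real WebGL context is expected
// to fit this shape (assumption; not checked in this sketch).
interface ProgramStatusGL {
  getProgramParameter(program: unknown, pname: number): unknown;
}

// Shape of the KHR_parallel_shader_compile extension object.
interface ParallelShaderCompileExt {
  readonly COMPLETION_STATUS_KHR: number;
}

// Hypothetical helper: checks completion at most once per scheduled frame and
// calls onReady as soon as the program reports completion.
function pollUntilLinked(
  gl: ProgramStatusGL,
  ext: ParallelShaderCompileExt,
  program: unknown,
  scheduleNextFrame: (callback: () => void) => void,
  onReady: () => void,
): void {
  const poll = (): void => {
    if (gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR) === true) {
      onReady();
    } else {
      // Not finished yet: keep rendering and ask again on the next frame.
      scheduleNextFrame(poll);
    }
  };
  poll();
}
```

In a page, `scheduleNextFrame` would typically be `(cb) => requestAnimationFrame(cb)`. Because the status is sampled only at frame boundaries, the program can become usable up to one frame after compilation actually finishes, which is the wall-time cost noted above.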

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Thank you! Trying this out in my own WebGL 2 app, I see quite a reduction in shader compilation time, shaving 400ms off load time on a moderately complex scene. The getProgramParameter(COMPLETION_STATUS_KHR) query still takes 1ms (which I believe is being tracked), but that is still better than the 4ms I see in getUniformBlockIndex (my previous sync point).

 

AFAIK, useProgram should not cause a stall, but in order to set my texture samplers to bind to different units, I need to do a getUniformLocation on my sampler uniform, which causes a compilation stall. Same thing with getUniformBlockIndex. Do make sure that anything that queries reflection data about the program (e.g. getAttribLocation, getUniformLocation, getUniformBlockIndex) happens after COMPLETION_STATUS_KHR returns true, since that’s another way to accidentally stall on compilation completing.
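
A minimal sketch of deferring reflection queries in this way, assuming `ext` is the object returned by `gl.getExtension('KHR_parallel_shader_compile')`. The interfaces and the `tryGetUniformLocations` helper are hypothetical names for illustration.

```typescript
// Minimal slice of the WebGL API used here. A real WebGL context is expected
// to fit this shape (assumption; not checked in this sketch).
interface ReflectionGL {
  getProgramParameter(program: unknown, pname: number): unknown;
  getUniformLocation(program: unknown, name: string): unknown;
}

// Shape of the KHR_parallel_shader_compile extension object.
interface ParallelShaderCompileExt {
  readonly COMPLETION_STATUS_KHR: number;
}

// Hypothetical helper: returns null while the program is still compiling, so
// no reflection call is issued that could block on the compile. Once the
// program is complete, returns the requested uniform locations by name.
function tryGetUniformLocations(
  gl: ReflectionGL,
  ext: ParallelShaderCompileExt,
  program: unknown,
  names: string[],
): { [name: string]: unknown } | null {
  if (gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR) !== true) {
    return null;
  }
  const locations: { [name: string]: unknown } = {};
  names.forEach((name) => {
    locations[name] = gl.getUniformLocation(program, name);
  });
  return locations;
}
```

The same guard would apply to getAttribLocation and getUniformBlockIndex; a caller can retry on a later frame whenever null is returned.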

 

 

 

 

 

 

 

Wonderful!

 

 

 

 

 

 

 

 

WebGL community,