
RE: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved



Hello again. My app now shows that getProgramParameter completes quickly, but it spends a lot of time in useProgram. I check COMPLETION_STATUS_KHR on the GL program beforehand, and it returns true. I’m guessing there’s some additional expensive work that still needs to happen at useProgram time. Is this a bug? Are there any ways to debug or avoid this? I’m unsure whether there will be an easy repro.

 

Again, profiler screenshot attached.

 

From: owners-public_webgl@khronos.org <owners-public_webgl@khronos.org> On Behalf Of Takahiro Aoyagi (taoyagi@mozilla.com)
Sent: Friday, May 10, 2019 12:22 AM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

Confirmed: querying COMPLETION_STATUS_KHR is now very fast, about 0.01 ms in my test.

 

Nice work!

 

Thanks

 


 

On Wed, May 8, 2019 at 2:41 PM Ken Russell (kbr@google.com) <public_webgl@khronos.org> wrote:

Jie Chen from Intel just landed a fix in Chrome dramatically improving the performance of querying COMPLETION_STATUS_KHR under http://crbug.com/881152 . The fix will be in tomorrow's Canary on Windows and macOS.

 

This should address the last major gap with this extension. We would like to enable it by default by promoting it to community approved; any further comments?

 

-Ken

 

 

 

From: Ken Russell <kbr@google.com>
Date: Fri, Apr 26, 2019 at 3:40 PM
To: Public WebGL

Thanks for the report and the clear test case. There was an existing bug about this, so the new one was duplicated into it, and some hints about implementing it were added to the preexisting bug.

 

 

On Fri, Apr 26, 2019 at 5:56 AM Takahiro Aoyagi (taoyagi@mozilla.com) <public_webgl@khronos.org> wrote:

Hi Chen,

 

Opened a bug. Thanks.

 

 

Takahiro

 

 

On Fri, Apr 26, 2019 at 8:03 PM Chen, Jie A (jie.a.chen@intel.com) <public_webgl@khronos.org> wrote:

Hi Takahiro,

The current implementation of the COMPLETION_STATUS_KHR query in Chromium incurs an expensive round-trip to the GPU thread. This may contribute to the wait time in your case. I am trying to optimize this. Could you please file a bug against Chromium? I will look into it once I’ve done the optimization.

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Takahiro Aoyagi (taoyagi@mozilla.com)
Sent: Friday, April 26, 2019 6:20 PM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

Hi all,

 

I'm working on Three.js + KHR_parallel_shader_compile extension support with Fernando.

 

 

I'm testing with Canary, but the behavior seems odd to me. gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR) seems to wait for compile+link completion. I don't think it should wait for that; it should just poll and return true/false quickly.

 

I made a simple test to reproduce.

 

 

Code snippet.

 

const ext = gl.getExtension('KHR_parallel_shader_compile');

 

gl.compileShader(vertexShader);
gl.compileShader(fragmentShader);
gl.attachShader(program, vertexShader);
gl.attachShader(program, fragmentShader);
gl.linkProgram(program);

const startTime = performance.now();
const status = useExtension
    ? gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR)
    : gl.getProgramParameter(program, gl.LINK_STATUS);
const endTime = performance.now();

const elapsedTime = endTime - startTime;

 

On my platform, both gl.LINK_STATUS and ext.COMPLETION_STATUS_KHR take a long time, about 15 ms. I can accept this number for gl.LINK_STATUS because it waits for compile+link completion, but 15 ms for ext.COMPLETION_STATUS_KHR looks wrong to me. It shouldn't wait for compile+link completion; it should just poll and quickly return true/false. So I would expect the number to be 1 ms or less.
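
The non-blocking behaviour expected above can be sketched as a per-frame poll. This is only a minimal sketch, not the test case from this thread: `takeReadyPrograms` and the `pending` set are hypothetical names, and `ext` is assumed to be the object returned by gl.getExtension('KHR_parallel_shader_compile').

```javascript
// Hypothetical helper: call once per frame (for example from a
// requestAnimationFrame callback). Returns the programs whose compile+link
// has finished and removes them from the `pending` Set. Each query is
// expected to return immediately instead of waiting for the link.
function takeReadyPrograms(gl, ext, pending) {
  const ready = [];
  for (const program of pending) {
    if (gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR)) {
      ready.push(program);
      pending.delete(program); // deleting during Set iteration is safe
    }
  }
  return ready;
}
```

Programs that are not returned simply stay in `pending` and are checked again on the next frame.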

 

Let me know if I'm missing something. Or could it be a Canary bug?

 

My platform.

 

OS: Windows 10

Browser: Chrome Canary.

Browser boot option: --use-cmd-decoder=passthrough --enable-webgl-draft-extensions

 

 

Thanks

 

Takahiro

 

 

On Fri, Apr 26, 2019 at 10:15 AM Chen, Jie A (jie.a.chen@intel.com) <public_webgl@khronos.org> wrote:

Basically, ANGLE’s GL backend is simply a wrapper for the existing GL implementations. In this sense glCompileShader() is still implementation specific. Essentially, ANGLE just calls glCompileShader and checks GL_COMPILE_STATUS.

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Jasper St. Pierre (jstpierre@nvidia.com)
Sent: Friday, April 26, 2019 1:18 AM
To: Public WebGL <public_webgl@khronos.org>
Subject: RE: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

Thank you, that’s important information to know. Changing my shader compilation pipeline to wait for shader completion status is a lot more difficult, but based on some initial tests, it’s more expensive than not doing it – the latency of getShaderParameter is too high.

 

That said, in a lot of GL implementations, glCompileShader() is practically a no-op, and will only do an AST parse when glGetShaderParameter/InfoLog() is called (which is basically only for debugging / error checking). glLinkProgram() does full compilation of all programs, as full-PSO optimization is important. I don’t know how common this is, but multiple GL implementations I have seen do it. It might make sense for ANGLE to adopt this model as well.

 

From: owners-public_webgl@khronos.org <owners-public_webgl@khronos.org> On Behalf Of Chen, Jie A (jie.a.chen@intel.com)
Sent: Wednesday, April 24, 2019 6:48 PM
To: Public WebGL <public_webgl@khronos.org>
Subject: RE: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

Absolutely. Ideally there would be no stall for linkProgram. I tried, but it’s too difficult to achieve this in ANGLE.

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Jeff Gilbert (jgilbert@mozilla.com)
Sent: Thursday, April 25, 2019 9:33 AM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

Possibly, but in the long run, I expect implementations to support no-stall linkProgram invocations.

 

On Wed, Apr 24, 2019 at 6:27 PM Chen, Jie A (jie.a.chen@intel.com) <public_webgl@khronos.org> wrote:

linkProgram may stall on compileShader. To avoid the stall, make sure the COMPLETION_STATUS_KHR statuses of the attached shaders are all true before calling linkProgram. In this sense, the first workflow is less likely to run into a stall.
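
The advice above might look roughly like this in application code. It is a sketch under assumptions: `linkIfShadersCompiled` is a hypothetical helper name, `shaders` is an array of the shader objects already attached to `program`, and `ext` is the KHR_parallel_shader_compile extension object.

```javascript
// Hypothetical helper: issues linkProgram only once every attached shader
// reports COMPLETION_STATUS_KHR === true, so the link is less likely to
// stall on an unfinished compile.
// Returns true if linkProgram was issued, false if the caller should retry
// later (for example on the next frame).
function linkIfShadersCompiled(gl, ext, program, shaders) {
  const allDone = shaders.every(
    (shader) => gl.getShaderParameter(shader, ext.COMPLETION_STATUS_KHR)
  );
  if (allDone) {
    gl.linkProgram(program);
  }
  return allDone;
}
```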

 

From: owners-public_webgl@khronos.org [mailto:owners-public_webgl@khronos.org] On Behalf Of Fernando Serrano García (fserrano@mozilla.com)
Sent: Wednesday, April 24, 2019 9:38 PM
To: Public WebGL <public_webgl@khronos.org>
Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

Regarding the optimal proposed workflow (grouping shaders and programs separately) of:

for (const x of shaders)
  gl.compileShader(x);
for (const x of programs)
  gl.linkProgram(x);

 

versus something closer to what many engines, such as three.js, currently do:

for (const x of materials) {
  gl.compileShader(x.fs);
  gl.compileShader(x.vs);
  gl.linkProgram(x.program);
}

 

Do you think the following graph is more or less accurate for 4 materials, with this extension enabled and 2 or 4 parallel threads?

 

Attachment: Screen Shot 2019-04-24 at 15.35.47.png
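
For reference, the first (batched) workflow could be written out as follows. This is a rough sketch under assumptions: `compileThenLinkAll` is a hypothetical name, and each entry in `materials` is assumed to be an object holding `vs`, `fs` and `program` handles whose shader sources have already been set. No status is queried here, so nothing in this function forces a wait.

```javascript
// Hypothetical helper: issues every compile before any link, and never
// queries a status, so the driver is free to work in parallel.
function compileThenLinkAll(gl, materials) {
  // Pass 1: kick off all shader compiles.
  for (const m of materials) {
    gl.compileShader(m.vs);
    gl.compileShader(m.fs);
  }
  // Pass 2: attach and link, only after all compiles have been issued.
  for (const m of materials) {
    gl.attachShader(m.program, m.vs);
    gl.attachShader(m.program, m.fs);
    gl.linkProgram(m.program);
  }
}
```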

 

On Wed, Apr 24, 2019 at 2:40 PM Fernando Serrano García <fserrano@mozilla.com> wrote:

fwiw, three.js introduced a "debug" parameter (false by default) that determines whether the status check and all the get* calls are made (currently in dev):

 

So currently, calling `new WebGLShader()` won't block on any `get*` function, and it could be used as is to compile all of the shaders in a loop, with or without the extension.

The problem comes from WebGLProgram: instead of linking immediately, it should queue itself, wait until all the shader compile calls finish, then do all the links, and wait for them asynchronously before starting to use these programs.

I have opened an issue on the three.js repo to keep track of the implementation and we have booked some time this week to take a look at it, feedback is welcome :) https://github.com/mrdoob/three.js/issues/16321

 

On Tue, Apr 16, 2019 at 11:55 PM Ken Russell (kbr@google.com) <public_webgl@khronos.org> wrote:

On Tue, Apr 16, 2019 at 2:24 PM Tarek Sherif (tsherif@uber.com) <public_webgl@khronos.org> wrote:

Yes, I've read both specs. I think this language in the WebGL version is confusing: "When this extension is enabled: Shader compilation and program linking may be performed in a separate CPU thread." And neither indicates which functions are expected to cause stalls. As written, this extension, more than most, requires people to know implementation details.

 

All previous operations against the program - looking up uniform locations, drawing with it, etc. - will cause stalls. The application *must* be updated to query the program's COMPLETION_STATUS_KHR status, and skip *all* operations against the program until it's no longer pending.

 

I guess we'll just change the WebGL version of this extension spec to be much more explicit about this.

 

 

Essentially, all previous code which checked for the completion of compilation of shaders / linking of programs can be updated with a small wrapper which checks COMPLETION_STATUS_KHR for the linking of programs.

 

This is not true because the most common way this is done will cause stalls that sabotage the parallelism, e.g.:
https://github.com/mrdoob/three.js/blob/master/src/renderers/webgl/WebGLShader.js#L21-L36

 

The check for COMPLETION_STATUS_KHR has to wrap *all* queries and uses of the program object, and they must be skipped until COMPLETION_STATUS_KHR returns true. Yes, all applications will have to be updated to take advantage of this extension, but if they do, the benefits will be significant.
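
One possible shape for such a wrapper is sketched below. It is an illustration only: the `PendingProgram` class and its `use()` method are hypothetical names, not part of WebGL or of any library mentioned in this thread, and `ext` is assumed to be the KHR_parallel_shader_compile extension object.

```javascript
// Hypothetical wrapper: every use of the program goes through use(), which
// touches the program only after COMPLETION_STATUS_KHR has reported true.
class PendingProgram {
  constructor(gl, ext, program) {
    this.gl = gl;
    this.ext = ext;
    this.program = program;
    this.ready = false;
  }

  // Returns true and binds the program if it is ready. Otherwise returns
  // false so the caller can skip this draw and try again next frame.
  use() {
    const gl = this.gl;
    if (!this.ready) {
      this.ready = Boolean(
        gl.getProgramParameter(this.program, this.ext.COMPLETION_STATUS_KHR)
      );
      if (!this.ready) return false;
      // Linking has finished, so this query should no longer wait.
      if (!gl.getProgramParameter(this.program, gl.LINK_STATUS)) {
        throw new Error(gl.getProgramInfoLog(this.program) || 'link failed');
      }
    }
    gl.useProgram(this.program);
    return true;
  }
}
```

A render loop would then call `if (!handle.use()) continue;` (or draw with a fallback) for each object whose program is still pending.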

 

-Ken

 

 

Tarek Sherif

Senior Software Engineer, Visualization | Uber

 

 

 

On Tue, Apr 16, 2019 at 4:18 PM Ken Russell (kbr@google.com) <public_webgl@khronos.org> wrote:

Please review the Overview in the underlying OpenGL ES spec:

 

It describes that the extension uses one or more separate CPU threads for shader compilation. If you still think we should rewrite the introduction to the WebGL extension, we can do that too.

 

Essentially, all previous code which checked for the completion of compilation of shaders / linking of programs can be updated with a small wrapper which checks COMPLETION_STATUS_KHR for the linking of programs.

 

-Ken

 

 

 

On Tue, Apr 16, 2019 at 1:36 AM Tarek Sherif (tsherif@uber.com) <public_webgl@khronos.org> wrote:

This is all extremely interesting, and documentation of these best practices should probably be more publicly available (perhaps through a non-normative note in the spec). I'm thinking the language in the extension spec could be made clearer about what its actual purpose is. It currently gives the impression that enabling it is what allows for parallel compilation and that this should save time, but given what Jeff and Kai have said, it sounds like the purpose is to avoid CPU stalls, and that it might even take longer in wall time since you're only checking once a frame. Maybe some background information could be added along the lines of "Normally applications must wait on get*Parameter or a draw call for program linking to complete, this extension provides a mechanism for querying completion status without stalling..."

 

Tarek Sherif

Senior Software Engineer, Visualization | Uber

 

 

 

On Mon, Apr 15, 2019 at 11:23 PM Kai Ninomiya (kainino@google.com) <public_webgl@khronos.org> wrote:

 

 

On Mon, Apr 15, 2019 at 7:18 PM Tarek Sherif (tareksherif@pm.me) <public_webgl@khronos.org> wrote:

I don't see anything in the OpenGL extension spec about what functions cause stalls. Is that implementation-specific? Having an idea what to expect is important for two reasons:

1. To make it clear that drawing will still work if getProgramParameter(COMPLETION_STATUS_KHR) hasn't returned true yet (i.e. the stall on a draw call)

 

Generally, OpenGL extensions are not allowed to break existing API/existing content. (In OpenGL, all extensions are always enabled; there's no requestExtension.) Hence, even if you enable the extension, it won't change the way your application works, and therefore the draw call must still cause a stall in order to be able to use the program.

 

2. Almost all WebGL code I've looked at handles compilation in a way similar to the following:

    gl.compileShader(vertexShader);
    if (!gl.getShaderParameter(vertexShader, gl.COMPILE_STATUS)) //...

    gl.compileShader(fragmentShader);

    if (!gl.getShaderParameter(fragmentShader, gl.COMPILE_STATUS)) //...

    //...

    gl.linkProgram(program);
    if (!gl.getProgramParameter(program, gl.LINK_STATUS)) //...

I think it would be worth it to make clear that this pattern would sabotage the parallel compile.

 

I seem to remember (not sure if this is still true) that this pattern also sabotages the shader compiler cache in ANGLE, because ANGLE doesn't cache the COMPILE_STATUS (or was it the compile info string?). So it forces the compiler to recompile just to give you the status.
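
One way to avoid the per-shader status checks in the pattern above is to query only the program's link status, and to read the shader logs only when linking has failed. The sketch below assumes a hypothetical helper name, `getProgramError`, and an array `shaders` holding the program's shader objects; it is meant to be called after linkProgram, ideally once COMPLETION_STATUS_KHR has returned true.

```javascript
// Hypothetical helper: returns null on success, or a string with the
// available info logs on failure. Per-shader queries are made only in the
// failure path, so the common case costs a single LINK_STATUS query.
function getProgramError(gl, program, shaders) {
  if (gl.getProgramParameter(program, gl.LINK_STATUS)) {
    return null;
  }
  const logs = shaders.map((shader) => gl.getShaderInfoLog(shader));
  logs.push(gl.getProgramInfoLog(program));
  return logs.filter((log) => log).join('\n') || 'Program link failed (no log)';
}
```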

 

Tarek Sherif

 

 

 

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, April 15, 2019 9:00 PM, Ken Russell (kbr@google.com) <public_webgl@khronos.org> wrote:

 

The WebGL extension (which is basically just a wrapper) refers to the underlying OpenGL / OpenGL ES extension https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_parallel_shader_compile.txt, where this is documented. Is that sufficient? It would be better to avoid duplicating a large amount of spec text from the underlying extension. However, we can add some non-normative text if that would help.

 

-Ken

 

 

 

On Sat, Apr 13, 2019 at 3:51 AM Tarek Sherif (tareksherif@pm.me) <public_webgl@khronos.org> wrote:

IIRC, all 3 of those functions still force a wait for the {shader to finish compiling, program to finish linking} just as they did before. The new semantics are only visible via the new COMPLETION_STATUS_KHR check.

That should work perfectly for us. Can we add some language to the spec about which functions cause stalls? I feel like as written, it will be easy for people to use the extension incorrectly.

 

Tarek Sherif

 

 

 

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, April 12, 2019 4:10 PM, Kai Ninomiya (kainino@google.com) <public_webgl@khronos.org> wrote:

 

Oh yes, useProgram probably won't stall. But getUniformLocation and the others will stall, as will draw calls.

 

getProgramParameter(COMPLETION_STATUS_KHR) should get faster at some point; see http://crbug.com/881152 which Ken linked.

 

On Fri, Apr 12, 2019 at 12:39 PM Jasper St. Pierre (jstpierre@nvidia.com) <public_webgl@khronos.org> wrote:

Thank you! Trying this out in my own WebGL 2 app, I see quite a reduction in shader compilation time, shaving 400 ms off load time on a moderately complex scene. The getProgramParameter(COMPLETION_STATUS_KHR) query still takes 1 ms (which I believe is being tracked), but that is still better than the 4 ms I see in getUniformBlockIndex (my previous sync point).

 

AFAIK, useProgram should not cause a stall, but in order to set my texture samplers to bind to different units, I need to do a getUniformLocation on my sampler uniform, which causes a compilation stall. Same thing with getUniformBlockIndex. Do make sure that anything that might query into reflection about the program (e.g. getAttribLocation, getUniformLocation, getUniformBlockIndex) happens after COMPLETION_STATUS_KHR returns true, since that’s another way to accidentally stall on compilation completing.
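
Deferring reflection in this way could look something like the sketch below. It is not taken from any app in this thread: `getUniformLocationsWhenReady` is a hypothetical name, and `ext` is assumed to be the KHR_parallel_shader_compile extension object.

```javascript
// Hypothetical helper: returns a Map of uniform name -> location, or null
// while the program is still compiling/linking. No reflection query is
// issued until COMPLETION_STATUS_KHR is true.
function getUniformLocationsWhenReady(gl, ext, program) {
  if (!gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR)) {
    return null; // still pending: try again on a later frame
  }
  const locations = new Map();
  const count = gl.getProgramParameter(program, gl.ACTIVE_UNIFORMS);
  for (let i = 0; i < count; i++) {
    const info = gl.getActiveUniform(program, i);
    locations.set(info.name, gl.getUniformLocation(program, info.name));
  }
  return locations;
}
```

The caller would cache the returned Map once it is non-null and set sampler units from it.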

 

 

Sent: Friday, April 12, 2019 10:25 AM

To: Public WebGL <public_webgl@khronos.org>

Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

 

IIRC, all 3 of those functions still force a wait for the {shader to finish compiling, program to finish linking} just as they did before. The new semantics are only visible via the new COMPLETION_STATUS_KHR check.

 

On Fri, Apr 12, 2019 at 10:05 AM Tarek Sherif (tsherif@uber.com) <public_webgl@khronos.org> wrote:

Very excited about this extension, but I'm concerned about how it would work in our use case. Our flagship visualization library, deck.gl, often shares the gl context with a separate base map rendering application so that it can properly interleave visualization layers with the base map. Enabling this extension seems to change the behaviour of existing WebGL functions in a way that I don't believe other extensions do. My worry is how our enabling KHR_parallel_shader_compile would affect applications we share contexts with, i.e., what happens if the typical synchronous compilation logic is used after KHR_parallel_shader_compile is enabled? Specifically:

  • What happens if getShaderParameter(shader, COMPILE_STATUS) is called before compilation is complete?
  • What happens if getProgramParameter(program, LINK_STATUS) is called before linking is complete?
  • What happens if useProgram is called before linking is complete? 

Thanks,

 

 

Tarek Sherif

Senior Software Engineer, Visualization | Uber

 

 

 

 


 

 

On Fri, Apr 12, 2019 at 12:16 PM David Catuhe (David.Catuhe@microsoft.com) <public_webgl@khronos.org> wrote:

Wonderful!

 

 

Sent: Wednesday, April 10, 2019 2:56 PM

To: Public WebGL <public_webgl@khronos.org>

Subject: Re: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

 

If your code works correctly now when enabling draft WebGL extensions in Chrome, then no changes are needed. There won't be any changes to the extension when it's taken out of draft status; it'll just be available by default.

 

-Ken

 

 

On Wed, Apr 10, 2019 at 12:36 PM David Catuhe (David.Catuhe@microsoft.com) <public_webgl@khronos.org> wrote:

We do have support for the extension in Babylon.js. How can we test this change?


 

From: owners-public_webgl@khronos.org <owners-public_webgl@khronos.org> on behalf of Ken Russell (kbr@google.com) <public_webgl@khronos.org>
Sent: Wednesday, April 10, 2019 12:00:25 PM
To: public
Subject: [Public WebGL] Propose promoting KHR_parallel_shader_compile to community approved

 

WebGL community,

 

Jie Chen from Intel has principally developed and implemented a WebGL wrapper for the KHR_parallel_shader_compile extension. This allows shader compilation and program linking to occur fully in parallel to the rest of the WebGL application. Some small application changes are needed to take full advantage of the extension, but it is otherwise fully backward compatible. Jie has even emulated this extension on platforms which don't support it natively.
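
As an illustration of the "small application changes" and the backward compatibility mentioned above, a readiness check with a fallback might look like this. It is a sketch with a hypothetical helper name, `isProgramReady`; `ext` is whatever gl.getExtension('KHR_parallel_shader_compile') returned, which is null when the extension is unavailable.

```javascript
// Hypothetical helper: with the extension, polls COMPLETION_STATUS_KHR.
// Without it, reports "ready" immediately, so the caller proceeds to its
// existing LINK_STATUS check, which blocks exactly as it did before.
function isProgramReady(gl, ext, program) {
  if (!ext) return true;
  return Boolean(gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR));
}
```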

 

This extension solves a longstanding user complaint with WebGL, namely long shader compile times on Windows. Compiles and links occur not only asynchronously, but also in parallel, so there is a net speedup when using this extension. Users who have tried out the draft extension are pleased with the results and would like to see it made widely available.

 

I propose promoting this extension to community approved in https://github.com/KhronosGroup/WebGL/pull/2855 . Any comments?

 

-Ken

 

 

 

 


 

Attachment: chrome_up0AJZj96D.png
Description: chrome_up0AJZj96D.png