
Re: [Public WebGL] async shader compilation

On Tue, Feb 21, 2017 at 10:10 AM, Corentin Wallez <cwallez@google.com> wrote:
One thing that might work in Chrome with the current state of things would be to go through a list of shaders in a Worker and for each of them:
  • Compile it
  • glFinish (and destroy the shader)
  • send to the main event loop that this shader was warmed up
Then, in the main event loop, recompile the shader, expecting it to be already warm and instant to compile. However, on platforms where the browser cannot use program binaries, things would actually be slower.
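
A minimal sketch of the warm-up steps listed above, written against a small interface rather than a real context so it can be read in isolation. The names `WarmupGL`, `ShaderJob`, `warmUpShaders` and `notify` are hypothetical; only `createShader`, `shaderSource`, `compileShader`, `finish` and `deleteShader` are actual WebGL calls. Whether compiling in a Worker warms any cache at all depends on the browser, as noted above.

```typescript
// Minimal slice of the WebGL API used below (hypothetical interface name).
interface WarmupGL<S> {
  createShader(type: number): S | null;
  shaderSource(shader: S, source: string): void;
  compileShader(shader: S): void;
  finish(): void;
  deleteShader(shader: S): void;
}

// One shader to warm up; `id` is whatever the page uses to track it.
interface ShaderJob {
  id: string;
  type: number; // gl.VERTEX_SHADER or gl.FRAGMENT_SHADER
  source: string;
}

// Compile each shader, wait for the GL command stream to drain,
// destroy the shader, then report that it was warmed up.
function warmUpShaders<S>(
  gl: WarmupGL<S>,
  jobs: ShaderJob[],
  notify: (id: string) => void
): void {
  for (const job of jobs) {
    const shader = gl.createShader(job.type);
    if (shader === null) continue; // e.g. lost context; skip this job
    gl.shaderSource(shader, job.source);
    gl.compileShader(shader);
    gl.finish();
    gl.deleteShader(shader);
    notify(job.id);
  }
}
```

Inside a Worker, `notify` could simply forward the id with `postMessage`, and the main event loop would recompile the shader when the message arrives.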

On Tue, Feb 21, 2017 at 12:40 PM, Jukka Jylänki <jujjyl@gmail.com> wrote:
Shader compilation performance is something that comes up often when working with game developers who use Emscripten and asm.js to port their engines. Firefox has an about:config flag to enable or disable the use of ANGLE at runtime on Windows. Disabling ANGLE speeds up shader compilation by a factor of 3 to 10.

Unfortunately the cost of shader compilation is mostly in the HLSL compiler, outside of ANGLE, so there isn't much we can do to optimize it. Jamie recently made ANGLE compile the vertex and fragment shaders in parallel, improving performance significantly.

Thanks Corentin for confirming that most of ANGLE's shader translation time is spent in the HLSL compiler. I suspected this was the case but wasn't sure.

Doing async/background thread compilation of shaders would be cool, but it's not an ultimate solution, since it is only a latency-hiding technique which won't reduce power consumption on mobile and low-end desktops. Here are some other suggestions (not mutually exclusive):

1. Microsoft has recently open sourced their HLSL shader compiler codebase, which is available at https://github.com/Microsoft/DirectXShaderCompiler. One effect of the open sourcing is that, a few months back, I noticed that .pdb files for the shader compiler were being served when I profiled WebAssembly applications with geckoprofiler/CodeXL/VTune, which gives visibility into the hotspots in d3dcompiler_xx.dll. This compiler was not originally intended to be an online compiler, so it's uncertain how much of it has been optimized for speed. Perhaps the problem could be tackled directly at the source, and D3Dcompiler optimized to improve compilation times? I don't think this has ever been looked at from an online compiler perspective, so perhaps there is some low-hanging fruit. Any percentage gain here would be a direct win on top of whatever other techniques are used.

This compiler only works with as-yet-unreleased versions of Windows: from their README.md you can see "At the moment, the Windows 10 Insider Preview Build 15007 is able to run DXIL shaders."
2. Let's enable binary compiled shaders on the web by leveraging https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_get_program_binary.txt, which is in core OpenGL 4.1 and OpenGL ES 3.0. Allow one to pull out an opaque blob (nontransferable between PCs, invalidatable by the browser between page visits) of compiled shader programs and store it in IndexedDB. This way at least warm page visits will be fast. On cold page visits one might be able to rearchitect shaders to be compiled in parallel with downloading other page assets (textures, geometry, WebAssembly code) to hide most of the impact.

The browsers are already doing this under the hood on platforms where program binaries are available (or at least Chrome does). Doing program caching at the WebGL level wouldn't help as it would get invalidated at the same time the browser invalidates its own cache.
3. Let's do binary SPIR-V shaders on the web. Can SPIR-V binary shaders be standardized for WebGL? There exists an extension to consume SPIR-V in desktop OpenGL at https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_gl_spirv.txt. Can Khronos work to standardize this on OpenGL ES as well, so that it could then be carried over to WebGL? This would allow compiling all needed shaders fully offline.

Again, the vast majority of the time is spent in the HLSL shader compiler, so optimizing the ANGLE frontend with a binary format wouldn't help much.
4. Cache compiled shaders in browsers internally for warm page loads to be fast. Not as ideal as 2. or 3. but could work.

Chrome already does that.

More details: the vertex and fragment shaders' text, plus additional metadata, is used as the key into the cache, and the program binary is the value. Warm starts should be faster than cold starts. Here is the relevant code in case other browsers want to adapt it:

 - Look at program_manager.cc, program_cache.cc

 - shader_disk_cache.cc
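
As an illustration of the key/value scheme described above (shader text plus metadata as the key, program binary as the value), here is a small in-memory sketch. It is not Chrome's implementation: the class `ProgramBinaryCache`, the `makeKey` helper and the single `metadata` string are all assumptions made for the example, and a real cache would more likely hash the key and persist entries to disk.

```typescript
// Hypothetical in-memory cache: key = shader texts + metadata, value = binary.
class ProgramBinaryCache {
  private readonly entries = new Map<string, Uint8Array>();

  // Length-prefix each part so that different splits of the same
  // concatenated text cannot produce the same key.
  static makeKey(vertexText: string, fragmentText: string, metadata: string): string {
    return [vertexText, fragmentText, metadata]
      .map((part) => `${part.length}:${part}`)
      .join("|");
  }

  // Returns the cached binary, or undefined on a cache miss (cold start).
  get(vertexText: string, fragmentText: string, metadata: string): Uint8Array | undefined {
    return this.entries.get(ProgramBinaryCache.makeKey(vertexText, fragmentText, metadata));
  }

  // Stores the binary produced by a successful link.
  set(vertexText: string, fragmentText: string, metadata: string, binary: Uint8Array): void {
    this.entries.set(ProgramBinaryCache.makeKey(vertexText, fragmentText, metadata), binary);
  }
}
```

Putting something like a driver or browser version into `metadata` would make stale binaries miss automatically, which matches the point that such blobs are invalidated by the browser.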

Solving the cold start problem is difficult. Contributing to Microsoft's DXIL compiler is probably the best way to improve shader load time in the long term, since it'll likely always be necessary to go through that toolchain, just as it's necessary to go through the HLSL compiler currently in order to load shaders into Direct3D.

(Perhaps browsers should consider whitelisting some OpenGL drivers on Windows and running on top of them by default -- bypassing the Direct3D and HLSL translation altogether. Is startup of these large applications much faster in Chrome with the command line flag "--use-angle=gl" ?)

It would not be technically difficult to add entry points to WebGL like compileShaderAsync and linkProgramAsync, which would perform all of the compilation and link phases on another thread and provide new statuses like "PENDING" which could be queried. If you all think those would help, they could be prototyped in the form of an extension. However, it sounds like hiding the latency (and, by the way, increasing parallelism) may not be desirable.
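
To make the "PENDING" idea concrete, here is a rough sketch of how a page might poll such a status. Nothing here is an existing WebGL API: the `CompileStatus` values and the `pollStatus` helper are hypothetical, and the status query and the scheduler are passed in as plain functions precisely because the real entry points have not been designed.

```typescript
// Hypothetical status values for an async compile or link.
type CompileStatus = "PENDING" | "SUCCESS" | "FAILURE";

// Ask for the status now; if it is still pending, ask again on the next
// scheduled tick (for example the next animation frame). Calls onDone
// exactly once with true on success and false on failure.
function pollStatus(
  query: () => CompileStatus,
  schedule: (callback: () => void) => void,
  onDone: (ok: boolean) => void
): void {
  const check = (): void => {
    const status = query();
    if (status === "PENDING") {
      schedule(check);
    } else {
      onDone(status === "SUCCESS");
    }
  };
  check();
}
```

In a browser, `schedule` could be `(cb) => requestAnimationFrame(cb)`, so the main thread keeps rendering while the compile proceeds elsewhere.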

Max, would you be able to make a self-contained test case which just compiles all of the shaders from After the Flood back-to-back, measuring the time between starting compiling the first one and linking the last one?
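
One possible shape for such a test case, as a sketch only. `TimingGL`, `ProgramSource` and `timeCompileAndLink` are hypothetical names; the GL calls themselves are standard WebGL. The clock is injectable so the function can be exercised without a browser, and in a page one would likely pass `() => performance.now()`.

```typescript
// Minimal slice of the WebGL API used below (hypothetical interface name).
interface TimingGL<S, P> {
  readonly VERTEX_SHADER: number;
  readonly FRAGMENT_SHADER: number;
  readonly LINK_STATUS: number;
  createShader(type: number): S | null;
  shaderSource(shader: S, source: string): void;
  compileShader(shader: S): void;
  createProgram(): P | null;
  attachShader(program: P, shader: S): void;
  linkProgram(program: P): void;
  getProgramParameter(program: P, pname: number): unknown;
}

interface ProgramSource {
  vertex: string;
  fragment: string;
}

// Compiles and links every program back-to-back and returns the elapsed
// time between starting the first compile and finishing the last link.
function timeCompileAndLink<S, P>(
  gl: TimingGL<S, P>,
  programs: ProgramSource[],
  now: () => number = Date.now
): number {
  const compile = (type: number, source: string): S => {
    const shader = gl.createShader(type);
    if (shader === null) throw new Error("createShader failed");
    gl.shaderSource(shader, source);
    gl.compileShader(shader);
    return shader;
  };
  const start = now();
  for (const src of programs) {
    const program = gl.createProgram();
    if (program === null) throw new Error("createProgram failed");
    gl.attachShader(program, compile(gl.VERTEX_SHADER, src.vertex));
    gl.attachShader(program, compile(gl.FRAGMENT_SHADER, src.fragment));
    gl.linkProgram(program);
    // Reading LINK_STATUS makes the call wait for the link result,
    // so deferred work is included before the clock is read again.
    if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
      throw new Error("link failed");
    }
  }
  return now() - start;
}
```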


To me it feels that #1 and #2 could be the best near term prospects for WebGL.

2017-02-21 17:59 GMT+02:00 Maksims Mihejevs <max@playcanvas.com>:
Here is one of real-world examples that we've worked on in collaboration with Mozilla and their recent WebGL 2.0 launch.

After the Flood

We've been making WebGL content and high-end demos for many years now, and are aware of the many tricks and issues involved in creating such content. Most current WebGL developers do not have that experience, so they are left struggling the way we did, discovering the caveats along the way. We happily share our experience all the time and implement best practices in our engine.

Initially we had one large stall for shader compilation, and were forced to be smarter about it: compile only a limited number of pre-cached shader programs within a time budget, then skip to the next animation frame and continue compiling. Firefox shows an "unresponsive" warning when the tab freezes while compiling shaders synchronously within a single animation frame. And that is on GL platforms.
On ANGLE this is of course far worse, not to mention mobile.
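
The time-budget approach described above could look roughly like the following. This is a sketch under assumptions, not the engine's actual code: `runWithinFrameBudget` and its parameters are hypothetical, each task stands for "compile and link one program", and the frame scheduler and clock are injected so the logic is independent of the browser.

```typescript
// Runs tasks in order, spending at most about `budgetMs` per frame and
// resuming on the next frame until every task has run.
function runWithinFrameBudget(
  tasks: Array<() => void>,
  budgetMs: number,
  nextFrame: (callback: () => void) => void,
  now: () => number = Date.now
): void {
  let index = 0;
  const frame = (): void => {
    const start = now();
    // Always run at least one task per frame so progress is guaranteed,
    // even when a single task exceeds the budget.
    do {
      tasks[index++]();
    } while (index < tasks.length && now() - start < budgetMs);
    if (index < tasks.length) nextFrame(frame);
  };
  if (tasks.length > 0) frame();
}
```

In a page, `nextFrame` could be `(cb) => requestAnimationFrame(cb)`. The budget is only checked between tasks, so one very slow program can still overrun a frame, which is the stall that synchronous compilation cannot avoid.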

Consider a GTX 1080 on Windows, in Chrome/Firefox with ANGLE, on a very high quality fiber optic connection with servers within 10ms latency. The page downloads 19.1Mb of assets very quickly, and shader compilation takes even longer than that download, under nearly perfect conditions.

At the very least, async compilation would allow us to start shader compilation right before loading most of the assets, so that asset loading and shader compilation run in parallel. That could potentially halve the loading time in such a case.
But 19.1Mb is actually a lot for the initial download of a WebGL app, so in more common cases shader compilation will take 50-95% of the loading time.

And we are not talking about milliseconds here, but actual seconds.
We have profiled the complexity of our shaders and their variations very carefully. Only a few complex shaders are widely reused, but in general all shaders, even the simplest, contribute a lot to compilation time.

What is funny is that simply inlining and minimizing the string size by rewriting a shader by hand, preserving the same logic (so the compilation result would be the same), led to some performance improvements: in some tests compilation was up to 50% faster than for the same shader without inlining and minification.

Kind Regards,

On 21 February 2017 at 14:29, Florian Bösch <pyalot@gmail.com> wrote:
P.S. @vendors, please solve the problems we have right now (making WebGL usable without reservations for all use cases, including low-latency ones and complex ones), not the problem we wish we had, but haven't gotten to yet (WebGPU, WebNXT, WebGL 2.1, etc.)

On Tue, Feb 21, 2017 at 3:23 PM, Florian Bösch <pyalot@gmail.com> wrote:
On Tue, Feb 21, 2017 at 1:50 PM, Maksims Mihejevs <max@playcanvas.com> wrote:
Can't express how important solving this is for the whole WebGL platform.

I'm currently engaged with an architectural visualization startup and the rendering pipeline is of considerable complexity (though it's all up-front loaded). It generally works fine on GL backends (it might pause for maybe a few hundred milliseconds). But on the ANGLE backend, it completely freezes the tab for 15 seconds on boot. This is unacceptable. </tales from the real world>