
Re: [Public WebGL] async shader compilation



Also a note: The open sourced DirectX Shader Compiler is for D3D12 only (shader model 6).

On Tue, Feb 21, 2017 at 1:15 PM Florian Bösch <pyalot@gmail.com> wrote:
Just a note on the D3D compiler: it's my understanding that D3D won't accept compiled binary code unless it was signed by the D3D compiler, which is itself codesigned by Microsoft. So while in theory you could use the open source D3D compiler from Microsoft for "stuff", it won't help you, because you can't feed its output into D3D (which precludes ANGLE from doing so as well). As for SPIR-V, it's the same story: you'd still have to compile to HLSL, which would still have to go through the D3D compiler, because you can't generate D3D binary directly from SPIR-V without the signing done by Microsoft's codesigned D3D compiler.

On Tue, Feb 21, 2017 at 6:40 PM, Jukka Jylänki <jujjyl@gmail.com> wrote:
Shader compilation performance is something that comes up often when working with game developers who use Emscripten and asm.js to port their engines. Firefox has an about:config flag to enable or disable the use of ANGLE at runtime on Windows. Disabling ANGLE speeds up shader compilation by a factor of 3x-10x.

Doing async/background thread compilation of shaders would be cool, but it's not an ultimate solution, since it is only a latency-hiding technique that won't reduce power consumption on mobile and low-end desktop. Here are some other suggestions (not mutually exclusive):

1. Microsoft has recently open sourced their HLSL shader compiler codebase, which is available at https://github.com/Microsoft/DirectXShaderCompiler. One effect of the open sourcing is that a few months back I noticed I started getting .pdb files of the shader compiler served when doing geckoprofiler/CodeXL/VTune profiles of WebAssembly applications, which gives visibility into the hotspots in d3dcompiler_xx.dll. This compiler was not originally intended to be an online compiler, so it's uncertain how much of it has been optimized for speed. Perhaps the problem could be tackled directly at the source, and D3Dcompiler optimized to improve compilation times? I don't think this has ever been looked at from an online compiler perspective, so perhaps there is some low-hanging fruit. Any percentage gained here would be a direct win on top of whatever other techniques are used.

2. Let's enable binary compiled shaders on the web by leveraging https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_get_program_binary.txt, which is in core OpenGL 4.1 and OpenGL ES 3.0. Allow one to pull out an opaque blob (non-transferable between PCs, invalidatable by the browser between page visits) of compiled shader programs and store those in IndexedDB. This way at least warm page visits will be fast. On cold page visits one might be able to rearchitect shaders to be compiled in parallel with downloading other page assets (textures, geometry, WebAssembly code) to hide most of the impact.

3. Let's do binary SPIR-V shaders on the web. Can SPIR-V binary shaders be standardized for WebGL? There exists an extension to consume SPIR-V in desktop OpenGL at https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_gl_spirv.txt. Can Khronos work to standardize this on OpenGL ES as well, so that it could then be carried over to WebGL? This would allow compiling all needed shaders fully offline.

4. Cache compiled shaders in browsers internally for warm page loads to be fast. Not as ideal as 2. or 3. but could work.

To me it feels that #1 and #2 could be the best near term prospects for WebGL.
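
A minimal sketch of how the warm/cold path in suggestion 2 might look from the application side. WebGL exposes no program-binary API today, so everything here is hypothetical: the `BinaryBackend` interface, its `driverId` field, and `MemoryStore` (a stand-in for IndexedDB) are invented for illustration only.

```typescript
// Hypothetical backend: stands in for a program-binary API that WebGL
// does not currently provide.
interface BinaryBackend {
  driverId: string;                    // assumed to change with GPU/driver/browser
  compile(source: string): Uint8Array; // slow path: full compile, returns opaque blob
  load(binary: Uint8Array): boolean;   // fast path: returns false if the blob is rejected
}

// Stand-in for IndexedDB: a synchronous in-memory key/value store.
class MemoryStore {
  private items: { [key: string]: Uint8Array } = {};
  get(key: string): Uint8Array | undefined { return this.items[key]; }
  set(key: string, value: Uint8Array): void { this.items[key] = value; }
}

// Key the blob by driver identity plus a simple 32-bit rolling hash of the
// source. A real cache would want a stronger hash to avoid collisions.
function cacheKey(driverId: string, source: string): string {
  let h = 0;
  for (let i = 0; i < source.length; i++) {
    h = (h * 31 + source.charCodeAt(i)) >>> 0;
  }
  return driverId + ":" + h.toString(16);
}

// Try the cached blob first; fall back to a full compile when the blob is
// missing or the backend refuses it (for example after a driver update).
function getProgram(
  backend: BinaryBackend,
  store: MemoryStore,
  source: string,
): "warm" | "cold" {
  const key = cacheKey(backend.driverId, source);
  const cached = store.get(key);
  if (cached !== undefined && backend.load(cached)) {
    return "warm";
  }
  store.set(key, backend.compile(source));
  return "cold";
}
```

Including the driver identity in the key reflects the "nontransferable between PCs" constraint: a blob produced on one machine or driver version is simply never looked up on another.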

2017-02-21 17:59 GMT+02:00 Maksims Mihejevs <max@playcanvas.com>:
Here is one of the real-world examples that we've worked on in collaboration with Mozilla for their recent WebGL 2.0 launch.

After the Flood

We've been making WebGL content and high-end demos for many years now, and are aware of many tricks and issues that we have to deal with when creating such content. Most current WebGL developers don't have that experience, which leaves them struggling the way we did, discovering the caveats along the way. We happily share our experience all the time and implement best practices in our engine.

Initially we had one large stall for shader compilation, and were forced to think smarter there: at minimum, compile only a limited number of pre-cached shader programs within a time budget, then skip to the next animation frame and continue compiling. Firefox shows an "unresponsive" warning when the tab freezes while compiling shaders synchronously within a single animation frame. And that is on GL platforms.
On ANGLE this is of course way worse, not to mention mobile.
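
The frame-budget approach described above could be sketched roughly like this. It is not our engine's code: `compileWithinBudget`, `pumpQueue`, and the injected `now` and `schedule` parameters are hypothetical names chosen for illustration, and each queued task is assumed to wrap one synchronous compile-and-link call.

```typescript
// One queued unit of work, assumed to compile and link a single program.
type CompileTask = () => void;

// Run queued tasks until the per-frame time budget is used up.
// At least one task runs per call, so the queue always makes progress.
// Returns the number of tasks executed in this call.
function compileWithinBudget(
  queue: CompileTask[],
  budgetMs: number,
  now: () => number = () => Date.now(),
): number {
  const start = now();
  let compiled = 0;
  while (queue.length > 0) {
    const task = queue.shift()!;
    task();
    compiled++;
    if (now() - start >= budgetMs) {
      break; // budget spent: yield until the next frame
    }
  }
  return compiled;
}

// Drain the queue across frames. `schedule` is injected so the sketch does
// not depend on a browser; on a page it would wrap requestAnimationFrame.
function pumpQueue(
  queue: CompileTask[],
  budgetMs: number,
  schedule: (callback: () => void) => void,
  now?: () => number,
): void {
  compileWithinBudget(queue, budgetMs, now);
  if (queue.length > 0) {
    schedule(() => pumpQueue(queue, budgetMs, schedule, now));
  }
}
```

In a browser one would call something like `pumpQueue(queue, 8, cb => requestAnimationFrame(() => cb()))`. Note that a single slow compile can still blow through the budget, since a synchronous compile cannot be interrupted; this only spreads the stall over frames.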

So on a GTX 1080, Windows, Chrome/Firefox, ANGLE, with very high quality fiber optic internet and servers within 10ms latency, it downloads 19.1Mb of assets very quickly, and compilation takes even longer than downloading those 19.1Mb under nearly perfect conditions.

Async compilation would at least allow us to initiate shader compilation right before loading most of the assets, so that loading assets runs in parallel with compiling shaders. That could potentially halve the loading time in such a case.
But 19.1Mb is actually a lot for the initial download of a WebGL app, so in more common cases shader compilation will take 50-95% of the loading time.
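
To make the arithmetic behind "halve" concrete, here is a small sketch. It assumes download and compilation overlap perfectly with no contention, which is an idealization, and `loadTimes` is a hypothetical helper. The millisecond figures in the usage note are illustrative, not measurements.

```typescript
// Compare loading time when download and shader compilation run back to
// back versus fully overlapped (idealized: no contention between the two).
function loadTimes(downloadMs: number, compileMs: number) {
  const sequential = downloadMs + compileMs;
  const overlapped = Math.max(downloadMs, compileMs);
  // Fraction of the sequential loading time saved by overlapping.
  const saved = 1 - overlapped / sequential;
  return { sequential, overlapped, saved };
}
```

With equal halves, for example `loadTimes(5000, 5000)`, overlapping saves 50%. When compilation is 95% of the total, as in `loadTimes(500, 9500)`, overlapping saves only about 5%, because the compile time itself is untouched. That is why async compilation helps most in the download-heavy case and much less in the common one.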

And we are not talking milliseconds here, but actual seconds.
We have profiled the complexity of our shaders and their variations very carefully. There are only a few complex shader cases, and those are widely reused, but generally all shaders, even the simplest ones, contribute a lot to compilation times.

What is funny is that simply inlining and minimizing the string size by rewriting a shader by hand, preserving all the same logic (so the compilation result would be the same), did lead to some performance improvements: in some tests it was up to 50% faster than the same shader not inlined and not minified.
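
As a rough illustration of the "minimizing string size" half of that (not the inlining, and not what we did by hand), a naive source minifier might look like the sketch below. `minifyShaderSource` is a hypothetical helper; it assumes GLSL source without backslash line continuations and makes no claim about how much any given driver benefits.

```typescript
// Strip comments and redundant whitespace from GLSL source.
// Preprocessor directives (lines starting with '#') stay on their own
// lines, since they are terminated by a newline; all other code is joined
// with single spaces. Assumes no backslash line continuations.
function minifyShaderSource(source: string): string {
  // Remove /* ... */ comments first, since they may span several lines.
  const withoutBlocks = source.replace(/\/\*[\s\S]*?\*\//g, " ");
  const out: string[] = [];
  let pending: string[] = [];
  const flush = () => {
    if (pending.length > 0) {
      out.push(pending.join(" "));
      pending = [];
    }
  };
  for (const raw of withoutBlocks.split("\n")) {
    // Drop // comments, collapse runs of whitespace, trim the ends.
    const line = raw.replace(/\/\/.*$/, "").replace(/\s+/g, " ").trim();
    if (line === "") {
      continue;
    }
    if (line.charAt(0) === "#") {
      flush();        // code before the directive goes on its own line
      out.push(line); // the directive keeps its own line
    } else {
      pending.push(line);
    }
  }
  flush();
  return out.join("\n");
}
```

Regex-based comment stripping is only reasonable here because GLSL has no string literals that could contain `//` or `/*`.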


Kind Regards,
Max

On 21 February 2017 at 14:29, Florian Bösch <pyalot@gmail.com> wrote:
P.S. @vendors, please solve the problems we have right now (making WebGL usable without reservations for all use cases, including low latency ones and complex ones), not the problem we wish we had, but haven't gotten to yet (WebGPU, WebNXT, WebGL 2.1, etc.)

On Tue, Feb 21, 2017 at 3:23 PM, Florian Bösch <pyalot@gmail.com> wrote:
On Tue, Feb 21, 2017 at 1:50 PM, Maksims Mihejevs <max@playcanvas.com> wrote:
I can't express how important solving this is for the whole WebGL platform.

I'm currently engaged with an architectural visualization startup and the rendering pipeline is of considerable complexity (though it's all up-front loaded). It generally works fine on GL backends (it might pause for maybe a few hundred milliseconds). But on the ANGLE backend, it completely freezes the tab for 15 seconds on boot. This is unacceptable. </tales from the real world>