Re: [Public WebGL] Support precompiled shaders as extensions



D3D shaders are much more flexible for programmers than GL shader programs. The GL approach promotes shader permutation explosion, whereas D3D9/10/11 has a design that avoids much of it: there are no monolithic shader programs as in GL (though D3D12 is another story). In this respect, D3D works more like EXT_separate_shader_objects does in GL; that extension attempts to remove some of the permutation explosion overhead.
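To make the permutation argument concrete, here is a rough counting sketch. It assumes the worst case where every vertex variant is paired with every fragment variant; the function and variant names are made up for illustration and are not part of any GL or D3D API.

```typescript
// Hypothetical illustration: count the units of compile/link work needed
// when every vertex variant is used with every fragment variant.
function countShaderWork(
  vertexVariants: string[],
  fragmentVariants: string[],
): { monolithicPrograms: number; separableStages: number } {
  // Monolithic GL programs: one linked program per (vertex, fragment) pair.
  const programs = new Set<string>();
  for (const v of vertexVariants) {
    for (const f of fragmentVariants) {
      programs.add(`${v}+${f}`);
    }
  }

  // Separable stages: each stage is compiled once and then mixed freely.
  const stages = new Set<string>([
    ...vertexVariants.map((v) => `vs:${v}`),
    ...fragmentVariants.map((f) => `fs:${f}`),
  ]);

  return { monolithicPrograms: programs.size, separableStages: stages.size };
}

// Three vertex variants and four fragment variants (names are invented).
const work = countShaderWork(
  ["static", "skinned", "instanced"],
  ["unlit", "lit", "lit_shadowed", "lit_fog"],
);
```

With these inputs the monolithic count grows as the product of the two lists, while the separable count grows as their sum.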

That aside, GL shader compilation is currently one of the biggest performance problems in applications compiled from both Unreal Engine and Unity3D. This problem has been observed time and again for a couple of years already.

Unity3D generally likes to compile most shaders at startup, so for Unity3D, slow shader compilation blows up startup times (some Unity pages load 10-20 seconds slower on Windows than on OS X, due to slow ANGLE + HLSL compiler behavior).
Unreal Engine generally likes to compile shaders on demand, which causes bad per-frame stuttering whenever it does so.
("Generally likes" because both engines do some of each, but they are biased a bit differently.)

You can visit this Unreal Engine 4 demo page to see the effect in action:

   https://s3.amazonaws.com/mozilla-games/tmp/2016-05-05-PlatformerGame-profiling/PlatformerGame-HTML5-Shipping.html?playback&cpuprofiler&webglprofiler&expandhelp&tracegl=50&novsync

Open the web console while the page is running. It prints messages like this:

Trace: at t=96209.1, section "Cold GL" called via "_glLinkProgram" <- "__ZL11LinkProgramRK33FOpenGLLinkedProgramConfiguration" <- "__ZN17FOpenGLDynamicRHI25RHICreateBoundShaderStateEP21FRHIVertexDeclarationP16FRHIVertexShaderP14FRHIHullShaderP16FRHIDomainShaderP15FRHIPixelShaderP18FRHIGeometryShader" took 336.25 msecs!

and there's a light blue spike on the page timeline when slow GL behavior occurs.
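If you want to tally these prints rather than read them one by one, something like the following could work. It is only a sketch: the regular expression is inferred from the single trace line quoted above, the field names are my own, and the real format may vary.

```typescript
// Hypothetical parser for the console trace lines shown above.
// The pattern is inferred from one sample line and may not cover all cases.
interface TraceEntry {
  t: number;          // the "t=" value from the line
  section: string;    // e.g. "Cold GL"
  entryPoint: string; // first function in the "called via" chain
  msecs: number;      // the reported duration
}

const TRACE_PATTERN =
  /^Trace: at t=([\d.]+), section "([^"]*)" called via "([^"]*)".* took ([\d.]+) msecs!$/;

function parseTraceLine(line: string): TraceEntry | null {
  const m = TRACE_PATTERN.exec(line.trim());
  if (m === null) return null; // not a trace line
  return {
    t: Number(m[1]),
    section: m[2],
    entryPoint: m[3],
    msecs: Number(m[4]),
  };
}

// Sum the reported durations per entry point, skipping unrelated lines.
function totalMsecsByEntryPoint(lines: string[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const line of lines) {
    const entry = parseTraceLine(line);
    if (entry === null) continue;
    totals.set(entry.entryPoint, (totals.get(entry.entryPoint) ?? 0) + entry.msecs);
  }
  return totals;
}

// The trace line from above, with the call chain shortened for readability.
const exampleLine =
  'Trace: at t=96209.1, section "Cold GL" called via "_glLinkProgram" <- ' +
  '"__ZL11LinkProgramRK33FOpenGLLinkedProgramConfiguration" took 336.25 msecs!';
const exampleEntry = parseTraceLine(exampleLine);
```

Pasting the console output into an array of lines and calling totalMsecsByEntryPoint gives a quick view of which GL entry points account for most of the stall time.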

My proposal for solving this, at least to some degree, would be an extension in which
   a) shaders can be compiled to binary formats
   b) the binary compiled shaders are opaque blobs whose byte data cannot be accessed
   c) the binary compiled shaders can be persisted to IndexedDB
   d) the binary compiled shaders are not guaranteed to remain functional forever in IndexedDB, but are allowed to expire/go stale across browser restarts/page visits (to let browsers invalidate precompiled shaders if needed across browser updates). The shader object has a synchronous API/member function for asking whether it has been invalidated.

This would allow GL pages to implement their own shader compilation caches effectively.
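As a rough sketch of how a page-side cache might use points a) through d): every interface below is hypothetical, since no such extension exists, and a plain Map stands in for IndexedDB so the example stays self-contained.

```typescript
// Hypothetical sketch only: none of these interfaces exist in WebGL today.
interface CompiledShaderBlob {
  // Opaque handle: no byte data is exposed (point b).
  // Synchronous staleness query (point d).
  isInvalidated(): boolean;
}

// Stand-in for the proposed extension; a real one would call into the driver.
class FakeShaderBinaryExtension {
  private generation = 0;
  compileCount = 0;

  // Point a: compile source to an opaque binary blob.
  compile(source: string): CompiledShaderBlob {
    this.compileCount++;
    const compiledAt = this.generation;
    return { isInvalidated: () => compiledAt !== this.generation };
  }

  // Simulates a browser update that makes all older blobs stale.
  simulateBrowserUpdate(): void {
    this.generation++;
  }
}

// Page-level cache keyed by shader source; the Map stands in for IndexedDB (point c).
class ShaderCache {
  private store = new Map<string, CompiledShaderBlob>();
  private ext: FakeShaderBinaryExtension;

  constructor(ext: FakeShaderBinaryExtension) {
    this.ext = ext;
  }

  get(source: string): CompiledShaderBlob {
    const cached = this.store.get(source);
    if (cached !== undefined && !cached.isInvalidated()) {
      return cached; // fast path: reuse the persisted binary
    }
    // Slow path: first use, or the browser invalidated the old blob.
    const fresh = this.ext.compile(source);
    this.store.set(source, fresh);
    return fresh;
  }
}
```

The point of the synchronous isInvalidated() check is that the page can decide up front whether it will hit the slow path, for example to show a loading screen instead of stuttering mid-frame.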

Perhaps this could be done transparently at the browser level, if browsers were able to prime their shader caches better so that it works consistently across platforms. However, it seems that none of the current browsers do that quite perfectly. NVIDIA and recent AMD drivers have a built-in shader cache, which sometimes helps, but not everyone uses NVIDIA or AMD. I suppose I'd still prefer the extension, because that would give an explicit performance contract.

Naturally, if it were possible to simply make GL shader compilation faster, that would be great too, and it would help cold WebGL startup times. I'm not sure how tight that code currently is in ANGLE, though. The test page above prints good timings for how long these calls usually take: 300-400 msecs is not uncommon, and this is on a beefy overclocked 3.9GHz 8-core (16-thread) Intel Core i7-5960X with a GTX 980 Ti on Windows 10.


2016-11-15 10:57 GMT+02:00 Florian Bösch <pyalot@gmail.com>:
A big part of the problem of slow compile times in GL stems from the limited ways in which shaders can be composed:
  1. Composing a program from several shader objects attached per stage -> often triggers slow compile times at program linking time, unsupported by D3D (?)
  2. Composing a program by mixing stages -> Does not address intra-stage configuration, often triggers slow compile times at program linking time, unsupported by D3D (?)
  3. EXT_separate_shader_objects -> Does not address intra-stage configuration, unsupported by D3D (?)
  4. ARB_shader_subroutine -> Not supported by D3D (?) and ES
In order to cut down on compile times, we should have a way to compile fragments of shader programs and then quickly (re)configure them at run-time into a working program. That would mean compiling far less shader code, because it avoids duplicate compilation. This capability does not presently exist. The closest we get to it is shader subroutines, but even that isn't quite there.