Re: [Public WebGL] async shader compilation

Max, would you be able to make a self-contained test case which just compiles all of the shaders from After the Flood back-to-back, measuring the time between starting to compile the first one and linking the last one?

Hi Kenneth.

I've made an isolated example of the shader compilation.

http://moka.co/shader-compilation/ is the online version; the archive attached to this email lets you run it on a local machine.
Please run it and post your results.
This example uses the shaders from the After the Flood WebGL 2.0 demo (https://playcanv.as/p/44MRmJRU/), which is representative of what real-world next-gen apps look like.
As a reminder, we optimized this app heavily before release and halved shader compilation times by removing a lot of unnecessary shader code.

The app is very simple: it gathers some information about the device, creates a WebGL 2.0 context, and loads a raw text file containing many shaders. It then compiles each shader, creates a program, links it, and queries some of its attributes and uniforms.
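
A minimal TypeScript sketch of that loop, assuming each entry is a vertex/fragment source pair. The `MinimalGL`, `ShaderPair` and `timeShaderPairs` names are hypothetical, not taken from the test page. It times the compile and link phases separately, which is the split discussed further down; because some backends defer part of the compile work until link time, treat that split as approximate.

```typescript
// Minimal structural subset of WebGL2RenderingContext used by this sketch.
// A real WebGL 2.0 context satisfies it; it is declared locally only so the
// example type-checks without DOM typings.
interface MinimalGL<S, P> {
  readonly VERTEX_SHADER: number;
  readonly FRAGMENT_SHADER: number;
  readonly COMPILE_STATUS: number;
  readonly LINK_STATUS: number;
  readonly ACTIVE_ATTRIBUTES: number;
  readonly ACTIVE_UNIFORMS: number;
  createShader(type: number): S | null;
  shaderSource(shader: S, source: string): void;
  compileShader(shader: S): void;
  getShaderParameter(shader: S, pname: number): any;
  getShaderInfoLog(shader: S): string | null;
  createProgram(): P | null;
  attachShader(program: P, shader: S): void;
  linkProgram(program: P): void;
  getProgramParameter(program: P, pname: number): any;
  getProgramInfoLog(program: P): string | null;
  getActiveAttrib(program: P, index: number): { name: string } | null;
  getActiveUniform(program: P, index: number): { name: string } | null;
}

interface ShaderPair { vertex: string; fragment: string; }
interface Timings { compileMs: number; linkMs: number; totalMs: number; programs: number; }

// Prefer the high-resolution clock when the environment provides one.
function now(): number {
  const perf = (globalThis as any).performance;
  return perf && typeof perf.now === "function" ? perf.now() : Date.now();
}

function compile<S, P>(gl: MinimalGL<S, P>, type: number, source: string): S {
  const shader = gl.createShader(type);
  if (shader === null) throw new Error("createShader failed");
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  // Querying COMPILE_STATUS forces the implementation to report a result
  // here; some backends still defer part of the work until link time.
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error("compile failed: " + gl.getShaderInfoLog(shader));
  }
  return shader;
}

// Compiles, links and introspects every pair back-to-back,
// accumulating compile time and link time separately.
function timeShaderPairs<S, P>(gl: MinimalGL<S, P>, pairs: ShaderPair[]): Timings {
  let compileMs = 0;
  let linkMs = 0;
  const start = now();
  for (const pair of pairs) {
    const t0 = now();
    const vs = compile(gl, gl.VERTEX_SHADER, pair.vertex);
    const fs = compile(gl, gl.FRAGMENT_SHADER, pair.fragment);
    const t1 = now();
    const program = gl.createProgram();
    if (program === null) throw new Error("createProgram failed");
    gl.attachShader(program, vs);
    gl.attachShader(program, fs);
    gl.linkProgram(program);
    if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
      throw new Error("link failed: " + gl.getProgramInfoLog(program));
    }
    // Enumerate active attributes and uniforms, as the test app does.
    const attribCount: number = gl.getProgramParameter(program, gl.ACTIVE_ATTRIBUTES);
    for (let i = 0; i < attribCount; i++) gl.getActiveAttrib(program, i);
    const uniformCount: number = gl.getProgramParameter(program, gl.ACTIVE_UNIFORMS);
    for (let i = 0; i < uniformCount; i++) gl.getActiveUniform(program, i);
    const t2 = now();
    compileMs += t1 - t0;
    linkMs += t2 - t1;
  }
  return { compileMs, linkMs, totalMs: now() - start, programs: pairs.length };
}
```

In a browser you would pass the result of `canvas.getContext("webgl2")` (after checking it is not null) as `gl`.
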
It also prevents any caching (by either the driver or the browser), because I noticed caching was happening and it made testing harder. It does this by adding a random float constant to each shader.
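
One way to add such a constant, sketched in TypeScript under the assumption that shaders are plain GLSL ES source strings. `addCacheBuster` and `cacheBusterConstant` are hypothetical names, not the test page's code. Because the text of every shader differs, a cache keyed on the source cannot return a hit.

```typescript
// Hypothetical helper: inserts an unused constant whose value differs per
// call, so any shader cache keyed on the source text sees a new shader.
function addCacheBuster(source: string, seed: number = Math.random()): string {
  const lines = source.split("\n");
  // GLSL ES requires #version before anything except comments and
  // whitespace, and #extension directives must precede ordinary code,
  // so skip leading blank, line-comment, #version and #extension lines.
  let insertAt = 0;
  while (insertAt < lines.length) {
    const trimmed = lines[insertAt].trim();
    if (
      trimmed === "" ||
      trimmed.startsWith("//") ||
      trimmed.startsWith("#version") ||
      trimmed.startsWith("#extension")
    ) {
      insertAt++;
    } else {
      break;
    }
  }
  // An explicit precision qualifier is required: fragment shaders have no
  // default float precision until a `precision` statement appears.
  const decl = `const mediump float cacheBusterConstant = ${seed.toFixed(8)};`;
  lines.splice(insertAt, 0, decl);
  return lines.join("\n");
}
```

Block comments (`/* ... */`) ahead of `#version` are not handled; a more robust version would need a small tokenizer.
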
The box at the bottom is animated with CSS and turns orange just before compilation starts. While it is orange and frozen, the page is unresponsive. Often much more than the page becomes unresponsive, for example other tabs that share the same JS thread, or any other WebGL content in the browser.

I tested this on a few platforms; here are the results:

Windows; GTX 880M; ANGLE (DX11); i7-4810MQ; Chrome 58; ~5,300 ms
Windows; GTX 1070; ANGLE (DX11); i7-6820HK; Chrome 56; ~7,200 ms
Windows; GTX 980; ANGLE (DX11); i7-4790; Chrome 56; ~6,300 ms
Windows; GTX 980 Ti; ANGLE (DX11); i7-3770K; Chrome 56; ~5,100 ms
Windows; GTX 970; ANGLE (DX11); i7-3770; Firefox 52; ~7,600 ms
Windows; GTX 970; ANGLE (DX11); i7-3770; Chrome 58; ~5,500 ms
Windows; GTX 970; (GL); i7-3770; Chrome 58; ~3,200 ms
Windows; Intel HD 4000; ANGLE (DX11); Chrome 56; ~11,000 ms
Linux; GTX 970; i7-6700; Chrome 56; ~1,900 ms
Linux; GTX 670; i7-6700; Firefox 51; ~2,100 ms
Linux; GTX 980 Ti; i5-4690K; Chrome 56; ~2,200 ms
MacBook 13" late 2013; Intel Iris; i7; ~900 ms
iMac 27" late 2012; GTX 660M; i5; Chrome 56; ~900 ms
iMac 27" late 2012; GTX 660M; i5; Firefox 51; ~930 ms
Android; OnePlus 2; Adreno 430; Firefox 52; ~9,600 ms
Android; OnePlus 2; Adreno 430; Chrome 58; ~11,600 ms
Android; OnePlus 3; Adreno 530; Chrome 56; ~7,500 ms

Interestingly, on Android compilation takes a bit longer than linking, whereas on desktop Linux and Windows compilation is 3-10 times faster than linking. On the old MacBook and iMac, on the other hand, linking is much faster than compilation, even though these machines have poor fill-rate capabilities. Is Apple doing GLSL compilation better?
I mean: a GTX 1070 with an i7-6820HK CPU taking about as long as a OnePlus 3 on Android (~7,200 ms vs ~7,500 ms)? Really?

The tests clearly show that compilation and linking times vary wildly and inconsistently across platforms.
The ANGLE path clearly loses dramatically to the GL path.
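
The only like-for-like pair in the list is the Windows GTX 970 / i7-3770 machine on Chrome 58, where only the backend changes. A quick TypeScript check of the ratio, using the approximate figures reported above (`slowdown` is just an illustrative helper):

```typescript
// Approximate figures from the results list: Windows, GTX 970, i7-3770,
// Chrome 58; only the WebGL backend differs between the two runs.
const angleMs = 5500; // ANGLE (DX11)
const glMs = 3200;    // GL

// How many times longer the slower run takes than the faster one.
function slowdown(slowMs: number, fastMs: number): number {
  return slowMs / fastMs;
}

const ratio = slowdown(angleMs, glMs);
console.log(`ANGLE takes ${ratio.toFixed(2)}x as long (+${angleMs - glMs} ms)`);
```

This prints `ANGLE takes 1.72x as long (+2300 ms)`. The Firefox 52 ANGLE run on the same hardware (~7,600 ms) is left out because the browser differs as well.
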

Do remember that this is not the most complex WebGL app, and that these times are how long the whole thing freezes the page, and in some cases the whole device.

Looking forward to hearing back from you guys.

Kind Regards,

On 1 March 2017 at 10:58, Maksims Mihejevs <max@playcanvas.com> wrote:
This was abandoned in favor of a set of smaller shaders, chosen and loaded according to the fixed-function state in use.

This is pretty much what we do, as do many other modern ubershader systems: they generate shader code based on the provided arguments.
For example, if only diffuseMap is used, the shader will be very simple. If diffuseMap and PBR are used, it gets a bit more complicated, and so on. Some things are driven by uniforms where possible, for example colour intensity or opacity. Others, such as whether a certain texture is present on the material, lead to shader regeneration. So shaders include only what is used by the material and by the specific mesh being rendered with it.
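
A minimal TypeScript sketch of that idea, assuming a material with just two code-changing options and one uniform-driven value. `MaterialOptions`, `variantKey`, `generateFragmentSource` and `ShaderVariantCache` are hypothetical names, not PlayCanvas's actual API. Only options that change the generated code go into the cache key, so changing opacity reuses the existing variant, while toggling the diffuse map produces a new one:

```typescript
// Hypothetical material description; field names are illustrative only.
interface MaterialOptions {
  diffuseMap: boolean; // changes generated code
  pbr: boolean;        // changes generated code
  opacity: number;     // uniform-driven: no regeneration needed
}

// Only options that change the generated code are part of the key.
function variantKey(m: MaterialOptions): string {
  return `diffuse=${m.diffuseMap};pbr=${m.pbr}`;
}

// Emits only the chunks this material actually needs.
function generateFragmentSource(m: MaterialOptions): string {
  const src: string[] = [
    "#version 300 es",
    "precision highp float;",
    "in vec2 vUv;",
    "uniform float uOpacity;",
    "out vec4 fragColor;",
  ];
  if (m.diffuseMap) src.push("uniform sampler2D uDiffuseMap;");
  src.push("void main() {");
  src.push(m.diffuseMap ? "  vec3 albedo = texture(uDiffuseMap, vUv).rgb;" : "  vec3 albedo = vec3(1.0);");
  if (m.pbr) src.push("  // PBR lighting chunk would be appended here");
  src.push("  fragColor = vec4(albedo, uOpacity);");
  src.push("}");
  return src.join("\n");
}

// Generates each variant once and reuses it for every material with the same key.
class ShaderVariantCache {
  private readonly sources = new Map<string, string>();
  generatedCount = 0;

  get(m: MaterialOptions): string {
    const key = variantKey(m);
    let source = this.sources.get(key);
    if (source === undefined) {
      source = generateFragmentSource(m);
      this.sources.set(key, source);
      this.generatedCount++;
    }
    return source;
  }
}
```

A real engine would more likely cache the compiled and linked program under the same key rather than only the source, since compilation is the expensive part.
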

An ubershader is pretty much the only way to get decent results with a forward renderer.

We do need async shader compilation, but what we need even more is faster shader compilation in the first place.


On 1 March 2017 at 10:04, Mark Callow <khronos@callow.im> wrote:

On Feb 28, 2017, at 1:21, Maksims Mihejevs <max@playcanvas.com> wrote:

 (if they are ubershaders)

Throughout the time I’ve been working with OpenGL ES 2+, ubershaders have generally been regarded as something to avoid. At the dawn of OpenGL ES 2 the concern was whether such shaders could be compiled and would fit in the available memory of the devices of those days. There was also concern about introducing lots of extra tests and branches for every vertex or fragment. The first example of an “ubershader” was a shader to mimic the OpenGL ES 1 fixed-function pipeline. This was abandoned in favor of a set of smaller shaders, chosen and loaded according to the fixed-function state in use. As far as I know this was the model adopted by all IHVs who provided OpenGL ES 1 support on their OpenGL ES 2 parts.

To avoid doubt, let me state that I am not trying to downplay the importance of async shader compilation.



Attachment: shader-cache.zip
Description: Zip archive