Here is one of real-world examples that we've worked on in collaboration with Mozilla and their recent WebGL 2.0 launch.
After the Flood
We've been making WebGL content and high-end demos for many years now, and aware of many tricks and issues that we have to engage with during creation of such content. Such experience is not available to most of current WebGL developers, so leaving them struggling the way we had finding out caveats on the way. We happily share our experience all the time and implement best practices into our engine.
Initially we had one large stall, for shader compilation, and were enforced to think smarter there, to at least compile only number of pre-cached shader programs within a buffer of time then skip to next animation frame continuing compilation. Firefox gets "unresponsive" warning after tab is frozen while compiling shaders in sync manner within single animation frame. And that is on GL platforms.
On ANGLE this of course is way worse, not even mentioning mobile.
So on GTX 1080, Windows, Chrome/Firefox, ANGLE. With fiber optic internet (very high quality), with servers within 10ms latency, it downloads 19.1Mb of assets (very quickly), and compilation takes even longer than downloading 19.1Mb with all nearly perfect conditions.
At least async compilation in this case would allow us to initiate shader compilation right before loading most of assets, allowing to parallelize loading assets with compiling shaders. Potentially could half the loading times for such case.
But 19.1Mb is actually a lot for initial download for WebGL app, so in more common cases shader compilation will take 50-95% of loading time.
And we are talking not milliseconds here, but actually seconds.
We have profiled complexity of our shaders and their variations very carefully, there is only few complex shader cases widely reused, but generally all shaders even most simple contribute a lot to compilation times.
What is funny, is that simply inlining and minimizing string size by rewriting shader by hand preserving all same logic (so compilation result would be same), did lead to some performance improvements, in some tests we made up to 50% faster, than same shader but not inlined and not minified.