D3D shaders are much more flexible for programmers than GL shader programs. The GL approach is what promotes shader permutation explosion; D3D9/10/11 has a better design that avoids much of that (there are no monolithic shader programs like in GL, though D3D12 is another story). In this respect, D3D works more like EXT_separate_shader_objects does in GL; that extension attempts to remove some of the permutation explosion overhead.
That aside, GL shader compilation is currently one of the biggest performance problems in both Unreal Engine and Unity3D applications. This problem has been observed time and time again over the past couple of years.
Unity3D generally likes to compile most shaders at startup, so for Unity3D, slow shader compilation blows up startup times (some Unity pages load 10-20 seconds slower on Windows compared to OS X due to slow ANGLE + HLSL compiler behavior).
Unreal Engine generally likes to compile shaders on demand, which causes bad per-frame stuttering when it does.
(I say "generally likes" because both engines do a bit of both, but they bias differently.)
In the test page's timeline, a light blue spike appears whenever the slow GL behavior occurs.
My proposal for solving this at least to some degree would be to have an extension which
a) allows compiling shaders to binary formats
b) the binary compiled shaders become opaque blobs that one can't access byte data of
c) the binary compiled shaders can be persisted to IndexedDB
d) the binary compiled shaders are not guaranteed to remain functional forever in IndexedDB; they are allowed to expire/go stale across browser restarts/page visits (so browsers can invalidate precompiled shaders across browser updates if needed). There should be a synchronous API/member function on the shader object for asking whether it has been invalidated.
This would allow GL pages to implement their own shader compilation caches effectively.
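To make points a)-d) concrete, a page-level shader cache under such an extension might look roughly like the sketch below. All extension method names (`compileToBinary`, `loadBinary`, `isValid`) are hypothetical placeholders, not an existing API, and the IndexedDB plumbing is abstracted into a generic async key-value `store`:

```javascript
// Sketch of a page-side shader cache on top of the proposed extension.
// `ext` stands for the hypothetical extension object; `store` is any
// async key-value store (e.g. a thin wrapper over IndexedDB).
async function getCompiledShader(gl, ext, store, key, source, type) {
  // Try the persisted opaque blob first (point c).
  const blob = await store.get(key);
  if (blob) {
    const cached = ext.loadBinary(blob, type);
    // Point d): the blob may have gone stale across browser/driver
    // updates, so ask synchronously whether it is still valid.
    if (ext.isValid(cached)) return cached;
  }
  // Cache miss or stale entry: compile from source as usual...
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  // ...then persist the compiled result as an opaque blob (points a, b);
  // the page never sees the byte data, only a handle it can store.
  const binary = ext.compileToBinary(shader);
  await store.put(key, binary);
  return shader;
}
```

On a warm visit the expensive `compileShader` path is skipped entirely, which is the whole point: the cost moves from every page load to only the first load (and to whenever the browser invalidates the cache).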
Perhaps this could be done transparently at the browser level, if browsers were able to prime their shader caches well enough to work consistently across platforms. However, it seems that none of the current browsers do that quite perfectly. NVidia and recent AMD drivers have a shader cache built in, which sometimes helps, but not everyone uses NVidia or AMD. I'd still prefer the extension, because it would give an explicit performance contract.
Although naturally, if it were possible to just make GL shader compilation faster, that would be great too, and would help cold WebGL startup times. I'm not sure how tight that code currently is in ANGLE. The test page above prints timings for how long these compiles usually take: 300-400 msecs is not uncommon, and that is on a beefy 3.9GHz overclocked 8-core Intel Core i7-5960X with a GTX 980 Ti, Windows 10.