
[Public WebGL] Re: [angleproject] Re: Solving slow compilation of long loops with texture sampling



Did we ever find a solution for texture lookups in loops causing slow compile times?

On Thu, Mar 31, 2011 at 9:39 AM, Daniel Koch <daniel@transgaming.com> wrote:

On 2011-03-30, at 6:21 PM, Alvaro Segura wrote:

You are right. That's why we propose to find a way to apply a smart
selective change.

Let me state the problem in a different way:

We are not talking about saving several seconds, as in the flight.html
demo. In fact, we faced this problem more severely. Our application,
with moderate iteration settings, takes more than 1 minute trying to
compile (with a few "your script is taking too long" alerts) and
finally fails. I don't have the numbers here, but I think we have to hit
"stop script" after several minutes.

What happens if you reduce the optimization level?

Currently Chrome builds of ANGLE use D3DCOMPILE_OPTIMIZATION_LEVEL0, but stand-alone builds will default to D3DCOMPILE_OPTIMIZATION_LEVEL3.
If you are doing your own ANGLE build, try modifying this at:
http://code.google.com/p/angleproject/source/browse/trunk/src/libGLESv2/Program.cpp#19

Our fix makes compilation possible, and in only a couple of seconds, the
same as native GLSL. Rendering itself is acceptably fast after that
for such a shader.

So, yes, a rule has to be chosen that applies the tex2Dlod() trick only
when necessary and does not break the other cases. Some
suggestions:

- texture reads inside loops
- texture reads inside loops longer than N iterations (N=10?)
- texture reads inside loops when the texture has a MIN_FILTER=LINEAR
or NEAREST, ...

That last idea might be the safest. Is it possible in ANGLE to know
the texture filtering mode (texParameteri TEXTURE_MIN_FILTER)? If the
mode for the relevant texture is LINEAR or NEAREST (not _MIPMAP_), then
using tex2Dlod is just fine, right?
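A minimal sketch of how the three suggested rules could be expressed as a single predicate, assuming the translator had access to the loop depth, a known trip count, and the sampler's TEXTURE_MIN_FILTER value (information the shader compiler does not currently have). The `SampleSite`, `Rule` and `useLod0` names are hypothetical; only the enum values are the standard OpenGL ES 2.0 constants. N=10 is the tentative value suggested above.

```cpp
#include <cstdint>
#include <optional>

// Standard OpenGL ES 2.0 values for GL_TEXTURE_MIN_FILTER.
enum MinFilter : std::uint32_t {
    kNearest              = 0x2600,  // GL_NEAREST
    kLinear               = 0x2601,  // GL_LINEAR
    kNearestMipmapNearest = 0x2700,  // GL_NEAREST_MIPMAP_NEAREST
    kLinearMipmapNearest  = 0x2701,  // GL_LINEAR_MIPMAP_NEAREST
    kNearestMipmapLinear  = 0x2702,  // GL_NEAREST_MIPMAP_LINEAR
    kLinearMipmapLinear   = 0x2703,  // GL_LINEAR_MIPMAP_LINEAR
};

// Hypothetical description of one texture2D() call site in a fragment shader.
struct SampleSite {
    int loopDepth;                       // 0 = not inside any loop
    std::optional<int> iterationCount;   // trip count of the innermost loop, if known
    std::optional<MinFilter> minFilter;  // only known if the compiler could see texture state
};

// The three candidate rules from the list above.
enum class Rule { InsideLoop, LongLoop, NonMipmappedInLoop };

// Returns true if, under the chosen rule, the call site would be emitted as
// tex2Dlod with an explicit LOD of 0 instead of tex2D.
bool useLod0(const SampleSite& s, Rule rule, int minIterations = 10) {
    if (s.loopDepth == 0) return false;  // outside loops: always keep tex2D
    switch (rule) {
        case Rule::InsideLoop:
            return true;
        case Rule::LongLoop:
            // Unknown trip count: conservatively keep mipmapped tex2D.
            return s.iterationCount.has_value() && *s.iterationCount > minIterations;
        case Rule::NonMipmappedInLoop:
            // Unknown filter (std::nullopt) compares unequal, so tex2D is kept.
            return s.minFilter == kNearest || s.minFilter == kLinear;
    }
    return false;
}
```

The third rule is the only one that needs texture state, which is exactly the dependency discussed in the reply that follows.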

The libGLESv2 part of ANGLE knows the filtering mode, but the shader compiler currently has no knowledge of it.
Adding something like this option would introduce a state dependency between the shaders and textures/texture state.
This would require checking these dependencies on every draw call or whenever a texture's state changed, and potentially recompiling the shader (resulting in even more delays!). Introducing state dependencies into the shaders is definitely something we hope to avoid.
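To make that cost concrete, here is a minimal sketch, with entirely hypothetical names, of what a filter-dependent translation would imply: the generated HLSL would depend on a per-sampler "uses a mipmapped min filter" bitmask, so each draw call would have to compute that key and, on a cache miss, compile another shader variant. The 16-sampler limit is an assumption of the sketch, not an ANGLE constant.

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

constexpr std::size_t kMaxSamplers = 16;       // assumption for this sketch
using SamplerKey = std::bitset<kMaxSamplers>;  // bit i set => sampler i uses a *_MIPMAP_* min filter

// Hypothetical cache of shader variants keyed by texture filter state.
class ShaderVariantCache {
public:
    explicit ShaderVariantCache(std::function<std::string(const SamplerKey&)> compile)
        : compile_(std::move(compile)) {}

    // Would be called on every draw: look up, or compile, the variant
    // matching the currently bound textures' filter state.
    const std::string& variantFor(const SamplerKey& key) {
        auto it = variants_.find(key);
        if (it == variants_.end()) {
            ++compileCount_;  // each miss stands for a (slow) HLSL compile
            it = variants_.emplace(key, compile_(key)).first;
        }
        return it->second;
    }

    int compileCount() const { return compileCount_; }

private:
    std::function<std::string(const SamplerKey&)> compile_;
    std::unordered_map<SamplerKey, std::string> variants_;
    int compileCount_ = 0;
};
```

Every new combination of filter state seen at draw time triggers another compile, which is the extra delay and state coupling the reply wants to avoid.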

Thanks for your attention.

Best regards,

Alvaro



On 30 mar, 22:43, Daniel Koch <dan...@transgaming.com> wrote:
On 2011-03-30, at 7:09 AM, Alvaro Segura wrote:



Hi All,

This post complements our previous post "[angleproject] Solving slow compilation (and eventual fail) in complex shaders (with patch)" with a related but different problem. Previously we discussed a solution for slow compilation of long loops by preventing their unrolling. It has been said that Chrome 12 will compile shaders without unrolling to improve this. We have tested Chrome 12, and it certainly improves fractal.io as much as our solution with "[loop] [fastopt] for (...)".

Now, there are other cases, such as the one reported by John Davis (http://www.pcprogramming.com/flight.html), and others, where loops need to do texture sampling (i.e. texture2D(tex, uv);).

The problem here seems to be the following, or something similar: texture2D() translates to tex2D() in HLSL. tex2D is said to be a "gradient instruction" because it uses screen-space gradients to select a mipmap level (even if we defined MIN_FILTER LINEAR!). HLSL does not allow gradient instructions in true loops (at least in loops with "break", which I can't find in that demo (?)), so upon seeing that call, the DX compiler forces unrolling, even if Chrome 12 is trying to avoid that. The result is a very long compile time, and possibly an error if the loop is too deep to be unrolled.

In HLSL, mipmapping can be avoided by using tex2Dlod(tex, float4(uv, 0, 0)) [the last component selects the mip level]. Great. Is there a similar function in GLSL? Yes: texture2DLod(). Can the shader developer just change it? No: GLSL only allows texture2DLod() in vertex shaders, not fragment shaders.

A solution implies ANGLE emitting an HLSL "tex2Dlod(..., float4(..., 0, 0))" instruction from a source GLSL "texture2D(...)". We tested this, again with a custom-built ANGLE library, and it worked great for our application, which does heavy texture sampling inside long loops.

Can that translation always be done, as in our naive fix? Not really. There are plenty of applications that need correct mipmapped sampling for good minification of textures. The translation then needs to be done selectively, only where necessary. The rule could be to check whether the texture2D call is inside a loop, or inside a loop of more than N iterations (the first seems easier to do):

texture2D(a,b); out of loop  => tex2D(a,b);
texture2D(a,b); inside loop  => tex2Dlod(a, float4(b, 0, 0));
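As a hedged sketch of the simpler "inside any loop" rule, the translation could be driven by a loop-depth counter while walking the shader's syntax tree. The `Node` structure and `emit` function below are hypothetical and far simpler than ANGLE's real intermediate representation; the emitted strings follow the HLSL `tex2Dlod(s, float4(uv, 0, lod))` form, where the fourth component is the mip level.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified AST: just enough to show where the decision is made.
struct Node {
    enum class Kind { Loop, Sample } kind;
    std::string sampler;     // Sample: sampler name
    std::string coord;       // Sample: float2 coordinate expression
    std::vector<Node> body;  // Loop: statements inside the loop
};

// Emits one HLSL sampling expression per Sample node, choosing the
// intrinsic based on whether the call sits inside any loop.
void emit(const Node& n, int loopDepth, std::vector<std::string>& out) {
    switch (n.kind) {
        case Node::Kind::Loop:
            for (const Node& child : n.body) emit(child, loopDepth + 1, out);
            break;
        case Node::Kind::Sample:
            if (loopDepth == 0) {
                out.push_back("tex2D(" + n.sampler + ", " + n.coord + ")");
            } else {
                // Explicit LOD 0: no gradients are needed, so the loop need not be
                // unrolled, at the price of always sampling the base mip level.
                out.push_back("tex2Dlod(" + n.sampler + ", float4(" + n.coord + ", 0, 0))");
            }
            break;
    }
}
```

Nested loops simply increase `loopDepth`; replacing the `loopDepth == 0` test with a trip-count check would give the "more than N iterations" variant instead.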

That doesn't really seem like it would be a safe universal change though...



We propose this change, or an equivalent fix, since we definitely need sampling in loops, after the necessary testing so that it does not break anything. Pure OpenGL works with no problems, BTW.

For reference, we are using ANGLE SVN trunk (rev 598), compiled under VS2008, and then replacing the resulting DLL in FF4. I have upgraded the MS Platform SDK and the DX SDK to the latest versions, on a Win7 64-bit PC with an nVidia GTX485. BTW, what is different in Chrome 12? Doesn't it use regular SVN-trunk ANGLE?

Chrome 12 does use regular SVN-trunk ANGLE (as far as I know); however, it uses a different build system (GYP) that has some different defines. See http://code.google.com/p/angleproject/source/browse/trunk/src/build_a... for the relevant settings.

Hope this helps,
Daniel

---
                        Daniel Koch -+- dan...@transgaming.com
Senior Graphics Architect -+- TransGaming Inc.  -+-www.transgaming.com