Some implementations perform many of the optimizations in this article automatically, but often they do not. It therefore helps to apply them by hand, and doing so does not make your code any harder to read.
Swizzle masks are essentially free in hardware. Use them where possible.
    in vec4 in_pos;

    // The following two lines:
    gl_Position.x = in_pos.x;
    gl_Position.y = in_pos.y;

    // can be simplified to:
    gl_Position.xy = in_pos.xy;
Swizzling can make your shader faster and the code more readable at the same time.
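As a slightly fuller illustration, here is a minimal vertex shader sketch that uses swizzles both as write masks and to reorder components. The attribute names in_color and v_color, and the premise that the color arrives in BGRA order, are assumptions made for this example only.

```glsl
#version 330 core

// Hypothetical inputs; only in_pos appears in the text above.
in vec4 in_pos;
in vec4 in_color;   // assumed to be stored in BGRA order for this sketch

out vec4 v_color;

void main()
{
    // Masked writes: copy x and y together, then fill z and w with constants.
    gl_Position.xy = in_pos.xy;
    gl_Position.zw = vec2(0.0, 1.0);

    // A swizzle can also reorder components: BGRA in, RGBA out.
    v_color = in_color.bgra;
}
```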
MAD is short for multiply, then add. It is generally assumed that MAD operations are "single cycle", or at least faster than the alternative.
    // A naive compiler might use these as written: a divide, then an add.
    vec4 result1 = (value / 2.0) + 1.0;
    vec4 result2 = (value / 2.0) - 1.0;
    vec4 result3 = (value / -2.0) + 1.0;

    // These are most likely converted to a single MAD operation (per line).
    vec4 result1 = (value * 0.5) + 1.0;
    vec4 result2 = (value * 0.5) - 1.0;
    vec4 result3 = (value * -0.5) + 1.0;
The divide and add variant might cost 2 or more cycles.
One expression might be better than the other. For example:
    result = 0.5 * (1.0 + variable);
    result = 0.5 + 0.5 * variable;
The first one may be converted into an add followed by a multiply. The second one is expressed in a way that more explicitly allows for a MAD operation.
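A common place where this form shows up is remapping a value from [-1, 1] to [0, 1]. The fragment shader below is a minimal sketch of that case; the input name v_normal and the choice to display the normal as a color are assumptions for illustration, and whether a real MAD instruction is emitted still depends on the compiler and hardware.

```glsl
#version 330 core

// Hypothetical interpolated normal, named for this sketch only.
in vec3 v_normal;

out vec4 fragColor;

void main()
{
    vec3 n = normalize(v_normal);

    // Written as multiply-then-add, so it maps directly onto a MAD.
    // Equivalent to 0.5 * (1.0 + n), which reads as add-then-multiply.
    vec3 remapped = n * 0.5 + 0.5;

    fragColor = vec4(remapped, 1.0);
}
```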
Assignment with MAD
Assume that you want to set the output alpha to 1.0. Here is one method:
    myOutputColor.xyz = myColor.xyz;
    myOutputColor.w = 1.0;
    gl_FragColor = myOutputColor;
The above code can compile to 2 or 3 move instructions, depending on the compiler and the GPU's capabilities. Newer GPUs can write to different parts of gl_FragColor separately, but older ones can't, which means they need a temporary to build the final color and a 3rd move instruction to set it.
You can use a MAD instruction to set all the fields at once:
    const vec2 constantList = vec2(1.0, 0.0);
    gl_FragColor = myColor.xyzw * constantList.xxxy + constantList.yyyx;
This does it all with one MAD operation, assuming that the building of the constant is compiled directly into the executable.
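Put into a complete shader, the trick might look like the sketch below. It targets GLSL 1.20 because it writes gl_FragColor; the varying name v_color is an assumption for this example.

```glsl
#version 120

// Hypothetical interpolated color, named for this sketch only.
varying vec4 v_color;

void main()
{
    // (1, 0) supplies both the multiplier and the addend through swizzles.
    const vec2 constantList = vec2(1.0, 0.0);

    // rgb * 1 + 0 keeps the color; a * 0 + 1 forces alpha to 1.0.
    gl_FragColor = v_color * constantList.xxxy + constantList.yyyx;
}
```

One caveat: the result is only exactly 1.0 when the incoming alpha is finite. On hardware that follows IEEE rules, an infinite or NaN alpha multiplied by 0.0 may produce NaN instead.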
There are a number of built-in functions that are quite fast, if not "single-cycle" (to the extent that this means something for various different hardware).
Let's say we want to linearly interpolate between two values, based on some factor:
    vec3 colorRGB_0, colorRGB_1, resultRGB;
    float alpha;

    resultRGB = colorRGB_0 * (1.0 - alpha) + colorRGB_1 * alpha;

    // The above can be converted to the following for MAD purposes:
    resultRGB = colorRGB_0 + alpha * (colorRGB_1 - colorRGB_0);

    // GLSL provides the mix function. This function should be used where possible:
    resultRGB = mix(colorRGB_0, colorRGB_1, alpha);
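For context, a minimal fragment shader built around mix could look like this. Declaring the two colors and the factor as uniforms, and clamping the factor to [0, 1], are choices made for this sketch and are not required by mix itself.

```glsl
#version 330 core

// Assumed to be supplied by the application; the names follow the snippet above.
uniform vec3 colorRGB_0;
uniform vec3 colorRGB_1;
uniform float alpha;

out vec4 fragColor;

void main()
{
    // Keep the factor in [0, 1] so the result stays between the two colors.
    float t = clamp(alpha, 0.0, 1.0);

    // mix(x, y, a) computes x * (1.0 - a) + y * a.
    vec3 resultRGB = mix(colorRGB_0, colorRGB_1, t);

    fragColor = vec4(resultRGB, 1.0);
}
```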
It is reasonable to assume that dot product operations, despite their complexity, are fast (possibly single-cycle). Given that knowledge, the following code can be optimized:
    vec3 fvalue1;
    result1 = fvalue1.x + fvalue1.y + fvalue1.z;

    vec4 fvalue2;
    result2 = fvalue2.x + fvalue2.y + fvalue2.z + fvalue2.w;
This is essentially a lot of additions. Using a simple constant and the dot-product operator, we can have this:
    const vec4 AllOnes = vec4(1.0);

    vec3 fvalue1;
    result1 = dot(fvalue1, AllOnes.xyz);

    vec4 fvalue2;
    result2 = dot(fvalue2, AllOnes);
This performs the computation all at once.
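The same idea extends to any weighted sum by folding the weights into the constant. The sketch below averages the three color channels with a single dot product; the input name v_color and the grayscale output are assumptions for illustration.

```glsl
#version 330 core

// Hypothetical interpolated color, named for this sketch only.
in vec3 v_color;

out vec4 fragColor;

void main()
{
    // (r + g + b) / 3 expressed as one dot product with a constant vector.
    const vec3 OneThird = vec3(1.0 / 3.0);
    float average = dot(v_color, OneThird);

    fragColor = vec4(vec3(average), 1.0);
}
```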