GLSL : common mistakes: Difference between revisions

From OpenGL Wiki
(adding section Optimization #2)

Revision as of 14:39, 24 September 2009

The following article discusses common mistakes made in the OpenGL Shading Language, GLSL.

Enable Or Not To Enable

With the fixed pipeline, you needed to call glEnable(GL_TEXTURE_2D) to enable 2D texturing. You needed to call glEnable(GL_LIGHTING) to enable lighting. Since shaders override these functionalities, you don't need to call glEnable/glDisable for them. If you don't want texturing, you either need to write another shader that doesn't do texturing or you can attach an all-white or all-black texture, depending on your needs. You can also write one shader that does lighting and one that doesn't.

For things that are not overridden by shaders, like the alpha test, depth test and stencil test, calling glEnable/glDisable will still have an effect.

Binding A Texture

When you compile and link your GLSL shader, the next step is to get uniform locations for your samplers (I'm talking about texture samplers) and setup the samplers. Some people do this:

glUniform1i(location, textureID)

You can't send a GL texture ID as your sampler. A sampler must be set to a texture image unit number, from 0 up to one less than the maximum number of texture image units.
Once you compile and link your shader, make sure that you set up all the samplers by calling the following (assuming of course that your samplers are named Texture0, Texture1 and Texture2):

location=glGetUniformLocation(shaderProgram, "Texture0"); 
glUniform1i(location, 0);
location=glGetUniformLocation(shaderProgram, "Texture1"); 
glUniform1i(location, 1);
location=glGetUniformLocation(shaderProgram, "Texture2"); 
glUniform1i(location, 2);

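To bind a texture, select the texture image unit with glActiveTexture and then always use glBindTexture. Without glActiveTexture, each bind replaces the previous one on the same unit.

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, textureID[0]);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, textureID[1]);
glActiveTexture(GL_TEXTURE2);
glBindTexture(GL_TEXTURE_2D, textureID[2]);

or

for(i=0; i<3; i++)
{
  glActiveTexture(GL_TEXTURE0 + i);
  glBindTexture(GL_TEXTURE_2D, textureID[i]);
}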

If you don't set the samplers properly, every sampler stays at its default of texture image unit 0, and you might get a validation failure that says

Output from shader Fragment shader(s) linked, vertex shader(s) linked.
Validation failed - samplers of different types are bound to the same texture image unit.


nVidia drivers are more relaxed. You could do

float myvalue = 0;

but this won't compile on other platforms. Use 0.0 instead. Don't write 0.0f. GLSL is not C or C++.

float texel = texture2D(tex, texcoord);

The above is wrong since texture2D returns a vec4. Do this instead

float texel = float(texture2D(tex, texcoord));

or

float texel = texture2D(tex, texcoord).r;

or

float texel = texture2D(tex, texcoord).x;


Functions should look like this

vec4 myfunction(inout float value1, in vec3 value2, in vec4 value3)

instead of

vec4 myfunction(float value1, vec3 value2, vec4 value3)

Not Used

In the vertex shader

gl_TexCoord[0] = gl_MultiTexCoord0;

and in the fragment shader

vec4 texel = texture2D(tex, gl_TexCoord[0].xy);

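The zw components of gl_TexCoord[0] aren't being used in the fragment shader, so there is no need to write them in the vertex shader.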
Keep in mind that for GLSL 1.30, you should define your own vertex attribute.
This means that instead of gl_MultiTexCoord0, define AttrMultiTexCoord0.
Also, do not use gl_TexCoord[0]. Define your own varying and call it VaryingTexCoord0.

Easy Optimization

gl_TexCoord[0].x = gl_MultiTexCoord0.x;
gl_TexCoord[0].y = gl_MultiTexCoord0.y;

turns into

gl_TexCoord[0].xy = gl_MultiTexCoord0.xy;

Keep in mind that for GLSL 1.30, you should define your own vertex attribute.
This means that instead of gl_MultiTexCoord0, define AttrMultiTexCoord0.
Also, do not use gl_TexCoord[0]. Define your own varying and call it VaryingTexCoord0.

The MAD instruction

MAD is short for multiply, then add. It is a special floating point circuit. Very fast. Costs 1 GPU cycle.

vec4 result1 = (value / 2.0) + 1.0;
vec4 result2 = (value / 2.0) - 1.0;
vec4 result3 = (value / -2.0) + 1.0;

The above doesn't quite easily turn into a MAD. It might be compiled to a reciprocal, then add. That might cost 2 or more cycles. Below is GLSL code that converts to a single MAD instruction (for each line of code of course)

vec4 result1 = (value * 0.5) + 1.0;
vec4 result2 = (value * 0.5) - 1.0;
vec4 result3 = (value * -0.5) + 1.0;

More MAD

One expression might be better than the other.

result = 0.5 * (1.0 + variable);

which compiles to

ADD  temp, 1.0, variable;
MUL  result, temp, 0.5;

Compare the above with this

result = 0.5 + 0.5 * variable;

which compiles to

MAD result, variable, 0.5, 0.5;

Of course, your GLSL compiler might be smart enough and optimize the above simple example for you but code it right yourself!
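
The rewrites in the two MAD sections change only the shape of the expression, not its value. Here is a minimal CPU-side sketch in C that shows this. It is not GPU code, the function names are made up for illustration, and whether a given GLSL compiler really emits a single MAD is up to the driver.

```c
/* CPU model of the expressions above. All function names are hypothetical.
   The f suffix is C syntax; do not carry it over into GLSL. */

/* Divide, then add: the compiler may need a reciprocal before the add. */
float scale_bias_div(float value)
{
    return (value / 2.0f) + 1.0f;
}

/* Multiply, then add: the a * b + c shape that maps onto one MAD. */
float scale_bias_mad(float value)
{
    return (value * 0.5f) + 1.0f;
}

/* ADD followed by MUL. */
float half_add_mul(float variable)
{
    return 0.5f * (1.0f + variable);
}

/* The same value written in the a * b + c shape. */
float half_mad(float variable)
{
    return 0.5f + 0.5f * variable;
}
```

Calling scale_bias_div(3.0f) and scale_bias_mad(3.0f) gives the same result, and so do half_add_mul(0.25f) and half_mad(0.25f).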

Linear Interpolation, lerp, mix

This is more about being aware of built in functions of GLSL and making use of them so that your GLSL compiler easily generates the "low level hardware executable". Blending 2 values based on some factor

vec3 colorRGB_0, colorRGB_1;
float alpha;
resultRGB = colorRGB_0 * (1.0 - alpha) + colorRGB_1 * alpha;

which can be simplified to

resultRGB = colorRGB_0  + alpha * (colorRGB_1 - colorRGB_0);

and of course, GPUs have a special instruction just for this common case. GLSL calls it mix while other languages like Cg call it lerp.

resultRGB = mix(colorRGB_0, colorRGB_1, alpha);
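
To see that the two hand-written forms compute the same blend, here is a small CPU-side sketch in C for a single component. The function names are hypothetical. GLSL's mix(x, y, a) is specified as x * (1 - a) + y * a, which is the first form, and it is applied per component for vectors.

```c
/* CPU model of the blend for one component. Names are hypothetical. */

/* First form: weight each input, then add. Same formula as GLSL mix. */
float blend_weighted(float color0, float color1, float alpha)
{
    return color0 * (1.0f - alpha) + color1 * alpha;
}

/* Simplified form: start at color0 and move toward color1 by alpha. */
float blend_offset(float color0, float color1, float alpha)
{
    return color0 + alpha * (color1 - color0);
}
```

With alpha at 0.0 both functions return color0, with alpha at 1.0 both return color1, and in between they agree up to floating point rounding.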


If you have any of these enabled, it is known that this causes software mode rendering on ATI/AMD.
You should not even need these. Enable fullscreen MSAA instead.

Compile GLSL

This should be in the FAQ but for now, we'll leave it here.
Can you compile a GLSL program using some offline compiler?
Yes, by using the Cg compiler.
The Cg compiler not only compiles Cg code, but it can also do translations from one language to another, also called the target language.
You would have to download the Cg package which contains the compiler from
Once installed, using the command line, type

cgc -oglsl -profile vp40 test.glsl_vs

That means your file test.glsl_vs contains your GLSL vertex shader and your target is GL_NV_vertex_program4 (also called vp40), so it should print out the shader on screen.
The Cg compiler also supports other targets like arbvp and arbfp and vp10, vp20, vp30, and the fp version as well.

There is no official offline compiler for OpenGL. The ARB didn't intend for this. Normally, you would just send your GLSL code to GL and the GL driver compiles the program, generates a GPU specific binary code, which eventually gets uploaded to the GPU when you decide to use it.

Keep in mind that if you send the GLSL shader code to GL, the driver compiles using the CPU. Compiling a shader is slow. The longer the shader, the more CPU time it takes. The driver does its best to optimize your code and to get rid of dead code. The more shaders you have, the more time it takes. If each shader takes 100ms to compile and you have 10 shaders, it will take 1 second. 60 shaders will take 6 seconds. 600 shaders will take 1 minute. 6000 shaders will take 10 minutes. These are rough estimates. They are intended to inform you that you'll be facing a problem when you write a large project.

glUniform doesn't work

You probably did not bind the correct shader first. Call glUseProgram(myprogram) first.

glUniform causes a slow down

This should go in the FAQ but we'll leave it here.
All the glUniform calls are relatively fast except that it has been reported that on some nVidia drivers, when certain values are sent to the shader, the driver recompiles and reoptimizes your shader. This is obviously a problem for games. Values are 0.0, 0.5, 1.0. There is no solution other than to avoid those exact numbers. Has nVidia solved this issue in recent drivers? Unknown.

How to use glUniform

If you look at all the glUniform functions (glUniform1fv, glUniform2fv, glUniform3fv, glUniform4fv, glUniform1iv, glUniform2iv, glUniform3iv, glUniform4iv, glUniformMatrix4fv and the many others), there is a parameter called count.

What's wrong with this code? Would it cause a crash?

 //Vertex Shader
 uniform vec4 LightPosition;
 //In your C++ code
 float light[4];
 glUniform4fv(MyShader, 4, light);

The problem is that for count, you set it to 4 while it should be 1 because you are sending 1 vec4 to the shader.
What's wrong with this code? Would it cause a crash?

 //Vertex Shader
 uniform vec2 Exponents;
 //In your C++ code
 float Exponents[2];
 glUniform2fv(MyShader, 2, Exponents);

The problem is that for count, you set it to 2 while it should be 1 because you are sending 1 vec2 to the shader.
What's wrong with this code? Would it cause a crash?

 //Vertex Shader
 uniform vec2 Exponents[5];
 //In your C++ code
 float Exponents[10];
 glUniform2fv(MyShader, 5, Exponents);

There is nothing wrong with it. We want to send 5 values of vec2.
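
The rule behind the three examples is that count is the number of elements (vec4s, vec2s and so on), not the number of floats. A glUniformNfv call therefore describes count times N floats at the pointer you pass. The C sketch below only models that arithmetic; the helper name is made up and is not part of OpenGL. Note also that OpenGL generates GL_INVALID_OPERATION when count is greater than 1 and the uniform is not an array.

```c
#include <stddef.h>

/* Hypothetical helper, not an OpenGL function: how many floats a
   glUniform{1,2,3,4}fv call describes at the pointer it is given.
   components is 1 to 4, count is the number of elements sent. */
size_t uniform_fv_floats_described(size_t components, size_t count)
{
    return components * count;
}
```

For the first example, uniform_fv_floats_described(4, 4) is 16, which is four times larger than float light[4]. With count set to 1 it is 4, which matches the array. For the last example, uniform_fv_floats_described(2, 5) is 10, which matches float Exponents[10].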

Uniform Names in VS, GS and FS

This should go in the FAQ but we'll leave it here.
So what happens if you have the exact same uniform name in the vertex shader, the geometry shader and the fragment shader?

Yes, it is legal to have the same uniform name in all shaders.
When you call glGetUniformLocation, it will return one location. When you update the uniform with a call to glUniform, the driver takes care of sending the value for each stage (vertex shader, geometry shader, fragment shader).
This is because a GLSL shader is considered monolithic : the VS, GS and FS are considered as 1 shader.

Keep in mind that this applies to all uniforms : float, vec2, vec3, vec4, mat3, mat4, bool, sampler2D, sampler3D and the many others.


This should go in the FAQ but we'll leave it here.
So what happens when you supply 0 to glUseProgram? The specification says that you go back to fixed function pipeline processing. All the old functions take effect (glTexEnv, glEnable(GL_TEXTURE_2D), glLight, glMaterial, glEnable(GL_LIGHTING) and others).

It is highly recommended that you use shaders for all your rendering needs. All GPUs support shaders. They all support GL 2.0 or 2.1 or 3.0 and above these days.
Note : when you first create a GL context, you are in fixed function mode (glUseProgram(0)) and should bind a shader before rendering.


At the bottom of the common mistakes page, we will just discuss optimizations.

Assume that you want to set the output value ALPHA to 1.0. Here is one method :

 myOutputColor.xyz = myColor.xyz;
 myOutputColor.w = 1.0;
 gl_FragColor = myOutputColor;

The above code is 3 MOV instructions for a SM 2 GPU, 2 MOV instructions for a SM 3 and above GPU.

Here is another method :

 myOutputColor.xyz = myColor.xyz;
 myOutputColor.w = max(myColor.w, 1.0);
 gl_FragColor = myOutputColor;

but again, that is 1 or 2 MOV instructions plus a MAX instruction.

Here is another method : The MAD instruction!

 const vec4 constantList = vec4(0.0, 0.5, 1.0, 2.0);
 gl_FragColor = myColor.xyzw * constantList.zzzx + constantList.xxxz;

The line with gl_FragColor is a SINGLE MAD INSTRUCTION for SM 2 and above GPUs! The line with constantList is not an instruction, it just consumes a GPU register.
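
For the MAD to leave RGB untouched and force alpha to 1.0, the multiplier has to be (1.0, 1.0, 1.0, 0.0) and the addend has to be (0.0, 0.0, 0.0, 1.0). The C sketch below models that on the CPU so the constants can be checked by hand. The vec4 struct and the function names are stand-ins invented for this example, not GLSL or OpenGL API.

```c
/* Stand-in for a GLSL vec4. */
typedef struct { float x, y, z, w; } vec4;

/* Component-wise a * b + c, the shape of a single MAD instruction. */
vec4 mad4(vec4 a, vec4 b, vec4 c)
{
    vec4 r;
    r.x = a.x * b.x + c.x;
    r.y = a.y * b.y + c.y;
    r.z = a.z * b.z + c.z;
    r.w = a.w * b.w + c.w;
    return r;
}

/* Keeps RGB and sets alpha to 1.0 with one multiply-add. */
vec4 force_opaque(vec4 color)
{
    const vec4 multiplier = { 1.0f, 1.0f, 1.0f, 0.0f }; /* keep xyz, zero w */
    const vec4 addend     = { 0.0f, 0.0f, 0.0f, 1.0f }; /* add 1.0 to w only */
    return mad4(color, multiplier, addend);
}
```

Any other pair of constants changes the result: a multiplier of 0.5 on xyz would darken the color, and an addend of 0.5 on w would give a half-transparent output.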

Optimization #2

Observe the following code. How can it be optimized?

  vec3 fvalue1;
  result1 = fvalue1.x + fvalue1.y + fvalue1.z;
  vec4 fvalue2;
  result2 = fvalue2.x + fvalue2.y + fvalue2.z + fvalue2.w;

That is a lot of ADD instructions. We can optimize it with either a DP3 or DP4 single instruction.

  const vec4 AllOnes = vec4(1.0);
  vec3 fvalue1;
  result1 = dot(fvalue1, AllOnes.xyz);
  vec4 fvalue2;
  result2 = dot(fvalue2, AllOnes);
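
The trick works because a dot product with a vector of ones is just the sum of the components. Here is a minimal CPU-side sketch in C. The vec3 and vec4 structs and the function names are stand-ins invented for this example.

```c
/* Stand-ins for GLSL vec3 and vec4. */
typedef struct { float x, y, z; } vec3;
typedef struct { float x, y, z, w; } vec4;

/* What a DP3 instruction computes. */
float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* What a DP4 instruction computes. */
float dot4(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Sum of components written as a dot product with all ones. */
float sum3(vec3 v)
{
    const vec3 all_ones = { 1.0f, 1.0f, 1.0f };
    return dot3(v, all_ones);
}

float sum4(vec4 v)
{
    const vec4 all_ones = { 1.0f, 1.0f, 1.0f, 1.0f };
    return dot4(v, all_ones);
}
```

sum3 of (1.0, 2.0, 3.0) is 6.0 and sum4 of (1.0, 2.0, 3.0, 4.0) is 10.0, the same as adding the components one by one.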