GLSL : common mistakes

Revision as of 19:00, 7 July 2011 by Kri (Talk | contribs) (wikified OpenGL Shading Language)


The following article discusses common mistakes made in the OpenGL Shading Language, GLSL.

Use the Swizzle

Swizzle masks are essentially free in hardware. Use them where possible.

gl_TexCoord[0].x = gl_MultiTexCoord0.x;
gl_TexCoord[0].y = gl_MultiTexCoord0.y;

This code can be simplified to:

gl_TexCoord[0].xy = gl_MultiTexCoord0.xy;

Drivers may detect and optimize this kind of thing, but they may not. It is best to do it for them when you can.


MAD is short for multiply, then add. It is generally assumed that MAD operations are "single cycle", or at least faster than the alternative.

vec4 result1 = (value / 2.0) + 1.0;
vec4 result2 = (value / 2.0) - 1.0;
vec4 result3 = (value / -2.0) + 1.0;

A naive compiler may translate these literally: a divide followed by an add, which might cost two or more cycles. Below is GLSL code where each line converts to a single MAD instruction.

vec4 result1 = (value * 0.5) + 1.0;
vec4 result2 = (value * 0.5) - 1.0;
vec4 result3 = (value * -0.5) + 1.0;

These are much more likely to be converted into MAD operations.

Two algebraically equivalent expressions may not compile equally well. Consider:

result = 0.5 * (1.0 + variable);

This may be converted into an add followed by a multiply. It can be expressed in a way that more explicitly allows for a MAD operation:

result = 0.5 + 0.5 * variable;

The compiler may be able to optimize this automatically, but it may not. Best to be careful.
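The rewrites in this section do not change the result, only the shape of the computation. The following is a minimal CPU-side sketch in C++, not GLSL, that mirrors the expressions above; the function names are hypothetical and std::fma is used only to make the multiply-add shape explicit. It says nothing about what a particular GLSL compiler will actually emit.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical CPU-side mirrors of the GLSL expressions above.

// "Add, then multiply" form: 0.5 * (1.0 + v).
float addThenMultiply(float v) {
    return 0.5f * (1.0f + v);
}

// "Multiply, then add" form: 0.5 + 0.5 * v, written as one fused multiply-add.
float multiplyThenAdd(float v) {
    return std::fma(0.5f, v, 0.5f);
}

// Division form: (v / 2.0) + 1.0.
float divideThenAdd(float v) {
    return (v / 2.0f) + 1.0f;
}

// Same value expressed as a MAD: (v * 0.5) + 1.0.
float halfMad(float v) {
    return std::fma(v, 0.5f, 1.0f);
}

// Usage check with inputs that are exact in binary floating point.
void checkMadForms() {
    assert(addThenMultiply(3.0f) == multiplyThenAdd(3.0f));  // both are 2.0
    assert(divideThenAdd(4.0f) == halfMad(4.0f));            // both are 3.0
}
```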

Assignment with MAD

Assume that you want to set the output alpha to 1.0. Here is one method:
 myOutputColor.xyz = mycolor.xyz;
 myOutputColor.w = 1.0;
 gl_FragColor = myOutputColor;

The above code can compile to two or three move instructions, depending on the compiler and the GPU's capabilities. Newer GPUs can write to individual components of gl_FragColor, but older ones can't, so they must build the final color in a temporary and copy it with a third move instruction.

You can use a MAD instruction to set all the fields at once:

 const vec2 constantList = vec2(1.0, 0.0);
 gl_FragColor = mycolor.xyzw * constantList.xxxy + constantList.yyyx;

This does it all with one MAD operation, assuming that the building of the constant is compiled directly into the executable.
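To see why the swizzled constants give the intended result, here is a small CPU-side sketch in C++ that applies the same per-component multiply-add. The type alias and function names are hypothetical; only the arithmetic follows the GLSL line above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Hypothetical CPU-side mirror of:
//   gl_FragColor = mycolor.xyzw * constantList.xxxy + constantList.yyyx;
// with constantList = vec2(1.0, 0.0).
Vec4 forceAlphaToOne(const Vec4& color) {
    const Vec4 scale  = {1.0f, 1.0f, 1.0f, 0.0f};  // constantList.xxxy
    const Vec4 offset = {0.0f, 0.0f, 0.0f, 1.0f};  // constantList.yyyx
    Vec4 result{};
    for (std::size_t i = 0; i < result.size(); ++i) {
        result[i] = color[i] * scale[i] + offset[i];  // one multiply-add per component
    }
    return result;
}

// Usage check: RGB passes through unchanged, alpha becomes 1.0.
void checkForceAlpha() {
    const Vec4 input    = {0.25f, 0.5f, 0.75f, 0.125f};
    const Vec4 expected = {0.25f, 0.5f, 0.75f, 1.0f};
    assert(forceAlphaToOne(input) == expected);
}
```

One caveat of this trick: under IEEE 754 rules, multiplying a NaN or infinite alpha by 0.0 does not yield 0.0, so the result is only guaranteed for finite inputs.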

Fast Built-ins

There are a number of built-in functions that are quite fast, if not "single-cycle" (to the extent that this means something for various different hardware).

Linear Interpolation

Let's say we want to linearly interpolate between two values, based on some factor:

vec3 colorRGB_0, colorRGB_1;
float alpha;
resultRGB = colorRGB_0 * (1.0 - alpha) + colorRGB_1 * alpha;

This can be converted to the following for MAD purposes.

resultRGB = colorRGB_0  + alpha * (colorRGB_1 - colorRGB_0);

GLSL provides the mix function. This function should be used where possible:

resultRGB = mix(colorRGB_0, colorRGB_1, alpha);
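As a sanity check on the algebra, the sketch below mirrors the two hand-written forms in C++ on the CPU. Vec3 and the function names are hypothetical stand-ins for the GLSL types, and the test values are chosen so that every intermediate result is exact in floating point.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec3 = std::array<float, 3>;

// Form 1: c0 * (1 - a) + c1 * a  (two multiplies and an add per component).
Vec3 lerpWeighted(const Vec3& c0, const Vec3& c1, float a) {
    Vec3 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = c0[i] * (1.0f - a) + c1[i] * a;
    }
    return r;
}

// Form 2: c0 + a * (c1 - c0)  (a subtract followed by one multiply-add).
Vec3 lerpMad(const Vec3& c0, const Vec3& c1, float a) {
    Vec3 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = c0[i] + a * (c1[i] - c0[i]);
    }
    return r;
}

// Usage check: both forms agree on exactly representable inputs.
void checkLerpForms() {
    const Vec3 c0 = {1.0f, 2.0f, 4.0f};
    const Vec3 c1 = {3.0f, 6.0f, 8.0f};
    const Vec3 expected = {1.5f, 3.0f, 5.0f};
    assert(lerpWeighted(c0, c1, 0.25f) == expected);
    assert(lerpMad(c0, c1, 0.25f) == expected);
}
```

For arbitrary inputs the two forms can differ in the last bits because they round at different points, which is one more reason to prefer the built-in mix in shader code.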

Dot products

It is reasonable to assume that dot product operations, despite their complexity, will be fast (possibly single-cycle). Given that knowledge, the following code can be optimized:

  vec3 fvalue1;
  result1 = fvalue1.x + fvalue1.y + fvalue1.z;
  vec4 fvalue2;
  result2 = fvalue2.x + fvalue2.y + fvalue2.z + fvalue2.w;

This is essentially a lot of additions. Using a simple constant and the dot-product operator, we can have this:

  const vec4 AllOnes = vec4(1.0);
  vec3 fvalue1;
  result1 = dot(fvalue1, AllOnes.xyz);
  vec4 fvalue2;
  result2 = dot(fvalue2, AllOnes);

This performs the computation all at once.
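The identity being used is that a dot product with an all-ones vector equals the sum of the components. A minimal CPU-side sketch in C++ follows; the type aliases and function names are hypothetical, and whether the dot-product form is actually faster depends on the GPU.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;

// Plain dot products, mirroring GLSL's dot() for vec3 and vec4.
float dot3(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

float dot4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Sum of components expressed as a dot product with an all-ones vector.
float sumComponents(const Vec3& v) {
    return dot3(v, Vec3{1.0f, 1.0f, 1.0f});
}

float sumComponents(const Vec4& v) {
    return dot4(v, Vec4{1.0f, 1.0f, 1.0f, 1.0f});
}

// Usage check: the dot-product form matches the explicit additions.
void checkSums() {
    const Vec3 v3 = {1.0f, 2.0f, 3.0f};
    const Vec4 v4 = {1.0f, 2.0f, 3.0f, 4.0f};
    assert(sumComponents(v3) == v3[0] + v3[1] + v3[2]);
    assert(sumComponents(v4) == v4[0] + v4[1] + v4[2] + v4[3]);
}
```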


How to use glUniform

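If you look at all the glUniform functions (glUniform1fv, glUniform2fv, glUniform3fv, glUniform4fv, glUniform1iv, glUniform2iv, glUniform3iv, glUniform4iv, glUniformMatrix4fv and the many others), there is a parameter called count. The first parameter of these functions is the uniform's location as returned by glGetUniformLocation, not the program object, so MyShader in the examples below should be read as that location.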

What's wrong with this code? Would it cause a crash?

 //Vertex Shader
 uniform vec4 LightPosition;
 //In your C++ code
 float light[4];
 glUniform4fv(MyShader, 4, light);

The problem is the count argument: it is set to 4, but it should be 1 because you are sending one vec4 to the shader.

What's wrong with this code? Would it cause a crash?

 //Vertex Shader
 uniform vec2 Exponents;
 //In your C++ code
 float Exponents[2];
 glUniform2fv(MyShader, 2, Exponents);

The problem is the count argument: it is set to 2, but it should be 1 because you are sending one vec2 to the shader.

What's wrong with this code? Would it cause a crash?

 //Vertex Shader
 uniform vec2 Exponents[5];
 //In your C++ code
 float Exponents[10];
 glUniform2fv(MyShader, 5, Exponents);

There is nothing wrong with it. The uniform is an array of 5 vec2 values, so count is 5.
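The rule in all three cases is that count is the number of uniform elements (vec2s, vec4s, and so on), not the number of floats. One way to avoid the mistake is to derive count from the array size, as in this C++ sketch. The helper uniformElementCount is hypothetical and not part of OpenGL; it only does the division.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of uniform elements held by a float array,
// given how many float components one element has (2 for vec2, 4 for vec4).
// The result is the value to pass as `count` to glUniform2fv, glUniform4fv, etc.
int uniformElementCount(std::size_t floatCount, std::size_t componentsPerElement) {
    assert(componentsPerElement > 0);
    assert(floatCount % componentsPerElement == 0);  // array must hold whole elements
    return static_cast<int>(floatCount / componentsPerElement);
}

// Intended use (requires a GL context, so it is shown only as a comment):
//   float Exponents[10];
//   glUniform2fv(location, uniformElementCount(10, 2), Exponents);  // count == 5
```

With this helper, float light[4] for a single vec4 gives a count of 1, and float Exponents[10] for vec2 Exponents[5] gives a count of 5, matching the three examples above.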

glUniform doesn't work

You probably did not bind the correct shader first. glUniform affects the program that is currently in use, so call glUseProgram(myprogram) before it.

glUniform causes a slow down

All the glUniform calls are relatively fast. However, it has been reported that some NVIDIA drivers recompile and reoptimize your shader when certain values (0.0, 0.5, 1.0) are sent to it. This is obviously a problem for games. The only known workaround is to avoid those exact values. Whether more recent NVIDIA drivers have fixed this issue is unknown.


Although not strictly a mistake, some wonder why glGetUniformLocation returns -1. If there is a uniform that you are not using, the driver will optimize your uniform out. Drivers are really good at optimizing code. Even if your code references the uniform, it will be optimized out when it is clear that it can never affect the output.


When should you call glUseProgram?

glUseProgram needs to be called before you set a uniform. There are several versions of the glUniform* function, depending on whether your variable is a single float, vec2, vec3, vec4, a matrix, etc. Notice that the glUniform* functions do not take the program ID (your shader) as a parameter.

What if you want to get the location of a uniform? Notice that glGetUniformLocation takes the program ID (your shader) as a parameter. No, you do not need to call glUseProgram before calling glGetUniformLocation.

What if you want to render? glUseProgram needs to be called before glDrawArrays, glDrawElements, glDrawRangeElements or whatever draw function you are using. It may seem obvious that you need to bind your shader before you render your object, but some newcomers call glUseProgram after the draw call.

Uniform Names in VS, GS and FS

So what happens if you have the same exact uniform name in both the vertex shader and geometry shader and fragment shader?

Yes, it is legal to have the same uniform name in all shaders.

When you call glGetUniformLocation, it will return one location. When you update the uniform with a call to glUniform, the driver takes care of sending the value for each stage (vertex shader, geometry shader, fragment shader).

This is because a GLSL program contains all of the shader stages at once. Programs do not consider uniforms in a vertex shader to be different from uniforms in a fragment shader.

Keep in mind that this applies to all uniforms : float, vec2, vec3, vec4, mat3, mat4, bool, sampler2D, sampler3D and the many others.


Enable Or Not To Enable

With the fixed-function pipeline, you needed to call glEnable(GL_TEXTURE_2D) to enable 2D texturing. You needed to call glEnable(GL_LIGHTING). Since shaders override these functionalities, you don't need to glEnable/glDisable. If you don't want texturing, you either need to write another shader that doesn't do texturing or you can attach an all-white or all-black texture, depending on your needs. You can also write one shader that does lighting and one that doesn't.

Things that are not overridden by shaders, such as the alpha test, depth test and stencil test, are still controlled with glEnable/glDisable.

Binding A Texture

NVIDIA and Types

NVIDIA drivers are more relaxed and may accept code that the GLSL specification does not allow. For example:

float myvalue = 0;

The above is not legal according to the GLSL specification 1.10, due to the inability to automatically convert from integers (numbers without decimals) to floats (numbers with decimals). Use 0.0 instead. With GLSL 1.20 and above, it is legal because it will be converted to a float.

float myvalue1 = 0.5f;
float myvalue2 = 0.5F;

The above is not legal according to the GLSL specification 1.10. With GLSL 1.20, it becomes legal.

float texel = texture2D(tex, texcoord);

The above is wrong since texture2D returns a vec4. Do one of these instead:

float texel = texture2D(tex, texcoord).r;
float texel = texture2D(tex, texcoord).x;

Functions inputs and outputs

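Function parameters should carry explicit qualifiers (in, out or inout), like this:

vec4 myfunction(inout float value1, in vec3 value2, in vec4 value3)

instead of relying on the default, where a parameter with no qualifier is treated as in:

vec4 myfunction(float value1, vec3 value2, vec4 value3)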

Not Used

In the vertex shader

gl_TexCoord[0] = gl_MultiTexCoord0;

and in the fragment shader

vec4 texel = texture2D(tex, gl_TexCoord[0].xy);

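The zw components of gl_TexCoord[0] are written in the vertex shader but never used in the fragment shader, so only the xy components need to be written.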
Keep in mind that for GLSL 1.30, you should define your own vertex attribute.
This means that instead of gl_MultiTexCoord0, define AttrMultiTexCoord0.
Also, do not use gl_TexCoord[0]. Define your own varying and call it VaryingTexCoord0.

Sampling and Rendering to the Same Texture

Normally, you should not sample a texture and render to that same texture at the same time. This would give you undefined behavior. It might work on some GPUs and with some driver versions but not others.

The GL_NV_texture_barrier extension makes this well-defined in certain cases. Specifically, you can use the barrier to ping-pong between two regions of the same texture without having to switch textures or buffers. You still don't get to read and write to the same location in a texture at the same time unless there is only a single read and write of each texel, and the read is in the fragment shader invocation that writes the same texel.