AMD's GLSL optimization

I noticed that when compiling a shader, some attributes and uniforms that are obviously not in use remain active. This happens when the data is passed through struct variables.
Here is a simple program that shows the issue:

  [Vertex shader]
  #version 330
  in vec3 Position;
  in vec4 VertexColor;
  in vec2 TexCoord0;
  in vec2 TexCoord1;
  in vec2 TexCoord2;
  in vec2 TexCoord3;
  in vec2 TexCoord4;
  in vec2 TexCoord5;
  in vec2 TexCoord6;
  in vec2 TexCoord7;

  out vec4 fVertexColor;
  out vec2 fTexCoord0;
  out vec2 fTexCoord1;
  out vec2 fTexCoord2;
  out vec2 fTexCoord3;
  out vec2 fTexCoord4;
  out vec2 fTexCoord5;
  out vec2 fTexCoord6;
  out vec2 fTexCoord7;

  struct VertexRec {
     vec3 Position;
     vec4 VertexColor;
     vec2 TexCoord0;
     vec2 TexCoord1;
     vec2 TexCoord2;
     vec2 TexCoord3;
     vec2 TexCoord4;
     vec2 TexCoord5;
     vec2 TexCoord6;
     vec2 TexCoord7;
  };

  void GetVertex(inout VertexRec A)
  {
    A.Position = Position;
    A.VertexColor = VertexColor;
    A.TexCoord0 = TexCoord0;
    A.TexCoord1 = TexCoord1;
    A.TexCoord2 = TexCoord2;
    A.TexCoord3 = TexCoord3;
    A.TexCoord4 = TexCoord4;
    A.TexCoord5 = TexCoord5;
    A.TexCoord6 = TexCoord6;
    A.TexCoord7 = TexCoord7;
    return;
  }

  void PassVertex(inout VertexRec A)
  {
    gl_Position = vec4(A.Position, 1.0);
    fVertexColor = A.VertexColor;
    fTexCoord0 = A.TexCoord0;
    fTexCoord1 = A.TexCoord1;
    fTexCoord2 = A.TexCoord2;
    fTexCoord3 = A.TexCoord3;
    fTexCoord4 = A.TexCoord4;
    fTexCoord5 = A.TexCoord5;
    fTexCoord6 = A.TexCoord6;
    fTexCoord7 = A.TexCoord7;
    return;
  }

  void main()
  {
     VertexRec V;
     GetVertex(V);
     PassVertex(V);
  }

  [Fragment shader]
  #version 330
  in vec4 fVertexColor;
  in vec2 fTexCoord0;
  uniform sampler2D TexUnit0;
  out vec4 FragColor;

  void main()
  {
    FragColor = fVertexColor * texture(TexUnit0, fTexCoord0); 
  }

All the vertex attributes are active, although only Position, VertexColor and TexCoord0 are actually used. If your application binds vertex arrays depending on which attributes are active, but those attributes are never used, this leads to larger vertices and unnecessary GPU work, and most importantly to unnecessary occupation of attribute slots.

By the way, NVIDIA's Cg-based compiler optimizes better; it leaves no useless attributes.

PS: in Catalyst 11.1, GL_ARB_get_program_binary still does not work:
glGetProgramiv(ID, GL_PROGRAM_BINARY_LENGTH, @val) always returns 0.

Have you called


glProgramParameteri(prog.Handle, GL_PROGRAM_BINARY_RETRIEVABLE_HINT, GL_TRUE);

and then linked the program before trying to retrieve the binary?

eg.


var
  binary: array of byte;
  progLength: GLint;
  binFormat: GLenum;
begin
  glProgramParameteri(prog.Handle, GL_PROGRAM_BINARY_RETRIEVABLE_HINT, GL_TRUE);
  glLinkProgram(prog.Handle);
  glGetProgramiv(prog.Handle, GL_PROGRAM_BINARY_LENGTH, @progLength);
  SetLength(binary, progLength);
  glGetProgramBinary(prog.Handle, progLength, nil, @binFormat, @binary[0]);
end;

It works on the most recent Catalyst drivers for me using that.

Although I did discover that if you don’t link the program after setting the retrievable hint, and then try retrieving binary program information, you cause an AV in the driver:


  // prog already linked before this code
  glProgramParameteri(prog.Handle, GL_PROGRAM_BINARY_RETRIEVABLE_HINT, GL_TRUE);
  glGetProgramiv(prog.Handle, GL_PROGRAM_BINARY_LENGTH, @progLength); // BOOM !!!!

Yes, I do this; you can look at the unit GLShaderManager.pas in the Experimental folder. On NVIDIA it works.

The built-in constant __VERSION__ is still equal to 100 for every version of GLSL.
This is a very simple bug; why have the guys at AMD still not fixed it?
Shame on you :slight_smile:

PS: but I noticed that the 11.1 driver visibly increases OpenGL performance. Thanks.

I noticed that when compiling a shader, some attributes and uniforms that are obviously not in use remain active.

In what way are the attributes in your example “obviously” not in use? The values aren’t consumed by the fragment shader, but the vertex shader “obviously” consumes the input attributes and passes them to outputs.

All the spec says is, “A generic attribute variable is considered active if it is determined by the compiler and linker that the attribute may be accessed when the shader is executed.” And these attributes certainly are being accessed. They may not contribute to the final output, but they clearly are being accessed.

I for one wouldn’t expect an implementation to be able to detect that an attribute is considered not in use due only to the fact that the corresponding fragment shader doesn’t use the results. Sure, I wouldn’t be against such a thing if a driver does it, but I also wouldn’t build my engine to rely on such behavior. Especially with ARB_separate_shader_objects out there to allow mix-and-match programs.

Your statement confused me. It seems I went badly wrong when I decided to simplify working with attributes in this way. I had thought that optimization across all the objects of a program was a given, but it turns out to be just an NVIDIA peculiarity. I have not yet learned how to use ARB_separate_shader_objects, but I think the developers might consider both approaches for different purposes.

It is interesting how the optimization works at the low level (GPU assembly).
It must remove instructions to reduce register usage, and that has to depend on which registers contribute to the final output. So the removal of attributes and varyings must depend on whether their registers are actually used.

Is there a way to use conditional compilation globally across multiple shader objects?

I have tried setting the defines in the main shader object, but the other shader objects cannot see them. And because a shader sub-object is shared by many shader programs, it would be a bad idea to change it every time. I also want to avoid creating a huge base of shader sub-objects for every case.

Please don't suggest the ARB_shading_language_include method; it's too new.

It’s not clear from your English (I think we might be losing some meaning in the translation), but do you mean something like this:


// Defines section (value can vary per shader program)
const int NUM_PT_LIGHTS = 1;

// Code section
...
uniform vec3  PtLightDiffuse[ NUM_PT_LIGHTS ];
...
for ( int i = 0; i < NUM_PT_LIGHTS; i++ )
  color += applyLighting( i );

Sorry for the bad English; that's what I meant:

[Shader object #1]
#version 330
in vec3 Position;
#ifdef VERTEXCOLOR_USED
in vec4 VertexColor;
#endif
void GetVertex(inout VertexRec A)
{
  A.Position = Position;
#ifdef VERTEXCOLOR_USED
  A.VertexColor = VertexColor;
#endif
}

[Shader object #2]
#ifdef VERTEXCOLOR_USED
out vec4 fVertexColor;
#endif
void PassVertex(inout VertexRec A)
{
  gl_Position = vec4(A.Position, 1.0);
#ifdef VERTEXCOLOR_USED
  fVertexColor = A.VertexColor;
#endif
}

[Main shader object]
#define VERTEXCOLOR_USED
void GetVertex(inout VertexRec A);
void PassVertex(inout VertexRec A);
void main()
{
   VertexRec V;
   GetVertex(V);
   PassVertex(V);
}

Something like an ubershader, but with multiple shader objects.

Ah, I get what you mean. I have experienced that.

When I was working with AMD GL drivers with unified shader programs (no separate shader objects), they had trouble auto-optimizing away unused expression DAG branches, including unreferenced uniforms and varyings in those branches. NVIDIA, OTOH, is excellent at nuking unused expressions/varyings/uniforms.

However, with separate shader objects, IIRC you have to promise not to change the interface so that you can play musical shaders.
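For reference, that "promise" works out like this: with separate shader objects the linker never sees both stages together, so either the out/in declarations must match exactly by name and type, or you pin them down with explicit locations. A hedged GLSL sketch (names borrowed from the example above, locations chosen arbitrarily):

```glsl
// [Vertex stage]: compiled as its own separable program
#version 410 core
layout(location = 0) in vec3 Position;
layout(location = 0) out vec4 fVertexColor;  // explicit location instead of name matching
void main()
{
    gl_Position  = vec4(Position, 1.0);
    fVertexColor = vec4(1.0);
}

// [Fragment stage]: a different separable program; only the locations must agree
#version 410 core
layout(location = 0) in vec4 fVertexColor;
out vec4 FragColor;
void main()
{
    FragColor = fVertexColor;
}
```

With the locations fixed like this you can mix and match vertex and fragment programs freely, which is exactly the "musical shaders" scenario.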

Okay, I’ll explore that option.

This topic was automatically closed 183 days after the last reply. New replies are no longer allowed.