Conditionals and loops

Card: GeForce FX, OS: Linux, drivers: 76.76

  1. Conditionals

Using if statements slows things down quite a lot: performance drops by about half for every couple of if statements. Is that normal for shaders, or does it depend on the graphics card/driver? (I guess the card matters, but do the drivers too?) Can I expect if to get faster in the near future, at least fast enough that the slowdown is not noticeable even with heavy use?

  2. Loops

I eventually noticed that I can use for statements, but the second term (the loop condition) must use constants. Will that change in the near future? I also noticed that for is much faster than plain if conditionals. That seems strange to me, because such a loop necessarily has to evaluate a condition on every iteration!

I’d welcome any comments, information or suggestions about this. Thank you.

jide,

As far as I remember, dynamic branching/looping in vertex shaders/programs is really slow on the first members of the GeForce FX family (5800, …) and faster on the FX 59xx.
In fragment shaders/programs it is not supported at all on this family, so loops have to be unrolled by the compiler (this is why the loop count needs to be known at compile time).
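
To illustrate the unrolling mentioned above, here is a minimal CPU-side sketch in C (not shader code). NUM_LIGHTS, the ambient array and both function names are hypothetical and only serve the illustration. The point is that a loop whose trip count is known at compile time can be rewritten as straight-line code, with no branch and with constant indices.

```c
/* CPU-side sketch of loop unrolling. NUM_LIGHTS and the function names are
   hypothetical, chosen only for illustration. */
#define NUM_LIGHTS 3

/* Loop form: the bound is a compile-time constant. */
float sum_ambient_loop(const float ambient[NUM_LIGHTS])
{
    float sum = 0.0f;
    for (int i = 0; i < NUM_LIGHTS; ++i) {
        sum += ambient[i];
    }
    return sum;
}

/* What a compiler can emit for hardware without branching: the same
   additions as straight-line code, with every index now a constant. */
float sum_ambient_unrolled(const float ambient[NUM_LIGHTS])
{
    float sum = 0.0f;
    sum += ambient[0];
    sum += ambient[1];
    sum += ambient[2];
    return sum;
}
```

Both functions compute the same sum; the second form is only possible because the number of iterations is fixed before the code runs.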

Dynamic branching is supported in fragment shaders/programs on the GeForce 6 & 7 families, with some restrictions that you can find in section 2.1.1 of the NVIDIA OpenGL 2.0 support document: http://developer.nvidia.com/object/nv_ogl2_support.html
(I have noticed that section 2.2.1 also talks about GeForce FX fragment-level branching.)

On hw >= GF6, in fragment shaders (I don’t know whether it is a hardware limitation or a limitation of the current drivers), you cannot use a loop to index through varyings or through built-in uniforms. For example:

vec4 v4Color=vec4(0.0);
for(int i=0; i<u_iNLightsFS; ++i){
  v4Color.rgb+=gl_LightSource[i].ambient.rgb;
}

(where u_iNLightsFS is a uniform)

gives me the error:
(16) : error C5043: profile requires index expression to be compile-time constant

The following code:

vec4 v4Color=vec4(0.0);
for(int i=0; i<u_iNLightsFS; ++i){
  float fAtten=texture2D(u_txtrAtten2D, var_v4AttenTxtrCoords[i].xy).a;
  v4Color.rgb*=fAtten;
}

Gives me the same error.

But code like this:

    for (float iter = 0.0; iter < MaxIterations && r2 < 4.0; ++iter){
        float tempreal = real;
        real = (tempreal * tempreal) - (imag * imag) + Creal;
        imag = 2.0 * tempreal * imag + Cimag;
        r2   = (real * real) + (imag * imag);
    }

(From the Mandelbrot example in the 3Dlabs GLSLDemo, where MaxIterations is a uniform)
works without problems (nothing is accessed with an index).
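
One workaround that is sometimes suggested for the "index expression must be compile-time constant" error is to loop up to a fixed maximum and ignore the unused entries, so that every index is a constant once the loop is unrolled. The sketch below models the idea on the CPU in C. MAX_LIGHTS, n_lights and the function name are hypothetical, and whether a given driver accepts the equivalent GLSL is an assumption that would need testing.

```c
#define MAX_LIGHTS 4  /* hypothetical compile-time upper bound */

/* n_lights plays the role of the uniform (u_iNLightsFS in the post above).
   The loop bound is a constant, so the loop can be fully unrolled; the
   comparison against n_lights only decides whether a term contributes. */
float sum_active_lights(const float ambient[MAX_LIGHTS], int n_lights)
{
    float sum = 0.0f;
    for (int i = 0; i < MAX_LIGHTS; ++i) {
        /* Select instead of skipping: keep is 1.0 for active lights, else 0.0. */
        float keep = (i < n_lights) ? 1.0f : 0.0f;
        sum += keep * ambient[i];
    }
    return sum;
}
```

The price is that all MAX_LIGHTS iterations are always paid for, even when fewer lights are active.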

The technique described in section 4.1.3 of the NVIDIA GPU Programming Guide is a beautiful example of what I am NOT able to do with the current hw/drivers.

Hope this helps.

– Carlos

I read an article some time ago and remember a few things from it that may help you (but I was unable to find the article again).

As far as I remember, ifs in fragment shaders are handled per block of pixels. If the blocks are big (I think NVIDIA uses blocks of 64x64 pixels or more, while the newest ATI cards, the X1x00, can handle blocks as small as 4x4) and the pixels in a block cover both sides (if and else), then every pixel in the block has to be computed the slow way (always evaluating both if and else); otherwise it performs much faster.
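
The block behaviour described above can be put into a rough cost model. This is a CPU-side C sketch under stated assumptions: pixels are processed in fixed-size blocks, a block whose pixels all agree pays for one path only, and a block with mixed outcomes pays for both. The function name and the cost units are hypothetical.

```c
#include <stddef.h>

/* Cost of shading one block of pixels under the model above.
   takes_if[i] is nonzero when pixel i takes the 'if' path.
   cost_if and cost_else are per-pixel costs in arbitrary units. */
int block_cost(const int *takes_if, size_t n_pixels, int cost_if, int cost_else)
{
    int any_if = 0, any_else = 0;
    for (size_t i = 0; i < n_pixels; ++i) {
        if (takes_if[i]) any_if = 1;
        else             any_else = 1;
    }
    /* Every pixel in the block pays for each path that any pixel needs. */
    int per_pixel = (any_if ? cost_if : 0) + (any_else ? cost_else : 0);
    return per_pixel * (int)n_pixels;
}
```

In this model a single pixel that disagrees with its neighbours makes the whole block pay for both paths, so a smaller block is less likely to straddle both outcomes.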

I think newer hardware will perform better on conditionals (e.g. by using smaller blocks and other optimisations).

Please remember that current GPUs aren’t comparable with CPUs: for example, as far as I have read, they have no mechanism for predicting or caching branch outcomes.

It is curious that the NV_fragment_program2 example in the NVIDIA SDK uses the following code for the fragment program:

...
PARAM nlights = program.local[0]; # number of lights
...
REP nlights;
    # get light position and color from texture
    TEXC lightPos, lightIndex, texture[0], RECT;  # write condition code
    TEX lightColor, lightIndex, texture[1], RECT;
    # call correct lighting function based on w component of position
    IF EQ.w;          # lightPos.w == 0.0
      CAL dirlight;
    ELSE;
      CAL pointlight;
    ENDIF;
    ADD lightIndex, lightIndex, 1.0; # increment loop counter
ENDREP;
...

That is basically doing something like:

for(int i=0; i<nlights; ++i){
  vec4 vcPos=textureRect(0, gl_TexCoord[i].xy);
  vec4 vcColor=textureRect(1, gl_TexCoord[i].xy);
  if(vcPos.w==0.0)
    DirLight();
  else
    PointLight();
}

It is using an index to loop through the texture coordinate sets.
So maybe the limitation is currently in GLSL.

Hope this helps.

Thank you very much for your help. I found ways to avoid using conditionals and loops, but this requires writing many more shaders.
I’ll also look at the documents you pointed to.

Btw, my graphics card is a GeForce FX 5600.

But as I said, for loops are much faster than if statements. I still don’t know why, since a for performs an if on each iteration (to check that the condition is still true).

Anyway thanks again.

What I read somewhere some time ago regarding dynamic branching is that the driver uses the CPU to handle branches (can someone confirm this?). If that is the case, then loops would definitely be faster, since the driver can just unroll the entire loop (remember the loop bound must be a constant) and no data would have to be sent over the AGP bus to do the comparison on the CPU.

Also, for cards that support dynamic branching, I highly recommend using the “discard” instruction, because it can really help.

If you write the above NV_fragment_program2 example in GLSL, the current compiler always inserts an unnecessary dynamic branch (BRK) into the loop. Loops with only REP and without the dynamic BRK are much faster. How can I make the GLSL compiler generate a loop without BRK?

Originally posted by jide:
“Btw, my graphics card is a GeForce FX 5600. But as I said, for loops are much faster than if statements. I still don’t know why, since a for performs an if on each iteration (to check that the condition is still true).”

On your hardware, if we are talking about fragment shaders, branching is not supported. That means:

  • Ifs (branches): the whole shader is evaluated even if the condition does not hold. The body of the ‘if’ is executed even when the condition is false (and its result is then possibly multiplied by the result of the condition).
    For example, if you have
    if(xx){
    code1;
    }
    else{
    code2;
    }
    both code1 and code2 are executed.
    This is why it will always be slower. Worse, the compiler has to add code to ‘eliminate’ the effect of the branch whose condition is not true. Look at the generated pseudo-assembler to see what I mean.
  • Loops: they are unrolled, which is why they are faster.
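
The "both sides are executed" behaviour from the list above can be sketched on the CPU as follows. This is C, not shader code, and the two helper functions are hypothetical stand-ins for code1 and code2. The point is that both run for every fragment, and the condition only selects which result is kept.

```c
/* Hypothetical stand-ins for code1 and code2 from the list above. */
static float code1(float x) { return x * 2.0f; }
static float code2(float x) { return x + 1.0f; }

/* Models hardware without branching: both paths are computed for every
   fragment, then the condition picks one result through a 0/1 weight. */
float shade_predicated(int cond, float x)
{
    float r1 = code1(x);              /* always executed */
    float r2 = code2(x);              /* always executed */
    float mask = cond ? 1.0f : 0.0f;  /* the condition becomes a weight */
    return mask * r1 + (1.0f - mask) * r2;
}
```

The cost is roughly that of code1 plus code2 plus the final select, whatever the condition is, while the visible result is still the one from the matching side.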

Hope this helps.

Cab, I’ve just tested what you said, and the result is that my fragment shaders don’t treat all the if statements as true:

if (a == 1)
   color = ...
else if (a == 2)
   color = ...
else
   color = ...

Only the first branch is taken (which is correct): the color is the one assigned in the first branch, not the last one. So maybe that depends on the hardware/drivers.

What Cab meant was that the GPU will execute the statements inside every if block, i.e. a temporary fragment color is calculated for each code path, and the final color of the fragment is the one that came from the correct if block. Hope you understand.

That’s fine now, I misunderstood what he said.
