Yes. But I think those release notes are for the first versions of the 6x drivers. With those first versions, if you run the GLSL Shading Language Demo (from 3DLabs: http://developer.3dlabs.com/openGL2/downloads/index.htm) you will notice that the Mandelbrot and Julia shaders run very slowly (they use a loop based on a uniform value). But the latest 6x.yy drivers and the 70.xx drivers implement branching and looping. This is part of the ‘object assembly’ code generated (using nvemulate) for the Julia shader:
...
ADDR R3.x, R0.y, c[2];
[BOLD] LOOP c[3].yxxw; [/BOLD]
SLTR H0.x, R2, c[3].z;
SLTR H0.w, R1.x, c[4].x;
MULXC HC.x, H0.w, H0;
[BOLD] BRK (EQ.x); [/BOLD]
MOVR R2.x, R3;
MOVR R1.w, R4.x;
MULR R0.w, R1, R2.x;
...
MADR R2.x, R1.w, R1.w, R0.w;
[BOLD] ENDLOOP; [/BOLD]
MULR R0.w, R1.x, c[7].y;
...
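For readers who have not seen the demo shader, here is a rough CPU-side sketch in Python of the kind of loop that LOOP / BRK / ENDLOOP implements. It is not the 3DLabs shader source: the function name, the parameter names and the escape radius of 2 are my assumptions. The only point is that the trip count comes from a run-time value (the uniform) and that the loop can exit early.

```python
def julia_iterations(zr, zi, cr, ci, max_iterations):
    """Escape-time iteration z = z*z + c.

    max_iterations stands in for the uniform that bounds the loop
    (hypothetical name, not taken from the demo source).
    """
    count = 0
    for _ in range(max_iterations):          # corresponds to LOOP
        r2 = zr * zr + zi * zi
        if r2 >= 4.0:                        # early exit, corresponds to BRK
            break                            # (escape radius 2 is an assumption)
        # z = z*z + c, written out for real and imaginary parts
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
        count += 1
    return count                             # execution resumes after ENDLOOP


print(julia_iterations(1.5, 0.0, 0.0, 0.0, 50))
```

Without real looping support the driver has to handle a loop like this some other way, which would fit the slow Mandelbrot and Julia shaders on the first drivers.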
I have also noted that, in the test I made, the generated ‘object assembly’ shader does not include the branch (inside the if statement it has the half-light vector normalization, the access to the decal texture, the specular computation and the final sum/multiply of all the calculated and uniform lighting parameters). It computes the whole shader and, when doing the final sum, it multiplies the result conditionally to get one value or the other.
SGTRC HC.w, R1, c[0].z;
...
MADR result.color.xyz(NE.w), R0, R1, R2;
This is why it is slower.
It is strange. I would expect it to be faster with a real branch.
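To make the difference concrete, here is a small Python sketch of the two strategies. It is only an illustration: `expensive_lighting`, the `condition_value > 0.0` test and the colour values are made up, and the `calls` list exists only to count how often the expensive part runs.

```python
def expensive_lighting(calls):
    # Stand-in for the body of the if: normalization, texture access, specular.
    calls.append(1)
    return 0.8


def shade_predicated(condition_value, ambient, calls):
    """What the generated code appears to do: compute everything, then select."""
    lit = ambient + expensive_lighting(calls)        # always executed
    mask = 1.0 if condition_value > 0.0 else 0.0     # like setting a condition code
    return mask * lit + (1.0 - mask) * ambient       # conditional result


def shade_branching(condition_value, ambient, calls):
    """What a real branch would do: skip the expensive part when not needed."""
    if condition_value > 0.0:
        return ambient + expensive_lighting(calls)
    return ambient


predicated_calls, branching_calls = [], []
for value in (-1.0, 0.5):
    shade_predicated(value, 0.1, predicated_calls)
    shade_branching(value, 0.1, branching_calls)
print(len(predicated_calls), len(branching_calls))
```

Both functions return the same colour, but the predicated version pays for the lighting on every fragment, while the branching version pays only when the condition holds.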
Another surprise is that it doesn’t use the NV_fragment_program_2 normalize instruction. (I have three normalize calls inside my shader). It is still using the DP3/RSQ pair to normalize. Maybe the ‘unified compiler’ will convert them…
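For reference, this is the arithmetic that the DP3/RSQ sequence performs, as a Python sketch (the function name is mine, and the final multiply is written out per component). A dedicated normalize instruction would compute the same result in one step.

```python
import math


def normalize_dp3_rsq(v):
    """Normalize a 3-component vector the way a DP3/RSQ/MUL sequence does."""
    d = v[0] * v[0] + v[1] * v[1] + v[2] * v[2]    # DP3: dot(v, v)
    inv_len = 1.0 / math.sqrt(d)                   # RSQ: reciprocal square root
    return (v[0] * inv_len, v[1] * inv_len, v[2] * inv_len)  # MUL: scale v


print(normalize_dp3_rsq((3.0, 0.0, 4.0)))
```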