I want to iterate over the lights in the fragment shader, and a static REP loop seems to be a good solution. The other idea would be to use a #define and recompile the shader for each light count, but I’ve read that with ps30 it’s possible to create uber shaders.
The problem is that the GLSL compiler (GF6800, driver version 76.41) always creates a LOOP/ENDLOOP loop with the maximum iteration count of 255 and a break instruction. What should a for loop look like in order to use static branching? Thanks.
uniform int lightcount;
for(int i=0;i<lightcount;i++)
{
}
That’s up to nVidia’s compiler. You’ve done all you can to let it know what kind of loop it should build; it’s now up to them to make their compiler better.
Thanks for your reply. I had hoped there was a special pattern for this kind of loop. There is a noticeable speed difference between “#define lightcount” and “uniform lightcount” with full dynamic branching. In D3D there are constant integer registers for loops even with ps20, and the driver unrolls the loop and recompiles the shader internally. Hopefully this behaviour will also be implemented for uniform variables in GLSL, because manually compiling and linking a GLSL shader with “#define lightcount X” is very slow.
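One way to soften the cost of the “#define lightcount X” route is to compile each variant only once and reuse it afterwards. Below is a minimal sketch of that idea, not anything the driver or GLSL provides: the names withLightCount and ShaderVariantCache are made up for illustration, the macro is spelled LIGHTCOUNT by assumption, and the actual compile/link step is passed in as a callback so the sketch does not depend on a GL context.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: returns the shader source with "#define LIGHTCOUNT n"
// injected. If the source starts with a #version directive, that directive
// has to stay first, so the define is placed on the line after it.
std::string withLightCount(const std::string& source, int lightCount) {
    const std::string define =
        "#define LIGHTCOUNT " + std::to_string(lightCount) + "\n";
    if (source.compare(0, 8, "#version") == 0) {
        const std::size_t eol = source.find('\n');
        if (eol == std::string::npos) {
            return source + "\n" + define;
        }
        return source.substr(0, eol + 1) + define + source.substr(eol + 1);
    }
    return define + source;
}

// Hypothetical cache: builds one program per light count on first use and
// returns the stored handle on every later request for the same count.
class ShaderVariantCache {
public:
    // The callback is assumed to compile and link the given source and
    // return a program handle (in a real application it would wrap the
    // usual GL shader compile/link calls).
    using CompileFn = std::function<unsigned int(const std::string&)>;

    ShaderVariantCache(std::string source, CompileFn compile)
        : source_(std::move(source)), compile_(std::move(compile)) {}

    unsigned int get(int lightCount) {
        const auto it = programs_.find(lightCount);
        if (it != programs_.end()) {
            return it->second;  // already built, no recompilation
        }
        const unsigned int program =
            compile_(withLightCount(source_, lightCount));
        programs_.emplace(lightCount, program);
        return program;
    }

private:
    std::string source_;
    CompileFn compile_;
    std::map<int, unsigned int> programs_;
};
```

With something like this, the slow compile and link happens once per distinct light count (for example while loading), and switching counts at draw time is just a lookup. It does not remove the compile cost itself, it only avoids paying it repeatedly.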
I did some tests and it seems that LOOP and REP are nearly equally fast when the index register is not used. It’s surprising how many instructions can be executed without too much performance loss if the loop iteration count is fixed. In fact, it is the unnecessary dynamic branching BRK instruction that slows the shader down several times over. Does anyone know whether this will be fixed in one of the next driver releases?
That could have been the special pattern, but it generated nearly the same code. Only the SLTRC is replaced with SGERC. All fragments are running through the same path of the shader.
I’ve also checked the output of the Cg compiler with profile fp40 enabled, and it also uses the BRK instruction.
But the DirectX shader compiler (fxc.exe) generates the correct result using profile ps_3_0, even if the loop starts at zero.
The parameter lightcount is assigned to i0.
I compiled only the pixel shader, without an .fx file. The shader itself is meaningless, but the interesting thing is the loop. Here is the output:
A better GLSL implementation could detect that the result depends only on uniforms and constants and run this loop on the CPU, reducing the shader to one mov instruction.
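Until a compiler does that, the same folding can be done by hand on the application side. The sketch below assumes, purely for illustration, that the loop body only sums per-light colors held in uniforms (the test shader itself is not reproduced here); Color and foldLightColors are made-up names.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Color = std::array<float, 4>;

// Hypothetical CPU-side fold: sums the first lightCount colors, which is
// what a uniform-only loop of the assumed form would compute identically
// for every fragment.
Color foldLightColors(const std::vector<Color>& lightColors, int lightCount) {
    Color sum = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0;
         i < lightCount && static_cast<std::size_t>(i) < lightColors.size();
         ++i) {
        for (std::size_t c = 0; c < sum.size(); ++c) {
            sum[c] += lightColors[static_cast<std::size_t>(i)][c];
        }
    }
    // The result would then be uploaded as a single vec4 uniform,
    // so the fragment shader only has to move it to the output.
    return sum;
}
```

This only works when nothing inside the loop varies per fragment; as soon as the body reads a varying or a texture, the loop has to stay in the shader.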