Nvidia loop unrolling behavior

Hi,
Currently I'm trying to do some math, namely a weighted addition of very long vectors.
The GLSL fragment shader code for this:

  
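// Defined elsewhere (not shown in the post): the macros TEX_TILE_WIDTH,
// TEX_TILE_HEIGHT, NUM_COEFFS, W and WH, and the uniforms textures[],
// factors[] and scaleBias.
// texRECT and the integer % operator are accepted by Nvidia's compiler but are
// not standard GLSL 1.10 (which uses texture2DRect from ARB_texture_rectangle).
void main (void)
{
	int i;
	vec4 accum = vec4(0, 0, 0, 0);
	vec2 address = gl_TexCoord[0].xy;
	const vec2 xOffs = vec2(TEX_TILE_WIDTH, 0);
	const vec2 yOffs = vec2(0, TEX_TILE_HEIGHT);

	// start with the mean
	accum = texRECT(textures[0], address);

	for (i = 1; i < NUM_COEFFS + 1; i++)
	{
		// tile position of coefficient i inside its texture
		address = gl_TexCoord[0].xy
				+ float((i % WH) % W) * xOffs
				+ float(((i % WH) - ((i % WH) % W)) / W) * yOffs;

		int index = (i - (i % WH)) / WH;   // which texture holds tile i
		int In = i - 1;                    // 0-based weight index
		int cIndex = (In - (In % 4)) / 4;  // which vec4 of factors[]

		float val;
		if      ((In % 4) == 0) { val = factors[cIndex].x; }
		else if ((In % 4) == 1) { val = factors[cIndex].y; }
		else if ((In % 4) == 2) { val = factors[cIndex].z; }
		else                    { val = factors[cIndex].w; }

		// sampler arrays may only be indexed with a constant, so this
		// line relies on the compiler unrolling the loop
		accum += texRECT(textures[index], address) * val;
	}
	gl_FragColor = accum * scaleBias.x + scaleBias.y;
}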

For this to work on NV30, the driver has to unroll the loop itself, because NV30 fragment programs have no loop support. The problem, though, is that the Nvidia driver refuses to unroll loops that run more than 256 times, so if NUM_COEFFS is greater than 256, which is pretty common, the shader won't compile.
Is there a solution to this? Ideally I would like to be able to force the compiler to unroll a particular loop; a #pragma (unroll) or something similar would be great.
To the Nvidia guys: are you planning to remove this limitation, or is there a way around it?
Thanks,
Martin Kraus

P.S.: What does one have to do to get a registered developer login, apart from filling in and sending the form? I get no answer.

You hit a fragment program instruction limit.

  • Reduce the code in your loop. It looks like it has potential for optimizations. (You didn't say what WH, W and factors are, so I can't help more.)
  • Separate the calculation into multiple drawing passes.
  • Try newer drivers. Maybe newer compilers produce shorter code.
  • If that still doesn't help, make friends with NV40.

Hi,
No, I don't think I am hitting hardware limits here. Looking at the asm output from the Nvidia driver, I get 3 instructions per iteration (all that addressing can be calculated at compile time), which would make 768 + some instructions; that is still below the limit of 1024.
So I think I am hitting a compiler limit. The info log says:

    (35) : warning C7012: not unrolling loop that executes 272 times since maximum loop unroll count is 256
    (35) : error C5013: profile does not support "for" statements
    39 lines, 1 warnings, 1 errors.

By the way, the hand-unrolled NV_fragment_program version of this works just fine, so this is definitely a compiler limit I'm hitting here.
And I will make friends with NV40 very soon, but for now NV30 can do what I want; it's just the GLSL compiler that stands in the way.
Bye,
Martin Kraus

OK, that sounds different. ;)
You could send a bug report to developer relations, then.

Just curious, how do you get the asm output from the Nvidia compiler?

Hi,
there's a registry key you can set so that asm files for fragment and vertex programs are written to your current directory (the trick was posted here on the board). Sadly, though, it doesn't seem to work for me anymore with 61.34.

It works on my machine with 61.34 and 61.71, so I'll post it again:

  
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\Global\OpenGL]

[HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\Global\OpenGL\Debug]
"WriteProgramObjectAssembly"=dword:00000001

yooyo
