Temporary Register Limit

hey all, I’m currently working on a program, using the OpenGL Shading Language to perform some scientific modeling on the graphics card of my computer. In a recent effort to increase the amount of data passed to the shader program, I added additional textures for reading and writing, and altered the shader source code as necessary, including adding in an extra call for two functions in the fragment shader source code. I’ve checked the max size of my textures possible, and I’m at the limit, so this was the only option available.

Unfortunately, when I made these alterations, I received the error message:

“Temporary register limit of 32 exceeded; 57 registers needed to compile program”

I don’t have much experience with this aspect of programming, being mostly a self taught programmer, like how a computer chooses variables for placement in temporary registers. Could someone please explain what this problem is exactly, and any suggestions for fixing this problem?

I’m running the shader program on an NVIDIA 7600 GS with the latest driver version

Thank you very much, any help is very much appreciated.

The problem is that there is no temporary storage in memory for values in a fragment shader on Geforce 7. So all working values have to be stored in registers.

So it seems that at some stage you have 57 values of “active” data. (So you probably have a very complex shader)

If you can post the shader here someone might be able to suggest a way to re-arrange it to use less registers. (doing combining operations earlier etc)

Another suggestion is to run your shader through the Nvidia Cg compiler (or dump the assembly output at runtime) to view how registers are being allocated.

OK here is the shader source code:

GLcharARB* lorentz_vert_source =
“void main(void)”
“{”
" gl_TexCoord[0] = gl_MultiTexCoord0;"
" gl_TexCoord[1] = gl_MultiTexCoord1;"
" gl_TexCoord[2] = gl_MultiTexCoord2;"
" gl_TexCoord[3] = gl_MultiTexCoord3;"
" gl_Position = ftransform();"
“}”;

GLcharARB* lorentz_frag_source =

“uniform float time;”
“uniform float time_step;”
“uniform float Bo;”
“uniform float Re;”
“uniform float xscale;”

“uniform sampler2D velocity_mass;”
“uniform sampler2D position_charge;”
“uniform sampler2D vel_mass;”
“uniform sampler2D pos_charg;”

“float ERRTOL =0.08;”
“float ERRT =0.05;”

“uniform int particle_tex_height;”
“uniform int particle_tex_width;”
"
“//1
“vec3 ElectricField(vec3 position, int x){”
" const float one_over_four_pi_eo = 0.28;”
" int i, j;"
" vec2 particle_index;"
" vec4 particle_position;"
" vec3 direction;"
" float magnitude;"
" vec3 electric_sum = vec3(0.0);"

" for(i=0; i<int(particle_tex_height); i++) {"
" for(j=0; j<int(particle_tex_width); j++) {"

" particle_index.x = (float(j)+0.5) / float(particle_tex_width);"
" particle_index.y = (float(i)+0.5) / float(particle_tex_height);"
“if(x==0){”
" particle_position = texture2D(position_charge, particle_index);"
“} if(x==1){”
“particle_position = texture2D(pos_charg, particle_index);”
“}”

" direction = position - particle_position.xyz;"
" magnitude = length(direction);"

" if (magnitude > 1e-9) {"
" electric_sum += one_over_four_pi_eo * particle_position.w * direction / pow(magnitude, 2.0);"
" }"

" }"
" }"
" return electric_sum;"

“}”
"
" //2

“float EllipticFirstKind(in float x, in float y, in float z)”
“{”
“float alamb, ave, delx, dely, delz, sqrtx, sqrty,sqrtz,xt,yt,zt, ea, eb;”
“xt=x;”
“yt=y;”
“zt=z;”
“do {”
“sqrtx = sqrt(xt);”
“sqrty = sqrt(yt);”
“sqrtz = sqrt(zt);”
“alamb = sqrtx*(sqrty+sqrtz)+sqrtysqrtz;"
"xt = 0.25
(xt+alamb);”
“yt = 0.25*(yt+alamb);”
“zt = 0.25*(zt+alamb);”
“ave = (1.0/3.0)(xt+yt+zt);"
“delx = (ave-xt)/ave;”
“dely = (ave-yt)/ave;”
“delz = (ave-zt)/ave;”
“} while (max(max(abs(delx),abs(dely)),abs(delz)) > ERRTOL);”
"ea=delx
dely-delzdelz;"
"eb=delx
dely*delz;”
“return(1.0+((1.0/24.0)*ea-(0.1)-(3.0/44.0)eb)ea+(1.0/14.0)eb)/sqrt(ave);"
“}”
"
" //3
“float LEFK (in float ak)”
“{”
“float phi = 3.1415/2.0;”
“float s = sin(phi);”
"return s
EllipticFirstKind(cos(phi)cos(phi),(1.0-sak)
(1.0+s
ak),1.0);”
“}”
"
" //4

“float EllipticSecondKind(in float x, in float y, in float z)”
“{”

"float alamb, ave, delx, dely, delz, sqrtx, sqrty,sqrtz,xt,yt,zt,fac,sum, ea, eb,ec,ed,ee;"
"xt=x;"
"yt=y;"
"zt=z;"
"sum=0.0;"
"fac=1.0;"
"do {"
	"sqrtx = sqrt(xt);"
	"sqrty = sqrt(yt);"
	"sqrtz = sqrt(zt);"
	"alamb = sqrtx*(sqrty+sqrtz)+sqrty*sqrtz;"
	"sum += fac/(sqrtz*(zt+alamb));"
	"fac=0.25*fac;"
	"xt = 0.25*(xt+alamb);"
	"yt = 0.25*(yt+alamb);"
	"zt = 0.25*(zt+alamb);"
	"ave = 0.2*(xt+yt+3.0*zt);"
	"delx = (ave-xt)/ave;"
	"dely = (ave-yt)/ave;"
	"delz = (ave-zt)/ave;"
"} while (max(max(abs(delx),abs(dely)),abs(delz)) &gt; ERRT);"
"ea = delx*dely;"
"eb = delz*delz;"
"ec = ea-eb;"
"ed = ea-6.0*eb;"
"ee = ed+ec+ec;"

"return 3.0*sum+fac*(1.0+ed*(-(3.0/14.0)+(0.25*(9.0/22.0))*ed-(1.5*(3.0/26.0))*delz*ee)+delz*((1.0/6.0)*ee+delz*(-(9.0/22.0)*ec+delz*(3.0/26.0)*ea)))/(ave*sqrt(ave));"

“}”
"
" //5

////Preforms Legendre Transformation to K space of Elliptic Integral Second Kind////////////////////////////
“float LESK (in float ak)”
“{”
“float cc,q,phi;”
“phi = 3.1415/2.0;”
“float s = sin(phi);”
“cc = cos(phi)cos(phi);"
"q = (1.0-s
ak)(1.0+sak);”
“return s*(EllipticFirstKind(cc,q,1.0)-((sak)(s*ak))*EllipticSecondKind(cc,q,1.0)/3.0);”
“}”
"
" //6

//CALCULATES MODULUS - SEE SITE ON OFF AXIS FIELD STRENGTH FOR EQUATION
“float Kv( float r, float a, float z)”
“{”
“return sqrt((4.0ra)/((zz)+(rr)+(aa)+(2.0ra)));"
“}”
"
" //7
“vec3 MagneticFieldRing(float x, float y, float z, float a, float I, float magz_pos)”
“{”
“float brr;”
“float bzz;”
“float posx = x;”
“float posy = y;”
“float posz = z-magz_pos;”
"float r = sqrt(((posx
posx)+(posy*posy)));”
“float theta = atan(posy/posx);”

"float ak = Kv(r,a,posz);"
"brr = (I*posz/(2.*3.1415*r*sqrt((((a*a)+(r*r)+(2.*r*a))+posz*posz))))*(-LEFK(ak)+((a*a+r*r+posz*posz)/(((a*a)+(r*r)-(2.*a*r))+posz*posz))*LESK(ak));"
"bzz = ((I)/(2.*3.1415*sqrt((a*a)+(r*r)+(2.*r*a)+(posz*posz))))*(LEFK(ak)+((a*a-r*r-posz*posz)/(((a*a)+(r*r)-(2.*a*r))+posz*posz))*LESK(ak));"
"float b_x = brr*cos(theta);"
"float b_y = brr*sin(theta);"
"float b_z = bzz;"

"return vec3(b_x, b_y, b_z);"

“}”
"
" //8

“uniform int number_of_magnets;”
“uniform float magnet_current[6];”
“uniform float magnet_zpos[6];”
“uniform float magnet_radius[6];”
“vec3 MagneticField(float time, vec3 position)”
“{”
//DOD - this function is not correct
“float posx = position.x;”
“float posy = position.y;”
“float posz = position.z;”
“int j;”
//“float b_mag;”

"vec3 b_temp = vec3(0.0);"
"vec3 b = vec3(0.0);"


"for (j=0; j &lt; 3; j++)"
"{"
"b = MagneticFieldRing(posx, posy, posz,magnet_radius[j],magnet_current[j],magnet_zpos[j]);"

"b_temp = b_temp + b;"

“}”

“return b_temp;”
“}”
"
" //10
“vec3 Lorentz(in float charge, in float mass, in float time, in vec3 position, in vec3 velocity, in vec3 electric_field)”
“{”
" float chrg_div_mass = charge/mass;"
" vec3 fields = electric_field + cross(velocity, MagneticField(time, position));"
" return chrg_div_mass * fields;"
“}”
"
" // 10
“vec3 Euler(vec3 y, vec3 slope, float delta)”
“{”
" return y + slope * delta;"
“}”
"
" //11

"
"
“void RungeKutta(float charge, float mass, float time, float time_step, in vec3 position, in vec3 velocity, in vec3 electric_field, out vec4 computed_vel, out vec4 computed_pos)”
“{”

" float half_step = 0.5 * time_step;"
" float half_time = time + half_step;"
" float full_time = time + time_step;"

" vec3 k1, k2, k3, k4;"
" vec3 ks1, ks2, ks3, ks4;"
" vec3 v1, v2, v3, v4;"
" vec3 p1, p2, p3, p4;"
" v1 = velocity;"
" p1 = position;"
" k1 = Lorentz(charge, mass, time, p1, v1, electric_field);"
" ks1 = v1*xscale;"
" v2 = Euler(velocity, k1, half_step);"
" p2 = Euler(position, ks1, half_step);"
" k2 = Lorentz(charge, mass, half_time, p2, v2,electric_field);"
" ks2 = v2*xscale;"
" v3 = Euler(velocity, k2, half_step);"
" p3 = Euler(position, ks2, half_step);"
" k3 = Lorentz(charge, mass, half_time, p3, v3,electric_field);"
" ks3 = v3*xscale;"
" v4 = Euler(velocity, k3, time_step);"
" p4 = Euler(position, ks3, time_step);"
" k4 = Lorentz(charge, mass, full_time, p4, v4,electric_field);"
" ks4 = v4*xscale;"
" computed_vel.xyz = velocity + (time_step / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4); "
" computed_vel.w = mass;"
" computed_pos.xyz = position + (time_step / 6.0) * (ks1 + 2.0 * ks2 + 2.0 * ks3 + ks4); "
" computed_pos.w = charge;"

“}”

"
" //12

“void main(void)”
“{”
" int x;"
" vec4 computed_velocity[2];"
" vec4 computed_position[2];"
" vec4 space[2];"
" vec4 momentum[2];"
" float mass[2];"
" float charge[2];"
" momentum[0] = texture2D(velocity_mass, gl_TexCoord[0].xy);"
" space[0] = texture2D(position_charge, gl_TexCoord[1].xy);"
" momentum[1] = texture2D(vel_mass, gl_TexCoord[2].xy);"
" space[1] = texture2D(pos_charg, gl_TexCoord[3].xy);"
" mass[0] = momentum[0].a;"
" mass[1] = momentum[1].a;"
" charge[0] = space[0].a;"
" charge[1] = space[1].a;"
" for(x=0;x<2;x++){"
“vec3 electric_field = ElectricField(space[x].xyz, x);”
“RungeKutta(charge[x], mass[x], time, time_step, space[x].xyz, momentum[x].xyz, electric_field, computed_velocity[x], computed_position[x]);”
“}”
//LOOP REDUCES TEMP REGISTER USE FROM 57 TO 41, ACCOUNTING FOR ALL 3 MAGNETS
" gl_FragData[0]= computed_velocity[0];"
" gl_FragData[1] = computed_position[0];"
" gl_FragData[2] = computed_velocity[1];"
" gl_FragData[3] = computed_velocity[1];"
“}”;

Anybody have any suggestions? Thank you

That nested for() loop looks bad.

I have run into problems on an ATI where there were “too many instructions”. It turned out that the loops were being unrolled by the compiler.

Perhaps the for loops are causing the compiler problems for you.
Especially if they are impossible to unroll.

" for(i=0; i<int(particle_tex_height); i++) {"
" for(j=0; j<int(particle_tex_width); j++) {"

Unless particle_tex* is relatively small (< 20, maybe), there’s almost certainly a better way to do this. Reductions can replace many instances of looping over two dimensions!

It also looks like you’re probably trying to do too much in a single fragment program. Fragment programs are meant to be small—perhaps 30 lines or less. There’s no reason you can’t have a longer one----aside from the issue you already found and a few others----but that’s not the concept they’re designed around.

See if you can break up the program into smaller stages. Maybe pre-compute some things (like the result of ElectricField?) and then bring the results together for a final result.

This topic was automatically closed 183 days after the last reply. New replies are no longer allowed.