Available number of temporary registers exceeded

Tried in RenderMonkey, on a Radeon 9800:

Create a new project with GLSL

Vertex program:

uniform vec4 LightPositionLocal; // RenderMonkey Semantic : ViewPosition
uniform vec4 EyePositionLocal; // RenderMonkey Semantic : ViewPosition
uniform vec4 diffuse;
uniform vec4 ambient;
varying vec4 angles;
varying float dh;

void main(void)
{
    const vec3 Tangent = vec3(0.0, 0.0, 1.0);

    // calculating the 2 Tangent vectors to the surface
    vec3 T1 = vec3(normalize(cross(gl_Normal, Tangent)));
    vec3 T2 = vec3(normalize(cross(gl_Normal, T1)));

    // vector from vertex to eye, normalized
    vec3 V = normalize(vec3(EyePositionLocal) - vec3(gl_Position));
    vec3 L = normalize(vec3(LightPositionLocal) - vec3(gl_Position));
    vec3 H = normalize(L + V);

    // compute some values needed
    float s = dot(H, gl_Normal);
    vec3 DiP = normalize(H - gl_Normal * s);

    // Compute varying values
    dh = tan(acos(s));
    angles = vec4(dot(L, gl_Normal), dot(V, gl_Normal), dot(T1, DiP), dot(T2, DiP));

    // transform into homogeneous-clip space
    gl_Position = ftransform();
}

Fragment program:

varying vec4 angles;
varying float dh;

void main(void)
{
    const float pi = 3.141592653589793238462643383279502884197169399375105820974944592;
    const vec4 Parameters = vec4(1.5, 0.8, 1.3, 0.15);

    float wardcol = (Parameters.x / pi +
                     Parameters.y * (1.0 / sqrt(angles.x + angles.y))
                     * exp(-(dh * dh) * ((angles.z * angles.z) / (Parameters.z * Parameters.z) + (angles.w * angles.w) / (Parameters.w * Parameters.w)))
                     / (4.0 * pi * Parameters.z * Parameters.w)) * (0.4 + max(0.0, angles.x));

    gl_FragColor = vec4(wardcol, wardcol, wardcol, 1.0);
}

When running in RenderMonkey on ATI :

OpenGL Preview Window: Compiling vertex shader API(OpenGL) /…/Ward VS (GL)/Pass 0/Vertex Program/ … success
OpenGL Preview Window: Compiling fragment shader API(OpenGL) /…/Ward VS (GL)/Pass 0/Fragment Program/ … success
OpenGL Preview Window: Linking program … success
Link successful. The GLSL vertex shader will run in software - available number of temporary registers exceeded. The GLSL fragment shader will run in hardware.
This effect will be rendered using software rasterizer - press space bar to update rendering.

Apparently, the problem is in the vertex program.
Any ideas?

If you try the same code in HLSL (but all in the vertex program):

struct VS_OUTPUT {
    float4 Position : POSITION;
    float3 Color0 : COLOR0;
};

const float4x4 ModelViewProjectionMatrix;
const float4 LightPositionLocal;
const float4 EyePositionLocal;
const float4 diffuse;
const float4 ambient;

VS_OUTPUT main(
    float4 gl_Position : POSITION,
    float3 gl_Normal : NORMAL
    )
{
    VS_OUTPUT OUT;

    const float3 Tangent = {0, 0, 1};
    const float4 Parameters = {1.5, 0.8, 1.3, 0.15};
    const float pi = 3.141592653589793238462643383279502884197169399375105820974944592;

    // calculating the 2 Tangent vectors to the surface
    float3 T1 = normalize(cross(gl_Normal, Tangent));
    float3 T2 = normalize(cross(gl_Normal, T1));

    // vector from vertex to eye, normalized
    float3 V = normalize(EyePositionLocal - gl_Position);
    float3 L = normalize(LightPositionLocal - gl_Position);
    float3 H = normalize(L + V);

    // compute some values needed
    float s = dot(H, gl_Normal);
    float3 DiP = normalize(H - gl_Normal * s);

    // Compute varying values
    float dh = tan(acos(s));
    float4 angles = float4(dot(L, gl_Normal), dot(V, gl_Normal), dot(T1, DiP), dot(T2, DiP));

    // here we compute the ward shading value
    float wardcol = (Parameters.x / pi +
                     Parameters.y * (1.0 / sqrt(angles.x + angles.y))
                     * exp(-(dh * dh) * ((angles.z * angles.z) / (Parameters.z * Parameters.z) + (angles.w * angles.w) / (Parameters.w * Parameters.w)))
                     / (4.0 * pi * Parameters.z * Parameters.w)) * (0.4 + max(0.0, angles.x));

    // transform into homogeneous-clip space
    OUT.Color0 = wardcol;
    OUT.Position = mul(ModelViewProjectionMatrix, gl_Position);
    return OUT;
}

float4 main(float4 inDiffuse : COLOR0) : COLOR0
{
    // Just display the incoming color:
    return inDiffuse;
}

It works, even in profile 1.1 …

Move gl_Position = ftransform() to the top of the vertex shader and it will work just fine. It’s preferable from a performance point of view to write the position early anyway, so there’s no reason to put that at the end of the vertex shader.
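Applied to the vertex program from the first post, that suggestion amounts to something like this (a sketch; the remaining lighting code stays as it was):

```glsl
void main(void)
{
    // write the clip-space position first, so the hardware can begin
    // culling/clipping work before the rest of the shader runs
    gl_Position = ftransform();

    const vec3 Tangent = vec3(0.0, 0.0, 1.0);
    vec3 T1 = vec3(normalize(cross(gl_Normal, Tangent)));
    vec3 T2 = vec3(normalize(cross(gl_Normal, T1)));

    // ... the rest of the original vertex shader, unchanged ...
}
```

Note that if the V and L computations are meant to use the input vertex position, gl_Vertex is the defined way to read it; reading gl_Position before it has been written is not.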

Originally posted by Humus:
It’s preferable from a performance point of view to write position early anyway
Why?

– Tom

Doesn’t the GLSL compiler re-order that automatically?

I know that there are some high-level shading optimisation tips, like writing (N - 0.5) * 2.0 instead of N * 2.0 - 1.0 (the first version generates much better code than the second in HLSL).

Also, when using ftransform(), I guess it sets a similar flag to ‘ARB_position_invariant’.

Anyway, I will check whether the shader now compiles and links correctly.

Originally posted by Tom Nuydens:
[b]Why?

– Tom[/b]
This way the hardware can throw away a triangle early if it’s outside the frustum or backfacing (if culling is enabled). There are probably other operations that can take place early as well. I don’t know exactly what optimizations current hardware applies, but the general advice is to write position early.

Originally posted by execom_rt:
Doesn’t the GLSL compiler re-order that automatically ?
It could, but it’s not guaranteed to happen.

Ok, apparently, moving the ftransform() to the beginning of the function fixes the problem.

A weird bug, apparently; I’ve forwarded the problem to devrel@ati.com, just in case.

Sadly, RenderMonkey doesn’t have a performance profiler; it would be interesting to see the difference.

Originally posted by execom_rt:
const float pi = 3.141592653589793238462643383279502884197169399375105820974944592;

:eek: An exaggeration due to copy-paste?

Future-proof shader for FP256 hardware. :stuck_out_tongue:

This topic was automatically closed 183 days after the last reply. New replies are no longer allowed.