Bizarre bug?

Certain things in my fragment shader make my program look like it’s limited by the vertex shader!
First, here’s the code:
vertex shader:

const float nrmTexCoordScale = 512.0;
void main(void)
{
	gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
	gl_TexCoord[0] = vec4(gl_Vertex.x * nrmTexCoordScale, gl_Vertex.y * nrmTexCoordScale, 1.0, 1.0);
} 

and the fragment shader: (the lighting equation is just bs now)

uniform sampler2D	nrmTex;

void main(void)
{
	const vec3 nrm = 2.0 * (texture2D(nrmTex, vec2(gl_TexCoord[0])).rgb - 0.5);
	const vec3 light = vec3(1.0, 0.0, 0.0);
	const float diffuse = dot(nrm, light);
	gl_FragColor = vec4(vec3(diffuse), 1.0);
} 

As is, it runs at about 70fps on my GF5900 in Linux with the 6629 drivers (latest public). It’s completely vertex-bound. Now the bizarre part: if I change the first line of my fragment shader and remove either one of the math operations (the 2.0 * or the - 0.5), the speed skyrockets: about 50x faster.

How could adding one simple math op to the fragment shader make my program vertex-bound?

Originally posted by gmeed:
How could adding one simple math op to the fragment shader make my program vertex-bound?
If you change your fragment shader and you see a performance difference, then it could mean you are fragment bound.

In your fragment shader, you are assigning a non-constant to constant variables. I think this is an error.

Turn on strict GLSL spec.

Originally posted by V-man:
If you change your fragment shader and you see a performance difference, then it could mean you are fragment bound.

In your fragment shader, you are assigning a non-constant to constant variables. I think this is an error.

Turn on strict GLSL spec.

Thanks for the const tip. I changed that.

About your first comment, though.
I think it’s vertex-limited when it’s acting really slow because the window size has no effect on the rendering, but changing the number of vertices while keeping the number of fragments constant does affect the framerate.

I just noticed something else that confuses me even more: if the object is outside the viewport (not visible), the framerate goes back up to what it is when the shaders aren’t doing the buggy-slow thing. This seems to suggest that it’s not vertex-limited, right? But then why does performance scale with the number of vertices when the object is visible?

Another update: 3DLabs’ GLSL verifier doesn’t detect anything wrong with the code after I removed the consts.

… and another update: I get this behavior even when I’m not using a vertex program.

You may want to use nvshaderperf to run an analysis on your program.

I’ve been hacking away at this problem for a while, and now I at least have a better idea of what the problem is, so I’ll try describing it again. In trying to narrow it down I’ve eliminated my vertex shader, and I’m just using the fixed function hardware for that. Also, I’m just using one, basic pixel shader, and nvshaderperf confirms that it’s as simple and fast as it looks.

I assumed that my rendering speed would be the minimum of the speed of drawing the bare unshaded geometry and the speed of shading extremely simplified geometry. This assumption isn’t even close to being true. (And I’ve verified that I’m not AGP-limited. Everything’s on the card.)

Here are the numbers:
simple geometry with normal texture decal’d on: 1000fps.
simple geometry with shader that uses normal tex: 450fps.

full geometry with a normal texture decal’d on: 200fps.
full geometry with shader that uses normal tex: 50fps.

The thing I don’t understand is the last number. The geometry can be drawn at 200fps, and my shader can shade that area at 450fps, so why in the world am I suddenly running at 50fps??! Shouldn’t it be about 200fps? I realize it might be a little slower if the vertex and fragment stages are fighting for on-card bandwidth, but nothing like this.
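As a rough check on the numbers above, here is a small Python sketch (not part of the renderer) that converts the measured frame rates to frame times and compares the observed result against two idealized models. Both models are simplifying assumptions: one treats the vertex and fragment stages as perfectly overlapped, the other as never overlapping at all.

```python
# Measured frame rates from the list above (frames per second).
GEOMETRY_ONLY_FPS = 200.0   # full geometry, normal texture decal'd on
SHADING_ONLY_FPS = 450.0    # simple geometry, shader that uses the normal texture
OBSERVED_FPS = 50.0         # full geometry with the shader


def frame_ms(fps):
    """Convert a frame rate to milliseconds per frame."""
    return 1000.0 / fps


def overlapped_fps(a_fps, b_fps):
    """Idealized pipeline: stages run in parallel, the slower one sets the pace."""
    return min(a_fps, b_fps)


def serialized_fps(a_fps, b_fps):
    """Pessimistic model: stages never overlap, so their frame times add."""
    return 1000.0 / (frame_ms(a_fps) + frame_ms(b_fps))


best = overlapped_fps(GEOMETRY_ONLY_FPS, SHADING_ONLY_FPS)
worst = serialized_fps(GEOMETRY_ONLY_FPS, SHADING_ONLY_FPS)
print(f"overlapped model: {best:.0f} fps ({frame_ms(best):.2f} ms/frame)")
print(f"serialized model: {worst:.0f} fps ({frame_ms(worst):.2f} ms/frame)")
print(f"observed:         {OBSERVED_FPS:.0f} fps ({frame_ms(OBSERVED_FPS):.2f} ms/frame)")
```

Even the pessimistic serialized model predicts roughly 138 fps (about 7.2 ms per frame), so the observed 20 ms per frame is well over twice what either model allows for.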

Another interesting thing is that when I’m drawing the full geometry with my shader, and the object isn’t visible, the fps goes up to the 200 that I get without the shader, so it looks like it really is fragment-limited.

The only explanation I can think of is that some of the math hardware (i.e. not memory) on the card is shared between the vertex and pixel pipelines, but that’s not the case on my NV35.

I’ve duplicated the problem on Windows with the 67.02 drivers.

Originally posted by gmeed:
and my shader can shade that area at 450fps
Are you sure it’s the same area? Maybe there is a lot of overdraw when using the full geometry?

Originally posted by spasi:
Are you sure it’s the same area? Maybe there is a lot of overdraw when using the full geometry?
Yep, the area’s the same. It’s pretty easy to control since I’m just drawing a terrain on a regular grid. The only things that aren’t straight triangle strips are some zero-area triangles that go from the end of one row to the beginning of the next, but those should be dropped.

I’m also viewing it from directly above for benchmarking, so nothing’s in front of anything else.

The GeForce 5000 series is terrible at GLSL performance, because it has to resort to full-precision floating point operations for many shaders. It may be, for example, that your fragment shader change causes such degradation. NVIDIA may have an option to use lower-precision intermediates in the shader; try turning that on.

Originally posted by jwatte:
The GeForce 5000 series is terrible at GLSL performance, because it has to resort to full-precision floating point operations for many shaders. It may be, for example, that your fragment shader change causes such degradation. NVIDIA may have an option to use lower-precision intermediates in the shader; try turning that on.
Thanks for the reply. I tried using half-precision, but that didn’t fix it. If you look at the numbers I posted a few replies ago, I’ve verified that my shader can do 450fps, and the unshaded geometry can be drawn at 200 fps. So each part of the system can handle at least 4x the performance I’m seeing. I’m just trying to figure out what’s going on when I combine them that’s causing me to get only 50fps. Basically, I can draw detailed geometry, or a nicely-shaded plane, but not both at the same time.
