Bizarre bug?



gmeed
12-20-2004, 08:51 PM
Certain things in my fragment shader make my program look like it's limited by the vertex shader!
First, here's the code:
vertex shader:
const float nrmTexCoordScale = 512.0;
void main(void)
{
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
    gl_TexCoord[0] = vec4(gl_Vertex.x * nrmTexCoordScale, gl_Vertex.y * nrmTexCoordScale, 1.0, 1.0);
}

and the fragment shader: (the lighting equation is just bs now)
uniform sampler2D nrmTex;

void main(void)
{
    const vec3 nrm = 2.0 * (texture2D(nrmTex, vec2(gl_TexCoord[0])).rgb - 0.5);
    const vec3 light = vec3(1.0, 0.0, 0.0);
    const float diffuse = dot(nrm, light);
    gl_FragColor = vec4(vec3(diffuse), 1.0);
}

As is, it runs at about 70fps on my GF5900 in Linux with the 6629 drivers (latest public). It's completely vertex-bound. Now the bizarre part: if I change the first line of my fragment shader and remove either one of the math operations (*2.0 or -0.5), the speed skyrockets: about 50x faster.

How could adding one simple math op to the fragment shader make my program vertex-bound?

V-man
12-21-2004, 08:00 AM
Originally posted by gmeed:
How could adding one simple math op to the fragment shader make my program vertex-bound?

If you change your fragment shader and you see a performance difference, then it could mean you are fragment bound.

In your fragment shader, you are assigning a non-constant to constant variables. I think this is an error.

Turn on strict GLSL spec.
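
For reference, here is a minimal sketch of the fragment shader from the first post with the const qualifiers removed where the initializer is not a constant expression. It assumes GLSL 1.10, where a const initializer has to be built from literals and other const values; a texture fetch does not qualify. Nothing else in the shader is changed.

```glsl
uniform sampler2D nrmTex;

void main(void)
{
    // Not const: the value comes from a texture fetch, which is not a
    // constant expression in GLSL 1.10.
    vec3 nrm = 2.0 * (texture2D(nrmTex, vec2(gl_TexCoord[0])).rgb - 0.5);

    // const is fine here: the initializer is built only from literals.
    const vec3 light = vec3(1.0, 0.0, 0.0);

    // Not const: depends on nrm, which is computed per fragment.
    float diffuse = dot(nrm, light);

    gl_FragColor = vec4(vec3(diffuse), 1.0);
}
```

A lenient compiler may accept the original version anyway, which is why checking against the strict spec is worth doing.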

gmeed
12-21-2004, 08:37 AM
Originally posted by V-man:

Originally posted by gmeed:
How could adding one simple math op to the fragment shader make my program vertex-bound?

If you change your fragment shader and you see a performance difference, then it could mean you are fragment bound.

In your fragment shader, you are assigning a non-constant to constant variables. I think this is an error.

Turn on strict GLSL spec.

Thanks for the const tip. I changed that.

About your first comment, though.
I think it's vertex-limited when it's acting really slow because the window size has no effect on the rendering, but changing the number of vertices while keeping the number of fragments constant does affect the framerate.

I just noticed something else that confuses me even more: if the object is outside the viewport (not visible), the framerate goes back up to what it is when the shaders aren't doing the buggy-slow thing. This seems to suggest that it's not vertex-limited, right? But then why does performance scale with the number of vertices when the object _is_ visible?

gmeed
12-21-2004, 09:09 AM
Another update: 3Dlabs' GLSL verifier doesn't detect anything wrong with the code after I removed the consts.

gmeed
12-21-2004, 10:35 AM
... and another update: I get this behavior even when I'm not using a vertex program.

gdewan
12-21-2004, 12:25 PM
You may want to use nvshaderperf to run an analysis on your program.

Get nvshaderperf here (http://developer.nvidia.com/object/nvshaderperf_home.html)

gmeed
12-21-2004, 02:23 PM
I've been hacking away at this problem for a while, and now I at least have a better idea of what the problem is, so I'll try describing it again. In trying to narrow it down I've eliminated my vertex shader, and I'm just using the fixed function hardware for that. Also, I'm just using one, basic pixel shader, and nvshaderperf confirms that it's as simple and fast as it looks.

I assumed that my rendering speed would be the minimum of the speed of drawing the bare unshaded geometry, and the speed of shading extremely simplified geometry. This assumption isn't even close to being true. (And I've verified that I'm not AGP-limited. Everything's on the card.)

Here are the numbers:
simple geometry with normal texture decal'd on: 1000fps.
simple geometry with shader that uses normal tex: 450fps.

full geometry with a normal texture decal'd on: 200fps.
full geometry with shader that uses normal tex: 50fps.

The thing I don't understand is the last number. The geometry can be drawn at 200fps, and my shader can shade that area at 450fps, so why in the world am I suddenly running at 50fps??! Shouldn't it be about 200fps? I realize it might be a little slower if the vertex and fragment stages are fighting for on-card bandwidth, but nothing like this.
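
To make the gap concrete, the four frame rates listed above can be converted to frame times. This is only a back-of-the-envelope sketch. It assumes the decal'd runs measure everything except the shader, so the difference between a shaded run and a decal'd run approximates the shader's own cost; the symbol names are made up for this illustration.

```latex
\begin{aligned}
t_{\text{simple,decal}}  &= 1/1000\ \text{s} = 1.0\ \text{ms}       &\qquad
t_{\text{simple,shader}} &= 1/450\ \text{s} \approx 2.2\ \text{ms} \\
t_{\text{full,decal}}    &= 1/200\ \text{s} = 5.0\ \text{ms}        &\qquad
t_{\text{full,shader}}   &= 1/50\ \text{s} = 20.0\ \text{ms} \\[4pt]
\Delta_{\text{simple}} &= t_{\text{simple,shader}} - t_{\text{simple,decal}} \approx 1.2\ \text{ms} \\
\Delta_{\text{full}}   &= t_{\text{full,shader}} - t_{\text{full,decal}} = 15.0\ \text{ms} \\[4pt]
t_{\text{serial}} &= t_{\text{full,decal}} + t_{\text{simple,shader}} \approx 7.2\ \text{ms}
\;\Rightarrow\; \approx 138\ \text{fps}
\end{aligned}
```

So even in the pessimistic case where the geometry work and the shading work do not overlap at all, the expected rate would be around 138fps, not 50fps. Put another way, under these assumptions the shader appears to cost roughly 12 times more per frame on the full geometry than on the simple geometry, even though the shaded area is the same.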

Another interesting thing is that when I'm drawing the full geometry with my shader, and the object isn't visible, the fps goes up to the 200 that I get without the shader, so it looks like it really is fragment-limited.

The only explanation I can think of is that some of the math hardware (i.e., not memory) on the card is shared between the vertex and pixel pipelines, but that's not the case on my NV35.

I've duplicated the problem on Windows with the 67.02 drivers.

spasi
12-21-2004, 11:55 PM
Originally posted by gmeed:
and my shader can shade that area at 450fps

Are you sure it's the same area? Maybe there is a lot of overdraw when using the full geometry?

gmeed
12-22-2004, 08:26 AM
Originally posted by spasi:

Originally posted by gmeed:
and my shader can shade that area at 450fps

Are you sure it's the same area? Maybe there is a lot of overdraw when using the full geometry?

Yep, the area's the same. It's pretty easy to control since I'm just drawing a terrain on a regular grid. The only thing that's not straight triangle strips are some zero-area triangles that go from the end of one row to the beginning of the next, but those should be dropped.

I'm also viewing it from directly above for benchmarking, so nothing's in front of anything else.

jwatte
12-23-2004, 07:38 PM
The GeForce 5000 series is terrible at GLSL performance, because it has to resort to full-precision floating point operations for many shaders. It may be, for example, that your fragment shader change causes such degradation. NVIDIA may have an option to use lower-precision intermediates in the shader; try turning that on.
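
As a rough illustration of what lower-precision intermediates could look like for the shader in this thread: half and hvec3 are reserved words in GLSL 1.10, not standard types. The sketch below assumes the NVIDIA compiler accepts them as half-precision types in its non-strict mode. That is an assumption, not something confirmed in this thread, and a strictly conforming compiler should reject this shader.

```glsl
uniform sampler2D nrmTex;

void main(void)
{
    // ASSUMPTION: hvec3 and half are accepted as half-precision types by
    // the driver's GLSL compiler. They are only reserved words in
    // standard GLSL 1.10.
    hvec3 nrm = hvec3(2.0 * (texture2D(nrmTex, vec2(gl_TexCoord[0])).rgb - 0.5));
    hvec3 light = hvec3(1.0, 0.0, 0.0);

    // A normal-map lookup and a dot product with a unit vector stay in
    // roughly [-1, 1], so reduced precision should be acceptable here.
    half diffuse = dot(nrm, light);

    gl_FragColor = vec4(vec3(float(diffuse)), 1.0);
}
```

If the compiler rejects the types, the standard-typed version of the shader is the portable fallback.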

gmeed
12-24-2004, 09:28 AM
Originally posted by jwatte:
The GeForce 5000 series is terrible at GLSL performance, because it has to resort to full-precision floating point operations for many shaders. It may be, for example, that your fragment shader change causes such degradation. NVIDIA may have an option to use lower-precision intermediates in the shader; try turning that on.

Thanks for the reply. I tried using half-precision, but that didn't fix it. If you look at the numbers I posted a few replies ago, I've verified that my shader can do 450fps, and the unshaded geometry can be drawn at 200 fps. So each part of the system can handle at least 4x the performance I'm seeing. I'm just trying to figure out what's going on when I combine them that's causing me to get only 50fps. Basically, I can draw detailed geometry, or a nicely-shaded plane, but not both at the same time.