ARB_fp perfs again

I'm running the app on a GeForce FX 5900 Ultra with a 21-instruction fragment program bound (3 temps; 1 TEX, 4 ALU, 2 TEX, 11 ALU, 1 TEX, 2 ALU) at 1024×768.
I get 65 fps. Doing the simple math someone explained to me here:

$$\frac{850\ \text{MHz} \times 2}{1024 \times 768 \times 21} \approx 103\ \text{fps}$$

Actually, the skybox is rendered before everything else, so there is some overdraw. When I disable the skybox, the frame rate goes up to 71.5 fps…
Where have the missing 30 fps gone? Only one mesh is rendered (interleaved VBO, no state changes), and the app is clearly pixel limited.

Secondary questions:

  • Why do I have to multiply the GPU frequency by 2?
  • How can I cleanly render my skybox after everything else has been drawn (without resorting to drawing it "just before the far clip plane"), so that it causes no overdraw?

Thanks,
SeskaPeel.

Technically, you're multiplying the GPU frequency by 4 (your "850 * 2"). You do that because the NV35 has four sets of floating-point processing units.

According to a fairly recent Beyond3D thread, though, each of those units can potentially perform more than one operation per clock cycle. Supposedly, in addition to a full ALU operation or texture lookup, it can also perform two relatively simple ops in the same cycle: ADD, SUB, MUL, DP3, or DP4. I'm guessing, though, that these three operations must be independent rather than each depending on the previous one.

Of course, the above suggests that you could potentially get even better performance than you calculated. However, there is also more that may need to be taken into account when predicting performance, such as the effect of register usage (on GeForce FXs), texture filtering, vertex processing speed, and CPU work.

[This message has been edited by Ostsol (edited 10-20-2003).]

Which driver version did you use?

I wonder whether the 52.xx drivers would be faster than the official 45.23 in this case.

As Ostsol said, a lot of variables can influence performance. I wouldn't rely too much on a simple formula to predict it.

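```c
/* Skybox pass, drawn after all other opaque geometry. */
glPushAttrib(GL_DEPTH_BUFFER_BIT);   /* saves depth-test enable, depth func and depth mask */

glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LEQUAL);              /* pass only where nothing nearer was drawn (cleared depth = 1.0) */
glDepthMask(GL_FALSE);               /* the skybox never writes depth */
glDepthRange(0.99999, 1.0);          /* squeeze the skybox into the far end of the depth range */

drawSkybox();                        /* hypothetical: your own skybox drawing code */

glDepthRange(0.0, 1.0);              /* depth range belongs to GL_VIEWPORT_BIT, so restore it by hand */
glPopAttrib();
```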

With this code you can render your skybox at the end of the frame with no overdraw at all. I use it and it works well.
To speed up the skybox rendering, you should tessellate it a bit (around 4×4 quads per face), so the card can perform early z-rejection more efficiently.

Another advantage is that you don't have to use a huge skybox to make sure it is always behind all other geometry. It only gets drawn where no pixel has been drawn before, so it can never occlude other objects.

Jan.