_dramatically_ fewer frames with GL_ARB_vertex_program than with standard code !?!

hello, it’s me again…

Now i have tested my GL_ARB_vertex_program code on several machines; the “hardware-pool” ranges from a Geforce2-MX2 up to a Geforce3-Ti500 (note: after installing the current nVidia drivers, the GL_ARB_vertex_program extension is supported by all of these cards!!!)
The test scene contains about 10000 tris, drawn with standard function calls (the code is unoptimized):

  1. on the Geforce2-MX i get around 60fps without using GL_ARB_vertex_program

  2. but with the vertex program enabled, which does nothing more than multiply the vertices by the current matrix and assign the texture coordinates, i get only 40fps !!!
    Is this normal ??

  3. On the Geforce3-Ti500, i get around 130 fps without using GL_ARB_vertex_program

  4. but, again, with the vertex program enabled the framerate falls to around 120fps. (as you can see: the drop is not as big as on the Geforce2-MX card)

Is a framerate drop of this magnitude normal ?

The GeForce2 does not have hardware acceleration for vertex programs, so yes – it’s normal that it runs slow. On GeForce3 the vertex program does run in hardware, so it’s fast. Vertex program performance depends on the length of your program, but it’s not unlikely that even a very short program would run slower than the (highly optimized) fixed-function pipeline.

– Tom

To be honest, I think that 40fps is very good for your GeForce2 MX card ! Software vertex programs can be much worse than that. I guess you have a good CPU.

@Tom Nydens:

>>short program would run slower than the
>>(highly optimized) fixed-function pipeline.

you mean that running with vertex programs always comes with a small loss of performance compared to running the same code through the FF pipeline ?? But, apart from the “free programmability”, what are the other advantages of hardware vertex programs ? Do i have more gpu “capacity” when rendering large numbers of polys ? As i’ve said, the vertex program is very short - OK, perhaps i’m doing something _really_ wrong - but the program’s output is what it should be (the program only transforms the vertices and applies the texture coordinates).
Is your advice to use the FF pipeline whenever possible ?? I expected a small performance loss, too, but i assumed a “really small” one of 2-3fps, or something like that !

@vincoof:

>>good CPU…

mmmh, i’m running on a PentiumIII 1GHz system with a 133MHz FSB.

I have a GF 2 Ti and i use a vertex-program (Cg) to have some animated water (it swaps around and the texcoords get changed).

However even with a lot of water (around 50000 quads) there is no speed difference with the vertex program enabled or disabled.
(it may be 5 FPS slower, but not more)

I don’t know about the GF MX, but there should not be such a big difference.

Jan.

What do you expect?

With the MX, you’re effectively disabling hardware transformation.
With the Geforce 3 you take what? A smaller than ten percent hit.
Besides, vertex programs were never meant for trivial cases. Try at least lerping some key frames before jumping to conclusions.
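
For reference, the numbers from the first post can be put side by side with a small C sketch. The helper names `frame_time_ms` and `relative_drop_percent` are made up for this illustration; nothing here comes from the original test code.

```c
/* Milliseconds spent per frame at a given frame rate. */
double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

/* Relative frame rate drop in percent when going from
 * fps_before (fixed function) to fps_after (vertex program). */
double relative_drop_percent(double fps_before, double fps_after)
{
    return 100.0 * (fps_before - fps_after) / fps_before;
}
```

With the figures quoted above, the GeForce2 MX goes from about 16.7 ms to 25 ms per frame (a drop of roughly 33%), while the GeForce3 goes from about 7.7 ms to 8.3 ms (roughly 7.7%).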

@zeckensack:

>>vertex programs were never meant for
>>trivial cases.

mmh, ok - this sounds plausible; i thought that using vertex programs/fragment programs for every transformation/rasterization would be a good approach and, additionally, that it would not decrease performance.
Maybe it would be better not to use this stuff all the time; i haven’t played around with these features enough to give a qualified statement.

>>Try at least lerping some
Sorry, but please explain what exactly you mean by “lerping some key frames” ?? Perhaps i’m misinterpreting this sentence, but i have no idea what you mean.

@all:
i will post the source of the vertex program tomorrow - then you can take a look at it (and possibly tell me what is wrong)

Sorry, but please explain what exactly you mean by “lerping some key frames” ??

One of the most popular features that vertex programs/shaders offer is interpolating between two meshes on the GPU instead of the CPU. In other words, “lerping keyframes”.
Such a vertex program shows the performance gain very well.
Without such a “complex” program, you won’t notice any significant gain IMO.

Yup

Lerping keyframes = linearly interpolating between two phases of a keyframe animation (eg Quake models)
Sorry for the fuzzy talk.

The benefit of doing this with vertex programs does not really come from computational power, but from the fact that it allows you to do it on the fly from graphics memory (=huge bandwidth to begin with) without storing the result anywhere. That would likely be wasted bandwidth anyway, because you usually need the result only once, for the current frame. Your animation weight will have changed when you render the next frame.

There’s the exception of multipassing over animated geometry. I’ll go out on a limb and say storing the result requires so much additional memory management hassle that it’s not worth the potential savings in calculation.

Without vertex programs:
Read keyframe one (system memory)
Read keyframe two (system memory)
interpolate (CPU)
store to graphics memory
fetch from graphics memory, transform and render (GPU)

With vertex programs:
Read keyframe one (graphics memory)
Read keyframe two (graphics memory)
interpolate, transform and render (GPU)
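
The “interpolate (CPU)” step of the first list can be sketched in C roughly like this. It is only an illustration: the `Vec3` layout and the function names are hypothetical, and a real mesh would carry normals and texture coordinates too.

```c
#include <stddef.h>

/* One vertex position of a keyframe (hypothetical layout). */
typedef struct { float x, y, z; } Vec3;

/* Linear interpolation: t = 0 gives a, t = 1 gives b. */
float lerpf(float a, float b, float t)
{
    return a + t * (b - a);
}

/* CPU path: blend two keyframes of `count` vertices into `out`.
 * `out` is the buffer that would then be copied to graphics memory. */
void lerp_keyframes(const Vec3 *key0, const Vec3 *key1, float t,
                    Vec3 *out, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        out[i].x = lerpf(key0[i].x, key1[i].x, t);
        out[i].y = lerpf(key0[i].y, key1[i].y, t);
        out[i].z = lerpf(key0[i].z, key1[i].z, t);
    }
}
```

In the vertex program version the same `a + t * (b - a)` would presumably run per vertex on the GPU, with both keyframes supplied as vertex attributes and `t` as a program parameter, so the `out` buffer never has to be stored.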

[This message has been edited by zeckensack (edited 02-05-2003).]

How much vertex programs cost you on the CPU depends both on how fast your CPU (and main RAM!) is, and what else you might be doing on that CPU.

@zeckensack / vincoof:

>>Lerping keyframes
oh, yes - thanks for your explanation; i did know about this technique, but i simply call it “keyframe interpolation”, not “lerping keyframes” or something else (mmh, this misunderstanding might come from my crappy english).

@all:
Today i brought the source of my vertex program with me; here it is:

"!!ARBvp1.0"
"PARAM mvp[4] = { state.matrix.mvp };"
"ATTRIB pos = vertex.position;"
"DP4 result.position.x, mvp[0], pos;"
"DP4 result.position.y, mvp[1], pos;"
"DP4 result.position.z, mvp[2], pos;"
"DP4 result.position.w, mvp[3], pos;"

"MOV result.color, vertex.color;"
"MOV result.texcoord[0].x, vertex.texcoord[0].x;"
"MOV result.texcoord[0].y, vertex.texcoord[0].y;"

"END";

as you can see: the program itself is really short. Since this small program works fine and does its job, i think there is no problem with the program itself (regarding speed); so i guess that the dramatic speed loss on the gf2mx (or other hardware without hardware GL_ARB_vertex_program support) comes from the missing features of these boards - as all of you already mentioned.

(BTW: for applying the current modelview-projection matrix and texturing to an object’s vertices, this program should be correct ?? please tell me if there is something wrong)
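
As a sanity check, the four DP4 lines can be mirrored on the CPU. This is a minimal C sketch, assuming `rows[i]` holds the same values as `mvp[i]` in the program (row i of the matrix); the `Vec4` type and the function names are hypothetical.

```c
/* Four-component vector, like a vertex program register. */
typedef struct { float x, y, z, w; } Vec4;

/* DP4: four-component dot product. */
float dp4(Vec4 a, Vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* One DP4 per output component, matching the four DP4 lines
 * that write result.position.x/y/z/w. */
Vec4 transform_position(const Vec4 rows[4], Vec4 pos)
{
    Vec4 r;
    r.x = dp4(rows[0], pos);
    r.y = dp4(rows[1], pos);
    r.z = dp4(rows[2], pos);
    r.w = dp4(rows[3], pos);
    return r;
}
```

Feeding it a plain translation matrix and a point with w = 1 should simply give back the translated point, which makes it easy to compare against what the fixed-function path does with the same matrix.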

“lerping keyframes” is more a hacker expression than a word you’ll see in a dictionary. So, if you didn’t understand it, it did not have to do with your crappy english.

The vertex program looks fine, except for the texture coordinate. If I were you I would write all four texture coordinate components, and especially set the 4th one to 1. Something like :
MOV result.texcoord[0].w, 1;
Because coordinates are homogeneous, if you don’t set the “w” coordinate (called “q” for texture coord, “w” for vertex) you would get an undefined result.
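
To illustrate why the 4th component matters, here is a small C sketch of the projective divide that a 2D texture lookup effectively applies. The `TexCoord` type and `effective_st` are hypothetical names, and the “wrong” q is just an arbitrary value standing in for an undefined one.

```c
/* Homogeneous texture coordinate (s, t, r, q). */
typedef struct { float s, t, r, q; } TexCoord;

/* A 2D lookup effectively samples at (s/q, t/q). */
void effective_st(TexCoord tc, float *s_out, float *t_out)
{
    *s_out = tc.s / tc.q;
    *t_out = tc.t / tc.q;
}
```

With q = 1 the lookup uses s and t unchanged; with any other q the same s and t land somewhere else on the texture, which is why leaving q unwritten is risky.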

Also, if I were you I would use “result.texcoord” instead of “result.texcoord[0]”, but it’s a personal preference. Actually all video cards that support vertex programs do support multitexture, afaik. So it should not be a problem.

And as a side note, I prefer using mnemonics, like :

ATTRIB iPos = vertex.position;
ATTRIB iColor = vertex.color;
ATTRIB iTex = vertex.texcoord;
OUTPUT oPos = result.position;
OUTPUT oColor = result.color;
OUTPUT oTex = result.texcoord;
DP4 oPos.x, mvp[0], iPos;
DP4 oPos.y, mvp[1], iPos;
DP4 oPos.z, mvp[2], iPos;
DP4 oPos.w, mvp[3], iPos;
MOV oColor, iColor;
MOV oTex, iTex;

Also, I could have written :

MOV oTex.x, iTex.x;
MOV oTex.y, iTex.y;
MOV oTex.w, 1;

but because it’s slower and gives the same result I don’t. I use this instead :

MOV oTex, iTex;

Or even:

MOV oTex.xy, iTex;

If you only want x/y written.

Or even,

SWZ oTex, iTex, x,y,0,1;

This should be 1 instruction on Radeons. On GeForce3+ it would generally require more, but maybe they could optimize it to a MOV in this case.
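
For anyone unsure what the SWZ line does, here is a rough CPU-side model in C. It assumes SWZ semantics as I read them from the ARB_vertex_program spec (each result component picks one source component or the constant 0 or 1); the optional per-component negation is left out, and all type and function names are hypothetical.

```c
/* Four-component vector, like a vertex program register. */
typedef struct { float x, y, z, w; } Vec4;

/* What one result component of SWZ may select. */
typedef enum { SEL_X, SEL_Y, SEL_Z, SEL_W, SEL_ZERO, SEL_ONE } Selector;

/* Resolve a single selector against the source register. */
float select_component(Vec4 src, Selector sel)
{
    switch (sel) {
    case SEL_X:    return src.x;
    case SEL_Y:    return src.y;
    case SEL_Z:    return src.z;
    case SEL_W:    return src.w;
    case SEL_ZERO: return 0.0f;
    case SEL_ONE:  return 1.0f;
    }
    return 0.0f; /* not reached for valid selectors */
}

/* Model of "SWZ dst, src, sx,sy,sz,sw;" without negation. */
Vec4 swz(Vec4 src, Selector sx, Selector sy, Selector sz, Selector sw)
{
    Vec4 r;
    r.x = select_component(src, sx);
    r.y = select_component(src, sy);
    r.z = select_component(src, sz);
    r.w = select_component(src, sw);
    return r;
}
```

Calling `swz(tex, SEL_X, SEL_Y, SEL_ZERO, SEL_ONE)` corresponds to the `x,y,0,1` form above: s and t pass through, r becomes 0 and q becomes 1, whatever the incoming r and q were.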