Vertex programs vs CVAs and T&L

Ysaneya · March 6, 2001, 1:11pm

Damn, my previous post wasn’t posted, dunno why.

Anyway, here it goes:

Vertex programs have become pretty popular recently. I was wondering whether it is still possible to transform the vertices only once in a multi-pass environment (like with CVAs, i.e. compiled vertex arrays). If so, how? Are there new extensions for that?

I am also wondering what the speed hit is when using a standard vertex program (to do transform & lighting without any special effects). I.e., is it as fast as the T&L we all know, or slower? And finally, if I have 8 lights in my scene, will I have to do the 8 lighting calculations in the vertex shader myself? Is there a speed hit there too, compared to standard OpenGL lighting?

I’m so curious

Y.

Ysaneya · March 6, 2001, 1:13pm

Oh yes, a precision: I’m interested in the answer for HARDWARE vertex programs.

Y.

cass · March 6, 2001, 2:24pm

Ysaneya,

Hardware T&L (both fixed function and vertex program) always performs T&L on each pass.

Lighting-heavy T&L will usually be faster with the fixed function pipe than with a vertex program if you do the same math. If you make significant simplifications (like lighting in object space) you may be able to beat fixed function lighting in some cases.
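To make the object-space simplification concrete, here is a minimal Python sketch (not GPU code). It assumes a directional light and a model transform that is a pure rotation with no scaling; the function names are made up for illustration. Rotating the light into object space once per object gives the same N·L as rotating every normal into the light's space.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [dot(row, v) for row in m]

def transpose(m):
    return [[m[c][r] for c in range(3)] for r in range(3)]

def rotation_z(angle):
    """Rotation about the z axis; stands in for the model rotation (assumption)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def diffuse_eye_space(normals, rot, light_dir):
    """Rotate every normal, then take N.L: one matrix multiply per vertex."""
    return [max(0.0, dot(mat_vec(rot, n), light_dir)) for n in normals]

def diffuse_object_space(normals, rot, light_dir):
    """Rotate the light once (the inverse of a rotation is its transpose),
    then take N.L with the untransformed normals."""
    light_obj = mat_vec(transpose(rot), light_dir)
    return [max(0.0, dot(n, light_obj)) for n in normals]
```

The per-vertex work drops from a matrix multiply plus a dot product to a single dot product, which is the kind of saving that could let a vertex program catch up with the fixed function pipe. With non-uniform scaling the equivalence no longer holds as written.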

If you use vertex programs, you’re responsible for coloring the vertex - period. If that means performing 8 separate lighting calculations and accumulating them into o[COL0] and o[COL1], then that’s what you have to do. If you are lighting with 8 lights, then fixed function T&L will probably be faster.
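As a rough picture of what "performing 8 separate lighting calculations and accumulating them" means, here is a Python sketch of the per-vertex arithmetic (not actual vertex program code). It assumes directional lights and an infinite viewer, so each light has a constant half vector; `light_vertex` and the tuple layout are hypothetical names chosen for this example. The diffuse sum plays the role of o[COL0] and the specular sum the role of o[COL1].

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add_scaled(acc, color, k):
    """Return acc + k * color, component by component."""
    return [a + c * k for a, c in zip(acc, color)]

def clamp01(color):
    return [min(1.0, max(0.0, c)) for c in color]

def light_vertex(normal, lights, shininess):
    """Accumulate all lights for one vertex.

    lights: list of (direction, half_vector, color) tuples, all unit vectors
    in the same space as the normal (assumption).
    Returns (col0, col1): summed diffuse and summed specular color.
    """
    col0 = [0.0, 0.0, 0.0]
    col1 = [0.0, 0.0, 0.0]
    for direction, half_vector, color in lights:
        n_dot_l = max(0.0, dot(normal, direction))
        n_dot_h = max(0.0, dot(normal, half_vector))
        # No specular highlight on surfaces facing away from the light.
        spec = n_dot_h ** shininess if n_dot_l > 0.0 else 0.0
        col0 = add_scaled(col0, color, n_dot_l)
        col1 = add_scaled(col1, color, spec)
    return clamp01(col0), clamp01(col1)
```

The loop body runs once per light, so the cost per vertex grows linearly with the light count, which is why a pass with 8 lights is the case where fixed function T&L will probably win.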

What you want to use vertex programs for is things that the fixed function pipe won’t do. Strange texgen modes, bump mapping setup, skinning. Those kinds of things you’ll get a real win on, because there’s no other way to do it in hardware.

The precision is 32-bit IEEE float for hardware vertex programs.

Thanks -
Cass

Ysaneya · March 6, 2001, 2:59pm

> Hardware T&L (both fixed function and vertex program) always performs T&L on each pass.

Are you telling me that using CVAs on a hardware T&L capable card won’t speed up my program when doing many passes with the same vertex data?

I’ve never heard of such a thing before… are you sure about that?

Thanks for the other answers. I understand that I should use vertex shaders only for special effects, but there might be some cases where I’d like to perform, for example, a custom texture generation, and yet use a standard transform and a standard lighting calculation. What would have been interesting is a vertex instruction telling OpenGL to use the standard transform function, and another instruction telling OpenGL to use the standard lighting function, instead of rewriting transform & lighting yourself every time (which might be slower for the same result). Just a thought.

Y.

cass · March 6, 2001, 6:07pm

Ysaneya,

Yes, I’m sure. CVAs speed up hardware T&L (on NVIDIA hardware) by optimizing vertex transfer – not by pre-transforming the vertices. You get the best performance, though, by using NV_vertex_array_range which allows you to optimize vertex transfer yourself.

I see your point about letting vertex programs do some operations, but leaving others unaltered. This would make the vertex program execution model less clean, though, and introduce baggage that we may not really want or need in the future.

Thanks -
Cass

mcraighead · March 6, 2001, 7:30pm

To clarify…

You will not get multipass T&L reuse, because there is nowhere to store the results.

However, you do get T&L reuse within a single pass because of the vertex cache. If you use an index several times in succession, the post-transform version of the vertex will still be available and will be used rather than recomputing.
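The reuse described above can be modelled with a small FIFO cache simulation. This is only a sketch: the cache size and the FIFO replacement policy are assumptions made for illustration (the thread does not give the real hardware details), and `count_transforms` and `multipass_transforms` are made-up names.

```python
from collections import deque

def count_transforms(indices, cache_size):
    """Count how many vertices get transformed for one pass over an index
    stream, given a FIFO post-transform cache of cache_size entries."""
    cache = deque(maxlen=cache_size)  # oldest entry is dropped when full
    transforms = 0
    for index in indices:
        if index not in cache:
            transforms += 1           # cache miss: run T&L for this vertex
            cache.append(index)
    return transforms

def multipass_transforms(indices, cache_size, passes):
    """Every pass starts with an empty cache (assumption), so nothing
    carries over from one pass to the next."""
    return sum(count_transforms(indices, cache_size) for _ in range(passes))

# A quad drawn as two indexed triangles sharing vertices 1 and 2.
quad = [0, 1, 2, 2, 1, 3]
print(count_transforms(quad, 0))         # 6: no cache, every index is transformed
print(count_transforms(quad, 4))         # 4: the shared vertices are reused
print(multipass_transforms(quad, 4, 2))  # 8: the second pass repeats all the work
```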

Matt

Won · March 6, 2001, 8:05pm

This is true even for vertex programs? Does the post-transform vertex cache also work for vertex programs?

-Won

cass · March 6, 2001, 9:47pm

Won,

Yes - the post-transform vertex cache works for vertex programs too. You always want to maximize vertex reuse. On GeForce hardware, “strips of short tri/quad strips” give you the best vertex cache utilization.
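One way to see why short strips help is to count cache misses for a regular grid drawn two ways. The sketch below is a toy model, not a description of GeForce internals: it assumes a 16-entry FIFO cache, a 32x32 grid of quads, and bands 4 quads wide, and all function names are hypothetical.

```python
from collections import deque

def count_transforms(indices, cache_size):
    """Cache misses for an index stream with a FIFO post-transform cache."""
    cache = deque(maxlen=cache_size)
    transforms = 0
    for index in indices:
        if index not in cache:
            transforms += 1
            cache.append(index)
    return transforms

def strip_indices(x0, x1, y, verts_per_row):
    """Triangle-strip indices for one row of quads spanning vertex columns x0..x1."""
    out = []
    for x in range(x0, x1 + 1):
        out.append(y * verts_per_row + x)        # vertex on the upper edge
        out.append((y + 1) * verts_per_row + x)  # vertex on the lower edge
    return out

def long_strips(width, height):
    """One full-width strip per row of quads."""
    stream = []
    for y in range(height):
        stream += strip_indices(0, width, y, width + 1)
    return stream

def short_strips(width, height, band):
    """Narrow column bands, each drawn top to bottom as a run of short strips."""
    stream = []
    for x0 in range(0, width, band):
        x1 = min(x0 + band, width)
        for y in range(height):
            stream += strip_indices(x0, x1, y, width + 1)
    return stream

long_cost = count_transforms(long_strips(32, 32), 16)
short_cost = count_transforms(short_strips(32, 32, 4), 16)
print(long_cost, short_cost)
```

In this model the full-width strips never find the previous row's vertices in the cache, because a whole row is longer than the cache. The short strips find the lower edge of the previous strip still resident, so most vertices are transformed once per band instead of twice per row.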

Cass