Maximum GeForces

Fundamentally, I’m a performance hog. I’ll look at extensions like NV_register_combiners or NV_vertex_program and imagine what I might do with them. But the ones I actually end up using are extensions like NV_vertex_array_range and NV_fence: extensions whose sole purpose is to improve the application’s performance.

So, I have quite a few questions about these performance-enhancing extensions. Specifically, I’d like to know how they behave on NV10/NV15-level hardware. In each case, assume that the bottleneck lies in the operations in question.

  1. NV_fence sounds like a great idea on paper. Combined with NV_vertex_array_range, it lets you tell the renderer to draw a batch of polygons while you go off and do something else. You can then use a fence to tell when the renderer is getting close to finishing, so you can queue up the next batch.

     All of this is theory, and it only works if glTestFenceNV is fast. So, should code be written on the assumption that glTestFenceNV is cheap, even when it is called dozens or perhaps hundreds of times per frame? If not, NV_fence seems much less useful: it would be good only for checking whether vertex array range memory can safely be overwritten.

  2. Is video memory appreciably faster than AGP memory for vertex arrays? Obviously, allocating 8MB of video memory could kill your application’s texture performance, but is the loss of that video memory worth the gain in polygon throughput (assuming that your textures still all fit in video memory)?

  3. Is there a significant performance benefit to using glDrawRangeElementsEXT?

  4. The new NVIDIA spec changes the allowable vertex formats for NV_vertex_array_range. Obviously, these are available only in the newest drivers, but are the new formats available on NV10 and NV15-level hardware, or are they restricted to NV20 and above?

  5. For static data, is there a dramatic performance difference between optimized NV_vertex_array_range use (in video memory) and display lists?

  6. Assuming that data is copied sequentially into AGP vertex_array_range memory, is there a significant performance difference between dynamic and static data?

  7. Does enabling/disabling the vertex array range (glEnableClientState/glDisableClientState with GL_VERTEX_ARRAY_RANGE_NV) cause a significant performance hit? I know that actually changing the vertex_array_range memory is a bad idea, but this is different.

  8. On a slightly different subject: I know that the cube-map texgen modes (reflection map and normal map) are quite slow. Have the newer drivers sped these modes up?

I know it’s a lot to answer, but I’ve been wondering about this for some time now. I would appreciate any clarification anyone can give.