VAR and CVA

Hi

Is it possible to use VAR and CVA together? The VAR spec says they should be independent and not interfere with each other. Does this mean they cannot be used at the same time?

CVA uses a vertex cache (I think 8 vertices get cached). Does VAR use a vertex cache too, so that when I use one vertex several times I get a speed increase?
And how does this cache work? Does it simply store the last 8 vertices and evict the one that came in first (like a queue), or does it notice that a vertex has been used several times and keep it, even if that particular vertex has been in the cache longer than all the others?

I need this information to be able to optimize my “engine” as much as possible.

Thanks in advance!
Jan.

The vertex cache has nothing to do with CVA, or rather not much. As long as you use indexed geometry (preferably glDrawRangeElements) you should be able to take advantage of the vertex cache. The card has a small, simple FIFO cache that stores transformed vertices. By submitting your geometry with high locality, i.e. triangles that are close in the mesh are submitted close to each other in the vertex stream, you get maximum use out of this cache. This is why it helps to submit triangles in triangle-strip order even if you submit them as plain triangles: it increases locality. I highly doubt you’ll get any kind of speedup from CVAs if you’re using VAR already.
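As a rough illustration of the above, here is a minimal sketch of submitting indexed geometry with glDrawRangeElements. The array names and counts (vertices, indices, vertexCount, indexCount) are made up for the example, and on older Windows drivers the glDrawRangeElements entry point may have to be fetched via wglGetProcAddress rather than linked directly.

#include <GL/gl.h>

void drawMesh(const GLfloat *vertices, GLsizei vertexCount,
              const GLushort *indices, GLsizei indexCount)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, vertices);

    /* start/end tell the driver which vertices the indices may touch,
       so it only has to fetch and transform that range; keeping the
       indices in strip-like order reuses the small post-transform
       FIFO cache described above. */
    glDrawRangeElements(GL_TRIANGLES,
                        0, vertexCount - 1,
                        indexCount, GL_UNSIGNED_SHORT, indices);

    glDisableClientState(GL_VERTEX_ARRAY);
}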


CVA doesn’t make (much) sense in the context of VAR.

I’d suggest using VAR + DrawRangeElements() where available, else the ATI_vertex_object extension if available, else CVA/DrawElements().
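A rough sketch of how that fallback order could be selected at startup is below; I am assuming the ATI extension meant here is GL_ATI_vertex_array_object, and the enum plus the naive strstr() matching (which can match prefixes) are only illustrative.

#include <string.h>
#include <GL/gl.h>

typedef enum {
    PATH_VAR,      /* NV_vertex_array_range + glDrawRangeElements */
    PATH_ATI_VAO,  /* ATI_vertex_array_object                     */
    PATH_CVA,      /* EXT_compiled_vertex_array + glDrawElements  */
    PATH_PLAIN     /* plain client-side vertex arrays             */
} VertexPath;

VertexPath pickVertexPath(void)
{
    /* Requires a current GL context. */
    const char *ext = (const char *) glGetString(GL_EXTENSIONS);

    if (strstr(ext, "GL_NV_vertex_array_range"))
        return PATH_VAR;
    if (strstr(ext, "GL_ATI_vertex_array_object"))
        return PATH_ATI_VAO;
    if (strstr(ext, "GL_EXT_compiled_vertex_array"))
        return PATH_CVA;
    return PATH_PLAIN;
}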

But if I understand the specs correctly, then VAR and CVA don´t have that much in common.
VAR lets the card pull vertex data asynchronously. CVA transforms the vertices only once and is therefore useful with shared vertices and multipass rendering.
At the moment I use VAR. However, I want to do some things that require multipass rendering, so if I enable both CVA and VAR and render my scene two or more times, this should be faster than VAR alone.
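In code, the pattern I have in mind would look roughly like this. It is only a sketch: vertexCount, indexCount and indices are placeholders, and the glLockArraysEXT/glUnlockArraysEXT entry points from EXT_compiled_vertex_array have to be fetched as extension function pointers first.

/* Pass 1 establishes depth; later passes reuse the locked arrays. */
glLockArraysEXT(0, vertexCount);   /* promise: arrays won't change while locked */

glDepthFunc(GL_LESS);
glDrawRangeElements(GL_TRIANGLES, 0, vertexCount - 1,
                    indexCount, GL_UNSIGNED_SHORT, indices);   /* pass 1 */

glDepthFunc(GL_EQUAL);
glDrawRangeElements(GL_TRIANGLES, 0, vertexCount - 1,
                    indexCount, GL_UNSIGNED_SHORT, indices);   /* pass 2 */

glUnlockArraysEXT();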

Or have I understood CVA completely wrong?

Therefore I´d like a clear answer: is it allowed/possible to use VAR and CVA at the same time?

Jan.

OK, I tested it. These are the results:

Locking an array does not produce an error when VAR is used.
However, there is no speedup at all.
Locking an array without using VAR yields a small speedup.

This is how I tested:
I used 33,000 triangles of which two thirds of the vertices were shared, with no texturing and in wireframe mode.
To maximize the reuse of vertices I rendered the scene 16 times each frame (depth func = GL_EQUAL). Of course I used glDrawRangeElements.

VAR (AGP memory)    : 34 FPS
system memory       : 36 FPS
system memory + CVA : 37 FPS

So one could say that neither VAR nor CVA yields the results one would expect. I find that strange.

Jan.

Did you enable CULL_FACE and set CullFace to FRONT_AND_BACK? Otherwise you’re probably fill-rate limited. Or you’re locked to the refresh rate of your monitor.

When benchmarking, make sure you’re measuring (and optimizing!) the right thing.
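One way to do that here, as a sketch: cull both faces so nothing gets rasterized at all, which removes fill cost from the measurement and leaves mostly vertex fetch and transform.

glEnable(GL_CULL_FACE);
glCullFace(GL_FRONT_AND_BACK);   /* no triangles are rasterized at all */
/* ... run the same multipass loop and time it ... */
glCullFace(GL_BACK);             /* restore the usual setting afterwards */
glDisable(GL_CULL_FACE);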

CVA allows a SOFTWARE transform driver to do less work on the vertices. A hardware transform card will transform the vertices as it comes across them (if they’re not in the vertex cache and are being drawn with an indexed primitive). CVA can also allow a hardware transform driver to copy vertex data into an optimal format in a pre-allocated buffer that is set up for fast throughput, but typically a fast path like that is optimized for Quake-sized vertex buffers, not giant buffers like yours.

While we are on CVA, does anybody know if the limitations of CVA still exist?
I mean the limitation that only one vertex format (vertex 3f, color 4ub, tex0 2f, tex1 2f) gets a performance boost on GeForce cards.
This was covered in the GeForce optimization FAQ, but that is two years old and I didn’t find anything newer on the web.
So is this limit still there, or is it gone with newer drivers or newer cards?
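For reference, that format laid out as an interleaved array might look like the sketch below. The struct and pointer setup are only an illustration of the layout, not taken from the FAQ, and glClientActiveTextureARB comes from ARB_multitexture (it may need to be fetched as an extension function pointer).

typedef struct {
    GLfloat pos[3];    /* vertex 3f */
    GLubyte color[4];  /* color 4ub */
    GLfloat uv0[2];    /* tex0 2f   */
    GLfloat uv1[2];    /* tex1 2f   */
} FastVertex;

void setFastVertexPointers(const FastVertex *v)
{
    GLsizei stride = sizeof(FastVertex);

    glVertexPointer(3, GL_FLOAT, stride, v->pos);
    glColorPointer(4, GL_UNSIGNED_BYTE, stride, v->color);

    glClientActiveTextureARB(GL_TEXTURE0_ARB);
    glTexCoordPointer(2, GL_FLOAT, stride, v->uv0);

    glClientActiveTextureARB(GL_TEXTURE1_ARB);
    glTexCoordPointer(2, GL_FLOAT, stride, v->uv1);
}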

The document exists in Google’s cache: http://www.google.de/search?q=cache:BfAu…&hl=de&ie=UTF-8

But there is nothing on their website or CVS servers.

Lars

I would expect the implementation of LockArraysEXT() for hardware transform cards to be something like:

lockStart = x;                       /* remember the locked range */
lockCount = y;
arraysLocked = true;
arraysDirty |= allEnabledArrays;     /* every enabled array needs a fresh copy */

I would expect the implementation of VertexPointer() (and all the other array specification functions) to be something like:

arraysDirty |= kVertexArray;

I would expect actual geometry issuing calls, such as DrawElements(), to do something like:

if( array_is_too_big() ) {
    return slow_path();
}
/* Copy only the arrays marked dirty when locked, otherwise everything. */
copy_arrays_to_internal_AGP_memory_buffers( arraysLocked ? arraysDirty : kAllArrays );
arraysDirty = 0;
if( !arraysLocked ) {
    /* Without a lock, the min/max index range must be found per call. */
    scan_elements_for_range( elements, &x, &y );
}
poke_indices_at_card( elements, x, y );

With the caveat that hints Matt has given on this forum before seem to indicate that, on GeForce cards, a lock-free DrawElements() will expand all the vertices and draw them DrawArrays()-style, rather than scan the index list to find the min/max.