Help with NV_vertex_array_range && EXT_compiled_vertex_array

I was just wondering: is there a big speed improvement from using NV_vertex_array_range when there are not that many vertices ?

I render many spheres, each about 200 vertices. I upload all the vertices for a sphere into video/AGP memory once, and then render them about 100 times (modifying the modelview matrix each time). I don’t see any speed improvement when using the extension; it’s the same as using local memory. (I checked everything: the range is valid, the memory is allocated, etc. And I use DrawElements to render.)

Same question for EXT_compiled_vertex_array.

Memo

This should definitely be the fastest way to get geometry to the GPU – BUT if geometry transfer is not the bottleneck, then it won’t matter at all.

Cass
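A rough way to check whether geometry transfer could be the bottleneck here is to estimate the vertex data rate. The sketch below is only back-of-the-envelope arithmetic; the function name is made up for illustration, and the 32 bytes per vertex (3 floats position, 3 floats normal, 2 floats texture coordinate) is an assumption, not something stated in the thread.

```c
#include <assert.h>

/* Bytes of vertex data the GPU has to fetch per second for a workload.
   Hypothetical helper, not part of any OpenGL API. */
double vertex_bytes_per_second(int verts_per_mesh, int meshes_per_frame,
                               int bytes_per_vertex, double frames_per_second)
{
    assert(verts_per_mesh >= 0 && meshes_per_frame >= 0 && bytes_per_vertex >= 0);
    return (double)verts_per_mesh * meshes_per_frame * bytes_per_vertex
           * frames_per_second;
}
```

With 200 vertices per sphere, 100 spheres per frame, the assumed 32 bytes per vertex and the 42 FPS reported further down, vertex_bytes_per_second(200, 100, 32, 42.0) gives 26,880,000 bytes per second, roughly 27 MB/s. That is well below even the theoretical peak of AGP 1x (about 266 MB/s), which would be consistent with transfer not being the limiting factor.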

I found out the following:

(context: video memory allocated with wglAllocateMemoryNV; vertices, normals & texture coords copied there; vertex range valid)

  1. If I use glVertexPointer(3, GL_FLOAT, 12, ptr) I get 42 FPS.

  2. If I use glVertexPointer(3, GL_FLOAT, 16, ptr) I also get 42 FPS (normals also padded).

  3. If I put the call to glDrawElements in a display list and then call it, I get 56 FPS.

As far as I know, display lists are compiled in local memory, with the vertex data being dereferenced immediately. Why is it faster ?

Second question: what if I put the indices for glDrawElements call also in Video/AGP memory ? (gotta try this tonight)

Third question: does it make sense to also lock the vertex data (glLockArraysEXT) ?

Memo

The “stride” is not the same thing as “pitch”. If your vertex data or your normal data is tightly packed (no extra padding) the “stride” should be 0. If you add one float worth of padding, the “stride” should be 4. Minimizing padding is good because it means minimizing the amount of data that needs to be transferred.

Perhaps you’re already doing this, and the extra padding you’re specifying for the vertexes is the normals, and vice versa? If so, why isn’t the padding in the second case 20? (the point presumably being to make each data item sit on a 16-byte boundary).

but btw.
you have to realize that in real applications it’s really hard to maximize the efficiency with that extension,
esp. when you have dynamic geometry, level geometry, and multiple textures involved.
it really boils down to a fussing game and it’s not worth it …

laterz,
akbar A.
;vertexabuse.cjb.net

That all depends. For example, if you have NV_vertex_program in hardware (slobber, slobber), you can do up to 16-matrix skinning in hardware, using static (on-card) vertex buffers.

I also think you’re missing one of the points of using AGP memory for vertex transfer. If you’re doing any kind of processing, or just looking at and selecting vertexes, then the vertexes you want to render have to go somewhere. Make that “somewhere” be AGP memory, written sequentially, even though it’s only data for one frame, and your throughput will be higher than if you wrote the selected items (or a list of them) to regular memory, and the driver then had to do a second copy for you.
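A minimal sketch of the "write the selected vertexes sequentially" idea, in plain C. The type and function names are hypothetical; in a real renderer dst would point into the range obtained from wglAllocateMemoryNV, but any writable buffer works for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Copy the vertices picked by `selected` into `dst`, front to back, so the
   destination is only ever written, at ascending addresses.
   Assumption: `dst` stands in for AGP memory; it is never read back here.
   Returns the number of vertices written. */
size_t write_selected(Vec3 *dst, size_t dst_capacity,
                      const Vec3 *src, const unsigned *selected, size_t count)
{
    size_t i;
    assert(count <= dst_capacity);
    for (i = 0; i < count; ++i)
        dst[i] = src[selected[i]]; /* reads come from regular memory only */
    return count;
}
```

The point of the design is that the destination is never read: AGP memory is typically mapped uncached, so reads from it tend to be slow, while in-order writes can be combined.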

You can even set up your transfer pipeline as a cyclic buffer (a la typical sound hardware) by using the fence extension.
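The cyclic buffer could be organized roughly as below. This is only the bookkeeping, with hypothetical names (Ring, ring_must_wait, ring_acquire, ring_release); the actual NV_fence calls are indicated in comments because they need a live GL context, and splitting the range into four chunks is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CHUNKS 4 /* arbitrary choice for the sketch */

typedef struct {
    size_t chunk_bytes;               /* size of one chunk of the range */
    unsigned next;                    /* chunk to hand out next */
    int fence_pending[RING_CHUNKS];   /* 1 if a fence was set after drawing */
} Ring;

/* Returns 1 if the GPU may still be reading the next chunk. The caller
   should then wait on that chunk's fence (glFinishFenceNV in a real
   program) before calling ring_acquire. */
int ring_must_wait(const Ring *r)
{
    return r->fence_pending[r->next];
}

/* Hand out the byte offset of the next chunk and advance the ring.
   Assumes the caller has already waited if ring_must_wait said so. */
size_t ring_acquire(Ring *r, unsigned *chunk_out)
{
    unsigned c = r->next;
    r->fence_pending[c] = 0;
    r->next = (c + 1) % RING_CHUNKS;
    *chunk_out = c;
    return (size_t)c * r->chunk_bytes;
}

/* Call right after issuing the draw calls that read from the chunk; a real
   program would also call glSetFenceNV for this chunk's fence here. */
void ring_release(Ring *r, unsigned chunk)
{
    assert(chunk < RING_CHUNKS);
    r->fence_pending[chunk] = 1;
}
```

Per frame, the loop would be: check ring_must_wait, wait on the fence if needed, ring_acquire a chunk, fill it, draw from it, then ring_release it. The CPU only stalls when it laps the GPU by a full ring.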

Well, well ! It seems that you guys are a little confused, too. Your answers are pretty evasive.

bgl: I never intended to use stride. That 3rd parameter confuses me a little. I just tried to do what John Carmack said he did. On the other hand, if I use 0 instead of 12 or 16 the result is the same.

kaber0111: I don’t agree with you. I have only static data. I precalculate all the vertices I need and I put them in a vector. Then, if the card supports the NV_vertex_array_range extension, I allocate AGP/video memory and copy the vertices to that new location, and then I render only from there.
Yes, I have multitexture, so what ? I set the texture I need, then render the vertices which use that texture, and so on.
“Level geometry” also changes, so I just need to recalculate the vertices and write them again in that vector - but this happens once every 100 … +INF frames ! (it’s user input dependent)
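The "if the card supports the extension" check mentioned above is usually done against the extension string. A minimal sketch, assuming the list comes from glGetString(GL_EXTENSIONS) in the real program; has_extension is a hypothetical helper, not a GL call.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in
   `ext_list`, 0 otherwise. A plain strstr is not enough, because it would
   also match a longer extension name that merely starts with `name`. */
int has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;
    assert(ext_list != NULL && name != NULL);
    len = strlen(name);
    if (len == 0 || strchr(name, ' ') != NULL)
        return 0;
    for (p = ext_list; (p = strstr(p, name)) != NULL; p += len) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
    }
    return 0;
}
```

If the check for "GL_NV_vertex_array_range" succeeds, the vertices get copied to the allocated AGP/video memory; otherwise rendering falls back to the vector in regular memory.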

bgl: See the answer above. I know exactly what I’m doing with the vertices. And fence will complicate things, I think … but I’ll try it, anyway.

My question was: why do I have to use both this extension and display lists to get the fastest speed ?

I also tried locking the data - no further speed improvement. Didn’t have time to try copying the indices to AGP memory as well (been busy Xmas shopping :wink:)

Memo,

Please check out the VAR/fence whitepaper on the NVIDIA web site. It has a number of caveats regarding the use of VAR and/or fence. Please make sure that you’re not in one of the conditions it warns against.

Also, what hardware/OS are you using?

Thanks -
Cass

cass: maybe you’re right, I’ll check it out. Can you provide a link to it ?

HW:
PIII 600
Intel 820 w/ SDRAM
GeForce II GTS (Hercules bla bla 64M DDRAM)
W98 / W2K

Here it is:
http://www.nvidia.com/Marketing/Developer/DevRel.nsf/pages/D1C924B3E02A1F9B8825692E007FE245

You can even choose between Word and Adobe !

Regards.

Eric

Thanks !