I was just wondering: is there a big speed improvement from NV_vertex_array_range when you’re not using that many vertices?
I render many spheres, each about 200 vertices. I load all the vertices for a sphere into video/AGP memory, then render it about 100 times (modifying the modelview matrix each time). I don’t see any speed improvement with the extension; it’s the same as using local memory. (I checked everything: the range is valid, the memory is allocated, etc., and I use glDrawElements to render.)
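For reference, the setup being described might look like the sketch below. malloc() stands in for wglAllocateMemoryNV() (which needs a live GL context), SphereVertex and stage_sphere() are my own illustrative names, and the actual VAR/GL calls are shown only in comments:

```c
/* Sketch of the poster's setup: precalculate sphere vertices once, copy them
 * into AGP/video memory, then render many instances with glDrawElements while
 * only the modelview matrix changes.  malloc() stands in here for
 * wglAllocateMemoryNV(size, 0.0f, 0.0f, 0.5f) so the staging logic can run
 * anywhere; the GL calls need a live context and are shown as comments. */
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef struct { float pos[3]; float norm[3]; } SphereVertex; /* 24 bytes, tight */

/* Copy 'count' precalculated vertices into the (pretend-AGP) range.
 * Returns the number of bytes staged, or 0 on failure. */
size_t stage_sphere(SphereVertex *agp, const SphereVertex *src, size_t count)
{
    if (!agp || !src) return 0;
    memcpy(agp, src, count * sizeof(SphereVertex));
    /* With a real context you would then do:
     *   glVertexArrayRangeNV(count * sizeof(SphereVertex), agp);
     *   glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
     *   glVertexPointer(3, GL_FLOAT, sizeof(SphereVertex), agp[0].pos);
     *   glNormalPointer(GL_FLOAT, sizeof(SphereVertex), agp[0].norm);
     * and per sphere instance: load the modelview matrix, glDrawElements(). */
    return count * sizeof(SphereVertex);
}
```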
The “stride” in the gl*Pointer() calls is the byte distance from the start of one element to the start of the next (so it plays the role “pitch” plays in other APIs), and 0 is shorthand for tightly packed. If your vertex data or your normal data is tightly packed (no extra padding), the stride should be 0. If you add one float worth of padding per vertex, the stride becomes the full element size (e.g., 16 bytes for three floats plus 4 bytes of padding). Minimizing padding is good because it means minimizing the amount of data that needs to be transferred.
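A minimal sketch of how the stride falls out of an interleaved C struct, per the gl*Pointer() spec (PaddedVertex is a hypothetical layout of mine; the GL calls are shown only in comments since they need a context):

```c
/* Stride per the gl*Pointer() spec: the byte distance from one element to the
 * next, with 0 as shorthand for "tightly packed".  For an interleaved struct,
 * sizeof() gives the stride and offsetof() gives each attribute's start. */
#include <stddef.h>

typedef struct {
    float pos[3];   /* bytes  0..11 */
    float norm[3];  /* bytes 12..23 */
    float pad;      /* 4 bytes of padding: stride grows from 24 to 28 */
} PaddedVertex;

/* The pointer setup would then be:
 *   glVertexPointer(3, GL_FLOAT, sizeof(PaddedVertex), &v[0].pos);
 *   glNormalPointer(GL_FLOAT, sizeof(PaddedVertex), &v[0].norm);
 */
```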
Perhaps you’re already doing this, and the extra stride you’re specifying for the vertexes is there to skip over the normals, and vice versa? If so, why isn’t the value in the second case 20? (The point presumably being to make each data item sit on a 16-byte boundary.)
But, by the way: you have to realize that in real applications it’s really hard to maximize efficiency with that extension, especially when you have dynamic geometry, level geometry, and multiple textures involved. It really boils down to a fussing game, and it’s not worth it …
That all depends. For example, if you have NV_vertex_program in hardware (slobber, slobber), you can do up to 16-matrix skinning in hardware, using static (on-card) vertex buffers.
I also think you’re missing one of the points of using AGP memory for vertex transfer. If you’re doing any kind of processing, or just looking at and selecting vertexes, then the vertexes you want to render have to go somewhere. Make that “somewhere” be AGP memory, written sequentially, even though it’s only data for one frame, and your throughput will be higher than if you wrote the selected items (or a list of them) to regular memory and the driver then had to do a second copy for you.
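The “write selected vertexes sequentially” pattern might be sketched like this (stage_selected and Vtx are illustrative names of mine; a plain array stands in for the AGP range so the logic can run anywhere):

```c
/* Append only the selected vertices, in order, to the staging range.  With a
 * real AGP range from wglAllocateMemoryNV() the same rule applies: write
 * sequentially and never read back from 'dst', because write-combined AGP
 * memory is uncached on the CPU side and reads from it are very slow. */
#include <stddef.h>

typedef struct { float pos[3]; float norm[3]; } Vtx;

/* Returns the number of vertices staged into dst. */
size_t stage_selected(Vtx *dst, const Vtx *src, const int *selected, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; ++i)
        if (selected[i])
            dst[out++] = src[i];   /* sequential writes, no reads of dst */
    return out;
}
```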
You can even set up your transfer pipeline as a cyclic buffer (à la typical sound hardware) by using the fence extension.
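A rough sketch of that cyclic-buffer scheme, assuming the usual NV_fence entry points (shown only as comments since they need a context; NUM_SECTIONS and CyclicRange are names of mine):

```c
/* Cyclic-buffer scheme with NV_fence: split the AGP range into N sections,
 * set a fence (glSetFenceNV) after submitting draws from each section, and
 * before reusing a section wait (glFinishFenceNV) on the fence set for it
 * last time around.  Only the section rotation runs live here. */
#include <stddef.h>

enum { NUM_SECTIONS = 4 };

typedef struct {
    size_t section_size;   /* bytes per section */
    int    current;        /* section being filled this frame */
} CyclicRange;

/* Advance to the next section, wrapping around, and return the byte offset
 * of the section that is now safe to refill. */
size_t next_section(CyclicRange *r)
{
    r->current = (r->current + 1) % NUM_SECTIONS;
    /* glFinishFenceNV(fences[r->current]);   wait until GL is done with it,
     * then fill it, glDrawElements(), and finally:
     * glSetFenceNV(fences[r->current], GL_ALL_COMPLETED_NV); */
    return (size_t)r->current * r->section_size;
}
```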
Well, well! It seems that you guys are a little confused, too. Your answers are pretty evasive.
bgl: I never intended to use a stride. That 3rd parameter confuses me a little; I just tried to do what John Carmack said he did. On the other hand, if I use 0 instead of 12 or 16, the result is the same.
kaber0111: I don’t agree with you. I have only static data. I precalculate all the vertices I need and I put them in a vector. Then, if the card supports the NV_vertex_array_range extension, I allocate AGP/video memory and copy the vertices to that new location, and then I render only from there.
Yes, I have multitexture, so what ? I set the texture I need, then render the vertices which use that texture, and so on.
“Level geometry” also changes, so I just need to recalculate the vertices and write them into that vector again - but that happens once every 100 … +INF frames! (It’s user-input dependent.)
bgl: See the answer above. I know exactly what I’m doing with the vertices. And fence will complicate things, I think … but I’ll try it, anyway.
My question was: why do I have to use both this extension and display lists to get the fastest speed ?
I also tried locking the data - no further speed improvement. I didn’t have time to try copying the indices into AGP too (been busy Xmas shopping).
Please check out the VAR/fence whitepaper on the NVIDIA web site. It has a number of caveats regarding the use of VAR and/or fence. Please make sure that you’re not in one of the conditions it warns against.