Please give me advice on Vertex Arrays

I've never used Vertex Arrays in OGL and I need some advice
because I don't want to implement them the wrong way.

I have a landscape that is 100% static, so I want to put
it in a single vertex buffer. The buffer never needs to
be changed, so can I lock it or something to speed it up? I don't
draw the whole landscape, I only draw the visible parts.
I think that passing the indices of the visible vertices is
the way to go, right? I also need multitexturing, I hope two
tex coords are possible.
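What I have in mind for the two tex coord sets is roughly this (just a sketch, assuming ARB_multitexture is exposed and glClientActiveTextureARB has been fetched with wglGetProcAddress; the array names are placeholders):

/* verts: 3 floats per vertex; texCoords0/texCoords1: 2 floats per vertex */
glVertexPointer(3, GL_FLOAT, 0, verts);
glEnableClientState(GL_VERTEX_ARRAY);

glClientActiveTextureARB(GL_TEXTURE0_ARB);      /* coord set for texture unit 0 */
glTexCoordPointer(2, GL_FLOAT, 0, texCoords0);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);

glClientActiveTextureARB(GL_TEXTURE1_ARB);      /* coord set for texture unit 1 */
glTexCoordPointer(2, GL_FLOAT, 0, texCoords1);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);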

So, that's at least my idea. What would you say? What's the
fastest way to render a static landscape with changing visibility?
And can OpenGL store the vertex data for me like D3D, or do I
have to keep a separate copy for OpenGL (since my data structure isn't
suited for direct use)?

Please tell me what you think

Tim

Well, the fastest way should be the use of the VertexArrayRange (VAR) extension from NVIDIA…
but if you need to do a 'real' OpenGL implementation you should use display lists.
Both of those mechanisms are good for static data.
Of course, display lists are not as fast as VAR…
Concerning classic vertex arrays, you cannot expect a spectacular gain compared to display lists, and moreover you always need to keep a copy of the geometry when using arrays… so… make your choice… VAR or display lists! Or in other terms… NV-specific or OpenGL 1.1 -> 1.2 compatible.
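For reference, the display list path is just a compile-once, call-every-frame pattern, something like this (a sketch; drawTerrainImmediate() is a placeholder for whatever emits your triangles):

GLuint terrainList;

/* once, at load time */
terrainList = glGenLists(1);
glNewList(terrainList, GL_COMPILE);
    drawTerrainImmediate();   /* placeholder: emits the glBegin()/glVertex() calls */
glEndList();

/* every frame */
glCallList(terrainList);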

Sorry, but if you had read my post you would know that display lists are NOT an option, because I have static geometry but I don't want to draw 100% of it all the time. And why do I need this nVidia extension? Simply passing the array and locking it ensures fast speeds and reuse of vertices.

huh??

It doesn’t look like you understand my question or how vertex buffers work…

The problem with glVertexPointer() and its friends is that there are no good locking semantics. Further, because the application allocates the memory for glVertexPointer() in the normal case, transfer to a hardware transform/lighting card will sometimes be sub-optimal, because memory allocated with malloc() (or even just globals) will neither be physically contiguous nor in easily accessible AGP memory.

These are exactly the problems that the vertex buffer memory allocation and fence extensions are intended to solve. Use them if they are available and it makes sense. However, dollars to donuts that geometry upload is not your main limiting factor. Just using glVertexPointer() and friends will probably remove most of the transfer bottleneck, and you could move on to trying to minimize texture fill and speed up your AI/physics/whatever is operating on the data in the first place.
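To make that concrete, the NV_vertex_array_range path looks roughly like this (a sketch only; it assumes wglAllocateMemoryNV and glVertexArrayRangeNV have been fetched via wglGetProcAddress, systemCopy and vertexCount are placeholders, and the NV_fence synchronization you would want before rewriting the memory is left out):

GLsizei size = vertexCount * 3 * sizeof(GLfloat);
/* ask the driver for AGP memory: read freq 0, write freq 0, priority 0.5 */
GLfloat *agpVerts = (GLfloat *) wglAllocateMemoryNV(size, 0.0f, 0.0f, 0.5f);
if (agpVerts) {
    memcpy(agpVerts, systemCopy, size);          /* copy the static terrain in once */
    glVertexArrayRangeNV(size, agpVerts);
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
    glVertexPointer(3, GL_FLOAT, 0, agpVerts);   /* pointer must lie inside the range */
    glEnableClientState(GL_VERTEX_ARRAY);
}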

If your geometry is truly static, you may be able to use the compiled vertex array extension to put the data in AGP memory or on the card, without having to manage the memory yourself. However, the usage notes for glLockArraysEXT and glUnlockArraysEXT say that any state change (glTranslate(), glRotate() etc.) could cause the locked data to be invalidated, and thus the locking might be less efficient.

Also, it states that using glDrawElements() and friends on a range outside of that which is locked is undefined. Further, you cannot use more than one locked array at a time, so if you're drawing anything other than your static geometry, you'll have to unlock to draw those other things anyway.
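For completeness, using the compiled vertex array extension is just a lock around the draw calls, roughly like this (a sketch; terrainVerts, vertexCount, visibleIndices and indexCount are placeholders for your own data):

glVertexPointer(3, GL_FLOAT, 0, terrainVerts);
glEnableClientState(GL_VERTEX_ARRAY);

/* hint to the driver that vertices 0..vertexCount-1 will not change */
glLockArraysEXT(0, vertexCount);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, visibleIndices);
glUnlockArraysEXT();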

[This message has been edited by bgl (edited 12-17-2000).]

Thanks for this info…

I'm pretty confused. I used to pass the data with ultra-slow glVertex3fv. Now I'm using compiled vertex arrays, and it's not a single frame faster!!! I define the arrays, upload them and lock them. Then I use my Octree every frame and pass the indices of the visible triangles to OpenGL. Why is it THAT slow? And why do I get 15 FPS in a 10K-triangle scene on a GeForce DDR?

I'm 100% T&L bound, textures have no effect. I've run out of ideas; I know that it should be a hell of a lot faster. Brute-force rendering everything into a display list is much faster than using the Octree, strange…

Any ideas??

"And why do I get 15FPS in a 10K triangle scene on a GeForce DDR ? "

I'm doing better than that with my Vanta, so it sounds like you're either doing something majorly wrong or you're CPU bound; I'm guessing the latter. Try commenting out a lot of the calculation the CPU does each frame and see if that makes a major jump in speed. Also check the NVIDIA site, there's a performance FAQ for the GeForce there.

I’m thinking about drawing some landscape too… It’s static and so on. I was thinking about just using display lists containing triangle strips.

Chop up the whole landscape into tiles, build one display list for each tile (one tile containing several triangle strips), calculate a bounding sphere for each tile. Draw only tiles which are in the frustum…

The triangle strips themselves are built up from irregular polygons (no heightfield/grid), pre-optimized so to speak. You could have several LODs of each tile; there will be problems with cracks in the seams between the tiles, working on that…
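In rough code, the per-tile test I'm picturing is something like this (just a sketch; the frustum plane extraction is left out and the names are placeholders):

#include <GL/gl.h>

typedef struct {
    GLuint  list;        /* display list holding this tile's triangle strips */
    GLfloat cx, cy, cz;  /* bounding sphere centre */
    GLfloat radius;      /* bounding sphere radius */
} Tile;

/* draw only the tiles whose bounding sphere touches the view frustum;
   frustum[p] holds plane p as (a, b, c, d) with the normal pointing inward */
void drawVisibleTiles(const Tile *tiles, int count, const GLfloat frustum[6][4])
{
    int i, p;
    for (i = 0; i < count; ++i) {
        int visible = 1;
        for (p = 0; p < 6; ++p) {
            GLfloat d = frustum[p][0] * tiles[i].cx
                      + frustum[p][1] * tiles[i].cy
                      + frustum[p][2] * tiles[i].cz
                      + frustum[p][3];
            if (d < -tiles[i].radius) { visible = 0; break; }  /* fully outside this plane */
        }
        if (visible)
            glCallList(tiles[i].list);
    }
}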

What do you think about that?

Cheers

zed: Yes, I know this sounds stupid, but I'm 100% T&L bound. My CPU is an Athlon 700. This is really strange. But I guess my system is broken anyway, Quake3 is only half as fast as a few months ago. I reinstalled my system, tried everything. I guess the board is broken. But anyway, I got the same performance (relative to the HW) on my Athlon 850 with a GeForce 2.

AndersO: I already did something like that, it's fast as hell, especially because GeForce ICDs perform bounding-box culling for all display lists before rendering them ;-))) Try it out!

But that is not an option for me…

Tim

btw: is glDrawElements() or glDrawArrays() faster?

In addition to my last question, I have to say that I only draw with one primitive, triangles. My vertex array contains only the grid points, so with indexing I'm getting the maximum number of triangles with the minimum number of vertices.
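Concretely, for a width x height grid of points the index list I build looks roughly like this (a sketch; 'indices' is assumed to be a large enough GLuint array):

int x, z, n = 0;
for (z = 0; z < height - 1; ++z) {
    for (x = 0; x < width - 1; ++x) {
        /* the four shared corners of this grid cell */
        GLuint i0 =  z      * width + x;        /* upper-left  */
        GLuint i1 =  z      * width + x + 1;    /* upper-right */
        GLuint i2 = (z + 1) * width + x;        /* lower-left  */
        GLuint i3 = (z + 1) * width + x + 1;    /* lower-right */
        /* two triangles per cell, reusing the shared vertices */
        indices[n++] = i0; indices[n++] = i2; indices[n++] = i1;
        indices[n++] = i1; indices[n++] = i2; indices[n++] = i3;
    }
}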

Well… As zed mentioned, the FAQ at nvidia is good. There are tables comparing the speed of different methods of sending geometry to the card… In general I would guess it applies to other cards than NVIDIA's too…

According to it, glDrawElements is faster than glDrawArrays, IF your geometry shares vertices.

Since we're talking about a heightmap terrain, it shares a hell of a lot of vertices. I already tried it, the element call is faster.

How about NOT locking the arrays? The spec is ambiguous on this point, but I think a lock may not be valid across calls that flip the screen, clear, or otherwise flush anyway.

Try this:
For each frame: call glVertexPointer() and friends to set up your arrays. Walk through your geometry, deciding which vertexes to draw, putting the index of those vertexes in a big array you allocate with malloc() or globally. Then call glDrawElements() (or possibly glDrawRangeElements()) on that array.
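In code, one frame of that is roughly the following (a sketch; collectVisibleIndices() stands in for your own culling/LOD walk, and terrainVerts/visibleIndices are the application's own arrays):

/* set up the arrays for this frame */
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, terrainVerts);

/* fill a plain malloc'd or global index array with the visible triangles */
GLsizei indexCount = collectVisibleIndices(visibleIndices);

/* one call to draw everything that survived the walk */
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, visibleIndices);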

No locking. This should be close to as fast as the card can go; the only inefficiency being that your data is not in AGP or card memory, and whatever is in the code that decides which vertexes to draw.

You might also want to time your code for setting up the index array; perhaps your LOD algorithm or culling is actually slowing things down… Let us know how this works out.

Hello!

I tried locking once, (un)locking before/after each frame, and not locking at all. It doesn't affect the performance very much, but DrawElements is faster when I use any form of locking.

I set up my arrays and lock them at startup. My other geometry is just a sun and a skybox, no need for vertex arrays. My other high-poly object is a water mesh which is rendered perfectly fast as a series of triangle strips. So I can leave my terrain arrays selected & locked all the time. I found it is a speedup to let my Octree gather all indices that are visible and then draw everything with a single glDrawElements() call on the locked arrays.

I also thought my Octree was a bit too slow, but that is not the problem. I tried to brute-force everything in a for loop and then compared the FPS with a situation where the terrain is 100% visible. The Octree wasn't even a frame slower.

What really makes me wonder is the polygon/FPS relationship. The visible polycount can drop from 10K to 5K and I only get 20% more FPS. Even without textures and lighting and the water it's a bit slow. My engine is going open source for people to learn from, I don't want to be a bad example ;-)

I thought about a mesh optimization algorithm that culls away useless triangles, or maybe I could create tri strips at runtime?

Any ideas on how to implement such speedups?

Tim