Issue with (compiled) vertex arrays

Fugit · August 5, 2001, 1:33am

I’m writing a quadtree engine. Until recently I was using a glBegin()/glEnd() pair to do a GL_TRIANGLE_FAN for all of the nodes to be drawn. I was getting around 70fps most of the time.
I decided it would be far more efficient to use vertex arrays, and so I converted all the terrain points to have a vertex, normal, and texcoord array element. Then, I changed the drawing code to basically this:

glBegin(GL_TRIANGLE_FAN);
glArrayElement(Node->TriFanIndex);
glArrayElement(Node->FanPointIndex[0]);
…
…
glEnd();

This works fine, but I only get 40-50fps, rising to 70 when I move right to the edge of the terrain and most of it is frustum-culled out of the quadtree. I thought this was strange, so I added support for the compiled vertex array extension, but it made no visible difference at all.
Please, can you help?

-Kieren

Nutty · August 5, 2001, 2:03am

Don’t ever use glArrayElement. If you’re going to use that, you may as well just use standard immediate-mode commands.

What you should do is use glDrawElements or glDrawArrays. To do this within a quadtree, you’ll have to set up vertex arrays per leaf node, so that you can draw a node’s entire array in one call without having to extract the polys in that node.
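A minimal sketch of the per-leaf layout, assuming each leaf’s triangles are stored back to back in one shared, non-indexed GL_TRIANGLES vertex array. LeafRange and build_leaf_ranges are hypothetical names; the point is that a prefix sum gives every leaf the first/count pair that glDrawArrays(GL_TRIANGLES, first, count) expects, so each visible leaf costs one call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-leaf record: where this leaf's vertices start in the
   shared vertex array, and how many there are. */
typedef struct {
    int first; /* index of the leaf's first vertex */
    int count; /* number of vertices (3 per triangle) */
} LeafRange;

/* Given the triangle count of each leaf, in storage order, compute the
   contiguous range each leaf occupies. Returns the total vertex count. */
int build_leaf_ranges(const int *tris_per_leaf, size_t num_leaves,
                      LeafRange *ranges)
{
    int next = 0;
    for (size_t i = 0; i < num_leaves; i++) {
        assert(tris_per_leaf[i] >= 0);
        ranges[i].first = next;
        ranges[i].count = 3 * tris_per_leaf[i];
        next += ranges[i].count;
    }
    return next;
}

/* At render time each visible leaf is then a single call such as
   glDrawArrays(GL_TRIANGLES, ranges[i].first, ranges[i].count);
   rather than one glArrayElement() per vertex. */
```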

One possible solution is to create a display list per leaf node, so that when traversing the quadtree you only have to call the display list (though this might use too much video memory to be practical).

Hope that helps.

Nutty

P.S. The compiled vertex array extension doesn’t really do that much, except cache transformed vertices in case they’re used again. It only helps in indexed mode. How big this cache is, I don’t know. It seems this extension should be good for multi-pass rendering, provided the cache of transformed vertices is big enough. Doing a single pass, you’re very unlikely to get any speed gain out of it.

[This message has been edited by Nutty (edited 08-05-2001).]

Fugit · August 5, 2001, 2:19am

Woohoo, glDrawElements() (index-based) is just what I was looking for. Thanks!

Fugit · August 5, 2001, 2:39am

Hmm, I nearly forgot: I can’t use one (big) glDrawElements() call, as I’m using triangle fans at the moment. Calling it once per triangle fan gives me 55-60fps, which is OK, but would it be better to use triangles instead, with fewer (or just one) glDrawElements() calls, at the cost of more memory for the indices?
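For the memory side of this question: a fan over n indices (the base plus n - 1 rim points) is n - 2 triangles, so as GL_TRIANGLES it needs 3(n - 2) indices instead of n. A minimal sketch of the conversion; fan_to_triangles is a hypothetical helper, not a GL call:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expand one triangle fan (fan[0] is the shared base
   vertex) into independent triangles, preserving the fan's winding.
   Writes to `out` and returns the number of indices written, 3 * (n - 2). */
size_t fan_to_triangles(const unsigned *fan, size_t n, unsigned *out)
{
    size_t written = 0;
    assert(n >= 3);
    for (size_t i = 1; i + 1 < n; i++) {
        out[written++] = fan[0];     /* shared base vertex */
        out[written++] = fan[i];
        out[written++] = fan[i + 1];
    }
    return written;
}
```

For example, a closed fan of the base plus 4 rim points (the first repeated at the end, so 6 indices and 4 triangles) becomes 12 indices. With GL_UNSIGNED_INT that is 48 bytes instead of 24 per node, a modest cost if it lets many nodes share one glDrawElements() call.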
Thanks again,

Kieren

Fugit · August 5, 2001, 2:46am

Reading over what you said again, Nutty, I have a few things I should add.
First, every node is a “leaf” node (assuming that means it contains polys), since I use the quadtree for dynamic LOD. Also, because of this LOD, nodes can be rendered with up to 4 extra polys to remove artifacts (‘cracks’) in the terrain, so I can’t really use big static display lists or arrays, even though the actual terrain is static.
So in your opinion, would it be better to (1) use glDrawElements to draw each triangle fan, or (2) add GL_TRIANGLES indices to an array and then draw that array, even though that uses more memory?
Thanks

Fugit · August 5, 2001, 6:25am

Dawww, I just tried building a big index list and calling glDrawElements()… 35fps on average.
I’ve tried building the whole buffer and drawing it all in one go, and also drawing and resetting every 1000 or every 100 triangles; that just makes it worse.
Please, someone help? :I

The_Legend · August 5, 2001, 6:38am

Have you ever thought that CVAs may be the wrong way to speed up your engine?

When the scene is always changing, using arrays and display lists efficiently gets tricky.

Nutty · August 5, 2001, 6:39am

Hmmmmmm…

If every node is a leaf node, i.e. it contains polys, then I assume your quadtree is only 1 level deep?

Is this right?

Nutty

P.S. Why not use Tri-strips instead of Tri-fans?

[This message has been edited by Nutty (edited 08-05-2001).]

Fugit · August 5, 2001, 6:50am

Actually, the quadtree is up to 7 levels deep.
Basically, for each node, I calculate an LOD, which corresponds to a recursion depth.
If I want to draw a node at LOD 4, I simply recurse no further than depth 4 for that node, and draw the polygons in that node, which are in fact (pretty much) an approximation of the nodes below it.
The problem with tri-strips is, well, I can’t use one to approximate a terrain with different LODs… can I? :eek:
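A rough sketch of the LOD traversal described above, assuming a full quadtree whose per-node lod field (the desired recursion depth) has already been filled in by a separate update pass. The Node layout, build_tree and render_lod are hypothetical; rendering here only counts the nodes that would be drawn, and crack fixing between neighbours of different depth is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: `lod` is set by an update pass before rendering;
   children are NULL at the deepest level. */
typedef struct Node {
    struct Node *child[4];
    int depth;
    int lod; /* desired recursion depth for this area */
} Node;

/* Build a full quadtree of height max_depth from a caller-owned pool
   (a full tree of height h needs (4^(h+1) - 1) / 3 nodes). */
Node *build_tree(Node *pool, size_t *used, int depth, int max_depth)
{
    assert(depth <= max_depth);
    Node *n = &pool[(*used)++];
    n->depth = depth;
    n->lod = max_depth;
    for (int i = 0; i < 4; i++)
        n->child[i] = (depth < max_depth)
                          ? build_tree(pool, used, depth + 1, max_depth)
                          : NULL;
    return n;
}

/* Render pass: stop recursing once a node is at (or past) its desired
   depth and draw that node's polygons as the approximation of everything
   below it. Here "drawing" just counts the node. */
int render_lod(const Node *n)
{
    if (n->depth >= n->lod || n->child[0] == NULL)
        return 1;
    int drawn = 0;
    for (int i = 0; i < 4; i++)
        drawn += render_lod(n->child[i]);
    return drawn;
}
```

In a full tree of height 2 (21 nodes), setting every lod to 2 draws the 16 deepest nodes, setting every lod to 1 draws 4, and a root lod of 0 draws just the root.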

Fugit · August 5, 2001, 6:54am

I hope this helps describe what I do at the moment (it doesn’t include the recursion code).
You should also note that I call UpdateQuadtree() first, which does all the LOD, culling, etc., then RenderQuadtree(), which traverses and renders the quadtree.

// Each node emits at most 8 triangles (24 indices), so flush *before*
// appending them; otherwise the buffer can overflow. This assumes
// INDEX_ARRAY_SIZE is the buffer capacity in triangles, as the original
// test implies. Whatever is still buffered after the traversal must be
// drawn with one final glDrawElements() call.
if (NumArrayIndices / 3 + 8 > INDEX_ARRAY_SIZE)
{
    // draw, reset
    glDrawElements(GL_TRIANGLES, NumArrayIndices, GL_UNSIGNED_INT, ArrayIndices);
    NumArrayIndices = 0;
}

// Fan edge i runs from FanPointIndices[i] to FanPointIndices[(i + 1) % 4].
// When *Node->EdgeIndices[i] > 1, the edge is split at
// ExtraFanPointIndices[i] (two triangles) to close a crack; otherwise it
// becomes a single triangle.
for (int i = 0; i < 4; i++)
{
    int next = (i + 1) % 4;

    if (*Node->EdgeIndices[i] > 1)
    {
        ArrayIndices[NumArrayIndices++] = Poly->FanBaseIndex;
        ArrayIndices[NumArrayIndices++] = Poly->FanPointIndices[i];
        ArrayIndices[NumArrayIndices++] = Poly->ExtraFanPointIndices[i];
        ArrayIndices[NumArrayIndices++] = Poly->FanBaseIndex;
        ArrayIndices[NumArrayIndices++] = Poly->ExtraFanPointIndices[i];
        ArrayIndices[NumArrayIndices++] = Poly->FanPointIndices[next];
    }
    else
    {
        ArrayIndices[NumArrayIndices++] = Poly->FanBaseIndex;
        ArrayIndices[NumArrayIndices++] = Poly->FanPointIndices[i];
        ArrayIndices[NumArrayIndices++] = Poly->FanPointIndices[next];
    }
}

I’ll cry soon… sniff

-Kieren

system · August 5, 2001, 6:55am

hi

Perhaps you should try splitting your terrain into several “vertex buffers”. Some buffers will never be touched in a particular frame, which saves some memory bandwidth, I think…
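One simple way to decide which buffers can go untouched, assuming the terrain is cut into a regular grid of square chunks with one vertex buffer each, and the visible area is approximated by an axis-aligned rectangle (for instance the frustum’s footprint on the ground). The function and parameter names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical setup: a square terrain split into chunks_per_side x
   chunks_per_side square chunks, each chunk_size world units wide and each
   with its own vertex buffer. Writes the indices of the chunks overlapped
   by the visible region [x0, x1] x [y0, y1] and returns how many there are;
   the other chunks' buffers need not be touched this frame. */
size_t visible_chunks(float x0, float y0, float x1, float y1,
                      float chunk_size, int chunks_per_side, int *out)
{
    size_t n = 0;
    int last = chunks_per_side - 1;
    assert(chunk_size > 0.0f && chunks_per_side > 0);
    if (x1 < 0.0f || y1 < 0.0f)
        return 0;                 /* region lies entirely off the terrain */
    if (x0 < 0.0f) x0 = 0.0f;     /* clamp so (int) truncation acts as floor */
    if (y0 < 0.0f) y0 = 0.0f;
    int cx0 = (int)(x0 / chunk_size), cy0 = (int)(y0 / chunk_size);
    int cx1 = (int)(x1 / chunk_size), cy1 = (int)(y1 / chunk_size);
    if (cx0 > last || cy0 > last)
        return 0;                 /* region starts beyond the terrain */
    if (cx1 > last) cx1 = last;
    if (cy1 > last) cy1 = last;
    for (int cy = cy0; cy <= cy1; cy++)
        for (int cx = cx0; cx <= cx1; cx++)
            out[n++] = cy * chunks_per_side + cx;
    return n;
}
```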

If you’re using a GPU (GeForce, Radeon), use the appropriate extensions (fences, i.e. NV_fence, on GeForce) to make use of parallelism.

just my 2 cents

freakyboy

Fugit · August 5, 2001, 12:01pm

That made no difference either… grr

…sniff…sob…

zed · August 5, 2001, 12:07pm

Basically, you’re not drawing enough polygons in one call.
Using fans, you’d be drawing, I assume, 8 tris per call.

Try to draw as many tris per call as you can (but not too many; this varies, but if you stick to under 4000 vertices you should be OK).
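The arithmetic behind this advice, as a small sketch: with independent triangles (GL_TRIANGLES) and a cap of max_verts vertices per call, each call carries max_verts / 3 whole triangles. calls_needed is a hypothetical helper, and 4000 is only zed’s rule of thumb, not a hard limit:

```c
#include <assert.h>

/* Hypothetical helper: number of glDrawArrays/glDrawElements calls needed
   to submit `total_tris` GL_TRIANGLES triangles when each call is capped
   at `max_verts` vertices. */
int calls_needed(int total_tris, int max_verts)
{
    int tris_per_call = max_verts / 3;  /* whole triangles only */
    assert(tris_per_call > 0 && total_tris >= 0);
    return (total_tris + tris_per_call - 1) / tris_per_call; /* ceiling */
}
```

For example, 1000 visible nodes at zed’s assumed 8 triangles each means 1000 calls when every fan is drawn separately, but calls_needed(8000, 4000) is 7 when the same triangles are batched.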

Without using extensions, in order of speed:
fastest is glDrawArrays()
then glDrawElements()
…
glBegin()…glEnd()

BUT if the geometry changes (ROAM), using glBegin()…glEnd() might be quickest.

CVAs incur a performance hit; they should only be used if you draw the geometry in multiple passes.

In my game I believe I render blocks of 4x4 quads. Rendered as tris, that is 4x4x2 = 32 tris per call in one degenerate tri-strip, but only because I change textures quite a bit and because I use quite fine culling.

From my experiments (and I’ve done quite a few), with block sizes of 8x8, 16x16 and 32x32, a 16x16 block is the quickest: 16x16x2 = 512 tris plus degenerates.
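A sketch of how such a block can be turned into one strip, assuming the block’s (w + 1) x (h + 1) vertices are stored row-major: each row of quads is one strip, and rows are stitched together by repeating two indices, which produces zero-area (degenerate) triangles that rasterize nothing. grid_strip is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: build one GL_TRIANGLE_STRIP index list for a block
   of w x h quads whose (w + 1) x (h + 1) vertices are stored row-major.
   Rows are joined by repeating the last index of one row and the first
   index of the next, which adds four degenerate triangles per join.
   `out` must hold h * 2 * (w + 1) + 2 * (h - 1) indices; returns count. */
size_t grid_strip(int w, int h, unsigned *out)
{
    size_t n = 0;
    unsigned stride = (unsigned)w + 1;
    assert(w > 0 && h > 0);
    for (int r = 0; r < h; r++) {
        if (r > 0) {
            out[n] = out[n - 1];              /* repeat last index */
            n++;
            out[n++] = (unsigned)r * stride;  /* first index of row r */
        }
        for (int c = 0; c <= w; c++) {
            out[n++] = (unsigned)r * stride + (unsigned)c;       /* row r */
            out[n++] = (unsigned)(r + 1) * stride + (unsigned)c; /* row r+1 */
        }
    }
    return n;
}
```

For zed’s 4x4 block this yields 46 indices, i.e. 44 strip triangles (32 real plus 12 degenerate); for 16x16 it yields 574 indices, i.e. 572 triangles (512 real plus 60 degenerate). Because each row contributes an even number of indices, the winding order stays consistent across the joins.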

system · August 5, 2001, 1:03pm

I just (finally) solved a similar problem with Bezier patches. I’m also doing dynamic LOD with crack fixing at the edges, and I wanted to figure out how to do this without changing my vertex arrays or sending unnecessary verts to OpenGL.

After much experimentation, I came up with an algorithm that I like. Instead of storing the verts for each patch linearly (top to bottom, left to right, or whatever), I store them in the order in which they are used, from lowest LOD to highest. For example, the lowest LOD just uses the four corner verts, so those are 0, 1, 2 and 3 in the array. The next LOD adds five more verts (the center and the four edge midpoints), so those are 4, 5, 6, 7 and 8 in the array. And so on.

This way, your indices for any given LOD will not be spread randomly around the vertex array (spreading them out results in unnecessary transformations).
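A minimal sketch of one way to compute that ordering for a square patch of (2^k + 1) x (2^k + 1) vertices, assuming each LOD halves the grid spacing of the previous one. A vertex’s level is the coarsest grid it lies on; sorting vertices by level (row-major within a level) makes LOD L use exactly the first (2^L + 1)^2 entries. The function names are hypothetical, and crack-fixing vertices are not treated specially:

```c
#include <assert.h>
#include <stddef.h>

/* Level at which vertex (x, y) of a (2^k + 1)^2 patch first appears:
   level 0 is only the corners (grid step 2^k); level L has step 2^(k - L). */
static int first_level(int x, int y, int k)
{
    for (int level = 0; level < k; level++) {
        int step = 1 << (k - level);
        if (x % step == 0 && y % step == 0)
            return level;
    }
    return k; /* every remaining vertex appears at the finest level */
}

/* Hypothetical helper: fill order[new] = old (row-major index), sorted by
   first_level and row-major within a level, and remap[old] = new, which
   is the index to put in the index lists. */
void lod_order(int k, int *order, int *remap)
{
    int side = (1 << k) + 1;
    int n = 0;
    for (int level = 0; level <= k; level++)
        for (int y = 0; y < side; y++)
            for (int x = 0; x < side; x++)
                if (first_level(x, y, k) == level) {
                    order[n] = y * side + x;
                    remap[y * side + x] = n;
                    n++;
                }
    assert(n == side * side);
}
```

For k = 2 (a 5x5 patch), the first four entries are the corners (old indices 0, 4, 20, 24), entries 4 to 8 are the five vertices added at the next level (2, 10, 12, 14, 22), and every vertex used at LOD 1 ends up with a new index below 9.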

I don’t know if this will be helpful to you, and the details of implementation are non-trivial (at least, it took me a while to work them out). But in short, figuring out how to use vertex arrays efficiently with dynamic tessellation is tricky, so I sympathize.

Fugit · August 5, 2001, 11:00pm

Well, thanks for all your input, but I’ve decided to create my own mixture.
Basically, I will fill up a buffer with triangle fans, probably by specifying the base point, the number of fan points, then the fan points. This is because I tried using glBegin()/glEnd() as an alternative to glDrawElements() (i.e., I drew the buffer myself) and got a slight performance INCREASE.
So I thought, why not just optimize that further and use fans? I’ll try that when I get back from London today and post the results here… I’m sure you’re all simply dying to know the outcome…
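A minimal sketch of the packed fan buffer described above (base point, number of fan points, then the fan points), with a walker that hands each fan to a callback. FanBuffer, push_fan and for_each_fan are hypothetical names; in a GL renderer the callback would issue glBegin(GL_TRIANGLE_FAN), the vertices, then glEnd().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packed buffer: for each fan, the base index, the number of
   fan points, then the fan points themselves. */
typedef struct {
    unsigned *data;
    size_t len, cap;
} FanBuffer;

/* Append one fan; returns 0 if it does not fit (caller should draw what is
   buffered, reset len to 0, and try again). */
int push_fan(FanBuffer *b, unsigned base, const unsigned *pts, unsigned count)
{
    if (b->len + 2 + count > b->cap)
        return 0;
    b->data[b->len++] = base;
    b->data[b->len++] = count;
    for (unsigned i = 0; i < count; i++)
        b->data[b->len++] = pts[i];
    return 1;
}

/* Walk the buffer, calling emit() (if non-NULL) once per fan. Returns the
   total number of triangles: count - 1 per fan with `count` fan points. */
size_t for_each_fan(const FanBuffer *b,
                    void (*emit)(unsigned base, const unsigned *pts,
                                 unsigned count))
{
    size_t i = 0, tris = 0;
    while (i < b->len) {
        unsigned base = b->data[i], count = b->data[i + 1];
        assert(count >= 2);
        if (emit)
            emit(base, &b->data[i + 2], count);
        tris += count - 1;
        i += 2 + count;
    }
    return tris;
}
```

With a closed 5-point fan and a closed 9-point fan the buffer holds 18 values and describes 12 triangles; push_fan() returning 0 is the cue to draw what is buffered and start again.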
Well, bye bye

Thanks again everyone

-Kieren