Optimising using vertex arrays

Hi,
Currently I’m using one set of arrays (vertex, texcoord, normal) per object and then I call glDrawElements.
This lets me translate/rotate my objects by calling glTranslate/glRotate before each glDrawElements call.

  1. How much data should those vertex arrays contain to get the best performance?

  2. What is the cost of changing arrays’ pointers for each object ?

  3. Would it be better if I had fixed size arrays? I would fill those arrays and then flush them. The main advantage I see is that you don’t have to change the arrays’ pointers. The main drawback is that you spend CPU time filling them.

  4. To display overlays (text/bitmaps), is it better to treat them as ordinary objects or to use display lists?

Thanks in advance for your answers.

I would think that changing a pointer is much less overhead than filling in the array and flushing it every time you want to change it.

You can try this, if you don’t want to call gl*Pointer too often:

GLfloat *pointer1;
GLfloat *pointer2;

GLfloat *static_pointer;

glVertexPointer(3, GL_FLOAT, 0, static_pointer);

then, say you want to draw the array pointed to by pointer1:

static_pointer = pointer1;
glDrawElements(…);

Then draw array 2:

static_pointer = pointer2;
glDrawElements(…);

Moz, maybe I’m missing something but I don’t see how that would work. In your example, you tell OpenGL that you are using the address pointed to by static_pointer. Then when you do static_pointer = pointer1, you have changed the address that static_pointer is pointing to, but OpenGL still thinks you want it to be pointing to the old address because you haven’t told it that static_pointer is now pointing to a new location.
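
The same point as a small sketch in plain C. `fake_vertex_pointer` is a hypothetical stand-in for glVertexPointer, not a real GL call. Like the real function it receives the pointer by value, so it can only record the address the variable held at the moment of the call. The array and function names are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the client state that glVertexPointer sets. */
const float *current_vertex_array = NULL;

/* Stand-in for glVertexPointer: the pointer is passed by value, so only
   the address it holds at call time is recorded. */
void fake_vertex_pointer(const float *ptr)
{
    current_vertex_array = ptr;
}

float array1[3] = { 0.0f, 0.0f, 0.0f };
float array2[3] = { 1.0f, 1.0f, 1.0f };

/* Repeats the sequence from the earlier post and reports which array a
   later draw call would read: 1 for array1, 2 for array2, 0 otherwise. */
int which_array_after_reassign(void)
{
    const float *static_pointer = array1;

    fake_vertex_pointer(static_pointer);  /* records the address of array1 */
    static_pointer = array2;              /* changes only the local variable */
    (void)static_pointer;                 /* the recorded address is untouched */

    if (current_vertex_array == array1) return 1;
    if (current_vertex_array == array2) return 2;
    return 0;
}
```

After the reassignment the recorded address is still that of the first array, so the pointer call has to be repeated whenever the source array changes.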

Ok,

That was just a suggestion and I had not tried it actually. Now I see why that can’t work.

  1. If I remember correctly, an array of around 1K is optimal on some cards. I’m sure this will vary from card to card though.

  2. Dunno, but generally I consider it fairly costly.

  3. I personally use a single fixed size vertex array. Though the method may require a bit more cpu, the flexibility you gain from using this method I think more than makes up for the cpu cost.

  4. I personally like to keep things simple, so everything gets rendered via the vertex array.

is it really faster using ‘a single fixed size array’?
personally i’m changing the pointers every time (most of the calls go through one glDrawElements function)
for a laugh i tried really pushing the polys today (10-40 q3 characters on the screen) and was quite surprised how well my non hardware tnl card handled it (last 2 shots) http://members.nbci.com/_XMCM/mybollux/projects/gotterdammerung/gotterdammerung.html

is anyone using display lists for anything? i can see them buying me some speed at the cost of simplicity

Well, I think that storing my geometry with a set of arrays per object will be nice.
I will have a vertex array, a normal array, a color array, and a texcoord array for each object. Then I will call glVertexPointer and so on, followed by one call to glDrawElements per object.

Using fixed size arrays costs CPU time, and I don’t want to bother with SIMD instructions just to make the copying fast.

I will hash sort my objects by materials which will enable me to avoid unnecessary state changes.
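
A minimal sketch of the sorting idea, under some assumptions: it uses the standard library's `qsort` rather than a hash sort, and the `DrawItem` record and its field names are hypothetical. The counting helper only illustrates how many material switches a draw list would trigger before and after sorting.

```c
#include <stdlib.h>

/* Hypothetical per-object record; the field names are illustrative. */
typedef struct {
    unsigned material_id;  /* texture/state bucket the object uses */
    unsigned object_id;    /* which mesh to draw */
} DrawItem;

/* Orders by material first, then by object id to keep the result stable. */
int compare_by_material(const void *a, const void *b)
{
    const DrawItem *x = a;
    const DrawItem *y = b;
    if (x->material_id != y->material_id)
        return (x->material_id > y->material_id) - (x->material_id < y->material_id);
    return (x->object_id > y->object_id) - (x->object_id < y->object_id);
}

/* Sorts the draw list so that items sharing a material are adjacent. */
void sort_by_material(DrawItem *items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_by_material);
}

/* Counts how often the material differs from the previous item, i.e. how
   many times a renderer walking the list would have to switch state. */
size_t count_material_changes(const DrawItem *items, size_t count)
{
    size_t changes = 0;
    for (size_t i = 0; i < count; ++i)
        if (i == 0 || items[i].material_id != items[i - 1].material_id)
            ++changes;
    return changes;
}
```

A renderer would then bind the material only when `material_id` differs from the previous item, and set the array pointers and call glDrawElements once per object as described above.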

In fact, the main problem I had when I asked my question was:

Is it possible to have 2 different objects in the temporary fixed size arrays AT THE SAME TIME? If yes, this means that I have to rotate/translate the geometry by hand before putting it in the arrays (no call to glRotate/glTranslate).

If it is not possible to have 2 different objects in the arrays at the same time, then why do I need to copy the geometry into a temporary buffer? I can just change the pointers, which still lets me call glRotate/glTranslate.

The only case I can think of where fixed size arrays are a good choice is rendering a BSP tree, since you just have to hash sort your faces by material before copying them into the fixed size arrays.

Am I totally wrong ?

As long as you don’t lock your arrays, you don’t gain anything from using the same pointer several times. When you call glDrawElements (or whichever draw call you use), GL goes to the current pointers and reads the data out, whether or not they are the same pointers as before. Only when you lock the arrays does it copy them into faster RAM on the GPU, and then you have to sort the meshes…

Sorry… but how do I lock an array?

I tried to use glVertexPointer(3,GL_FLOAT,0,MyVertexArray); to set my vertices. To use them between glBegin(GL_TRIANGLES); and glEnd(); I called glArrayElement(ArrayIndex);

Now it is much slower than creating the vertices with glVertex3f(x,y,z); every time!

I hope somebody has a tutorial or can display a simple sample here, I need it!

Thanks…

You don’t want to use glArrayElement, as you’re still just passing everything over separately to the gfx card.

You should be using glDrawElements (if you care about performance anyway); for this you need to set up a list of indices into your vertex buffer too. This function is what you should be using, as it’s the most optimized, and I believe it is required if you want to lock your arrays. AFAIK a lot of drivers will not make use of the glLock extensions if you don’t use glDrawElements.
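
To illustrate what "a list of indices into your vertex buffer" looks like, here is a sketch in plain C with no GL dependency. The quad data and the `expand_indexed` helper are assumptions for illustration; the helper only mimics on the CPU the lookup that an indexed draw performs and says nothing about how a real driver does it. With real GL the same two arrays would be handed to glVertexPointer(3, GL_FLOAT, 0, quad_vertices) and glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, quad_indices), optionally bracketed by glLockArraysEXT(0, 4) and glUnlockArraysEXT() where the EXT_compiled_vertex_array extension is available.

```c
#include <stddef.h>

/* A unit quad stored once: 4 unique vertices, xyz each. */
const float quad_vertices[4 * 3] = {
    0.0f, 0.0f, 0.0f,   /* 0: bottom left  */
    1.0f, 0.0f, 0.0f,   /* 1: bottom right */
    1.0f, 1.0f, 0.0f,   /* 2: top right    */
    0.0f, 1.0f, 0.0f    /* 3: top left     */
};

/* Two triangles sharing the 0-2 diagonal: 6 indices, only 4 vertices. */
const unsigned short quad_indices[6] = { 0, 1, 2,  0, 2, 3 };

/* CPU stand-in for an indexed draw: for every index, fetch that vertex
   from the array and write it to `out`. Returns the floats written. */
size_t expand_indexed(const float *vertices, const unsigned short *indices,
                      size_t index_count, float *out)
{
    for (size_t i = 0; i < index_count; ++i) {
        const float *v = vertices + 3 * (size_t)indices[i];
        out[3 * i + 0] = v[0];
        out[3 * i + 1] = v[1];
        out[3 * i + 2] = v[2];
    }
    return index_count * 3;
}
```

The index list lets shared corners be stored once, which is also why the whole object can be submitted in a single call instead of one glArrayElement call per corner.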

For some sample code using glDrawElements and the lock functions, check my website: www.nutty.org

  • Nutty

Okay, one thing I haven’t seen addressed yet is what about rendering a lot of smaller models (not terrain/rooms but things like game characters and cars and ships etc.)? I realize you’re supposed to try and keep your vertex buffers around 1k (according to DFrey). And DFrey says to use a single vertex buffer. And I also realize you should keep the number of calls to glDrawElements down to a minimum. But there are only two ways I know of to orient a model:

  1. Maintain a matrix for each object and do a glMultMatrix() and then a glDrawElements for EACH object (which also ruins any material sorting you’ve got).
  2. Actually multiply EACH VERTEX in EACH model by its own matrix EVERY FRAME. This would allow you to keep your material sorting and would allow larger batches, but would require you to refill a vertex buffer every frame since each model’s geometry could change each frame. Not to mention the overhead of processing EACH AND EVERY vertex for each object in the game world.

So which way is better? Or is there another way?
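
For reference, the second option can be sketched roughly as follows. This is an illustration under stated assumptions, not a tuned implementation: the matrix is 4x4 column-major (the layout glMultMatrixf takes), it is assumed to be affine so that w can be ignored, and the function name is hypothetical.

```c
#include <stddef.h>

/* Transforms `count` xyz positions by a 4x4 column-major affine matrix
   and writes them to `dst`, which points at the next free slot in a
   shared batch buffer. Returns the number of floats written. */
size_t transform_into_batch(const float m[16], const float *src,
                            size_t count, float *dst)
{
    for (size_t i = 0; i < count; ++i) {
        const float x = src[3 * i + 0];
        const float y = src[3 * i + 1];
        const float z = src[3 * i + 2];

        /* Column-major: m[12], m[13], m[14] hold the translation. */
        dst[3 * i + 0] = m[0] * x + m[4] * y + m[8]  * z + m[12];
        dst[3 * i + 1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
        dst[3 * i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14];
    }
    return count * 3;
}
```

Calling it once per model with `dst` advanced by the running total packs several differently oriented models into one buffer, at the cost of touching every vertex on the CPU each frame. Normals would need the rotation part applied as well, which this sketch leaves out.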

If your objects contain a decent amount of polys, then I’d go with the 1st option.

If not, then chances are you’re not drawing so many polys anyway that it would cause a serious problem.

Originally posted by Guardian:

3) Would it be better if I had fixed size arrays ? I should fill those arrays and then flush them. The main advantage I see by doing this is that you don’t have to change arrays’ pointers. The main drawback is that you consume CPU time filling.

Well, after my early suggestion, which was completely wrong to say the least, I started implementing my rendering engine based on vertex arrays.

I came up with the following idea: put everything (all the objects) in a single huge vertex array whose size is defined at the start of the application.
I see several advantages in doing that.

1) I never have to change the pointers.

2) It doesn’t matter that the array is huge, since I only render a max of say 1k vertices (as DFrey suggested) for each call to glDrawElements (and you have to store the data somewhere anyway, so why not in the vertex array in the first place).

3) I don’t have to copy the data from one array to another every time I render a new object.

There are obviously some complications.

If I put all my objects in the vertex array (actually the arrays since there can be texture coord, colors…) at the initialisation of the program, that’s ok, I can put their data in successive blocks of memory in the array(s).

What if I want to dynamically remove or add an object?
For that I’ve got a VertexMemoryManager class that keeps track of all the allocated chunks of memory in the vertex array via a linked list of references (pointers) to these chunks.

Then when I want to remove an object, I simply remove the reference from the list.

If I want to add an object, I look for two non-contiguous chunks in the vertex array and allocate the free memory between them to my new object, repeating until all its data has a place in the array. Then I copy its data to the array and remap its indices.

This makes the loading of an object a bit slower, but I think that not having to either change the pointers or copy the data every frame is a more important benefit.
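
A simplified sketch of such a manager, to make the bookkeeping concrete. It is not the VertexMemoryManager from the post: it uses a small sorted table instead of a linked list, only hands out contiguous chunks (first fit) rather than spreading one object over several gaps, and every name in it is hypothetical.

```c
#include <stddef.h>

#define MAX_CHUNKS 64
#define POOL_NO_SPACE ((size_t)-1)

/* One allocated range of the big vertex array, in vertices. */
typedef struct { size_t first; size_t count; } Chunk;

typedef struct {
    size_t capacity;            /* total vertices in the big array */
    size_t chunk_count;
    Chunk  chunks[MAX_CHUNKS];  /* allocated ranges, sorted by `first` */
} VertexPool;

void pool_init(VertexPool *p, size_t capacity)
{
    p->capacity = capacity;
    p->chunk_count = 0;
}

/* First fit: returns the first vertex index of the new chunk, or
   POOL_NO_SPACE if no single gap is large enough. */
size_t pool_alloc(VertexPool *p, size_t count)
{
    size_t gap_start = 0;
    size_t slot = 0;

    if (count == 0 || p->chunk_count == MAX_CHUNKS)
        return POOL_NO_SPACE;

    /* Walk the gaps that lie before each allocated chunk. */
    for (; slot < p->chunk_count; ++slot) {
        if (p->chunks[slot].first - gap_start >= count)
            break;
        gap_start = p->chunks[slot].first + p->chunks[slot].count;
    }
    /* No gap between chunks fits: try the tail of the array. */
    if (slot == p->chunk_count && p->capacity - gap_start < count)
        return POOL_NO_SPACE;

    /* Shift later entries up so the table stays sorted. */
    for (size_t i = p->chunk_count; i > slot; --i)
        p->chunks[i] = p->chunks[i - 1];
    p->chunks[slot].first = gap_start;
    p->chunks[slot].count = count;
    ++p->chunk_count;
    return gap_start;
}

/* Releases the chunk starting at `first`. Returns 1 on success, 0 if
   no chunk starts there. The vertex data itself is left untouched. */
int pool_free(VertexPool *p, size_t first)
{
    for (size_t i = 0; i < p->chunk_count; ++i) {
        if (p->chunks[i].first == first) {
            for (size_t j = i + 1; j < p->chunk_count; ++j)
                p->chunks[j - 1] = p->chunks[j];
            --p->chunk_count;
            return 1;
        }
    }
    return 0;
}
```

With contiguous chunks, remapping an object's indices reduces to adding the returned offset to each index. The trade-off is that a request can fail even when the total free space would be enough, which is the fragmentation case raised later in the thread.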

I’ve not tested it completely, but until now it seems viable. I was just wondering what you would think of that solution.

What about cache issues? I mean, if you have an array of 1 million elements and access 1000 of them more or less randomly, wouldn’t it be slower than accessing 1000 sequential elements because of cache misses?

Y.

Do you mean CPU cache or GPU cache?
Because according to what I understood of how vertex arrays work (but I may be completely wrong again), the graphics card accesses AGP memory directly.
Plus the GPU vertex cache is very small (10 vertices?), so it is only useful when vertices are reused within a short span (like, say, when using triangle strips), which my method does not affect.

I’m talking about the kind of memory where the VAs are: the CPU side if the vertex array is in RAM, the GPU side if you stored it in video or AGP memory. I wasn’t talking about the vertex cache. By the way, AGP and video memory have no cache at all, so wouldn’t performance be horrible if you put the vertex array in video/AGP memory and accessed it randomly?

Y.

Originally posted by Ysaneya:
By the way, AGP and video memory have no cache at all, so wouldn’t performance be horrible if you put the vertex array in video/AGP memory and accessed it randomly?

Yes. If you’re going to write to AGP memory, you should do it sequentially and not randomly. If you need random access, I suspect it would be better to keep one buffer in system RAM on which you perform the random updates. You can then sequentially copy this buffer to another one in AGP mem, which you use for the rendering.
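
A minimal sketch of that two-buffer scheme in plain C. Both buffers here are ordinary arrays; `render_buffer` merely stands in for memory that would really live in AGP space, and the `Update` record and function names are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One scattered write: which float to change and its new value. */
typedef struct {
    size_t index;
    float  value;
} Update;

/* Random-access updates go to the staging copy in ordinary system RAM. */
void apply_updates(float *staging, const Update *updates, size_t update_count)
{
    for (size_t i = 0; i < update_count; ++i)
        staging[updates[i].index] = updates[i].value;
}

/* One front-to-back copy into the buffer used for rendering.
   `render_buffer` is a stand-in for AGP memory; here it is a plain array. */
void upload_sequential(float *render_buffer, const float *staging,
                       size_t float_count)
{
    memcpy(render_buffer, staging, float_count * sizeof staging[0]);
}
```

The render buffer is only ever written in one sequential pass, no matter how scattered the updates to the staging copy were.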

  • Tom

If that is a problem, it only occurs when I load a new object in my large vertex arrays.
By the way, when I copy data to my arrays, I always copy one large block at a time (like a big memcpy; I guess that’s what you mean by writing sequentially). If my arrays get very fragmented, then I agree performance can suffer, so I will probably have to defragment the arrays.

Then, dereferencing the arrays with glDrawElements is always a random operation to some extent (with glDrawElements your index array rarely is {0, 1, 2, 3…}).

And for the moment I don’t use AGP memory (but I probably will), so all the data is still in RAM. But I cannot see how there would be more CPU cache problems with my method than if you refill your arrays every time you render a new object or if you change the pointers.

Writing is not the problem. If the idea is to have one huge vertex array for your whole scene, assuming you put it in AGP memory, and only random elements of it are used in a glDrawElements call, the problem is reading from AGP memory. I guess this read will be done by the hardware or the driver, but it’s still non-sequential access… that’s what I fear the most.
Really, writing is not a problem, since with this method you create the VA once.

Y.

What I don’t understand is how it could be a sequential read at all if using glDrawElements?
Assuming you use glDrawElements (which seems to be the most used and most optimised method) with an array containing 10K entries. Your index array may reference the entry 0, then the entry 9999. Doesn’t look very sequential to me.
But I’m probably missing something important here, am I?