Optimising using vertex arrays

03-10-2001, 02:55 AM
Currently I'm using one set of arrays (vertex, texcoord, normal) per object and then I call glDrawElements.
By doing this, I can translate/rotate my objects by calling glTranslate/glRotate, and then I call glDrawElements.

1) How much data can those vertex arrays contain so as to get best performance ?

2) What is the cost of changing arrays' pointers for each object ?

3) Would it be better if I had fixed size arrays ? I would fill those arrays and then flush them. The main advantage I see in doing this is that you don't have to change the arrays' pointers. The main drawback is that you consume CPU time filling them.

4) In order to display overlays(text/bitmaps), is it better to consider them as common objects or is it better to use display lists ?

Thanx in advance for your answers ;)

03-10-2001, 03:20 AM
I would think that changing a pointer is much less overhead than filling in the array and flushing it every time you want to change it.

you can try this, if you don't want to call gl*Pointer too often:

GLfloat *pointer1;
GLfloat *pointer2;
GLfloat *static_pointer;

glVertexPointer(3, GL_FLOAT, 0, static_pointer);

then, say you want to draw the array pointed to by pointer1:

static_pointer = pointer1;

Then draw array 2:

static_pointer = pointer2;

03-10-2001, 05:47 AM
Moz, maybe I'm missing something but I don't see how that would work. In your example, you tell OpenGL that you are using the address pointed to by static_pointer. Then when you do static_pointer = pointer1, you have changed the address that static_pointer is pointing to, but OpenGL still thinks you want it to be pointing to the old address because you haven't told it that static_pointer is now pointing to a new location.
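
The point above can be sketched without any real OpenGL. This is only an illustration: `FakeGL` and `setVertexPointer` are hypothetical stand-ins for the driver's client state and for glVertexPointer, which likewise receives the address by value and keeps its own copy.

```cpp
#include <cassert>

// Hypothetical stand-in for the GL client state; not a real OpenGL API.
struct FakeGL {
    const float* vertexPointer = nullptr;

    // Like glVertexPointer, this receives the address by value and stores a copy.
    void setVertexPointer(const float* p) {
        assert(p != nullptr);
        vertexPointer = p;
    }
};

// Returns true if the stored address still refers to `first` after the
// application's own variable has been redirected to `second`.
bool storedPointerIsUnchanged(const float* first, const float* second) {
    FakeGL gl;
    const float* staticPointer = first;
    gl.setVertexPointer(staticPointer);  // the address of `first` is copied here
    staticPointer = second;              // changes only the local variable
    (void)staticPointer;                 // silence "unused value" warnings
    return gl.vertexPointer == first;
}
```

So to draw the second array, the gl*Pointer call has to be made again with the new address.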

03-10-2001, 09:49 AM

That was just a suggestion and I had not tried it actually. Now I see why that can't work.

03-10-2001, 11:29 AM
1. If I remember correctly, an array of around 1K vertices is optimal on some cards. I'm sure this will vary from card to card though.

2. Dunno, but generally I consider it fairly costly.

3. I personally use a single fixed size vertex array. Though the method may require a bit more cpu, the flexibility you gain from using this method I think more than makes up for the cpu cost.

4. I personally like to keep things simple, so everything gets rendered via the vertex array.

03-10-2001, 06:46 PM
Is it really faster using 'a single fixed size array'?
Personally I'm changing the pointers every time (most of the calls go through one glDrawElements function).
For a laugh I tried really pushing the polys today (10-40 q3 characters on the screen) and was quite surprised how well my non hardware T&L card handled it (last 2 shots) http://members.nbci.com/_XMCM/mybollux/projects/gotterdammerung/gotterdammerung.html

Is anyone using display lists for anything? I can see them buying me some speed at the cost of simplicity.

03-11-2001, 11:24 AM
Well, I think that storing my geometry with a set of arrays per object will be nice.
I will have a vertex array, a normal array, a color array, and a texcoord array for each object. Then I will call glVertexPointer and so on, followed by one call to glDrawElements per object.

Using fixed size arrays requires CPU time, and I don't want to bother with SIMD instructions in order to make things go fast.

I will hash sort my objects by materials which will enable me to avoid unnecessary state changes.
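
A minimal sketch of the sorting idea, under stated assumptions: `DrawItem`, `materialId` and `objectId` are hypothetical names, and it uses std::stable_sort as a simple substitute for a hash-based bucketing. The helper counts how many material binds a given draw order would need.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct DrawItem {
    int materialId;  // hypothetical material handle, assumed non-negative
    int objectId;    // hypothetical object handle
};

// Counts how many times the material changes when drawing in the given order.
int countMaterialChanges(const std::vector<DrawItem>& items) {
    int changes = 0;
    int current = -1;  // no material bound yet
    for (const DrawItem& item : items) {
        assert(item.materialId >= 0);
        if (item.materialId != current) {
            current = item.materialId;
            ++changes;
        }
    }
    return changes;
}

// Groups items that share a material; objects keep their relative order.
void sortByMaterial(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.materialId < b.materialId;
                     });
}
```

Each object can still get its own glTranslate/glRotate and glDrawElements call; only the order of the objects changes.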

In fact, the main problem I had when I asked my question was:

Is it possible to have 2 different objects in the temporary fixed size arrays AT THE SAME TIME ? If yes, this means that I have to rotate/translate the geometry by hand before putting it in the arrays. (no call to glRotate/glTranslate).

If it is not possible to have 2 different objects in the arrays at the same time, then why do I need to copy the geometry into a temporary buffer ? I just have to change the pointers, and that still lets me call glRotate/glTranslate.

The only case I can think of where fixed size arrays are a good choice is when you have to render a BSP tree, since you just have to hash sort your faces by material before copying them into the fixed size arrays.

Am I totally wrong ? :)

03-11-2001, 11:33 AM
As long as you don't lock your arrays, you don't gain anything from using the same pointer several times. When you call glDrawElements (or whichever draw call), GL goes to the current pointers and starts reading out the data, regardless of whether they are the same pointers as before. Only when you lock the whole thing does it copy the data into faster RAM on the GPU, and then you have to sort the meshes.

03-12-2001, 03:12 AM
Sorry... but how do I lock an array?

I tried to use glVertexPointer(3,GL_FLOAT,0,MyVertexArray); to set my vertices. To use them between glBegin(GL_TRIANGLES); and glEnd(); I called glArrayElement(ArrayIndex);

Now it is much slower than creating the vertices with glVertex3f(x,y,z); every time!

I hope somebody has a tutorial or can post a simple sample here, I need it!


03-12-2001, 03:58 AM
You don't want to use glArrayElement, as you're still just passing everything over separately to the gfx card.

You should be using glDrawElements (if you care about performance anyway). For this you also need to set up a list of indices into your vertex buffer. This function is what you should be using, as it's the most optimized, and I believe it is required if you want to lock your arrays. AFAIK a lot of drivers will not make use of the lock extensions if you don't use glDrawElements.
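
As a rough sketch of what "a list of indices into your vertex buffer" means, the helper below turns a plain triangle list into unique vertices plus an index list. `Vec3`, `IndexedMesh` and `buildIndexedMesh` are hypothetical names, and the linear search is chosen for brevity, not speed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

struct IndexedMesh {
    std::vector<Vec3> vertices;           // unique vertices, tightly packed
    std::vector<unsigned short> indices;  // three indices per triangle
};

// Converts a triangle soup (3 vertices per triangle) into an indexed mesh.
// Duplicate positions are found by linear search, so this is O(n^2).
IndexedMesh buildIndexedMesh(const std::vector<Vec3>& soup) {
    assert(soup.size() % 3 == 0);
    IndexedMesh mesh;
    for (const Vec3& v : soup) {
        std::size_t i = 0;
        for (; i < mesh.vertices.size(); ++i) {
            const Vec3& u = mesh.vertices[i];
            if (u.x == v.x && u.y == v.y && u.z == v.z) break;
        }
        if (i == mesh.vertices.size()) mesh.vertices.push_back(v);
        assert(i <= 65535);  // must fit in an unsigned short index
        mesh.indices.push_back(static_cast<unsigned short>(i));
    }
    return mesh;
}
```

With a GL context, the result would then be drawn with something like glVertexPointer(3, GL_FLOAT, 0, mesh.vertices.data()) followed by glDrawElements(GL_TRIANGLES, (GLsizei)mesh.indices.size(), GL_UNSIGNED_SHORT, mesh.indices.data()).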

For some sample code using glDrawElements and the lock functions, check my website: www.nutty.org

- Nutty

03-12-2001, 05:11 AM
Okay, one thing I haven't seen addressed yet is what about rendering a lot of smaller models (not terrain/rooms but things like game characters, cars, ships, etc.)? I realize you're supposed to try and keep your vertex buffers around 1k (according to DFrey). And DFrey says to use a single vertex buffer. And I also realize you should keep the number of calls to glDrawElements down to a minimum. But there are only two ways I know of to orient a model:
1. Maintain a matrix for each object and do a glMultMatrix() and then a glDrawElements for EACH object (which also ruins any material sorting you've got).
2. Actually multiply EACH VERTEX in EACH model by its own matrix EVERY FRAME. This would allow you to keep your material sorting and would allow larger batches, but would require you to refill a vertex buffer every frame since each model's geometry could change each frame. Not to mention the overhead of processing EACH AND EVERY vertex for each object in the game world.

So which way is better? Or is there another way?
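
For reference, option 2 can be sketched roughly as follows. This is an illustration only: `Batch` and `appendTransformed` are hypothetical names, the matrix is assumed to be a 4x4 in OpenGL's column-major layout, and only positions are handled (normals would need the rotation part as well).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// One shared buffer that several small models are merged into each frame.
struct Batch {
    std::vector<Vec3> vertices;
    std::vector<unsigned int> indices;
};

// Transforms a model's vertices on the CPU by the column-major 4x4 matrix m
// and appends them to the batch, offsetting the model's indices to match.
void appendTransformed(Batch& batch, const float m[16],
                       const std::vector<Vec3>& verts,
                       const std::vector<unsigned int>& idx) {
    const unsigned int base = static_cast<unsigned int>(batch.vertices.size());
    for (const Vec3& v : verts) {
        batch.vertices.push_back({
            m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12],
            m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13],
            m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14]});
    }
    for (unsigned int i : idx) {
        assert(i < verts.size());
        batch.indices.push_back(base + i);
    }
}
```

All models sharing a material can then go out in a single glDrawElements call, at the cost of touching every vertex on the CPU each frame.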

03-12-2001, 11:16 PM
If your objects contain a decent number of polys, then I'd go with the 1st option.

If not, then chances are you're not drawing so many polys that it causes a serious problem anyway.

03-15-2001, 03:32 AM
Originally posted by Guardian:

3) Would it be better if I had fixed size arrays ? I would fill those arrays and then flush them. The main advantage I see in doing this is that you don't have to change the arrays' pointers. The main drawback is that you consume CPU time filling them.

Well, after my early suggestion, which was completely wrong to say the least, I started implementing my rendering engine based on vertex arrays.

I came up with the following idea: put everything (all the objects) in a single huge vertex array whose size is defined at the start of the application.
I see several advantages in doing that.

1) I won't have to change the pointers at any time.

2)It doesn't matter that the array is huge since I only render a max of say 1k vertices (as DFrey suggested) for each call to glDrawElements (and you have to store the data somewhere anyway, so why not in the vertex array in the first place).

3)I don't have to copy the data from one array to another every time I render a new object.

There are obviously some complications.

If I put all my objects in the vertex array (actually the arrays since there can be texture coord, colors...) at the initialisation of the program, that's ok, I can put their data in successive blocks of memory in the array(s).

What if I want to dynamically remove or add an object?
For that I've got a VertexMemoryManager class that keeps track of all the allocated chunks of memory in the vertex array via a linked list of references (pointers) to these chunks.

Then when I want to remove an object, I simply remove the reference from the list.

If I want to add an object, I look for two non-contiguous chunks in the vertex array and allocate the free memory between them to my new object, repeating until all its data has a place in the array. Then I copy its data to the array and remap its indices.

This makes the loading of an object a bit slower, but I think the fact that I don't have to either change the pointers or copy the data every frame is more important.

I've not tested it completely, but until now it seems viable. I was just wondering what you would think of that solution.
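
A simplified sketch of that kind of bookkeeping, under stated assumptions: `VertexRangeAllocator` is a hypothetical name, offsets are counted in vertices, and unlike the scheme described above it only hands out contiguous ranges (first fit) instead of splitting one object across several gaps.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

const std::size_t kNoSpace = static_cast<std::size_t>(-1);

struct Chunk { std::size_t offset; std::size_t count; };

// Tracks which ranges of one big, fixed-size vertex array are in use.
class VertexRangeAllocator {
public:
    explicit VertexRangeAllocator(std::size_t capacity) : capacity_(capacity) {}

    // First fit: returns the start offset of a free range, or kNoSpace.
    std::size_t allocate(std::size_t count) {
        assert(count > 0);
        std::size_t cursor = 0;  // end of the previous used chunk
        for (auto it = chunks_.begin(); it != chunks_.end(); ++it) {
            if (it->offset - cursor >= count) {  // gap before this chunk fits
                chunks_.insert(it, Chunk{cursor, count});
                return cursor;
            }
            cursor = it->offset + it->count;
        }
        if (capacity_ - cursor >= count) {       // room after the last chunk
            chunks_.push_back(Chunk{cursor, count});
            return cursor;
        }
        return kNoSpace;
    }

    // Removes the chunk starting at `offset`; returns false if there is none.
    bool release(std::size_t offset) {
        for (auto it = chunks_.begin(); it != chunks_.end(); ++it) {
            if (it->offset == offset) {
                chunks_.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    std::size_t capacity_;
    std::list<Chunk> chunks_;  // kept sorted by offset
};
```

Remapping an object's indices then amounts to adding the returned offset to each of its local indices before they are used with glDrawElements.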

03-15-2001, 08:52 AM
What about cache issues ? I mean, if you have an array of 1 million elements and access 1000 of them more or less randomly, wouldn't it be slower than accessing 1000 sequential elements due to cache misses ?


03-15-2001, 09:51 AM
Do you mean CPU cache or GPU cache?
Because according to what I understood of how vertex arrays work (but I may be completely wrong again), the graphics card accesses the AGP memory directly.
Plus the GPU vertex cache is very small (10 vertices?), so it is only useful when indices repeat within a short span (like, say, when using triangle strips), which my method does not affect.

03-15-2001, 02:03 PM
I'm talking about whichever kind of memory the VAs are in: CPU cache if the vertex array is in RAM, GPU if you stored it in video or AGP memory. I wasn't talking about the vertex cache. By the way, AGP and video memory have no cache at all, so wouldn't the performance be horrible if you put the vertex array in video/AGP memory and accessed it randomly ?


Tom Nuydens
03-15-2001, 11:11 PM
Originally posted by Ysaneya:
By the way, AGP and video memory have no cache at all, so wouldn't the performance be horrible if you put the vertex array in video/AGP memory and accessed it randomly ?

Yes. If you're going to write to AGP memory, you should do it sequentially and not randomly. If you need random access, I suspect it would be better to keep one buffer in system RAM on which you perform the random updates. You can then sequentially copy this buffer to another one in AGP mem, which you use for the rendering.
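
A minimal sketch of that two-buffer arrangement, with assumptions labelled: `StagedVertices` is a hypothetical name, and `fast` is just a plain pointer standing in for memory that would really come from an AGP/video memory allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Random-access edits go to `staging` (ordinary system RAM). `fast` stands in
// for a buffer in AGP memory and is only ever written front to back.
struct StagedVertices {
    std::vector<float> staging;
    float* fast;        // hypothetical pointer to AGP/video memory
    std::size_t count;  // number of floats in both buffers

    StagedVertices(float* fastMemory, std::size_t floatCount)
        : staging(floatCount, 0.0f), fast(fastMemory), count(floatCount) {
        assert(fastMemory != nullptr);
    }

    // Random update: touches system RAM only.
    void set(std::size_t i, float value) {
        assert(i < count);
        staging[i] = value;
    }

    // One sequential copy of the whole buffer into the fast memory.
    void upload() {
        std::memcpy(fast, staging.data(), count * sizeof(float));
    }
};
```

The rendering calls would then point at `fast`, and `upload()` would run once per frame (or whenever the data changes).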

- Tom

03-16-2001, 01:34 AM
If that is a problem, it only occurs when I load a new object in my large vertex arrays.
By the way, when I copy data to my arrays, I always copy one large block at a time (like a big memcpy, which I guess is what you mean by writing sequentially), unless my arrays are very fragmented. In that case I agree performance can suffer, so I will probably have to defragment the arrays.

Then, dereferencing the arrays with glDrawElements is always a random operation to some extent (with glDrawElements your index array rarely is {0, 1, 2, 3...}).

And for the moment I don't use AGP memory (but I probably will), so all the data is still in RAM. But I cannot see how there would be more CPU cache problems with my method than if you refill your arrays every time you render a new object or if you change the pointers.

03-16-2001, 01:58 AM
Writing is not the problem. If the idea is to have one huge single vertex array for your whole scene, assuming you put it in AGP memory, and only random elements of it are used in a glDrawElements call, the problem is *reading* from AGP memory. I guess this read will be done by the hardware or the driver, but it's still non-sequential access.. that's what I fear the most.
Really, writing is not a problem, since with this method you create the VA once.


03-16-2001, 02:27 AM
What I don't understand is how it could be a sequential read at all if using glDrawElements?
Assuming you use glDrawElements (which seems to be the most used and most optimised method) with an array containing 10K entries. Your index array may reference the entry 0, then the entry 9999. Doesn't look very sequential to me.
But I'm probably missing something important here, am I?

03-16-2001, 02:37 AM
I think this only applies when not locking the vertex array, or not using NVIDIA's VAR. With those, all the vertices in the specified area of memory are processed, and vertices that are used more than once are (or should be) processed only once.
At least I gained a large increase in framerate just by locking my vertex arrays (I use one per object, approx. 1000 vertices per array).


03-16-2001, 08:42 AM
Moz: i'll explain "sequential" versus "non-sequential" VA access:
Imagine you have a simple VA with 12 elements, and you are using GL_TRIANGLES primitive ( 3 elements per triangle ).
Sequential access, for example: 0 1 2 , 3 4 5, 6 7 8, 9 10 11
Non-sequential access: 0 1 2, 9 10 11, 6 7 8, 3 4 5
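
To put a number on the difference between those two index lists, here is a small sketch; `countJumps` is a hypothetical helper that counts how often an index is not exactly one past the previous index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the positions where the index is not the previous index plus one.
// A fully sequential list such as 0 1 2 3 ... has zero jumps.
std::size_t countJumps(const std::vector<unsigned int>& indices) {
    assert(!indices.empty());
    std::size_t jumps = 0;
    for (std::size_t i = 1; i < indices.size(); ++i) {
        if (indices[i] != indices[i - 1] + 1) ++jumps;
    }
    return jumps;
}
```

For the sequential list above it reports no jumps, and for the non-sequential one it reports three, one at each triangle boundary.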


03-16-2001, 11:40 AM
Is there substantial overhead in calling gl*Pointer that makes it prohibitive to use? It seems to me that these calls can't possibly take as long as copying (to AGP memory or just regular memory) potentially thousands of vertices each frame. After all, the data is already in memory each frame (in the case of static data); it's just a matter of getting the hardware to look in the right place for it.

03-16-2001, 12:03 PM
hm.. when you do something like this:

unsigned char* buffer0 = new unsigned char[ sizeofbuffer0 ];
// fill it..
unsigned char* buffer1 = new unsigned char[ sizeofbuffer1 ];
// fill it..

and then use glDrawElements, it's NOT important whether you do it like this:

glDrawElements( buffer0 );
glDrawElements( buffer1 );
glDrawElements( buffer0 );
glDrawElements( buffer1 );
glDrawElements( buffer0 );
glDrawElements( buffer1 );

or like this:

glDrawElements( buffer0 );
glDrawElements( buffer0 );
glDrawElements( buffer0 );
glDrawElements( buffer1 );
glDrawElements( buffer1 );
glDrawElements( buffer1 );

And why?! Because OpenGL processes EVERY VERTEX ONE AFTER THE OTHER, so it doesn't store anything (not in your CPU memory!), and it does not remember the last pointer (vertex_program shows it best: one vertex after the other).

That means: if you don't lock your array (then it will be copied into GPU RAM or something like that) or use wglAllocate (and best of all, lock it there :) ) etc., but just use plain new or malloc and then glDrawElements, then every time it is called it does nothing other than start reading at the first vertex, send it to your program/FFP, and then do the same with the next.


Why? Because even if you leave the pointer alone, you may have changed the data, so it has to read it again.

Learn how OpenGL and GPUs work and you learn a lot :) But it's simple: it's a pipeline, one thing after the other. First the vertices, then it rasterizes the triangle (it stores up to 3 vertices to do that), and the rasterizer calls the "pixel program" for every pixel on the triangle. That stage is not fully accessible to us today, but you can change much of it with register combiners (2) and texture shaders.

One after the other, and all of this incredibly fast :)

03-17-2001, 04:34 AM
Originally posted by Ysaneya:
Sequential access, for example: 0 1 2 , 3 4 5, 6 7 8, 9 10 11


It's never the case when you use glDrawElements, that's why you need an index array.
Otherwise you would simply use glDrawArrays ... which you don't because glDrawElements is more convenient.
And in nvidia's performance FAQ, it is said that glDrawElements is often faster than glDrawArrays. So there is no gain in having the sequential VA access you describe.

03-17-2001, 06:13 AM
I'm pretty sure that the sequential stuff only matters for writing data into AGP memory. I don't quite see how people keep forgetting that the way the graphics card reads AGP is very different from how the CPU does. They are in two very different places accessing it in very different ways. So feel free to use your glDrawElements() call on vertex_array_range_NV memory.