Storing index data in a VBO

I rewrote my program to store the index data in a VBO. I expected that keeping the indices in a VBO would avoid transferring them over the bus on every glDrawElements()/glMultiDrawElementsEXT() call, so performance should improve. To my surprise, the FPS dropped instead. This is strange. Here is the relevant piece of code:

// the code to generate VBO to store indices:
glGenBuffersARB(1,&id.Iid);
glBindBufferARB(GL_ARRAY_BUFFER_ARB,id.Iid);
glBufferDataARB(GL_ARRAY_BUFFER_ARB,sizeof(GLint)*GetNumIndices(),indices,GL_STATIC_DRAW_ARB);

// the code to use indices VBO to draw:
glBindBufferARB(GL_ARRAY_BUFFER_ARB,m_VboId[s].Vid);
glVertexPointer(3,GL_FLOAT,sizeof(VboVertex),(void *)offsetof(VboVertex,xyz));
glNormalPointer(GL_FLOAT,sizeof(VboVertex),(void *)offsetof(VboVertex,normal));
glTexCoordPointer(2,GL_FLOAT,sizeof(VboVertex),(void *)offsetof(VboVertex,texcoord));
VertexElementCont::iterator iElem = m_VertexElemCont.begin()+s;
glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, m_VboId[s].Iid);
glMultiDrawElementsEXT(GL_TRIANGLE_STRIP,&iElem->vCount[0],GL_UNSIGNED_INT,(const GLvoid **)&iElem->vIndices[0],iElem->vCount.size());
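
One thing worth checking in the draw code above: while a buffer is bound to GL_ELEMENT_ARRAY_BUFFER_ARB, each entry of the indices array passed to the draw call is interpreted as a byte offset into that buffer, not as a pointer into client memory. The post does not show what vIndices holds, so the following is only a minimal sketch of how such offsets could be built, assuming all strips are packed back to back in one index buffer as 32-bit indices. The helper name BuildStripOffsets is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): given the index count of
// each strip, return the byte offset of each strip inside one index buffer.
// Assumes 32-bit indices (GL_UNSIGNED_INT) stored back to back with no gaps.
std::vector<const void*> BuildStripOffsets(const std::vector<int>& counts)
{
    std::vector<const void*> offsets;
    offsets.reserve(counts.size());

    std::uintptr_t byteOffset = 0;
    for (int count : counts) {
        // With an element array buffer bound, this "pointer" is read as an offset.
        offsets.push_back(reinterpret_cast<const void*>(byteOffset));
        byteOffset += static_cast<std::uintptr_t>(count) * sizeof(std::uint32_t);
    }
    return offsets;
}
```

The resulting array would be passed as the indices argument of glMultiDrawElementsEXT in place of pointers to client-side index data.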

I believe that GL_UNSIGNED_SHORT is faster than GL_UNSIGNED_INT.

Regards
elFarto

Do you use nVidia hardware? It is possible that, with the indices in RAM, the driver converts them to UNSIGNED_SHORT indices before sending them to the GPU (AFAIK nVidia's drivers do such things when possible). On ATI this should be equally fast, no matter where the indices are stored.

However switching to GL_UNSIGNED_SHORT does indeed give you a noticeable speed-boost on nVidia and ATI cards, though it is sometimes difficult to split your data into such chunks.

Hope that helps,
Jan.
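
To make the GL_UNSIGNED_SHORT suggestion concrete: the index data itself has to be stored as 16-bit values, not just the type enum in the draw call. Below is a minimal sketch, assuming the indices currently live in a std::vector of 32-bit values; the function name NarrowIndicesTo16 is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: copies 32-bit indices into 16-bit storage.
// Returns true and fills `out` if every index fits in GL_UNSIGNED_SHORT
// (0..65535); otherwise returns false and leaves `out` empty.
bool NarrowIndicesTo16(const std::vector<std::uint32_t>& in,
                       std::vector<std::uint16_t>& out)
{
    out.clear();

    // First pass: make sure no index would be truncated.
    for (std::uint32_t index : in) {
        if (index > 0xFFFFu) {
            return false;
        }
    }

    // Second pass: narrow and copy.
    out.reserve(in.size());
    for (std::uint32_t index : in) {
        out.push_back(static_cast<std::uint16_t>(index));
    }
    return true;
}
```

If this returns false, the mesh references more than 65536 vertices and would have to be split into smaller chunks first, which is the difficulty Jan mentions. The index buffer size then needs to be computed with the 16-bit element size as well.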

You should use glDrawRangeElements (or the multi version of it). Otherwise it’s possible the driver is walking the indices gathering min/max information every time you draw.
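
glDrawRangeElements takes the smallest and largest index used by the draw call as its start and end arguments, so the application can compute them once at load time. A minimal sketch of that precomputation, assuming 32-bit indices in a std::vector (the function name IndexRange is hypothetical):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: returns {smallest index, largest index} for one batch.
// For an empty batch it returns {0, 0}.
std::pair<std::uint32_t, std::uint32_t>
IndexRange(const std::vector<std::uint32_t>& indices)
{
    if (indices.empty()) {
        return std::make_pair(0u, 0u);
    }
    auto range = std::minmax_element(indices.begin(), indices.end());
    return std::make_pair(*range.first, *range.second);
}
```

The two values would be stored per batch and passed as the start and end parameters of glDrawRangeElements(mode, start, end, count, type, indices).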

I tried replacing GL_UNSIGNED_INT with GL_UNSIGNED_SHORT in the glMultiDrawElementsEXT() call, but it made no difference; the FPS did not change.

The biggest part of the VBO memory management is done when glVertexPointer(…) is called (for nvidia cards anyway), so it’s best to set your glNormalPointer(…) and glTexCoordPointer(…) before you call glVertexPointer(…).

You will probably get a bigger performance boost if you use the GL_T2F_N3F_V3F interleaved array format.

PS. If your data is stored sequentially, you can pass 0 as the stride parameter of your pointer calls.

N.
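
For reference, GL_T2F_N3F_V3F expects each vertex as 2 texture-coordinate floats, then 3 normal floats, then 3 position floats. A minimal sketch of a matching struct follows; the name InterleavedVertex is hypothetical, and the asker's VboVertex may order its members differently.

```cpp
#include <cstddef>

// Hypothetical vertex struct laid out in GL_T2F_N3F_V3F order.
struct InterleavedVertex {
    float texcoord[2];  // T2F
    float normal[3];    // N3F
    float xyz[3];       // V3F
};

// Compile-time checks that the compiler added no padding and that the
// members sit where the interleaved format expects them.
static_assert(sizeof(InterleavedVertex) == 8 * sizeof(float),
              "vertex must be 8 tightly packed floats");
static_assert(offsetof(InterleavedVertex, normal) == 2 * sizeof(float),
              "normal must follow the 2 texcoord floats");
static_assert(offsetof(InterleavedVertex, xyz) == 5 * sizeof(float),
              "position must follow texcoord and normal");
```

A note on the stride remark: in the separate glVertexPointer/glNormalPointer/glTexCoordPointer calls, a stride of 0 means the values of that one attribute are tightly packed, so an interleaved struct like this still needs sizeof(InterleavedVertex) as the stride there. With glInterleavedArrays, a stride of 0 means the vertices of the given format are tightly packed.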

Are you sure about the work done for glVertexPointer(…)? I would have guessed it would be deferred to the next draw call…

Any suggestions about the order of glVertexPointer(…, addr) [addr != NULL] and glEnableClientState(GL_VERTEX_ARRAY)? And what about the order of glVertexPointer(…, NULL) and glDisableClientState(GL_VERTEX_ARRAY)?

Yes I am. Check out pages 12 and 13 of this document.

I believe that glEnableClientState and glDisableClientState are delayed state changes, much like binding/unbinding a VBO so I don’t think the order of the calls matters here.

N.

Oh, thanks for the link!

This document is rather old: are the suggestions it makes still valid?

I’m not sure but I don’t see why this would change. The one thing you know you need with VBOs is the glVertexPointer call, the other pointer calls are optional so it seems quite logical to put the biggest part of the VBO memory manager in the glVertexPointer call.

It’s possible that they have since moved this work, e.g. deferred it to the next draw call like you said, but changing the order of the pointer calls as the document recommends can only be beneficial, so I can’t think of a reason not to do it.

N.

That makes absolute sense!

Thanx

Also, be advised: nVidia does not suggest putting index data in the same buffer object as vertex data.

Sorry for this offtopic question, but… where do you guys get this kind of information from?

CatDog

Is that even possible? I mean, you allocate a buffer using the buffer-type enum (glBufferData(GL_ARRAY_BUFFER,…,…,…)), so surely that not only dictates the size but also the type of buffer it is?!

Indeed, according to the GL_ARB_vertex_buffer_object spec:

Note that it is expected that implementations may have different memory type requirements for efficient storage of indices and vertices. For example, some systems may prefer indices in AGP memory and vertices in video memory, or vice versa; or, on systems where DMA of index data is not supported, index data must be stored in (cacheable) system memory for acceptable performance. As a result, applications are strongly urged to put their models’ vertex and index data in separate buffers, to assist drivers in choosing the most efficient locations.

Is that even possible? I mean, you allocate a buffer using the buffer-type enum (glBufferData(GL_ARRAY_BUFFER,…,…,…)), so surely that not only dictates the size but also the type of buffer it is?!

Again according to the GL_ARB_vertex_buffer_object spec:

Buffer objects created by binding an unused name to ARRAY_BUFFER_ARB and to ELEMENT_ARRAY_BUFFER_ARB are formally equivalent, but the GL may make different choices about storage implementation based on the initial binding. In some cases performance will be optimized by storing indices and array data in separate buffer objects, and by creating those buffer objects with the corresponding binding points.

N.

The spec again… point taken, NiCo. :)

CatDog

For example, some systems may prefer indices in AGP memory and vertices in video memory, or vice versa; or, on systems where DMA of index data is not supported, index data must be stored in (cacheable) system memory for acceptable performance.

I don’t think this is true anymore. It was true in the days of GF1/2/3 (maybe 4). I have been using mixed index/vertex VBOs and didn’t notice any performance issues with that on GF6/7/8 cards.

Bloody hell, no wonder OpenGL implementations are such a mess with that kind of flabby specification.

On NVidia cards VBOs are always faster than client-side vertex arrays.

On ATI cards, VBOs are only faster if they are rendered in pseudo batches, like this:

-Set the vertex and element pointers
-Draw
-Draw
-Draw…

If a mesh is drawn just once, it is faster to use client-side vertex arrays on the ATI cards I tested this with.

Sorry, but that’s false. I know we both “wish” it were true, but it’s not. While your specific usage pattern for VBOs may be faster, some usage patterns are definitely not faster using VBOs on NVidia. And yes, that’s with the latest drivers.

I tested this a year or so ago, and just re-tested it to make sure nothing has changed. It hasn’t. In fact, I just nearly halved the frame rate of an old test case (2.6ms -> 4.6ms per frame) merely by naively enabling VBOs for every batch submitted (all static geometry, no updates).

However, it would be useful to narrow down which usage pattern you think is always faster so collectively we can verify that. Interleaved vertex attributes in one buffer, sequential attribute lists in one buffer, or separate buffers per vertex attribute? Which attributes and attribute formats? 32-byte vertex data alignment or any padding involved? Indices in separate buffer or same buffer? USHORT or UINT indices? Multiple batches packed in each buffer, or just one? VertexPointer set last or not? Average batch size? etc.