
View Full Version : store indices data in VBO



pango
02-19-2008, 02:05 AM
I rewrote my program to store the index data in a VBO, thinking that indices stored in a VBO would avoid transferring them over the bus on every glDrawElements()/glMultiDrawElementsEXT() call, and so increase performance. But to my surprise, the FPS went down. This is strange. Here is the relevant piece of code:

// the code to generate a VBO to store the indices:
glGenBuffersARB(1, &id.Iid);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, id.Iid);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(GLuint) * GetNumIndices(), indices, GL_STATIC_DRAW_ARB);


// the code to use indices VBO to draw:
glBindBufferARB(GL_ARRAY_BUFFER_ARB, m_VboId[s].Vid);
glVertexPointer(3, GL_FLOAT, sizeof(VboVertex), (void *)offsetof(VboVertex, xyz));
glNormalPointer(GL_FLOAT, sizeof(VboVertex), (void *)offsetof(VboVertex, normal));
glTexCoordPointer(2, GL_FLOAT, sizeof(VboVertex), (void *)offsetof(VboVertex, texcoord));
VertexElementCont::iterator iElem = m_VertexElemCont.begin() + s;
glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, m_VboId[s].Iid);
glMultiDrawElementsEXT(GL_TRIANGLE_STRIP, &iElem->vCount[0], GL_UNSIGNED_INT, (const GLvoid **)&iElem->vIndices[0], iElem->vCount.size());

elFarto
02-19-2008, 02:12 AM
I believe that GL_UNSIGNED_SHORT is faster than GL_UNSIGNED_INT.

Regards
elFarto

Jan
02-19-2008, 02:39 AM
Do you use nVidia hardware? It is possible that with indices in RAM the driver converts them to UNSIGNED_SHORT indices before sending them to the GPU (AFAIK nVidia's drivers do such things when possible). On ATI this should be equally fast, no matter where the indices are stored.

However, switching to GL_UNSIGNED_SHORT does indeed give you a noticeable speed boost on both nVidia and ATI cards, though it is sometimes difficult to split your data into chunks that small.
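If every index in a mesh fits below 65536, the switch is mechanical; a minimal sketch (pure C, the function name is hypothetical) that narrows 32-bit indices to 16-bit and refuses otherwise:

```c
#include <stddef.h>
#include <stdint.h>

/* Narrow 32-bit indices to 16-bit so the draw call can use
   GL_UNSIGNED_SHORT. Returns 0 if any index does not fit,
   meaning the mesh needs splitting first. */
static int narrow_indices(const uint32_t *in, uint16_t *out, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (in[i] > 0xFFFFu)
            return 0; /* index out of 16-bit range */
        out[i] = (uint16_t)in[i];
    }
    return 1;
}
```

On success, the array is uploaded with sizeof(GLushort) per index and drawn with GL_UNSIGNED_SHORT instead of GL_UNSIGNED_INT.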

Hope that helps,
Jan.

knackered
02-19-2008, 02:55 AM
you should use glDrawRangeElements (or the multi version of it). Otherwise it's possible the driver is walking the indices gathering min/max information every time you draw.
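What glDrawRangeElements adds over glDrawElements is the start/end pair, i.e. the smallest and largest vertex index the batch touches, so the driver doesn't have to scan for them on every draw. A minimal sketch of that one-time scan (pure C, hypothetical names):

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the [start, end] range that glDrawRangeElements wants:
   the minimum and maximum index used by the batch. */
static void index_range(const uint32_t *idx, size_t count,
                        uint32_t *start, uint32_t *end)
{
    *start = idx[0];
    *end   = idx[0];
    for (size_t i = 1; i < count; ++i) {
        if (idx[i] < *start) *start = idx[i];
        if (idx[i] > *end)   *end   = idx[i];
    }
}
```

Computed once at load time, the draw becomes glDrawRangeElements(GL_TRIANGLE_STRIP, start, end, count, GL_UNSIGNED_INT, offset) and the per-draw walk disappears.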

pango
02-19-2008, 06:52 AM
I tried replacing GL_UNSIGNED_INT with GL_UNSIGNED_SHORT when calling glMultiDrawElementsEXT(), but it made no difference; the FPS did not change.

-NiCo-
02-19-2008, 07:32 AM
The biggest part of the VBO memory management is done when glVertexPointer(...) is called (for nVidia cards, anyway), so it's best to set your glNormalPointer(...) and glTexCoordPointer(...) before you call glVertexPointer(...).

You will probably get a bigger performance boost if you use the GL_T2F_N3F_V3F interleaved array format.

PS. If your data is stored sequentially, you can pass 0 as the stride parameter of your pointer calls.
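A minimal sketch of that ordering, assuming a hypothetical interleaved VboVertex laid out to match GL_T2F_N3F_V3F (the GL calls appear as comments since they need a live context):

```c
#include <stddef.h>

/* Hypothetical interleaved vertex: texcoord, normal, position,
   matching the GL_T2F_N3F_V3F ordering. */
typedef struct {
    float texcoord[2];
    float normal[3];
    float xyz[3];
} VboVertex;

/* Stride and per-attribute byte offsets for the gl*Pointer calls.
   Per the advice above, glVertexPointer goes last:

     glTexCoordPointer(2, GL_FLOAT, stride, (void *)tex_off);
     glNormalPointer(GL_FLOAT, stride, (void *)nrm_off);
     glVertexPointer(3, GL_FLOAT, stride, (void *)pos_off);
*/
enum { stride = sizeof(VboVertex) };
static const size_t tex_off = offsetof(VboVertex, texcoord);
static const size_t nrm_off = offsetof(VboVertex, normal);
static const size_t pos_off = offsetof(VboVertex, xyz);
```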

N.

Hampel
02-19-2008, 08:12 AM
Are you sure about the work done for glVertexPointer(...)? I would have guessed it would be deferred to the next draw call...

And any suggestions about the order of glVertexPointer(..., addr) [addr != NULL] and glEnableClientState(GL_VERTEX_ARRAY)? And what about the order of glVertexPointer(..., NULL) and glDisableClientState(GL_VERTEX_ARRAY)?

-NiCo-
02-19-2008, 08:24 AM
Are you sure about the work done for glVertexPointer(...)? I would have guessed it would be deferred to the next draw call...

Yes I am. Check out pages 12 and 13 of this (http://developer.nvidia.com/object/using_VBOs.html) document.


And any suggestions about the order of glVertexPointer(..., addr) [addr != NULL] and glEnableClientState(GL_VERTEX_ARRAY)? And what about the order of glVertexPointer(..., NULL) and glDisableClientState(GL_VERTEX_ARRAY)?

I believe that glEnableClientState and glDisableClientState are delayed state changes, much like binding/unbinding a VBO, so I don't think the order of the calls matters here.

N.

Hampel
02-19-2008, 08:37 AM
Oh, thanks for the link!

This document is rather old: are the suggestions it mentions still valid?

-NiCo-
02-19-2008, 08:47 AM
I'm not sure, but I don't see why this would change. The one thing you always need with VBOs is the glVertexPointer call; the other pointer calls are optional, so it seems quite logical to put the biggest part of the VBO memory management in the glVertexPointer call.

It's possible that they moved this, e.g. deferred it to the next draw call like you said, but changing the order of the pointer calls as the document suggests can only be beneficial, so I can't think of a reason not to do it.

N.

Hampel
02-19-2008, 08:53 AM
That makes perfect sense!

Thanx

Korval
02-19-2008, 11:35 AM
Also, be advised: nVidia does not suggest putting index data in the same buffer object as vertex data.

CatDog
02-19-2008, 12:00 PM
Sorry for this offtopic question, but... where do you guys get this kind of information from?

CatDog

knackered
02-19-2008, 02:05 PM
Also, be advised: nVidia does not suggest putting index data in the same buffer object as vertex data.
Is that even possible? I mean, you allocate a buffer using the buffer-type enum (glBufferData(GL_ARRAY_BUFFER, ..., ..., ...)), so surely that dictates not only the size but also the type of buffer it is?!

-NiCo-
02-19-2008, 02:37 PM
Also, be advised: nVidia does not suggest putting index data in the same buffer object as vertex data.

Indeed, according to the GL_ARB_vertex_buffer_object spec:


Note that it is expected that implementations may have different memory type requirements for efficient storage of indices and vertices. For example, some systems may prefer indices in AGP memory and vertices in video memory, or vice versa; or, on systems where DMA of index data is not supported, index data must be stored in (cacheable) system memory for acceptable performance. As a result, applications are strongly urged to put their models' vertex and index data in separate buffers, to assist drivers in choosing the most efficient locations.




Also, be advised: nVidia does not suggest putting index data in the same buffer object as vertex data.
Is that even possible? I mean, you allocate a buffer using the buffer-type enum (glBufferData(GL_ARRAY_BUFFER,..,..,..)), so surely that not only dictates the size but also the type of buffer it is?!


Again according to the GL_ARB_vertex_buffer_object spec:


Buffer objects created by binding an unused name to ARRAY_BUFFER_ARB and to ELEMENT_ARRAY_BUFFER_ARB are formally equivalent, but the GL may make different choices about storage implementation based on the initial binding. In some cases performance will be optimized by storing indices and array data in separate buffer objects, and by creating those buffer objects with the corresponding binding points.
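Following the spec's advice, creation would look roughly like this. In the sketch below the handle names, data pointers, and per-vertex layout are hypothetical, and the GL calls are shown as comments because they need a live context:

```c
#include <stddef.h>
#include <stdint.h>

/* Two separate buffer objects, each created on its natural binding
   point, as the spec recommends:

     GLuint vid, iid;
     glGenBuffersARB(1, &vid);
     glBindBufferARB(GL_ARRAY_BUFFER_ARB, vid);
     glBufferDataARB(GL_ARRAY_BUFFER_ARB, vbo_bytes_for(num_verts),
                     vertices, GL_STATIC_DRAW_ARB);

     glGenBuffersARB(1, &iid);
     glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, iid);
     glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, ibo_bytes_for(num_indices),
                     indices, GL_STATIC_DRAW_ARB);
*/

/* Upload sizes, assuming 8 floats per vertex (T2F_N3F_V3F layout)
   and 16-bit indices. */
static size_t vbo_bytes_for(size_t num_verts)   { return num_verts * 8 * sizeof(float); }
static size_t ibo_bytes_for(size_t num_indices) { return num_indices * sizeof(uint16_t); }
```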

N.

CatDog
02-19-2008, 03:25 PM
The spec again... point taken, NiCo. :)

CatDog

skynet
02-19-2008, 05:05 PM
For example, some systems may prefer indices in AGP memory and vertices in video memory, or vice versa; or, on systems where DMA of index data is not supported, index data must be stored in (cacheable) system memory for acceptable performance.

I don't think this is true anymore. It was true in the days of the GF1/2/3 (maybe 4). I have been using mixed index/vertex VBOs and haven't noticed any performance issues with that on GF6/7/8 cards.

knackered
02-19-2008, 05:09 PM
bloody hell, no wonder opengl implementations are such a mess with that kind of flabby specification.

Leadwerks
02-19-2008, 05:22 PM
On NVidia cards VBOs are always faster than client-side vertex arrays.

On ATI cards, VBOs are only faster if they are rendered in pseudo-batches, like this:

-Set the vertex and element pointers
-Draw
-Draw
-Draw...

If a mesh is drawn just once, it is faster to use client-side vertex arrays on the ATI cards I tested this with.
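The pseudo-batch pattern above, sketched in C (buffer handles and counts are hypothetical, and the GL calls are shown as comments since they need a live context): bind the buffers and set the pointers once, then issue several draws from the same bound state.

```c
#include <stddef.h>
#include <stdint.h>

/* One pointer setup, several draws. Each draw pulls a sub-range of
   the bound index buffer, addressed by a byte offset:

     glBindBufferARB(GL_ARRAY_BUFFER_ARB, vid);
     glVertexPointer(3, GL_FLOAT, 0, (void *)0);
     glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, iid);
     for (size_t i = 0; i < num_batches; ++i)
         glDrawElements(GL_TRIANGLES, counts[i], GL_UNSIGNED_SHORT,
                        (void *)index_byte_offset(first[i]));
*/

/* Byte offset into a GL_UNSIGNED_SHORT index buffer. */
static size_t index_byte_offset(size_t first_index)
{
    return first_index * sizeof(uint16_t);
}
```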

Dark Photon
02-20-2008, 06:19 AM
On NVidia cards VBOs are always faster than client-side vertex arrays.
Sorry, but that's false. I know we both "wish" it were true, but it's not. While your specific usage pattern for VBOs may be faster, some usage patterns are definitely not faster using VBOs on NVidia. And yes, that's with the latest drivers.

I tested this a year or so ago, and just re-tested it to ensure nothing's changed. It hasn't. In fact, I just halved the frame rate of an old test case (2.6ms -> 4.6ms) merely by naively enabling VBOs for every batch submitted (all static geometry -- no updates).

However, it would be useful to narrow down which usage pattern you think is always faster so collectively we can verify that. Interleaved vertex attributes in one buffer, sequential attribute lists in one buffer, or separate buffers per vertex attribute? Which attributes and attribute formats? 32-byte vertex data alignment or any padding involved? Indices in separate buffer or same buffer? USHORT or UINT indices? Multiple batches packed in each buffer, or just one? VertexPointer set last or not? Average batch size? etc.

knackered
02-20-2008, 06:56 AM
One day we'll look back at this nonsense and laugh. It's crying out for abstraction, hand it over to the implementation to sort out.

Dark Photon
02-20-2008, 08:08 AM
Man, I sure hope so. Now if we can just unwedge the GL3 spec...

V-man
02-20-2008, 09:16 AM
One day we'll look back at this nonsense and laugh. It's crying out for abstraction, hand it over to the implementation to sort out.

Maybe, or maybe not.
There are just too many different things that a developer can do.
There needs to be a DO and DON'T list.

Jan
02-20-2008, 11:53 AM
No, there needs to be an interface that makes it intuitively clear to anyone what to DO and what NOT to do.

The VBO spec is already quite good; it presents only a few ways to do different things. However, the implementations are not really good, and since there are 10 other ways to do the same thing (from non-VBO times), IHVs can always tell you "to do it that way for maximum performance" instead of fixing their drivers (and partly the spec).

Jan.

Korval
02-20-2008, 01:05 PM
As I understand it, the problem isn't really with the VBO spec itself, but the way VBOs are shoehorned into the same kind of vertex attribute binding that non-VBOs use. Essentially, the problem is the constant use of gl*Pointer operations, which in some cases require lots of driver backend work.

Also, there's no way to tell whether a format can be reasonably hardware accelerated. The driver basically has to do what you tell it to; if you want unsigned shorts non-normalized, it has to provide that, whether directly through hardware or through software. In the non-VBO case, the driver can easily walk through your main-memory pointer and convert the data to a hardware format, using a tight loop that takes advantage of write combining to put it into an internal buffer for rendering. In the VBO case, it has to read your vertex data back from VBO memory to do this conversion. So while the non-VBO version is faster than that VBO version, it is not faster than a VBO version that uses the hardware format.

If some combination of vertex buffer usage would force it to do some software processing, there's no way for the implementation to tell you not to use that format.

In short, the problem is that GL's vertex transfer functionality is very underspecified. GL 3.0's interface is an attempt to rectify that, through the use of vertex array objects that define a particular vertex format (whose creation can fail).