Best VBO format

Hi

What's the most efficient way to store my vertex data? As an interleaved array, or with every attribute in a separate array? And in which order: vertex/color/normal/tex1/tex2/…, or maybe another order?
Maybe someone knows which format typical drivers like most.

Jan.

Interleaved is supposed to be better because it’s fewer memory transactions per vert.

Order doesn’t really matter (or at least “shouldn’t”).

If you have some parts which are updated every frame (position/normal/binormal) and some parts which aren’t (color/texture0/texture1) then it makes sense to put the frequently-updated ones in one VBO, and the more static ones in another.
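To make the split concrete, here is a rough sketch of the two layouts. The struct names, the attribute types, and the helper functions are all made up for illustration; the point is only how many bytes have to be re-sent each frame when the static attributes live in their own VBO.

```c
#include <assert.h>
#include <stddef.h>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

/* Hypothetical layout: attributes that are rewritten every frame. */
typedef struct {
    float position[3];
    float normal[3];
    float binormal[3];
} DynamicVertex;

/* Hypothetical layout: attributes that are uploaded once and left alone. */
typedef struct {
    unsigned char color[4];
    float texcoord0[2];
    float texcoord1[2];
} StaticVertex;

/* Bytes re-uploaded per frame when only the dynamic VBO is refreshed. */
size_t upload_bytes_split(size_t vertex_count)
{
    return vertex_count * sizeof(DynamicVertex);
}

/* Bytes re-uploaded per frame if everything shared one interleaved VBO. */
size_t upload_bytes_combined(size_t vertex_count)
{
    return vertex_count * (sizeof(DynamicVertex) + sizeof(StaticVertex));
}
```

With these assumed layouts, 1000 vertices cost 36000 bytes per frame when split versus 56000 bytes when combined.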

Size of the vertex is even more important: if you can get away with using GL_SHORT instead of GL_FLOAT, then do so! Older cards (GeForce2 etc) may have trouble with even smaller sizes, such as GL_BYTE, so don’t go there if you care about older card performance.

If you can store all your data as static data with GL_SHORT format, that’ll be HALF the size of a GL_FLOAT format, and thus could transfer TWICE as fast!
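As a sketch of the size arithmetic: three GL_SHORT components take 6 bytes where three GL_FLOAT components take 12. The quantization helper below is an assumption of mine, not something from this thread; it maps a coordinate in [-extent, +extent] onto the 16-bit range, and you would have to undo that scale yourself (for example with a scale in the modelview matrix).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { float   x, y, z; } PosFloat;   /* 3 * 4 = 12 bytes */
typedef struct { int16_t x, y, z; } PosShort;   /* 3 * 2 =  6 bytes */

/* Hypothetical helper: map v in [-extent, +extent] to [-32767, 32767].
 * Values outside the range are clamped. */
int16_t quantize_to_short(float v, float extent)
{
    assert(extent > 0.0f);
    float s = v / extent;
    if (s > 1.0f)  s = 1.0f;
    if (s < -1.0f) s = -1.0f;
    /* Round to nearest by adding half a step before truncating. */
    float scaled = s * 32767.0f;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

Whether the precision loss is acceptable depends on the size of the model; that is a judgment call per data set.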

Also, for some cards, aligning your data on a cache line (say, 64-byte aligned) and making the vertex size a power of 2 (say, 16 or 32 bytes) will help a bit, as the card will do fewer memory fetches per vert.
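A small sketch of why the power-of-2 size helps, assuming a 64-byte line and a buffer that starts on a line boundary (both assumptions; real hardware varies). The helper names are hypothetical. It counts how many vertices cross a line boundary for a given vertex size.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if n is a power of two. */
int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Count vertices whose bytes span more than one line, assuming the
 * buffer itself starts on a line boundary. */
size_t count_straddling(size_t vertex_size, size_t line_size, size_t vertex_count)
{
    assert(vertex_size > 0 && line_size > 0);
    size_t n = 0;
    for (size_t i = 0; i < vertex_count; ++i) {
        size_t first = (i * vertex_size) / line_size;
        size_t last  = (i * vertex_size + vertex_size - 1) / line_size;
        if (first != last)
            ++n;
    }
    return n;
}
```

Under these assumptions, 32-byte vertices never straddle a 64-byte line, while with 36-byte vertices half of every 16 consecutive vertices do.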

jwatte - Just to clarify and help me understand things better - why is interleaved vertex data faster than separate? Which memory transactions are minimized - I have heard this before but I don’t understand the pipeline well enough…

:-) Thanks

The card needs position, normal, texture coordinate, color, and all the other enabled vertex attributes to feed a vertex into the pipeline.

Memory and busses inherently work in blocks of some size (like “cache line” or “DRAM page” or such). Typically, if you need a single byte in a block, it’s (nearly) as expensive as using all the bytes of the block.

If each attribute lives in a different area of memory, then the card needs to read from one memory block per attribute per vertex, which totals a large number of “block” reads. When they all live in the same block for the same vertex, just one block access is needed (or two, if it straddles a boundary).
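Here is that counting argument as a toy model. It assumes a 64-byte block, that every array starts on a block boundary, and that separate arrays never share a block; the function names are hypothetical, and it deliberately ignores any caching between neighboring vertices.

```c
#include <assert.h>
#include <stddef.h>

/* Number of blocks touched by `size` bytes starting at `first_byte`. */
size_t blocks_spanned(size_t first_byte, size_t size, size_t block_size)
{
    assert(size > 0 && block_size > 0);
    return (first_byte + size - 1) / block_size - first_byte / block_size + 1;
}

/* Interleaved: all attributes of vertex `index` are contiguous. */
size_t blocks_interleaved(size_t index, size_t stride, size_t block_size)
{
    return blocks_spanned(index * stride, stride, block_size);
}

/* Separate: each attribute lives in its own tightly packed array, so
 * every attribute costs at least one block read of its own. */
size_t blocks_separate(size_t index, const size_t *attr_sizes,
                       size_t attr_count, size_t block_size)
{
    size_t total = 0;
    for (size_t a = 0; a < attr_count; ++a)
        total += blocks_spanned(index * attr_sizes[a], attr_sizes[a], block_size);
    return total;
}
```

For a 32-byte vertex made of 12 + 12 + 8 byte attributes, this model gives one block for the interleaved case and three for the separate case.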

Note that InterleavedArrays() is not the only (or even the best) way to use an interleaved vertex format; using VertexPointer, NormalPointer, ColorPointer and friends is usually much better, more flexible, and works just as well if not better (assuming you get the “stride” argument right :-) ).
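A minimal sketch of getting the stride and offsets right, assuming a made-up Vertex struct (the struct and the VertexLayout helper are illustrative, not from any library). The GL calls are shown only in a comment because they need a live context and a bound VBO.

```c
#include <assert.h>
#include <stddef.h>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

/* Hypothetical interleaved vertex format. */
typedef struct {
    float         position[3];
    float         normal[3];
    unsigned char color[4];
    float         texcoord[2];
} Vertex;

/* Stride and byte offsets to hand to the gl*Pointer calls. */
typedef struct {
    size_t stride;
    size_t position, normal, color, texcoord;
} VertexLayout;

VertexLayout vertex_layout(void)
{
    VertexLayout l;
    l.stride   = sizeof(Vertex);   /* the same stride for every attribute */
    l.position = offsetof(Vertex, position);
    l.normal   = offsetof(Vertex, normal);
    l.color    = offsetof(Vertex, color);
    l.texcoord = offsetof(Vertex, texcoord);
    return l;
}

/* With the VBO bound to GL_ARRAY_BUFFER, the offsets are passed as pointers:
 *
 *   glVertexPointer(3, GL_FLOAT, (GLsizei)l.stride, (const GLvoid *)l.position);
 *   glNormalPointer(GL_FLOAT, (GLsizei)l.stride, (const GLvoid *)l.normal);
 *   glColorPointer(4, GL_UNSIGNED_BYTE, (GLsizei)l.stride, (const GLvoid *)l.color);
 *   glTexCoordPointer(2, GL_FLOAT, (GLsizei)l.stride, (const GLvoid *)l.texcoord);
 */
```

Using sizeof and offsetof rather than hand-counted numbers keeps the stride correct if the struct changes later.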

Exactly where in the process the blocking happens varies; some cards use an actual cache line (just like a CPU cache); others only get the blocking behavior because the underlying DRAM is organized into pages, where switching pages takes some number of cycles of latency.

There is an exception to the rule that interleaved arrays are better. If you have dynamic data that you are putting into the arrays, and the dynamic data does not start out interleaved, then it is probably more efficient to give the card uninterleaved data than it is to interleave it yourself. On all the hardware I’ve tested, I have seen performance increase from not interleaving dynamic array data when the original data started out separated.
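For reference, this is roughly the extra CPU pass you would be paying for every frame if you interleaved separated dynamic data yourself. It is only a sketch with a hypothetical function name, and it assumes two float[3] streams (position and normal).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `count` vertices from two tightly packed arrays (3 floats each)
 * into one interleaved array of 6 floats per vertex. */
void interleave_pos_normal(const float *positions, const float *normals,
                           float *out, size_t count)
{
    assert(positions && normals && out);
    for (size_t i = 0; i < count; ++i) {
        memcpy(out + 6 * i,     positions + 3 * i, 3 * sizeof(float));
        memcpy(out + 6 * i + 3, normals   + 3 * i, 3 * sizeof(float));
    }
}
```

Handing the two source arrays to the card as they are skips this copy entirely, which is the saving described above.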