VBO performance issues ... again!

After the long discussion in the “Batching and VBOs” thread I have come across another problem regarding vertex buffers.

I have a huge vertex buffer and index buffer that I use to render a terrain. Below is the approximate polygon throughput I was getting with different techniques:

  1. Simple vertex buffers: Around 35MTris/s
  2. VBOs: Around 65 - 85 MTris/s

Now I wanted to add normals to the vertex array, and I did that by adding a normal to the basic vertex structure. However, when I used byte normals, performance dropped sharply, by a factor of almost 3 to 4 (to 12 MTris/s and 20 MTris/s for cases 1 and 2 above, respectively). I used byte normals in order to conserve memory. I tried a few things to no avail, then I changed the normal data to float and suddenly the performance was back where it should have been. I know that current hardware is optimized for floating point, but is there a list somewhere that says exactly which data types are optimized for particular hardware (ATI and nVidia)? I would be most grateful for any help.

On ATI cards there is a severe performance penalty when VBOs are used with elements that are not DWORD aligned. Store the normal in 4 bytes and ignore the last value.
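To make the padding suggestion concrete, here is a minimal sketch of what such a vertex could look like in C. The struct and function names (`TerrainVertex`, `pack_normal_component`, `set_normal`) and the texcoord member are made up for illustration and are not from the original poster's code. The compile-time checks only verify the layout; they say nothing about how a particular driver behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex; all names are assumptions for this sketch. */
typedef struct {
    float  position[3]; /* 12 bytes at offset 0 */
    int8_t normal[4];   /* x, y, z plus one ignored pad byte, at offset 12 */
    float  texcoord[2]; /* 8 bytes at offset 16 */
} TerrainVertex;

/* Every element starts on a 4-byte (DWORD) boundary and the stride is a
   multiple of 4, so the next vertex is aligned too. */
_Static_assert(offsetof(TerrainVertex, normal) % 4 == 0, "normal not DWORD aligned");
_Static_assert(offsetof(TerrainVertex, texcoord) % 4 == 0, "texcoord not DWORD aligned");
_Static_assert(sizeof(TerrainVertex) % 4 == 0, "stride not a multiple of 4");

/* Quantise one normal component from [-1, 1] to a signed byte,
   clamping out-of-range input first. */
static inline int8_t pack_normal_component(float c)
{
    if (c > 1.0f)  c = 1.0f;
    if (c < -1.0f) c = -1.0f;
    return (int8_t)(c * 127.0f);
}

/* Store a normal in the 4-byte slot; the fourth byte is padding only. */
static inline void set_normal(TerrainVertex *v, float x, float y, float z)
{
    v->normal[0] = pack_normal_component(x);
    v->normal[1] = pack_normal_component(y);
    v->normal[2] = pack_normal_component(z);
    v->normal[3] = 0; /* ignored by the normal array, present for alignment */
}
```

With a layout like this, the normal array would be set up with something along the lines of `glNormalPointer(GL_BYTE, sizeof(TerrainVertex), ...)`, passing the offset of `normal` as the pointer when a VBO is bound. The memory saving over float normals is 8 bytes per vertex instead of 9, which seems a fair trade for keeping every element aligned.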

NVidia cards seem to handle that format natively.

A list of the formats that are native to ATI cards can be found in the Radeon SDK, which can be downloaded from ATI's website.

I'm using interleaved vertex arrays with VBOs.
It takes 1 second to get through glDrawArrays. It's about 9 MB of data. Without VBOs I get 35 FPS for the whole app. Anyway, do you mean that each element (i.e. position, normal, texcoord1, texcoord2, …) should be DWORD aligned, or the whole interleaved vertex? I think you mean each element, but I'm just checking before I change everything.

I was under the impression that non-interleaved, separate arrays (gl*Pointer) are generally faster than interleaved arrays. At least that’s what they’re saying in Game Programming Gems I chapter 4.0 “Optimizing Vertex Submission for OpenGL”.

Not only that, but it should also make the alignment a little easier.

Originally posted by Joe Montana:
Anyway do you mean that each element (i.e. position, normal, texcoord1, texcoord2, …) should be DWORD aligned or the whole interleaved array for each vertex.
Each element must be DWORD aligned. The following table lists the types, the alignment and the number of components per element that are supported by Radeon hardware.

Type                  Alignment  Components
GLfloat               32-bit     1,2,3,4
GLushort              32-bit     2,4
GLshort               32-bit     2,4
GLushort (normalized) 32-bit     2,4
GLshort (normalized)  32-bit     2,4 
GLubyte               32-bit     4
GLbyte                32-bit     4
GLubyte (normalized)  32-bit     4
GLbyte (normalized)   32-bit     4
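For anyone who wants to check their vertex format against this table in code, here is a rough sketch of a lookup. The enum and function names are invented for the example, and it encodes only the rows quoted above. Normalized and non-normalized variants have identical rows in the table, so the sketch does not distinguish them.

```c
#include <assert.h>

/* Hypothetical attribute types mirroring the rows of the table above. */
typedef enum {
    ATTR_FLOAT,
    ATTR_USHORT,
    ATTR_SHORT,
    ATTR_UBYTE,
    ATTR_BYTE
} AttrType;

/* Returns 1 if the type/component-count pair appears in the quoted table,
   0 otherwise. */
static inline int is_listed_radeon_format(AttrType type, int components)
{
    switch (type) {
    case ATTR_FLOAT:
        return components >= 1 && components <= 4;
    case ATTR_USHORT:
    case ATTR_SHORT:
        /* 2 or 4 shorts fill exactly one or two DWORDs */
        return components == 2 || components == 4;
    case ATTR_UBYTE:
    case ATTR_BYTE:
        /* 4 bytes fill exactly one DWORD */
        return components == 4;
    }
    return 0;
}
```

So a 3-component byte normal gives `is_listed_radeon_format(ATTR_BYTE, 3) == 0`, which matches the slowdown reported at the top of the thread, while the float version with 3 components is listed.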

From http://www.ati.com/developer/gdc/PerformanceTuning.pdf

- Vertex size should ideally be a multiple of 32 bytes.
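As a rough illustration of that last point, a vertex can be padded up to the next multiple of 32 bytes. This is only a sketch: the struct, its members and the helper name are assumptions, and whether the extra memory is worth it is something you would have to measure on your own data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round size up to the next multiple of `multiple` (multiple must be > 0). */
static inline size_t round_up_to_multiple(size_t size, size_t multiple)
{
    return ((size + multiple - 1) / multiple) * multiple;
}

/* Hypothetical vertex with 24 bytes of real data padded out to 32 bytes. */
typedef struct {
    float   position[3]; /* 12 bytes, offset 0  */
    int8_t  normal[4];   /*  4 bytes, offset 12 */
    float   texcoord[2]; /*  8 bytes, offset 16 */
    uint8_t pad[8];      /*  8 unused bytes, offset 24 */
} PaddedTerrainVertex;

_Static_assert(sizeof(PaddedTerrainVertex) == 32, "vertex is not 32 bytes");
```

Here `round_up_to_multiple(24, 32)` gives 32, so 8 bytes of padding are needed. For a 9 MB buffer of 24-byte vertices that is a third more memory, so the padding is not free.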