A good compact vertex format?

I have a compact interleaved vertex format to draw simple objects:

2 bytes x 3 : position x/y/z
2 bytes x 3 : texture u/v/w (w is layer num in texture array)
3 bytes : normal x/y/z
1 byte : pad

It is cache-friendly, as it fits nicely in 16 bytes, but I recently read in the Gallium specs that the driver probably rearranges each attribute (position, texture…) to a multiple of 4 bytes anyway. That would mean 2 extra bytes of padding quietly added after my position and after my texture coordinates, giving 20 bytes per vertex, which is not so nice.
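To make the arithmetic concrete, here is a minimal C sketch of the two layouts being compared. The type names, the choice of signed integers, and the helper `is_4_byte_aligned` are assumptions for illustration only; the post gives just the byte sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 16-byte layout from the post, with no padding between attributes.
   Signedness of each field is an assumption. */
typedef struct {
    int16_t pos[3];    /* offset 0                          */
    int16_t tex[3];    /* offset 6, not a multiple of 4     */
    int8_t  normal[3]; /* offset 12                         */
    uint8_t pad;       /* offset 15                         */
} VertexPacked16;

/* Hypothetical layout if every attribute were padded to a 4-byte multiple. */
typedef struct {
    int16_t pos[3];    /* offset 0  */
    int16_t pos_pad;   /* 2 bytes of padding after position */
    int16_t tex[3];    /* offset 8  */
    int16_t tex_pad;   /* 2 bytes of padding after texture  */
    int8_t  normal[3]; /* offset 16 */
    uint8_t pad;       /* offset 19 */
} VertexPadded20;

_Static_assert(sizeof(VertexPacked16) == 16, "packed layout should be 16 bytes");
_Static_assert(sizeof(VertexPadded20) == 20, "padded layout should be 20 bytes");

/* Returns 1 when a byte offset falls on a 4-byte boundary. */
static int is_4_byte_aligned(size_t offset) {
    return offset % 4 == 0;
}
```

In the packed layout only the texture coordinates start off a 4-byte boundary (offset 6); position and normal already start at offsets 0 and 12.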

Is this padding behaviour a general hardware limitation, or is it driver-specific? Is there a way around it, to truly get 16 bytes per vertex in GPU memory for the proposed format, without tricks like moving the 3rd texture coordinate up into a 4th position coordinate to respect alignment, and moving it back in the vertex shader at a cost?

I read in the Gallium specs that the driver probably rearranges each attribute (position, texture…) to a multiple of 4 bytes anyway. That would mean 2 extra bytes of padding quietly added after my position and after my texture coordinates

How can it “add” this space? You control the vertex format with glVertexAttribPointer. You decide how much space things take up. The hardware and driver cannot rearrange the data in your buffer object. They will work with whatever you provide; your concern should be that it might be slower to process than a 20-byte representation.
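As a rough illustration of "you decide how much space things take up", the sketch below records the stride and byte offset that would be passed to glVertexAttribPointer for each attribute of the 16-byte format. The struct and table names, the attribute locations 0 to 2, and the GL type and normalization choices in the comments are assumptions; the actual GL calls are left as comments so the snippet compiles without GL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 16-byte interleaved vertex from the question (signedness assumed). */
typedef struct {
    int16_t pos[3];
    int16_t tex[3];
    int8_t  normal[3];
    uint8_t pad;
} Vertex;

/* Mirrors the index/size/stride/pointer arguments of glVertexAttribPointer. */
typedef struct {
    unsigned index;      /* attribute location (assumed)            */
    int      components; /* the "size" argument: components per attribute */
    size_t   stride;     /* bytes from one vertex to the next       */
    size_t   offset;     /* byte offset of the attribute in a vertex */
} AttribDesc;

static const AttribDesc kAttribs[3] = {
    /* glVertexAttribPointer(0, 3, GL_SHORT, GL_FALSE, sizeof(Vertex),
                             (const void *)offsetof(Vertex, pos));    */
    {0, 3, sizeof(Vertex), offsetof(Vertex, pos)},
    /* glVertexAttribPointer(1, 3, GL_SHORT, GL_FALSE, sizeof(Vertex),
                             (const void *)offsetof(Vertex, tex));    */
    {1, 3, sizeof(Vertex), offsetof(Vertex, tex)},
    /* glVertexAttribPointer(2, 3, GL_BYTE, GL_TRUE, sizeof(Vertex),
                             (const void *)offsetof(Vertex, normal)); */
    {2, 3, sizeof(Vertex), offsetof(Vertex, normal)},
};
```

The stride is 16 for every attribute because the application says so; nothing in these arguments lets the driver change where the bytes sit in the buffer.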

There has been some suggestion that having each component start on a 4-byte boundary is a good thing for attribute reading performance. However, the last time I read this suggestion was many years ago, in the early D3D-10-era days. So I have no idea how true this is for modern hardware.

Basically, you’re going to have to profile. Remember: vertex formats can sometimes involve memory/performance tradeoffs. Sometimes, a larger format is faster than a smaller one.

without tricks like moving the 3rd texture coordinate up into a 4th position coordinate to respect alignment, and moving it back in the vertex shader at a cost?

Well, how much do you want this performance/memory optimization? Because if you want to follow the 4-byte boundary rule with no padding, and fitting into 16 bytes, there isn’t a way to do it without using this method. Not without shrinking a component somewhere.
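For reference, the method being discussed could look something like this in C. It is only a sketch: `VertexAligned16` and `pack_vertex` are hypothetical names, and the signed integer types are an assumption. The texture layer rides in the 4th position component, so every attribute starts on a 4-byte boundary and the vertex still fits in 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-byte layout where every attribute starts on a 4-byte boundary. */
typedef struct {
    int16_t pos_layer[4]; /* x, y, z, plus texture layer in w; offset 0 */
    int16_t uv[2];        /* texture u, v; offset 8                     */
    int8_t  normal[3];    /* offset 12                                  */
    uint8_t pad;          /* offset 15                                  */
} VertexAligned16;

_Static_assert(sizeof(VertexAligned16) == 16, "layout should stay at 16 bytes");

/* Builds an aligned vertex from the original position / texture / normal
   fields. tex[2] is the texture array layer from the original format. */
static VertexAligned16 pack_vertex(const int16_t pos[3],
                                   const int16_t tex[3],
                                   const int8_t normal[3]) {
    VertexAligned16 v;
    v.pos_layer[0] = pos[0];
    v.pos_layer[1] = pos[1];
    v.pos_layer[2] = pos[2];
    v.pos_layer[3] = tex[2];   /* layer moved into the 4th position slot */
    v.uv[0] = tex[0];
    v.uv[1] = tex[1];
    v.normal[0] = normal[0];
    v.normal[1] = normal[1];
    v.normal[2] = normal[2];
    v.pad = 0;
    return v;
}
```

The vertex shader would then read a 4-component position attribute, use its first three components as the position, and recombine the 4th with the 2-component texture attribute to rebuild u/v/layer. That recombination is the shader-side cost mentioned in the question.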

Thanks for the answer.
I don’t know why, but I cannot post the link. A quick search for “Avoid Misaligned Vertex Data Apple” shows that at least some major Apple devices pad attributes to 4 bytes. The driver functions dealing with VBOs would all have to be padding-aware. I will profile, but it would be interesting to know whether that practice is general these days or not.

Avoid Misaligned Vertex Data Apple

This seems much more relevant on mobile devices, where vertex coordinates might be 2 bytes in size. On desktops a coordinate is much more likely to be a float and so is implicitly 4-byte aligned.