I have a compact interleaved vertex format for drawing simple objects:

2 bytes x 3 : position x/y/z
2 bytes x 3 : texture u/v/w (w is the layer index in the texture array)
3 bytes : normal x/y/z
1 byte : pad
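
To make the layout concrete, here is a minimal C sketch of the format as I keep it on the CPU side. The struct name and the signedness of each field are only assumptions for illustration (signed 16-bit positions, unsigned 16-bit texture coordinates, signed 8-bit normals); only the byte counts come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU-side mirror of the interleaved vertex.
   Field types are assumed; only the sizes match the format above. */
typedef struct {
    int16_t  pos[3];  /* bytes 0..5   : position x/y/z               */
    uint16_t tex[3];  /* bytes 6..11  : texture u/v/w (w = layer)    */
    int8_t   nrm[3];  /* bytes 12..14 : normal x/y/z                 */
    uint8_t  pad;     /* byte  15     : explicit padding             */
} PackedVertex;

/* Compile-time checks: the C compiler itself adds no hidden padding. */
static_assert(sizeof(PackedVertex) == 16, "vertex should be 16 bytes");
static_assert(offsetof(PackedVertex, tex) == 6, "tex starts at byte 6");
static_assert(offsetof(PackedVertex, nrm) == 12, "nrm starts at byte 12");
```

Note that the texture coordinates start at byte 6, which is not a multiple of 4.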

It is cache friendly, as it fits nicely in 16 bytes, but I recently read in the Gallium specs that the driver probably realigns each attribute (position, texture...) to a 4-byte boundary anyway. That would mean 2 extra bytes of padding quietly added after both my position and my texture coordinates, giving 20 bytes per vertex, which is not so nice.
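
The arithmetic behind the 20 bytes, as a small C sketch. It assumes the behaviour I suspect (every attribute rounded up to a multiple of 4 bytes), which I have not confirmed; the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of 4. */
static size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Stride of a vertex if each attribute is padded to 4 bytes.
   This models the suspected driver behaviour, not a known one. */
static size_t padded_stride(const size_t *attr_sizes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += align4(attr_sizes[i]);
    return total;
}

/* The format above: 6-byte position, 6-byte texture, 3-byte normal + 1 pad. */
static size_t question_format_stride(void)
{
    const size_t sizes[3] = { 6, 6, 4 };
    return padded_stride(sizes, 3);  /* 8 + 8 + 4 */
}
```

Under that assumption the stride grows from 16 to 20 bytes, i.e. 25% more vertex data.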

Is this padding behaviour a general hardware limitation, or is it driver specific? Is there a way around it, to truly get 16 bytes per vertex in GPU memory for the proposed format, without tricks like moving the 3rd texture coordinate into a 4th position component to respect alignment, and then moving it back in the vertex shader at a cost?
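
For reference, the trick I would like to avoid looks roughly like this on the CPU side. Again the struct name and field signedness are only assumptions for illustration, and the shader code that moves the layer back next to u/v is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rearranged vertex: the texture layer rides along as a
   4th position component so that every attribute starts on a multiple
   of 4 bytes. Field types are assumed. */
typedef struct {
    int16_t  pos_layer[4];  /* bytes 0..7   : x/y/z + texture layer */
    uint16_t tex[2];        /* bytes 8..11  : texture u/v           */
    int8_t   nrm[3];        /* bytes 12..14 : normal x/y/z          */
    uint8_t  pad;           /* byte  15     : explicit padding      */
} AlignedVertex;

static_assert(sizeof(AlignedVertex) == 16, "still 16 bytes per vertex");
static_assert(offsetof(AlignedVertex, tex) % 4 == 0, "tex is 4-byte aligned");
static_assert(offsetof(AlignedVertex, nrm) % 4 == 0, "nrm is 4-byte aligned");
```

This keeps 16 bytes per vertex, but the vertex shader then has to split the layer off the position and recombine it with u/v.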