size and stride in Vertex Array

How do the size and stride parameters of the vertex array setup functions affect rendering speed?
For example:
glVertexPointer(3, GL_FLOAT, 0, data);
glVertexPointer(3, GL_FLOAT, 16, data);
glVertexPointer(4, GL_FLOAT, 0, data);
Which one is faster?

I also think data alignment affects speed. Should I align the data to 16 bytes, or to some other size?

The second is problematic as originally posted (with a stride of 4), because the stride is the number of bytes between the starts of consecutive vertices. With a stride of 4 == 1 * sizeof(float), the vertices overlap, and the only way to use that data would be indexed drawing with indices that are multiples of 3. That sucks.

The fastest would be the first, because it has less data than the last and is tightly packed: a stride of 0 means the same as 3 * sizeof(float) == 12.
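To make the stride arithmetic concrete, here is a minimal sketch in plain C with no OpenGL calls. The helper names effective_stride and vertex_offset are made up for illustration, and it assumes GL_FLOAT data with a 4-byte float.

```c
#include <stddef.h>

/* Effective byte stride for GL_FLOAT data: a stride of 0 means
   "tightly packed", i.e. size * sizeof(float). (Hypothetical helper.) */
size_t effective_stride(int size, size_t stride)
{
    return stride != 0 ? stride : (size_t)size * sizeof(float);
}

/* Byte offset of vertex `index` from the start of the array.
   (Hypothetical helper.) */
size_t vertex_offset(int size, size_t stride, size_t index)
{
    return index * effective_stride(size, stride);
}
```

Under these assumptions, (3, stride 0) and (3, stride 12) address exactly the same bytes, while (3, stride 16) and (4, stride 0) both step 16 bytes per vertex. With a stride of 4, vertex 3 starts at byte 12, which is where vertex 1 of the tightly packed layout starts; that is why only indices that are multiples of 3 would be usable.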

Sorry, I made a mistake. The second stride should be 16, which is 4 * sizeof(float). It is corrected now.
I remember an article by JC about using vertex arrays in Quake 3. He said that in Quake 3 the position data is 4 floats per vertex, with some alignment, and that vertex arrays are faster that way, but he didn’t go into much detail. Yesterday I found that my saved link to that article is broken, which is why I came up with this question.

The reason alignment to 16 bytes (or other powers of two) is faster has more to do with CPU cache behavior and SIMD alignment requirements than with API specifics. Whether you pass “0” or sizeof(vert) does not matter (as long as sizeof(vert) is your actual stride, of course).
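A rough sketch of what a 16-byte-aligned vertex looks like in C. The type Vertex4, the array aligned_verts and the helper is_aligned16 are illustrative names only, and the example assumes a C11 compiler (for _Alignas) and a 4-byte float.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-float position: 16 bytes when float is 4 bytes. */
typedef struct { float x, y, z, w; } Vertex4;

/* C11 _Alignas asks for the array to start on a 16-byte boundary. */
_Alignas(16) Vertex4 aligned_verts[8];

/* Returns 1 if the address is a multiple of 16, else 0. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

Since sizeof(Vertex4) is 16 here, every element of the array starts on a 16-byte boundary, and passing either 0 or sizeof(Vertex4) as the stride to glVertexPointer(4, GL_FLOAT, ...) describes the same layout.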

Given these rules, the first version could be somewhat slower, depending on the alignment needs of your hardware. The last two should be equivalent, assuming you have hardware transform (else the possibility of a non-1 “w” might throw you off the fast path). Thus, I’d go for the middle option (and stick, say, diffuse color in the spare space).
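One possible way to use the spare 4 bytes of the middle option, sketched in plain C. The struct ColoredVertex and the helpers are hypothetical, the layout assumes a 4-byte float with 4-byte alignment, and the GL calls appear only in a comment so the snippet compiles without OpenGL headers.

```c
#include <stddef.h>

/* Hypothetical layout: 12 bytes of position plus 4 bytes of packed
   diffuse color, giving a 16-byte vertex on typical ABIs. */
typedef struct {
    float x, y, z;
    unsigned char rgba[4];
} ColoredVertex;

/* Byte offsets at which each attribute starts within one vertex. */
size_t position_offset(void) { return offsetof(ColoredVertex, x); }
size_t color_offset(void)    { return offsetof(ColoredVertex, rgba); }

/* Given an array `verts` of ColoredVertex, both pointers would share
 * the same stride:
 *   glVertexPointer(3, GL_FLOAT,         sizeof(ColoredVertex), &verts[0].x);
 *   glColorPointer (4, GL_UNSIGNED_BYTE, sizeof(ColoredVertex), verts[0].rgba);
 */
```

The position still uses size 3 with a stride of 16, so "w" stays at its implicit value of 1, and the color rides along in what would otherwise be padding.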