Does anyone have any driver knowledge or empirical evidence to suggest that it is more efficient to jam multiple items into the same vertex attribute? For example, instead of using two sets of texture coordinates that each store two values, is it more efficient to use one texture coordinate with four values? I assume the answer depends on whether the driver pads vertex attributes or packs them. I also don’t know how this works when custom vertex attributes are thrown into the mix.
From what I understand of what you are saying, this applies more to recent shader models; on GPUs without some form of instancing or geometry shaders, I am not sure you can do what you suggest. AFAIK, data tied to one vertex is always tied to that vertex, and a vertex shader cannot access vertex data out of sequence in any way.
I’d love to know if anyone does know a way to do that on earlier shader models. Say pre 8600 and on. (I am aware of TransGaming?'s extension.)
Referring to your example: it’s still moving the same amount of data, and I expect it’s all moved in native “system sized” chunks, so the difference between one set of four floats and two sets of two floats should be irrelevant. Again, that is assuming I understand your suggestion.
Where you can save is with half-sized data, unsigned bytes, unsigned shorts, etc.
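To put rough numbers on the point about smaller data types, here is a minimal C++ sketch of two CPU-side vertex layouts. The struct names and the toUnorm8 helper are made up for illustration, and the sizes assume 4-byte floats with no extra padding, which is typical but not guaranteed everywhere. It only shows the per-vertex byte count, not how any particular driver or GPU actually fetches the data.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: colour stored as four full floats.
struct VertexFloatColor {
    float position[3];      // 12 bytes
    float color[4];         // 16 bytes
};

// Hypothetical layout: colour stored as four normalized unsigned bytes.
struct VertexByteColor {
    float position[3];      // 12 bytes
    std::uint8_t color[4];  //  4 bytes
};

// Assumption: 4-byte floats and no compiler padding beyond float alignment.
static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
static_assert(sizeof(VertexFloatColor) == 28, "12 + 16 bytes per vertex");
static_assert(sizeof(VertexByteColor) == 16, "12 + 4 bytes per vertex");

// Converts a colour channel in [0, 1] to an 8-bit normalized value,
// rounding to the nearest step.
std::uint8_t toUnorm8(float c) {
    assert(c >= 0.0f && c <= 1.0f);  // caller must pass a normalized value
    return static_cast<std::uint8_t>(c * 255.0f + 0.5f);
}
```

In OpenGL the byte version would typically be fed with glVertexAttribPointer using GL_UNSIGNED_BYTE and normalized set to GL_TRUE, so the shader still sees values in [0, 1]. Whether the smaller vertex is actually faster is something you would have to measure on your own hardware.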
4.4.4.1. Not only total number of attributes matters
The important metric is not only the total number of scalar attributes, but a
number of vector attributes used as well. For example, the following have the
same number of scalar attributes, but may not result in the same performance
on GeForce 8 series or later cards.
float4 myData;
and
float3 myDataOne;
float1 myDataTwo;
Full attributes are better for the vertex declaration
I haven’t benchmarked it myself, however. I’d be interested to hear about your results!
Ha. A little further down on that page, they use the same example that I asked about:
For example, if you are using a pair of texture coordinates, it is better to pack them into a single float4, than using two separate float2 attributes. Almost always vertex position requires just a single float3 value, if you can logically combine it with a separate float value used, do it.
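For anyone who wants to see what that packing looks like on the application side, here is a rough C++ sketch. The struct names, the attribute-count constants and the packVertex helper are all hypothetical, not taken from the guide. It shows that the packed layout carries the same bytes per vertex but declares one fewer vector attribute, which is the metric the quoted text says can matter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: two separate float2 texture-coordinate attributes.
struct VertexSeparate {
    float position[3];  // attribute 0
    float uv0[2];       // attribute 1
    float uv1[2];       // attribute 2
};

// Hypothetical layout: both coordinate sets in one float4 attribute.
struct VertexPacked {
    float position[3];  // attribute 0
    float uv01[4];      // attribute 1: uv0 in .xy, uv1 in .zw
};

constexpr int kAttributesSeparate = 3;
constexpr int kAttributesPacked = 2;

// Same amount of data per vertex either way; only the attribute count differs.
static_assert(sizeof(VertexSeparate) == sizeof(VertexPacked),
              "packing does not change the vertex size");

// Copies a vertex into the packed layout.
VertexPacked packVertex(const VertexSeparate& v) {
    VertexPacked out{};
    for (std::size_t i = 0; i < 3; ++i) {
        out.position[i] = v.position[i];
    }
    out.uv01[0] = v.uv0[0];
    out.uv01[1] = v.uv0[1];
    out.uv01[2] = v.uv1[0];
    out.uv01[3] = v.uv1[1];
    return out;
}

// Convenience check that packing keeps every texture coordinate intact.
void checkPackingPreservesData(const VertexSeparate& v) {
    const VertexPacked p = packVertex(v);
    assert(p.uv01[0] == v.uv0[0] && p.uv01[1] == v.uv0[1]);
    assert(p.uv01[2] == v.uv1[0] && p.uv01[3] == v.uv1[1]);
}
```

In the shader, the second coordinate set would then be read from the .zw components of the packed attribute. This says nothing about how much faster it is, if at all; that still needs a benchmark.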