Aaaahhhh... the things we've done before for speed, particularly pre-DrawInstanced.
Originally Posted by Alfonse Reinheart
I agree with your first sentence, but your second confused me. This is garden-variety vertex stream frequency dividers back from 2006 (see this PDF starting at pg 31), forged from the D3D "Oh crap! Our batch calls are 'so' expensive!" realization.
Originally Posted by Groovounet
I believe the typical ARB_instanced_arrays use case works like this: vertex attributes can represent one of two things:
- those that "repeat" per instance (e.g. position, normal, texcoord0, etc.)
- those that are "constant" per instance (e.g. an offset vector in texcoord1, a rotation quaternion in texcoord2, etc.)
On the latter set, you set glVertexAttribDivisor to 1.
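To make the divisor concrete, here's a rough CPU-side model (Python, not GL code) of which array element each attribute reads for a given vertex and instance. The names fetch_index and assemble and the sample arrays are made up for illustration, and it ignores base vertex / base instance, so treat it as a sketch of the rule rather than a description of what any driver actually does.

```python
def fetch_index(vertex_index, instance_index, divisor):
    """Which element of an attribute array is read for one vertex of one instance.

    divisor == 0: the attribute advances per vertex (the "repeat" set).
    divisor  > 0: the attribute advances once every `divisor` instances.
    """
    if divisor == 0:
        return vertex_index
    return instance_index // divisor


# Hypothetical sample data: one triangle, two instances.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # per-vertex, divisor 0
offsets = [(0.0, 0.0), (5.0, 0.0)]                # per-instance, divisor 1


def assemble(num_vertices, num_instances):
    """List the (position, offset) pair each vertex shader invocation would see."""
    inputs = []
    for instance in range(num_instances):
        for vertex in range(num_vertices):
            pos = positions[fetch_index(vertex, instance, 0)]
            off = offsets[fetch_index(vertex, instance, 1)]
            inputs.append((pos, off))
    return inputs
```

So every vertex of instance 0 sees offsets[0], every vertex of instance 1 sees offsets[1], while the positions cycle through the same three elements for each instance.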
Think of the latter set as those values you'd store in a texture buffer and "look up" using gl_InstanceID with ARB_draw_instanced (glDrawArraysInstanced / glDrawElementsInstanced) now. Let's call this "instance data".
Seems to me the nice thing about this approach is that it streams the instance data to the GPU (a push) as needed along with the instance definition, rather than having every single vertex in every single instance of your object bang on 1-N texture fetches from some potentially random (from the GPU's perspective) piece of a texture buffer (a bunch of pulls, albeit cached). Also, this gets rid of the need for texture buffer subload and bind "state changes" between instancing batches using the same material. Not only that, with the instance data now in VBOs, you ideally can bypass the setup overhead using bindless (had to tie that back in somehow). That said, I haven't actually done a performance face-off between ARB_draw_instanced and ARB_instanced_arrays yet.
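For a feel of the push vs. pull difference, here's a back-of-the-envelope count in Python. It only tallies logical reads of instance data; caches, vertex reuse, and whatever the hardware really does are not modeled, so it says nothing about actual performance. The function names and the numbers in the example are made up for illustration.

```python
def pull_fetches(verts_per_instance, instances, fetches_per_vertex):
    """Texture buffer route: every vertex shader invocation does its own
    lookups keyed on gl_InstanceID."""
    return verts_per_instance * instances * fetches_per_vertex


def push_fetches(instances, instanced_attributes):
    """Instanced arrays route: each divisor-1 attribute advances once per instance."""
    return instances * instanced_attributes


# Hypothetical batch: 500-vertex mesh, 1000 instances,
# offset + rotation = 2 values of instance data.
pull = pull_fetches(500, 1000, 2)
push = push_fetches(1000, 2)
```

With those numbers the pull route issues 1,000,000 logical lookups versus 2,000 attribute advances for the push route. How much of that gap survives the texture cache is exactly what a real face-off would have to measure.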