Instancing without bindable uniforms?

AMD now supports SM4 but does not support the bindable uniform extension. How should the per-instance matrices be uploaded to the vertex program?

I think a good way to approach this is to abstract the notion of a variable from the artist’s point of view, more or less what the common FX wrappers do. That way you can update a variable through an “FX” system while the actual details of the update stay hidden. The update itself is deferred until the next draw that depends on the dirty uniforms, and only the uniforms that are dirty actually get updated (unless they’re backed by a buffer, in which case it makes little difference). Obviously uniform buffers, however they are organized in a file, do require some sort of logical update grouping to get the best performance (unless that grouping could somehow be implied from usage, semantics or whatnot)…
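Here is a minimal sketch of that deferred-update idea, written in Python only to keep it short. `UniformCache` and the `upload` callback are hypothetical names of mine, not part of any FX library; in a real renderer the callback would wrap whichever glUniform* call matches the uniform's type.

```python
class UniformCache:
    """Remembers uniform values and uploads only the dirty ones at draw time."""

    def __init__(self, upload):
        # upload(name, value) is a hypothetical callback standing in for
        # the real glUniform* call.
        self._upload = upload
        self._values = {}
        self._dirty = set()

    def set(self, name, value):
        # Setting a uniform to the value it already has does not dirty it.
        if name in self._values and self._values[name] == value:
            return
        self._values[name] = value
        self._dirty.add(name)

    def flush(self, used_names):
        # Call right before a draw. Only uniforms that are both dirty and
        # used by this draw's program are uploaded; the rest stay deferred.
        uploaded = [name for name in used_names if name in self._dirty]
        for name in uploaded:
            self._upload(name, self._values[name])
            self._dirty.discard(name)
        return uploaded
```

The artist-facing code only ever calls `set`; the renderer calls `flush` with the uniform names of the program it is about to draw with.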

In short, beats the heck out of me.

Well, according to the spec:

By using the instance ID or multiples thereof as an index into
a uniform array containing transform data, vertex shaders can 
draw multiple instances of an object with a single draw call.

So I guess they expect you to use a conventional uniform array of matrices. I will have to get creative here and make the batch size adjust to the maximum array size on the hardware. Oh well, bindable uniforms gave me some problems last year on NVIDIA hardware, and I never really trusted them, even though the problem was fixed.
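A rough sketch of how the batch size could be derived from the hardware limit, again in Python for brevity. It assumes the limit comes from querying GL_MAX_VERTEX_UNIFORM_COMPONENTS and that a mat4 costs 16 components; drivers may pad or count differently, so treat the result as an upper bound. The function names and the `reserved_components` parameter are made up for illustration.

```python
def max_instances_per_batch(max_uniform_components, reserved_components=0,
                            components_per_instance=16):
    """Number of per-instance mat4s that fit in one uniform array.

    reserved_components covers the shader's other uniforms
    (projection matrix, lights, and so on).
    """
    available = max_uniform_components - reserved_components
    return max(available // components_per_instance, 0)


def split_into_batches(instance_count, batch_size):
    """Return (first_instance, count) pairs, one per instanced draw call."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [(first, min(batch_size, instance_count - first))
            for first in range(0, instance_count, batch_size)]
```

For example, with a made-up limit of 1024 components and 64 reserved for other uniforms, 60 matrices fit per batch, so 150 instances take three draw calls.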

I’ve had some success using textures in the vertex shader for instancing. For example, if you have a single matrix per instance, use a 2D RGBA float texture indexed by (gl_InstanceID, column).
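To make the texture layout concrete, here is a small CPU-side packing sketch in Python. It assumes column-major 4x4 matrices (16 floats each) and a texture that is one texel wide per instance and four texels high, so texel (x = instance, y = column) holds one matrix column. The function name is hypothetical, and very large instance counts would also have to respect the maximum texture width.

```python
def pack_instance_matrices(matrices):
    """Pack column-major 4x4 matrices into RGBA float texel data.

    Returns (width, height, data), where data is a flat list of floats
    laid out row by row, four floats per texel, ready to be handed to a
    texture upload call.
    """
    width, height = len(matrices), 4
    data = [0.0] * (width * height * 4)
    for instance, matrix in enumerate(matrices):
        if len(matrix) != 16:
            raise ValueError("each matrix needs 16 floats")
        for column in range(4):
            texel = column * width + instance  # row = column, x = instance
            data[texel * 4:texel * 4 + 4] = matrix[column * 4:column * 4 + 4]
    return width, height, data
```

The vertex shader would then rebuild the matrix from four texel fetches at x = gl_InstanceID and y = 0 through 3.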

On newer hardware (i.e. DX10-class or better), if the vertices index into these matrices and a typical group of, say, 16 to 64 consecutive vertices (in index order) fetches different matrices, then you might be better off sampling from a texture instead. To my knowledge, divergent uniform access (otherwise referred to as constant waterfalling) is slow on all hardware.

I’ve had good results having a vertex shader without any inputs, and just using gl_VertexID to sample all data from textures.
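The post above does not say how the data is laid out, so the following is only one possible addressing scheme, sketched in Python: one texel per vertex in a 2D texture of fixed width, with the vertex ID wrapped into rows. Both function names are hypothetical.

```python
def texel_for_vertex(vertex_id, texture_width):
    """Map a vertex ID to (x, y) integer texel coordinates."""
    return vertex_id % texture_width, vertex_id // texture_width


def rows_needed(vertex_count, texture_width):
    """Texture height required to hold one texel per vertex."""
    return -(-vertex_count // texture_width)  # ceiling division
```

The shader would do the same modulo and divide on gl_VertexID before fetching, and the CPU side would use `rows_needed` to size the texture.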