I am trying to find a way to reliably determine the maximum size allowed for a uniform matrix array. I have been looking at MAX_VERTEX_UNIFORM_COMPONENTS_ARB. According to the spec, this returns the maximum number of floats that can be stored in uniform variables:
A vertex shader may define one or more "uniform" variables. These values
are to remain constant over a primitive or a sequence of primitives. The
OpenGL Shading Language specification defines a set of built-in uniform
variables for vertex shaders that correspond to the state that GL
manages for the purpose of processing vertices. The amount of storage
that is available for vertex shader uniform variables is specified by
the implementation dependent constant MAX_VERTEX_UNIFORM_COMPONENTS_ARB.
This value represents the number of individual floating point values, or
individual integer values or individual Boolean values that can be held
in uniform variable storage for a vertex shader. A link error will be
generated if an attempt is made to utilize more than the space available
for vertex shader uniform variables.
On my AMD 3870 that value is 512, which would allow an array of only 32 mat4 matrices (512 / 16). But I know that isn’t the whole story, because on an ATI X1550 my animation matrix arrays can hold up to 60 matrices, which I determined by trial and error.
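The 32-matrix figure follows from counting 16 components per mat4. A minimal C sketch of that arithmetic, assuming every component is one float and that `max_mat4_array` (a hypothetical helper, not a GL call) is given the value returned by glGetIntegerv for MAX_VERTEX_UNIFORM_COMPONENTS_ARB:

```c
/* Components (floats) consumed by one uniform mat4: 4 columns of vec4. */
enum { MAT4_COMPONENTS = 16 };

/* Largest mat4 array that fits in `max_components` if each component is
 * counted as a single float, after reserving `other_components` for the
 * shader's other uniforms. Returns 0 if nothing fits. */
static int max_mat4_array(int max_components, int other_components)
{
    int available = max_components - other_components;
    if (available <= 0)
        return 0;
    return available / MAT4_COMPONENTS;
}
```

For the 3870’s 512 this gives 32 with nothing reserved; under the same reading, the 60 matrices seen on the X1550 would need at least 960 components.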
Is my understanding of MAX_VERTEX_UNIFORM_COMPONENTS_ARB wrong?
The MAX_VERTEX_UNIFORM_COMPONENTS_ARB definition talks about individual floats. Based on the following comment in the specification, one component is really intended to be one float, not one vec4.
The state required per program object consists of:
…
An array of MAX_VERTEX_UNIFORM_COMPONENTS_ARB words that holds uniform values.
It would be perfectly correct for the driver to report 512 floats and allow only 128 vec4 values.
On the other hand, if the driver cannot allocate uniforms to constant registers at a granularity finer than a vec4, each standalone float uniform consumes a whole vec4. Even if such a card could hold 2048 floats when all uniforms are vec4s, only 512 individual float uniforms would fit. To stay compatible with the specification, it must therefore report 512 for MAX_VERTEX_UNIFORM_COMPONENTS_ARB. I assume this is the case with the ATI.
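The two allocation policies can be compared with a small model. This is a hypothetical C sketch, not driver code: `UniformDecl`, `cost_packed` and `cost_vec4_slots` are illustrative names, and the model only handles scalars, vectors and mat4s (column padding of mat2/mat3 is not modeled).

```c
/* A uniform declaration: `elements` array elements, each using
 * `components` floats (1 for float, 4 for vec4, 16 for mat4). */
typedef struct {
    int components;
    int elements;
} UniformDecl;

/* Floats consumed if the driver packs uniforms float by float. */
static int cost_packed(const UniformDecl *u, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += u[i].components * u[i].elements;
    return total;
}

/* Floats consumed if every array element is rounded up to whole vec4
 * registers, as on hardware without sub-vec4 granularity. */
static int cost_vec4_slots(const UniformDecl *u, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i) {
        int slots = (u[i].components + 3) / 4; /* round up to vec4s */
        total += slots * 4 * u[i].elements;
    }
    return total;
}
```

512 standalone floats cost 512 under packing but 2048 under vec4 granularity, while a 32-element mat4 array costs 512 either way. That is why a driver with 2048 floats of vec4 storage might still report only 512 components.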
Both cases are, IMHO, valid, and you cannot tell them apart using the GLSL API alone. One option is to assume that all current and future hardware supports some minimum limit and use that (this is what I do). The other is to look at the limits reported by the assembly API and hope that they are a better match.
The problem is that for instancing you want to pass as many matrices to the GPU as you can. You don’t want to settle for some minimum number and hold back performance on better cards.
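To make the cost concrete: the matrix array size caps how many instances each draw call can cover. A sketch with a hypothetical helper `draw_calls_needed`:

```c
/* Number of instanced draw calls needed for `instances` objects when each
 * call can upload at most `per_batch` matrices (ceiling division).
 * Returns -1 if per_batch is not positive. */
static int draw_calls_needed(int instances, int per_batch)
{
    if (per_batch <= 0)
        return -1;
    return (instances + per_batch - 1) / per_batch;
}
```

For 1000 instances, a 32-matrix batch needs 32 draw calls, while a 60-matrix batch needs 17.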
I am just going to stick with a uniform buffer (NVIDIA only). You can query its maximum size, so there is no guessing.
Have you looked at ARB_instanced_arrays? It looks like the long-awaited attribute divisor, which lets you stream different vertex attributes at different rates, for example once per instance instead of once per vertex.
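ARB_instanced_arrays sets the rate with glVertexAttribDivisorARB(index, divisor). The fetch rule can be modeled on the CPU in C; `attrib_element` is a hypothetical helper that returns which array element a given vertex/instance pair reads:

```c
/* Element of a vertex attribute array fetched for (vertex, instance):
 * divisor 0 advances once per vertex; divisor d > 0 advances once
 * every d instances. */
static unsigned attrib_element(unsigned vertex, unsigned instance,
                               unsigned divisor)
{
    return divisor == 0 ? vertex : instance / divisor;
}
```

One common pattern is to feed a per-instance mat4 as four vec4 attributes, each with divisor 1, which sidesteps the uniform-array limit entirely.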
If you’re really pressed for bandwidth, you could send the origin, orientation (as a quaternion) and uniform scale in two vec4s.
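A sketch of that packing in C, mirroring what the vertex shader would compute per vertex. The layout (scale in a.w, quaternion in b) and all names here are assumptions for illustration, and the quaternion is assumed to be unit length. At 8 floats per instance instead of 16, 512 components would hold 64 instances rather than 32.

```c
typedef struct { float x, y, z, w; } vec4;
typedef struct { float x, y, z; } vec3;

/* Per-instance data packed into two vec4s (8 floats instead of 16):
 *   a = (origin.x, origin.y, origin.z, uniform_scale)
 *   b = unit quaternion (x, y, z, w) */
typedef struct { vec4 a, b; } PackedInstance;

static PackedInstance pack_instance(vec3 origin, float scale, vec4 quat)
{
    PackedInstance p = { { origin.x, origin.y, origin.z, scale }, quat };
    return p;
}

static vec3 cross3(vec3 u, vec3 v)
{
    vec3 r = { u.y * v.z - u.z * v.y,
               u.z * v.x - u.x * v.z,
               u.x * v.y - u.y * v.x };
    return r;
}

/* Rotate v by the quaternion, apply the uniform scale, then translate by
 * the origin. Rotation uses v' = v + w*t + cross(q.xyz, t),
 * where t = 2 * cross(q.xyz, v). */
static vec3 apply_instance(PackedInstance p, vec3 v)
{
    vec3 q = { p.b.x, p.b.y, p.b.z };
    vec3 t = cross3(q, v);
    t.x *= 2.0f; t.y *= 2.0f; t.z *= 2.0f;
    vec3 c = cross3(q, t);
    float s = p.a.w;
    vec3 r = { p.a.x + s * (v.x + p.b.w * t.x + c.x),
               p.a.y + s * (v.y + p.b.w * t.y + c.y),
               p.a.z + s * (v.z + p.b.w * t.z + c.z) };
    return r;
}
```

For example, a 90° rotation about z (quaternion (0, 0, 0.7071, 0.7071)) with scale 2 and origin (10, 0, 0) maps the point (1, 0, 0) to (10, 2, 0).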