Max uniform matrix array size?

Leadwerks · November 30, 2008, 12:45am

I am trying to find a way to reliably determine the maximum size allowed for a uniform matrix array. I have been looking at MAX_VERTEX_UNIFORM_COMPONENTS_ARB. According to the spec, this returns the maximum number of floats that can be stored in uniform values:

A vertex shader may define one or more "uniform" variables. These values
are to remain constant over a primitive or a sequence of primitives. The
OpenGL Shading Language specification defines a set of built-in uniform
variables for vertex shaders that correspond to the state that GL
manages for the purpose of processing vertices. The amount of storage
that is available for vertex shader uniform variables is specified by
the implementation dependent constant MAX_VERTEX_UNIFORM_COMPONENTS_ARB.
This value represents the number of individual floating point values, or
individual integer values or individual Boolean values that can be held
in uniform variable storage for a vertex shader. A link error will be
generated if an attempt is made to utilize more than the space available
for vertex shader uniform variables.

On my AMD 3870 that value is 512, which would only allow an array of up to 32 mat4 matrices (512 floats / 16 floats per matrix). But I know that isn’t right, because I can use up to 60 animation matrices on an ATI X1550, which I determined by trial and error.

Is my understanding of MAX_VERTEX_UNIFORM_COMPONENTS_ARB wrong?
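
To make the arithmetic behind that reading concrete, here is a minimal sketch in C. It assumes each mat4 costs 16 float components and that nothing else is charged against the limit. The name `max_mat4_literal` and the `reserved` parameter are made up for illustration and are not part of any OpenGL API.

```c
#include <assert.h>

/* Components consumed by one mat4 under the literal "one component = one
   float" reading of the spec. */
enum { COMPONENTS_PER_MAT4 = 16 };

/* Hypothetical helper: how many mat4 uniforms fit into `components` (the
   value reported for MAX_VERTEX_UNIFORM_COMPONENTS_ARB) after `reserved`
   components are set aside for other uniforms. */
static int max_mat4_literal(int components, int reserved)
{
    assert(components >= 0 && reserved >= 0);
    if (reserved >= components)
        return 0;
    return (components - reserved) / COMPONENTS_PER_MAT4;
}
```

With a reported value of 512 and nothing reserved this gives 32, while 60 matrices would need 60 x 16 = 960 components under the same reading.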

Leadwerks · December 6, 2008, 1:13pm

If no one can answer this I will have to disable instanced rendering on AMD hardware.

babis · December 7, 2008, 4:45am

Since the spec talks about individual float values, and we know (?) that a single float becomes a vec4 anyway, I’d guess it’s 4x that constant.

That is, if the float->vec4 padding also applies to uniforms, as I know it does for varyings.

Lumooja · December 7, 2008, 9:29am

MAX_VERTEX_UNIFORM_COMPONENTS_ARB returns 512 on many cards.
You can then have 512 vec4s, or 128 mat4s, or 170 4x3 matrices.

That’s what I came up with from this discussion:
http://www.gamedev.net/community/forums/topic.asp?topic_id=425979
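
Under that vec4 reading the numbers work out as follows. This is only a sketch of the arithmetic, assuming the reported value counts four-component slots and that a 4x3 matrix is stored as three vec4 rows. `matrices_in_vec4_slots` is a hypothetical helper name, not an OpenGL call.

```c
#include <assert.h>

/* Hypothetical helper: number of matrices that fit into `vec4_slots`
   four-component slots when each matrix occupies `vec4s_per_matrix` of them
   (4 for a mat4, 3 for a 4x3 matrix stored as three vec4 rows). */
static int matrices_in_vec4_slots(int vec4_slots, int vec4s_per_matrix)
{
    assert(vec4_slots >= 0 && vec4s_per_matrix > 0);
    return vec4_slots / vec4s_per_matrix; /* integer division rounds down */
}
```

For 512 slots this yields 128 mat4s or 170 4x3 matrices, with two slots left over in the second case.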

Komat · December 8, 2008, 1:23am

The MAX_VERTEX_UNIFORM_COMPONENTS_ARB definition talks about individual floats. Based on the following comment within the specification, one float is really intended to be one float and not a vec4.

The state required per program object consists of:
…

An array of MAX_VERTEX_UNIFORM_COMPONENTS_ARB words that holds uniform values.

It would be perfectly correct for the driver to say 512 floats and allow only 128 vec4 values.

On the other hand, if the driver cannot allocate constants to the uniforms with a granularity finer than vec4, each standalone float uniform would consume a whole vec4. Even if such a card could support 2048 floats when all uniforms are in vec4 format, only 512 individual float uniforms would be possible. To be compatible with the specification it must return 512 as the value for MAX_VERTEX_UNIFORM_COMPONENTS_ARB. I assume that this is the case with ATI.

Both cases are imho valid and you cannot differentiate between them using the GLSL API only. One thing you can do is to assume that all current and future hardware will support some minimal limit and use that (this is what I do). The other thing is to look at the assembly API limits and hope that they are a better match.
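
The second case can be illustrated with a small model. It assumes hardware that allocates uniforms only in whole vec4 registers, as described above. `UniformCounts`, `registers_used` and `floats_stored` are hypothetical names, and the padding rule is an assumption taken from this post, not something queried from a driver.

```c
#include <assert.h>

/* Hypothetical description of the uniforms declared by one shader. */
typedef struct {
    int floats; /* standalone float uniforms */
    int vec4s;  /* vec4 uniforms */
    int mat4s;  /* mat4 uniforms */
} UniformCounts;

/* Registers consumed when every uniform is padded to whole vec4 registers:
   a standalone float occupies one register and wastes three lanes. */
static int registers_used(UniformCounts u)
{
    assert(u.floats >= 0 && u.vec4s >= 0 && u.mat4s >= 0);
    return u.floats + u.vec4s + 4 * u.mat4s;
}

/* Floats actually stored by the same set of uniforms. */
static int floats_stored(UniformCounts u)
{
    return u.floats + 4 * u.vec4s + 16 * u.mat4s;
}
```

On a hypothetical card with 512 such registers, 512 standalone floats and 128 mat4s both fill the register file, but the second set stores 2048 floats. A driver for that card could only safely promise 512 components.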

Leadwerks · December 9, 2008, 6:51pm

The problem is that for instancing, you want to pass the maximum number of matrices to the GPU that you can. You don’t want to settle for some minimum number and retard performance on better cards.

I am just going to stick to using a uniform buffer (NVidia only). You can get the max size allowed, so there is no guessing.

Brolingstanz · December 10, 2008, 10:24am

Have you looked at ARB_instanced_arrays? Looks like this is the long awaited attribute divisor that allows you to stream various per vertex instance data at different rates.

If you’re really pressed for bandwidth you could send origin, orientation (quat) and uniform scale in 2 vec4s.
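
A rough sketch of what that two-vec4 layout could look like, written in C so it can be checked on the CPU. The struct layout (scale in the w of the first vec4) and the names `PackedInstance`, `cross3` and `transform_point` are assumptions for illustration. A vertex shader would do the equivalent math per vertex.

```c
#include <assert.h>

/* Hypothetical per-instance record: two vec4s, 8 floats in total. */
typedef struct {
    float origin_scale[4]; /* xyz = origin, w = uniform scale */
    float rotation[4];     /* unit quaternion, xyzw */
} PackedInstance;

/* out = a x b, using only the first three elements of each input. */
static void cross3(const float a[3], const float b[3], float out[3])
{
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
}

/* Applies scale, then rotation, then translation to point p. */
static void transform_point(const PackedInstance *inst, const float p[3],
                            float out[3])
{
    const float *q;
    float s, v[3], t[3], u[3];
    int i;

    assert(inst && p && out);
    q = inst->rotation;
    s = inst->origin_scale[3];
    for (i = 0; i < 3; ++i)
        v[i] = p[i] * s;

    /* Rotate v by unit quaternion q:
       t = 2 * (q.xyz x v), v' = v + q.w * t + q.xyz x t */
    cross3(q, v, t);
    for (i = 0; i < 3; ++i)
        t[i] *= 2.0f;
    cross3(q, t, u);
    for (i = 0; i < 3; ++i)
        out[i] = v[i] + q[3] * t[i] + u[i] + inst->origin_scale[i];
}
```

This costs 8 floats per instance instead of 16 for a full mat4, at the price of some extra shader arithmetic and no support for non-uniform scale or shear.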

Leadwerks · December 10, 2008, 1:07pm

I am already using the right-hand column of the mat4 for instance colors, so my actual matrix data only takes 12 floats.

I am pressed for bandwidth no matter what, because I always want to get the most instances possible in one draw call.

I will take a look at the instanced_arrays thing.
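
For readers wondering how a color can share a mat4 with the transform, here is a minimal sketch of the idea mentioned above. It assumes row-major storage and an affine transform whose fourth column is always (0, 0, 0, 1), so those four slots can carry RGBA instead. `pack_color` and `unpack_color` are hypothetical names, and the actual engine layout may differ.

```c
#include <assert.h>

/* Overwrites the fourth column of a row-major 4x4 matrix (elements 3, 7, 11
   and 15) with an RGBA color. Assumes that column held the constant
   (0, 0, 0, 1) of an affine transform, so no transform data is lost. */
static void pack_color(float m[16], const float rgba[4])
{
    int row;
    assert(m && rgba);
    for (row = 0; row < 4; ++row)
        m[row * 4 + 3] = rgba[row];
}

/* Reverses pack_color: copies the color out and restores (0, 0, 0, 1), as a
   shader would do before using the matrix as a transform. */
static void unpack_color(float m[16], float rgba[4])
{
    int row;
    assert(m && rgba);
    for (row = 0; row < 4; ++row) {
        rgba[row] = m[row * 4 + 3];
        m[row * 4 + 3] = (row == 3) ? 1.0f : 0.0f;
    }
}
```

The remaining 12 elements still hold the rotation, scale and translation, so one mat4 slot carries both the transform and the color.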