Hardware-specific: vertex_attrib_64bit and double resources

The GL spec's language on double-precision attributes is odd. It says that all double and dvec* types use up only one attribute index. But when it comes time to count resources, the driver may count dvec3 and dvec4 attributes twice.

I’m curious: on what hardware do these attributes count twice? I don’t have access to any GL 4.x hardware, so I’d be interested to know if it’s an AMD or NVIDIA thing, so I know who to blame for yet another OpenGL WTF moment.

I have an ATI card and an NVIDIA card that claim to support double precision (though I haven’t tried it). Tell me what to do and I’ll test it for you.

Here’s what I get with this vertex shader:

#version 420

in dvec4 P;        // double-precision inputs
in dvec3 N;
in dvec2 uv;
in double Alpha;
in vec3 Cd;        // single-precision from here down
in float z;

void main()
{
    gl_Position = vec4( P + vec4(N, z) + vec4(uv, Alpha, Cd.r));
}

I’m printing out the attribute name with its location as determined by glGetAttribLocation().
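
For reference, the query loop is nothing fancy. A minimal sketch, assuming prog is the program linked from the shader above and that a loader such as GLEW or glad provides the GL 2.0+ entry points:

#include <stdio.h>
#include <GL/glew.h>   /* or any loader providing GL 2.0+ entry points */

/* Print each attribute's location as reported by the driver.
   "prog" is assumed to be the linked program for the shader above. */
static void print_attrib_locations(GLuint prog)
{
    static const char *names[] = { "P", "N", "uv", "Alpha", "Cd", "z" };
    for (int i = 0; i < 6; ++i)
        printf("%-5s %d\n", names[i], glGetAttribLocation(prog, names[i]));
}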

GeForce 670, driver 304.15.00.02, Ubuntu 10.10 64-bit:

Alpha 3
Cd    4
N     1
P     0
uv    2
z     5


AMD FirePro W8000, driver 9.982.8.1, Windows 7 64-bit:

Alpha 0
Cd    1
N     2
P     4
uv    6
z     7

NVIDIA appears to need only one attribute location for a dvec4 or dvec3, while AMD needs two. I was a little curious why this wasn’t spelled out in the spec when I first read it, since matrix locations are well-defined, and I guess this is the reason. Still, it’d be nice to have a glGet for GL_NUM_DVEC4_ATTRIBUTE_LOCATIONS (and DVEC3, DVEC2, DOUBLE, DMAT4, etc.). This may also be why NVIDIA has 16 attribute locations and AMD has 29 for the cards listed above. I’ve seen 20 attribute locations on other AMD cards.
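
Those totals presumably come from the standard limit query, by the way:

GLint maxAttribs = 0;
glGetIntegerv(GL_MAX_VERTEX_ATTRIBS, &maxAttribs);
printf("GL_MAX_VERTEX_ATTRIBS = %d\n", maxAttribs);   /* 16 vs. 29 on the cards above */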

It’s not about the number of attribute locations (though it’s rather disturbing that AMD’s drivers will happily allocate two slots for them when the spec clearly says that they always get one). It’s about determining when the card is out of resources.

The language is different in the GL_ARB_vertex_attrib_64bit extension, though:

Additionally, some vertex shader inputs using the wider 64-bit components may count double against the implementation-dependent limit on the number of vertex shader attribute vectors. A 64-bit scalar or a two-component vector consumes only a single generic vertex attribute; three- and four-component “long” may count as two. This approach is similar to the one used in the current GL where matrix attributes consume multiple attributes.

It isn’t the same in the GL 4.3 spec, which probably explains the difference. Weren’t the ARB extensions supposed to be subsets of the GL core features from GL 3.3/4.0 onward?

I agree with the spec, though. A dvec4 should count as a single attribute location; otherwise, how do you set attribute locations in the shader in a cross-platform way? The only double-precision attribute I’d consider using would be for position. With the compatibility profile, attribute location zero must have an array bound, so that’s where I’d put P, since for my shaders it’s more likely that the other attributes could be constant. This means I either need to skip location 1, wasting an attribute on NVIDIA cards, or set up defines for all the other attribute locations. Not good.
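
Here’s the skip-a-slot binding I mean; this is my own sketch, not anything the spec blesses, and the calls have to happen before linking:

/* Reserve two locations for each dvec4/dvec3 so the same bindings work
   whether the driver burns one slot or two per attribute. */
glBindAttribLocation(prog, 0, "P");      /* dvec4 at 0; location 1 left unused */
glBindAttribLocation(prog, 2, "N");      /* dvec3 at 2; location 3 left unused */
glBindAttribLocation(prog, 4, "uv");     /* dvec2: one slot everywhere */
glBindAttribLocation(prog, 5, "Alpha");
glBindAttribLocation(prog, 6, "Cd");
glBindAttribLocation(prog, 7, "z");
glLinkProgram(prog);

On NVIDIA this wastes locations 1 and 3; on AMD it lines up exactly.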

Perhaps the ARB just expects that you’ll figure out that you have too much attribute storage when your shader fails to link.

That’s non-normative descriptive text. Remember: the overview doesn’t actually mean anything; what matters is what the spec says. And the actual spec change noted in that extension reads identically to what’s in the GL 4.3 spec: the attributes use only one attribute index but may count as consuming twice as many attributes.

That kind of backwards reasoning smacks of some kind of idiotic compromise among the ARB members. Maybe AMD wanted them to take two attributes, but NVIDIA refused to limit their hardware to that, so they settled on this, which is worse in every way than either alternative.

Just give us a simple hardware query: GL_ATTRIB_64BIT_LARGE. Was that so much to ask?

I think the source of the confusion lies in the comment, “This approach is similar to the one used in the current GL where matrix attributes consume multiple attributes”. Doubles may consume more storage than their 32-bit counterparts, but by the description later on they are not supposed to take the matrix approach of using multiple attribute locations. By making a false analogy, the extension just muddies the waters.

Just give us a simple hardware query: GL_ATTRIB_64BIT_LARGE. Was that so much to ask?

Well, except that double and dvec2 only appear to consume one 128-bit register, while dvec3 and dvec4 consume two (at least, that’s what the AMD results appear to indicate). I think you’d need something a bit more like the internalformat query for buffers, where you pass your buffer format and get back the machine units of storage it consumes (GL_ATTRIB_64BIT_ATTRIB_SIZE). Then you’d need to know how many machine units of attribute storage the hardware has available, which could be a simple glGet on GL_ATTRIB_STORAGE_SIZE.

I’ve been very reluctant to offer a 64-bit vertex shader, partly because of the abysmal 64-bit performance of some cards, and partly because between my position and instance matrix (dvec3, dmat3x4), it’d consume 8 attributes’ worth of storage, leaving my other attributes in a bit of a space crunch. However, if AMD has room for 29 32-bit vec4s even with doubles counting as two, and NVIDIA supports more than 8 dvec4 attributes’ worth of storage, it might be worth revisiting. I’d like to have a more concrete method for divining the available storage than creating a series of dummy shaders, though.
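
For what it’s worth, the dummy-shader probe is at least mechanical. A rough sketch (the helper name links_with_n_dmat4 is mine); each generated input feeds gl_Position so the linker can’t eliminate it as inactive, and a vertex-only program is legal to link:

#include <stdio.h>

/* Link throwaway shaders with an increasing number of dmat4 inputs
   until one fails, to estimate available attribute storage. */
static int links_with_n_dmat4(int n)
{
    char src[8192];
    int off = sprintf(src, "#version 420\n");
    for (int i = 0; i < n; ++i)
        off += sprintf(src + off, "in dmat4 m%d;\n", i);
    off += sprintf(src + off, "void main() { dvec4 a = dvec4(0);\n");
    for (int i = 0; i < n; ++i)
        off += sprintf(src + off, "a += m%d[0];\n", i);   /* keep every input active */
    sprintf(src + off, "gl_Position = vec4(a); }\n");

    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    const char *p = src;
    glShaderSource(vs, 1, &p, NULL);
    glCompileShader(vs);

    GLuint prog = glCreateProgram();
    glAttachShader(prog, vs);
    glLinkProgram(prog);

    GLint linked = GL_FALSE;
    glGetProgramiv(prog, GL_LINK_STATUS, &linked);
    glDeleteProgram(prog);
    glDeleteShader(vs);
    return linked == GL_TRUE;
}

Counting up with while (links_with_n_dmat4(n)) ++n; gives the largest count that links, though it’s still an estimate rather than a real storage query.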

Edit: with the GeForce 670, I can bind one dvec4 and three dmat4s. If I add one more dmat4, it fails to link. (That’s consistent with the 16-location limit: one location for the dvec4 plus four per dmat4 gives 13, and a fourth dmat4 would push it to 17.)