Nvidia 319/320 drivers and shared layout UBOs



malexander
05-21-2013, 03:44 PM
I've run across a change in the Nvidia 319/320 series drivers with regard to std140/packed uniform buffer objects that is causing havoc in our application.



#version 150

layout(shared) uniform scene
{
    float A;
    float B;
    float C;
};

out float result;

void main()
{
    result = A + B;
}


In the above shader, when the number of uniforms is queried via glGetActiveUniformBlockiv(... GL_UNIFORM_BLOCK_ACTIVE_UNIFORMS) it is returning "2". This is a change from previous Nvidia drivers where the same code returned "3". While this seems to be in line with the whole notion of active uniforms, it isn't terribly convenient when applied to shared uniform blocks, especially when different shaders may use subsets of the uniforms within the uniform block.

The reason this is causing problems for our application is that it queries all the offsets, names, and sizes of the shared uniform block from the shader and caches them in a C++ object representing the uniform block (backed by a buffer). While I can make modifications to allow piecemeal caching of the various active uniforms' data used by shaders as the object is reused, I'd rather not unless this is truly the intent of the GL spec. The GLSL and GL specs talk mostly about memory layout when discussing shared and std140 uniform blocks, while the OpenGL wiki seems to indicate that all uniforms should be considered active and not optimized out for packed/std140 (http://www.opengl.org/wiki/Interface_Block_%28GLSL%29#Memory_layout).

So, should I be submitting this as a driver bug, or is this proper behaviour?
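
For illustration, the "piecemeal caching" workaround mentioned above could look roughly like the following C++ sketch. It makes no GL calls: `UniformInfo` and `BlockLayoutCache` are hypothetical names, and the assumption is that the application fills each `UniformInfo` from whatever a given program reports via glGetActiveUniformsiv and glGetActiveUniformName. It relies on `shared` giving identical block declarations the same layout in every program, so entries reported by different programs can be merged.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One uniform as reported by a single program. Hypothetical struct: in a real
// application the fields would come from the GL uniform queries.
struct UniformInfo {
    std::string name;
    int offset;  // byte offset within the block (GL_UNIFORM_OFFSET)
    int size;    // array element count (GL_UNIFORM_SIZE)
};

// Hypothetical cache for one shared uniform block, filled in piecemeal as
// different programs report different subsets of the block's uniforms.
class BlockLayoutCache {
public:
    // Merge the uniforms that one program reported as active. Returns false
    // as soon as an entry contradicts what an earlier program reported;
    // entries merged before the conflict stay in the cache.
    bool merge(const std::vector<UniformInfo>& reported) {
        for (const UniformInfo& u : reported) {
            auto it = entries_.find(u.name);
            if (it == entries_.end()) {
                entries_.emplace(u.name, u);
            } else if (it->second.offset != u.offset ||
                       it->second.size != u.size) {
                return false;
            }
        }
        return true;
    }

    // Returns nullptr if no program has reported this uniform yet.
    const UniformInfo* find(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? nullptr : &it->second;
    }

    // Number of distinct uniforms seen so far.
    std::size_t known() const { return entries_.size(); }

private:
    std::map<std::string, UniformInfo> entries_;
};
```

With the shader above, the first merge would only learn about A and B; C would be added once some other program that references it is queried.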

Alfonse Reinheart
05-21-2013, 04:23 PM
it isn't terribly convenient when applied to shared uniform blocks

This statement is true if you replace everything after "it" with "is in violation of the specification."

This is a driver bug.


while the OpenGL wiki seems to indicate that all uniforms should be considered active and not optimized out for packed/std140

No it doesn't. It specifically says that `packed` is allowed to optimize uniforms out; it's shared and std140 that don't. I don't even know why you're mentioning packed and std140 when you're not using either.

malexander
05-21-2013, 04:45 PM
No it doesn't. It specifically says that `packed` is allowed to optimize uniforms out; it's shared and std140 that don't. I don't even know why you're mentioning packed and std140 when you're not using either.

Typo: I meant to say shared/std140.

The uniform block size is constant for shared regardless of which uniforms were used in the shader and the compiler isn't changing the offsets or sizes for the uniforms it is reporting. It isn't "optimizing them out" in terms of shuffling the uniform offsets around; it just isn't reporting the uniforms that were not referenced in the shader, and that is the source of my confusion.

tonyo_au
05-24-2013, 02:31 AM
Why do you need to query locations in a std140 buffer? They are at known offsets from the start of the buffer, or am I misunderstanding something?
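
To make the "known offsets" point concrete, here is a rough C++ sketch of how std140 offsets can be computed on the application side for non-array scalar and vector members. The names `Std140Type`, `std140_rule` and `std140_offsets` are made up for this example, and it deliberately ignores arrays, matrices and nested structs, which have their own std140 rules.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A few std140 basic types (hypothetical enum for this sketch only).
enum class Std140Type { Float, Vec2, Vec3, Vec4 };

// Base alignment and occupied size, both in bytes.
struct Std140Rule {
    std::size_t align;
    std::size_t size;
};

inline Std140Rule std140_rule(Std140Type t) {
    switch (t) {
        case Std140Type::Float: return {4, 4};
        case Std140Type::Vec2:  return {8, 8};
        case Std140Type::Vec3:  return {16, 12};  // aligned like a vec4, occupies 12 bytes
        case Std140Type::Vec4:  return {16, 16};
    }
    return {16, 16};  // unreachable for valid enum values
}

// Byte offsets of the members, in declaration order.
inline std::vector<std::size_t> std140_offsets(
        const std::vector<Std140Type>& members) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (Std140Type t : members) {
        const Std140Rule r = std140_rule(t);
        // Round the running offset up to the member's base alignment.
        cursor = (cursor + r.align - 1) / r.align * r.align;
        offsets.push_back(cursor);
        cursor += r.size;
    }
    return offsets;
}
```

For a block of three floats like the `scene` block in the first post, declared std140, this gives offsets 0, 4 and 8 regardless of which members a particular shader references.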

Alfonse Reinheart
05-24-2013, 05:54 AM
It isn't "optimizing them out" in terms of shuffling the uniform offsets around; it just isn't reporting the uniforms that were not referenced in the shader, and that is the source of my confusion.

I've looked at the spec, and it seems clear that the shared notion is not defined in terms of active uniforms. So the uniforms in a shared or std140 buffer are not considered active, and therefore are not necessarily queryable. But the offsets will always be done as though they were.

That being said, as you yourself pointed out, this is counterproductive to the whole point of using `shared` to begin with. You need to be able to query all of the uniform offsets from any program. As such, I've filed a bug on that (http://www.khronos.org/bugzilla/show_bug.cgi?id=876) (for all the good it will do).

malexander
05-24-2013, 08:47 AM
Why do you need to query locations in a std140 buffer? They are at known offsets from the start of the buffer, or am I misunderstanding something?

My main concern was with shared uniform blocks; std140 just happened to demonstrate this behaviour as well. Since more program information is being defined by shaders these days (like locations), I'd prefer not to have to define the structure twice and keep them synchronized. But I would be fine with std140 not reporting all uniforms, as long as I have an alternative in 'shared' to report everything.


I've looked at the spec, and it seems clear that the shared notion is not defined in terms of active uniforms. So the uniforms in a shared or std140 buffer are not considered active, and therefore are not necessarily queryable. But the offsets will always be done as though they were.

That being said, as you yourself pointed out, this is counterproductive to the whole point of using `shared` to begin with. You need to be able to query all of the uniform offsets from any program. As such, I've filed a bug on that (http://www.khronos.org/bugzilla/show_bug.cgi?id=876) (for all the good it will do).

Thanks. I've also sent an email to our Nvidia contact, and adjusted our code to be more defensive for this case.

tonyo_au
05-24-2013, 09:17 PM
an alternative in 'shared' to report everything.
I assume you mean includes, and I agree. I have set up a template that generates both structures, and I have a pseudo-include in the shader via a pragma.