Sampler variables in uniform blocks

I think sampler variables should be allowed inside uniform blocks/buffers and be mirrorable to a C data type.
As things stand now, sampler variables require special handling and cannot be set in bulk along with other uniforms.

This has been hashed over before, by the way (but I do not have the link ready).

There are multiple issues:

  • GLSL samplers embody two things: the source of texture data and how to filter from it (nearest, linear, mipmapping, anisotropic, etc.).
  • Samplers are opaque things; placing them into a uniform buffer object requires that they have a well-defined (and cross-platform) size.

Those issues together make it dicey. However, there is this: http://www.opengl.org/registry/specs/NV/bindless_texture.txt which is an NVIDIA-only extension. In that extension, a sampler variable in GLSL is viewed as a 64-bit handle, and is even constructible from 64-bit unsigned ints in GLSL. The extension provides a way to get such a 64-bit value (after first making it resident, which is required) from a texture or a texture-sampler pair, to feed to GLSL uniforms.
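
For illustration, the C side of that extension looks roughly like this; a minimal sketch using the entry points named in the extension spec, where texture and samplerLocation are assumed to already exist:

GLuint64 handle = glGetTextureHandleNV(texture);   // 64-bit handle encapsulating the texture
glMakeTextureHandleResidentNV(handle);             // handle must be made resident before shader use
glUniformHandleui64NV(samplerLocation, handle);    // feed the raw 64-bit value to a sampler uniform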

But that extension is NVIDIA only, and NVIDIA Kepler at that. I think it is great, and a natural evolution of NVIDIA’s extension GL_NV_shader_buffer_load (which dates back to the GeForce 8 series, a GL3/DX10 part). We still have not seen an analogue of GL_NV_shader_buffer_load in core GL, and I do not think we will, because it, like GL_NV_bindless_texture, assumes that a location in a buffer object can be encoded as a 64-bit value.
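
For comparison, GL_NV_shader_buffer_load works along these lines (a sketch based on the extension spec; the buffer is assumed to already be created and bound to GL_ARRAY_BUFFER):

GLuint64EXT gpuAddress;
glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);  // pin the buffer for GPU access
glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &gpuAddress);
// gpuAddress is a raw 64-bit value a shader can treat as a pointer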

They are great extensions, though; if you can afford an NVIDIA-optimized path, or are targeting NVIDIA only, you cannot go wrong using them.

Oh come on. It cannot really be that difficult to implement. Samplers can be set with glUniform1i. Whether the associated data gets looked up when calling glUniform or when a texture function gets called in the shader cannot make that much of a difference.
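
For reference, this is the ordinary, extension-free way of pointing a sampler at a texture unit (prog, tex and the uniform name are placeholders):

glUniform1i(glGetUniformLocation(prog, "diffuseTex"), 0);  // sampler reads from texture unit 0
glActiveTexture(GL_TEXTURE0);                              // bind the actual texture to that unit
glBindTexture(GL_TEXTURE_2D, tex);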

Samplers can be set with glUniform1i.

You assume that this is doing nothing more than any other glUniform1i call: setting a number to a location.

In all likelihood, the driver is doing far more under the hood than this. It recognizes that the uniform location you passed is for an opaque type, and then does special stuff for it.

You should not really think of opaque types as uniforms at all; they’re not memory locations or values in the same way as an int or float. They’re special settings in a program that, thanks to 3D Labs’s poor GLSL design, we talk to as though they were uniforms.

I do not assume anything, except that whatever gets done when setting a single sampler could just as well be done when updating a uniform block.

Except that this would be a lot harder. Updating a uniform block is intended to be nothing more than a memcpy. You load some data into a buffer object, and when it comes time for the shader to use it, it copies that data into shader local memory to read from.
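
That is, the intended usage is a plain bulk upload, something like this (ubo is a placeholder, and BlockData is assumed to be a struct mirroring the block’s std140 layout):

glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(BlockData), &blockData);  // raw bytes, no interpretation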

Opaque types are not part of “shader local memory.” Even if they were, they would not be an integer as far as the hardware is concerned. They would be some hardware-specific type.

In order for what you want to be possible, the driver can’t simply memcpy data from the buffer into local memory. It must inspect certain bytes (namely, those belonging to opaque types) and change the other settings as appropriate. That’s not going to be any faster than calling glUniform1i on the program, and probably will be slower, since it would have to be done via driver logic. So it would have to read the buffer data from the GPU to the CPU, process it, and upload it to the special registers or whatever it is that the shader needs.

Personally, my feeling about opaque types is this: you should never be setting them more than once. Sampler X comes from texture unit Y, always and forever. Where available, I would use layout(binding = #) to set its value and never touch it from code.
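
On the GLSL side that looks like this (available since GL 4.2 / ARB_shading_language_420pack; the sampler name is illustrative):

layout(binding = 2) uniform sampler2D diffuseTex;  // permanently sources from texture unit 2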

That’s not going to be any faster than calling glUniform1i on the program, and probably will be slower, since it would have to be done via driver logic. So it would have to read the buffer data from the GPU to the CPU, process it, and upload it to the special registers or whatever it is that the shader needs.

The driver could bulk-copy the data to the GPU, check whether opaques are in the block, and apply the settings if needed. This would require decoupling the uniform variable from any internal state that might be stored in it right now. Checking whether there are opaques in the block is merely a matter of collecting the right data during compilation. This would merely waste a few bytes of memory in the uniform block.

The driver could bulk-copy the data to the GPU, check whether opaques are in the block, and apply the settings if needed.

Uniform block storage comes from buffer objects. The driver doesn’t know that a particular region of a buffer object will be used for a uniform block until you bind it with glBindBufferRange and then render with it. No, not even binding it to GL_UNIFORM_BUFFER is sufficient, because until you bind it to an indexed target and render, it doesn’t even know what the block layout looks like.
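
To make that concrete, here is a minimal sketch of the two-part association (program, blockIndex, buf, offset and blockSize are placeholders):

glUniformBlockBinding(program, blockIndex, 0);                    // program side: block -> binding index 0
glBindBufferRange(GL_UNIFORM_BUFFER, 0, buf, offset, blockSize);  // context side: binding index 0 -> buffer range
// only when you render do these two halves actually meet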

So it can’t catch it at glBufferSubData or glBufferData time. And it certainly can’t catch it during glMapBufferRange time. Remember: after you finish uploading or mapping, the buffer object data is (ideally) on the GPU. And the copy from the buffer object to shader local memory is done entirely with GPU operations: the driver simply issues a command to copy from GPU memory to a spot of shader local memory. Nothing more.

In order to do the “check if opaques” part, you have to do it at render time. Which means you must read the GPU data back onto the CPU, where the opaque checking happens. So it’s not merely wasting a few bytes; you’re wasting valuable GPU bandwidth on a data readback. Or, you’re forcing GL to place any buffer object that is used for uniform blocks in CPU memory, thus hurting the buffer-to-shader copy performance.

In order to do the “check if opaques” part, you have to do it at render time. Which means you must read the GPU data back onto the CPU, where the opaque checking happens.

I do not get the point.

void glBufferData(…) {
    if (bufferBlockBinding()) {
        if (opaquesInBlock()) setupStuff();
    }
    copyToGPU(bufferdata);
}
The GL implementation certainly keeps track of what a buffer object is bound to, how it is mapped, and so forth.

The GL implementation certainly keeps track of what a buffer object is bound to, how it is mapped, and so forth.

Yes it does. But that’s far from enough.

Consider this hypothetical uniform block.


uniform Block1
{
  sampler2D tex1;
  vec4 vector;
  sampler1D tex2;
};

Now, let’s say that I do this in my C++ code:


glBindBuffer(GL_UNIFORM_BUFFER, buf);
glBufferData(GL_UNIFORM_BUFFER, size, data, ...);

How does OpenGL know that buf is intended to be used with Block1? It is entirely possible that I have not even compiled the shader that Block1 is in. There is no association between the glBufferData call and the shader I plan to use it with.

Indeed, even if OpenGL knew that Block1 existed as a block definition, and that some Block1 data existed in buf, it doesn’t know where that data is in the buffer. It is entirely possible for me to provide some offset from the beginning of the buffer. All I need to do is provide that same offset to glBindBufferRange when it comes time to use the buffer (and of course, obey the offset requirements for UBOs).

Even worse, I don’t have to bind the buffer to GL_UNIFORM_BUFFER; I can bind it to any other buffer object binding point. Remember: I could render to a GL_TEXTURE_BUFFER, then turn around and use that rendered data as a GL_UNIFORM_BUFFER, and OpenGL is completely fine with that. I could bind one with some atomic counters, and then use the result of that as a uniform buffer. Buffers are not always uniform buffers, the way a 2D texture is always a 2D texture. So in this case, the data may never have come from the CPU at all.

There are other cases I could cite, but I think you get the point. There is no way you can write your hypothetical opaquesInBlock function. Not at the time that glBufferData is called.

I still do not get the point. There is a finite set of API commands that change the data inside a uniform block. There is a finite set of commands that change buffer data. One could easily prevent writes to a buffer bound to a uniform block containing opaques, or deal with those cases, be it with a loss of performance for any readbacks that may be required.

Once again:

  1. At the time you modify your buffer data (e.g. using glBufferSubData or through a mapped pointer), the driver doesn’t know which uniform block you’ll bind your buffer to, thus it cannot know which buffer addresses will actually represent a sampler.
  2. A sampler variable is an opaque type. You do set it to the texture unit index using glUniform*, but a texture unit index is an API concept, and the actual data that a sampler variable holds can vary from hardware to hardware; it might be way bigger than something that fits into a 32-bit integer.
  3. What happens if the buffer is written by the GPU using image stores, shader storage block writes, or transform feedback, and then is immediately used as a uniform buffer afterwards (see the sketch below)? How could you “patch” the sampler values then? Should the CPU wait for the first pass to finish on the GPU, parse the buffer, patch it, and then start the second step on the GPU? That would be horribly inefficient.
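
A sketch of case 3, using GL 4.3-era calls (binding indices are illustrative; a compute program is assumed to be bound for the first pass):

glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, buf);  // pass 1: a compute shader writes the buffer
glDispatchCompute(1, 1, 1);
glMemoryBarrier(GL_UNIFORM_BARRIER_BIT);             // make the writes visible to uniform reads
glBindBufferBase(GL_UNIFORM_BUFFER, 0, buf);         // pass 2: the same bytes, now read as a uniform block
// nowhere in this sequence does the CPU ever see the data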

I think the main point you have missed is the fundamental point I made: what a sampler is and how a GPU accesses it is completely determined by the GPU. What I think you see is this: a sampler is an integer, and that integer holds which texture unit to use; what texture to use and how it is sampled is “stored” by the texture unit. That is the interface that GL exposes, but it may or may not be at all what happens inside an implementation. The only guarantee, when one calls glUniform1i passing the uniform location of a sampler, is that the GL implementation will make sure the data used is whatever texture is bound at the named unit at call time, filtered according to the state of that texture unit. Internally, beyond the driver tracking the value, likely nothing happens until the actual draw call. When a draw call is finally issued, a GL implementation likely then looks at what is bound to the named texture unit and sets the GPU state for all of those goodies. The best analogy I can give you is this:

On CPU (not GPU), there is an array, indexed by texture unit, storing what data and how to filter that data.
On CPU (not GPU), as part of program state, each sampler stores an index into that array.
On CPU (not GPU), a GL implementation then looks at that index, and sets GPU state by the values of that array. In addition, it likely also does additional work to make sure the data for the texture is resident in VRAM.
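
A rough sketch of that analogy in C-like pseudocode (purely illustrative; every name here is made up, and no real driver is claimed to work this way):

struct TexUnit { TextureData *data; FilterState filter; };  // CPU-side state, one per texture unit
struct TexUnit units[MAX_UNITS];  // fed by glBindTexture and sampler state
int samplerUniform;               // program state: just an index into units[]

void onDrawCall(void) {
    struct TexUnit *u = &units[samplerUniform];  // resolved only at draw time
    gpuEnsureResident(u->data);                  // make sure the texels live in VRAM
    gpuProgramTextureState(u->data, u->filter);  // hardware-specific state, not an integer
}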

What you are thinking, I think, is that the array is stored on the GPU and the GPU architecture is flexible enough to look at that array. That may or may not be the case at all. Indeed, even the NVIDIA extension requires one to make sure, by hand, that the data is resident; beyond that, it just wants a 64-bit address to be happy.

Let’s go one wild, ugly step further. Suppose that a GPU’s architecture does NOT have a dedicated, discrete piece of hardware to do filtering. Suppose that the filtering is done by patching the assembly of the shader. Such a GPU is still fine for GL, since the driver would then, for each program, store a map of compiled shaders keyed by texture filtering state. This might sound wild, but it may not be totally wild. Indeed, as a related (though not identical) example, the NVIDIA Tegra 2 adds additional shader code to a fragment shader based upon blending state.

I do not think I’ve missed the point. I just don’t care about implementation issues that come up when making decisions about how I think the spec should look.
All that ever comes up, again and again, are implementation issues that read like:
“You cannot do that when calling glBufferData; you would have to do it when calling glBindBuffer as well. So it is impossible.” I don’t write the specs. I do not implement OpenGL. I make suggestions.
The argument is simple:

  • Samplers can be set by a number, as texture units are enumerable -> glUniform1i.
  • Data buffers can be watched for changes: one can enumerate the possibilities that can change buffer contents.
  • Buffer bindings are well defined: it is known where which part of the buffer is bound.
    From this it follows that an integer representing a sampler can be fished out of the buffer either when its data changes or when it gets bound.

As for performance issues: an API does not need to prevent conditions in which an operation would be slow by making the operation impossible. If a buffer bound to a uniform block is written to, that is bad practice. Who would bind a uniform-block-bound buffer containing opaques to an image unit and render to it, expecting full-performance operation? That might be possible. That might be impossible. I do not know the hardware details. One could even prevent using the same buffer in incompatible contexts if needed. That would result in an error when binding the buffer to a uniform block with opaques while it is also bound as the memory pool for an image, as a pixel (un)pack buffer, and so on. Those are details I’m not concerned with. What I’m concerned with in making this suggestion is that I think the API is missing something, and that makes it less practicable to use in certain use cases.

Buffer bindings are well defined: it is known where which part of the buffer is bound.

Consider this:


GLuint uniformBuffer;
glGenBuffers(1, &uniformBuffer);
glBindBuffer(GL_UNIFORM_BUFFER, uniformBuffer);

GLuint bufferData = 5; //Use texture image unit 5.
glBufferData(GL_UNIFORM_BUFFER, sizeof(bufferData), &bufferData, GL_STATIC_DRAW);

This is the only thing OpenGL sees. How is OpenGL to know, at the time glBufferData is called, which uniform block this buffer is going to be used with? There is no glCompileShader, glLinkProgram, glUseProgram, or similar function in this code. This code is perfectly legal to call before any shaders have been compiled. There is no uniform block yet. So how does OpenGL know, from this code alone, that this 4-byte block should be interpreted as a sampler uniform?

OpenGL simply has no way of knowing that any particular upload of data to a buffer object is destined for any particular uniform block. And without that knowledge, OpenGL cannot determine at the time data is uploaded what is and is not opaque.

If a buffer bound to a uniform block is written to, that is bad practice.

That’s the part you don’t understand: a buffer is never bound to a uniform block. The association between a uniform block and a buffer object is implicit. It’s done by separate state, one part in the context, and one part in the program. Without both, OpenGL has no idea how a buffer object will be used. And until you actually render, OpenGL can’t be sure that any particular buffer binding state is not merely temporary.

And there’s nothing in the API that requires the use of a program when uploading data to a buffer object. Without that knowledge, there is no way to understand what particular data means.

an API does not need to prevent conditions in which an operation would be slow by making the operation impossible.

If you can do something, then that something should be reasonably fast. And if it’s not possible to make something reasonably fast, then the user shouldn’t be able to do it at all. That’s good API design.

To do otherwise creates performance traps for the user, where simple and obvious uses of the API are terribly slow without any warning. That makes the API harder to use for no real benefit; users must have some arbitrary knowledge, outside of what a function does, to know the proper way to use the API.

OpenGL already has too many of these performance traps as it is (especially around buffer objects); it doesn’t need more of them.

Those are details I’m not concerned with. What I’m concerned with in making this suggestion is that I think the API is missing something, and that makes it less practicable to use in certain use cases.

In short: you want the feature, and you don’t care if it’s actually possible to implement, or what the performance implications of implementing it will be, or how it will affect the usability of the API. Fortunately, the ARB does care about these things, which is why it doesn’t exist and won’t in the near future. At least, not this way.

As I said earlier on, it’s best that you not change opaque uniform settings to begin with. You should set them once, and leave them that way; this is how most code is written. It’s easier to bind a texture to the right texture image unit than to change which unit is used by a shader. So even if this were done, only a fraction of users would actually need it. Bindless texturing is mostly about eliminating the glBindTexture overhead, as well as potentially determining which texture to use in the shader itself. So even users of that aren’t using it for the reasons you’re talking about.

That’s the part you don’t understand: a buffer is never bound to a uniform block. The association between a uniform block and a buffer object is implicit.

As I understand the spec, block bindings say: get the data from that buffer when needed. So if a block binding gets established after the first code fragment, it can be checked whether the block contains uniforms that require special handling, that is, the opaque types. This takes place before a shader can use the data contained in the block. The other way around it is the same: if a block binding has been established and glBufferData() is called to update the data, that can be checked. Once again, before any shader uses the data.
You are concerned with the internal data type of the opaque, I guess. That is INTERNAL data and could be moved to a special memory location that the user does not even know exists. Think of it as:

block {
    sampler2D s; // this would be an integer to the C interface and does not get used at all by the GPU
};
hidden_block_of_data: for example, the NVIDIA 64-bit unsigneds

So if it gets requested that the data for the block be pulled from a buffer, everything stays the way it is, except that the hardware state gets updated if necessary and the hidden block gets updated with data.
I could call glGetBufferSubData(offsetof(s), &s) followed by glUniform1i(hidden_block_location, s) whenever I establish a block binding or update the buffer data.

EDIT:

Fortunately, the ARB does care about these things,

You seem to be quite familiar with the ARB. Do you have a seat?

EDIT:

If you can do something, then that something should be reasonably fast. And if it’s not possible to make something reasonably fast, then the user shouldn’t be able to do it at all. That’s good API design.

That is API design that expects that the user does not know what he is doing.
If uniform blocks with opaques needed to be handled differently, then that could be done.
Describing it would take one or two extra lines in the spec.

It could be done as follows: “If a block binding is established, or the contents of a buffer bound to a uniform block are updated, any objects with opaque data types are made effective instantly, not just when a shader reads data from the block. Buffers bound to uniform blocks containing opaque types cannot be bound to <whatever-can-be-modified-by-shaders> at the same time.”

OK, I’m going to say this one more time, and then I’m done:

There is no such thing as a “buffer bound to a uniform block.” Buffers are not bound to uniform blocks. They are bound to the context, to indices in the GL_UNIFORM_BUFFER binding point. Programs are bound to the context. The association between bound buffer objects and program uniform blocks is implicit. Uniform blocks in programs reference an index in the GL_UNIFORM_BUFFER binding point.

Therefore, the only time OpenGL knows that a buffer object is unquestionably to be used for a specific uniform block is when you render. And never before that point. Therefore:

1: There is no way to detect when this happens at data upload time. You can’t catch opaque indices and convert them into something else at the time the user uploads data to the buffer. So your glBufferData intercept stuff is out.

2: Detecting this at render time means either storing the buffer in CPU-accessible memory or doing costly GPU-CPU readbacks when you render. Either way, you’re losing performance. Guaranteed.

In short, there is no way to make this anywhere near as fast as just using glUniform1i. So what exactly is the point?

Or, to put it another way, what is the compelling use case for this feature besides “I want to do it?”

You seem to be quite familiar with the ARB. Do you have a seat?

No, but I saw what they dropped with GL 3.1. And I’ve seen what they added. And, generally speaking, the modern ARB doesn’t add APIs that can be misused easily.

That is API design that expects that the user does not know what he is doing.

No, this API design expects that the user doesn’t have magical, unspecified knowledge of what happens to be fast and what happens to be slow.

In short, there is no way to make this anywhere near as fast as just using glUniform1i. So what exactly is the point?

The point is a use case where the performance hit due to a readback would be negligible, one where ease of use and genericity have priority; for example, initializing all the state variables of a shader.
One can define uniform blocks, do a little preprocessing, and get a C structure that can be used to mirror a uniform block.
Everything works fine until you get to the opaque types. That means one cannot define one data block per shader, mirror its data, and simply upload the whole block whenever the shader/program gets bound. Same thing with the offset alignment requirements for glBindBufferRange; they make things near unusable. Consider the following:


uniform VertexShaderVariables {
    //...
};
uniform FragmentShaderVariables {
    //...
};

Each of those can be mirrored with C structures. But when one then tries to


struct ProgramVariables {
   struct VertexShaderVariables vsVars;
   struct FragmentShaderVariables fsVars;
};

one cannot simply create one buffer and use glBindBufferRange to map the sub-structures to uniform blocks, just because of the alignment requirements. That sucks. One could easily live with the fact that readbacks can occur and so on, which would lead to the conclusion that updating particular variables is faster than replacing whole program states. The offset alignment requirement should be hidden from the user. The implementation should split the buffer as needed, should that be necessary. The user should not have to care about such things.
Were the API defined that way, maybe the hardware vendors would organize things in a way that reduces any performance hits. The difference between hardware and software is not that big when it comes to adapting to present needs.

Or, to put it another way, what is the compelling use case for this feature besides "I want to do it?"

I had to look this up. Of course there is none. I’m a user of OpenGL, not an implementor. But, were I to decide which API to use for my next project, I would take a very close look at D3D as an alternative, because of such things, and that despite the fact that I have a Linux background, which means preferring to write things that are easily portable. Of course that decision would be made in half-knowledge, as I guess the pitfalls and limitations will crop up during implementation. But I guess that, as D3D is not an open standard like OpenGL, it does not lag behind, because an open spec may only contain common ground, that is, things that all vendors see as “no problem”. If that means that directly rendering to program-variable blocks is impossible, I have no problem with that. I don’t like extensions from particular vendors, as the functionality is then missing on other hardware. With D3D, hardware not supporting such stuff disappears from the market, because things get slow due to software emulation. At least I guess so.

Everything works fine until you get to the opaque types. That means one cannot define one data block per shader, mirror its data, and simply upload the whole block whenever the shader/program gets bound.

Then stop pretending that opaque types are uniforms like any others. Don’t set them every time a shader is bound. Set them once, during initialization. Leave them set to those values.

Then you can change all the uniforms you want on a per-object basis without incident. You shouldn’t need to be changing texture image units and such.

That’s exactly why we have layout(binding) syntax; so that we can set these things in the shader and not have to ever set them in our code.

Your mistake is wanting to set these uniforms at all.

one cannot simply create one buffer and use glBindBufferRange to map the sub-structures to uniform blocks, just because of the alignment requirements.

Sure you can; you just can’t do it that way. You can have each block’s data in the same buffer, but you can’t do it by putting them all in one struct. You have to manually put them into a buffer.

Just because you can’t do it the way you want doesn’t mean it can’t be done.

One could write some generic code that would take an arbitrary boost::tuple of structs and create or update a buffer object based on them. C++11 makes this much easier with variadic templates, though Boost.Fusion makes it possible on pre-variadic compilers. It’s not too difficult to do; just time-consuming to write.
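
For the non-generic core of it, a minimal sketch (plain C++, no Boost; VertexShaderVariables and FragmentShaderVariables are the mirror structs from above, vsVars/fsVars are filled-in instances, and buf is an existing buffer object):

GLint align;
glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &align);

// round the second block's offset up to the required alignment
GLintptr vsOffset = 0;
GLintptr fsOffset = (sizeof(VertexShaderVariables) + align - 1) / align * align;

glBindBuffer(GL_UNIFORM_BUFFER, buf);
glBufferData(GL_UNIFORM_BUFFER, fsOffset + sizeof(FragmentShaderVariables), NULL, GL_DYNAMIC_DRAW);
glBufferSubData(GL_UNIFORM_BUFFER, vsOffset, sizeof(vsVars), &vsVars);
glBufferSubData(GL_UNIFORM_BUFFER, fsOffset, sizeof(fsVars), &fsVars);

glBindBufferRange(GL_UNIFORM_BUFFER, 0, buf, vsOffset, sizeof(vsVars));  // vertex block at binding 0
glBindBufferRange(GL_UNIFORM_BUFFER, 1, buf, fsOffset, sizeof(fsVars));  // fragment block at binding 1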

The offset alignment requirement should be hidden from the user. The implementation should split the buffer as needed, should that be necessary. The user should not have to care about such things.

The reason the offset alignment is exposed is to ensure maximum performance. It gives implementations the freedom to do things the fastest way possible, which requires imposing upon users that they do things a certain way. What you want makes things slower.