sampler-Variables in Uniform-Blocks



hlewin
01-28-2013, 07:28 AM
I think sampler variables should be includable in uniform blocks/buffers and be mirrored to a C data type.
As things are now, sampler variables require special handling and cannot be set in bulk along with other uniforms.

kRogue
01-28-2013, 01:45 PM
This has been hashed over before, by the way (but I do not have the link handy).

There are multiple issues:

GLSL samplers embody two things: the source of texture data and how to filter from it (nearest, linear, mipmapping, anisotropic, etc.)
Samplers are opaque things, and to place them into a uniform buffer object requires that they have a well-defined (and cross-platform) size.


Those issues together make it dicey. However, there is this: http://www.opengl.org/registry/specs/NV/bindless_texture.txt which is an NVIDIA-only extension. In that extension, a sampler variable in GLSL is viewed as a 64-bit handle, and samplers are even constructible from 64-bit unsigned ints in GLSL. That extension provides a way to get such a 64-bit value from a texture or a texture-sampler pair (making it resident first is required) to feed to GLSL uniforms.

But that extension is NVIDIA only, and NVIDIA Kepler at that. I think it is great and a natural evolution of NVIDIA's extension GL_NV_shader_buffer_load (which dates from the GeForce 8 series, a GL3/DX10 part). We still have not seen an analogue of GL_NV_shader_buffer_load in core GL, and I do not think we will, because it, like GL_NV_bindless_texture, assumes that a location in a buffer object can be encoded as a 64-bit value.

They are great extensions, though; if you are making an NVIDIA-optimized path, or are on NVIDIA only, you cannot go wrong in using them.

hlewin
01-28-2013, 02:32 PM
Oh, come on. It cannot really be that difficult to implement. Samplers can be set with glUniform1i. Whether the associated data gets looked up when calling glUniform or when a texture function is called in the shader cannot make that much of a difference.

Alfonse Reinheart
01-28-2013, 06:52 PM
Samplers can be set with glUniform1i.

You assume that this is doing nothing more than any other glUniform1i call: setting a number to a location.

In all likelihood, the driver is doing far more under the hood than this. It recognizes that the uniform location you passed is for an opaque type, and then does special stuff for it.

You should not really think of opaque types as uniforms at all; they're not memory locations or values in the same way as an `int` or `float`. They're special settings in a program that, thanks to 3D Labs's poor GLSL design, we talk to as though they were uniforms.

hlewin
01-29-2013, 02:40 AM
I do not assume anything, except that anything that gets done when setting a single sampler could just as well be done when updating a uniform block.

Alfonse Reinheart
01-29-2013, 04:52 AM
Except that this would be a lot harder. Updating a uniform block is intended to be nothing more than a memcpy. You load some data into a buffer object, and when it comes time for the shader to use it, it copies that data into shader local memory to read from.

Opaque types are not part of "shader local memory." Even if they were, they would not be an integer as far as the hardware is concerned. They would be some hardware-specific type.

In order for what you want to be possible, the driver can't simply memcpy data from the buffer into local memory. It must inspect certain bytes (namely, those belonging to opaque types) and change the other settings as appropriate. That's not going to be any faster than calling glUniform1i on the program, and probably will be slower, since it would have to be done via driver logic. So it would have to read the buffer data from the GPU to the CPU, process it, and upload it in the special registers or whatever that the shader needs.

Personally, my feeling about opaque types is this: you should never be setting them more than once. Sampler X comes from texture unit Y, always and forever. Where available, I would use layout(binding = #) to set its value and never touch it from code.
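In GLSL that looks like this (GL 4.2 / ARB_shading_language_420pack syntax; the unit number here is arbitrary):

```glsl
layout(binding = 3) uniform sampler2D diffuseTex; // always sources texture unit 3
```

With that, the application only ever binds textures to unit 3 and never calls glUniform1i on the sampler at all.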

hlewin
01-29-2013, 06:03 AM
That's not going to be any faster than calling glUniform1i on the program, and probably will be slower, since it would have to be done via driver logic. So it would have to read the buffer data from the GPU to the CPU, process it, and upload it in the special registers or whatever that the shader needs.
The driver could bulk-copy the data to the GPU, check if opaques were in the block, and do the settings if needed. This would require decoupling the uniform variable from any internal state that might be stored in it right now. Checking whether there are opaques in the block is merely a matter of collecting the right data during compilation. This would lead to wasting a few bytes of memory in the uniform block.

Alfonse Reinheart
01-29-2013, 06:14 AM
The driver could bulk-copy the data to the gpu, check if opaques were in the block and do the settings if needed.

Uniform block storage comes from buffer objects. The driver doesn't know that a particular region of a buffer object will be used for a uniform block until you bind it with glBindBufferRange and then render with it. No, not even binding it to GL_UNIFORM_BUFFER is sufficient, because until you bind it to an indexed target and render, it doesn't even know what the block layout looks like.

So it can't catch it at glBufferSubData or glBufferData time. And it certainly can't catch it during glMapBufferRange time. Remember: after you finish uploading or mapping, the buffer object's data is (ideally) on the GPU. The copy from the buffer object to shader local memory is done entirely in GPU operations: the driver simply issues a command to copy from GPU memory to a spot of shader local memory. Nothing more.

In order to do the "check if opaques" part, you have to do it at render time. Which means you must read the GPU data back onto the CPU, where the opaque checking happens. So it's not merely wasting a few bytes; you're wasting valuable GPU bandwidth on a data readback. Or, you're forcing GL to place any buffer object that is used for uniform blocks in CPU memory, thus hurting the buffer-to-shader copy performance.

hlewin
01-29-2013, 07:14 AM
In order to do the "check if opaques" part, you have to do it at render time. Which means you must read the GPU data back onto the CPU, where the opaque checking happens.
I do not get the point.

void glBufferData(...) {
    if (bufferBlockBinding()) {
        if (opaquesInBlock()) setupStuff();
    }
    copyToGPU(bufferData);
}
The GL implementation certainly keeps track of what a buffer object is bound to, how it is mapped, and so forth.

Alfonse Reinheart
01-29-2013, 07:39 AM
The GL implementation certainly keeps track of what a buffer object is bound to, how it is mapped, and so forth.

Yes it does. But that's far from enough.

Consider this hypothetical uniform block.



uniform Block1
{
    sampler2D tex1;
    vec4 vector;
    sampler1D tex2;
};


Now, let's say that I do this in my C++ code:



glBindBuffer(GL_UNIFORM_BUFFER, buf);
glBufferData(GL_UNIFORM_BUFFER, size, data, ...);


How does OpenGL know that `buf` is intended to be used with `Block1`? It is entirely possible that I have not even compiled the shader that `Block1` is in. There is no association between the glBufferData call and the shader I plan to use it with.

Indeed, even if OpenGL knew that `Block1` existed as a block definition, and that some `Block1` data existed in `buf`, it doesn't know where that data is in the buffer. It is entirely possible for me to provide some offset from the beginning of the buffer. All I need to do is provide that same offset to glBindBufferRange when it comes time to use the buffer (and of course, obey the offset requirements for UBOs (http://www.opengl.org/wiki/Uniform_Buffer_Object#Limitations)).

Even worse, I don't have to bind the buffer to GL_UNIFORM_BUFFER; I can bind it to any other buffer object binding point. Remember: I could render to a GL_TEXTURE_BUFFER, then turn around and use that rendered data as a GL_UNIFORM_BUFFER, and OpenGL is completely fine with that. I could bind one with some atomic counters, and then use the result of that as a uniform buffer. Buffers are not always uniform buffers, the way a 2D texture is always a 2D texture. So in this case, the data may not have ever come from the CPU at all.

There are other cases I could cite, but I think you get the point. There is no way you could write your hypothetical `opaquesInBlock` function. Not at the time that glBufferData is called.

hlewin
01-29-2013, 08:36 AM
I still do not get the point. There is a finite set of API commands that change the data inside a uniform block. There is a finite set of commands that change buffer data. One could easily prevent writes to a buffer bound to a uniform block containing opaques, or deal with those cases - be it with a loss of performance for any readbacks that may be required.

aqnuep
01-29-2013, 05:24 PM
Once again:

1. At the time you modify your buffer data (e.g. using glBufferSubData or through a mapped pointer), the driver doesn't know which uniform block you'll bind your buffer to, thus it cannot know which buffer addresses will actually represent a sampler.
2. A sampler variable is an opaque type. You do set it to a texture unit index using glUniform*, but a texture unit index is an API concept; the actual data that a sampler variable holds can vary from hardware to hardware, and it might be far bigger than fits into a 32-bit integer.
3. What happens if the buffer is written by the GPU using image stores, shader storage block writes, or transform feedback, and then is immediately used as a uniform buffer afterwards? How could you "patch" the sampler values then? Should the CPU wait for the first pass to finish on the GPU, parse the buffer, patch it, and then start the second step on the GPU? That would be horribly inefficient.

kRogue
01-30-2013, 04:01 AM
I still do not get the point. There is a finite set of API commands that change the data inside a uniform block. There is a finite set of commands that change buffer data. One could easily prevent writes to a buffer bound to a uniform block containing opaques, or deal with those cases - be it with a loss of performance for any readbacks that may be required.

I think the main point you have missed is the fundamental point I made: what a sampler is and how a GPU accesses it is completely determined by the GPU. What I think you see is this: a sampler is an integer, and that integer holds which texture unit to use. What texture to use and how it is sampled is "stored" by the texture unit. That is the interface that GL exposes, but it may or may not be at all what happens inside an implementation. The only guarantee is that when one calls glUniform1i with the uniform location of a sampler, the GL implementation will make sure that the data used is whatever texture is bound at the named unit, filtered according to what is bound to that texture unit. Internally, beyond the driver tracking state, likely nothing happens until the actual draw call. When a draw call is finally issued, a GL implementation likely then looks at what is bound to the named texture unit and sets the GPU state for all of those goodies. The best analogy I can give you is this:

On CPU (not GPU), there is an array, indexed by texture unit, storing what data and how to filter that data.
On CPU (not GPU), as part of program state, each sampler stores an index into that array.
On CPU (not GPU), a GL implementation then looks at that index, and sets GPU state by the values of that array. In addition, it likely also does additional work to make sure the data for the texture is resident in VRAM.

What you are thinking, I think, is that the array is stored on the GPU and the GPU architecture is flexible enough to look at that array. That may or may not be the case at all. Indeed, even the NVIDIA extension requires one to make sure - by hand - that the data is resident, but beyond that it just wants a 64-bit address to be happy.

Let's go one wild, ugly step further. Suppose that a GPU's architecture does NOT have a dedicated, discrete piece of hardware to do filtering. Suppose that the filtering is done by patching the assembly of the shader. Such a GPU is still OK for GL, since the driver would then store, for each shader, a map of shader variants keyed by texture filtering state. This might sound wild, but it may not be totally wild. As a related (though not equivalent) example, NVIDIA's Tegra 2 adds additional shader code to a fragment shader based on blending state.

hlewin
01-30-2013, 12:15 PM
I do not think I've missed the point. I just don't care about implementation issues that come up when making decisions about how I think the spec should look.
All that ever comes up is implementation issues that read like:
"You cannot do that when calling glBufferData; you would have to do it when calling glBindBuffer as well. So it is impossible." I don't write the specs. I do not implement OpenGL. I make suggestions.
The argument is simple:
- Samplers can be set by a number, as texture units are enumerable -> glUniform1i
- Data buffers can be watched for changes: one can enumerate the possibilities that can change buffer contents.
- Buffer bindings are well defined: it is known where which part of the buffer is bound.
From this it follows that an integer representing a sampler can be fished out of the buffer either when its data changes or when it gets bound.

As for performance issues: an API does not need to prevent conditions in which an operation would be slow by making the operation impossible. If a buffer bound to a uniform block is written to, that is bad practice. Who would bind a uniform-block-bound buffer containing opaques to an image unit and render to it expecting full-performance operation? That might be possible; that might be impossible. I do not know the hardware details. One could even prevent using the same buffer in incompatible contexts if needed. That would result in an error when binding the buffer either to a uniform block with opaques or as the memory pool for an image, as a pixel-(un)pack buffer, and so on. Those are details I'm not concerned with. What I'm concerned with in making this suggestion is that I think the API is missing something, which makes it less practicable to use in certain use cases.

Alfonse Reinheart
01-30-2013, 12:48 PM
Buffer bindings are well defined: it is known where which part of the buffer is bound.

Consider this:



GLuint uniformBuffer;
glGenBuffers(1, &uniformBuffer);
glBindBuffer(GL_UNIFORM_BUFFER, uniformBuffer);

GLuint bufferData = 5; //Use texture image unit 5.
glBufferData(GL_UNIFORM_BUFFER, sizeof(bufferData), &bufferData, GL_STATIC_DRAW);


This is the only thing OpenGL sees. How is OpenGL to know, at the time glBufferData is called, which uniform block this buffer is going to be used with? There is no glCompileShader, glLinkProgram, glUseProgram, or similar function in this code. This code is perfectly legal to call before any shaders have been compiled. There is no uniform block yet. So how does OpenGL know, from this code alone, that this 4-byte block should be interpreted as a sampler uniform?

OpenGL simply has no way of knowing that any particular upload of data to a buffer object is destined for any particular uniform block. And without that knowledge, OpenGL cannot determine at the time data is uploaded what is and is not opaque.


If a buffer bound to a uniform-block is written to, that is a bad practice.

That's the part you don't understand: a buffer is never bound to a uniform block. The association between a uniform block and a buffer object is implicit. It's done by separate state, one part in the context, and one part in the program (http://www.opengl.org/wiki/Uniform_Buffer_Object#Shader_Specification). Without both, OpenGL has no idea how a buffer object will be used. And until you actually render, OpenGL can't be sure that any particular buffer binding state is not merely temporary.
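For reference, the two halves of that implicit association look roughly like this (`prog`, `buf`, and `blockSize` are hypothetical names):

```cpp
// Program side: uniform block "Block1" reads from binding index 0.
GLuint blockIndex = glGetUniformBlockIndex(prog, "Block1");
glUniformBlockBinding(prog, blockIndex, 0);

// Context side: binding index 0 points at a range of this buffer.
glBindBufferRange(GL_UNIFORM_BUFFER, 0, buf, 0, blockSize);

// Only when a draw call is issued with prog bound do these two pieces
// of state meet; only then is the association actually in effect.
```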

And there's nothing in the API that requires the use of a program when uploading data to a buffer object. Without that knowledge, there is no way to understand what particular data means.


an api does not Need to prevent conditions in which an Operation would be slow by making the Operation impossible.

If you can do something, then that something should be reasonably fast. And if it's not possible to make something reasonably fast, then the user shouldn't be able to do it at all. That's good API design.

To do otherwise creates performance traps for the user, where simple and obvious uses of the API are terribly slow without any warning. That makes the API harder to use for no real benefit; users have to have some arbitrary knowledge, outside of what a function does, to know what is the proper way to use the API.

OpenGL already has too many of these performance traps as it is (especially around buffer objects); it doesn't need more of them.


That are Details I'm not concerned with. What I'm concerned with by making this suggestion is that I think the api is missing some Point that makes it less practicable to use in certain use-cases.

In short: you want the feature, and you don't care whether it's actually possible to implement, or what the performance implications of implementing it would be, or how it would affect the usability of the API. Fortunately, the ARB does care about these things, which is why it doesn't exist and won't in the near future. At least, not this way.

As I said earlier on, it's best that you not change opaque uniform settings to begin with. You should set them once, and leave them that way; this is how most code is written. It's easier to bind a texture to the right texture image unit than to change which unit is used by a shader. So even if this were done, only a fraction of users would actually need it. Bindless texturing is mostly about eliminating the glBindTexture overhead, as well as potentially determining which texture to use in the shader itself. So even users of that aren't using it for the reasons you're talking about.

hlewin
01-30-2013, 01:53 PM
That's the part you don't understand: a buffer is never bound to a uniform block. The association between a uniform block and a buffer object is implicit.
As I understand the spec, a block binding says: get the data from that buffer when needed. So, if a block binding gets established after the first code fragment, it can be checked whether the block contains uniforms that require special handling - that is, the opaque types. This takes place before a shader can use the data contained in the block. The other way around it is the same: if a block binding has been established and glBufferData() is called to update the data, that can be checked. Once again, before any shader uses the data.
You are concerned with the internal data type of the opaque, I guess. That is INTERNAL data and could be moved to a special memory location that the user does not even know about. Think of it as:

block {
    sampler2D s; // an integer at the C interface; not used at all by the GPU
};
// plus a hidden block of data: for example, the NVIDIA 64-bit unsigneds

So if it gets requested that data for the block be pulled from a buffer, everything stays the way it is, except that the hardware state gets updated if necessary and the hidden block gets updated with data.
I could call glGetBufferSubData(offsetof(s), &s); glUniform1i(hidden_block_location, s); whenever I establish a block binding or update the buffer data.

EDIT:
Fortunately, the ARB does care about these things,
You seem to be quite familiar with the ARB. Do you have a seat?

EDIT:
If you can do something, then that something should be reasonably fast. And if it's not possible to make something reasonably fast, then the user shouldn't be able to do it at all. That's good API design.
That is API design that assumes the user does not know what he is doing.
If uniform blocks with opaques needed to be handled differently, then that could be done.
Describing it would take one or two extra lines in the spec.

It could be done as follows: "If a block binding is established, or the contents of a buffer bound to a uniform block are updated, any objects with opaque data types are made effective instantly, not just when a shader reads data from the block. Buffers bound to uniform blocks containing opaque types cannot be bound to <whatever-can-be-modified-by-shaders> at the same time."

Alfonse Reinheart
01-30-2013, 02:51 PM
OK, I'm going to say this one more time, and then I'm done:

There is no such thing as a "buffer bound to a uniform block." Buffers are not bound to uniform blocks. They are bound to the context, to indices in the GL_UNIFORM_BUFFER binding point. Programs are bound to the context. The association between bound buffer objects and program uniform blocks is implicit. Uniform blocks in programs reference an index in the GL_UNIFORM_BUFFER binding point.

Therefore, the only time OpenGL knows when a buffer object is unquestionably to be used for a specific uniform block is when you render. And never before that point. Therefore:

1: There is no way to detect when this happens at data upload time. You can't catch opaque indices and convert them into something else at the time the user uploads data to the buffer. So your glBufferData intercept stuff is out.

2: Detecting this at render time means either storing the buffer in CPU-accessible memory or doing costly GPU-CPU readbacks when you render. Either way, you're losing performance. Guaranteed.

In short, there is no way to make this anywhere near as fast as just using glUniform1i. So what exactly is the point?

Or, to put it another way, what is the compelling use case for this feature besides "I want to do it?"


You seem to be quite familiar with the ARB. Do you have a seat?

No, but I saw what they dropped with GL 3.1. And I've seen what they added. And, generally speaking, the modern ARB doesn't add APIs that can be misused easily.


That is API-design that expects that the user does not know what he is doing.

No, this API design expects that the user doesn't have magical, unspecified knowledge of what happens to be fast and what happens to be slow.

hlewin
01-30-2013, 03:26 PM
In short, there is no way to make this anywhere near as fast as just using glUniform1i. So what exactly is the point?
The point is a use case where the performance hit due to a readback would be negligible - one where ease of use and genericity have priority; for example, initializing all the state variables of a shader.
One can define uniform blocks, do a little preprocessing, and get a C structure that can be used to mirror a uniform block.
Everything works fine until you get to the opaque types. That means one cannot define one data block per shader, mirror its data, and simply upload the whole block whenever the shader/program gets bound. Same thing with the offset alignment requirements for glBindBufferRange: they make things nearly unusable. Consider the following:


uniform VertexShaderVariables {
    //...
};
uniform FragmentShaderVariables {
    //...
};

Each of those can be mirrored with c-structures. But when one then tries to


struct ProgramVariables {
    struct VertexShaderVariables vsVars;
    struct FragmentShaderVariables fsVars;
};

one cannot simply create one buffer and use glBindBufferRange to map the sub-structures to uniform blocks, just because of the alignment requirements. That sucks. One could easily live with the fact that readbacks can occur, and conclude that updating particular variables is faster than replacing whole program states. The offset alignment requirement should be hidden from the user. The implementation should split the buffer as needed, should that be necessary. The user should not have to care about such things.
Were the API to define things that way, maybe the hardware vendors would organize things in a way that reduces any performance hits. The difference between hardware and software is not that big when it comes to adapting to present needs.

hlewin
01-30-2013, 03:51 PM
Or, to put it another way, what is the compelling use case for this feature besides "I want to do it?"
I had to look this up. Of course there is none. I'm a user of OpenGL - not an implementor. But if I had to decide which API to use for my next projects, I would take a very close look at D3D as an alternative because of such things - and that despite the fact that I have a Linux background, which means preferring to write things that are easily portable. Of course that decision would be made with half-knowledge, as I guess the pitfalls and limitations would crop up during implementation. But I guess that because D3D is not an open standard like OpenGL, it does not lag behind; an open spec may only contain common ground, that is, things that all vendors see as "no problem". If that means that directly rendering to program-variable blocks is impossible, I have no problem with that. I don't like extensions from particular vendors, as the functionality is then missing on other hardware. With D3D, hardware not supporting such features disappears from the market, because software emulation makes things slow. At least I guess so.

Alfonse Reinheart
01-30-2013, 04:51 PM
Everything works fine until you get to the opaque types. That means one cannot define one data block per shader, mirror its data, and simply upload the whole block whenever the shader/program gets bound.

Then stop pretending that opaque types are uniforms like any others. Don't set them every time a shader is bound. Set them once, during initialization. Leave them set to those values.

Then you can change all the uniforms you want on a per-object basis without incident. You shouldn't need to be changing texture image units and such.

That's exactly why we have layout(binding) syntax; so that we can set these things in the shader and not have to ever set them in our code.

Your mistake is wanting to set these uniforms at all.


one cannot simply create one buffer and use bindbufferrange to map the sub-structures to uniform-blocks. Just because of the alignment-requirements.

Sure you can; you just can't do it that way. You can have each block's data in the same buffer, but you can't do it by putting them all in one struct. You have to manually put them into a buffer.

Just because you can't do it the way you want doesn't mean it can't be done.

One could write some generic code that would take an arbitrary boost::tuple of structs and create or update a buffer object based on them. C++11 makes this rather much easier with variadic templates, though Boost.Fusion makes it possible on pre-variadic compilers. It's not too difficult to do; just time-consuming to write.


The offset alignment requirement should be hidden from the user. The implementation should split the buffer as needed should that be necessary. The user should not have to care about such things.

The reason the offset alignment is exposed is to ensure maximum performance. It gives implementations the freedom to do things the fastest way possible, which requires imposing upon users that they do things a certain way. What you want makes things slower.

hlewin
01-30-2013, 04:59 PM
That is it: it would make things slower if one did not care, instead of making them impossible. That's what I want.

Alfonse Reinheart
01-30-2013, 08:11 PM
That's what a performance trap is: something that looks convenient, but is in reality slow and should never be used. Like immediate mode. Or client-side vertex arrays. Notably, both of which are gone.

The API is not there to be convenient; it's there to provide access to the hardware, with minimal overhead.

hlewin
01-31-2013, 04:24 AM
The statement "are gone" is somewhat misleading. OpenGL's learning/optimization curve is not going to be shortened any time soon. I prefer things to be quickly codable first and quick to execute later on.


it's there to provide access to the hardware, with minimal overhead.
OpenGL is not a hardware driver in my reading. Windows has a great GUI despite the fact that it is an operating system. One can criticize the fact that one cannot get the OS without the GUI, but not that the GUI is shipped with the OS, if you get my meaning.

thokra
01-31-2013, 05:44 AM
The statement "are gone" is somewhat misleading.

They are - at least for everyone doing modern OpenGL right and caring about performance. BTW, even though GL_ARB_compatibility permits using all the old nonsense, it's just the syntax that's still there. Under the hood, all that crap is emulated using current hardware facilities.


I prefer things to be quickly codeable first and quickly to execute later on.

Since when is something really elaborate quickly codeable in OpenGL? Correct OpenGL usage needs knowledge, effort and in most cases time. It doesn't matter if you save time coding when the result runs several times slower than the semantic equivalent you put more effort into.


OpenGL is not a hardware-driver in my reading.

No, OpenGL is a specification. Your OpenGL implementation, however, is part of the driver and it implements an interface to the graphics hardware - hopefully with minimal overhead, like Alfonse suggested.


Windows has a great GUI despite the fact that it is an operating system. One can criticize the fact that one cannot get the OS without the GUI, but not that the GUI is shipped with the OS, if you get my meaning.

I don't know about the others, but I don't get it.

hlewin
01-31-2013, 09:11 AM
They are - at least for everyone doing modern OpenGL right and caring about performance. BTW, even though GL_ARB_compatibility permits using all the old nonsense, it's just the syntax that's still there. Under the hood, all that crap is emulated using current hardware facilities.
Which is a good thing, as using the old crap makes learning OpenGL quite a lot easier. And as you say, the principles stay roughly the same. For my taste, the compatibility spec does not go far enough in providing a smooth transition from the beginner tutorials downloadable everywhere to a state-of-the-art application.


Since when is something really elaborate quickly codeable in OpenGL? Correct OpenGL usage needs knowledge, effort and in most cases time. It doesn't matter if you save time coding when the result runs several times slower than the semantic equivalent you put more effort into.
It matters, for example, when using declaratory elements of the language binding. See the example above. When sketching things out, I do not want to care about the alignment requirements of glBindBufferRange. That can be optimized once things have been implemented and a bottleneck shows up. I feel it's unnecessary to be forced to write hardware-friendly, optimized code in the first place. Who cares about needing 100, or even 1000, readbacks from the GPU per frame? That's something one needs to care about when writing bleeding-edge stuff - bleeding-edge for about six months, until the next GPU generation comes out. I have no problem wasting 10000 clock cycles per frame. I have a problem wasting work-hours coping with offset alignment requirements.

Alfonse Reinheart
01-31-2013, 10:00 AM
Which is a good Thing as using the old crap makes learning OpenGL quite a lot easier.

Is the dark side stronger?

No. Quicker. Easier, more seductive.

Just because something is easy doesn't make it good. I have never seen a fixed-function-based tutorial really explain how things actually work in the code, what all those parameters to various functions mean and so forth. Whereas you can't write shader-based code without knowing what you're doing.

Users learn to use gluPerspective without having the slightest clue what it means. They learn to use glTexEnv without knowing what it's doing. They memorize and regurgitate glBlendFunc parameters to achieve some effect without any idea what it is really doing. And all the while, they think they are "learning" computer graphics, when in reality, they're just copy-and-pasting bits of code that worked before into some other place.

And when they encounter a problem, because the Frankenstein's code that they've assembled from 20 different tutorials doesn't integrate well, they ask here. Without the slightest clue what's broken or how to fix it.

It may take longer to learn via shaders, and you may not be able to see glamorous results quickly. But when you learn it, you learn it. You aren't just copying bits of code around; you're understanding what you are doing.

thokra
01-31-2013, 10:00 AM
thing as using the old crap makes learning OpenGL quite a lot easier.

Nonsense. A lot of the stuff you needed to do with legacy OpenGL simply does not apply to modern OpenGL.


When scatching things I do not want to care about alignment-requirements of bind-buffer-range.

When I registered on this forum almost 3 years ago it was because I stumbled over the buffer offset alignment for uniform buffers. OK, so it's not too intuitive. However, when you're doing OpenGL there's stuff that's implementation-dependent. Knowing that, and how to deal with it, is sometimes essential. In any case, there's the spec you can read. And don't tell me you don't have to read other specs or API docs or documentation in general during your workday. If you don't want to read the spec you can ask here or in other places and people will help you. Still, nobody's going to change the spec just because some parts of it are an inconvenience to you.


I feel it's unnecessary to be forced to write hardware-friendly, optimized code in the first place.

:doh: Who forces you? YOU need to force yourself if you want fast code. By your logic, writing code that uses cache lines well is wasted effort. Or making sure data is properly aligned so memory accesses work properly. Or utilizing SIMD instructions. Or inline assembly. Etc., etc.... Oh well... It's cool to first make code correct and then fast, but disregarding platform-specific quirks is, to put it diplomatically, simply unwise.


Who cares about the Need for 100, let it be 1000 readbacks from the gpu per Frame?

Ehm, everyone who's not completely insane? Do you have any idea what that many readbacks will do to your program's performance?


That's something one Needs to care about when writing bleeding-edge stuff.

So your argument is, unless one writes a high-end renderer for use in next-gen AAA games, performance simply doesn't matter?

Since we're straying very far from your original proposal, let me finally urge you to consider the following: if you don't want to write high-performance code that's OK, and if you're happy with the result, good for you. Still, I'm pretty confident that most experienced or semi-experienced OpenGL devs like to make things fast, and they want and need an API that caters to that desire. At least that's the case for me. OpenGL is not designed to provide maximum convenience; it's supposed to provide a means to write high-performance rendering applications, and performance usually comes with a price. This includes decisions on the hardware level which may not be transparent to the application developer but are still necessary to keep performance up. If that means I have to sacrifice some convenience, then sign me up. Wishing for changes to be adopted that result in implementations being slower than their predecessors is simply unacceptable.

Bringing suggestions to improve OpenGL is always good if they're valid, but your suggestion has been dismissed by several very experienced people (myself not included) during a long discussion. It's time to let it go.

kRogue
01-31-2013, 01:18 PM
This is... almost fun to watch.

At any rate, what hlewin (http://www.opengl.org/discussion_boards/member.php/25576-hlewin) wants is already available as an NVIDIA-only extension. As stated before, that extension assumes point-blank that a 64-bit value is all the GPU needs in the shader when accessing a texture... what he fails to grasp is that other hardware may or may not operate that way.

As a side note, the NVIDIA extension offers several distinct advantages over glBindTexture jazz:

Avoid glBindTexture and pass the 64-bit address directly. This yields the same bind-avoidance savings as NVIDIA's original bindless extension.
With NVIDIA's bindless texture, the need for texture atlases utterly disappears. You no longer need to make sure you are using no more than N textures; you can use them all (subject to VRAM room!). What one uses to choose the texture can then come from anything: attributes or buffer objects {be it uniform or texture buffer objects} (the former being what he wants so badly).
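For reference, a sketch of what this looks like on the GLSL side (paraphrased from the GL_NV_bindless_texture spec; treat it as illustrative, not copy-paste ready). On the API side the 64-bit handle comes from glGetTextureHandleNV and must be made resident with glMakeTextureHandleResidentNV before the shader may sample through it:

```
#version 420
#extension GL_NV_bindless_texture : require

// With bindless textures a sampler can live inside a uniform block,
// because under this extension it is just a 64-bit handle in memory.
layout(std140) uniform Materials {
    sampler2D diffuse_tex;   // stored as a 64-bit handle
    uvec2     raw_handle;    // the same kind of handle as a plain integer pair
};

in  vec2 uv;
out vec4 color;

void main()
{
    // Sample directly through the handle stored in the block...
    color = texture(diffuse_tex, uv);
    // ...or construct a sampler from an integer handle on the fly.
    color += texture(sampler2D(raw_handle), uv);
}
```

This is exactly the "sampler in a uniform block" usage the original post asked for, but it only works because the hardware behind this extension represents a texture/sampler pair as a single 64-bit value.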


In theory one could imagine that an integer computed/determined in a shader could be used to specify which texture unit to use; but I do not really buy that either, since it forces an implementation to have a separate thing orthogonal to the fragment shader to do the sampling (which I guess is the case for NVIDIA).

I'd still like to see NVIDIA's bindless for buffer object data somehow come to core in some form, but I do not think I will; it assumes too much: that a 64-bit value is all one needs to reach the data behind a buffer object.

aqnuep
01-31-2013, 04:29 PM
At any rate, what hlewin (http://www.opengl.org/discussion_boards/member.php/25576-hlewin) wants is already available as an NVIDIA only extension.
Not exactly. Bindless texture works because it introduced opaque handles, represented by a 64-bit integer, to accomplish getting samplers from buffers. What hlewin wants is for a non-opaque API concept, the texture unit index, to be enough for the shader to create samplers out of it. Furthermore, bindless textures require one more important additional step: making the texture resident.

Also, what he wants is for the GL implementation to parse the buffer and automagically translate API values into opaque, implementation-dependent values. That's the nonsense part.

kRogue
02-01-2013, 02:22 AM
Not exactly. Bindless texture works because it introduced opaque handles, represented by a 64-bit integer, to accomplish getting samplers from buffers. What hlewin wants is for a non-opaque API concept, the texture unit index, to be enough for the shader to create samplers out of it. Furthermore, bindless textures require one more important additional step: making the texture resident.

Also, what he wants is for the GL implementation to parse the buffer and automagically translate API values into opaque, implementation-dependent values. That's the nonsense part.

The need to make it resident I already noted; what he was originally after was samplers in a buffer object, and NVIDIA bindless does give that. The rest of what he was going on about, a GL implementation needing to check the buffer object and so on, I think was him just getting painted into a corner... you can definitely emulate storing which texture unit to use in a buffer object just by having an additional array of samplers (in a separate block), indexed by texture unit, whose values track whatever texture/sampler pair is bound there.
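The emulation described above might look roughly like this in GLSL (an illustrative sketch, not from the thread; note that GLSL 4.00 requires the index into a sampler array to be dynamically uniform, which a uniform-block member satisfies):

```
#version 400

// One sampler per texture unit; the application keeps element i of
// this array pointing at whatever is bound to texture unit i.
uniform sampler2D units[16];

// The buffer-backed block stores *which unit* to sample from, which
// is the indirection the original poster wanted in a uniform buffer.
layout(std140) uniform PerMaterial {
    int diffuse_unit;
};

in  vec2 uv;
out vec4 color;

void main()
{
    // Index the sampler array with the unit number read from the block.
    color = texture(units[diffuse_unit], uv);
}
```

So the indirection is expressible today without any spec change; the buffer holds a plain integer, and the opaque sampler state stays where the API can manage it.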

But I confess, the idea of storing which texture unit (instead of which texture) to use in a buffer object sounds almost useless... as Alfonse originally stated, the vast majority of the time the texture unit to use for a sampler uniform is static for the lifetime of a GL program.