Official feedback on OpenGL 4.3 thread

August 6th, 2012 – Los Angeles, SIGGRAPH 2012 – The Khronos™ Group today announced the immediate release of the OpenGL® 4.3 specification, bringing the very latest graphics functionality to the most advanced and widely adopted cross-platform 2D and 3D graphics API (application programming interface). OpenGL 4.3 integrates developer feedback and continues the rapid evolution of this royalty-free specification while maintaining full backwards compatibility, enabling applications to incrementally use new features while portably accessing state-of-the-art graphics processing unit (GPU) functionality across diverse operating systems and platforms. The OpenGL 4.3 specification contains new features that extend functionality available to developers and enables increased application performance. The full specification is available for immediate download from the Khronos OpenGL® Registry.

Twenty years since the release of the original OpenGL 1.0, the new OpenGL 4.3 specification has been defined by the OpenGL ARB (Architecture Review Board) working group at Khronos, and includes the GLSL 4.30 update to the OpenGL Shading Language.

New functionality in the OpenGL 4.3 specification includes:

[ul]
[li]compute shaders that harness GPU parallelism for advanced computation such as image, volume, and geometry processing within the context of the graphics pipeline;[/li]
[li]shader storage buffer objects that enable vertex, tessellation, geometry, fragment and compute shaders to read and write large amounts of data and pass significant data between shader stages;[/li]
[li]texture parameter queries to discover actual supported texture parameter limits on the current platform;[/li]
[li]high quality ETC2 / EAC texture compression as a standard feature, eliminating the need for a different set of textures for each platform;[/li]
[li]debug capability to receive debugging messages during application development;[/li]
[li]texture views for interpreting textures in many different ways without duplicating the texture data itself;[/li]
[li]indirect multi-draw that enables the GPU to compute and store parameters for multiple draw commands in a buffer object and re-use those parameters with one draw command, particularly efficient for rendering many objects with low triangle counts;[/li]
[li]increased memory security that guarantees that an application cannot read or write outside its own buffers into another application’s data;[/li]
[li]a multi-application robustness extension that ensures that an application that causes a GPU reset will not affect any other running applications.[/li]

Learn more about what is new in OpenGL 4.3 at the BOF at SIGGRAPH, Wednesday, August 8th from 6-7 PM in the JW Marriott Los Angeles at LA Live, Gold Ballroom – Salon 3. Then join us to help celebrate the 20th anniversary of OpenGL on Wednesday, August 8th from 7-10 PM in the JW Marriott Los Angeles at LA Live, Gold Ballroom – Salons 1, 2 & 3.

Complete details on the BOF are here:

Complete details on the Party are here:

A big thank-you to all the contributors to the GL 4.3 specification. The reorganization of the specification (which is almost a full text rewrite) is wonderful. Comments on the added beans to follow from me.

Wish I could have made it to SIGGRAPH this year :(.

OpenGL 4.3 review available: http://www.g-truc.net/doc/OpenGL4.3review.pdf :)

The compute shaders with the shader storage buffer objects are very interesting. (Computing data and feeding it into the graphics pipeline, and the other direction, and doing both in an advanced application, could achieve very interesting combinations.)
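For anyone curious what that pattern looks like in practice, here is a minimal sketch (assuming a 4.3 context; the binding index, block name, and sizes are made up for illustration):

[code]
/* Sketch only: a compute shader fills an SSBO that the graphics
 * pipeline can consume afterwards. Binding 0 and the block name
 * "Positions" are hypothetical. */
const char *cs_src =
    "#version 430\n"
    "layout(local_size_x = 64) in;\n"
    "layout(std430, binding = 0) buffer Positions { vec4 position[]; };\n"
    "void main() {\n"
    "    uint i = gl_GlobalInvocationID.x;\n"
    "    position[i] = vec4(float(i), 0.0, 0.0, 1.0);\n"
    "}\n";

GLuint cs = glCreateShader(GL_COMPUTE_SHADER);
glShaderSource(cs, 1, &cs_src, NULL);
glCompileShader(cs);
GLuint prog = glCreateProgram();
glAttachShader(prog, cs);
glLinkProgram(prog);

GLuint ssbo;
glGenBuffers(1, &ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER, 1024 * 4 * sizeof(GLfloat),
             NULL, GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);

glUseProgram(prog);
glDispatchCompute(1024 / 64, 1, 1);
/* Make the compute writes visible to whatever reads the buffer next. */
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
[/code]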

High quality ETC2 / EAC texture compression as a standard feature: hopefully this solves texture compression for the foreseeable future, and it stays core in future OpenGL versions.

Multi-application robustness: important, I can’t believe OpenGL only has this now. Hope to see this in OpenGL ES and WebGL too!

Feature requests:

Direct State Access functions: less error-prone, and the same work done with less code.

Bindless textures: less error-prone, and the abstraction allows things not possible before, while the driver can still be optimized for GPU-specific architectural details.
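For what it’s worth, existing extensions already show the shape of both requests; a sketch assuming EXT_direct_state_access and NV_bindless_texture are available, with a hypothetical texture name 'tex':

[code]
/* Direct State Access (EXT_direct_state_access): edit an object
 * without binding it first... */
glTextureParameteriEXT(tex, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                       GL_LINEAR_MIPMAP_LINEAR);

/* ...versus bind-to-edit, where the call silently targets whatever
 * is currently bound: */
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                GL_LINEAR_MIPMAP_LINEAR);

/* Bindless (NV_bindless_texture): a 64-bit handle the shader can
 * use directly, with no texture unit involved. */
GLuint64 handle = glGetTextureHandleNV(tex);
glMakeTextureHandleResidentNV(handle);
[/code]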

If I debug an application, I want to be able to see whether API calls are deprecated, either in the OpenGL version currently used by the context or in other versions (let me specify APIs and version ranges, compatibility ranges, and profiles). If I use a 3.3 context, also show me deprecations for later versions. First list the currently used version, then the other ones (from earliest deprecation to most recent). This way a developer can work through the list from the earlier items, which should be tackled first, to the later ones. Make sure this is not tied to the context actually used by the OpenGL application, just like the debug output.
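(The debug-output half of this already exists in 4.3; a minimal sketch of hooking it up, assuming a debug context. Note that GL_DEBUG_TYPE_DEPRECATED_BEHAVIOR only flags deprecated usage for the current context’s version, not the cross-version report requested above.)

[code]
void APIENTRY on_debug(GLenum source, GLenum type, GLuint id,
                       GLenum severity, GLsizei length,
                       const GLchar *message, const void *userParam)
{
    fprintf(stderr, "GL debug [type 0x%x]: %s\n", type, message);
}

glEnable(GL_DEBUG_OUTPUT);
glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS); /* deliver messages in order */
glDebugMessageCallback(on_debug, NULL);
[/code]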

The Unofficial OpenGL 4.3 Feature Awards!

I hereby hand out the following awards:

We Did What We Said We Were Gonna Award

The OpenGL Graphics System: A Specification (Version 4.3)

Literally as I was downloading the spec, I said out loud something to the effect of, “There’s no way the ARB actually did the rewrite they promised.” I was wrong.

It’s a much better spec. There are still a couple of issues, like why VAOs are still presented after all of the functions that set state into them. But outside of a few other anomalies like that, it is much more readable. Props for giving each function a dedicated error section, set off on a different background, to make clear what the possible errors are.

Most Comprehensive Extension Award

ARB_internalformat_query2

This is something OpenGL has needed for ages, and I generally don’t like giving props for people finally doing what they were supposed to have done long ago. But this extension provides pretty much every query you could possibly imagine. It’s even a little too comprehensive, as you can query aspects of formats that aren’t implementation-dependent (color-renderable, for example).
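For example (a sketch, assuming a 4.3 context), you can ask about GL_RGBA8 as a 2D texture format, including queries whose answers the spec already fixes:

[code]
GLint supported, preferred, renderable;
/* Implementation-dependent, genuinely useful: */
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8,
                      GL_INTERNALFORMAT_SUPPORTED, 1, &supported);
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8,
                      GL_INTERNALFORMAT_PREFERRED, 1, &preferred);
/* Fixed by the spec for RGBA8, yet still queryable: */
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8,
                      GL_COLOR_RENDERABLE, 1, &renderable);
[/code]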

One Little Mistake Award

ARB_vertex_attrib_binding

This was good functionality, until I saw that the stride is part of the buffer binding, not the vertex format. Indeed, that’s the reason it can’t use glBindBufferRange: it needs a stride. Why is the stride there and not part of the format? Even NVIDIA’s bindless attribute stuff puts the stride in the format.

This sounds like some horrible limitation ported over from D3D’s stream nonsense that OpenGL simply doesn’t need.
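To make the gripe concrete, here is the new split (a sketch assuming a 4.3 context and a hypothetical interleaved position+texcoord buffer 'vbo'); note where the stride ends up:

[code]
/* Format state: per-attribute layout, no buffer involved. */
glVertexAttribFormat(0, 3, GL_FLOAT, GL_FALSE, 0);  /* position */
glVertexAttribFormat(1, 2, GL_FLOAT, GL_FALSE, 12); /* texcoord */
glVertexAttribBinding(0, 0); /* both attributes read binding 0 */
glVertexAttribBinding(1, 0);

/* Binding state: buffer, offset... and the stride. */
glBindVertexBuffer(0, vbo, 0, 20);
[/code]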

We Declare A Do-Over Award

ARB_compute_shader

The existence of ARB_compute_shader is a tacit admission that OpenCL/OpenGL interop is a failure. After all, the same hardware that runs compute shaders will run OpenCL. So why wouldn’t you use OpenCL to do GPGPU work and talk to OpenGL through the interop layer?

3D Labs Is Finally Dead Award

ARB_explicit_uniform_location

The last remnants of the Old Republic have been swept away forever. GLSL, as 3D Labs envisioned it, is now dead and buried in a shallow grave. OpenGL has finally accepted that uniform locations won’t be byte offsets and will have to be mapped into a table to reconstruct those byte offsets.

So it’s high time the ARB cut their losses with the last of 3D Labs’s horrible ideas and provided functionality in a way that coincides better with reality.
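The upshot, as a sketch (location 3 is an arbitrary choice):

[code]
/* GLSL side:
 *     layout(location = 3) uniform float time;
 * C side, with the program bound; no glGetUniformLocation round-trip,
 * and no pretence that locations are byte offsets: */
glUniform1f(3, t);
[/code]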

We Need More Ways To Do Things Award:

ARB_shader_storage_buffer_object

You know what OpenGL doesn’t have enough of? Ways to access buffer objects from shaders. I mean, buffer textures, UBOs, and image buffer textures just weren’t enough, right? We totally needed a fourth way to access buffer objects from shaders.

Yeah, I get what they were saying in Issue #1 (where they discussed why they didn’t re-purpose UBOs for this). But it doesn’t change the fact that there are now 4 separate ways to talk to buffer objects from shaders, and each one has completely different performance characteristics.

Let’s Rewrite Our API And Still Leave The Original Award:

ARB_program_interface_query

I get the idea, I really do. It provides a much more uniform and extensible way to query information from shaders. That’s a good thing, and I’m not disputing that. But we already have APIs to do that; indeed, pretty much the only thing missing from those APIs was querying fragment shader outputs.

Now the API is very cluttered. We have the old way of doing things, plus this entirely new way.
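For the curious, the genuinely new bit (enumerating fragment shader outputs) looks something like this sketch, assuming a linked program 'prog':

[code]
GLint num_outputs;
glGetProgramInterfaceiv(prog, GL_PROGRAM_OUTPUT,
                        GL_ACTIVE_RESOURCES, &num_outputs);
for (GLint i = 0; i < num_outputs; ++i) {
    char name[256];
    glGetProgramResourceName(prog, GL_PROGRAM_OUTPUT, i,
                             sizeof(name), NULL, name);
    GLint loc = glGetProgramResourceLocation(prog, GL_PROGRAM_OUTPUT, name);
    printf("output %s -> location %d\n", name, loc);
}
[/code]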

You Were Right Award

Mhagain’s suggestion for texture views, which was implemented. I_belev’s initial idea was close, but much narrower (changing the format of an existing texture, rather than creating a new texture that references the old data) than the actual texture view functionality provided by 4.3.
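A sketch of the winning design, assuming 'tex' was allocated with immutable storage (glTexStorage2D) as GL_RGBA8:

[code]
/* Reinterpret the same texels as sRGB without copying anything. */
GLuint view;
glGenTextures(1, &view);
glTextureView(view, GL_TEXTURE_2D, tex, GL_SRGB8_ALPHA8,
              0, 1,  /* mip level 0 only   */
              0, 1); /* array layer 0 only */
[/code]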

Overall, I’m getting a feeling of deja vu: once again, OpenGL has many ways to do things, and little guidance as to how to do it. We’ve got multiple ways to set up vertex formats and buffers, multiple ways to read data from buffers in shaders, etc.

Of course, we won’t see another API cleanup and function removal round, since the last one went so well.

[QUOTE=Alfonse Reinheart;1241120]One Little Mistake Award

ARB_vertex_attrib_binding

This was good functionality, until I saw that the stride is part of the buffer binding, not the vertex format. Indeed, that’s the reason it can’t use glBindBufferRange: it needs a stride. Why is the stride there and not part of the format? Even NVIDIA’s bindless attribute stuff puts the stride in the format.

This sounds like some horrible limitation ported over from D3D’s stream nonsense that OpenGL simply doesn’t need.[/QUOTE]
I couldn’t agree more; the stride is just strange there, as it’s more a format specifier than a resource specifier. Also, the additional indirection between vertex attribute indices and vertex buffer binding indices sounds like way too much abstraction. But to be honest, personally, I would drop all the vertex array stuff at once. Using shader storage buffers is now probably the best way to access vertex data, if you can live with the limitation that your shaders have to know your vertex data formats explicitly (actually, performance-wise it’s probably preferable to be explicit).

[QUOTE=Alfonse Reinheart;1241120]We Declare A Do-Over Award

ARB_compute_shader

The existence of ARB_compute_shader is a tacit admission that OpenCL/OpenGL interop is a failure. After all, the same hardware that runs compute shaders will run OpenCL. So why wouldn’t you use OpenCL to do GPGPU work and talk to OpenGL through the interop layer?[/QUOTE]
Sad, but probably true. I wouldn’t have expected it either, but if that’s what it takes to have efficient compute-graphics interop, then so be it.

[QUOTE]But to be honest, personally, I would drop all the vertex array stuff at once. Using shader storage buffers is now probably the best way to access vertex data, if you can live with the limitation that your shaders have to know your vertex data formats explicitly (actually, performance-wise it’s probably preferable to be explicit).[/QUOTE]

… how? Attribute format conversion is free; doing it in the shader would be not free. How is “not free” faster than “free”?

Furthermore, attributes are fast, with dedicated caches and hardware designed to make their particular access pattern fast. Shader storage buffers are decidedly not. Specific hardware still beats general-purpose code.

Hey, but we have gained some consistency. Now both the inputs (attributes) and the outputs (frag outs) of the pipeline have the indirection ;)

Combing the forums, one can see lots of stuff that was stated in the “Suggestion for next release of GL” thread found their way into GL 4.3:
[ul]
[li]Texture views (though I strongly suspect that it was in the draft specs before it appeared as a suggestion)[/li]
[li]the decoupling of vertex attribute source and format[/li]
[li]explicit uniform location[/li]
[li]the ability to read stencil values from a depth-stencil texture[/li]
[li]the ability to query the ins and outs of shaders and programs[/li]
[li]arbitrarily formatted, structured writes to buffer objects from shaders (I freely confess that what I was begging for was NVIDIA’s GL_NV_shader_buffer_load/store, but what is delivered in GL 4.3 is still great)[/li]

Really happy about the spec rewrite/reorg.

Do you think that everything that looks “free” from the API point of view is actually free? Well, I think I have to shatter your illusions…

Hmmm…

Section 7.2 Shader Binaries:

shaders contains a list of count shader object handles. Each handle refers to a unique shader type,

Why do the shader types need to be unique? Is this a typo, and should it read that the handles are unique? In contrast, AttachShader in section 7.3 says:

Multiple shader objects of the same type may be attached to a single program object, and a single shader object may be attached to more than one program object.

Section 10.3.1 (Specifying Arrays for Generic Vertex Attributes) and 10.4 (Vertex Array Objects):
Table 10.2 refers to glVertexAttribPointer calls rather than glVertexAttribFormat calls.
I found that reading the extension GL_ARB_vertex_attrib_binding made it easier to grok the index indirection that VertexAttribBinding specifies. The interaction of VAOs with VertexAttribBinding also takes a moment or so to grok correctly. It is in the table, but perhaps a little more text to hold a reader’s hand would help.

“Clear Texture”?
There is ClearBufferSubData; why is there no analogue for textures? Or is it in the specification and I missed it?

Easier if…
Section 7.6.2.2 (Standard Uniform Block Layout) 1st paragraph, last sentence:

“Applications may query the offsets assigned to uniforms inside uniform blocks with query functions provided by the GL”

It would be a touch more merciful if a reference to the query function and its parameters were given, indicating how to query a GLSL program for the layout of a uniform block… this is nitpicking, but hey, I missed SIGGRAPH this year, so I am grouchy.
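For reference, the query it alludes to is the GL 3.1-era UBO introspection; a sketch, with a made-up block member name and a linked program 'prog':

[code]
const GLchar *names[] = { "LightBlock.color" }; /* hypothetical member */
GLuint index;
GLint offset;
glGetUniformIndices(prog, 1, names, &index);
glGetActiveUniformsiv(prog, 1, &index, GL_UNIFORM_OFFSET, &offset);
[/code]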

[QUOTE=Alfonse Reinheart;1241123]… how? Attribute format conversion is free; doing it in the shader would be not free. How is “not free” faster than “free”?

Furthermore, attributes are fast, with dedicated caches and hardware designed to make their particular access pattern fast. Shader storage buffers are decidedly not. Specific hardware still beats general-purpose code.[/QUOTE]

First, the stride parameter needs to be a binding parameter. It allows reusing the same vertex format with data that is dispatched across a varying number of buffers. Typically we can expect cases where a vertex format is used with a perfectly packed buffer that interleaves all the data. We can also imagine the same vertex format being reused, without any switching, for data stored in two buffers: one static and one dynamically updated.

Second, attribute format conversion has the same cost when done in the shader, because it is effectively already done in the shader behind our back. GPUs no longer use dedicated hardware for that, since it takes die space that can’t be reused for something else. That will be more and more the case.

[QUOTE=Groovounet;1241136]First, the stride parameter needs to be a binding parameter. It allows reusing the same vertex format with data that is dispatched across a varying number of buffers. Typically we can expect cases where a vertex format is used with a perfectly packed buffer that interleaves all the data. We can also imagine the same vertex format being reused, without any switching, for data stored in two buffers: one static and one dynamically updated.

Second, attribute format conversion has the same cost when done in the shader, because it is effectively already done in the shader behind our back. GPUs no longer use dedicated hardware for that, since it takes die space that can’t be reused for something else. That will be more and more the case.[/QUOTE]
Exactly. Shaders do perform the fetching and attribute format conversion internally, so they have to know the stride in order to generate the appropriate fetch and conversion code. Thus, even though we now have separate format and binding state, since the stride is coupled with the binding there can still be internal recompiles of the vertex fetching code even when one only changes the binding (as we had with the old API), which defeats the purpose.

But, of course, this all depends on hardware and driver implementation.

Could the stride parameter be a variable whose content would be fetched from a register file?

Maybe; it all depends on the hardware and driver implementation, though I suppose not all existing hardware can do it that way (but maybe I’m wrong). However, that means an additional indirection, which might not be good performance-wise for some applications. I still believe that for new applications programmable vertex fetching is the way to go, especially with something like shader storage buffers in place.

However, shader storage buffers have their problems too:

  1. They have to be writable, so they cannot be supported on GL3 hardware
  2. GL 4.3 only requires a maximum of 16MB for storage buffers, which may be too small for some use cases (of course, implementations are free to allow larger buffers, as the query below shows, but it still just sounds too small to me)
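Checking what an implementation actually allows is at least cheap; a one-liner sketch:

[code]
/* 16MB (2^24) is only the required minimum for
 * GL_MAX_SHADER_STORAGE_BLOCK_SIZE; implementations may report more. */
GLint64 max_block_size;
glGetInteger64v(GL_MAX_SHADER_STORAGE_BLOCK_SIZE, &max_block_size);
[/code]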

[QUOTE=Groovounet;1241136]First, the stride parameter needs to be a binding parameter. It allows reusing the same vertex format with data that is dispatched across a varying number of buffers. Typically we can expect cases where a vertex format is used with a perfectly packed buffer that interleaves all the data. We can also imagine the same vertex format being reused, without any switching, for data stored in two buffers: one static and one dynamically updated.[/QUOTE]

What good is that? The assumption with this is that the non-stride part of the vertex format is the performance-limiting issue, rather than the buffer binding itself. Bindless suggests quite the opposite: that changing the vertex format is cheap, but binding buffer objects is expensive. At least for NVIDIA hardware.

[QUOTE=Groovounet;1241136]Second, attribute format conversion has the same cost when done in the shader, because it is effectively already done in the shader behind our back. GPUs no longer use dedicated hardware for that, since it takes die space that can’t be reused for something else. That will be more and more the case.[/QUOTE]

Do you have evidence of this? And for what hardware is this true?

What makes you think that? Please give a reference to where you read that, because I would also be interested.

From bindless, I feel that NVIDIA thinks two things to be expensive:

  1. Vertex format change
  2. Mapping buffer object names to GPU memory addresses

Maybe I’m wrong, so please disprove my assumptions.

Thanks!
Definitely agree that this was long overdue. As for the ‘too comprehensive’ aspect: my hope is that this will become widely supported across GL and GL ES versions. In that usage, the properties that aren’t implementation-dependent for one specific version of GL do have value.

So when can we expect a fully programmable vertex pulling (with the right performance) extension from AMD? :D
I think it’s a little bold to assume that about the significant amount of hardware out there.

I know of at least some recent hardware that can’t do that (without a silly amount of shader recompilation before batch submission).

In D3D at least, when using IASetVertexBuffers (slot, num, buffers, strides, offsets), changing the strides and/or offsets alone is cheaper than changing everything. Since the buffer part of the specification is not changing the driver can make use of this knowledge and optimize behind the scenes.

With GL this is now explicit in the new state that has been introduced. VERTEX_BINDING_STRIDE and VERTEX_BINDING_OFFSET are separate states, so they are the only states that need to be changed on a BindVertexBuffer call where everything else remains equal.

Where this functionality is useful is when you may have multiple models packed into a single VBO, or multiple frames for the same model in a single VBO, or for LOD schemes. You can jump to a different model/frame/LOD with a single BindVertexBuffer call, rather than having to specify individual VAOs for each model/frame/LOD, or respecify the full set of VertexAttribPointer calls for each model/frame/LOD (worth noting that stride is not really a big deal here and is unlikely to change in real-world programs; offset is the important one).
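As a sketch of that last point (names are hypothetical), jumping to another LOD packed in the same VBO is one call, touching only binding state:

[code]
/* Only the offset (and possibly stride) changes; the vertex format
 * and attribute-to-binding mapping stay untouched. */
glBindVertexBuffer(0, packed_vbo, lod_offset[current_lod], vertex_stride);
[/code]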

The decoupling of buffer from layout introduced here is useful functionality on its own. Your VertexAttribArray (VertexAttribFormat) calls are no longer dependent on what the previous BindBuffer call was, which introduces extra flexibility and reduces the potential for error. Getting rid of such inter-dependencies is also some nice cleanup of the API, and it’s a good thing that GL has finally got this; the only seemingly bad part (and I haven’t fully reviewed the spec, so I may have missed something) is that BufferData/BufferSubData/MapBuffer/MapBufferRange haven’t been updated to take advantage of the new binding points.

I recommend sitting down and writing some code for non-trivial cases using this API; you should soon see how superior this method is to the old.