
View Full Version : Official feedback on OpenGL 4.4 thread



Khronos_webmaster
07-22-2013, 06:00 AM
July 22nd 2013 – SIGGRAPH - Anaheim, CA – The Khronos™ Group today announced the immediate release of the OpenGL® 4.4 specification, bringing the very latest graphics functionality to the most advanced and widely adopted cross-platform 2D and 3D graphics API (application programming interface). OpenGL 4.4 unlocks capabilities of today’s leading-edge graphics hardware while maintaining full backwards compatibility, enabling applications to incrementally use new features while portably accessing state-of-the-art graphics processing units (GPUs) across diverse operating systems and platforms. Also, OpenGL 4.4 defines new functionality to streamline the porting of applications and titles from other platforms and APIs. The full specification and reference materials are available for immediate download at http://www.opengl.org/registry.

In addition to the OpenGL 4.4 specification, the OpenGL ARB (Architecture Review Board) Working Group at Khronos has created the first set of formal OpenGL conformance tests since OpenGL 2.0. Khronos will offer certification of drivers from version 3.3, and full certification is mandatory for OpenGL 4.4 and onwards. This will help reduce differences between multiple vendors’ OpenGL drivers, resulting in enhanced portability for developers.

New functionality in the OpenGL 4.4 specification includes:

Buffer Placement Control (GL_ARB_buffer_storage)
Significantly enhances memory flexibility and efficiency through explicit control over the position of buffers in the graphics and system memory, together with cache behavior control - including the ability of the CPU to map a buffer for direct use by a GPU.
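As a rough illustration (not from the specification text), creating immutable storage with explicit intent flags might look like this; the helper function and buffer names are hypothetical, and a current GL 4.4 context with loaded entry points is assumed:

    void create_buffers(GLuint out[2], const void *vtx, GLsizeiptr vtx_size,
                        GLsizeiptr stream_size)
    {
        glGenBuffers(2, out);

        /* Static geometry: contents fixed at creation. No map bits and no
         * GL_DYNAMIC_STORAGE_BIT, so a later glBufferSubData is an error --
         * the usage "contract" is enforced, unlike the old usage hints. */
        glBindBuffer(GL_ARRAY_BUFFER, out[0]);
        glBufferStorage(GL_ARRAY_BUFFER, vtx_size, vtx, 0);

        /* Per-frame streaming data: writable mapping is allowed, even while
         * the GPU is reading from the buffer (persistent + coherent). */
        glBindBuffer(GL_ARRAY_BUFFER, out[1]);
        glBufferStorage(GL_ARRAY_BUFFER, stream_size, NULL,
                        GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                        GL_MAP_COHERENT_BIT);
    }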

Efficient Asynchronous Queries (GL_ARB_query_buffer_object)
Buffer objects can be the direct target of a query to avoid the CPU waiting for the result and stalling the graphics pipeline. This provides significantly boosted performance for applications that intend to subsequently use the results of queries on the GPU, such as dynamic quality reduction strategies based on performance metrics.
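A minimal sketch of the idea (query and buffer names are illustrative): with a buffer bound to GL_QUERY_BUFFER, the result pointer of glGetQueryObjectuiv is treated as a byte offset into that buffer, so the result never has to travel back to the CPU.

    GLuint query, result_buf;
    glGenQueries(1, &query);
    glGenBuffers(1, &result_buf);
    glBindBuffer(GL_QUERY_BUFFER, result_buf);
    glBufferData(GL_QUERY_BUFFER, sizeof(GLuint), NULL, GL_DYNAMIC_COPY);

    glBeginQuery(GL_SAMPLES_PASSED, query);
    /* ... draw occluders ... */
    glEndQuery(GL_SAMPLES_PASSED);

    /* Writes the result at offset 0 of the bound query buffer; with
     * GL_QUERY_RESULT_NO_WAIT the buffer is left untouched if the result
     * is not ready yet, so the CPU never stalls waiting for it. */
    glGetQueryObjectuiv(query, GL_QUERY_RESULT_NO_WAIT, (GLuint *)0);
    /* A shader can then read result_buf (e.g. bound as a uniform or storage
     * buffer) to drive quality-reduction decisions entirely on the GPU. */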

Shader Variable Layout (GL_ARB_enhanced_layouts)
Detailed control over placement of shader interface variables, including the ability to pack vectors efficiently with scalar types. Includes full control over variable layout inside uniform blocks and enables shaders to specify transform feedback variables and buffer layout.
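An illustrative (hypothetical) vertex shader, stored here as a C string, showing a few of the new qualifiers: packing a scalar into the spare component of a vec3 attribute, explicit offsets inside a uniform block, and transform feedback declared in the shader itself:

    static const char *vs_src =
        "#version 440 core\n"
        "layout(location = 0)                in vec3  position;   // uses x,y,z\n"
        "layout(location = 0, component = 3) in float pointSize;  // packed into w\n"
        "layout(std140, binding = 0) uniform Params {\n"
        "    layout(offset = 0)  vec4  tint;\n"
        "    layout(offset = 16) float scale;\n"
        "};\n"
        "layout(xfb_buffer = 0, xfb_stride = 16, xfb_offset = 0) out vec4 capturedPos;\n"
        "void main() {\n"
        "    capturedPos = vec4(position * scale, pointSize);\n"
        "    gl_Position = capturedPos + tint;\n"
        "}\n";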

Efficient Multiple Object Binding (GL_ARB_multi_bind)
New commands which enable an application to bind or unbind sets of objects with one API call instead of separate commands for each bind operation, amortizing the function call, name space lookup, and potential locking overhead. The core rendering loop of many graphics applications frequently binds different sets of textures, samplers, images, vertex buffers, and uniform buffers, so this can significantly reduce CPU overhead and improve performance.
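For example (a sketch; object counts and names are illustrative), the per-object binding loop collapses into one call per object type:

    /* One call per object type replaces a loop of glActiveTexture/glBindTexture,
     * glBindSampler and glBindBufferBase calls. */
    void bind_frame_resources(const GLuint textures[4], const GLuint samplers[4],
                              const GLuint ubos[3])
    {
        glBindTextures(0, 4, textures);                   /* texture units 0..3 */
        glBindSamplers(0, 4, samplers);                   /* texture units 0..3 */
        glBindBuffersBase(GL_UNIFORM_BUFFER, 0, 3, ubos); /* UBO indices  0..2  */
    }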

Streamlined Porting of Direct3D applications
A number of core functions contribute to easier porting of applications and games written in Direct3D, including GL_ARB_buffer_storage for buffer placement control; GL_ARB_vertex_type_10f_11f_11f_rev, which adds a vertex data type that packs three components into a 32-bit value, providing a performance improvement for lower-precision vertices in a format also used by Direct3D; and GL_ARB_texture_mirror_clamp_to_edge, which provides a texture clamping mode also used by Direct3D.

Extensions released alongside the OpenGL 4.4 specification include:

Bindless Texture Extension (GL_ARB_bindless_texture)
Shaders can now access an effectively unlimited number of texture and image resources directly by virtual addresses. This bindless texture approach avoids the application overhead due to explicitly binding a small window of accessible textures. Ray tracing and global illumination algorithms are faster and simpler with unfettered access to a virtual world's entire texture set.
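A sketch of the typical flow using the ARB_bindless_texture entry points (the texture, program, and uniform location are hypothetical):

    void use_bindless_texture(GLuint tex, GLuint program, GLint sampler_loc)
    {
        /* Get a 64-bit handle for the texture and make it GPU-resident. */
        GLuint64 handle = glGetTextureHandleARB(tex);
        glMakeTextureHandleResidentARB(handle);

        /* Hand the handle to a shader; in GLSL it can be declared directly as
         * a sampler2D uniform or stored in a buffer among thousands of others. */
        glProgramUniformHandleui64ARB(program, sampler_loc, handle);

        /* ... draw ... */

        glMakeTextureHandleNonResidentARB(handle);  /* when no longer needed */
    }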

Sparse Texture Extension (GL_ARB_sparse_texture)
Enables handling of huge textures that are much larger than the GPU's physical memory by allowing an application to select which regions of the texture are resident, for ‘mega-texture’ algorithms and very large data-set visualizations.
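A minimal sketch of the commitment API (texture size, format, and region are illustrative; real code should query the virtual page size with glGetInternalformativ and align regions to it):

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SPARSE_ARB, GL_TRUE);

    /* Reserve a large virtual allocation; no physical pages are committed yet. */
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, 16384, 16384);

    /* Commit physical memory only for the region currently in use ... */
    glTexPageCommitmentARB(GL_TEXTURE_2D, 0, 0, 0, 0, 1024, 1024, 1, GL_TRUE);
    /* ... upload into it with glTexSubImage2D, render, and decommit later. */
    glTexPageCommitmentARB(GL_TEXTURE_2D, 0, 0, 0, 0, 1024, 1024, 1, GL_FALSE);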

OpenGL BOF at SIGGRAPH, Anaheim, CA July 24th 2013
There is an OpenGL BOF (http://www.khronos.org/news/events/siggraph-anaheim-2013#opengl_bof) “Birds of a Feather” Meeting on Wednesday July 24th at 7-8PM at the Hilton Anaheim, California Ballroom A & B, where attendees are invited to meet OpenGL implementers and developers and learn more about the new OpenGL 4.4 specification.

mhagain
07-22-2013, 06:10 AM
full certification is mandatory for OpenGL 4.4 and onwards

This on its own is cause for joy. Any chance of mandatory full certification being brought back to earlier versions as time goes by and drivers mature?

Godlike
07-22-2013, 06:33 AM
The ARB extensions (bindless texture & sparse texture) sound way more interesting/useful than the core ones. Also, having updated specs and new extensions backed by the ARB every year is really great for GL developers.

nigels
07-22-2013, 08:33 AM
GLEW 1.10.0 is now available, including GL 4.4 support.
http://glew.sourceforge.net/

- Nigel

thokra
07-22-2013, 09:29 AM
and full certification is mandatory for OpenGL 4.4 and onwards

This is so awesome. We can only hope this won't slow down spec adoption even further.

The other features sound cool as well, but we'll see how it works out in practice. GL_ARB_buffer_storage, GL_ARB_query_buffer_object, GL_ARB_multi_bind ... very interesting.

mhagain
07-22-2013, 10:10 AM
Issue #9 for GL_ARB_buffer_storage (http://www.opengl.org/registry/specs/ARB/buffer_storage.txt) makes for fairly grim reading, unfortunately... :(

It's a pity as this could have been the kick up the jacksie that GL's buffer object API really needed, and the issue in question should really have been resolved by just saying "this is client memory, full stop, using incompatible flags generates an error, here are the flags that are incompatible and the vendors will have to just live with it", but it seems another case of shooting too high and missing the basic requirement as a result.

aqnuep
07-22-2013, 10:27 AM
Issue #9 for GL_ARB_buffer_storage (http://www.opengl.org/registry/specs/ARB/buffer_storage.txt) makes for fairly grim reading, unfortunately... :(


Have to agree...

mdriftmeyer
07-22-2013, 10:29 AM
This is so awesome. We can only hope this won't slow down spec adoption even further.

The other features sound cool as well, but we'll see how it works out in practice. GL_ARB_buffer_storage, GL_ARB_query_buffer_object, GL_ARB_multi_bind ... very interesting.

How do you figure? The spec is mulled over by members from all the GPU vendors; they are the ones who signed off on it. This strikes me as an official commitment by the vendors to make OpenGL a solid, fully supported spec.

kRogue
07-22-2013, 10:39 AM
My 2 cents on Issue #9 of GL_ARB_buffer_storage: the ultimate cause is that there are so, so many places buffer object data may reside. Indeed, there is the traditional dedicated video card where the client-server thing makes sense. But there are lots of other situations in UMA land: memory unified but not cached by the CPU, cached by the CPU, a shared cache between CPU and GPU [whatever that exactly means], whether the GPU can page memory ... the list goes on and on.

At the end of the day, I think the new console folks are laughing at the whole thing, because in that environment how the memory is handled can be precisely specified by the developer. Oh well. Life goes on.

thokra
07-22-2013, 10:40 AM
Issue #9 for GL_ARB_buffer_storage (http://www.opengl.org/registry/specs/ARB/buffer_storage.txt) makes for fairly grim reading, unfortunately... :(

True, but one will have to see how it plays out in practice. It should still work out pretty nicely with non-UMA setups.


This strikes me as an official commitment by the vendors to make OpenGL a solid and fully commited spec.

AFAIK, AMD sadly didn't even have a fully compliant GL 4.3 driver out by the time GL 4.4 was released ... that's how I figure. Let's not even speak of Intel. I'm not bashing them, it's just an observation. Also, we have no idea if the conformance tests are only specified or already fully implemented and whatnot. Talking the talk isn't walking the walk ...

Nowhere-01
07-22-2013, 11:35 AM
AFAIK, AMD sadly didn't even have a fully compliant GL 4.3 driver out by the time GL 4.4 was released ... that's how I figure. Let's not even speak of Intel. I'm not bashing them, it's just an observation. Also, we have no idea if the conformance tests are only specified or already fully implemented and whatnot. Talking the talk isn't walking the walk ...

AMD is effin' weird: they own 33% of the GPU market, yet their driver development division and support feel like some tiny indie company of 5-10 enthusiasts that just started its life.

Anyway, subscribing to this epic thread. Glad to see OpenGL evolving. I hope those extensions are gonna be available from every major GPU vendor within a reasonable time so they're of any use for mainstream development.

Alfonse Reinheart
07-22-2013, 11:38 AM
The Second Annual Unofficial OpenGL Feature Awards!

I hereby hand out the following awards:

We (Finally) Did What We Said We Were Gonna Award

The conformance test suite.

I'll just quote Tychus Findlay (http://en.wikipedia.org/wiki/StarCraft_II:_Wings_of_Liberty), "Hell, it's about time!"

One Little Mistake Award

ARB_multi_bind

This was good functionality, until I saw glBindImageTextures. I can't really think of a more useless way to specify that. It applies the "defaults", based on the texture target. Which means that an array texture will always be bound layered.

OK, to be fair, you can use texture_views to effectively create single textures whose defaults match what you would pass to glBindImageTexture. And then you just bind them all in one go.
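Something like this (a sketch; the array texture, its format, and the image unit are hypothetical):

    GLuint bind_single_layer_image(GLuint arrayTex, GLuint layer, GLuint unit)
    {
        GLuint view;
        glGenTextures(1, &view);
        /* View one layer of the 2D-array texture as a plain 2D texture
         * (1 mip level, 1 layer), so its "defaults" are non-layered.
         * GL_RGBA8 stands in for whatever format arrayTex actually uses. */
        glTextureView(view, GL_TEXTURE_2D, arrayTex, GL_RGBA8, 0, 1, layer, 1);

        /* Bind it (alone here, but any number works) starting at image unit 'unit'. */
        glBindImageTextures(unit, 1, &view);
        return view;
    }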

3D Labs Is Really, Really Dead Award

ARB_enhanced_layouts

So, not only can we specify uniform locations in the shader, we can even specify packing behavior. To the point that we can steal components from other vectors and make them look just like other variables.

That one must be a nightmare to implement. I hope the ARB has a really comprehensive conformance test for it...

Oh, and this also wins Most Comprehensive Extension Award. It lets us steal components from other interface elements, specify the uniform/storage block layout, define locations for interface block members, and define transform feedback parameters directly in the shader.

Is OpenGL Still Open Award?

ARB_bindless_texture

So. NVIDIA comes out with NV_bindless_texture. And unlike bindless attributes and pointers in GLSL, they actually patent (http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1=20110242117.PGNR.) this (http://assignments.uspto.gov/assignments/q?db=pat&pub=20110242117).

And now it's an ARB extension. It's not core... but it's not a proprietary extension. Yet anyone who implements it will almost certainly be stepping on US20110242117, and therefore must pay whatever NVIDIA says they have to pay. Unless NVIDIA has some agreement with the ARB, granting anyone a license to implement ARB_bindless_texture without paying a fee.

The really disconcerting part is that the patent encumbrance issue... isn't mentioned in the spec. Other extensions like EXT_texture_compression_s3tc mention their patent issues. But not this one.

Last Kid Picked for Baseball Award

EXT_direct_state_access

When bindless texturing gets the nod from the ARB, and this doesn't, something interesting is happening behind the scenes. How much does the ARB not want this in core GL, for them to deal with sparse and bindless textures first?

Then again, apparently NVIDIA wants to support DSA so badly that they may be updating the DSA extension with new stuff... and not telling anyone else who's implementing it (http://www.khronos.org/bugzilla/show_bug.cgi?id=908). If true, that's not playing fair, guys. There's clearly some kind of huge fight happening around this functionality within the ARB.

So I hope nobody's holding their breath on this one.

Fragmenting The World Award

ARB_compute_variable_group_size

I understand why ARB_bindless_texture and ARB_sparse_texture aren't core. That reason being (besides the patent issues) that we don't want IHVs to have to say that <insert hardware here> can do 4.3, but not 4.4. There are lower classes of 4.x hardware that just can't do this stuff. So we leave them as extensions until such time as the ARB decides that the market penetration of higher-end hardware is sufficient to incorporate them.

Or until we finally decide to have 5.0 (ie: when Microsoft decides to go to D3D 12).

But compute_variable_group_size really seems like something any 4.x-class hardware should be able to handle. Something similar goes for ARB_indirect_parameters.

Hey, That's Actually Useful Now Award

ARB_buffer_storage

This extension adds functionality to glFlushMappedBufferRange, one of the more useless functions from ARB_map_buffer_range. Now, you can effectively keep a buffer mapped indefinitely and simply synchronize yourself with the server by flushing written ranges.
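In practice that pattern looks roughly like this (a sketch; the buffer must have been created via glBufferStorage with GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT, and the names here are illustrative):

    #include <string.h>

    /* Map once; the pointer stays valid even while the GPU uses the buffer. */
    void *map_forever(GLuint streamBuf, GLsizeiptr bufSize)
    {
        glBindBuffer(GL_ARRAY_BUFFER, streamBuf);
        return glMapBufferRange(GL_ARRAY_BUFFER, 0, bufSize,
                                GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                                GL_MAP_FLUSH_EXPLICIT_BIT);
    }

    /* Assumes the buffer is still bound to GL_ARRAY_BUFFER. */
    void write_region(void *ptr, GLintptr offset, const void *src, GLsizeiptr len)
    {
        memcpy((char *)ptr + offset, src, len);
        /* Tell GL exactly which bytes changed; no unmap, no stall. */
        glFlushMappedBufferRange(GL_ARRAY_BUFFER, offset, len);
        /* Synchronize reuse of regions with glFenceSync/glClientWaitSync. */
    }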

You Were Right Award

Um, me. (http://www.opengl.org/discussion_boards/showthread.php/175541-buffer_storage) Admittedly, it's old, but I still called it: immutable buffer object storage, with better, enforced behavior. I even called the name (which admittedly is derivative and therefore obvious). Though to be fair, the whole "render while mapped" was not something I predicted.

I was going to say that it seemed odd that not specifying GL_DYNAMIC_STORAGE_BIT still allowed you to map the buffer for writing. But I did see a notation that glBufferStorage will fail if you use GL_MAP_WRITE_BIT without also specifying GL_DYNAMIC_STORAGE_BIT. And of course, you can't map an immutable buffer for writing without GL_MAP_WRITE_BIT. ;)

It doesn't have all of the bits I would have liked to see. But it has all the bits the IHVs wanted (they even said so in the "issues" section). So I'll call that "good enough."

Oh, and as for the complaints about GL_CLIENT_STORAGE_BIT being a hint... then don't use it. Remember: the problem with the current buffer object setup isn't merely that they're hints (that contributes, but that alone isn't the full problem). It's that the hints don't really communicate what you are going to do with the buffer. Buffer Storage lets you do that.

You describe exactly how you intend to use it. The bits answer the important questions up-front: will I write to it, will I map it for reading or writing, do I want OpenGL to access it while it's mapped, etc. And the API enforces every single one of these methods of use.

I have no idea why they even bothered to put GL_CLIENT_STORAGE_BIT there, since that doesn't describe how you will use the buffer. But as the issue rightly stated, drivers will ultimately just ignore the hint.

So encourage them to do so by ignoring it yourself.

Alfonse Reinheart
07-22-2013, 11:40 AM
AMD is effin' weird: they own 33% of the GPU market, yet their driver development division and support feel like some tiny indie company of 5-10 enthusiasts that just started its life.

Yeah, that's what happens when your company is slowly imploding thanks to its failing CPU division. Whatever profits might have been made off of GPUs are eaten by the CPU division.

Booner
07-22-2013, 11:54 AM
Is OpenGL Still Open Award?


You probably want to read http://www.khronos.org/members/ip-framework



Fragmenting The World Award

I understand why ARB_bindless_texture and ARB_sparse_texture aren't core. That reason being (besides the patent issues) that we don't want IHVs to have to say that <insert hardware here> can do 4.3, but not 4.4. There are lower classes of 4.x hardware that just can't do this stuff. So we leave them as extensions until such time as the ARB decides that the market penetration of higher-end hardware is sufficient to incorporate them.
<snip>
But compute_variable_group_size really seems like something any 4.x-class hardware should be able to handle. Something similar goes for ARB_indirect_parameters.

Your logic for the former extensions applies to the latter ones as well.

Alfonse Reinheart
07-22-2013, 12:15 PM
You probably want to read http://www.khronos.org/members/ip-framework

Here's the relevant paragraph:


The Khronos IP policy can be summarized in that all Khronos members reciprocally agree not to assert IP rights for technology in a Khronos specification against any other Khronos member that is implementing that specification. As Khronos membership is available to any company at a nominal price, this effectively means that any company interested to implement a Khronos specification can execute the royalty-free reciprocal license.

It states very clearly that Khronos members only agree not to sue "other Khronos members". That doesn't sound very "open" to me. It sounds more like "pay Khronos money (by becoming a member) or you can't implement our specifications."

Are the people behind Mesa a "company" who can afford the "nominal price" of membership?

I'm not concerned about just whether AMD or Intel could implement it. It can't rightly be called an "open specification" if you have to join an industry consortium to implement the specification.


Your logic for the former extensions applies to the later ones as well.

My point was that they don't seem like it. Bindless texture support requires something very substantial from the hardware, which is only found in the more modern shader-based systems. The ability to get compute dispatch and rendering call parameters from arbitrary locations seems like something that every 4.x piece of hardware ought to be able to do.

mhagain
07-22-2013, 12:43 PM
My 2 thingies on the DSA shenanigans, and other thoughts off the top of my head.

Right now DSA is not really needed any more (it never was from a purely technical perspective; I'm talking API cleanliness here).

Most of the functionality covered by the DSA extension is dead functionality in modern OpenGL. The only really relevant areas where this actually matters anymore are texture objects and buffer objects, and with vertex attrib binding, buffer objects don't really need it (PBOs are a special case that can be passed over here). DSA as it stands will never be fully implemented in modern OpenGL because modern OpenGL doesn't need all of it; the ARB can pick the best bits (as they have done before with the glProgramUniform calls) and respecify elsewhere to avoid the requirement for it.

I'm slightly disappointed that buffer storage didn't specify a DSA API in core, but it's not a big deal.

I would have liked to have seen glBindMultiTextureEXT go core, but it hardly seems worth it for a single entry point. GL_ARB_multi_bind covers the needed functionality anyway.

It's unclear how GL_ARB_multi_bind interacts with a subsequent glTexImage/glTexSubImage/glTexParameter/etc call, or even a subsequent glBindTexture call. It seems obvious that since the original active texture selector remains unmodified, that's the one that gets used. The resolution to issue #10 makes it clear for buffers, and it would have been nice to see similar for textures. This just makes texture objects even messier, and to be honest it's looking as though junking the whole API and specifying a new GL_ARB_texture_objects2 (or whatever) from scratch may have been a better approach. That's my prediction for OpenGL 5.

Gedolo
07-22-2013, 01:06 PM
Issue #9 for GL_ARB_buffer_storage (http://www.opengl.org/registry/specs/ARB/buffer_storage.txt) makes for fairly grim reading, unfortunately... :(

It's a pity as this could have been the kick up the jacksie that GL's buffer object API really needed, and the issue in question should really have been resolved by just saying "this is client memory, full stop, using incompatible flags generates an error, here are the flags that are incompatible and the vendors will have to just live with it", but it seems another case of shooting too high and missing the basic requirement as a result.

This is something that could be improved: the OpenCL 2.0 announcement mentions a 6-month feedback window. Too bad the OpenGL 4.4 announcement does not seem to mention such a thing. It could be a good thing to do this for OpenGL too.

Alfonse Reinheart
07-22-2013, 01:39 PM
It's unclear how GL_ARB_multi_bind interacts with a subsequent glTexImage/glTexSubImage/glTexParameter/etc call, or even a subsequent glBindTexture call.

The reason the issue for buffer objects needed clarification is because buffer objects have a separation between an indexed bind point and the target bind point. Binding with glBindBufferRange binds to both the target and the indexed bind point, while glBindBuffer binds only to the target bind point. So it's not clear whether glBindBuffersRange would bind to the target bind point the way glBindBufferRange does. Hence the clarification.

Textures have no such dichotomy; there is no "target bind point". There are only texture image unit binding points. So there's nothing that needs to be said about them. glBindTextures binds the textures to those texture image units, period.

Nowhere-01
07-22-2013, 01:52 PM
This topic actually makes me curious about some of the active posters here and this forum in general. Those new functions get a lot of discussion. What are you using this recent (4.2 and above) functionality for? What is the scope of application? Because this functionality is barely supported, it means you either target really specific hardware or have time to maintain additional code paths.

mdriftmeyer
07-22-2013, 02:51 PM
Here's the relevant paragraph:



It states very clearly that Khronos members only agree not to sue "other Khronos members". That doesn't sound very "open" to me. It sounds more like "pay Khronos money (by becoming a member) or you can't implement our specifications."

Are the people behind Mesa a "company" who can afford the "nominal price" of membership?

I'm not concerned about just whether AMD or Intel could implement it. It can't rightly be called an "open specification" if you have to join an industry consortium to implement the specification.



My point was that they don't seem like it. Bindless texture support requires something very substantial from the hardware, which is only found in the more modern shader-based systems. The ability to get compute dispatch and rendering call parameters from arbitrary locations seems like something that every 4.x piece of hardware ought to be able to do.

Boo hoo. Perhaps you have millions of dollars and are willing to build the standards body wherein all hardware manufacturers come together and build specs, all for free gratis. Have at it.

Becoming a member is chump change.

thokra
07-22-2013, 02:51 PM
Those new functions get a lot of discussion. What are you using this recent (4.2 and above) functionality for?

Which functionality exactly?

Jon Leech (oddhack)
07-22-2013, 02:58 PM
This on its own is cause for joy. Any chance of mandatory full certification being brought back to earlier versions as time goes by and drivers mature?

No. Khronos can't require certification for specifications that have already gone through ratification and have shipping implementations. Mandatory starts with GL 4.4 implementations. But some vendors who are shipping earlier versions are likely to choose to make conformance submissions on those drivers. All the major desktop IHVs (and one IHV who isn't on desktop at the moment) have been actively contributing to the conformance tests and running them against their drivers during the test development process.

The conformance tests are not going to resolve every behavior difference or address every bug - that level of testing is far out of scope relative to the resources Khronos can put into developing tests - but they should be a significant improvement over SGI's OpenGL CTS, which was last updated about 14 years ago.

thokra
07-22-2013, 02:59 PM
Becoming a member is chump change.

You don't get it, do you? Aside from there being a principle to adhere to here (yeah I know, stupid principle, duh), Mesa is an important entity to pretty much any open-source driver on Linux, as it is an OpenGL reference implementation. But the project is open-source, maintained by various people from various companies and independent developers, and not by a single, loaded corporation. One might consider that a problem.

aqnuep
07-22-2013, 03:02 PM
This is something that could be improved: the OpenCL 2.0 announcement mentions a 6-month feedback window. Too bad the OpenGL 4.4 announcement does not seem to mention such a thing. It could be a good thing to do this for OpenGL too.
Couldn't agree more...

thokra
07-22-2013, 03:03 PM
that level of testing is far out of scope

So, what is to be expected? Just out of interest.

Nowhere-01
07-22-2013, 03:07 PM
Which functionality exactly?

Such things as image load/store, SSBOs, atomic operations, and texture views.
30% of the question is about the actual things you do with this functionality in real projects (except for texture views, where it's kinda obvious), and 70% is about what kind of projects they are and why you decided to invest your time implementing functionality using those extensions.

Jon Leech (oddhack)
07-22-2013, 04:25 PM
So, what is to be expected? Just out of interest.

The GL tests share a codebase with the OpenGL ES 2/3 tests, with a bunch of stuff added for GL-specific features. So there are API coverage and functional tests, in some cases quite comprehensive and in other cases not, and a bunch of shading language tests. There is work underway to integrate a more comprehensive set of GLSL tests contributed by drawElements. We're still light on tests for the most recently added GL features, but making progress.

mhagain
07-22-2013, 05:30 PM
No. Khronos can't require certification for specifications that have already gone through ratification and have shipping implementations. Mandatory starts with GL 4.4 implementations. But some vendors who are shipping earlier versions are likely to choose to make conformance submissions on those drivers. All the major desktop IHVs (and one IHV who isn't on desktop at the moment) have been actively contributing to the conformance tests and running them against their drivers during the test development process.

That's reasonable and understandable (particularly in light of vendors who may have 3.x hardware that they no longer ship drivers for), if regrettable. The fear is that some vendors may freeze their implementations at pre-4.4 level in order to avoid the certification requirement, but given your last sentence it seems a lot more positive.


The conformance tests are not going to resolve every behavior difference or address every bug - that level of testing is far out of scope relative to the resources Khronos can put into developing tests - but they should be a significant improvement over SGI's OpenGL CTS, which was last updated about 14 years ago.

I don't think many people expect every behaviour difference or bug to be fixed (even D3D with WHQL can't do that); it's a good thing to push them in the direction of more consistent and predictable behaviour though (and also good to hear that they're all on-board with this). I think everyone wants to avoid another Rage, and this is a solid step in the right direction.

neilt
07-22-2013, 06:33 PM
You don't get it, do you? Aside from there being a principle to adhere to here (yeah I know, stupid principle, duh), Mesa is an important entity to pretty much any open-source driver on Linux, as it is an OpenGL reference implementation. But the project is open-source, maintained by various people from various companies and independent developers, and not by a single, loaded corporation. One might consider that a problem.

Khronos is committed to the creation of royalty-free specifications for use by the entire industry. It is our stated committed mission, and our actions over our history demonstrate that commitment.

We achieve that goal in a number of reasoned legal steps that also protect the IP of the Khronos membership. If we are not careful to create a structure that protects members' IP, as well as the use of the specification in the industry, many of the members would not be able to participate in the creation of these standards for the good of the industry.

The wording in the Khronos IP framework grants a reciprocal royalty-free license to other members. The goal is not to exclude non-members, but it is not acceptable to grant a valuable IP license to an unknown entity or entities (e.g. 'the whole world') that do not explicitly agree to reciprocal terms. So, the Khronos IP framework establishes the largest 'raft' of written reciprocal contractual obligations possible - i.e. between the entire Khronos membership.

Behind this is the stated commitment that anyone can implement a Khronos spec royalty free. In practice this means that if a non-member is tacitly following the terms of the written reciprocal agreement between the members, i.e. not suing Khronos members over the use of a Khronos specification, then Khronos welcomes their using the specification. Now, this stated commitment is not a written contract, but if a non-member requires a written contract between itself and the entire Khronos membership for implementing any Khronos specification, it just has to join Khronos. As Khronos membership is guaranteed (by our bylaws) to be open to any company that wishes to join, any implementer may gain access to a written reciprocal license for the cost of a Khronos membership - $10K.

For companies implementing a complete specification, $10K is very inexpensive (and we do need membership fees to keep the lights on). But the good point was made that open source communities cannot afford a Khronos membership. To address this, Khronos has a proud history of waiving membership fees to open source practitioners who are undertaking bona fide efforts to construct open source implementations of Khronos specifications. This enables them to enjoy the same protection as other Khronos members for free.

Finally, a comment was made that a member possessing a patent on a Khronos specification was a bad thing. The reverse is true. Under the Khronos IP Framework, all members with patents that are essential to a ratified Khronos specification reciprocally license those patents royalty-free. Importantly, the more patents that Khronos members possess that are reciprocally licensed, the larger and stronger the patent 'raft' that protects implementers of the specification against non-members asserting patents against the spec. Patents that are licensed to you for your protection are a very good thing.

I hope this helps explain the Khronos IP Framework.

Neil Trevett
Vice President Mobile Content, NVIDIA | President, Khronos Group
ntrevett@nvidia.com

thokra
07-23-2013, 02:19 AM
@Jon: first of all, that's definitely a very good start. I wonder, would it be possible to open up the test descriptions and let the community participate? I guess we got enough man-power around here to at least fill some of the gaps with suggestions for viable tests. In general, an open conformance test-suite would be awesome - one would probably need one to n people signing off on contributions.

Foobarbazqux
07-23-2013, 02:26 AM
I've read the ARB_sparse_texture spec and I noticed that the AMD_sparse_texture spec has functions to fetch from a sparse texture and return information about whether any texture data is present. Is there something similar to this in ARB_sparse_texture?

thokra
07-23-2013, 02:37 AM
@neilt: Thanks for the extensive answer. Good to hear from the Prez. ;)

One thing, however, sounds very vague:


who are undertaking bona fide efforts

How does Khronos determine that some independent, non-member entity is trustworthy enough? Plus, I assume you mean not only trustworthy but also promising, in the sense that it has to be a potentially successful endeavor?

malexander
07-23-2013, 09:38 AM
Khronos will offer certification of drivers from version 3.3, and full certification is mandatory for OpenGL 4.4 and onwards. This will help reduce differences between multiple vendors’ OpenGL drivers, resulting in enhanced portability for developers.

This is fantastic news! I'm more excited about this than about any of the new core 4.4 features or extensions. Dealing with broken features debuting in drivers, spec interpretation differences, and driver regressions is a particularly unpleasant part of cross-platform OpenGL development. Working around these issues drains resources from otherwise 'useful' development. While I don't expect the situation to magically improve overnight or drivers to become perfect, this is a good start.

Is there a process for submitting conformance tests to be reviewed by the ARB? Or is this limited to ARB members?

Alfonse Reinheart
07-23-2013, 04:26 PM
Issue #9 for GL_ARB_buffer_storage (http://www.opengl.org/registry/specs/ARB/buffer_storage.txt) makes for fairly grim reading, unfortunately... :(

It's a pity as this could have been the kick up the jacksie that GL's buffer object API really needed, and the issue in question should really have been resolved by just saying "this is client memory, full stop, using incompatible flags generates an error, here are the flags that are incompatible and the vendors will have to just live with it", but it seems another case of shooting too high and missing the basic requirement as a result.

OK, so... what is the "basic requirement?"

That's what I don't understand about this whole issue. What exactly would you like "CLIENT_STORAGE_BIT" to mean that is in any way binding? You say that certain flags would be incompatible. OK... which ones? And why would they be incompatible?

If client/server memory is a significant issue for some hardware, then that would mean something more than just "incompatible bits". If client memory exists, then why would the driver be unable to "map" it? Why would it be unable to map it for reading or writing? Or to allow it to be used while mapped or to make it coherent?

The only limitations I could think of for such memory would be functional. Not merely accessing it, but uses of it. Like an implementation that couldn't use a client buffer for transform feedback storage or image load/stores. It's not the access pattern that is the problem in those cases; it's the inability to allow them to be used as buffers in certain cases.

So the ARB could have specified that client buffer objects couldn't be used for some things. It would be the union of all of the IHVs who implement it. Which would exclude any new IHVs or new hardware that comes along. They could provide some queries so that implementations could disallow certain uses of client buffers.

But is that really something we want to encourage?

BTW, if you want to trace the etymology of CLIENT_STORAGE_BIT, it was apparently not in the original draft from January. According to the revision history (seriously ARB, use Git or something so that we can really see the revisions, not just a log. That's what version control is for), the ancestor of CLIENT_STORAGE_BIT was BUFFER_STORAGE_SERVER_BIT (i.e. the reverse of the current meaning), which was added two months ago.

Also, from reading the issue it sounds very much like they didn't really want to add it, but had to. Granted, since "they" are in charge of the extension, I have no idea why they would be forced to add something they didn't want.

But as I said before, you can just ignore the bit and the extension is fine.

Jon Leech (oddhack)
07-23-2013, 05:04 PM
How does Khronos determine that some independent, non-member entity is trustworthy enough?

Generally speaking, a member company recommends them, the affected working group talks about it and makes a recommendation to the Board of Promoters, and the BoP discusses and votes on the recommendation. Which is pretty much the way most things are decided in Khronos.

Jon Leech (oddhack)
07-23-2013, 05:13 PM
use Git or something so that we can really see the revisions, not just a log. That's what version control is for

The extension specifications are in a public part of Khronos' Subversion tree and you can see the history of public updates after a spec has been ratified. We're not going to publish the entire history of a spec through its internal development, though.

mhagain
07-23-2013, 06:12 PM
Also, from reading the issue it sounds very much like they didn't really want to add it, but had to. Granted, since "they" are in charge of the extension, I have no idea why they would be forced to add something they didn't want.

Well, that's precisely what the problem is. It's nothing specifically to do with CLIENT_STORAGE_BIT itself, it could have been about anything; it's the introduction of more vague, woolly behaviour, more driver shenanigans, via yet another "one of those silly hint things".

What's grim about issue #9 is the prediction that the extension will make no difference, irrespective of whether or not the bit is used:


In practice, applications will still get it wrong (like setting it all the time or never setting it at all, for example), implementations will still have to second guess applications and end up full of heuristics to figure out where to put data and gobs of code to move things around based on what applications do, and eventually it'll make no difference whether applications set it or not.

It seems to me that if behaviour can't be specified precisely, then it's better off not being specified at all. I've no particular desire for CLIENT_STORAGE_BIT to mean that the buffer storage is allocated in client memory; that's irrelevant. I have a desire for specified functionality to mean something specific, and put an end to the merry-go-round of "well it doesn't matter what hints you set, the driver's just going to do its own thing anyway". If that's going to be the way things are then why even have usage bits at all? That's not specification, that's throwing chicken bones in the air.

Alfonse Reinheart
07-23-2013, 07:03 PM
What's grim about issue #9 is the prediction that the extension will make no difference, irrespective of whether or not the bit is used:

That section said "set it", referring to the bit. Not to all of the flags, just CLIENT_STORAGE_BIT.


I have a desire for specified functionality to mean something specific, and put an end to the merry-go-round of "well it doesn't matter what hints you set, the driver's just going to do its own thing anyway".

Ultimately, drivers are going to have to pick where these buffers go. The point of this extension is to allow the user to provide sufficient information for drivers to know how the user is going to use that buffer. And, unlike the hints, these represent binding contracts that the user cannot violate.

Drivers are always "going to do it's own thing anyway." It could stick them all in GPU memory, or all of them in client memory, or whatever, and still be functional. But by allowing the user to specify access patterns up front, and then enforcing those access patterns, the driver is able to have sufficient information to decide up front where to put it.

The only way to get rid of any driver heuristics is to just name memory pools and tell the user to pick one. And that's just not going to happen. OpenGL is not D3D, and buffer objects will never work that way. OpenGL must be more flexible than that.

mhagain
07-23-2013, 08:03 PM
That section said "set it", referring to the bit. Not to all of the flags, just CLIENT_STORAGE_BIT.

It also said "or not". So take the situation where you don't set CLIENT_STORAGE_BIT and explain how that text doesn't apply.


Ultimately, drivers are going to have to pick where these buffers go. The point of this extension is to allow the user to provide sufficient information for drivers to know how the user is going to use that buffer. And, unlike the hints, these represent binding contracts that the user cannot violate.

Drivers are always "going to do it's own thing anyway." It could stick them all in GPU memory, or all of them in client memory, or whatever, and still be functional. But by allowing the user to specify access patterns up front, and then enforcing those access patterns, the driver is able to have sufficient information to decide up front where to put it.

The only way to get rid of any driver heuristics is to just name memory pools and tell the user to pick one. And that's just not going to happen. OpenGL is not D3D, and buffer objects will never work that way. OpenGL must be more flexible than that.

Again, I'm not talking about CLIENT_STORAGE_BIT specifically, I'm talking about specification vagueness and woolliness in general. You say that "OpenGL is not D3D", yet D3D 10+ (which by the way doesn't have memory pools; it has usage indicators just like ARB_buffer_storage) has no problem whatsoever specifying explicit behaviour while working on a wide range of hardware. This isn't theory, this is something that's already out there and proven to work, and "OpenGL must be more flexible than that" just doesn't cut it as an excuse.

Referring specifically to CLIENT_STORAGE_BIT now, go back and read the stated intention of this extension:


If an implementation is aware of a buffer's immutability, it may be able to make certain assumptions or apply particular optimizations in order to increase performance or reliability. Furthermore, this extension allows applications to pass additional information about a requested allocation to the implementation which it may use to select memory heaps, caching behavior or allocation strategies.

Now go back and read issue #9:


In practice, applications will still get it wrong (like setting it all the time or never setting it at all, for example), implementations will still have to second guess applications and end up full of heuristics to figure out where to put data and gobs of code to move things around based on what applications do, and eventually it'll make no difference whether applications set it or not.

Realise that it's being predicted to not make the blindest bit of difference even if applications don't set CLIENT_STORAGE_BIT.

This extension would have been great if CLIENT_STORAGE_BIT was more strictly specified.
This extension would have been great if CLIENT_STORAGE_BIT was not specified at all.

Right now best case is that implementations will just ignore CLIENT_STORAGE_BIT and act as if it never even existed. MAP_READ_BIT | MAP_WRITE_BIT seem enough to clue in the driver on what you want to do with the buffer. Worst case is that we've an exciting new way of specifying buffers that does nothing to resolve a major problem with the old way.

Alfonse Reinheart
07-23-2013, 10:39 PM
Realise that it's being predicted to not make the blindest bit of difference even if applications don't set CLIENT_STORAGE_BIT.

You're really blowing this way out of proportion.

The mere existence of the bit changes nothing about how the implementation will handle implementing the rest, because it changes nothing about any of the other behavior that is specified. If you say that you won't upload to the buffer by not making it DYNAMIC, you cannot upload to it. If you don't say that you will map it for writing, you can't. If you don't say that you will map the buffer while it is in use, you can't.

All of that information still exists, is reliable, and is based on an API-enforced contract. Therefore, implementations can still make accurate decisions based on it.


Worst case is that we've an exciting new way of specifying buffers that does nothing to resolve a major problem with the old way.

Um, how?

The fundamental problem with the current method is that the hints you provide are not guaranteed usage patterns. The API can't stop you from using them the wrong way, nor can the documentation explain the right access pattern for the hints. Therefore, those hints will frequently be misused. Since they are misused, driver developers cannot rely upon them to be accurate. So driver developers are forced to completely ignore them and simply watch how you use the buffer, shuffling it around until they figure out a place for it.

With the exception of CLIENT_STORAGE_BIT, all of the hints are enforced by the API. You cannot use them wrong. Therefore they represent real, actionable information about how you intend to use the buffer. Information that driver developers can use when wanting to allocate the storage for it.

The mere existence of CLIENT_STORAGE_BIT changes nothing at all about how useful the other bits are. The discussion in Issue 9 is specifically about those cases where the other usage bits alone cannot decide between different memory stores.

And, as far as the DX10 comparisons go, I checked the DX10 API. The only functional difference between these two is that CLIENT_STORAGE_BIT exists in GL (that, and the GL version gives you more options, such as using the GPU to update non-dynamic buffers). So why should I believe that the mere existence of an option suddenly turns an API that is functionally equivalent to DX10 into the wild west of current OpenGL buffer objects?

Or let me put it another way. If the DX10's usage (http://msdn.microsoft.com/en-us/library/windows/desktop/bb172499%28v=vs.85%29.aspx) and access (http://msdn.microsoft.com/en-us/library/windows/desktop/bb204908%28v=vs.85%29.aspx) flags are sufficient information to place buffer objects in memory, why are the current set of bits provided by this extension not equally sufficient for this task? And if those bits are not sufficient, then there must already exist "heuristics to figure out where to put data and gobs of code to move things around based on what applications do" in D3D applications, so why would that code not apply equally well to OpenGL implementations?

I think you're really taking that issue way out of proportion.

Is it possible for implementations to just ignore all of these bits (outside of enforcing the contract) and rely entirely on heuristics? Absolutely. But the information is there and it is reliable. So why would they? Just because there's one bit that may not be reliable?

Jon Leech (oddhack)
07-23-2013, 11:27 PM
The 4.4 updates to the man pages are live now, again thanks to Graham Sellers.

There appears to be some weirdness with Javascript showing up at the beginning of the page, at least for me in Chromium 27, but the actual content beneath that looks OK. We'll work on the weirdness, probably something to do with updated Docbook stylesheets.

kRogue
07-24-2013, 02:06 AM
Looking at the links into the D3D10 docs that Alfonse provided, one sees that the usage enumerations and flags for mapping in D3D10 are far fewer than what is found in OpenGL. The main difference, I guess, is that in D3D10 (and 11) MS-Windows does the memory management, whereas in GL it is the IHV that does this job. This is a guess.

Now my 2 cents on the memory thing. Essentially the idea is that GL is supposed to manage memory for the developer. This is why we have all these hints and the weird things drivers do with either (or both of) the hints and the observed application behavior. In all honesty, I think that sucks.

On the other hand, let's say we want an API available so that one can manage the memory more directly oneself, like in console development. The issue here is that different boxes have very different memory architectures. The most obvious being UMA vs discrete memory. But even in that case there are subtleties about caching behavior and so on.

And now I propose something quite heretical. I propose that each IHV makes a buffer object extension that exposes these subtleties and allows a developer to make choices and the driver just obeys. Not only will each IHV need to make the extension, they will also need to make a variant of it (or another extension) for the different memory configurations of their hardware (for example AMD APUs vs AMD discrete cards). A GL application would then be written as follows: check which of the memory extensions is available, use the one it supports, and otherwise fall back to the current buffer object interface. Going further, texture data should be more easily manipulated too, where the texture swizzle format can be queried (or even specified) so that one can stream texture data more directly.

But still, my suggestion is not all that great either. From the developer point of view, that means more code.

Alfonse Reinheart
07-24-2013, 04:30 AM
one sees that the usage enumerations and flags for mapping in D3D10 are far fewer than what is found in OpenGL

D3D provides 4 usage bits and 2 mapping type bits. ARB_buffer_storage provides 2 mapping type bits and 4 usage bits. Now yes, OpenGL does offer more valid combinations. But OpenGL is also exposing new functionality that D3D doesn't: PERSISTENT and COHERENT allow you to map a buffer while it is in use. Which is always expressly forbidden in D3D. If you take those away, then the valid combinations have approximately equal descriptive power compared to D3D's stuff.

Unless you were talking about the old usage fields.


I propose that each IHV makes a buffer object extension that exposes these subtleties and allows a developer to make choices and the driver just obeys.

Let's ignore the obvious flaws in this idea (the large array of different memory types resulting in an explosion of extensions to cover them all, the fact that such information might be considered proprietary and therefore held secret, etc). Let's just take this at face value.

What you seem to forget is this: there aren't that many consoles. And with them, they don't offer that many different choices. The 360 uses a completely unified memory architecture, so it offers no choices at all (maybe cached vs. uncached memory accessing, but that's about it). The PS3 has a split memory architecture, so there are two choices. If one way is too slow, you try the other way and it works. And it works on every PS3 that has existed and ever will exist.

Every generation of PC hardware would have its own extension. Within generations, there would be different extensions too. An HD5xxx with GPU memory would use a different architecture from an embedded HD5xxx chip. An HD7xxx chip would have to use a different memory architecture from the 5xxx chip, even though they still have the GPU paradigm.

Given all of this myriad of choices... what is the chance that the vast majority of game developers will always and consistently pick the right one for their usage pattern? For every piece of hardware? Do you expect most game developers to sit down and find the optimal memory arrangement for each one of their usage patterns, for every piece of hardware that exists?

Lastly, I would remind you that D3D works just fine under this paradigm. Is there some performance being lost? Possibly. But it's a sacrifice, and it isn't terribly much of one at the end of the day. Especially considering the alternative...

kRogue
07-24-2013, 09:36 AM
What you seem to forget is this: there aren't that many consoles. And with them, they don't offer that many different choices. The 360 uses a completely unified memory architecture, so it offers no choices at all (maybe cached vs. uncached memory accessing, but that's about it). The PS3 has a split memory architecture, so there are two choices. If one way is too slow, you try the other way and it works. And it works on every PS3 that has existed and ever will exist.


Sighs, let me make it more clear: I am talking about the next-generation consoles that are to be in consumers' hands this fall and winter: the PS4 and Xbox One. There you will find each has a unified memory architecture, but with differences in how that memory is handled. The Xbox One (I think) has a huge cache, whereas the PS4 has really fast main memory, GDDR5. However, depending on how it is allocated and such, it can behave differently with respect to caching behavior, i.e. whether writes by the CPU are immediately observed by the GPU, and so on. On the PC *now* there are boxes with more unified memory magicks going on (namely AMD's unified memory jazz that it has made a big deal about, which is what is driving the PS4 and, I suspect, also the Xbox One).

Going over to mobile, effective memory management is really important. Currently, through GL, it is basically "hope for the best". This usually translates to keeping the data static and not streaming it from the CPU at all. So, even though memory is unified, the lack of an API prevents intelligent use of that feature. Moreover, good usage is going to be sensitive to the details of the memory architecture: nature of the caches, etc.



What you seem to forget is this: there aren't that many consoles. And with them, they don't offer that many different choices. The 360 uses a completely unified memory architecture, so it offers no choices at all (maybe cached vs. uncached memory accessing, but that's about it). The PS3 has a split memory architecture, so there are two choices. If one way is too slow, you try the other way and it works. And it works on every PS3 that has existed and ever will exist.

Every generation of PC hardware would have its own extension. Within generations, there would be different extensions too. An HD5xxx with GPU memory would use a different architecture from an embedded HD5xxx chip. An HD7xxx chip would have to use a different memory architecture from the 5xxx chip, even though they still have the GPU paradigm.

Given all of this myriad of choices... what is the chance that the vast majority of game developers will always and consistently pick the right one for their usage pattern? For every piece of hardware? Do you expect most game developers to sit down and find the optimal memory arrangement for each one of their usage patterns, for every piece of hardware that exists?


There are a few obvious bits and I freely admit I wrote too soon. Firstly, it need not be that each IHV makes its own extension; rather, each memory architecture would have its own extension. This, though, can quickly turn into a generic interface, which we already have. What is wanted is the ability to specify how and where the memory is allocated and how it is cached, etc. Right now all we have is the ability to provide hints and hope that an application's behavior is recognized by a GL implementation. That is not engineering, that is crossing one's fingers and hoping for the best. In an ideal world, a manual memory management extension suite would expose the issues: unified vs non-unified (the latter basically means all communication is through the PCI bus), whereas the former has further joys: the nature of the caching of GPU and CPU (or for that matter whether the cache is somehow shared) and on and on. It won't be pretty to do, but if one wants control, then one needs to know what one is controlling.

It is absolutely true that a given IHV's different hardware would then have different memory management extensions, essentially one per memory architecture. Yes, this sucks. D3D has an edge here because (I believe) Microsoft wrote the memory handler, not the IHVs; thus the same flags give the exact same behavior for gizmos with the same memory architecture across IHVs.

Right now though, the current system is bad. We now have two different ways to specify the intent of using a buffer object, the new way and the old way. However, all we are specifying is intent, and in return we are given guarantees. We still have the buffer object ouija board, and we have it because the requirement that the same API work on all hardware means that we can never specify exactly what is to be done.

I am not advocating that an application must use such an extension suite, but I am advocating giving a developer the option. Improper memory management configuration can eat a massive hole through bandwidth and performance quite easily.

As for mobile, where this really is a huge deal: I believe we will see GL4 (not just GLES3 and GL3) in the mobile space soon (at least on the next-generation Tegra, and more if NVIDIA licenses its Tegra GPU magicks to other SoC folks). Memory management is a big deal and I think it will get worse quite quickly. As an example, the sparse texture jazz is to some degree about an application performing limited manual memory management. This will get worse; the want for the GPU and CPU to use each other's data is going to become a bigger and bigger issue, and GL right now is still trying to get by with the client-server model (which is fine for non-unified memory situations) but loses so much capability in unified land.

It might be that my idea is nuts, but I kindly suggest that when we discuss this we try to brainstorm ideas on how to make this memory management issue better rather than just shooting down others'. This is a dead serious issue; the hardware has capabilities not at all exposed by the API, and these capabilities are a really big deal.

Stephen A
10-23-2013, 09:43 AM
Is it possible to clarify the wording in the 4.4 core spec, section 8.21, page 251:


For texture types that do not have certain dimensions, this command treats those dimensions as having a size of 1. For example, to clear a portion of a two-dimensional texture, use zoffset equal to zero and depth equal to one.

format and type specify the format and type of the source data and are interpreted as they are for TexImage3D, as described in section 8.4.4. Textures with a base internal format of DEPTH_COMPONENT, STENCIL_INDEX, DEPTH_STENCIL require depth component, stencil, or depth/stencil component data respectively. Textures with other base internal formats require RGBA formats. Textures with integer internal formats (see table 8.12) require integer data.
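For concreteness, the first paragraph describes calls along these lines (a sketch using glClearTexSubImage; tex is assumed to be a 2D RGBA8 texture):

    const GLubyte red[4] = { 255, 0, 0, 255 };
    glClearTexSubImage(tex, 0,      /* texture name, mip level       */
                       16, 16, 0,   /* xoffset, yoffset, zoffset = 0 */
                       64, 64, 1,   /* width, height, depth = 1      */
                       GL_RGBA, GL_UNSIGNED_BYTE, red);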





These paragraphs appear conflicting to me: the first implies that you can pass GL_TEXTURE_2D for <type>, but the second states that <type> is interpreted according to TexImage3D which doesn't support GL_TEXTURE_2D.

What is the valid behavior here?