View Full Version : Direct State Access



ViolentHamster
08-25-2011, 02:42 PM
I'm surprised I didn't find anything in this forum on Direct State Access. Everyone seems to want DSA included in the core, and I haven't heard a good reason that it's not included.

At the very least, the additional DSA methods in NV_texture_multisample really need to be included in some EXT extension. It's a big pain that those few methods have no DSA version outside of the Nvidia extension.
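
To make the contrast concrete for anyone skimming, here is a minimal sketch, assuming a current GL context and that the EXT_direct_state_access entry points are loaded (the texture names are just illustrative variables):

/* Classic bind-to-modify: the bind exists only so the next calls know
   which object to touch, and it clobbers whatever was bound before. */
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);

/* EXT_direct_state_access: the object is named explicitly and no
   binding point is disturbed. */
GLuint tex2;
glGenTextures(1, &tex2);
glTextureParameteriEXT(tex2, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTextureParameteriEXT(tex2, GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);

The multisample entry points from NV_texture_multisample have no such EXT equivalent, which is exactly the gap I'm complaining about.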

overlay
08-25-2011, 03:43 PM
EXT_texture_compression_s3tc will never be in the core OpenGL spec because of some patent issue.

Maybe there is an IP issue with Direct State Access too. I don't know, just speculating here.

Alfonse Reinheart
08-25-2011, 03:53 PM
The multisample thing is actually surprising. Usually, you can tell which extensions NVIDIA worked on closely by whether or not there are any DSA interactions. ARB_texture_storage and ARB_separate_shader_objects are good examples of this. Maybe that's just a more recent thing, though.

I imagine that the ARB is hesitant on this because it doesn't actually change anything; it just provides an alternate mechanism for doing the same stuff. And with 3.1, we had that whole "get rid of alternate mechanisms for doing stuff" thing.

Plus, you don't really get anything from DSA (besides API annoyances being removed). The driver cannot assume that binding something means you intend to use it, because those old functions still exist. Which means that we would have to have yet another round of deprecation and removal; the last one was painful enough. Especially since NVIDIA didn't seem to care much for it, saying that it wouldn't help drivers and so forth.

Perhaps this is a place where the ARB could employ profiles to greater effect. We already have core vs. compatibility. We also have debug contexts. We could therefore have DSA contexts. Basically, they are a way of saying, "I will not call glTexParameter, glVertexAttribPointer, or any other state change functions." Of course, by "I will not call," I mean "these functions will fail and throw a GL_INVALID_OPERATION error". That way, the driver knows that you will not be calling old-style functions, so it knows that if you bind a texture, you're serious about using it.
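
Sketched against glXCreateContextAttribsARB, it might look something like the following; to be clear, GLX_CONTEXT_DSA_PROFILE_BIT_ARB is entirely made up for illustration, nothing like it exists in any shipping spec (dpy and fbconfig are whatever display and framebuffer config the application already has):

/* Hypothetical: the made-up DSA bit stands in for "I promise never to
   call the bind-to-modify functions in this context". */
int attribs[] = {
    GLX_CONTEXT_MAJOR_VERSION_ARB, 4,
    GLX_CONTEXT_MINOR_VERSION_ARB, 2,
    GLX_CONTEXT_PROFILE_MASK_ARB,
        GLX_CONTEXT_CORE_PROFILE_BIT_ARB | GLX_CONTEXT_DSA_PROFILE_BIT_ARB,
    None
};
GLXContext ctx = glXCreateContextAttribsARB(dpy, fbconfig, NULL, True, attribs);
/* In such a context, glBindTexture + glTexParameteri would raise
   GL_INVALID_OPERATION; only the DSA entry points would be legal. */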

Also, there are many context values that aren't bound up in objects. The viewport transform, rasterization settings, blending, post-fragment tests, etc. If we're going to have DSA, we need to basically have an object-based API. It makes no sense to have some things be objects and other things not be objects.


Maybe there is an IP issue with Direct State Access too.

That seems unlikely. The S3TC extension has an IP section specifically mentioning the issue. The DSA extension does not. Also, I imagine that AMD wouldn't have bothered with DSA if they had to pay money to implement it.

Groovounet
08-26-2011, 04:38 AM
There is no divisor DSA function either. Actually there are a LOT of issues with them still, but I am confident that sooner or later we will have proper DSA, and maybe something better than just a fixed version of the EXT extension.

We need to keep screaming our love for DSA!

Foobarbazqux
08-26-2011, 05:33 AM
We need to keep screaming our love for DSA!


We've been screaming it since July 2008, three years ago, and it doesn't seem like the people we need to convince are listening.

Alfonse Reinheart
08-26-2011, 05:59 AM
3 years? Please.

How long did it take to get buffer objects? From NV_vertex_array_range to the first publication of the ARB_vertex_buffer_object spec was a good 2 years.

FBOs? From the initial GL 2.0 proposal from 3D Labs to final spec was a good 3 years. And that was *just* the EXT version; we didn't get a core version for years after that. Oh, and it still took months to get necessary functionality like blitting and multisample.

How long did we wait until we finally got rid of that 3D Labs nonsense of linking programs? From the first ARB_shader_objects spec to ARB_separate_shader_objects, it took seven years. And you have to understand: nobody liked linked programs. The people clamoring for DSA are nothing compared to the hell raised over that. I was pretty much the only one who stood up for it, and even I look back on that opinion with shame.

The ARB takes a long time to do things that are obvious. Especially if they go against what they've done in the past.

NVIDIA clearly wants it to happen. I'd guess that Apple would probably be their biggest foe in this, as Apple is a very conservative member (consider how long it took to get even rudimentary GL 3.x support on Macs). They spend a lot of time writing part of the GL implementation; they wouldn't want to change their whole driver model just to accommodate DSA. To accommodate a way of doing exactly what you can do now.

Groovounet
08-26-2011, 07:42 AM
You would have been such a great WWE wrestler.

Dark Photon
08-26-2011, 06:15 PM
You would have been such a great WWE wrestler.
:) Fortunately, all that coffee on the monitor will wipe right off.

Chris Lux
08-30-2011, 02:43 PM
I want to make one suggestion:

When introducing DSA to core, it is the perfect opportunity to introduce opaque object handles!

Just let the new functions accept only specially created object handles of type GLintptr. So provide new functions, e.g. glGenTextureObjects() instead of glGenTextures(). The old objects would then only work with the old API and the EXT DSA extension. The ARB DSA extension could clean this up in the core.

The advantage would be that the driver saves the time spent on hash translations from GLuint names to the internal objects (much like bindless graphics does).

As a result, deprecate all the non-ARB-DSA functions in a new deprecation round. I think this would be the perfect opportunity to introduce opaque handles without breaking old functionality (while deprecating the latter)!

What do you think?
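
To make the suggestion concrete, a purely hypothetical sketch; none of these entry points exists as written, the names just mirror the current ones with the GLuint name replaced by an opaque handle (width, height and pixels are whatever the application has around):

/* Hypothetical API: object handles are opaque (e.g. driver pointers)
   instead of GLuint names that need a hash lookup on every call. */
typedef GLintptr GLtexture;                     /* made-up handle type */

GLtexture tex;
glGenTextureObjects(1, &tex);                   /* hypothetical entry point */
glTextureParameteri(tex, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTextureImage2D(tex, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
/* The old GLuint-based objects keep working with the old API and with
   EXT_direct_state_access, but not with these new entry points. */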

Alfonse Reinheart
08-30-2011, 05:22 PM
That would involve making more APIs. Actually, that would involve making a new API. The ARB has already shown hesitance to create new APIs to do the same thing; that's why DSA isn't in yet. Do you think that they'll be more lenient to basically rewriting their object model for every object?

At that point, you may as well say, "Do Longs Peak!"

Groovounet
08-30-2011, 06:25 PM
I want to make one suggestion:

When introducing DSA to core, it is the perfect opportunity to introduce opaque object handles!



That could make AMD_name_gen_delete relevant. It removes the translation and (more or less) the mutex in the case of multithreading with multiple shared OpenGL contexts. Potentially, using pointers implies a unified namespace for all the object names, making glGenNames even more relevant.

There is no need to expose GLintptr directly; it could be a typedef (typedef GLintptr GLname;) so that implementations that want security (for WebGL, e.g.) could use an index instead of an actual pointer.

This is my golden dream! The silver dream would be to have just an ARB_direct_state_access extension that works fully and covers everything.

Chris Lux
08-31-2011, 10:12 AM
That would involve making more APIs. Actually, that would involve making a new API. The ARB has already shown hesitance to create new APIs to do the same thing; that's why DSA isn't in yet. Do you think that they'll be more lenient to basically rewriting their object model for every object?

At that point, you may as well say, "Do Longs Peak!"
I don't think it's any more a new API than DSA is, and the number of new entry points to introduce should be orders of magnitude smaller than what DSA introduces. Basically the glGen* and glBind* APIs need to be adapted to the opaque handles and all the new DSA functions just take the opaque handles...

It could be so simple and beautiful :), plus it would help get some performance bottlenecks out of the drivers.

Are there any ARB members reading here? I would very much love to hear some insight into the DSA-to-core issues and whether something like the opaque handles is being discussed.

Alfonse Reinheart
09-02-2011, 01:25 AM
Basically the glGen* and glBind* APIs need to be adapted to the opaque handles and all the new DSA functions just take the opaque handles...

Which means all of the DSA-style functions provided by core extensions like ARB_separate_shader_objects and ARB_sampler_objects are now useless. Not to mention all of the glProgram* and glShader* functions that are already DSA-style.

How does this not constitute a new API? Everything except the calls that don't touch object state or that bind/unbind objects (which at this point is probably about 15 functions) becomes obsolete. You can't use objects created with the new method with the old functions, and vice-versa.

At least with the Longs Peak method, where you make a clean API break, you wouldn't have to specify the interaction between the old way and the new way. That's one reason I suggested profiles, because it allows you to tell the driver up-front that you're not going to do things the old way.

If you're going to make this kind of radical change, you may as well go all the way and start bringing in things like immutable objects and such. At which point, you're back in LP territory.

Gedolo
09-02-2011, 08:02 AM
You could also add the lowest previous API version that the current stuff is compatible with.
That can be done in drivers.

Instead of profiles: programs would only have to pass their OpenGL version number in the context creation call, and everything could work exactly as expected.

Chris Lux
09-02-2011, 08:13 AM
At least with the Longs Peak method, where you make a clean API break, you wouldn't have to specify the interaction between the old way and the new way. That's one reason I suggested profiles, because it allows you to tell the driver up-front that you're not going to do things the old way.
We know that the ARB is unwilling to do a clean break, so we can only hope for a slow change of the API for the better. And I think opaque handles are essential for an efficient API, so the introduction of DSA to core can be a bigger step in this slow change process without a radical break of the API. Either use DSA and the opaque handles or don't. As for existing APIs with DSA-like functions, these do not interact with the functions that are currently not DSA, so there is no problem there; but it would also be desirable to change these functions to opaque handles at some point...


If you're going to make this kind of radical change, you may as well go all the way and start bringing in things like immutable objects and such. At which point, you're back in LP territory.
Yes. And I still hope we can develop OpenGL to the point where we have most of the great stuff Longs Peak promised.

elFarto
09-02-2011, 08:49 AM
...and I still hope we can develop OpenGL to the point where we have most of the great stuff Longs Peak promised.
I really don't know if that's such a good idea. I mean, I love the Longs Peak design, but do you really want an API that's had that much change while maintaining backward compatibility (because you know they'll have to do that)?

IMHO I can't see OpenGL ever getting itself out of the hole it's dug for itself. With Longs Peak they basically said, "the API no longer reflects hardware, we need to change it", and then they didn't, for whatever reason (did we ever get the real reason they didn't?).

So they tried to deprecate things, and that failed, with the entire deprecation mechanism itself being deprecated.

So now we're left with compatibility and core, and while that sounds good, compatibility mode is likely to be around forever anyway, and core mode doesn't really get us anything.

So, we're left with an API that can't shed itself of all the things we know are wrong with it. Just take the object handle issue. We know using an int for the handle is bad; it causes a lot of cache misses in the worst possible place for the application (going by the bindless slides and the Brink SIGGRAPH slides). I don't know if an intptr for handles would solve the issue (wouldn't the driver still have to dereference that pointer to get the relevant information?).

I can only see OpenGL heading for increased complexity from now on.

What can be done? Design a new API from scratch[1], of course. OK, so I'm only semi-serious about this, but after having seen some libgcm (the PS3 graphics API) examples, I'm thinking: why can't we just do this on the PC (with some slight tweaks for cross-platform-ness and new features, of course)?

Feel free to ignore me, I'm likely talking out of my arse again :)

Regards
elFarto

[1] and we shall call it OpenGL Bare Metal, because that sounds awesome.

ZbuffeR
09-02-2011, 09:38 AM
What can be done? Design a new API from scratch[1], of course. OK, so I'm only semi-serious about this, but after having seen some libgcm (the PS3 graphics API) examples, I'm thinking: why can't we just do this on the PC (with some slight tweaks for cross-platform-ness and new features, of course)?

I am wondering if that could be done on top of OpenCL. Of course it would be less efficient now, but with time and more generic mixed GPU/CPU chips it might not be so stupid.

elFarto
09-02-2011, 09:55 AM
I am wondering if that could be done on top of OpenCL.
Well, if they (NVIDIA and AMD) were to expose the rasteriser as an OpenCL extension you could probably get some good performance out of it, but attempting to do it without access to all the hardware is not really going to work well enough.

I had thought of using OpenCL as the base for a new API, but it seems like we'd just be adding more complexity to another API (and we've all seen what that gets you).

Regards
elFarto

glfreak
09-02-2011, 10:15 AM
Who cares about a couple of antique applications that use old-school OpenGL?

Trash the compatibility profile and rewrite the specification, straight to the metal...version 4.5?

Groovounet
09-02-2011, 11:21 AM
Yeah, Maya, 3Dsmax, Autocad... that's just "antique software", who cares about them?

...

malexander
09-02-2011, 02:34 PM
Who cares about a couple of antique applications that use old-school OpenGL?

The ARB should care. Many of these large applications that Groovounet mentioned helped carry OpenGL through its middle years. Simply dropping support for them would be a bad political move. It is unfortunate that these apps have created a lot of inertia, but I believe the fact that OpenGL is still around makes up for it :)

These apps tend to have huge amounts of GL code in them, and there are instances where "updating" to the core profile simply has no positive performance impact. Forcing core on them would just make for busy work, which doesn't make customers or marketing departments very happy.


Trash the compatibility profile and rewrite the specification, straight to the metal...version 4.5?

The compatibility profile is useful for migrating an older GL application to more modern OpenGL. It allows for a more gradual transition of GL code. Otherwise, you'd be stuck making the switch to core, wading through tons of old GL code and hoping it all works out in the end. For a new application, it would make more sense to start out using the core profile, though (or at least adhere to its tenets).

There is certainly no one telling driver developers that they must support the compatibility profile - it's optional. Developers that have had GL implementations for years will likely continue offering the Compatibility profile (AMD, Nvidia). Other developers have retired it, like Apple.

I would like to see DSA as part of the core profile only. Since it's a fairly major API shift, it might possibly be the only change in a new GL version. Those people using the compatibility profile already have all their code built around the old bind-to-modify paradigm, so I can't imagine it'd be a great loss not to have it in the compatibility profile.

If opaque handles truly improve performance by a decent margin, I say bite the bullet and reissue all the newer GL3/GL4 direct-access functions with opaque handles, and do it all at once when implementing DSA. I would also say deprecate the int-versions, but I guess deprecation is passe nowadays, so they'd likely have to stay too.

Alfonse Reinheart
09-02-2011, 02:48 PM
And I think opaque handles are essential for an efficient API, so the introduction of DSA to core can be a bigger step in this slow change process without a radical break of the API.

Except that this is a radical break. You would have two kinds of objects for everything.

Look, I would be very happy if we could just snap our fingers and have it be done. But since 3.0, the ARB has shown an unwillingness to add multiple APIs for the same task. And that's exactly what this would be.

DSA is hard enough to justify. Throwing opaque handles (aka: new object types) on top of that isn't going to make it easier.


We know using an int for the handle is bad; it causes a lot of cache misses in the worst possible place for the application (going by the bindless slides and the Brink SIGGRAPH slides). I don't know if an intptr for handles would solve the issue (wouldn't the driver still have to dereference that pointer to get the relevant information?)

Using a GLuint for the name of a buffer object isn't (exactly) the reason for the cache issue. The problem is that the GL driver does not know that the buffer object:

1: Exists.
2: Has Storage.
3: Isn't mapped.
4: Is on the GPU.

Therefore, it must verify all of these at either gl*Pointer time or glDraw* time. Changing from GLuint to an opaque handle that could be a pointer to a driver object only removes #1 (since it will start pointing at garbage if you delete the object). Even if you had immutable buffer objects, where creating them effectively did a glBufferData with a fixed size, that would only eliminate #2. The rest all still have to be done.

The thing that makes bindless fast is the locking of the buffer object. You're telling the GPU that "I'm not changing this. And throw an error whenever I do. Oh, and don't check to see if I walk off the end of the buffer." Yes, the fact that locking the buffer gives you a GPU pointer rather than converting a GLuint into an object and getting a GPU pointer from that helps certainly, but that's the icing on the cake, not the main course.

To get the lion's share of the advantages of bindless, you have to have some kind of locking API, where you are telling the driver that you will not change the buffer or anything, and thus the buffer becomes fixed and immobile.
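
For reference, the bindless path being described looks roughly like this; a sketch assuming NV_shader_buffer_load and NV_vertex_buffer_unified_memory are present, with buf, bufSize and vertexCount standing in for the application's own data:

/* Make the buffer resident: this is the "I'm not changing this" promise.
   While resident, the driver can hand out a fixed GPU address for it. */
glBindBuffer(GL_ARRAY_BUFFER, buf);
glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);

GLuint64EXT gpuAddr;
glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &gpuAddr);

/* At draw time, point attribute 0 straight at the GPU address: no name
   lookup, no "does it exist / is it mapped / is it resident" checks. */
glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
glEnableVertexAttribArray(0);
glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat));
glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0, gpuAddr, bufSize);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);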


I can only see OpenGL heading for increased complexity from now on.

Except that OpenGL has gotten simpler in recent years. It's not just deprecation, but moving data into the shader, separate shaders, etc. All of this has made writing GL applications simpler.

elFarto
09-02-2011, 03:30 PM
Using a GLuint for the name of a buffer object isn't (exactly) the reason for the cache issue. The problem is that the GL driver does not know that the buffer object:

1: Exists.
2: Has Storage.
3: Isn't mapped.
4: Is on the GPU.
The point I was working from was in the bindless presentation:


Most commands (Binds) make the driver fetch object state from sysmem.
The new bottleneck!
Hundreds of clocks per cache miss
Several Binds per Draw



I can only see OpenGL heading for increased complexity from now on.

Except that OpenGL has gotten simpler in recent years. It's not just deprecation, but moving data into the shader, separate shaders, etc. All of this has made writing GL applications simpler.
Perhaps I badly worded that. It has gotten simpler to use, but to do so they added more APIs in the process. The complexity I meant was not how hard it is to get stuff done, but how many things interact with each other in the API, and how hard that makes it to add future extensions.

Regards
elFarto

V-man
09-02-2011, 05:31 PM
OpenGL is not the kind of API that can go through a radical change.

They need to make a gaming API. Can we consider OpenGL ES a gaming API? Perhaps make a desktop version of it = OpenGL NES (Non-Embedded Systems) and only add the latest features (GL 4.x core).

Also, the beauty of GL ES is that it has the cross-platform EGL.

Alfonse Reinheart
09-02-2011, 05:52 PM
And what, we just forget about the mass of GL 3.x hardware out there?

V-man
09-03-2011, 06:56 AM
I would leave that up to nvidia and AMD and khronos and Apple. It might take them years to agree and then to get things done.

Actually, it would be better to have DSA in the next version of GL ES. They should work on that first.

kRogue
09-04-2011, 01:21 AM
They need to make a gaming API. Can we consider OpenGL ES a gaming API? Perhaps make a desktop version of it = OpenGL NES (Non-Embedded Systems) and only add the latest features (GL 4.x core).


I would not want OpenGL ES to take over the world. I am going to give an idea of how much OpenGL ES2 sucks (some of which I have complained about before):
- Texture image specification: the glTexImage/glTexSubImage commands in OpenGL ES (1 and 2) are awful. The internal format of the texture is essentially set by the 6th and 7th arguments together (format and type), and even that is only sort-of-ish.
- Clip planes: that *other* 3D API on mobile devices requires clip plane support. The OpenGL ES2 spec brilliantly decided that a developer would just emulate that with discard. Guess what: PowerVR SGX, likely the most widely used GPU, has hardware clipping, it's just not exposed in the ES2 implementation (but they also have a full OpenGL 2 implementation which *does* expose it, oh, and that *other* 3D API). Deliberately cutting out the opportunity for a hardware optimization for a classical 3D operation is idiocy. If the hardware has to do it via discard, so be it, and the GPU makers can warn developers about it too... oh wait, Imagination Technologies made an extension for ES1 (not ES2) for more clip plane support. For the record, discard on PowerVR SGX does horrible, awful things to performance.
- Read back from GL: in OpenGL ES (1 and 2) there is NO support for reading back texture or buffer object data. None. What is particularly offensive is that for almost all (if not all) embedded platforms the memory model is unified, which means that this support would be a no-brainer to have.
- Related to reading from GL: the OpenGL ES (1 and 2) specs do not support mapping of buffer objects. Worse, the extension for mapping said buffer objects is write-only and only for the entire buffer. No flags, no ranges, no read back. In a unified memory model, this is just *embarrassing*.
- The GLES2 shading language is horribly awful when it comes to the precision qualifier ordering fiasco. This I ran into very recently: "in mediump float v" as an argument declaration for a function is WRONG, it should be "mediump in float v"... I found this out when I got my hands on a TEGRA device. The verdict here is that because the spec is icky, the implementations have a harder time getting this right, so a developer gets to check what hardware and driver is present and then go to town. If the spec had been done right, it would have been simple and obvious what is correct in this case, or at least any order permutation would be allowed.
- Unextended GLES2 does not support the textureLod functions in a fragment shader. Never mind that all the hardware out there can do it. Fail. Worse, there is nothing in GLES2 to compute the LOD for you. Fail.


As for EGL, I have learned to hate that API more each day. As a reference, the most successful mobile platform that supports OpenGL ES (1 and 2), Apple's iOS, does NOT use EGL at all. They bypassed that fiasco. But here go my complaints about EGL:
- No core support for sharing GL data across process boundaries. As of now the only way to share image data across process boundaries is by making a pixmap and using a corresponding EGL extension to get an EGLImage from the pixmap. This path does not allow for sharing mipmaps or 1- or 2-channel texture data.
- The EGLConfig/EGLSurface bit-depth fiasco. You specify the bit depths in the EGLConfig, but for many windowing systems (like X11 [shudders]) those bit depths are specified by the XVisual used to make the window. Epic fail. Worse, if we really get into what the system will be doing: compositing. This is a combination of fails of X11 and EGL working together. All an application should have is a color buffer shared across process boundaries with the compositor, where the application writes to the buffer and the compositor presents it [there is a relatively simple almost lockless way to do this with 3 buffers per application]. This fail touches on X11 fail, so I cannot blame EGL entirely for it.
- eglGetProcAddress only gets those functions that are not core in the GLES version. Now this is epic ugly for the future. If we wish to use EGL to do the context creation jazz on the desktop, we have a big issue: we will check the GL version (for today, is it 3.x or 4.x?) and then go fetch function pointers... ah, but in EGL, eglGetProcAddress will not return those functions that are in the spec, only those of extensions. We get around this using direct loading of the function from the shared library... but this is just foolish.

One might say "So what? A game developer does not care..." But a system does. Roughly speaking, a user interface should be drawn with GL on an embedded platform, you get every possible win: higher performance and lower power consumption. In that regard sharing image data across process boundaries: fonts, themes, etc is really a good idea.

Alfonse Reinheart
09-04-2011, 02:14 AM
Unextended GLES2 does not support the textureLod functions in a fragment shader. Never mind that all the hardware out there can do it. Fail. Worse, there is nothing in GLES2 to compute the LOD for you. Fail.

To be fair, OpenGL didn't support the latter until GL 3.0. And it's not like there aren't bits of hardware that have required extensions under desktop GL to get at in the past.

I'd say the big problem with GL ES is that they haven't made a version 3.0 yet. They've been stuck on 2.0 and just patching things with extensions.


there is a relatively simple almost lockless way to do this with 3 buffers per application

And you want to do that on an embedded platform, where memory is already at a premium?


eglGetProcAddress only gets those functions that are not core in the GLES version. Now this is epic ugly for the future. If we wish to use EGL to do the context creation jazz on the desktop, we have a big issue: we will check the GL version (for today, is it 3.x or 4.x?) and then go fetch function pointers... ah, but in EGL, eglGetProcAddress will not return those functions that are in the spec, only those of extensions. We get around this using direct loading of the function from the shared library... but this is just foolish.

Well, wglGetProcAddress works exactly the same way. Besides, both that and the process issue could easily be cleared up with a couple of extensions. I don't think anyone is arguing that GL ES and EGL should be brought over exactly as is.

Granted, I don't care much for the idea of bringing GL ES over because it still has the same non-object-based problems that desktop GL has. Indeed, core OpenGL 3.2+ is basically GL ES on the desktop. Well, obviously with more stuff, but that stuff actually matters.

kRogue
09-04-2011, 04:07 AM
And you want to do that on an embedded platform, where memory is already at a premium?


Let's take a look at the numbers for a moment: firstly, the application needs to be double buffered. Secondly, it usually needs a depth and stencil buffer (and a 16-bit depth buffer is almost never a good idea). Thirdly, the color buffer is usually 16bpp (typically RGB565), and it is the only buffer that is double and/or triple buffered. Adding up:
Double buffered: 2*16bpp + 1*32bpp = 64bpp = 8 bytes/pixel
Triple buffered: 3*16bpp + 1*32bpp = 80bpp = 10 bytes/pixel
Not exactly a lot more relatively speaking. It gets better: such an application is NOT even fullscreen; it is often a small animated widget or "app". Resolutions of phones top out at eight-hundred-something by four-hundred-something, which is less than 400,000 pixels... so even at fullscreen (which is not the usual case, really) we are talking less than 800KB, and such gizmos often clock in with at least 256MB (and more often 512MB). For tablets, the typical resolutions are 1024x768 to 1280x800, which both hover around 1 million pixels... the absolute worst case scenario is an extra 2MB... but it really is much, much less, since such triple buffering is only needed for windowed apps. But wait, it gets better! An application can choose to render at a lower resolution and have the compositor display at a larger resolution (it will get a touch blurry). A number of iPhone apps running on the iPad act this way.



Well, wglGetProcAddress works exactly the same way. Besides, both that and the process issue could easily be cleared up with a couple of extensions. I don't think anyone is arguing that GL ES and EGL should be brought over exactly as is.


That wglGetProcAddress does the exact same thing is not an excuse. Actually, it is worse for EGL, because the wgl jazz considers any GL function not in opengl32.lib (the .lib, not the .dll) to be an extension function; thus what functions are fetchable via wglGetProcAddress is set in stone, independent of the version of the GL context. The nature of EGL does not allow that luxury.

Additionally, when integrating a system, EGL is a burden to implement, and if not done properly it can leave a major headache (in truth the larger headaches come from X11). If we look at what is needed:

1: Allocate buffers to draw to (i.e. surfaces)
2: Allocate and bind a GL context(s)
3: Get function pointers

That is all that is needed. EGL, in trying to be general, makes such simple operations unnecessarily painful. It also unnecessarily hides the fact that for (1) we are really talking about a global memory manager, and all that is needed is cross-process texture support to make everyone happy. Nothing more. In that regard, the GL API _already_ has that in texture image specification (or, if you wish, renderbuffer storage).
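
For comparison, the boilerplate in question, trimmed to the essentials; a sketch assuming an ES2 context, an existing native window handle (native_window), and no error checking:

EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
eglInitialize(dpy, NULL, NULL);

const EGLint cfg_attribs[] = {
    EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
    EGL_RED_SIZE, 5, EGL_GREEN_SIZE, 6, EGL_BLUE_SIZE, 5,
    EGL_DEPTH_SIZE, 24,
    EGL_NONE
};
EGLConfig cfg;
EGLint num_cfg;
eglChooseConfig(dpy, cfg_attribs, &cfg, 1, &num_cfg);

/* (1) the buffers to draw to */
EGLSurface surf = eglCreateWindowSurface(dpy, cfg, native_window, NULL);

/* (2) the context, bound to those buffers */
const EGLint ctx_attribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
EGLContext ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, ctx_attribs);
eglMakeCurrent(dpy, surf, surf, ctx);

/* (3) function pointers: extension functions only, per the complaint above */
PFNGLMAPBUFFEROESPROC glMapBufferOES =
    (PFNGLMAPBUFFEROESPROC) eglGetProcAddress("glMapBufferOES");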

aqnuep
09-04-2011, 05:39 AM
Double buffered: 2*16bpp + 1*32bpp = 64bpp = 8 bytes/pixel
Triple buffered: 3*16bpp + 1*32bpp = 80bpp = 10 bytes/pixel

I don't quite understand this. If you meant 16bpp for depth and 32bpp for color, then you are wrong, as double buffered mode also has two color buffers and triple buffered mode has three color buffers, so it should rather be:
Double buffered: 2*16bpp + 2*32bpp = 96bpp = 12 bytes/pixel
Triple buffered: 3*16bpp + 3*32bpp = 144bpp = 18 bytes/pixel

kRogue
09-04-2011, 06:13 AM
The depth and stencil buffers are single buffered; that is why in both the double and triple buffered situations there is only one 32bpp. In practice this is also true. Indeed:

A compositor does not use the depth/stencil buffers of an application. In fact, it does not even have access to those buffers.
GL ES2 does NOT support read back of stencil and depth buffers.

So in practice, only the color buffer is doubled (or tripled or whatever). The key point here is that the stencil and depth buffers are "private" data of an application, whereas double or triple (or whatever) buffering is about what to render to so that a compositor (or a more low-level part) can present it.

As a side note, even if one really needs read back from the stencil and depth buffers of the previous and current frames separately, triple buffering _only_ happens on the color buffer, so the relative added memory is even lower as a percentage. The idea behind triple buffering the color buffer is that neither the compositor nor the application ever has to wait: the application chooses whichever buffer is "not the last one rendered to and not the buffer the compositor is presenting", and the compositor always presents the last buffer finished. The only locks are for updating those pointers/integers.
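
A rough sketch of that selection rule in plain C; the struct layout and the locking around the two indices are my own simplification, not any particular compositor's code:

/* Shared between application and compositor (e.g. in shared memory).
   Only these two small integers are ever updated under a lock. */
struct shared_swapchain {
    int last_completed;   /* buffer the application finished most recently */
    int presenting;       /* buffer the compositor is currently showing */
    /* ...plus the three color buffers themselves, owned by the memory manager */
};

/* Application side: pick a buffer that is neither being presented nor the
   last one completed, render into it, then publish it as last_completed. */
static int pick_render_buffer(const struct shared_swapchain *s)
{
    for (int i = 0; i < 3; ++i)
        if (i != s->last_completed && i != s->presenting)
            return i;
    return 0; /* unreachable with three buffers */
}

/* Compositor side: always present whatever the application finished last. */
static int pick_present_buffer(const struct shared_swapchain *s)
{
    return s->last_completed;
}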

Alfonse Reinheart
09-04-2011, 06:54 AM
Not exactly a lot more relatively speaking.

So you want to increase the overhead of simply drawing something by 20%. And even more so if you take the bold and unorthodox step of wanting 32-bit color depth. And how does antialiasing fit into that?

All to make initializing your application and buffer swapping slightly easier? Is that really worth it?


An application can choose to render at a lower resolution and have the compositor display at a larger resolution (it will get a touch blurry). A number of iPhone apps running on the iPad act this way.

Are you really saying that developers should compensate for the increased memory overhead of having a slightly easier to use API by making their apps look worse? This is a good idea?


It also unnecessarily hides the fact that for (1) we are really talking about a global memory manager, and all that is needed is cross-process texture support to make everyone happy.

But that assumes that textures and render targets are the same thing. They don't have to be. Especially if there is the issue of swizzling involved in texture data storage.

I don't see a need to force them to be the same concept. Why should everyone who implements EGL be forced to run their OS's, compositors, and window managers exactly as you want? You'd basically be saying that anyone with code that doesn't work this way doesn't get to use your API.

And that's not a place one should put themselves. OpenGL ES and EGL live on more than just smartphones and tablets. If going through a bit of boilerplate annoyance code means that other kinds of hardware can use those same APIs, why is that a bad thing? Should there be a special API for every single device? Is that somehow better for a programmer's life?

V-man
09-04-2011, 07:02 AM
Texture Image specification: the glTexImage/glTexSubImage commands in OpenGL ES (1 and 2) are awful. The internal format of the texture is essentially set by the 6'th and 7th arguments together (format and type), and even that is only sort-of-ish.


Personally, I have not understood GL's glTexImageXD's internal and external format thing. Example : Why can you upload a floating point texture and have GL convert it to whatever you want?

From what I understand, GL ES's glTexImage2D has the same parameter list as GL but you can only upload texture formats supported by the graphics card. GL ES doesn't do any conversions for you.

Is that good or is that bad?
The more important question is why did they design GL 1.0 that way?
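
A concrete instance of the conversion being discussed, assuming a texture is bound to GL_TEXTURE_2D on desktop GL:

/* The caller hands in 32-bit floats but asks the driver to store the
   texture as 8-bit RGBA; the conversion happens inside the driver. */
float pixels[64 * 64 * 4];   /* ...filled with color data... */
glTexImage2D(GL_TEXTURE_2D, 0,
             GL_RGBA8,               /* internal format: what the texture stores */
             64, 64, 0,
             GL_RGBA, GL_FLOAT,      /* format/type: what the caller provides */
             pixels);

/* Unextended GLES2 has no sized internal formats: the internalformat
   argument must match format, and no such conversion is available. */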


One might say "So what? A game developer does not care..." But a system does. Roughly speaking, a user interface should be drawn with GL on an embedded platform, you get every possible win: higher performance and lower power consumption. In that regard sharing image data across process boundaries: fonts, themes, etc is really a good idea.


What do you mean by "a system does"?
Lower power consumption? Perhaps you are talking about embedded systems. I was talking about a gaming API for desktops. Also, I am not saying bring "GL ES" as is to the desktop.

kRogue
09-04-2011, 07:43 AM
So you want to increase the overhead of simply drawing something by 20%. And even more so if you take the bold and unorthodox step of wanting 32-bit color depth. And how does antialiasing fit into that?

All to make initializing your application and buffer swapping slightly easier? Is that really worth it?


Being almost lockless can be worth it. Additionally, the vast majority of embedded systems implement antialiasing at render time. To be precise: most of the embedded systems are tile-based renderers where the rasterization takes place on the GPU in SRAM. Once a tile is completed, that SRAM is blitted to the buffer target. For these systems the AA resolve happens at that blit (roughly speaking). Going a step further, the color buffer can be AA for immediate-mode renderers too (such as Tegra), with the knowledge that each color buffer is now 4-16 times as big; however, this is likely a _bad_ thing. The wisest thing would be for the color buffers to not be AA and for the application to do the AA resolve itself via a GL call. Also, keep in mind that very few (if any) of the GPUs on embedded systems can even dream of handling AA rendering of large buffers, or even intermediate-sized buffers (the exception is ARM Mali, where 4xAA is "free", but it is NOT to an AA buffer; the resolve happens at the copy from SRAM to the buffer target).



But that assumes that textures and render targets are the same thing. They don't have to be. Especially if there is the issue of swizzling involved in texture data storage.


What I am advocating does not at all assume they are the same thing. What I am advocating is allowing textures and renderbuffers to be shareable across process boundaries. That makes no assumption about anything intrinsic to either of them. The functionality to use them is already part of a GL implementation.



I don't see a need to force them to be the same concept. Why should everyone who implements EGL be forced to run their OS's, compositors, and window managers exactly as you want? You'd basically be saying that anyone with code that doesn't work this way doesn't get to use your API.

..
Are you really saying that developers should compensate for the increased memory overhead of having a slightly easier to use API by making their apps look worse? This is a good idea?





I was not advocating that it MUST be a triple buffered approach. Rather, I am advocating viewing it as allocating buffers that are shared across process boundaries. The compositor and application agree on how. The triple buffering system is just one example. Another example could be single buffered, where the compositor waits for the application to finish a frame or even just uses it as is. The example of an app rendering at a lower resolution than the presentation is another. Right now with EGL, with the API as is, it is single or double buffered at the same resolution as the presentation. As is, there is a difference between a surface and a pixmap. As is, for a compositor to work requires a number of odd things from a windowing system and GL implementation which, if not done *perfectly*, will kill a system's performance. On the other hand, fessing up and saying it's about buffers (or really textures one renders to, and renderbuffers), the whole jazz gets simpler for everyone: the system integrators, the application writers AND the driver makers. On top of that, an application developer gets precise control too, via an API they are already using.




Personally, I have not understood GL's glTexImageXD's internal and external format thing. Example : Why can you upload a floating point texture and have GL convert it to whatever you want?


This was great in my eyes: a driver could choose to change the format if that helped it perform; back in the days of the fixed function pipeline, this was OK and cool. Now we have sized internal formats too, so we can potentially get the best of both worlds: precise control and/or letting the driver decide. I freely admit the latter is not very popular or used often, methinks. The current GLES2 texture API basically killed this off: the understanding is that the format you feed the data into GL is the format of the texture, but you cannot specify the texture internal format to guarantee what that format is.

As for converting data, my bet is that those who write drivers are likely going to do a much better job than most application developers. Additionally, a number of SoCs have dedicated silicon for blitting and converting formats. There is one place where I'd really rather let the driver do format conversion: to or from 16-bit floating point formats. There are open source implementations out there, but really, who do you think is going to do a better job? My bet is on those who implement GL on the system, especially in the embedded world where dedicated silicon for bits of functionality is the rule.



What do you mean by "a system does"?

Make a phone or tablet: you need a compositor, you need theme management, you likely want a browser. Each of these shares resources. The current model is that if you render the UI with GL, you need to replicate common data: fonts, images of themes, etc. That is insane. Throw in safe video decode and the situation gets progressively worse.




I was talking about a gaming API for desktops. Also, I am not saying bring "GL ES" as is to the desktop.


Sorry for jumping the gun :p ... though I always figured the GL3/4 core profile was meant to be that game-maker-centric API.

aqnuep
09-04-2011, 09:11 AM
The depth and stencil buffers are single buffered, that is why in both the double and triple buffered situations there is only one 32bpp.

Ah, actually I was thinking the other way around. I thought the color buffer was the one you referred to as 32bpp, but yes, you're right, only the color buffer is required to have multiple buffers for double and triple buffering. However, isn't it the case that today's devices, like iPhones and modern Android devices, use a 32bpp color buffer?

kRogue
09-04-2011, 09:44 AM
Ah, actually I was thinking the other way around. I thought the color buffer was the one you referred to as 32bpp, but yes, you're right, only the color buffer is required to have multiple buffers for double and triple buffering. However, isn't it the case that today's devices, like iPhones and modern Android devices, use a 32bpp color buffer?

32bpp in the portable world... generally speaking, that is for the high end (as of now). Indeed, the "color resolution" on many LCD screens is not exactly 24-bit :eek: . As someone working in system integration, the vast majority of the time it is RGB565, but that is for cheaper devices. For higher-end devices we do/can see 24-bit, and for such devices there are occasions where RGBA8 runs faster than RGB565 (!).

For the Apple platform it was 18-bit, then 18-bit plus dithering, and now it is at 24-bit. Take a gander at: http://www.edepot.com/iphone.html



Display Color depth

The number of bits used on the early iPhones to display a single pixel of color is 18 bits, with 6 bits used for each of the Red, Green, and Blue primary colors. 18 bits can provide a maximum of 262,144 colors (2^18).

Note that the standard on PC displays is True Color, using 8 bits for each of the primary colors, for a total of 24 bits per pixel. 24 bits can provide a maximum of 16,777,216 colors (2^24). The iPhone is outclassed by other mobile devices like the PlayStation Portable (PSP), which does use a 24 bit LCD display. The early iPhones are using very cheap LCD solutions to keep costs down.


The iPhone 3GS uses 18 bits plus hardware dithering. What this means is that compared to the iPhone 3G, it is still limited to 262,144 colors, but the iPhone 3GS has hardware that will try to place closely colored values in a pattern to "simulate" the intermediate value that it can't display directly. This will make the display "seem" to be able to display 24 bit True Color, when actually it can't. Starting with the iPhone 4 and the iPad, the display finally has the same quality of 24-bits per pixel seen on PC displays.

V-man
09-04-2011, 09:55 AM
This was great in my eyes: a driver could choose to change the format if that helped it perform; back in the days of the fixed function pipeline, this was OK and cool. Now we have sized internal formats too, so we can potentially get the best of both worlds: precise control and/or letting the driver decide. I freely admit the latter is not very popular or used often, methinks. The current GLES2 texture API basically killed this off: the understanding is that the format you feed the data into GL is the format of the texture, but you cannot specify the texture internal format to guarantee what that format is.

As for converting data, my bet is that those who write drivers are likely going to do a much better job than most application developers. Additionally, a number of SoCs have dedicated silicon for blitting and converting formats. There is one place where I'd really rather let the driver do format conversion: to or from 16-bit floating point formats. There are open source implementations out there, but really, who do you think is going to do a better job? My bet is on those who implement GL on the system, especially in the embedded world where dedicated silicon for bits of functionality is the rule.


You can upload floating point textures and have GL convert them to RGBA8888.
Or you can upload an RGBA texture and have it stored as GL_ALPHA.
And whatever crazy thing you want to do.
That doesn't make any sense. That is a job for FreeImage or the DirectX Utility or whatever 3rd party library.

D3D's approach was the better one. Expose what the hardware supports and you can query D3D to find out what the hardware supports. There is no guesswork.


GL is the format of the texture but you cannot specify the texture internal format to guarantee what that format is.

Sure, people want guarantees. I want to know what the hardware can do so that I can feed it the data it needs and get the result that I expect.

kRogue
09-04-2011, 10:01 AM
That is a job for FreeImage or the DirectX Utility or whatever 3rd party library.


And there is the rub with which GL has always had to deal: there is no GL utility to really do it for you (in the DX case, Microsoft wrote the DX Utility, encourages you to use it, and you can make a good bet they have optimized it). Current GL allows you to have precise control or to let the driver decide (but GLES does not really let you have either!). For me, I'd trust the driver writers more, and in particular, in the embedded world there is likely a piece of silicon that might do the job faster than having the CPU do it... also, even if you do the conversion on the CPU, in the embedded world the varieties of the ARM processor are staggering, so getting it highly optimized is not so easy... but if the system provides it, then you can make a good bet that it will do the "right"/"fastest" thing for the conversion.

My 2 cents/opinion

V-man
09-04-2011, 05:07 PM
Well, ARM processors and mobiles and all that are a different beast and I don't have any suggestions for them.

My suggestion was for a gaming API for the desktop (Windows/Linux/Mac). Don't tell me khronos can't handle it. They have been making several major APIs over the years.

V-man
09-08-2011, 06:54 AM
The core profile's advantages at sucking:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=302389&page=1

xahir
05-02-2012, 06:12 AM
From EXT_direct_state_access (For TextureImageEXT etc.):

If the texture parameter is for an unused name, the name becomes used and the named texture object is set to a new state vector, comprising all the state values listed in...
So, even if the given texture handle (or name) was not generated by the GenTextures function, it is recognized as a texture!? Or maybe I am getting confused.


If the texture parameter is for a used name and that named texture object has a different target than the specified target parameter, the INVALID_OPERATION error is generated
If the texture referred to by the given handle won't be updated when the given target doesn't match, why would I be forced to remember the target of the texture all the time? I mean, instead of a single uint, I now need to hold on to an integer target property too.

Oh, the answer is a few lines above:

If the texture parameter is zero, then the target parameter selects the default texture of the specified target to update
Hmm, so the target is useful if I pass a zero handle. OK, good for proxy textures. But using a proxy texture target with a handle other than zero also generates an error.

Well, I'd go for two different sets of commands: one set taking only a handle for my textures :) and another set for the default textures, taking only the target parameter. Then what would be the target of my textures?

myTextureHandle = glCreateTexture(target, internalFormat, size, ...);
Just a create-object scheme, not so different from shader creation, plus getting texture_storage into play. Nothing else.
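
Spelled out a little further, purely as a hypothetical sketch; neither glCreateTexture with this signature nor these suffix-less update calls exist as written, it just combines the create-with-type idea with ARB_texture_storage-style immutable allocation:

/* Hypothetical: creation fixes the target and allocates immutable storage,
   so neither has to be re-specified (or re-validated) later. */
GLuint tex = glCreateTexture(GL_TEXTURE_2D,      /* target fixed at creation */
                             GL_RGBA8,           /* immutable internal format */
                             1, 256, 256);       /* levels, width, height */

/* Afterwards only the handle is needed, exactly like shader objects: */
glTextureSubImage2D(tex, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
glTextureParameteri(tex, GL_TEXTURE_MIN_FILTER, GL_LINEAR);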

kyle_
05-02-2012, 02:25 PM
From EXT_direct_state_access (For TextureImageEXT etc.):

So, even if the given texture handle (or name) was not generated by the GenTextures function, it is recognized as a texture!? Or maybe I am getting confused.

Your confusion arises from the fact that the extension is written against the GL 2.1 spec, where this was true. It continues to be true for some object types in the compatibility profile in newer GL versions. In the core profile you always have to Gen all your objects.



If the texture referred to by the given handle won't be updated when the given target doesn't match, why would I be forced to remember the target of the texture all the time? I mean, instead of a single uint, I now need to hold on to an integer target property too.

Oh, the answer is a few lines above:

Hmm, so the target is useful if I pass a zero handle. OK, good for proxy textures. But using a proxy texture target with a handle other than zero also generates an error.

Well, I'd go for two different sets of commands: one set taking only a handle for my textures :) and another set for the default textures, taking only the target parameter. Then what would be the target of my textures?


A rewrite from scratch would probably avoid targets, as they are useless in most cases other than broken cube map uploads, which could be avoided in such a case. Texture proxies aren't a particularly good reason for API complication, as they are pretty much useless.



Just a create-object scheme, not so different from shader creation, plus getting texture_storage into play. Nothing else.

A spark of brilliance that was shyly followed with sync objects, but abandoned ever since. The 'Gen' stuff is silly, really. I have never seen an actual application that gens names en masse (which was probably a perceived perf benefit by the API designers). Probably there was much to be gained there in the times of SGI (though I find it hard to believe). Right now it's just a relic of the past kept in the API for compatibility reasons (there are a couple of things like that in GL).

To recap, you need to take history into account when trying to make sense of GL API. Otherwise you may be seriously surprised here and there.

xahir
05-08-2012, 05:35 AM
...that was shyly followed with sync objects...
What I actually meant was shader objects: you give the type and create it, and then use that object when you need it, via attaching it to a program etc. Sync objects are a bit weird, actually. You use them at the same time as creation, which is a bit awkward. According to Appendix D of the spec, sync objects are cross-context shareable, but if you want to wait on a sync object, you must actually have inserted it into the pipeline first. Before the ARB version (I don't remember whether it was actually APPLE or NV), it worked like this: you created the object, then the insertion/waiting took place. With the current way of doing fences, the second thread should also wait on the creation, to be more precise. Maybe I am still confused here :)

Anyway, back to DSA: when I saw the matrix modes etc. I was a bit "hmm" about the weirdness, but it being written against 2.1 makes a lot of sense now :) To be honest, since DSA is an object creation/referencing scheme instead of the older state machine paradigm, it is highly appreciated by OOP users and DX migrators. In that sense this extension is much more useful for core-profile users than for compatibility users, since the latter have a very large codebase depending on old features that is not going to change in the near future. If they are not willing to change, why not serve such beauty to the core profile instead? :) I believe the long-time users/supporters of OpenGL will migrate to the core profile when they are in need of change.

kyle_
05-08-2012, 06:53 AM
What I actually meant was shader objects: you give the type and create it.
Well, I didn't mean all the usage stuff of sync objects, just that their creation model is optimal and is what should have been done with the rest of the objects (and probably would have been, if it were still possible by now): namely, create a _single_ object and return a _pointer_ to it. The 'names' stuff is pretty pointless.
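
Side by side, the two creation models being contrasted; these are standard GL calls, with the sync object as the one case that already works the "right" way:

/* 'Gen' model: the driver hands back small integer names, and the object
   behind a name may not even come to life until it is first bound. */
GLuint textures[4];
glGenTextures(4, textures);
glBindTexture(GL_TEXTURE_2D, textures[0]);

/* Sync-object model: one call creates one fully-formed object and returns
   an opaque handle (GLsync is a pointer type); no name table involved. */
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000 /* ns */);
glDeleteSync(fence);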