Thread: Direct State Access

  1. #21
    Junior Member Regular Contributor malexander's Avatar
    Join Date
    Aug 2009
    Location
    Ontario
    Posts
    246

    Re: Direct State Access

    Who cares about a couple of antique pieces of software that use the old-school OpenGL?
    The ARB should care. Many of these large applications that Groovounet mentioned helped carry OpenGL through its middle years. Simply dropping support for them would be a bad political move. It is unfortunate that these apps have created a lot of inertia, but I believe the fact that OpenGL is still around makes up for it.

    These apps tend to have huge amounts of GL code in them, and there are instances where "updating" to the core profile simply has no positive performance impact. Forcing core on them would just make for busy work, which doesn't make customers or marketing departments very happy.

    Trash the compatibility profile and rewrite the specification, straight to the metal...version 4.5?
    The compatibility profile is useful for migrating an older GL application to more modern OpenGL. It allows for a more gradual transition of GL code. Otherwise, you'd be stuck making the switch to core, wading through tons of old GL code and hoping it all works out in the end. For a new application, it would make more sense to start out using the core profile, though (or at least adhere to its tenets).

    There is certainly no one telling driver developers that they must support the compatibility profile - it's optional. Developers that have had GL implementations for years will likely continue offering the Compatibility profile (AMD, Nvidia). Other developers have retired it, like Apple.

    I would like to see DSA as part of the core profile only. Since it's a fairly major API shift, it might possibly be the only change in a new GL version. Those people using the compatibility profile already have all their code built around the old bind-to-modify paradigm, so I can't imagine it'd be a great loss to not have in the compatibility profile.
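
    For readers who haven't used both styles, here is a minimal sketch of the difference between bind-to-modify and direct state access. It is a toy model, not real OpenGL: `ToyContext` and every `toy_*` function are hypothetical names invented for illustration. The real analogues would be something like glBindBuffer followed by glBufferData on one side and glNamedBufferDataEXT from EXT_direct_state_access on the other.

```c
#include <stddef.h>

/* Hypothetical toy "context": not real OpenGL, just enough state to show
   the difference between the two editing styles. */
enum { MAX_BUFFERS = 8 };

typedef struct {
    size_t   size[MAX_BUFFERS]; /* per-object state, indexed by integer name */
    unsigned bound;             /* the single "current buffer" binding point */
} ToyContext;

/* Bind-to-modify: edits always go through the binding point. */
static void toy_bind(ToyContext *ctx, unsigned name) { ctx->bound = name; }
static void toy_data(ToyContext *ctx, size_t bytes)  { ctx->size[ctx->bound] = bytes; }

/* Direct state access: the object to edit is named in the call itself. */
static void toy_named_data(ToyContext *ctx, unsigned name, size_t bytes) {
    ctx->size[name] = bytes;
}

/* Resize buffer `name` the old way; returns the binding left behind. */
static unsigned resize_with_bind(ToyContext *ctx, unsigned name, size_t bytes) {
    toy_bind(ctx, name);   /* side effect: replaces whatever was bound before */
    toy_data(ctx, bytes);
    return ctx->bound;
}

/* Resize buffer `name` the DSA way; the binding point is untouched. */
static unsigned resize_with_dsa(ToyContext *ctx, unsigned name, size_t bytes) {
    toy_named_data(ctx, name, bytes);
    return ctx->bound;
}
```

    If buffer 3 is bound beforehand, `resize_with_dsa(&ctx, 5, 1024)` leaves 3 bound, while `resize_with_bind(&ctx, 6, 2048)` leaves 6 bound, so the caller has to save and restore the old binding itself.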

    If opaque handles truly improve performance by a decent margin, I say bite the bullet and reissue all the newer GL3/GL4 direct-access functions with opaque handles, and do it all at once when implementing DSA. I would also say deprecate the int-versions, but I guess deprecation is passé nowadays, so they'd likely have to stay too.

  2. #22
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,716

    Re: Direct State Access

    And I think opaque handles are essential for an efficient API, so the introduction of DSA to core can be a bigger step in this slow change process without a radical break of the API.
    Except that this is a radical break. You would have two kinds of objects for everything.

    Look, I would be very happy if we could just snap our fingers and have it be done. But since 3.0, the ARB has shown an unwillingness to add multiple APIs for the same task. And that's exactly what this would be.

    DSA is hard enough to justify. Throwing opaque handles (aka: new object types) on top of that isn't going to make it easier.

    We know using an int for the handle is bad: it causes a lot of cache misses in the worst possible place for the application (going by the bindless slides, and the Brink SIGGRAPH slides). I don't know if an intptr for handles would solve the issue (wouldn't the driver still have to dereference that pointer to get the relevant information?)
    Using a GLuint for the name of a buffer object isn't (exactly) the reason for the cache issue. The problem is that the GL driver does not know that the buffer object:

    1: Exists.
    2: Has Storage.
    3: Isn't mapped.
    4: Is on the GPU.

    Therefore, it must verify all of these at either gl*Pointer time or glDraw* time. Changing from GLuint to an opaque handle that could be a pointer to a driver object only removes #1 (since it will start pointing at garbage if you delete the object). Even if you had immutable buffer objects, where creating them effectively did a glBufferData with a fixed size, that would only eliminate #2. The rest all still have to be done.

    The thing that makes bindless fast is the locking of the buffer object. You're telling the GPU that "I'm not changing this. And throw an error whenever I do. Oh, and don't check to see if I walk off the end of the buffer." Yes, the fact that locking the buffer gives you a GPU pointer rather than converting a GLuint into an object and getting a GPU pointer from that helps certainly, but that's the icing on the cake, not the main course.

    To get the lion's share of the advantages of bindless, you have to have some kind of locking API, where you are telling the driver that you will not change the buffer or anything, and thus the buffer becomes fixed and immobile.
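
    The argument above can be put into a small counting sketch. This is a toy model under stated assumptions, not how any real driver is written: `ToyBuffer`, `checks_at_draw` and `toy_lock` are hypothetical names. It only counts how many of the four checks listed above a draw call would still have to make for an integer name, for a pointer-style handle, and for a locked buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver-side record for one buffer object. */
typedef struct {
    bool exists;       /* 1: the name refers to a live object      */
    bool has_storage;  /* 2: storage has been allocated            */
    bool mapped;       /* 3: currently mapped by the application   */
    bool resident;     /* 4: the data is in GPU memory             */
    bool locked;       /* the application promised not to change it */
} ToyBuffer;

typedef enum { BY_NAME, BY_POINTER } HandleKind;

/* Returns how many of the four checks a draw call still performs,
   or -1 if one of them fails. */
static int checks_at_draw(const ToyBuffer *buf, HandleKind kind) {
    int checks = 0;
    if (buf->locked) return 0;        /* state is frozen, nothing to re-verify */
    if (kind == BY_NAME) {            /* a pointer handle skips only check 1 */
        checks++;
        if (!buf->exists) return -1;
    }
    checks++;
    if (!buf->has_storage) return -1;
    checks++;
    if (buf->mapped) return -1;
    checks++;
    if (!buf->resident) return -1;
    return checks;
}

/* Locking validates once, up front. Returns false if the buffer is unusable. */
static bool toy_lock(ToyBuffer *buf) {
    if (checks_at_draw(buf, BY_NAME) < 0) return false;
    buf->locked = true;
    return true;
}
```

    In this model a valid buffer costs 4 checks per draw by name, 3 by pointer handle, and 0 once locked, which is the "icing versus main course" point in numbers.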

    I can only see OpenGL heading for increased complexity from now on.
    Except that OpenGL has gotten simpler in recent years. It's not just deprecation, but moving data into the shader, separate shaders, etc. All of this has made writing GL applications simpler.

  3. #23
    Junior Member Regular Contributor
    Join Date
    Aug 2006
    Posts
    206

    Re: Direct State Access

    Quote Originally Posted by Alfonse Reinheart
    Using a GLuint for the name of a buffer object isn't (exactly) the reason for the cache issue. The problem is that the GL driver does not know that the buffer object:

    1: Exists.
    2: Has Storage.
    3: Isn't mapped.
    4: Is on the GPU.
    The point I was working from was in the bindless presentation:

    Most commands (Binds) make the driver fetch object state from sysmem.
    • The new bottleneck!
    • Hundreds of clocks per cache miss
    • Several Binds per Draw
    Quote Originally Posted by Alfonse Reinheart
    I can only see OpenGL heading for increased complexity from now on.
    Except that OpenGL has gotten simpler in recent years. It's not just deprecation, but moving data into the shader, separate shaders, etc. All of this has made writing GL applications simpler.
    Perhaps I badly worded that. It has gotten simpler to use, but to do so they added more APIs in the process. The complexity I meant was not how hard it is to get stuff done, but how many things interact with each other in the API, and how hard that makes it to add future extensions.

    Regards
    elFarto

  4. #24
    Super Moderator OpenGL Guru
    Join Date
    Feb 2000
    Location
    Montreal, Canada
    Posts
    4,421

    Re: Direct State Access

    OpenGL is not the kind of API that can go through a radical change.

    They need to make a gaming API. Can we consider OpenGL ES a gaming API? Perhaps make a desktop version of it = OpenGL NES (Non-Embedded Systems) and only add the latest features (GL 4.x core).

    Also, the beauty of GL ES is that it has the cross-platform EGL.
    ------------------------------
    Sig: http://glhlib.sourceforge.net
    an open source GLU replacement library. Much more modern than GLU.
    float matrix[16], inverse_matrix[16];
    glhLoadIdentityf2(matrix);
    glhTranslatef2(matrix, 0.0, 0.0, 5.0);
    glhRotateAboutXf2(matrix, angleInRadians);
    glhScalef2(matrix, 1.0, 1.0, -1.0);
    glhQuickInvertMatrixf2(matrix, inverse_matrix);
    glUniformMatrix4fv(uniformLocation1, 1, FALSE, matrix);
    glUniformMatrix4fv(uniformLocation2, 1, FALSE, inverse_matrix);

  5. #25
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,716

    Re: Direct State Access

    And what, we just forget about the mass of GL 3.x hardware out there?

  6. #26
    Super Moderator OpenGL Guru
    Join Date
    Feb 2000
    Location
    Montreal, Canada
    Posts
    4,421

    Re: Direct State Access

    I would leave that up to nvidia and AMD and khronos and Apple. It might take them years to agree and then to get things done.

    Actually, it would be better to have DSA in the next version of GL ES. They should work on that first.

  7. #27
    Advanced Member Frequent Contributor
    Join Date
    Apr 2009
    Posts
    529

    Re: Direct State Access

    They need to make a gaming API. Can we consider OpenGL ES a gaming API? Perhaps make a desktop version of it = OpenGL NES (Non-Embedded Systems) and only add the latest features (GL 4.x core).
    I would not want OpenGL ES to take over the world. I am going to give an idea of how much OpenGL ES2 sucks (some of which I have complained about before):
    • Texture Image specification: the glTexImage/glTexSubImage commands in OpenGL ES (1 and 2) are awful. The internal format of the texture is essentially set by the 7th and 8th arguments together (format and type), and even that is only sort-of-ish.
    • ClipPlanes: That *other* 3D API on mobile devices requires clip planes support. The OpenGL ES2 spec brilliantly thought that a developer would just emulate that with discard. Guess what, PowerVR SGX, likely the most widely used GPU, has hardware clipping, just not exposed in the ES2 implementation (but they also have a full OpenGL 2 implementation which *does*, oh and that *other* 3D API). Deliberately cutting out the opportunity for a hardware optimization for a classical 3D operation is idiocy. If the hardware platform has to do it via discard, so be it and the GPU makers can warn the developers of it too... oh wait, Imagination Technologies made an extension for ES1 (not ES2) for more clip plane support. For the record, discard on PowerVR SGX does horrible awful things to performance.
    • Read back from GL: In OpenGL ES (1 and 2) there is NO support for reading back texture or buffer object data. None. What is particularly offensive is that for almost all (if not all) embedded platforms, the memory model is unified which means that this support would be a no brainer to have.
    • Related to read from GL: The OpenGL ES (1 and 2) specs do not support mapping of buffer objects. Worse, the extension for mapping said buffer objects is write only and only the entire thing. No flags, no ranges, no read back. When in a unified memory model, this is just *embarrassing*.
    • The GLES2 shading language is horribly awful when it comes to the precision qualifier ordering fiasco. This I ran into very recently: "in mediump float v" as an argument declaration for a function is WRONG, it should be "mediump in float v"... I found this out when I got my hands on a TEGRA device. The verdict here is that now because the spec is icky, the implementations have a harder time getting this right, so a developer gets to deal with checking what hardware and driver and then going to town. If the spec had been done right, it would have been simple and obvious what would be right in this case, or at least allow for any order permutation.
    • Unextended GLES2 does not support textureLod functions in a fragment shader. Never mind that all the hardware out there can do it. Fail. Worse, there is nothing for GLES2 to compute the LOD for you. Fail.



    As for EGL, I have learned to hate that API more each day. As a reference, the most successful mobile platform that supports OpenGL ES (1 and 2), Apple's iOS, does NOT use EGL at all. They bypassed that fiasco. But here go my complaints about EGL:
    • No core support for sharing GL data across process boundaries. As of now the only way to share image data across process boundaries is by making a pixmap and using a corresponding EGL extension to get an EGLImage from the pixmap. This path does not allow for sharing mipmaps or 1 or 2 channel texture data.
    • The EGLConfig/EGLSurface bit depths fiasco. You specify the bit depths in the EGLConfig, but, for many windowing systems (like X11[shudders]) those bit depths are specified by the XVisual used to make the window. Epic fail. Worse, if we really get into what the system will be doing: compositing. This is a combination of fails of X11 and EGL working together. All an application should have is a color buffer shared across process boundaries with the compositor where the application writes to the buffer and the compositor presents it [there is a relatively simple almost lockless way to do this with 3 buffers per application]. This fail is touching up to X11 fail, so I cannot blame EGL entirely for this.
    • eglGetProcAddress only gets those functions that are not core in the GLES version. Now this is epic ugly for the future. If we wish to use EGL to do the context creation jazz on desktop, we have a big issue: we will check the GL version (for today is it 3.x or 4.x?) and then go fetch function pointers... ah, but in EGL, eglGetProcAddress will not return those functions that are in the spec, only those of extensions. We get around this using direct loading of the function from the shared library... but this is just foolish.
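
    The three-buffer handoff mentioned in the EGLConfig bullet above is not spelled out in the post. One common scheme, which may or may not be the one the poster had in mind, is sketched below with C11 atomics. `TripleBuffer` and the `tb_*` functions are hypothetical names. The sketch assumes exactly one producer (the application) and one consumer (the compositor), and it only juggles slot indices; the actual color buffers, and the cross-process shared memory they would need to live in, are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical three-slot handoff between one producer and one consumer.
   Slots are indices 0..2 into an array of color buffers kept elsewhere. */
#define SLOT_MASK 0x3u
#define NEW_FRAME 0x4u

typedef struct {
    atomic_uint shared;  /* index of the spare slot, plus the NEW_FRAME flag */
    unsigned    back;    /* slot the producer draws into (producer-only)     */
    unsigned    front;   /* slot the consumer presents (consumer-only)       */
} TripleBuffer;

static void tb_init(TripleBuffer *tb) {
    tb->back = 0;
    tb->front = 1;
    atomic_init(&tb->shared, 2u);
}

/* Producer: publish the finished back buffer and take over the spare slot. */
static void tb_publish(TripleBuffer *tb) {
    unsigned old = atomic_exchange(&tb->shared, tb->back | NEW_FRAME);
    tb->back = old & SLOT_MASK;
}

/* Consumer: if a new frame is waiting, swap it in. Returns true on a swap. */
static bool tb_acquire(TripleBuffer *tb) {
    if (!(atomic_load(&tb->shared) & NEW_FRAME)) return false;
    unsigned old = atomic_exchange(&tb->shared, tb->front);
    tb->front = old & SLOT_MASK;
    return true;
}
```

    Neither side ever waits on the other: the producer can publish twice in a row and the consumer simply picks up the newer frame, at the cost of the third color buffer.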


    One might say "So what? A game developer does not care..." But a system does. Roughly speaking, a user interface should be drawn with GL on an embedded platform, you get every possible win: higher performance and lower power consumption. In that regard sharing image data across process boundaries: fonts, themes, etc is really a good idea.



  8. #28
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,716

    Re: Direct State Access

    Unextended GLES2 does not support textureLod functions in a fragment shader. Nevermind that all the hardware out there can do it. Fail. Worse, there is nothing for GLES2 to compute the LOD for you. Fail.
    To be fair, OpenGL didn't support the latter until GL 3.0. And it's not like there aren't bits of hardware that have required extensions under desktop GL to get at in the past.

    I'd say the big problem with GL ES is that they haven't made a version 3.0 yet. They've been stuck on 2.0 and just patching things with extensions.

    there is a relatively simple almost lockless way to do this with 3 buffers per application
    And you want to do that on an embedded platform, where memory is already at a premium?

    eglGetProcAddress only gets those functions that are not core in the GLES version. Now this is epic ugly for the future. If we wish to use EGL to do the context creation jazz on desktop, we have a big issue: we will check the GL version (for today is it 3.x or 4.x?) and then go fetch function pointers... ah, but in EGL, eglGetProcAddress will not return those functions that are in the spec, only those of extensions. We get around this using direct loading of the function from the shared library... but this is just foolish.
    Well, wglGetProcAddress works exactly the same way. Besides, both that and the process issue could easily be cleared up with a couple of extensions. I don't think anyone is arguing that GL ES and EGL should be brought over exactly as is.

    Granted, I don't care much for the idea of bringing GL ES over because it still has the same non-object-based problems that desktop GL has. Indeed, core OpenGL 3.2+ is basically GL ES on the desktop. Well, obviously with more stuff, but that stuff actually matters.

  9. #29
    Advanced Member Frequent Contributor
    Join Date
    Apr 2009
    Posts
    529

    Re: Direct State Access

    Quote Originally Posted by Alfonse Reinheart
    And you want to do that on an embedded platform, where memory is already at a premium?
    Let's take a look at numbers for a moment: firstly the application needs to be double buffered. Secondly, it usually needs a depth and stencil buffer (and a 16bit depth buffer is almost never a good idea). Thirdly, the color buffer is usually 16bpp (typically RGB565), and that is the only buffer that is double and/or triple buffered. Adding up:
    • Double buffered: 2*16bpp+ 1*32bpp= 64bpp = 8 bytes/pixel
    • Triple buffered: 3*16bpp+ 1*32bpp= 80bpp = 10 bytes/pixel

    Not exactly a lot more, relatively speaking. It gets better: such an application is NOT even fullscreen; it is often a small animated widget or "app". Resolutions of phones top out at eight-hundred something by four-hundred something, which is less than 400,000 pixels... so even at fullscreen (which is not the usual case really) we are talking less than 800KB, and such gizmos often clock in with at least 256MB (and more often 512MB). For tablets, the typical resolutions are 1024x768 to 1280x800, which are both hovering about 1 million pixels... the absolute worst case scenario is an extra 2MB... but it really is much, much less since such triple buffering is only needed for windowed apps. But wait, it gets better! An application can choose to render at a lower resolution and have the compositor display at a larger resolution (it will get a touch blurry). A number of iPhone apps running on the iPad act this way.
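
    The arithmetic above can be checked with a few lines of C. This is only a sketch of the post's own numbers: the function names are made up, and 800x480 is an assumed stand-in for "eight-hundred something by four-hundred something". It follows the post's assumption that only the 16bpp color buffer is multiplied and the 32bpp depth/stencil buffer exists once.

```c
#include <stddef.h>

/* Bytes per pixel for `color_buffers` color buffers of `color_bpp` bits each
   plus a single depth/stencil buffer of `depth_stencil_bpp` bits. */
static size_t bytes_per_pixel(unsigned color_buffers, unsigned color_bpp,
                              unsigned depth_stencil_bpp) {
    return (color_buffers * color_bpp + depth_stencil_bpp) / 8;
}

/* Extra bytes needed to go from double to triple buffering at a resolution. */
static size_t triple_buffer_overhead(size_t width, size_t height,
                                     unsigned color_bpp,
                                     unsigned depth_stencil_bpp) {
    size_t dbl = bytes_per_pixel(2, color_bpp, depth_stencil_bpp);
    size_t tpl = bytes_per_pixel(3, color_bpp, depth_stencil_bpp);
    return width * height * (tpl - dbl);
}
```

    With these inputs, `triple_buffer_overhead(800, 480, 16, 32)` gives 768,000 bytes and `triple_buffer_overhead(1280, 800, 16, 32)` gives 2,048,000 bytes, in line with the "less than 800KB" and "extra 2MB" figures.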

    Well, wglGetProcAddress works exactly the same way. Besides, both that and the process issue could easily be cleared up with a couple of extensions. I don't think anyone is arguing that GL ES and EGL should be brought over exactly as is.
    That wglGetProcAddress does the exact same thing is not an excuse. Actually it is worse for EGL, because the wgl jazz considers any GL function not in opengl32.lib (the .lib, not the .dll) to be an extension function. Thus the set of functions fetchable via wglGetProcAddress is set in stone, independent of the version of the GL context. EGL, by its nature, does not have that luxury.
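
    The workaround described earlier in the thread (ask the GetProcAddress call first, then fall back to loading the symbol from the shared library) can be sketched as below. The two lookups are passed in as callbacks so the sketch stays self-contained. `load_gl_function` and the `fake_*` stand-ins are hypothetical; in a real program the first callback would wrap eglGetProcAddress or wglGetProcAddress and the second would wrap dlsym (POSIX) or GetProcAddress (Windows) on the GL library.

```c
#include <stddef.h>
#include <string.h>

typedef void (*ProcAddr)(void);
typedef ProcAddr (*LookupFn)(const char *name);

/* Try the extension-style lookup first, then the library's exported symbols. */
static ProcAddr load_gl_function(const char *name,
                                 LookupFn get_proc_address,
                                 LookupFn library_symbol) {
    ProcAddr p = get_proc_address(name);
    if (p == NULL) {
        p = library_symbol(name);
    }
    return p;
}

/* Stand-ins so the sketch can run without a GL implementation. */
static void fake_core_fn(void) {}
static void fake_ext_fn(void) {}

/* Behaves as the thread describes: knows extension entry points only. */
static ProcAddr fake_get_proc_address(const char *name) {
    if (strcmp(name, "glFakeExtensionEXT") == 0) return fake_ext_fn;
    return NULL;
}

/* Behaves like a symbol lookup on the GL shared library: core functions only. */
static ProcAddr fake_library_symbol(const char *name) {
    if (strcmp(name, "glFakeCore") == 0) return fake_core_fn;
    return NULL;
}
```

    The catch raised in the post still applies: which functions count as "core" for the second lookup depends on the library and context version, so the caller cannot know in advance which path a given name will take.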

    Additionally, when integrating a system, EGL is a burden to implement and if not done properly can leave a major headache (in truth the larger headaches come from X11). If we look at what is needed:

    • Allocate buffers to draw to (i.e. surfaces)
    • Allocate and bind a GL context(s)
    • Get function pointers


    That is all that is needed. EGL, in trying to be general, makes such simple operations unnecessarily painful. It also unnecessarily hides the fact that for the first item (allocating buffers) we are really talking about a global memory manager, and all that is needed is cross-process texture support to make everyone happy. Nothing more. In that regard, the GL API _already_ has that in texture image specification (or if you wish render buffer storage).

  10. #30
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    941

    Re: Direct State Access

    Quote Originally Posted by kRogue
    • Double buffered: 2*16bpp+ 1*32bpp= 64bpp = 8 bytes/pixel
    • Triple buffered: 3*16bpp+ 1*32bpp= 80bpp = 10 bytes/pixel
    I don't quite understand this. If you meant 16bpp for depth and 32bpp for color, then you are wrong, as double buffered mode also has two color buffers and triple buffered mode has three color buffers, so it should rather be:
    • Double buffered: 2*16bpp+ 2*32bpp= 96bpp = 12 bytes/pixel
    • Triple buffered: 3*16bpp+ 3*32bpp= 144bpp = 18 bytes/pixel
    Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
    Technical Blog: http://www.rastergrid.com/blog/
