
Thread: More efficient drawing of vertex data

  1. #11
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,050
    Quote Originally Posted by kRogue View Post
    No; OpenGL does not compete particularly well against other 3D APIs. Tooling is much poorer than for other APIs. Compliance of drivers is all over the place. Feature sets of drivers are also all over the place. I wish that were not the case, but it is. GL has more warts than a warthog.
    That's all very true. But you're rather missing the point.

    My point is that having bindless vertex arrays would not affect OpenGL's competitiveness with other APIs. Thus, his argument is bunk.

    Quote Originally Posted by kRogue View Post
    Um, the addresses are essentially per-context or per-context-share-group. For each context in which one wants to use a buffer, one needs to make the buffer resident in that context. The address is not absolute; it is virtual, and all those magicks. (I admit I do not have hard proof, but it makes sense.)
    Sure, the addresses are virtual. And they're not permanent.

    But there is no guarantee in the specification that virtual addresses are directly bound to a context or share group. The spec just says that if you try to use an address outside the context that made it resident, it won't work and will possibly crash. Which sounds very much like it might still work sometimes. Hence the security concern.
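    For context, this is roughly what obtaining and using a GPU address looks like under NV_shader_buffer_load (a minimal sketch assuming a current context with the extension's entry points loaded; error handling omitted):

    Code:
    /* Sketch: querying a buffer's GPU address with NV_shader_buffer_load.
     * The address is a virtual, driver-owned value, and residency must be
     * requested in each context that intends to use it. */
    GLuint buf;
    GLuint64EXT gpuAddr;
    const float verts[] = { 0.0f, 0.0f,  1.0f, 0.0f,  0.0f, 1.0f };

    glGenBuffers(1, &buf);
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

    /* Make the buffer resident so the returned address remains usable. */
    glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
    glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &gpuAddr);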

    Quote Originally Posted by kRogue View Post
    Now, the above does not mean one cannot still use the address to feed the vertex buffers directly, a different extension essentially. Here the royal pain is that there is also VAO and it just makes the whole thing a giant mess.
    In what way do VAOs make things any more of a "giant mess" than you get in, say, Direct3D?

    Quote Originally Posted by kRogue View Post
    Moreover, on some hardware GPU addresses are not 64-bit values anyway; they are something different and potentially quite wonky.
    In a fixed version of bindless vertex arrays, the handle would work like bindless textures: it's a number, but it's completely arbitrary and handed out by the API. So the number can be whatever the driver needs it to be in order to be fast. And I would say that any offset should be user-provided, a la glBindVertexBuffer(s).

    It would have to be highly unconventional hardware indeed that couldn't generate some kind of value that:

    * Is 64 bits (or less) in size
    * Encodes whatever driver data is needed
    * Has a non-pointer-accessing transformation into the actual driver data
    * Has a constant-time transformation into the actual driver data
    * Is not a GPU address

    I can't imagine what the driver's data would have to be for that to not be possible.
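
    To make the idea concrete, here is a sketch of what such an API could look like if it mirrored ARB_bindless_texture's handle model. The entry points below (glGetBufferHandleXXX and friends) are hypothetical and do not exist in any shipping extension; buf is assumed to be an existing buffer object name:

    Code:
    /* HYPOTHETICAL sketch: an opaque-handle analogue of bindless textures
     * applied to vertex buffers.  None of these functions are real. */
    GLuint64 handle = glGetBufferHandleXXX(buf);   /* opaque, driver-chosen value */
    glMakeBufferHandleResidentXXX(handle);

    /* Offset and stride stay user-provided, as with glBindVertexBuffer. */
    glBindVertexBufferHandleXXX(0 /* bindingindex */, handle,
                                0 /* offset */, 3 * sizeof(float) /* stride */);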

  2. #12
    Senior Member OpenGL Pro
    Join Date
    Jan 2007
    Posts
    1,789
    Quote Originally Posted by Alfonse Reinheart View Post
    For all of NVIDIA's allegations of the "pain" of OpenGL name lookup performance-wise... typical use of the OpenGL API is still faster than D3D11 for the same scene. Valve discovered that one when they ported the Source engine to it...
    That's not quite true.

    What Valve ported was D3D9 code, and other benchmarks exist (e.g. the Unigine benchmarks; see http://www.g-truc.net/post-0547.html) which show the opposite.

  3. #13
    Advanced Member Frequent Contributor
    Join Date
    Apr 2009
    Posts
    612
    Quote Originally Posted by Alfonse Reinheart View Post
    In what way does VAOs make things any more of a "giant mess" than you get in, say, Direct3D?
    The crux of the issue is the following: there would be multiple ways to specify attribute sources: VAO using the existing GL APIs, and bindless. Another question that then comes up: is bindless part of VAO state? Which state takes precedence (the current API, bindless, or whatever was called last)? Is it OK to mix (i.e. some attributes with bindless, some with the traditional API)? None of these questions are show-stoppers, but the answers are a mess.


    Quote Originally Posted by Alfonse Reinheart View Post
    In a fixed version of bindless vertex arrays, the handle would work like bindless textures: it's a number, but it's completely arbitrary and handed out by the API. So the number can be whatever the driver needs it to be in order to be fast. And I would say that any offset should be user-provided, a la glBindVertexBuffer(s).

    It would have to be highly unconventional hardware indeed that couldn't generate some kind of value that:

    * Is 64 bits (or less) in size
    * Encodes whatever driver data is needed
    * Has a non-pointer-accessing transformation into the actual driver data
    * Has a constant-time transformation into the actual driver data
    * Is not a GPU address

    I can't imagine what the driver's data would have to be for that to not be possible.
    Essentially, a buffer object is then accessed by a handle instead of by a GLuint name; that is the only difference, essentially. What happens in practice is that a driver could recast the handle to a pointer type. If the handle can be used directly by a shader, then things are much more complicated.

    One could say: no, I want that to be a magic number used directly by the GPU. From there it gets much more complicated and highly GPU-architecture-specific as to how shaders can access memory. Once one goes there, the nature of caching and how to partition caches becomes more involved. In a sick way, bindless_texture is sort of easier, because sampler units do their own caching (on top of some other caches too)... but raw memory is uglier. If the caching is not right (especially for devices that share memory with the CPU), then this is nasty... not to mention the entire SIMD mess.
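
    As a toy illustration of the "recast the handle to a pointer" option (DriverBuffer and both helpers are invented for the example; no real driver is being described):

    Code:
    #include <stdint.h>

    /* Hypothetical internal record a driver might keep per buffer. */
    typedef struct DriverBuffer {
        uint64_t gpu_address;
        uint32_t size_in_bytes;
    } DriverBuffer;

    /* Handing back the pointer itself as the 64-bit handle gives a
     * constant-time, lookup-free mapping from handle to driver data. */
    static inline uint64_t buffer_to_handle(DriverBuffer *b)
    {
        return (uint64_t)(uintptr_t)b;
    }

    static inline DriverBuffer *handle_to_buffer(uint64_t h)
    {
        return (DriverBuffer *)(uintptr_t)h;
    }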

    Do not get me wrong: I love bindless attributes, I think they are great. I also think NV_shader_buffer_load is freaking awesome and sweet. However, before howling at the top of your lungs that hardware "surely works a certain way", I invite you to read the official docs of AMD and Intel on their hardware. I think you will find them fascinating and horrifying at the same time. They are quite different in various places, and memory access by the GPU on shared-memory systems (like Intel, for example) is not simple.

  4. #14
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,050
    Quote Originally Posted by mhagain View Post
    That's not quite true.

    What Valve ported was D3D9 code, and other benchmarks exist (e.g the Unigine benchmarks; see http://www.g-truc.net/post-0547.html) which show the opposite.
    That benchmark includes a D3D9 test, which also appears faster than OpenGL code. So their results differ from Valve's even with the same test. Since they couldn't even reproduce Valve's results, it is entirely possible that they just wrote poor OpenGL code.

    Which wouldn't be surprising, since you're comparing an engine few people use to one of the most frequently used game engines on the market today. Valve has the pick of the litter when it comes to OpenGL professionals, while Unigine Corp... just makes an engine.

    Equally importantly, that test shows that D3D9 and D3D11... don't differ that much in performance. It's only a few FPS, less than 10 in most tests. Thus, assuming that the Unigine guys know how to write D3D9 and 11 code, an OpenGL engine that's as good as Valve's ought to be able to achieve performance parity.

    Quote Originally Posted by kRogue View Post
    The crux of the issue is the following: there would be multiple ways to specify attribute sources: VAO using the existing GL APIs, and bindless.
    That's very true. But that ship has already sailed. We have two ways of creating textures. We already have two ways of specifying vertex data (with vertex_attrib_binding). We have two ways of using shader code (all-in-one-program vs. separate programs). We just got a new way of creating and modifying virtually every object type.

    Really, who'd notice one more at this point?
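
    For the vertex data case specifically, the two paths look like this (a sketch assuming a bound VAO, a buffer object buf holding tightly packed vec3 positions, and attribute index 0):

    Code:
    /* Classic path: format and buffer source are specified together. */
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void *)0);
    glEnableVertexAttribArray(0);

    /* ARB_vertex_attrib_binding path: the format is decoupled from the
     * buffer, so swapping buffers does not re-specify the format. */
    glVertexAttribFormat(0, 3, GL_FLOAT, GL_FALSE, 0);
    glVertexAttribBinding(0, 0);
    glBindVertexBuffer(0, buf, 0, 3 * sizeof(float));
    glEnableVertexAttribArray(0);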

    Quote Originally Posted by kRogue View Post
    Another question that then comes up: is bindless part of VAO state? Which state takes precedence (the current API, bindless, or whatever was called last)? Is it OK to mix (i.e. some attributes with bindless, some with the traditional API)? None of these questions are show-stoppers, but the answers are a mess.
    Those questions are answered by the NVIDIA extension (yes, they're VAO state; there is a VAO enable/disable for bindless VAs; see previous). I don't see how these answers create "a mess".

    Quote Originally Posted by kRogue View Post
    However, before howling at the top of your lungs that hardware "surely works a way", I invite you to read the official docs of AMD and Intel on their hardware. I think you will find it fascinating and horrifying at the same time. They are so different in various places and memory access by GPU on shared memory systems (like Intel for example) is not simple.
    Do you have any links I could look at? I'm not really sure what to Google around for.

  5. #15
    Intern Contributor
    Join Date
    Nov 2013
    Posts
    51
    @Alfonse Reinheart

    I might have exaggerated a bit with my "OpenGL is ridiculously ancient" claim; it is a subjective judgement of some features of the API and a personal opinion. I'm so annoyed by ancient state machines.
    However, deciding how vertex processing is divided over the vertex units should have moved into the GPU driver, instead of the application logic layer, a decade ago. Even before the arrival of newer GPU architectures such as GCN, putting that decision in application logic was bad design. Very, very, very bad design!


    Although specifications of DirectX 12, Metal, Mantle and glNext are scarce, there is some information about their features available.
    A common theme in the next-generation APIs, however, is reducing CPU overhead.
    The idea is that bindless is one element in the next-gen APIs.

    http://www.extremetech.com/computing...ctx-12-support
    http://www.g-truc.net/doc/Candidate%...OpenGL%205.pdf
    http://www.anandtech.com/show/7889/m...s-to-directx/2
    http://www.extremetech.com/gaming/18...ke-mantle-dx12

    https://www.gamingonlinux.com/articl...hics-apis.4753
    https://www.slideshare.net/slideshow/embed_code/42464487?startSlide=12

    Round 1:
    You misunderstand.
    The statement about competing with current APIs (DirectX 11.0 - 11.2) and the statement about OpenGL being ridiculously ancient are two statements that are not supposed to have anything to do with each other.

    The ridiculously ancient statement is subjective and my own personal view on the API.

    Round 2:
    See above.

    NOTE: If you know a better way to specify bindless vertex processing, please do help out by commenting on how it should be done.
    Last edited by Gedolo2; 02-18-2015 at 06:57 AM.

  6. #16
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,050
    I'm so annoyed by ancient state machines.
    ... OK. But that has nothing to do with bindless vertex arrays. Since, you know, that functionality uses the state machine.

    You seem to be ignorant of what NV_vertex_buffer_unified_memory does, so allow me to enlighten you. It is not "pass pointers to the shader, and let it figure out how to fetch its vertex data via gl_VertexID and gl_InstanceID." The only thing this extension does is replace the "bind buffer object" call with "bind GPU pointer". The shader itself is completely unaffected. It's all about turning a heavy-weight "turn object name into GPU pointer" operation into a zero-weight operation.
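
    In code, the difference is just this (a rough sketch; gpuAddr and bufSize are assumed to have come from NV_shader_buffer_load, and the attribute format setup is unchanged from normal GL):

    Code:
    /* Sketch of NV_vertex_buffer_unified_memory: enable the unified
     * vertex attribute path, set the format as usual, then supply a GPU
     * address range instead of binding a buffer object per draw. */
    glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);

    glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float));
    glEnableVertexAttribArray(0);

    /* Replaces the glBindBuffer + glVertexAttribPointer pair. */
    glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0, gpuAddr, bufSize);

    glDrawArrays(GL_TRIANGLES, 0, 3);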

    Bindless vertex array calls set VAO state, just like binding buffers. So they're using the state machine. So why are you not annoyed with NVIDIA's use of "ancient state machines" in that API?

    However, deciding how vertex processing is divided over the vertex units should have moved into the GPU driver, instead of the application logic layer, a decade ago. Even before the arrival of newer GPU architectures such as GCN, putting that decision in application logic was bad design. Very, very, very bad design!
    Why is that bad design, exactly? As kRogue pointed out, some hardware has no dedicated vertex pulling logic, and some does. So if it were all done explicitly by the driver, you'd basically be screwing over all hardware that does. And for little reason.

    I for one don't want to have to tie my shaders into a specific vertex format. I don't want to have to write different shaders just to be able to use different vertex formats with the exact same programming logic as before. If a piece of hardware needs the VS to do that, it can generate that code efficiently and slip it in at the top of my shader when needed. Vertex format changes are heavyweight; that's why ARB_vertex_attrib_binding exists.

    Again, not that this has anything to do with bindless vertex arrays, since they still set state.

    The idea is that bindless is one element in the next-gen APIs.
    Um, they are only "bindless" in the sense that they don't have a context to bind things to. The basic operations that are analogous to binding still exist in, for example, Apple Metal. When you're setting up your vertex data, you pass it MTLBuffer objects, not GPU pointers. Apple Metal doesn't even offer bindless texture support; you have to set up textures in the shader's environment, just like you do with its equivalent to UBOs.

    Now, as others have mentioned, Apple Metal targets mobile GPUs only, so it's technically older. But my point is this: you're assuming what will happen without any form of evidence. The only next-gen API that we've actually seen doesn't use bindless. So why are you so convinced that glNext will?

    Besides bindless textures, which are already near-core, of course.

  7. #17
    Intern Contributor
    Join Date
    Nov 2013
    Posts
    51
    It's more the overuse of state machines that bugs me.

    I have not read through all the information on the NVIDIA extension; I was in a hurry.
    Looks like I made a mistake.
    (I am annoyed by their use of ancient state machines in that API. How dare they!)
    Lightening that operation is a big step in the right direction; it should be more common in graphics APIs.

    It's bad design because you can't expect application developers to start optimizing for all the different graphics cards. I'm assuming non-bindless means having to manually specify which vertex unit you use for each batch of vertex operations.
    The whole point of a graphics API is to allow a unified way of sending instructions to graphics cards without having to build logic into your application to handle each and every different piece of hardware. This is software design 101.
    If the hardware doesn't have dedicated logic, write the logic in the software driver for the graphics card. The driver could check for the availability of hardware logic and prefer it over the driver's software logic. No hardware has to be screwed over; you can have the best results for all cases with no degradation in performance for the worst-case scenario.
    The driver manufacturer knows best how to divide the work over the vertex processing hardware and how to optimize it for maximum performance without letting other hardware architectures get in the way. An application by definition cannot do this without writing code for each and every graphics card family!

    Quote Originally Posted by Alfonse Reinheart View Post

    Now, as others have mentioned, Apple Metal targets mobile GPUs only, so it's technically older. But my point is this: you're assuming what will happen without any form of evidence. The only next-gen API that we've actually seen doesn't use bindless. So why are you so convinced that glNext will?

    Besides bindless textures, which are already near-core, of course.
    Trying to predict the future sometimes results in getting it wrong.
    Since the specifications are not out yet, I have to make some guesses.
    I'm not a hardware specialist, unlike you.
    Last edited by Gedolo2; 02-18-2015 at 08:37 AM.

  8. #18
    Member Regular Contributor
    Join Date
    Dec 2009
    Posts
    251
    Quote Originally Posted by Gedolo2 View Post
    It's more the overuse of state machines that bugs me.

    I have not read through all the information on the NVIDIA extension; I was in a hurry.
    Looks like I made a mistake.
    (I am annoyed by their use of ancient state machines in that API. How dare they!)
    Lightening that operation is a big step in the right direction; it should be more common in graphics APIs.

    It's bad design because you can't expect application developers to start optimizing for all the different graphics cards. I'm assuming non-bindless means having to manually specify which vertex unit you use for each batch of vertex operations.
    The whole point of a graphics API is to allow a unified way of sending instructions to graphics cards without having to build logic into your application to handle each and every different piece of hardware. This is software design 101.
    If the hardware doesn't have dedicated logic, write the logic in the software driver for the graphics card. The driver could check for the availability of hardware logic and prefer it over the driver's software logic. No hardware has to be screwed over; you can have the best results for all cases with no degradation in performance for the worst-case scenario.
    The driver manufacturer knows best how to divide the work over the vertex processing hardware and how to optimize it for maximum performance without letting other hardware architectures get in the way. An application by definition cannot do this without writing code for each and every graphics card family!


    Trying to predict the future sometimes results in getting it wrong.
    Since the specifications are not out yet, I have to make some guesses.
    I'm not a hardware specialist, unlike you.
    Well, actually, the trend in graphics APIs (Metal, DX12, glNext) is to move away from abstraction, closer to the hardware.

  9. #19
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,050
    Well, actually, the trend in graphics APIs (Metal, DX12, glNext) is to move away from abstraction, closer to the hardware.
    You can't really say that yet. D3D12 and glNext are (generally) unknown with regard to their level of abstraction. And Metal, no matter what Apple wants to claim, is generally speaking no closer to the hardware than OpenGL ES.

    The biggest structural change in Metal compared to OpenGL is the explicit control over command queues. And that's not so much a lowering of the abstraction as a different abstraction. It's a sideways move, going from an immediate abstraction to a buffered abstraction. After all, it's not as though you're writing command tokens into memory buffers yourself and then telling the GPU to execute them. You're still using API calls to set things like viewports, etc. You're just calling them in a different way.

    But the core abstractions we see in OpenGL are still there in Metal. You have state objects (admittedly immutable, but the object abstraction remains). You have resource objects. You have vertex format definitions defined by the API. And so forth.

  10. #20
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,050
    Quote Originally Posted by Gedolo2 View Post
    I have not read through all the information on the nvidia extension, was in a hurry.
    Looks like I made a mistake.
    I... what?

    I want to recap what has just happened in this thread, so that you can fully understand the problem.

    You asked for bindless vertex arrays. You linked to various posts, articles, and videos about it. You made various claims about what it would do for OpenGL competitively and how OpenGL would be if this weren't available.

    And yet... you couldn't be bothered to research it yourself. You couldn't take 10 minutes of your life to learn exactly what it was you were asking for.

    In short, by your own admission, you have asked for something without even knowing what it is!

    We're not talking about digging into the micro-details of various hardware here. We're not talking about deep knowledge of various drivers and the way hardware works. We're not talking about being "a hardware specialist". We're talking about reading and understanding a publicly available extension specification.

    When it comes to asking for something from somebody else, a piece of advice: your time is not as valuable as theirs. So you should never be "in a hurry" to make a suggestion; this forum will still be here tomorrow. Research first, understand first; then bother someone once you have some understanding of the idea.
