Part of the Khronos Group
OpenGL.org


Thread: Suggestions for OpenGL 5

  1. #11
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,726
    However, this model seems to work for D3D with most bugs being silly driver side things.
    You're conflating two different issues. The fact that D3D has fewer evident driver bugs is not because of how it compiles shaders. It's due to several factors:

    1: Writing D3D drivers is simpler than writing GL drivers. Simpler code means less bugs.

    2: D3D is more heavily used than OpenGL. Because of that, more bugs are found. And because D3D software is quite popular, they are more quickly responded to than GL bugs. The best way to find and squash bugs is to use something, and code that doesn't get used is more likely to be buggy.

    Changing the language that gets compiled will change very little about how many bugs you will encounter. Indeed, you'll likely get more bugs because driver developers will have to maintain their GLSL compilers too, for backwards compatibility reasons.

    If you want to decrease compiler bugs, then put together a real test suite for GLSL. Then find a way to make driver developers test and fix bugs based on it.

    You aren't guaranteed any form of optimization for your shaders. You can mitigate this by running the shaders through an offline "optimizer" that basically just moves text around... That idea isn't the best if you can just avoid the text distribution altogether. If you had your own shader bytecode generator, that you were in control of, you could implement any optimizations you like. (e.g., you could build your own work atop of systems like LLVM.) Not that you couldn't technically do that already, but the bytecode solution is a bit more "workable."
    What kind of optimizations are you talking about? Loop unrolling? Function inlining? Dead code removal? That's not very much in the grand scheme of shader logic; most of the real optimizations will have to be done by the driver.

    One hardware's optimization is another's pessimization. Unrolling a loop on one piece of hardware can give a performance boost; on another, it can make things slower. The driver knows which is better because it's hardware-specific. Better to rely on the driver to do the right thing than to rely on your personal hope that you can out-think the people who actually know their hardware.

    However, SM2 (which is a large chunk of the target market for indie developers currently) should be supported. (That also corresponds roughly to the feature set available on mobile devices currently, if I'm not mistaken.)
    If hardware isn't being supported, it won't get new OpenGL APIs. New APIs like this shader language of yours. Therefore, even if it could run it, it won't because the IHV isn't supporting the hardware anymore.

    The "large chunk of the target market for indie developers" is primarily hardware that isn't being supported. Integrated Intel chips and any of AMD's hardware pre-HD models. NVIDIA is still supporting the GeForce 6xxx and 7xxx lines, but outside of that, you've got nothing.

    Thus, any effort in this regard is going to help less than half of the "target market for indie developers." So why bother?

    It probably could with intrinsics, so to speak.
    At which point, you simply have a more cumbersome way of specifying the blending equation. That's not particularly helpful.

  2. #12
    Intern Contributor
    Join Date
    May 2012
    Posts
    98
    I think programmable blending will open the "forbidden" feature of reading from and writing to specific pixels inside the fragment shader.

  3. #13
    Advanced Member Frequent Contributor
    Join Date
    Apr 2010
    Location
    Germany
    Posts
    896
    I think programmable blending will open the "forbidden" feature of reading from and writing to specific pixels inside the fragment shader.
    What do you think image_load_store allows you to do?
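As a rough illustration of what "programmable blending" means in this exchange, here is a CPU-side sketch in C of a load/blend/store step on a single pixel. It is not GL code: the function names `overlay` and `blend_pixel` and the RGBA float layout are assumptions made for the example. The overlay formula is used only because it branches on the destination value, which fixed-function blend factors cannot express.

```c
#include <assert.h>
#include <stddef.h>

/* Overlay blend for one channel in [0,1]. It branches on the destination
 * value, which fixed-function blend factors cannot do. */
static float overlay(float src, float dst)
{
    return dst < 0.5f ? 2.0f * src * dst
                      : 1.0f - 2.0f * (1.0f - src) * (1.0f - dst);
}

/* CPU stand-in for a shader that loads a pixel, blends it, and stores it
 * back. `image` is a hypothetical tightly packed RGBA float buffer. */
void blend_pixel(float *image, size_t width, size_t x, size_t y,
                 const float src[4])
{
    assert(image != NULL && x < width);
    float *dst = &image[(y * width + x) * 4];
    for (int c = 0; c < 3; c++)
        dst[c] = overlay(src[c], dst[c]);
    /* Conventional "over" compositing for alpha. */
    dst[3] = src[3] + dst[3] * (1.0f - src[3]);
}
```

In a real shader the read and write would go through image load/store on a texture attached to an FBO, with whatever synchronization the implementation requires; none of that is modelled here.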

  4. #14
    Intern Contributor
    Join Date
    May 2012
    Posts
    98
    Yeah but you cannot directly read and write the front buffer.

  5. #15
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    941
    Originally posted by Janika:
    Yeah but you cannot directly read and write the front buffer.
    You cannot load/store the window system provided color buffers, that's true, but that's rather because you cannot access the default framebuffer color buffers as textures in general (like you can in D3D). That's definitely an issue with OpenGL and was requested for a long time. But that's another story.
    For programmable blending, the fact that you have to work with an FBO and then you have to copy the results to the default framebuffer won't cause you any problems, it would be ultra-fast.
    Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
    Technical Blog: http://www.rastergrid.com/blog/

  6. #16
    Junior Member Newbie AaronMiller
    Join Date
    May 2012
    Posts
    5
    @Alfonse Reinheart
    You're conflating two different issues. The fact that D3D has fewer evident driver bugs is not because of how it compiles shaders. It's due to several factors:

    1: Writing D3D drivers is simpler than writing GL drivers. Simpler code means less bugs.

    2: D3D is more heavily used than OpenGL. Because of that, more bugs are found. And because D3D software is quite popular, they are more quickly responded to than GL bugs. The best way to find and squash bugs is to use something, and code that doesn't get used is more likely to be buggy.

    Changing the language that gets compiled will change very little about how many bugs you will encounter. Indeed, you'll likely get more bugs because driver developers will have to maintain their GLSL compilers too, for backwards compatibility reasons.
    Okay. Have a look here: http://aras-p.info/texts/VertexShaderTnL.html I ran into a similar need to emulate fixed-function support in my code, between different states (at the contractor's request). These sorts of cases are few and far between, but having bytecode support made things much easier. I could generate the bytecode and have that recompiled dynamically. Not something you normally need to do. Compare that to GLSL which takes much longer (link provided in a prior post within this thread). Feasible for Direct3D. Not for OpenGL.

    What kind of optimizations are you talking about? Loop unrolling? Function inlining? Dead code removal? That's not very much in the grand scheme of shader logic; most of the real optimizations will have to be done by the driver.

    One hardware's optimization is another's pessimization. Unrolling a loop on one piece of hardware can give a performance boost; on another, it can make things slower. The driver knows which is better because it's hardware-specific. Better to rely on the driver to do the right thing than to rely on your personal hope that you can out-think the people who actually know their hardware.
    Mobile devices come to mind. Not all mobile devices have an offline machine code compiler for deploying with your project. For the devices that do not, the code has to be generated on the device, then sent back to you for later deployment (to avoid initial compilation). Additionally, there are cases where you may be able to implement a bytecode compiler that executes faster than the default GLSL one. So, not just optimization in the sense of code produced, but in time executed. You can control exactly which is more important. Keep in mind that driver writers must choose a balance between the two and may make a choice completely opposite of what you like.

    Optimizations that are specific to the hardware can still be done. For example, D3D bytecode has a "rep" and "endrep" instruction pair. These could be unrolled by the driver if it determines it's a good idea to do so. Likewise, instructions that were generated in an unrolled state could be "rolled up again." Also, don't trust the driver writers... Especially if they're from Intel. So, in some cases, yes, I can get more optimized results for the hardware than the driver if I have the ability to do so. With GLSL it's a free for all and you don't know what you're going to get back. The same is still true for the bytecode variant, but you at least have more control of it then.
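To make the "rep"/"endrep" point concrete, here is a minimal sketch in C of a pass that unrolls short repeat blocks in a toy instruction stream. The opcodes, the `ToyInstr` layout, and the `max_unroll` threshold are all hypothetical; this is not the real D3D bytecode encoding, only an illustration of the kind of decision a driver could make per hardware target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; NOT the real D3D bytecode encoding. */
enum ToyOp { TOY_ADD, TOY_MUL, TOY_REP, TOY_ENDREP };

typedef struct {
    enum ToyOp op;
    int arg;   /* repeat count for TOY_REP, otherwise an operand id */
} ToyInstr;

/*
 * Copies `in` to `out`, expanding each non-nested REP/ENDREP block whose
 * repeat count is <= max_unroll. Larger loops are copied through unchanged.
 * Returns the number of instructions written, or (size_t)-1 if `out` is
 * too small or a REP has no matching ENDREP.
 */
size_t toy_unroll(const ToyInstr *in, size_t n_in,
                  ToyInstr *out, size_t out_cap, int max_unroll)
{
    size_t w = 0;
    size_t i = 0;
    assert(in != NULL && out != NULL);
    while (i < n_in) {
        if (in[i].op == TOY_REP && in[i].arg <= max_unroll) {
            /* Find the matching ENDREP (nesting is not supported here). */
            size_t end = i + 1;
            while (end < n_in && in[end].op != TOY_ENDREP)
                end++;
            if (end == n_in)
                return (size_t)-1;
            for (int r = 0; r < in[i].arg; r++) {
                for (size_t b = i + 1; b < end; b++) {
                    if (w == out_cap)
                        return (size_t)-1;
                    out[w++] = in[b];
                }
            }
            i = end + 1;   /* skip past ENDREP */
        } else {
            if (w == out_cap)
                return (size_t)-1;
            out[w++] = in[i++];
        }
    }
    return w;
}
```

A backend for one GPU might call this with a small `max_unroll` and another with a large one (or zero); the same input stream serves both, which is the point being argued above.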

    A benefit of having a standardized bytecode (which would likely represent how the underlying hardware ISAs work anyway) is that you would know what that bytecode is. For some people that's not important. They don't care how something gets done as long as it gets done. Those people aren't affected by this proposed extension. Then there are the people who like GLSL specifically; the presence of an IR binary format wouldn't affect them either. A lot of people seem to want a standardized bytecode format, myself included.

    Your argument, to me, seems mostly like "Java shouldn't exist because I don't use it," in a manner of speaking. For better communication between the two of us, I request that you respond to the following inquiries in great detail.
    Why do you prefer GLSL does not have a standardized IR?
    What is your ideal communication mechanism between OpenGL shader representations and the GPU?
    What is it that you think I'm suggesting (in terms of this bytecode extension), exactly?


    And, just a footnote for you. If I had to write an OpenGL and Direct3D driver for some GPU (that supports shaders), this is how I would do it.
    1. Convert GLSL source to D3D bytecode. (Possibly marked with a "special version token" to indicate that GL extensions can be supported.)
    2. Optimize the bytecode.
    3. Convert to binary form for whichever GPU backend I'm supporting.

    If I had the same job, but with my proposed extension, this is how I would do it.
    1. Two frontends. One for GLSL. One for D3D bytecode.
    2. Generate to the IR (which could support per-vendor/driver extensions, just as GLSL does).
    3. Convert to binary form for whichever GPU backend I'm supporting.

    The steps don't change that much, and they both support some codebase sharing. (Yes, I realize that drivers are separated between D3D and GL. Source code can still be shared.) The benefit is that two stages of the pipeline no longer have to be done at runtime, but the IR can still be generated at runtime, dynamically. As mentioned above, this is for certain code injection techniques for shaders.

    If OpenGL were designed to use the bytecode IR from the start, would you disagree with the pipeline even with GLSL support?
    If so, please provide a strict example as to why my version is less optimal, or why it could not possibly help any developer.

    Personally, I feel that GLSL should be handled offline anyway. You said it yourself, "1: Writing D3D drivers is simpler than writing GL drivers. Simpler code means less bugs." I would push for Khronos to release an offline GLSL compilation kit if it were possible for them to do so. Then the drivers would be simpler if only for the fact that there's a common IR to share and no need for language compilation. Reference drivers could be written that target software specifically using the IR. Interpreting an IR is simpler than interpreting almost free-form text. The driver no longer has to maintain decisions like register allocation for variables. (Though, they would have to do so from the IR. They would have to anyway because of D3D's presence. Or they could use LLVM.) Again, IR compilation is simpler than language compilation. This would make writing drivers for GL simpler, which would introduce less bugs.

    @aqnuep, Janika
    I found the link mentioning programmable blending: Bending The Graphics Pipeline (SIGGRAPH 2010); see page 13. GL_ARB_shader_image_load_store is more recent (core since OpenGL 4.2, 2011), so it may be used to emulate such support. That said, I do think a more separate shader stage for more elegant code could be implemented. I haven't used this, so it's not something I can really comment on.

    Cheers,
    Aaron

  7. #17
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,726
    Your argument, to me, seems mostly like "Java shouldn't exist because I don't use it," in a manner of speaking.
    No, my argument is, "some of your arguments don't make sense."

    Would it be nice to have an intermediate language? Yes. Should we have one because it makes it easier to optimize the code? No, because it doesn't make optimizing code easier; it only gives the illusion of that. Should we have one because it makes the driver less buggy? No, because it doesn't make the driver less buggy.

    See the difference? I'm not arguing against your position per-se; I'm arguing that some of your arguments for this are of dubious merit.

    Okay. Have a look here: http://aras-p.info/texts/VertexShaderTnL.html I ran into a similar need to emulate fixed-function support in my code, between different states (at the contractor's request). These sorts of cases are few and far between, but having bytecode support made things much easier. I could generate the bytecode and have that recompiled dynamically. Not something you normally need to do. Compare that to GLSL which takes much longer (link provided in a prior post within this thread). Feasible for Direct3D. Not for OpenGL.
    That's nice but... how does this have anything to do with what I said? This has nothing to do with compiler bugs, which was what the part you quoted from me was talking about.

    You seem to be arguing points that I'm not making. I never said that it wouldn't make this situation easier. I said that it wouldn't cause fewer compiler bugs.

    Optimizations that are specific to the hardware can still be done. For example, D3D bytecode has a "rep" and "endrep" instruction pair. These could be unrolled by the driver if it determines it's a good idea to do so. Likewise, instructions that were generated in an unrolled state could be "rolled up again."
    That would seem to work against the whole "making the compiler faster" issue, since now every time you load the bytecode, it has to scan it and decide to re-optimize things that you mistakenly de-optimized for it. Not only that, because it's in assembly, it has much weaker semantics for it than if it were GLSL.

    Also, don't trust the driver writers... Especially if they're from Intel. So, in some cases, yes, I can get more optimized results for the hardware than the driver if I have the ability to do so. With GLSL it's a free for all and you don't know what you're going to get back. The same is still true for the bytecode variant, but you at least have more control of it then.
    First, what cases are you talking about? Do you have any specific examples?

    Secondly and more importantly, you seem to be arguing against yourself here. On the one hand, you're saying that you can optimize better than the driver. But you just said that the driver can basically override anything you were doing. It can re-roll loops you unrolled.

    So really, you have no more control either way. You're ultimately trusting the compiler not to do something stupid. Personally, if I'm going to put my faith in a compiler, I'd rather give it more semantics and information to work with rather than less.

    If I had to write an OpenGL and Direct3D driver for some GPU (that supports shaders), this is how I would do it.

    1. Convert GLSL source to D3D bytecode.
    Quite frankly, that's a horrible idea.

    The whole point of shoving GLSL down the driver's throat is so that we can provide more semantic information to the compiler. And with greater semantic information comes more chances for hardware-specific optimization. Real data structures, functions, parameter passing, etc, all are crucial bits of information that can be used when making hardware-specific optimizations.

    By forcing this two-stage compilation model on the system, you're basically throwing vital information away. You're taking the advantages that native GLSL provides and just pretending they don't exist just to make your code slightly easier to write.

    I hope, for the sake of hardware optimizations, that you aren't hired to write GL drivers anywhere.

    If OpenGL were designed to use the bytecode IR from the start, would you disagree with the pipeline even with GLSL support?
    If the ARB had just kept using and updating ARB assembly, they would never have created GLSL in the first place. Odds are, HLSL would have simply become a universal standard, and there'd be some SourceForge project that contains an HLSL-to-ARB_assembly compiler that most people who don't want to code to the assembly would use.

    So your hypothetical question is moot. Indeed, if they kept up with ARB assembly, and someone suggests GLSL now, I'd tell them to take a hike.

    OpenGL should have only one shading language. And quite frankly, that's my biggest argument against this:

    Personally, I feel that GLSL should be handled offline anyway. You said it yourself, "1: Writing D3D drivers is simpler than writing GL drivers. Simpler code means less bugs."
    Yes, making drivers simpler would lead to less bugs. But the ARB is clearly reluctant to make backwards incompatible changes to OpenGL. Even getting rid of immediate mode rendering was the equivalent of pulling teeth, and it's not like most GL implementations don't still support all the old junk that was ostensibly ripped out. So the only way this proposal would actually be implemented is if you now have two compilers in the driver.

    Two compilers is, pretty much by definition, less simple than one. Even if you internally make your GLSL compiler go to the bytecode, that's still two compilers you have to support. That means two places where a failure can happen.

    Intel struggles with supporting one compiler; how can you expect them to work with two?

    The only way this could make things simpler is if you rewound time and made the ARB stick with ARB assembly instead of using 3D Labs' asinine GLSL proposal. However, given that we're already in this mess, and we can't suddenly magic ourselves out of this mess, what you're suggesting isn't helping.

    Ultimately, the best course of action is to just live with it. OpenGL is imperfect, and trying to make it perfect is only going to make the imperfections worse.

    Oh, and let's not forget a simple, practical fact: your proposal is nothing the ARB hasn't heard dozens of times before. Go ahead; search this forum. It's been suggested over and over since GLSL was adopted. It hasn't happened in almost 10 years. The arguments for it haven't changed a bit.

    And yet, it still hasn't been done. It took almost a decade to get separate shaders and program binaries, and those are also things people asked for even before GL 2.0. So I wouldn't hold my breath.

  8. #18
    Junior Member Newbie AaronMiller
    Join Date
    May 2012
    Posts
    5
    No, my argument is, "some of your arguments don't make sense."

    Would it be nice to have an intermediate language? Yes. Should we have one because it makes it easier to optimize the code? No, because it doesn't make optimizing code easier; it only gives the illusion of that. Should we have one because it makes the driver less buggy? No, because it doesn't make the driver less buggy.

    See the difference? I'm not arguing against your position per-se; I'm arguing that some of your arguments for this are of dubious merit.
    Ah, okay. I had noticed you seemed to target select statements, but I didn't think anything of it.

    You seem to be arguing points that I'm not making. I never said that it wouldn't make this situation easier. I said that it wouldn't cause fewer compiler bugs.
    I was cramming stuff into certain "sections" of the post, which is why they didn't seem correlated. I forgot to move it all out to a separate "general purpose" section. Anyway, my point there was in argument for the overall feature. Irrelevant now.

    First, what cases are you talking about? Do you have any specific examples?
    Mostly mobile targets. Drivers can't spend too much time optimizing output bytecode or there would be a noticeable hiccup when you try using apps on the (already terribly slow) processors (CPU and GPU included). So, it must find a balance. I think OpenGL may eventually exist as an "embedded" profile (instead of just GL ES), so it would make sense for having bytecode support in such cases.

    Secondly and more importantly, you seem to be arguing against yourself here. On the one hand, you're saying that you can optimize better than the driver. But you just said that the driver can basically override anything you were doing. It can re-roll loops you unrolled.

    So really, you have no more control either way. You're ultimately trusting the compiler not to do something stupid. Personally, if I'm going to put my faith in a compiler, I'd rather give it more semantics and information to work with rather than less.
    I was providing separate arguments/POVs in favor of the bytecode there. They weren't meant to go together, necessarily. In a sense, I'm arguing the IR will already have many of the basic optimizations covered, but the driver can still do hardware specific optimizations. Loop unrolling being one of them.

    Quite frankly, that's a horrible idea.

    The whole point of shoving GLSL down the driver's throat is so that we can provide more semantic information to the compiler. And with greater semantic information comes more chances for hardware-specific optimization. Real data structures, functions, parameter passing, etc, all are crucial bits of information that can be used when making hardware-specific optimizations.

    By forcing this two-stage compilation model on the system, you're basically throwing vital information away. You're taking the advantages that native GLSL provides and just pretending they don't exist just to make your code slightly easier to write.

    I hope, for the sake of hardware optimizations, that you aren't hired to write GL drivers anywhere.
    I should probably clarify what I meant by that a bit more. The D3D bytecode could be modified to support certain semantics if necessary, and metadata could be attached as well. (Just like my proposal for extensions.) By baking that information in, the runtime can still make hardware optimizations. I think it makes sense to only translate one IR to the equivalent hardware backend. I'm not sure what benefit GLSL (or any high-level language, for that matter) would have if I could bake "intents" into the bytecode as metadata anyway. Regardless, I'm not sure it's a bad idea. It seems to me that OpenGL and Direct3D render shaders at about the same speed anyway. (It would be difficult to benchmark this accurately, I think, though.) Given that the same metadata can be carried along (though it is not necessarily required), do you still think this approach would be a horrible idea? I believe it to be reasonable. Any information the driver can use for improving performance can be encoded as optional metadata, and only one IR would have to be supported.

    Two compilers is, pretty much by definition, less simple than one. Even if you internally make your GLSL compiler go to the bytecode, that's still two compilers you have to support. That means two places where a failure can happen.

    Intel struggles with supporting one compiler; how can you expect them to work with two?
    lol! That was hilarious. And, good point!

    Ultimately, the best course of action is to just live with it. OpenGL is imperfect, and trying to make it perfect is only going to make the imperfections worse.
    I figured as much, but...

    Oh, and let's not forget a simple, practical fact: your proposal is nothing the ARB hasn't heard dozens of times before. Go ahead; search this forum. It's been suggested over and over since GLSL was adopted. It hasn't happened in almost 10 years. The arguments for it haven't changed a bit.

    And yet, it still hasn't been done. It took almost a decade to get separate shaders and program binaries, and those are also things people asked for even before GL 2.0. So I wouldn't hold my breath.
    ... I was hoping that maybe, just maybe, the ARB may consider it. Even if just as a bolt-on. Hell, even updating the ARB assembly would be acceptable to me.




    I don't mind GLSL (as a language). But there are a couple of things I'd like to see from its evolution, if that is what we must deal with...

    1. The ability to embed some form of ARB assembly, maybe. (I haven't thought this one through, but it's an interesting idea.)
    2. Better GL interfacing in terms of "info logs." (It takes a bit of extra work to get filenames from the errors presented. It's not difficult to hack it in, but it would be nice if I could specify how errors are formatted.) This may pose some security risks, but so do extensions like GL_AMD_pinned_memory.
    3. The ability to specify, from the GL, as well as in GLSL, how much optimization to apply to certain routines. Some form of control via "pragma" directives, or whatever, would be beneficial.
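For item 2, the "extra work to get filenames" could look roughly like the C sketch below. It assumes log lines of the form `ERROR: <string>:<line>: <message>`, where `<string>` is the index of the source string passed to the compiler; info-log formats are implementation-defined, so other drivers will need a different pattern. The function name `remap_log_line` and the `files` table are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Rewrites one info-log line of the ASSUMED form
 *     "ERROR: <string>:<line>: <message>"
 * as "<file>:<line>: <message>", using <string> as an index into `files`.
 * Returns 1 on success, 0 if the line does not match, the index is out of
 * range, or `out` is too small.
 */
int remap_log_line(const char *line, const char *const *files, int n_files,
                   char *out, size_t out_size)
{
    int src = 0, row = 0, consumed = 0, n;
    assert(line != NULL && files != NULL && out != NULL && out_size > 0);

    if (strncmp(line, "ERROR:", 6) != 0)
        return 0;
    /* %n records how many characters were consumed; it stays 0 if the
     * trailing ':' after the line number is missing. */
    if (sscanf(line + 6, " %d:%d:%n", &src, &row, &consumed) != 2 ||
        consumed == 0)
        return 0;
    if (src < 0 || src >= n_files)
        return 0;

    n = snprintf(out, out_size, "%s:%d:%s", files[src], row,
                 line + 6 + consumed);
    return n >= 0 && (size_t)n < out_size;
}
```

An application would split the info log into lines and pass each one through this function, falling back to the raw line whenever it returns 0.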

    Cheers,
    Aaron

  9. #19
    Member Regular Contributor
    Join Date
    Apr 2009
    Posts
    258
    Originally posted by Alfonse Reinheart:
    Quite frankly, that's a horrible idea.

    The whole point of shoving GLSL down the driver's throat is so that we can provide more semantic information to the compiler. And with greater semantic information comes more chances for hardware-specific optimization. Real data structures, functions, parameter passing, etc, all are crucial bits of information that can be used when making hardware-specific optimizations.
    I don't buy it, at least not the argument that GLSL semantics enable meaningful optimizations. GLSL is useful, but mainly for shader developers. We would be better off extending ARB assembly (or basing a bytecode on it) and having Khronos provide a GLSL -> ARB compiler. This would eliminate only some parsing effort, probably very minor; the bigger gain would be a standardized syntax that is the same everywhere, since it's pretty hard to screw up ARB program syntax (but yeah, Khronos doesn't do code, so that is an impossible scenario).

    As far as I know, NVIDIA does exactly this with their compiler. Their ARB (NV) programs are an intermediate form they use for GLSL shaders (when you write particularly convoluted structures or hit a compiler bug in GLSL, they will often present you with the shader in such a form).

  10. #20
    Super Moderator OpenGL Guru
    Join Date
    Feb 2000
    Location
    Montreal, Canada
    Posts
    4,421
    Originally posted by aqnuep:
    You cannot load/store the window system provided color buffers, that's true, but that's rather because you cannot access the default framebuffer color buffers as textures in general (like you can in D3D). That's definitely an issue with OpenGL and was requested for a long time. But that's another story.
    For programmable blending, the fact that you have to work with an FBO and then you have to copy the results to the default framebuffer won't cause you any problems, it would be ultra-fast.
    Also, you can't attach the backbuffer depth buffer to your FBO, which is something that has existed in D3D since version 8.
    http://www.opengl.org/wiki/Framebuff...in_framebuffer
    Last edited by V-man; 06-20-2012 at 04:11 AM.
    ------------------------------
    Sig: http://glhlib.sourceforge.net
    an open source GLU replacement library. Much more modern than GLU.
    float matrix[16], inverse_matrix[16];
    glhLoadIdentityf2(matrix);
    glhTranslatef2(matrix, 0.0, 0.0, 5.0);
    glhRotateAboutXf2(matrix, angleInRadians);
    glhScalef2(matrix, 1.0, 1.0, -1.0);
    glhQuickInvertMatrixf2(matrix, inverse_matrix);
    glUniformMatrix4fv(uniformLocation1, 1, FALSE, matrix);
    glUniformMatrix4fv(uniformLocation2, 1, FALSE, inverse_matrix);
