Thread: subroutines and multiple render targets

  1. #1 · Junior Member (Newbie) · Join Date: Oct 2012 · Posts: 24

    subroutines and multiple render targets

    I wonder how to use a shader with subroutines and multiple render targets. I have a fragment shader which contains a few subroutines (I've posted only the relevant pieces of the code).

    Code :
    #version 420
     
    in vec3 data1;
    in vec3 data2;
     
    layout (location = 0) out vec4 fragColor;
    layout (location = 1) out vec3 target1;
    layout (location = 2) out vec3 target2;
     
     
    subroutine void RenderPassType();
    subroutine uniform RenderPassType RenderPass;
     
    subroutine (RenderPassType)
    void first()
    {
    	// intentionally writes nothing; used with all draw buffers set to GL_NONE
    }
     
    subroutine (RenderPassType)
    void second()
    {
    	target1 = data1;
    	target2 = data2;
    }
     
    subroutine (RenderPassType)
    void third()
    {
    	//some computations producing a vec4 result
    	fragColor = result;
    }

    I created two FBOs which contain the render targets (textures).

    First:
    Code :
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depthTex, 0);
    Second:
    Code :
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex1, 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, tex2, 0);

    The data for glDrawBuffers:
    Code :
    GLenum drawBuffers1[] = {GL_NONE, GL_NONE, GL_NONE};                             // first pass: no color outputs are written
    GLenum drawBuffers2[] = {GL_NONE, GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1};   // second pass: location 1 -> tex1, location 2 -> tex2

    Rendering:

    Code :
    //bind the first fbo
    //bind the first subroutine
    glDrawBuffers(3, drawBuffers1);
    ...
    //unbind the first fbo
     
     
    //bind the second fbo 
    //bind the second subroutine
    glDrawBuffers(3, drawBuffers2);
    ...
    //unbind the second fbo
     
    ...
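
    For reference, a rough sketch of what "bind the first subroutine" could look like with the GL 4.0 subroutine API (program is just a placeholder name for the linked program object; the subroutine names are the ones from the shader above):
    Code :
    // query the subroutine indices once, after linking the program
    GLuint firstIndex  = glGetSubroutineIndex(program, GL_FRAGMENT_SHADER, "first");
    GLuint secondIndex = glGetSubroutineIndex(program, GL_FRAGMENT_SHADER, "second");
     
    // select a subroutine after glUseProgram and before drawing; the selection is
    // reset every time glUseProgram is called, so it has to be set again each time
    glUseProgram(program);
    glUniformSubroutinesuiv(GL_FRAGMENT_SHADER, 1, &firstIndex);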

    The above code works, but I wonder if it is the only right way to use subroutines together
    with multiple render targets. Is there something I can do better (more efficiently)?

    Do I have to use (location = 0) for the default framebuffer output?

    When I first bind the second FBO and subroutine, and then the first FBO and subroutine, glDrawBuffers
    clears all the textures. What can I do about it?

  2. #2 · aqnuep · Advanced Member (Frequent Contributor) · Join Date: Dec 2007 · Location: Hungary · Posts: 985
    If you have multiple color outputs then you have to write a value to all of them; otherwise the ones that you don't write any value to become undefined.
    Think about it this way: if you don't output any color to a particular color output, a write to that color output will still happen, just with an implementation-dependent value.
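
    For example, a minimal sketch of that advice applied to the shader above (the zero values are just placeholder writes, not something from the original code):
    Code :
    subroutine (RenderPassType)
    void second()
    {
    	fragColor = vec4(0.0);   // dummy write so this output does not end up undefined
    	target1   = data1;
    	target2   = data2;
    }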

  3. #3 · Junior Member (Newbie) · Join Date: Oct 2012 · Posts: 24
    OK, but I want to use my fragment shader with subroutines for deferred rendering. The render loop then looks like this:

    1. G-buffer stage
    2. for each light (additive blending, see the sketch below):
       - shadow map stage
       - shading stage
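
    Where by "additive blending" I mean roughly the following state (just a sketch, not code from my actual application):
    Code :
    // accumulate each light's contribution on top of the previous ones
    glEnable(GL_BLEND);
    glBlendEquation(GL_FUNC_ADD);
    glBlendFunc(GL_ONE, GL_ONE);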

    The first subroutine destroys the render targets (GL_NONE). How can I solve this?

  4. #4 · aqnuep · Advanced Member (Frequent Contributor) · Join Date: Dec 2007 · Location: Hungary · Posts: 985
    First, why would you want to use the same shader for all of these? I think it's fine to switch shaders 3 times in a frame. You're being overzealous about batching things together.

    In fact, for the shadow map rendering stage you don't even need a fragment shader, so if you have one, it will actually cost you a lot of performance.

  5. #5 · Alfonse Reinheart · Senior Member (OpenGL Guru) · Join Date: May 2009 · Posts: 4,948
    In fact, for the shadow map rendering stage you don't even need a fragment shader, so if you have one, it will actually cost you a lot of performance.
    Not unless you're using a compatibility profile.

    Quote Originally Posted by OpenGL 4.3 core specification, Ch 15:
    If the current fragment stage program object has no fragment shader, or no fragment program object is current for the fragment stage, the results of fragment shader execution are undefined.
    The fragment depth is part of the results of fragment shader execution. And thus it is undefined. And having an empty fragment shader doesn't cost you any performance.

  6. #6 · aqnuep · Advanced Member (Frequent Contributor) · Join Date: Dec 2007 · Location: Hungary · Posts: 985
    Quote Originally Posted by Alfonse Reinheart View Post
    If the current fragment stage program object has no fragment shader, or no fragment program object is current for the fragment stage, the results of fragment shader execution are undefined.
    Exactly, fragment shader execution is undefined, but per-fragment operations are not, thus depth testing and depth writing ARE defined.

    You don't need a fragment shader for depth-only rendering, not even in core profile.

    Quote Originally Posted by Alfonse Reinheart View Post
    The fragment depth is part of the results of fragment shader execution. And thus it is undefined. And having an empty fragment shader doesn't lose you any performance.
    The fragment depth output by a fragment shader, of course, makes no sense without a fragment shader, but you don't need a fragment shader to produce a depth value; you have a fixed-function one.

    Also, having an empty fragment shader DOES cost you performance. Depth testing and depth writing are done by fixed-function hardware which can have a throughput of many pixels per clock, especially with hierarchical Z, not to mention that no shader cores need to be used. If you have a fragment shader, you'll have to launch shaders on your shader engines, and even if they do nothing, it will still cost you several clocks per tile (unless the driver is smart enough to just ignore your empty fragment shader, in which case you'll get the same results).

  7. #7 · Alfonse Reinheart · Senior Member (OpenGL Guru) · Join Date: May 2009 · Posts: 4,948
    The fragment depth output by a fragment shader, of course, makes no sense without a fragment shader, but you don't need a fragment shader to produce a depth value; you have a fixed-function one.
    If you're right, please point to the part of the OpenGL 4.3 specification that states that the input to the depth comparison does not have to come from the fragment shader. Because it clearly says:

    Quote Originally Posted by GL 4.3 Ch 15
    The processed fragments resulting from fragment shader execution are then further processed and written to the framebuffer as described in chapter 17.
    Those "fragments resulting from fragment shader execution" contain undefined data, as previously stated. If you're right, you should be able to show me where the resulting fragments will get defined data from.

    Also, having an empty fragment shader DOES cost you performance. Depth testing and depth writing are done by fixed-function hardware which can have a throughput of many pixels per clock, especially with hierarchical Z, not to mention that no shader cores need to be used. If you have a fragment shader, you'll have to launch shaders on your shader engines, and even if they do nothing, it will still cost you several clocks per tile (unless the driver is smart enough to just ignore your empty fragment shader, in which case you'll get the same results).
    You're making some pretty big assumptions here. Like the assumption that fragment processing can be skipped by the hardware at all. That it can copy data directly from the rasterizer to the ROPs without some kind of per-fragment shader intervening.

    I'd like to see proof that this is true. Preferably in the form of performance tests on the difference between an empty fragment shader and not having one. On multiple different kinds of hardware.

  8. #8 · aqnuep · Advanced Member (Frequent Contributor) · Join Date: Dec 2007 · Location: Hungary · Posts: 985
    Quote Originally Posted by Alfonse Reinheart View Post
    If you're right, please point to the part of the OpenGL 4.3 specification that states that the input to the depth comparison does not have to come from the fragment shader.
    The description of per-fragment operations starts as follows:
    Quote Originally Posted by GL 4.3 Ch 17.3
    A fragment produced by rasterization with window coordinates of (xw, yw) modifies the pixel in the framebuffer at that location based on a number of parameters and conditions. We describe these modifications and tests, diagrammed in figure 17.1, in the order in which they are performed.
    Thus you can see that fragments are produced by the rasterization, not by fragment shaders.

    Also, what chapter 15 actually says is:
    Quote Originally Posted by GL 4.3 Ch 15
    When the program object currently in use for the fragment stage (see section 7.3) includes a fragment shader, its shader is considered active and is used to process fragments resulting from rasterization (see section 14). If the current fragment stage program object has no fragment shader, or no fragment program object is current for the fragment stage, the results of fragment shader execution are undefined.
    The processed fragments resulting from fragment shader execution are then further processed and written to the framebuffer as described in chapter 17.
    Thus even though the results of fragment shader execution are undefined, most of the data required for per-fragment operations is not affected by the fragment shader, namely the pixel ownership and scissor tests, multisample operations, the depth and stencil tests (unless depth or stencil export is used, which is obviously not the case if there is no fragment shader), and occlusion queries.

    Quote Originally Posted by Alfonse Reinheart View Post
    Those "fragments resulting from fragment shader execution" contain undefined data, as previously stated. If you're right, you should be able to show me where the resulting fragments will get defined data from.
    Once again, the results of fragment shader execution are undefined, not the fragments. By default the results of fragment shader execution are the color outputs, unless another explicit mechanism is used, like depth or stencil export.

    Quote Originally Posted by Alfonse Reinheart View Post
    You're making some pretty big assumptions here. Like the assumption that fragment processing can be skipped by the hardware at all. That it can copy data directly from the rasterizer to the ROPs without some kind of per-fragment shader happening to intervene.
    ROPs deal with blending, sRGB conversion and logic op. Obviously those will get undefined data, thus you cannot expect anything good to be in your color buffers after all. But depth/stencil is not handled by the same piece of hardware. Neither are e.g. scissor and pixel ownership tests.

    Quote Originally Posted by Alfonse Reinheart View Post
    I'd like to see proof that this is true. Preferably in the form of performance tests on the difference between an empty fragment shader and not having one. On multiple different kinds of hardware.
    The fact that the lack of a fragment shader doesn't result in an error, but just leaves the results of its execution undefined, is already a good enough reason, I believe. Don't you think this wasn't an oversight by the ARB, but that they defined it this way intentionally?

    Not to mention that in most cases the depth and stencil tests happen before the fragment shader is executed (if there is one); these are the so-called "early tests", now even explicitly mentioned in the spec, and if the early tests fail then no fragment shader is executed, even if there is one. So if you think hardware cannot avoid executing fragment shaders, then why do you think the ARB bothered writing about it in the spec, or why did they introduce a mechanism to force/disable early depth in the fragment shader?

  9. #9 · Alfonse Reinheart · Senior Member (OpenGL Guru) · Join Date: May 2009 · Posts: 4,948
    Quote Originally Posted by aqnuep View Post
    Thus you can see that fragments are produced by the rasterization, not by fragment shaders.
    So how does `discard` work?

    Fragments are produced by the rasterizer, and modified by the fragment shader. Just like vertices are produced by Vertex Specification and modified by the vertex shader. Later stages work based on the vertices output by the vertex shader, just as later stages work based on fragments output by the fragment shader.

    Quote Originally Posted by aqnuep View Post
    Thus even though the results of fragment shader execution are undefined, most of the data required for per-fragment operations is not affected by the fragment shader, namely the pixel ownership and scissor tests, multisample operations, the depth and stencil tests (unless depth or stencil export is used, which is obviously not the case if there is no fragment shader), and occlusion queries.
    The fragment shader does not output the X or Y position. It does output the depth value, and therefore it will output an undefined value. You seem to have missed this important part of what you quoted:

    Quote Originally Posted by The Spec
    The processed fragments resulting from fragment shader execution are then further processed and written to the framebuffer as described in chapter 17.
    "The processed fragments resulting from fragment shader execution" have undefined data. The depth output from the fragment shader is part of that fragment. And it has an undefined value.

    The fact that the lack of a fragment shader doesn't result in an error, but just leaves the results of its execution undefined, is already a good enough reason, I believe. Don't you think this wasn't an oversight by the ARB, but that they defined it this way intentionally?
    How does that prove anything? Lack of a vertex shader also doesn't produce an error, but there's almost nothing useful you can do with that.

    these are the so-called "early tests", now even explicitly mentioned in the spec
    Ahem:

    Quote Originally Posted by GL 4.3, 14.9
    The other operations are performed if and only if early fragment tests are enabled in the active fragment shader
    They are explicitly mentioned solely for the Image Load/Store feature of being able to force early tests so that you can get more guaranteed behavior. And you need a fragment shader to activate it.
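
    (For reference, forcing the early tests from GLSL looks like this; it requires a fragment shader, which is the point:)
    Code :
    // in the fragment shader: run the depth/stencil tests before this shader executes
    layout(early_fragment_tests) in;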

    So if you think hardware cannot avoid executing fragment shaders, then why do you think the ARB bothered writing about it in the spec, or why did they introduce a mechanism to force/disable early depth in the fragment shader?
    You seem to be misunderstanding the difference between "discarding the fragment before the fragment shader" and "processing the fragment without a fragment shader and getting defined results". The latter is what you're alleging that OpenGL allows; the former is what OpenGL actually allows.

    Also, you haven't provided any evidence that not providing a fragment shader is faster in any way than providing an empty one. Which is what you claimed and what I asked you to provide.

  10. #10 · aqnuep · Advanced Member (Frequent Contributor) · Join Date: Dec 2007 · Location: Hungary · Posts: 985
    Quote Originally Posted by Alfonse Reinheart View Post
    So how does `discard` work?
    Well, guess what: if discard is used by the shader then it is very likely that the early depth/stencil tests are disabled automatically, because otherwise you might not get correct results, yeah? Or actually, the early tests can still happen, but the depth/stencil writes cannot, as they might get discarded.

    Quote Originally Posted by Alfonse Reinheart View Post
    Fragments are produced by the rasterizer, and modified by the fragment shader.
    This is the important part: they are only modified.

    Quote Originally Posted by Alfonse Reinheart View Post
    It does output the depth value, and therefore it will output an undefined value.
    It just optionally outputs a depth value. Just because this is all transparent from the user's point of view doesn't mean it doesn't matter. If you output depth in your fragment shader, once again, those early depth/stencil tests will be disabled, unless you force early tests using the functionality introduced by ARB_shader_image_load_store or you use ARB_conservative_depth properly.
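
    (A small illustration of the ARB_conservative_depth mechanism mentioned above, as a sketch:)
    Code :
    // promise the implementation that the shader only ever increases the depth value,
    // so early depth testing can still be used even though gl_FragDepth is written
    layout (depth_greater) out float gl_FragDepth;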

    Quote Originally Posted by Alfonse Reinheart View Post
    You seem to have missed this important part of what you quoted:
    No, I didn't miss it. Once again, the fragment shader only modifies some data, namely it outputs color values and optionally modifies the depth.

    Quote Originally Posted by Alfonse Reinheart View Post
    Lack of a vertex shader also doesn't produce an error
    Lack of a vertex shader DOES produce an INVALID_OPERATION error at draw time in the core profile.

    Quote Originally Posted by Alfonse Reinheart View Post
    Also, you haven't provided any evidence that not providing a fragment shader is faster in any way than providing an empty one. Which is what you claimed and what I asked you to provide.
    Come on, if you don't have to run fragment shaders on the shader cores, more vertex shaders can be in flight at once. How would that not be faster? Think about it.

    You get defined results for depth and stencil, even without a fragment shader. The spec is unfortunately pretty vague on this, but you can try it out anytime if you don't believe me. Just create a core profile context, set up a vertex-shader-only program, set the draw buffers to none, attach a depth texture to your framebuffer and let it go. I bet you it will work.
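
    A rough sketch of that setup (handle names like depthProgram, depthFbo and depthTex are just placeholders; context creation and the draw calls are omitted):
    Code :
    // depth-only pass with a program that has no fragment shader attached
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, depthFbo);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depthTex, 0);
    glDrawBuffer(GL_NONE);                 // no color buffers are written
     
    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_TRUE);
    glClear(GL_DEPTH_BUFFER_BIT);
     
    glUseProgram(depthProgram);            // vertex shader only
    // ... issue draw calls ...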

    Also, if you don't trust driver behavior, you can always ask the vendors for their opinion; they are the ARB, they can tell you for sure. I'm not willing to continue arguing about facts.
