Optimization: ... color picking from an FBO attachment without stalling the pipeline

john_connor · December 29, 2018, 9:57am

Hi,

I have an FBO with several texture attachments:
– color rgba
– glow 1
– glow 2 blurred (double buffered)
– depth stencil
– 32-bit integer texture for “object ID”

For now, I read back the ID from the last attachment every frame to identify the object currently under the cursor. But as far as I know, that stalls the pipeline. What options do I have to optimize all of this?

1. Resizing the viewport to 1 pixel when rendering into the “object ID” attachment
2. Creating a double-buffered pixel buffer object (large enough) and reading the PBO from the previous frame (to get the object ID)

But regarding point 1:
Doesn't that state change also slow things down a bit? And all those draw calls would have to be made twice.

Are there other options to improve that process?
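The double-buffered PBO in point 2 is mostly bookkeeping: each frame queues its `glReadPixels` into one PBO and maps the one filled on an earlier frame, so the CPU normally does not have to wait for the copy it just queued. A minimal Python sketch of the slot rotation only, assuming a simple frame counter; the helper names are hypothetical and the OpenGL calls appear only as comments:

```python
from typing import Optional

PBO_COUNT = 2  # two PBOs: one receives this frame's copy, the other is mapped


def pbo_write_slot(frame: int) -> int:
    """PBO that this frame's glReadPixels should copy into."""
    return frame % PBO_COUNT


def pbo_read_slot(frame: int) -> Optional[int]:
    """Oldest in-flight PBO, i.e. the one written PBO_COUNT - 1 frames ago.

    Returns None until that many frames have been rendered.
    """
    if frame < PBO_COUNT - 1:
        return None
    return (frame + 1) % PBO_COUNT


for frame in range(4):
    # Per frame in a real renderer (shown as comments only):
    #   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[pbo_write_slot(frame)])
    #   glReadPixels(x, y, w, h, GL_RED_INTEGER, GL_INT, 0)  # queued copy, no wait
    #   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[pbo_read_slot(frame)])
    #   glMapBufferRange(...)  # object ID from an earlier frame
    print(f"frame {frame}: write PBO {pbo_write_slot(frame)}, read PBO {pbo_read_slot(frame)}")
```

Reading one frame late usually avoids the stall, but it is not guaranteed: if the copy is still in flight, mapping the PBO blocks. A fence created with `glFenceSync` right after the readback and polled with `glClientWaitSync` (timeout 0) is one way to check before mapping.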

GClements · December 29, 2018, 11:59am

Are you storing the IDs in a separate render pass? If you’re updating all of the buffers as part of a single render pass, it’s debatable whether a separate pass would be an optimisation even if it was only to a 1x1 framebuffer. Unless you’re going to perform a separate frustum cull, you’ll be processing the geometry twice. You can’t re-use gl_Position directly if you change the viewport, although you could replace the general matrix transformation with something simpler. And the implementation still needs to perform primitive assembly and clipping.
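If the ID pass does stay separate, one standard way to aim an unchanged projection at a tiny viewport is the pick-matrix construction that GLU's `gluPickMatrix` applies: a 2D scale and offset on clip-space x and y, premultiplied onto the projection. A minimal Python sketch, assuming window coordinates with a lower-left origin and a cursor given as a pixel center; `pick_transform` is a hypothetical name:

```python
def pick_transform(clip, cursor, viewport, size=(1.0, 1.0)):
    """Apply the gluPickMatrix-style scale/offset to a clip-space position.

    clip:     (x, y, z, w) as produced by the normal projection
    cursor:   (cx, cy) center of the pick region in window coords (lower-left origin)
    viewport: (vx, vy, vw, vh) of the full-size view
    size:     width/height of the pick region in pixels
    """
    x, y, z, w = clip
    cx, cy = cursor
    vx, vy, vw, vh = viewport
    dx, dy = size
    sx, sy = vw / dx, vh / dy         # zoom the pick region up to the whole NDC square
    tx = (vw - 2.0 * (cx - vx)) / dx  # shift the cursor to the NDC origin
    ty = (vh - 2.0 * (cy - vy)) / dy
    # Offsets are scaled by w so they survive the perspective divide.
    return (sx * x + tx * w, sy * y + ty * w, z, w)
```

With `glViewport(0, 0, 1, 1)` and a 1x1 pick region, the cursor's pixel center lands at NDC (0, 0), so only primitives covering that pixel survive clipping; for example, the pixel to the right of the cursor ends up at x = 2w, outside the clip volume.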

The PBO only needs to hold one pixel per frame (or at most a small neighbourhood around the cursor). There’s no point in reading the entire attachment then ignoring most of it.
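Computing that region takes only a few lines. A Python sketch, assuming the windowing toolkit reports the cursor with a top-left origin (GLFW does, for example) while `glReadPixels` uses OpenGL's lower-left origin, and assuming a 32-bit integer ID attachment (4 bytes per texel); the helper names are hypothetical:

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in GL window coordinates


def readback_rect(cursor: Tuple[int, int], fb_size: Tuple[int, int], radius: int = 0) -> Optional[Rect]:
    """Region to pass to glReadPixels: the cursor pixel plus `radius` pixels around it.

    cursor:  (cx, cy) pixel under the cursor, top-left origin (assumed toolkit convention)
    fb_size: (width, height) of the object-ID attachment
    Returns None if the cursor lies outside the framebuffer.
    """
    cx, cy = cursor
    width, height = fb_size
    if not (0 <= cx < width and 0 <= cy < height):
        return None
    gy = height - 1 - cy  # flip to OpenGL's lower-left origin
    x0, x1 = max(cx - radius, 0), min(cx + radius, width - 1)
    y0, y1 = max(gy - radius, 0), min(gy + radius, height - 1)
    return (x0, y0, x1 - x0 + 1, y1 - y0 + 1)  # clamped at the framebuffer edges


def pbo_bytes(rect: Rect, bytes_per_texel: int = 4) -> int:
    """Bytes one readback needs; 4 per texel for a 32-bit integer ID attachment."""
    _, _, w, h = rect
    return w * h * bytes_per_texel
```

The rectangle would feed `glReadPixels(x, y, w, h, GL_RED_INTEGER, GL_INT, 0)` with the PBO bound to `GL_PIXEL_PACK_BUFFER`. On high-DPI displays the cursor position may also need scaling from screen coordinates to framebuffer pixels first.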

If you only want one pixel, you don’t need an entire framebuffer attachment. You could just have the fragment shader check whether gl_FragCoord is equal to the cursor position and store the ID in a buffer variable if so. This relies on being able to enable early fragment tests, as you wouldn’t want the variable to be updated if the depth test fails.
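One detail when comparing against `gl_FragCoord`: by default its x and y are the pixel center (for example 3.5, 7.5 for pixel (3, 7)), so a float equality test against an integer cursor never matches, while truncating to integers does. Early fragment tests are requested in GLSL with `layout(early_fragment_tests) in;`. A small Python model of the comparison, assuming the cursor is already in framebuffer pixels with a lower-left origin; the function names are hypothetical:

```python
import math
from typing import Tuple


def frag_coord(px: int, py: int) -> Tuple[float, float]:
    """gl_FragCoord.xy for pixel (px, py) under the default pixel-center convention."""
    return (px + 0.5, py + 0.5)


def hits_cursor(frag_xy: Tuple[float, float], cursor: Tuple[int, int]) -> bool:
    """Model of `ivec2(gl_FragCoord.xy) == cursor` in GLSL.

    GLSL's float-to-int conversion truncates toward zero; gl_FragCoord is never
    negative inside the framebuffer, so floor() gives the same result here.
    """
    fx, fy = frag_xy
    return (math.floor(fx), math.floor(fy)) == tuple(cursor)
```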

That’s more work for the fragment shader but less memory bandwidth. As to whether it’s faster, you’d have to test it.

john_connor · December 29, 2018, 1:05pm

Nope, it's all tied together: the object ID is a “uniform int ID” that I change for each mesh/material pair.

That's a very interesting alternative! I think I'll give it a try. For now I just need 1 pixel to check what's under the cursor.

Thank you very much!

Alfonse_Reinheart · December 29, 2018, 2:06pm

It should be noted that this does not work. Not unless you do a full depth pre-pass first. Even rendering things in farthest-to-nearest order guarantees nothing about the order the fragment shaders/early depth test hardware are invoked in. It will still be possible for the farther value to overwrite a value created by a nearer one. Or rather, OpenGL offers no guarantees about it.
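A toy model can make the failure concrete. This Python sketch is a simplification, not how any driver actually schedules work: it assumes the depth test is resolved in draw order (which is what the framebuffer result guarantees) while the stores to the buffer variable land in an arbitrary order, which is all OpenGL promises about them. `simulate_pixel` is a hypothetical name:

```python
def simulate_pixel(fragments, store_order, early_tests=True):
    """Return (framebuffer_id, buffer_id) for one pixel under GL_LESS.

    fragments:   list of (depth, object_id) in draw order
    store_order: order (indices into fragments) in which the shader stores land;
                 OpenGL does not tie this to draw order, so any permutation is legal
    early_tests: if False, every fragment runs the shader and stores its ID
    """
    depth = float("inf")      # cleared depth buffer
    framebuffer_id = None
    stored_by = set()         # fragments whose shader executes the store
    for i, (d, object_id) in enumerate(fragments):
        if d < depth:         # GL_LESS, applied in draw order
            depth, framebuffer_id = d, object_id
            stored_by.add(i)
        elif not early_tests:
            stored_by.add(i)  # shader already ran before the late depth test
    buffer_id = None
    for i in store_order:     # the last store to land wins
        if i in stored_by:
            buffer_id = fragments[i][1]
    return framebuffer_id, buffer_id
```

With far-then-near drawing both fragments pass, so a store order of near-then-far leaves the far object's ID in the buffer while the framebuffer shows the near one. Without early tests even near-then-far drawing can go wrong.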

Now, you can still get the meat of this idea by doing the equivalent of a single-pixel version of the linked-list, order-independent-transparency technique. Basically, every time you hit that pixel, you bump an atomic counter and add an entry to a linked list (the “pointers” in the list are the values from the atomic counter, and the list itself is really just an array indexed by that counter).
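A CPU model of that single-pixel list, in Python. It assumes each node stores the fragment's depth and object ID; on the GPU the counter would be an atomic counter (`atomicCounterIncrement`), the head a `uint` in an SSBO updated with `atomicExchange`, and the nodes an SSBO array, while here `itertools.count` and plain assignments stand in for them. The class and method names are hypothetical:

```python
import itertools


class PixelFragmentList:
    """Single-pixel version of the per-pixel linked list used for order-independent transparency."""

    END = -1  # "null pointer" that terminates the list

    def __init__(self, capacity):
        self.capacity = capacity
        self.counter = itertools.count()  # stands in for the atomic counter
        self.head = self.END              # stands in for the head uint in an SSBO
        self.nodes = [None] * capacity    # node: (depth, object_id, next_index)

    def append(self, depth, object_id):
        """What the fragment shader would do each time it covers the cursor pixel."""
        index = next(self.counter)        # GPU: atomicCounterIncrement -> unique slot
        if index >= self.capacity:
            return False                  # node array full: this fragment is lost
        previous = self.head              # GPU: previous = atomicExchange(head, index)
        self.head = index
        self.nodes[index] = (depth, object_id, previous)
        return True

    def nearest(self):
        """Resolve after rendering: walk the list and return the closest fragment's ID."""
        best_depth, best_id = float("inf"), None
        i = self.head
        while i != self.END:
            depth, object_id, i = self.nodes[i]
            if depth < best_depth:
                best_depth, best_id = depth, object_id
        return best_id
```

Because the nearest fragment is chosen after rendering, the order in which fragments were appended no longer matters; only exactly equal depths remain ambiguous. On the GPU both the counter and the head have to be reset every frame, and the capacity check matters there too, since the counter can run past the end of the node array.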

john_connor · December 30, 2018, 5:35pm

Could you explain that a bit more?

What I did yesterday was bind a shader storage buffer (size = 1 int) and add this to the fragment shader that draws the scene:

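```glsl
#version 430 core

uniform int ID;                // object ID, changed for each mesh/material pair
uniform ivec2 cursor_position; // cursor in framebuffer pixels, lower-left origin
layout (std430, binding = 2) buffer ObjectID_Block { int objectID; };  // 1-int SSBO

void main()
{
    /* ... */

    // ivec2() truncates gl_FragCoord's pixel-center coordinates to the pixel index
    if (ivec2(gl_FragCoord.xy) == cursor_position)
        objectID = ID;  // store the ID of the object covering the cursor pixel
}
```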

It works as I expect, with depth testing enabled and no early fragment tests needed at all (OK, the scene is very simple at the moment). Is this undefined behavior?

For now, I only need the nearest pixel's ID (a uniform, constant for each mesh); transparency is not implemented yet.

GClements · December 31, 2018, 2:39am

Without early fragment tests, the fragment shader will be executed even if the depth test fails. But even with early fragment tests, if you overwrite a pixel with a closer pixel (so the fragment shader is run more than once), there’s no guarantee as to the order in which the stores will be performed. The framebuffer is guaranteed to contain the fragment from the last primitive rendered (in the order they’re passed to glDrawElements() etc), but there is no such guarantee regarding stores to buffer variables, images, etc. IOW, the implementation isn’t required to process primitives in any particular order, only to ensure that the resulting framebuffer contents are as if primitives were processed in the order given.

So in order to use a buffer variable, you’d need to use a depth pre-pass so that the fragment shader is only executed for the closest fragment. Even then, if you had two fragments with equal depth, the value in the buffer variable might come from a different primitive to the value in the framebuffer (a depth pre-pass requires that the subsequent pass uses GL_LEQUAL, as GL_LESS would always fail).
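The pre-pass argument can be checked with another toy model. This Python sketch assumes both passes produce bit-identical depths (in GLSL that usually means computing `gl_Position` the same way in both passes, e.g. with the `invariant` qualifier) and early fragment tests in the shading pass; it returns which fragments would run the shading pass's fragment shader. `prepass_then_shade` is a hypothetical name:

```python
def prepass_then_shade(fragments, second_pass_func):
    """Return the object IDs whose fragment shader runs in the shading pass.

    fragments:        list of (depth, object_id) in draw order, identical in both passes
    second_pass_func: "less" or "lequal", the depth function used by the shading pass
    Assumes early fragment tests in the shading pass and depths that match bit for bit.
    """
    # Pass 1 (depth only, GL_LESS) leaves the nearest depth in the depth buffer.
    nearest = min(depth for depth, _ in fragments)
    # Pass 2 tests every fragment against that stored depth.
    if second_pass_func == "less":
        return [obj for depth, obj in fragments if depth < nearest]
    if second_pass_func == "lequal":
        return [obj for depth, obj in fragments if depth <= nearest]
    raise ValueError(f"unsupported depth function: {second_pass_func}")
```

With GL_LESS nothing passes; with GL_LEQUAL exactly one fragment runs unless two share the nearest depth, in which case both shaders run and the buffer variable may end up with either ID.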

The alternative is to record every fragment rendered at the cursor’s coordinates, then find the closest one afterwards.