AMD conditional rendering broken

so conditional rendering always passes if i use GL_QUERY_NO_WAIT, or stalls the driver if i use GL_QUERY_WAIT as the parameter. i couldn’t find much info about this issue, except for: OpenGL 4.0 drivers status (updated)
in my case, it acts differently. i have an HD6670 with Catalyst 13.1, Core Profile Forward-Compatible Context 9.12.0.0

example of rendering (simplified):
//render to occlusion query

void renderToOcclusion()
{
    if(!isEnabled || !isInFrustum || !isDiscardable) {
        return;
    }

    glBeginQueryARB(GL_SAMPLES_PASSED_ARB, occQuery);
    modelViewMatrix = currentViewMatrix * modelMatrix;

    //RENDER
    lodAvailable = modelStorage[modelId].lodAvailable;
    glBindVertexArray(modelStorage[modelId].data[lodAvailable].vertexArrayObject);

    for(unsigned s = 0, off = 0; s < modelStorage[modelId].data[lodAvailable].numSurfaces; s++)
    {
        SShader[3].applyProgram();

        glUniformMatrix4fv(SShader[shaderId].shaderSet[programId].uniform_modelViewMatrix, 1, GL_FALSE, glm::value_ptr(modelViewMatrix));
        glUniformMatrix4fv(SShader[shaderId].shaderSet[programId].uniform_projectionMatrix, 1, GL_FALSE, glm::value_ptr(currentProjectionMatrix));

        glDrawElements(GL_TRIANGLES, modelStorage[modelId].data[lodAvailable].numIndices[s], GL_UNSIGNED_SHORT, BUFFER_OFFSET(off));
        off += modelStorage[modelId].data[lodAvailable].numIndices[s] * sizeof(unsigned short);
    }

    glEndQueryARB(GL_SAMPLES_PASSED_ARB);

    //debug: spin until the result is available, then read it back
    unsigned numSamples = 0;
    unsigned occQueryAvailable = 0;
    while(!occQueryAvailable) {
        glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occQueryAvailable);
    }
    glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT, &numSamples);
    LOG << numSamples << endl;   //!THAT OUTPUTS CORRECT NUMBER OF SAMPLES!
}

//render:

if(isEnabled && isInFrustum)
{
    if(isDiscardable)
        glBeginConditionalRender(occQuery, GL_QUERY_NO_WAIT);

    modelViewMatrix = defaultViewMatrix * modelMatrix;
    normalMatrix = glm::transpose(glm::inverse(glm::mat3(modelViewMatrix)));

    //RENDER
    drawGeometry();

    if(isDiscardable)
        glEndConditionalRender();
}

as described above, that results in always rendering all objects with GL_QUERY_NO_WAIT, and a freeze with GL_QUERY_WAIT. but glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT, &numSamples) returns correct values, and if i use its results instead of conditional render, i get correct occlusion. but the old occlusion query is such a pain in the ass to synchronize, i just don’t want to use it anymore. i did expect AMD to still have minor problems with their OpenGL implementation, but this is ridiculous. i’m in a debugging nightmare.

If you use GL_QUERY_NO_WAIT and the draw command that you perform the occlusion query on didn’t finish at the time you call BeginConditionalRender, then it should in fact perform the conditional draws. That’s how it should work.

If you use GL_QUERY_WAIT and the draw command that you perform the occlusion query on didn’t finish at the time you call BeginConditionalRender, then it should wait for the result and only continue afterwards (this wait, however, happens in most cases on the GPU, not in the driver). Once again, that’s how it should work.

I guarantee you that if you put a glFinish between the queries and the BeginConditionalRender with GL_QUERY_NO_WAIT (just as an experiment), it would not draw any occluded object.

Once again, don’t forget that GL_QUERY_NO_WAIT is a no-op if the results are not yet available on the GPU; it simply draws everything as usual. This is the expected behavior.
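A minimal sketch of that experiment, reusing the occQuery and drawGeometry names from your code above (the glFinish is for testing only, never for production):

renderToOcclusion();   // your glBeginQueryARB/glEndQueryARB pass
glFinish();            // experiment: block until the GPU has executed everything

glBeginConditionalRender(occQuery, GL_QUERY_NO_WAIT);
drawGeometry();        // with the query guaranteed finished, occluded objects should be culled
glEndConditionalRender();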

ok, it’s my fault. i was too optimistic, because i used to test my application on a GF 560Ti, which performs 5 to 7 times better and where there’s actually no need to wait for the occlusion query. and i thought that was because conditional rendering handles it better. i assumed that conditional rendering uses something like a “last available query result”, so if the latest result for a query object is not available yet, it uses the previous available cached result… but that would be sane.

and now some interesting results:

putting glFinish(); after each glEndQuery(…); does nothing. it behaves exactly the same way: all objects pass, or the GPU stalls.
i also tried glFinish(); after the whole occlusion query pass, no result.

i also tried keeping

unsigned occQueryAvailable = 0;
while(!occQueryAvailable) {
    glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occQueryAvailable);
}

after each query. same, doesn’t fix conditional rendering.

and i think you misunderstood me. GL_QUERY_WAIT doesn’t just cause a temporary freeze or slowdown; i would expect that. it causes a GPU stall long enough to crash the driver.

and if it works like you describe, then what exactly can you achieve with it? i fail to see the point of that functionality. in common rendering it makes some cpu work unavoidable (choosing a render path, binding textures, passing uniforms), but what does it provide in exchange?

that results in always rendering all objects with GL_QUERY_NO_WAIT

I’m curious: how do you detect that it is being rendered?

If you use GL_QUERY_NO_WAIT and the draw command that you perform the occlusion query on didn’t finish at the time you call BeginConditionalRender

That should be “at the time the GPU executes the BeginConditionalRender part”. The query shouldn’t have to be finished yet.

The general idea with conditional render with NO_WAIT is similar to PBOs; as long as you can put sufficient distance between the query and the conditional part, you can get something useful out of it. If you render them one right after the other, it’ll never be useful.
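For example (a rough sketch; the pass names are placeholders, the point is only the ordering):

renderOcclusionPass();     // all glBeginQuery/glEndQuery pairs, early in the frame

renderShadowMaps();        // unrelated GPU work in between gives the
renderReflections();       // queries time to complete on the GPU

glBeginConditionalRender(occQuery, GL_QUERY_NO_WAIT);
drawGeometry();            // skipped only if the result arrived in time
glEndConditionalRender();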

i assumed that conditional rendering uses something like a “last available query result”, so if the latest result for a query object is not available yet, it uses the previous available cached result… but that would be sane.

Sane? No, that would be disastrous. There’s no guarantee that the user uses the same query object for the same rendered object. Users can, and in many cases do, have circular buffers of query objects that they rotate through. Query objects do not have to be associated with a particular “object”.
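A hypothetical sketch of such a ring (all names invented for illustration):

const int RING = 4;                        // queries in flight
GLuint occQueries[RING];
glGenQueries(RING, occQueries);

// per frame:
GLuint writeQuery = occQueries[frameIndex % RING];
glBeginQuery(GL_SAMPLES_PASSED, writeQuery);
drawOccluderProxy();                       // cheap proxy geometry
glEndQuery(GL_SAMPLES_PASSED);

if(frameIndex >= RING - 1) {               // wait until the ring is primed
    // consume the oldest query, issued RING-1 frames ago
    GLuint readQuery = occQueries[(frameIndex + 1) % RING];
    glBeginConditionalRender(readQuery, GL_QUERY_NO_WAIT);
    drawGeometry();
    glEndConditionalRender();
} else {
    drawGeometry();                        // unconditional on the first frames
}
frameIndex++;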

and i think you misunderstood me. GL_QUERY_WAIT doesn’t just cause a temporary freeze or slowdown; i would expect that. it causes a GPU stall long enough to crash the driver.

Does this happen when you don’t query the number of samples passed?

If the query does not finish in time, the practice is to put more work between performing the query and using its result for conditional rendering. Also, it could depend on when the driver actually decides to submit the commands to the GPU, thus there is no apples-to-apples comparison here.

This sounds like a potential driver bug (if you did everything properly).

Yes, I misunderstood you. This definitely sounds like a driver bug (once again, assuming you did everything properly).

Yes, sorry, I used the wrong wording; I did mean when BeginConditionalRender is actually processed on the GPU.

wireframe mode. i also have some lens flares, and my occlusion FBO is very low-res, so small distant objects should disappear

[QUOTE=Alfonse Reinheart;1248264]
The general idea with conditional render with NO_WAIT is similar to PBOs; as long as you can put sufficient distance between the query and the conditional part, you can get something useful out of it. If you render them one right after the other, it’ll never be useful.[/QUOTE]

that was my plan B. if it doesn’t handle things in the magic way i expected, at least i have several passes between rendering to the queries and actually using them.

well… i just tried putting Sleep(1000); after my occlusion query pass. it renders several frames and then stalls with GL_QUERY_WAIT. and the fact that manually waiting for GL_QUERY_RESULT_AVAILABLE doesn’t fix the issue (it stalls on the 1st frame even without the sleep) makes it weird. i have no idea what’s going on there. but i am biased towards the conclusion that it is my mistake, because i’m quite confused and tired now.

I haven’t read the entire thread, so sorry if I am off topic, but I wonder whether the issue isn’t that you expect too much of conditional rendering.

With conditional rendering, only Draw and Clear commands are affected, not all commands.

I recently noticed an AMD OpenGL implementation bug where Clear commands are not affected by conditional rendering.
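To illustrate the distinction (a sketch; uniform_time, timeSeconds and numIndices are placeholders):

glBeginConditionalRender(occQuery, GL_QUERY_NO_WAIT);

// state changes and uniform updates are NOT conditional; they always execute:
glUniform1f(uniform_time, timeSeconds);

// Draw and Clear commands ARE conditional and get discarded
// when the query reported zero samples:
glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, 0);
glClear(GL_DEPTH_BUFFER_BIT);   // per spec; the AMD bug above makes this run unconditionally

glEndConditionalRender();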

no, the issue was, in short: no matter what i did, conditional render always passed (didn’t affect glDrawElements) if i used it with GL_QUERY_NO_WAIT, and it always stalled the GPU with an infinite wait if i used GL_QUERY_WAIT. the same code with the same scene worked if i requested the occlusion query result the normal way with glGetQueryObjectuiv.

now i switched back to the normal occlusion query. and i’m “happy” to report that it is mostly broken for AMD cards too. but the issue is different: it has a HUGE delay. if i render about 12 objects of varying size to the occlusion query, using a 256x256 FBO with the color mask disabled, i get to wait 25 ms until the occlusion query is ready. this is already an awful result; i was testing with an HD 6670 on the latest drivers (Catalyst 13.1). it seems heavily fillrate limited, because testing about 50 small billboards for lens flares of light sources takes about 1-2 ms, and rendering bounding boxes instead of the objects doesn’t help at all. but bear with me: i get those 25 ms if i do this right after the occlusion query pass:


unsigned occQueryAvailable = 0;
while(!occQueryAvailable) {
    glGetQueryObjectuiv(Objects[lastObject].occQuery, GL_QUERY_RESULT_AVAILABLE, &occQueryAvailable);
}

you don’t get occlusion query results like that, right? you do some stuff and then request the result. maybe it will be alright…
so i made my code check if the occlusion query is ready to execute on the next frame. at 30 fps, it had 33 ms to finish. but it didn’t. and it didn’t after 2 frames. it took 3-4 frames until the occlusion query was ready. that’s 105 ms on average. if 25 ms would be ok for a GeForce FX5200, i don’t have words to express how ridiculous 105 ms is. so it seems like the occlusion query is totally broken on AMD cards. it seems like it gets delayed more and more as rendering commands are pushed into the pipeline. the same occlusion query algorithm takes less than 1 ms on a 560Ti with glFinish (to block until the occlusion query is ready). at this point i expect either tomatoes being thrown at me for doing something catastrophically wrong, or an AMD employee showing up and taking care of it.
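roughly, the per-frame check i used (non-blocking; framesWaited is just a counter for logging):

// poll once per frame, never block
if(waitingForQuery) {
    unsigned available = 0;
    glGetQueryObjectuiv(Objects[lastObject].occQuery, GL_QUERY_RESULT_AVAILABLE, &available);
    framesWaited++;
    if(available) {
        LOG << "query ready after " << framesWaited << " frames" << endl;
        waitingForQuery = false;
        framesWaited = 0;
    }
}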

if someone else is also interested in this topic, a little update: http://devgurus.amd.com/message/1287564#1287564

maybe i can get some explanation here: what the hell is going on? maybe i am being ignorant in this situation? but i don’t get what the purpose of fixing conditional rendering is when the occlusion query is so uselessly slow. and judging by the lack of reaction to my posts about it, AMD thinks it’s totally ok for an occlusion query in a very simple scene, with a low-resolution framebuffer, to finish in 100+ ms (or 25 ms if you force it with glFinish, but that’s not an acceptable thing to do). taking multiple frames to finish is not acceptable. i don’t get how they can ignore the fact that on a proper GPU it finishes in about 1 ms.

but i don’t get what the purpose of fixing conditional rendering is when the occlusion query is so uselessly slow.

Because you’re not supposed to query it. This is in fact the entire point of conditional rendering: so that you don’t have to induce a CPU/GPU synchronization, thus allowing the GPU to be very out of sync with the CPU.

What Graham seems to be saying is that AMD’s drivers like running very out of sync with the CPU, regardless of the load. That’s their prerogative. Thus, inducing a synchronization is something that should be avoided. Conditional rendering is a means of avoiding it.

In short, don’t query for occlusion; use conditional rendering to determine whether to render something.
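In other words (a sketch reusing occQuery and drawGeometry from earlier in the thread):

// CPU readback: stalls until the result makes it back to the CPU
GLuint samples = 0;
glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT, &samples);   // may block for frames
if(samples > 0)
    drawGeometry();

// conditional rendering: the decision stays on the GPU, no CPU/GPU sync
glBeginConditionalRender(occQuery, GL_QUERY_WAIT);
drawGeometry();
glEndConditionalRender();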

ok, i’ll wait for the fix and try to see if it will actually work effectively. the reason i didn’t understand it myself is because i’ve never seen conditional rendering working effectively. with modern nVidia GPUs the occlusion query is ready in 0-1 ms, so i couldn’t see any benefit. and with AMD GPUs, where the occlusion query result is really delayed, i’ve never seen it working. but i assumed it would always render objects because the query result is not ready by the time i try to render. with a 100 ms delay i’m not sure what the expected behavior is.

so really, i should expect conditional rendering to respond significantly faster? and the delay of GL_QUERY_RESULT_AVAILABLE is caused solely by synchronizing with the CPU? in that case, i see more point in this extension.

with modern nVidia GPUs the occlusion query is ready in 0-1 ms

But this comes from poor profiling, which you readily admit, because you’re not putting the scene under load. How long a query takes when you’re not rendering much is irrelevant; what you need to know is how long a query takes when you render actual scenes. That’s why it’s always important to profile using data that is as close to the real thing as possible.

I’m not saying that in a real scene, NVIDIA’s response time will jump to 100ms. But odds are good it’s going to be rather more than 0-1ms.

[QUOTE=Nowhere-01;1248721]
so really, i should expect conditional rendering to respond significantly faster? and the delay of GL_QUERY_RESULT_AVAILABLE is caused solely by synchronizing with the CPU? in that case, i see more point in this extension.[/QUOTE]

i’d like you to confirm or deny those statements directly.

at this point i cannot reproduce a “real scene” because of the state my editor is currently in… but i did experiment and ran a test in which i had 300 objects scattered around the scene, most of them 8k triangles, each divided into 4 surfaces (which means 4 glDrawElements calls per object). most of them were in the frustum, occluding each other and being occluded by bigger objects. the occlusion framebuffer was 256x256 depth-only. with a GeForce 560Ti, it worked nicely; the occlusion query took about 10-12 ms to finish (with glFinish immediately after rendering to the occlusion query). i find that satisfying, and expected from a modern GPU. the HD 6670 took 25 ms with glFinish for about 10-12 objects in the frustum, and 100 ms+ the normal way, if i just wait until the query is ready, because in that case it takes several frames.

the occlusion query took about 10-12 ms to finish. i find that satisfying, and expected from a modern GPU.

For whatever application you’re using, it’s OK for you to be sitting there waiting on the query, doing nothing else for 12ms? You’re basically stalling your CPU for 3/4ths of a 60FPS frame. If a 12ms CPU stall (not to mention the GPU bubble you’re creating by not feeding it rendering data on time) is acceptable, are you sure you wouldn’t get faster performance by just rendering the object without the occlusion test?

i never said i plan to use it this way. that was kind of an extreme test to see how it performs with a lot of big objects. i don’t plan to have such an amount of objects in the frustum and test all of them all the time. and no, i’m not using glFinish anywhere in the final code; it was just a lazy way to check how fast the occlusion query becomes available. normally i render to the occlusion query, do stuff, and then ask for the query results. without glFinish, in this test on nVidia, the occlusion query is still ready by the next frame. and on amd, even about 10-12 objects take 3-4 frames to be ready.

but i wouldn’t care, if conditional rendering was much faster (by faster, i mean able to use the occlusion query result much earlier than i can with GL_QUERY_RESULT). but i don’t know what to expect from it; you didn’t answer my questions about how it should perform, or how you would expect it to perform on an AMD card after they fix it. do you expect speed similar to the nVidia implementation?

What you are saying is nonsense. Just because occlusion queries return their values later on one card than on another does not necessarily have anything to do with the performance of the graphics processor. Maybe one queues up more work on the CPU before submitting it to the GPU.

What is this 100+ ms? I suppose it’s CPU time. Well, guess what, you should use timer queries to figure out how much occlusion queries actually cost in performance, as those measure GPU time, not CPU time. I’ll tell you that they probably don’t consume any visible performance.

You are confusing the speed of an operation with the latency of an operation. These are two different things. Just because you have less latency doesn’t mean that the GPU is faster. Not to mention that at least in D3D it is very common that the GPU is lagging two or three frames behind the CPU, in which case the occlusion query would also finish two frames late. Once again, this has nothing to do with speed but with latency.
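A sketch of such a timer query measurement (assuming GL 3.3 / ARB_timer_query; renderOcclusionPass is a placeholder for your occlusion pass):

GLuint timerQuery;
glGenQueries(1, &timerQuery);

glBeginQuery(GL_TIME_ELAPSED, timerQuery);
renderOcclusionPass();                     // the whole occlusion-query pass
glEndQuery(GL_TIME_ELAPSED);

// read it back later; the readback itself has latency, but the value
// measured is pure GPU time, independent of how far the GPU lags the CPU:
GLuint64 gpuTimeNs = 0;
glGetQueryObjectui64v(timerQuery, GL_QUERY_RESULT, &gpuTimeNs);
LOG << "occlusion pass GPU time: " << gpuTimeNs / 1.0e6 << " ms" << endl;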

the question i’ve been repeating for about the last three posts is:
should i expect conditional render to be able to access the occlusion query result significantly earlier than i can with glGetQueryObject? in the last post i specified: “if conditional rendering was much faster (by faster, i mean able to use the occlusion query result much earlier than i can with GL_QUERY_RESULT)”. is that somehow ambiguous?

why do i ask? because i’ve never seen how it performs on an AMD GPU (it’s confirmed broken currently). and i’m not sure what to expect with such latency in the occlusion query.

i’m not confusing anything; i’ve already learned that it doesn’t directly depend on performance much. i just compared 2 GPUs to show how huge the difference is. is that somehow nonsensical? my older posts may have been less sensible because i wasn’t sure. but i asked and got corrected.

[QUOTE=aqnuep;1248736]
What is this 100+ ms? I suppose it’s CPU time. Well, guess what, you should use timer queries to figure out how much occlusion queries actually cost in performance, as those measure GPU time, not CPU time. I’ll tell you that they probably don’t consume any visible performance.[/QUOTE]

100 ms is the time it takes for occlusion query results to become available in a very simple test (10-12 medium objects in the frustum, 256x256 occlusion FBO) on an AMD GPU. how did i get that? well, i rendered objects to the occlusion query, then did the rest of the scene processing, and then at the beginning of the next frame asked if the query results were available for the last rendered object with glGetQueryObject(…, GL_QUERY_RESULT_AVAILABLE, …). they were available in 3-4 frames. at 30 fps, that means an average latency of about 100 ms. that’s what i care about, not how much time it takes to render (i know that’s negligible). and i don’t know how you could interpret my messages in such an unreasonable way. maybe because i was not getting an answer and kept reformulating things in response to ridiculous pedantic misinterpretations. but it only made things more confusing.

Not to mention that at least in D3D it is very common that the GPU is lagging two or three frames behind the CPU, in which case the occlusion query would also finish two frames late. Once again, this has nothing to do with speed but with latency.

if i had a 3-frame lag in rendering, would i see objects affected by the occlusion query pop up? i may be incorrect, but i assume that if my application had a 3-frame lag in rendering, then occlusion query results delayed by 3-4 frames would be visually fine. but they’re not: if i move the camera, it’s obvious that the occlusion query result lags several frames behind the rendering (i’m talking about the AMD card).

if you decide to answer, could you interpret my messages in a more reasonable way? don’t choose the most backwards interpretation. english is not my native language, but you’re ok understanding chinese posters who use google translate; i don’t think i’m worse.

if i’m talking about occlusion query delay/speed/performance, i mean the most important factor: how much time it takes for the results to become available after i submit the rendering commands.

Okay, here are the things you should consider:

  1. After performing all the occlusion queries (i.e. the glBegin/EndQuery part) you can call glFlush. That will most likely ensure that no further commands are accumulated before the batch is sent to the GPU, thus the latency of glGetQueryObject(…, GL_QUERY_RESULT, …) will be smaller (see the sketch after this list).

  2. When you use conditional rendering, the decision is not made on the CPU, thus the CPU-GPU latency (in your case this huge 100 ms) doesn’t matter at all. All that matters is the GPU-GPU latency, i.e. the latency between a) the GPU processing all commands between glBegin/EndQuery, and b) the GPU processing glBeginConditionalRender(…, GL_QUERY_WAIT). You can expect this latency to be way smaller than what you’ve mentioned, as it’s not a GPU-CPU sync but only a GPU-GPU sync. Sure, if you use GL_QUERY_NO_WAIT and you didn’t have enough work between glBegin/EndQuery and glBeginConditionalRender (and here I mean GPU work, not CPU work), you still might not have the query result in time, but as the GPU-GPU latency is way smaller, it’s not really likely to cause a problem; and if it does, simply stick with GL_QUERY_WAIT.
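A sketch of point 1 (renderOcclusionPass and updateSimulation are placeholders):

renderOcclusionPass();    // all glBeginQuery/glEndQuery pairs
glFlush();                // submit the accumulated commands now; unlike
                          // glFinish, this does not block the CPU

updateSimulation();       // useful CPU work while the GPU processes the queries

GLuint numSamples = 0;
glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT, &numSamples);  // lower latency now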

Once again, just to emphasize it: getting the results back to the CPU (i.e. glGetQueryObject) is very different from getting the results on the GPU (i.e. glBeginConditionalRender, or the functionality introduced by AMD_query_buffer_object). The key difference is that you don’t have to worry about how “late” the GPU is compared to the CPU. It can easily happen that while your CPU-GPU latency is in fact 100 ms, the GPU-GPU latency is way less than 1 ms.

thank you, that was the answer i was waiting for. i wanted to be sure that i understood everything correctly, and i did. now i’m going to wait for the fixed drivers and i will report results here.

i did experiment with glFlush and glFinish. glFlush didn’t affect latency noticeably in my case.