Occlusion query issue

Hello there.

I'm trying to implement occlusion culling in my engine to improve mesh rendering performance.

But I have one problem that I don't understand ^_^.

I split my rendering into two parts.

The first part issues occlusion queries while drawing cubes (the bounding boxes of the meshes that make up my scene).

The second part prints the number of samples passed for each query object. I expect to see many different numbers, but I only see the same number repeated: the last result in the query vector…

On the other hand, if I read back all the queries in the first loop (where I render the cubes, not the meshes), it works, so I don't understand…

void Model::mRenderOcclusion(u32 bindingOcclusionUniform, std::shared_ptr<Shader> &occlusion)
{
    mat4 *matAABB = (mat4*)mOcclusionBuffers.map(3);
    mOcclusionBuffers.bind(2, BufferType::DRAW_INDIRECT);
    mOcclusionBuffers.bindBase(3, BufferType::UNIFORM, bindingOcclusionUniform);

    glMemoryBarrier(GL_COMMAND_BARRIER_BIT);

    mOcclusionVao.bind(true);
    occlusion->bind(true);

    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);

    // Render all Bounding Boxes with queries
    for(u32 i = 0; i < mNumMeshes; ++i)
    {
        *matAABB = mMatrixAABB[i];

        glMemoryBarrier(GL_UNIFORM_BARRIER_BIT);

        glBeginQuery(GL_SAMPLES_PASSED, mOcclusionQuery[i]);
            glDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, (DrawElementCommand*)nullptr + i);
        glEndQuery(GL_SAMPLES_PASSED);
        
        // If I uncomment this part, it works
        /*s64 nQuery = 0;

        while(nQuery == 0)
            glGetQueryObjecti64v(mOcclusionQuery[i], GL_QUERY_RESULT_AVAILABLE, &nQuery);*/
    }

    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);
}

void Model::render(u32 nInstance, u32 bindingMaterialOcclusionUniform, std::shared_ptr<Shader> &occlusion, std::shared_ptr<Shader> &render)
{
    DrawElementCommand *command = (DrawElementCommand*)mBuffers.map(2);
    DrawElementCommand *commandOcclusion = (DrawElementCommand*)mOcclusionBuffers.map(2);

    for(u32 i = 0; i < mNumMeshes; ++i)
        (command++)->primCount = (commandOcclusion++)->primCount = nInstance;

    mRenderOcclusion(bindingMaterialOcclusionUniform, occlusion);
    mBuffers.bind(2, BufferType::DRAW_INDIRECT);

    glMemoryBarrier(GL_COMMAND_BARRIER_BIT);

    render->bind(true);
    mVao.bind(true);

    glFlush();

    int n = mNumMeshes * 3 / 4;

    s64 available = 0;

    while(available == 0)
        glGetQueryObjecti64v(mOcclusionQuery[n], GL_QUERY_RESULT_AVAILABLE, &available);

    for(u32 i = 0; i < mNumMeshes; ++i)
    {
        s64 nQuery = 0;

        glGetQueryObjecti64v(mOcclusionQuery[i], GL_QUERY_RESULT, &nQuery);

        std::cout << nQuery << std::endl;
    }
}

Thanks :). And sorry for my english…

First, I don't understand why you use glMemoryBarrier: I don't see any image load/store or SSBO usage in your code, so you're probably using it incorrectly.

Second, you should never immediately poll on the CPU for query results to become available, as you do inside mRenderOcclusion; in this particular case you can simply remove that code altogether. Waiting immediately on the CPU side for the result of a query will kill your performance, because you stall the GPU.
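If you later want to avoid the CPU readback entirely, conditional rendering keeps the visibility decision on the GPU. A minimal sketch, where `drawBoundingBox()` and `drawMesh()` are hypothetical placeholders for your actual draw calls and `query` is a GL_SAMPLES_PASSED query object:

```cpp
// Pass 1: issue the occlusion query while rasterizing the bounding box
// (with color/depth writes disabled, as in your code).
glBeginQuery(GL_SAMPLES_PASSED, query);
drawBoundingBox();              // hypothetical helper
glEndQuery(GL_SAMPLES_PASSED);

// Pass 2: let the GPU skip the real draw if no samples passed.
// GL_QUERY_WAIT makes the GPU wait for the result; GL_QUERY_NO_WAIT
// draws unconditionally if the result is not yet available.
glBeginConditionalRender(query, GL_QUERY_NO_WAIT);
drawMesh();                     // hypothetical helper
glEndConditionalRender();
```

The result never leaves the GPU, so there is no stall on either side.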

Third, in the while loop in Model::render you only wait for a single occlusion query to become ready (the one with index "n"); I don't know why you do that.

I'm not sure why you get wrong results, as I don't see any immediately visible issue with the result acquisition itself. But considering that the code is full of incorrect or questionable uses of the API, I'd go back and try to fix and understand those first; I'm fairly sure the issue you're experiencing also stems from that lack of understanding.

Hello :).

Thanks for your answer :).

For the first point, it's quite possible that I'm not using glMemoryBarrier correctly…
I use buffer objects created with glBufferStorage, so I thought I had to use this function, otherwise I get issues.
I tried glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT), but it didn't work… or, more precisely,
I didn't use it correctly ^^. Could you give me more information about glMemoryBarrier?

So, when I change data in my buffer, I have to issue a memory barrier with GL_UNIFORM_BARRIER_BIT or GL_COMMAND_BARRIER_BIT, otherwise I get issues…

For the second point, I know; it's just a test. In the future I will obviously use conditional rendering and/or a query buffer object ^^.
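For reference, here is a rough sketch of what the query buffer object path (GL 4.4 / ARB_query_buffer_object) could look like; `queries` and `numQueries` are assumed to be my existing query objects and their count:

```cpp
// Create a buffer to receive query results on the GPU.
GLuint qbo;
glGenBuffers(1, &qbo);
glBindBuffer(GL_QUERY_BUFFER, qbo);
glBufferData(GL_QUERY_BUFFER, numQueries * sizeof(GLuint),
             nullptr, GL_DYNAMIC_COPY);

// While a buffer is bound to GL_QUERY_BUFFER, the pointer argument of
// glGetQueryObjectuiv is interpreted as a byte offset into that buffer,
// so the results are written GPU-side without a CPU round trip.
for (GLuint i = 0; i < numQueries; ++i)
    glGetQueryObjectuiv(queries[i], GL_QUERY_RESULT,
                        (GLuint*)(uintptr_t)(i * sizeof(GLuint)));
```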

For the third point, I was just trying to adapt the example from https://www.opengl.org/registry/specs/ARB/occlusion_query.txt
But thinking about it, you're right: it's probably a bad idea to wait for only one query.

Thank you so much :).

Qnoper :slight_smile:

[QUOTE=qnoper;1263750]For the first point, it's quite possible that I'm not using glMemoryBarrier correctly…
I use buffer objects created with glBufferStorage, so I thought I had to use this function, otherwise I get issues.
I tried glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT), but it didn't work… or, more precisely,
I didn't use it correctly ^^. Could you give me more information about glMemoryBarrier?[/QUOTE]

glBufferStorage has nothing to do with glMemoryBarrier. glMemoryBarrier only synchronizes image stores and SSBO writes with other operations; since you apparently don't use those, you don't need to call glMemoryBarrier at all.
For information on glMemoryBarrier, read the extension spec for ARB_shader_image_load_store, but you definitely don't need it if all you do is update buffers from the CPU side.

I'm not sure how you update your buffers, but even if you use a persistently mapped buffer, the most you may have to do is call glFlushMappedBufferRange.
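A minimal sketch of that pattern, assuming `buf` is a buffer name and `size`/`data` are the buffer size and source data:

```cpp
// Persistent mapping without GL_MAP_COHERENT_BIT: flush writes explicitly.
GLbitfield storageFlags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT;
glBindBuffer(GL_UNIFORM_BUFFER, buf);
glBufferStorage(GL_UNIFORM_BUFFER, size, nullptr, storageFlags);

// GL_MAP_FLUSH_EXPLICIT_BIT is a mapping flag only, not a storage flag.
void *ptr = glMapBufferRange(GL_UNIFORM_BUFFER, 0, size,
                             storageFlags | GL_MAP_FLUSH_EXPLICIT_BIT);

memcpy(ptr, data, size);   // write through the mapping on the CPU

// Make the written range visible to subsequent GL commands.
glFlushMappedBufferRange(GL_UNIFORM_BUFFER, 0, size);
```

With GL_MAP_COHERENT_BIT instead, the flush call would be unnecessary, at some potential cost in write performance.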

I'd suggest going back and studying how the individual OpenGL features you use actually work; it seems you're calling random, unrelated functions without understanding their purpose.

There is one thing I still don't understand.

I know that glMemoryBarrier is usually used with SSBOs and image stores, but then I read this in the ARB_buffer_storage specification:

If MAP_COHERENT_BIT is not set and the client performs a write
followed by a call to the MemoryBarrier command with the
CLIENT_MAPPED_BUFFER_BARRIER_BIT set, then in subsequent commands
the server will see the writes.

But the problem is, when I use GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT I get issues that I don't get when using GL_SHADER_STORAGE_BARRIER_BIT / GL_UNIFORM_BARRIER_BIT or whichever bit matches my buffer…

So, after reading the GLAPI wiki, I think I have found the solution :slight_smile:

1. Send whatever commands will write to the buffer.
2. Issue the memory barrier: glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT).
3. Create a fence.
4. Do something else for a while, so that you don't waste precious CPU time waiting for the GPU to be done.
5. Wait for the fence sync to complete.
6. Read from the mapped pointer.
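The steps above could be sketched roughly like this; `issueGpuWritesToBuffer()` and `readFrom()` are hypothetical placeholders, and `ptr` is assumed to be a persistently mapped readback pointer:

```cpp
// 1. Commands that make the GPU write into the buffer.
issueGpuWritesToBuffer();   // hypothetical: draws/compute writing the buffer

// 2. Make those GPU writes visible through the client mapping.
glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);

// 3. Insert a fence after the writes.
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// 4. ... do other useful CPU work here instead of spinning ...

// 5. Wait (flushing the command queue) until the GPU reaches the fence.
glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, GL_TIMEOUT_IGNORED);
glDeleteSync(fence);

// 6. Now it is safe to read from the mapped pointer.
readFrom(ptr);              // hypothetical helper
```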

Thanks for pointing out this mistake :).

Qnoper :slight_smile: