GlBufferStorage and Fencing / synchronisation

Hello there.

Apologies for my poor English; I’ll do my best to make myself understood :).

So, I want to create a little rendering engine with OpenGL 4.4.

Why OpenGL 4.4? Simply because it lets me improve performance with bindless textures, buffer storage, and other features :).

I work on Ubuntu 14.04 LTS.

I’m having a lot of problems updating my buffers with buffer storage.
I think it’s a synchronisation problem, because when I render I get display glitches. It looks like the data used during the render is “stale” data. For example, when I render an object with its matrix, the matrix actually applied is a light’s matrix, or the other way around. I suspect synchronisation because when I test the “old school” path (glBufferData plus glMapBuffer and glUnmapBuffer), I don’t have this problem…
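For comparison, the “old school” path I mean is roughly the following (a sketch only; `buffer`, `data` and `size` are placeholder names, not from my real code):

```cpp
// Classic update path: glBufferData re-specifies (orphans) the storage,
// and glMapBuffer/glUnmapBuffer synchronise implicitly with pending draws,
// which is why no stale data shows up here.
glBindBuffer(GL_SHADER_STORAGE_BUFFER, buffer);
glBufferData(GL_SHADER_STORAGE_BUFFER, size, nullptr, GL_DYNAMIC_DRAW); // orphan old storage
void *ptr = glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_WRITE_ONLY);
memcpy(ptr, data, size);
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
```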

So, this is my code.

This is how I allocate and “map” my buffer:

void Buffer::allocate(u32 index, u32 size)
{
    if(index >= mId.size())
        throw "Buffer : Index out of range";

    u32 flags = GL_MAP_COHERENT_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT;

    glNamedBufferStorageEXT(mId[index], size, nullptr, flags);
    mPtr[index] = glMapNamedBufferRangeEXT(mId[index], 0, size, flags);
    mSize[index] = size;
}

void *Buffer::map(u32 index)
{
    return mPtr[index];
}

And when I want to update my buffer, I use this:

GLsync sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
MVPandM *matrixPtr = manager->mapMVPandM();
TexHandle *texturePtr = manager->mapTexHandle();

auto it = m.begin();

for(u32 i = 0; i < mNumMeshes; ++i)
    texturePtr++->handle = mTexHandle[i];

for(u32 i = 0; i < nMatrix; ++it, ++i)
    *matrixPtr++ = *it;

glClientWaitSync(sync, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000);
glDeleteSync(sync);

mapMVPandM and mapTexHandle are mapping functions in another object.

Thanks in advance, and if you can’t understand a sentence, just ask me :).

Thanks !

If you write data on the client side to a mapped buffer and then want to use it on the server side, you use:

[code="cpp"]glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);[/code]

If you change data on the server side and then want to access it from the client, you use:
[code="cpp"]glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
...
glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000);[/code]

EDIT:
Never mind, with GL_MAP_COHERENT_BIT and the range mapping, the client changes should be visible immediately.
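For reference, the usual per-frame pattern with a coherent persistent mapping is to wait on a fence placed *after* the last draw that read the buffer, and only then overwrite the memory. A minimal sketch, where `mappedPtr`, `frameFence` and `drawScene()` are illustrative names:

```cpp
// The fence must be created AFTER the draws that read the buffer, and
// waited on BEFORE the mapped memory is rewritten for the next frame.
GLsync frameFence = 0;

void frame(void *mappedPtr, size_t size, const void *newData)
{
    if (frameFence)
    {
        // Block until the GPU has finished the previous frame's draws.
        glClientWaitSync(frameFence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000);
        glDeleteSync(frameFence);
    }

    memcpy(mappedPtr, newData, size);   // safe: the GPU is done with this memory

    drawScene();                        // commands that read the buffer

    // Fence placed after the draws; the next frame waits on it.
    frameFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
}
```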

Hello.

Thanks for your answer.

I do use GL_MAP_COHERENT_BIT, but I still have problems… With it, the wrong data is used in my glDrawElements…
For example, if I use a matrix with scale 1000 for my lights, the object is rendered with the scale-1000 matrix instead of its scale-1 matrix…

For now, here is a try without GL_MAP_COHERENT_BIT.

My buffer allocation is now:

void Buffer::allocate(u32 index, u32 size)
{
    if(index >= mId.size())
        throw "Buffer : Index out of range";

    glNamedBufferStorageEXT(mId[index], size, nullptr, GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT);
    mPtr[index] = glMapNamedBufferRangeEXT(mId[index], 0, size, GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT);
    mSize[index] = size;
}

void *Buffer::map(u32 index)
{
    //return glMapNamedBufferEXT(mId[index], GL_READ_WRITE);
    return mPtr[index];
}

Now, I’m going to try to explain my situation better.

I have the instance matrices and texture handles on the client side, and I want to write this data into my Shader Storage Buffer.

I upload the data into the buffer and draw my model into three textures with an FBO (diffuse, normal and position).
Then I loop over my lights: for each one I upload its matrix and light data, and draw the cube that contains the light (the power of deferred shading) into the light texture, with additive blending.

Finally, I composite the light texture with the diffuse texture onto the screen.

So, here is my code for rendering my model:

void Model::render(MatrixTextureHandle *manager, std::vector<MVPandM> const &m, u32 nMatrix)
{
    glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);

    // The FBO is bound before this
    MVPandM *matrixPtr = manager->mapMVPandM();
    TexHandle *texturePtr = manager->mapTexHandle();

    // MVPandM is a struct holding the ModelViewProjection and Model matrices
    auto it = m.begin();

    for(u32 i = 0; i < mNumMeshes; ++i)
        texturePtr++->handle = mTexHandle[i];

    for(u32 i = 0; i < nMatrix; ++it, ++i)
        *matrixPtr++ = *it;

    mVao.bind(true);
    mVertexElementIndirect.bind(2, BufferType::INDIRECT);

    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, mNumMeshes, 0);
}

And here is my light pass:

void LightPass::render(mat4 const &projectionView,
                       MatrixTextureHandle *matrixTexture)
{
    // Textures and FBO are bound before this
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);
    glBlendEquation(GL_FUNC_ADD);

    auto itPl = mPointLight.begin();

    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_EQUAL, 0, 0xff);
    glStencilOp(GL_INCR, GL_INCR, GL_INCR);

    mVao.bind(true);
    mLightBuffer.bind(2, BufferType::INDIRECT);

    for(u32 i = 0; i < mNumberOfPointLight; ++i, ++itPl)
    {
        glClear(GL_STENCIL_BUFFER_BIT);
        glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);

        MVPandM *matrixPtr = matrixTexture->mapMVPandM();
        PointLight *plPtr = static_cast<PointLight*> (mLightBuffer.map(3));

        matrixPtr->MVP = projectionView * scale(translate(mat4(1.0), itPl->posRadius.xyz()), itPl->posRadius.www());
        *plPtr = *itPl;

        glDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr);
    }

    glDisable(GL_STENCIL_TEST);
    glDisable(GL_BLEND);
}

Even with glMemoryBarrier I have the problem… I don’t understand why…

Maybe the problem is coming from glDrawElements?

I don’t have this problem with glMapBuffer/glUnmapBuffer and glBufferData ^^

Thank you for your help :).

I don’t fully understand your code, but my guess is that you are reusing the buffer memory before the draw command that reads it has finished, or even started.
You have to make sure that any draw command (or other command) that uses a buffer has finished before you reuse that buffer memory for something else.

Because you don’t really want to wait on a fence unless it is 1-2 frames old (otherwise you kill your performance), make one entry for everything that changes during a frame.
Also make it a ring buffer of size two! Then the GPU can work with one data set while you fill the other.

Some freestyle example code:

[code="cpp"]struct ObjData
{
    MVPandM matrix;
    TexHandle texHandle;
};

struct LightData
{
    MVPandM matrix;
    PointLight pointLight;
};

struct SingleFrameBufferMemory
{
    LightData lightData;
    ObjData objData;
};

SingleFrameBufferMemory ringBuffer[2];

GLsync ringBufferFence[2];
int ringPosition = 0;

void doSomething()
{
    ringBufferFence[0] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    ringBufferFence[1] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    while (1)
    {
        ringPosition = !ringPosition;
        glClientWaitSync(ringBufferFence[ringPosition], GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000);
        ringBuffer[ringPosition] = XXX; // write all the data for this frame into this slot

        drawObj();
        drawLight();

        glDeleteSync(ringBufferFence[ringPosition]);
        ringBufferFence[ringPosition] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    }
}[/code]


Of course, with more complex setups you will use more than one buffer object for this.

Hello :).

Thanks again for your answer.

I will look at the ring buffer later because it’s an optimisation feature; for now, I’m just happy that my code works :).

But I don’t understand why I need to use glMemoryBarrier with GL_SHADER_STORAGE_BARRIER_BIT and not GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT.

Maybe it’s because I use a Shader Storage Buffer? But maybe it’s safer to use GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT too, no?

Thanks a lot for your help :slight_smile: .

PS: The picture is useless.