glBufferStorage and Fencing / synchronisation



qnoper
09-26-2014, 07:08 AM
Hello there.

I'm sorry for my very bad English, but I will do my best so that you can understand what I want to say :).

So, I want to create a small rendering engine with OpenGL 4.4.

Why OpenGL 4.4? Simply because we can improve performance with bindless textures, buffer storage, and other features :).

I work on Ubuntu 14.04 LTS.

I have many problems updating my buffers with buffer storage.
I think it's a synchronisation problem, because when I render, I have a display issue. I have the impression that the data used during my render is "older" data. For example, when I render my object with a matrix, the matrix actually used is a light matrix or an inverse matrix. I also think it's a synchronisation issue because when I test the "old school" way (glBufferData, glMapBuffer and glUnmapBuffer), I don't have this problem...

So, this is my code.

This is how I allocate and "map" my buffer:


void Buffer::allocate(u32 index, u32 size)
{
    if(index >= mId.size())
        throw "Buffer : Index out of range";

    u32 flags = GL_MAP_COHERENT_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT;

    glNamedBufferStorageEXT(mId[index], size, nullptr, flags);
    mPtr[index] = glMapNamedBufferRangeEXT(mId[index], 0, size, flags);
    mSize[index] = size;
}

void *Buffer::map(u32 index)
{
return mPtr[index];
}

And when I want to update my buffer, I use this:


GLsync sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
MVPandM *matrixPtr = manager->mapMVPandM();
TexHandle *texturePtr = manager->mapTexHandle();

auto it = m.begin();

for(u32 i = 0; i < mNumMeshes; ++i)
texturePtr++->handle = mTexHandle[i];

for(u32 i = 0; i < nMatrix; ++it, ++i)
*matrixPtr++ = *it;

glClientWaitSync(sync, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000);
glDeleteSync(sync);

mapMVPandM and mapTexHandle are map functions in another object.

Thanks, and if you can't understand a sentence, just ask me :).

Thanks !

Osbios
09-27-2014, 05:15 AM
If you write data on the client side to a mapped buffer and then want to use it on the server side you use:

glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);

If you change data on the server side and then want to access it from the client you use:

glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
...
glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, timeoutNs); // timeoutNs: placeholder, timeout in nanoseconds

EDIT:
Never mind, with GL_MAP_COHERENT_BIT and the range mapping, the client changes should be visible immediately.

qnoper
09-27-2014, 07:49 AM
Hello.

Thanks for your answer.

I use GL_MAP_COHERENT_BIT, but I have problems... When I use it, the wrong data is used in my glDrawElements...
For example, if I use a matrix with scale 1000 for my lights, the object is rendered with the scale-1000 matrix instead of the matrix with scale 1...

Now I can try without GL_MAP_COHERENT_BIT.

My buffer allocation is now:

void Buffer::allocate(u32 index, u32 size)
{
    if(index >= mId.size())
        throw "Buffer : Index out of range";

    glNamedBufferStorageEXT(mId[index], size, nullptr, GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT);
    mPtr[index] = glMapNamedBufferRangeEXT(mId[index], 0, size, GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT);
    mSize[index] = size;
}

void *Buffer::map(u32 index)
{
//return glMapNamedBufferEXT(mId[index], GL_READ_WRITE);
return mPtr[index];
}

Now, I'm going to try to explain my situation better.

I have the instance matrices and texture handles on the client side, and I want to "write" this data into my shader storage buffer.

I send my data to my buffer, and I draw my model into three textures with an FBO (diffuse, normal and position).
After that, I loop over my lights: for each one I send its matrix and data, and I draw the cube that contains the light (the power of deferred shading) into the light texture with additive blending.

Then I render my light texture combined with the diffuse texture to the screen.

So, this is my code to render my model:


void Model::render(MatrixTextureHandle *manager, std::vector<MVPandM> const &m, u32 nMatrix)
{
glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);

// The FBO is bound before this call
MVPandM *matrixPtr = manager->mapMVPandM();
TexHandle *texturePtr = manager->mapTexHandle();

// MVPandM is a struct holding a ModelViewProjection matrix and a Model matrix
auto it = m.begin();

for(u32 i = 0; i < mNumMeshes; ++i)
texturePtr++->handle = mTexHandle[i];

for(u32 i = 0; i < nMatrix; ++it, ++i)
*matrixPtr++ = *it;

mVao.bind(true);
mVertexElementIndirect.bind(2, BufferType::INDIRECT);

glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, mNumMeshes, 0);
}

And for my light pass


void LightPass::render(mat4 const &projectionView,
MatrixTextureHandle *matrixTexture)
{
// The textures and the FBO are bound before this call
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE);
glBlendEquation(GL_FUNC_ADD);

auto itPl = mPointLight.begin();

glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_EQUAL, 0, 0xff);
glStencilOp(GL_INCR, GL_INCR, GL_INCR);

mVao.bind(true);
mLightBuffer.bind(2, BufferType::INDIRECT);

for(u32 i = 0; i < mNumberOfPointLight; ++i, ++itPl)
{
glClear(GL_STENCIL_BUFFER_BIT);
glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);

MVPandM *matrixPtr = matrixTexture->mapMVPandM();
PointLight *plPtr = static_cast<PointLight*> (mLightBuffer.map(3));

matrixPtr->MVP = projectionView * scale(translate(mat4(1.0), itPl->posRadius.xyz()), itPl->posRadius.www());
*plPtr = *itPl;

glDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr);
}

glDisable(GL_STENCIL_TEST);
glDisable(GL_BLEND);
}

Even with glMemoryBarrier I still have the problem... I don't understand why...

Maybe the problem comes from glDrawElements?

I don't have this problem with glMapBuffer, glUnmapBuffer and glBufferData ^^

Thank you for your help :).

Osbios
09-27-2014, 11:49 AM
I don't fully understand your code, but I guess you are reusing the buffer memory before the draw command has finished or even started.
You have to make sure that a draw command (or any other command) that uses a buffer has finished before you reuse that buffer memory for other data.

Because you don't really want to wait on a fence unless it is 1-2 frames old (otherwise you kill your performance), make one entry for everything that changes during one frame.
Also make it a ring buffer of size two, so the GPU can work with one data set while you fill the second one.
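
As a rough illustration of the "ring buffer of size two" idea, one possible layout is a single persistently mapped buffer split into equally sized regions, one per frame in flight. The sketch below only does the offset arithmetic on the CPU. `alignUp`, `RingLayout` and `makeRingLayout` are hypothetical names, not part of OpenGL or of the code in this thread, and the alignment value is assumed to come from a query such as glGetIntegerv with GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `value` up to the next multiple of `alignment`.
// Hypothetical helper; `alignment` must be greater than zero.
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Describes one buffer split into `regionCount` equally sized regions.
// Hypothetical type, only an illustration of the layout idea.
struct RingLayout {
    std::size_t regionSize;   // aligned size of one frame's data
    std::size_t regionCount;  // number of frames in flight (2 in the post)
    std::size_t totalSize;    // size to request when creating the buffer

    // Byte offset of the region used by frame number `frameIndex`.
    std::size_t offsetOf(std::size_t frameIndex) const {
        return (frameIndex % regionCount) * regionSize;
    }
};

// Builds a layout for `regionCount` regions of at least `bytesPerFrame`
// bytes, each starting at a multiple of `offsetAlignment`.
inline RingLayout makeRingLayout(std::size_t bytesPerFrame,
                                 std::size_t regionCount,
                                 std::size_t offsetAlignment) {
    assert(regionCount > 0);
    RingLayout layout;
    layout.regionSize = alignUp(bytesPerFrame, offsetAlignment);
    layout.regionCount = regionCount;
    layout.totalSize = layout.regionSize * regionCount;
    return layout;
}
```

With 100 bytes per frame, two regions and an alignment of 256, this gives regions at offsets 0 and 256 in a 512-byte buffer. Each region could then be bound for its frame with glBindBufferRange, while the CPU writes into the other region through the mapped pointer plus the offset.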

Some freestyle example code:

struct ObjData
{
    MVPandM matrix;
    TexHandle texHandler;
};
struct LightData
{
    MVPandM matrix;
    PointLight pointLight;
};
struct SingleFrameBufferMemory
{
    LightData lightData;
    ObjData objData;
};

// In practice this is the memory of the persistently mapped buffer
SingleFrameBufferMemory ringBuffer[2];

GLsync ringBufferFence[2];
int ringPosition;
void doSomething()
{
    ringBufferFence[0] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    ringBufferFence[1] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    while (1)
    {
        ringPosition = !ringPosition;
        // Wait until the GPU has finished the commands that used this entry
        // (timeoutNs: placeholder, timeout in nanoseconds)
        glClientWaitSync(ringBufferFence[ringPosition], GL_SYNC_FLUSH_COMMANDS_BIT, timeoutNs);
        ringBuffer[ringPosition] = XXX; // write all the data for this frame

        drawObj();
        drawLight();

        // Replace the old fence with one that covers the draws just issued
        glDeleteSync(ringBufferFence[ringPosition]);
        ringBufferFence[ringPosition] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    }
}

Of course, with more complex stuff you will use more than one buffer object for this.
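
The fence bookkeeping in the example above can also be kept separate from the GL calls, which makes the ordering easier to check: wait on a slot's fence before writing to it, and attach a new fence only after the draws that read it have been issued. The following is a minimal sketch under that assumption. `FencedSlots` is a hypothetical helper, not an OpenGL type, and the fence type and the wait callback are left generic so the class does not depend on a GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Tracks one optional fence per ring-buffer slot.
// Hypothetical helper; `Fence` would be GLsync in an OpenGL program.
template <typename Fence>
class FencedSlots {
public:
    // `count` is the number of slots and must be at least 1.
    explicit FencedSlots(std::size_t count)
        : fences_(count), current_(count - 1) {
        assert(count > 0);
    }

    // Advances to the next slot. If that slot still has a fence from an
    // earlier frame, `waitAndDelete` is called with it before returning.
    // Returns the index of the slot that is now safe to write.
    template <typename WaitFn>
    std::size_t acquire(WaitFn&& waitAndDelete) {
        current_ = (current_ + 1) % fences_.size();
        if (fences_[current_]) {
            waitAndDelete(*fences_[current_]);
            fences_[current_].reset();
        }
        return current_;
    }

    // Stores the fence created after the draws that read the current slot.
    void release(Fence fence) {
        fences_[current_] = std::move(fence);
    }

private:
    std::vector<std::optional<Fence>> fences_;
    std::size_t current_;
};
```

In an OpenGL program, the callback passed to `acquire` would call glClientWaitSync followed by glDeleteSync on the fence, and `release` would receive the result of glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0) right after the draw calls. With two slots, the first two frames never wait, and from the third frame on each wait targets a fence that is already one frame old.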

qnoper
09-27-2014, 02:28 PM
Hello :).

Thanks again for your answer.

I will look at the ring buffer later because it's an optimisation, and for now I'm just happy because my code works :-) .

But I don't understand why I need to use glMemoryBarrier with GL_SHADER_STORAGE_BARRIER_BIT and not GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT.

Maybe it's because I use a shader storage buffer? But maybe it's safer to use GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT too, no?

Thanks a lot for your help :-) .

PS : The picture is useless