Part of the Khronos Group
OpenGL.org

Thread: Unsynchronize SSBO

  1. #1
    Junior Member Newbie
    Join Date
    Aug 2015
    Posts
    19

    Unsynchronize SSBO

    Hi, I have a huge performance issue on the CPU that I am 99% certain has to do with synchronization. With 4 triangles it takes 30% CPU on a quad core, and with around 50 triangles it takes 60% and the fps drops from 60 to about 40.


    I am retrieving a single unsigned int from the GPU with an SSBO. I use this to see what the mouse is hovering over, and it works perfectly except for the performance issue. I'm pretty sure I could make it work without any synchronization, by which I mean waiting for the GPU to finish all draw calls before retrieving the uint from the GPU.

    I have the following code:

    Code :
    	glDrawArrays(GL_TRIANGLES, 0, m_vertex_buffer.size());
     
    	GLuint vertexIndex = _engine->retrieveHoweredVertexIndex(); //this takes time.
    	if (vertexIndex != 0xffffffff) 
    		_engine->setHoweredVertexObserver(m_triangles[vertexIndex/3]);
    Here is the function where I suppose it loops and waits for the GPU to finish.
    Code :
    	inline GLuint retrieveHoweredVertexIndex()
    	{
    		glBindBuffer(GL_SHADER_STORAGE_BUFFER, m_vertex_index_SSBO_ID);
    		GLvoid* p = glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_WRITE); // This is probably where something like glFinish() is called, which I don't want.
    		GLuint vertexIndex = *(GLuint*)p;
    		*(GLuint*)p = 0xffffffff;
    		glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
    		return vertexIndex;
    	}

    I have tried glMapBufferRange without any improvement (though I may very well have used it incorrectly).

    So my question is: how do I make glMapBuffer not wait for the draw call to finish? Or is that even the problem?


    By the way, I read on the wiki that "the smallest required SSBO size is 16MB". Does that mean this single uint will take 16MB of graphics memory?

  2. #2
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,155
    Quote Originally Posted by Trionet View Post
    So my question is: how do I make glMapBuffer not wait for the draw call to finish? Or is that even the problem?
    To the latter, don't read back from the GPU. Unless done very carefully, that will kill your performance, even on a desktop GPU (sort-last), but especially on a mobile GPU (most are sort-middle).

    To accelerate readbacks, you can sometimes use buffer object intermediates with delayed fetching of the data by the CPU from the buffer object to give the GPU/driver time to finish the data and copy the data across to the client side in the background. However, best case is you don't read back from the GPU. I think there's an article in OpenGL Insights on doing fast transfers using PBOs. Alternatively search for fast readbacks using PBOs on the net. See also mentions of PBOs and Download in the wiki:

    * https://www.opengl.org/wiki/Pixel_Buffer_Object

    To the former, I suggest you read this:

    * https://www.opengl.org/wiki/Buffer_Object_Streaming

    and pay attention to any mention of synchronization. However, this is mainly written for the desired case where your data is all moving in the CPU->GPU direction.
    Last edited by Dark Photon; 10-22-2015 at 05:40 PM.
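
The "delayed fetching" idea above can be sketched without any GL calls as a small slot rotation. This is only an illustration under assumptions: `ReadbackRing` and its methods are hypothetical names, not part of OpenGL, and each slot is assumed to index one buffer object that the GPU result is copied into each frame.

```cpp
#include <cstddef>

// Hypothetical helper (not a GL API): rotates through `Count` readback buffers.
// Each frame the GPU result goes into writeSlot(); the CPU maps readSlot(),
// which was filled Count-1 frames earlier and has most likely finished by now.
template <std::size_t Count>
class ReadbackRing {
    static_assert(Count >= 2, "need at least two buffers to delay the read");
public:
    // Slot the GPU writes during the current frame.
    std::size_t writeSlot() const { return frame_ % Count; }
    // Oldest slot: the one that will be overwritten next frame.
    std::size_t readSlot() const { return (frame_ + 1) % Count; }
    // False until the read slot has been written at least once.
    bool readSlotValid() const { return frame_ + 1 >= Count; }
    // Call once per frame, after submitting the frame's work.
    void endFrame() { ++frame_; }
private:
    std::size_t frame_ = 0;
};
```

With `Count` set to 2 the hover result lags the display by one frame. A mapping with read access can still wait if the GPU has not reached that slot yet, so this reduces the stall rather than guaranteeing it away.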

  3. #3
    Junior Member Newbie
    Join Date
    Aug 2015
    Posts
    19
    Quote Originally Posted by Dark Photon View Post
    To the latter, don't read back from the GPU. Unless done very carefully, that will kill your performance, even on a desktop GPU (sort-last), but especially on a mobile GPU (sort-middle).
    Yes, but what I want is simply for glMapBuffer not to wait for the drawing. I tried GL_MAP_UNSYNCHRONIZED_BIT, but the program crashes when trying to access the pointer (GLuint vertexIndex = *(GLuint*)p;). So maybe that is my question: why does this crash? My guess is that mapping reads from a place in graphics memory and puts it where p points, and then when you unmap it, it uploads that value to graphics memory again?

    With that said, is this valid code? Because it crashes.
    Code :
    	inline GLuint retrieveHoweredVertexIndex()
    	{
    		glBindBuffer(GL_SHADER_STORAGE_BUFFER, m_vertex_index_SSBO_ID);
    		//GLvoid* p = glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_WRITE);
    		GLvoid* p = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, sizeof(GLuint), GL_MAP_UNSYNCHRONIZED_BIT);
    		GLuint vertexIndex = *(GLuint*)p;
    		*(GLuint*)p = 0xffffffff;
    		glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
    		return vertexIndex;
    	}

  4. #4
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,155
    Quote Originally Posted by Trionet View Post
    Yes, but what I want is simply for glMapBuffer not to wait for the drawing. I tried GL_MAP_UNSYNCHRONIZED_BIT, but the program crashes when trying to access the pointer (GLuint vertexIndex = *(GLuint*)p;). So maybe that is my question: why does this crash?
    Are you checking for a NULL pointer?

    Check for GL errors (for development purposes only) after the glMapBufferRange() call. Your call should be throwing one of those. See Errors under glMapBufferRange for details. Also, you cannot do unsynchronized reads, only writes.

  5. #5
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,470
    Quote Originally Posted by Trionet View Post
    My guess is that mapping reads from a place in graphics memory and puts it where p points, and then when you unmap it, it uploads that value to graphics memory again?
    It might do that, or it might physically map the video RAM into the process address space, so that writing to it modifies video RAM directly.

    Quote Originally Posted by Trionet View Post
    With that said, is this valid code? Because it crashes.
    Code :
    		GLvoid* p = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, sizeof(GLuint), GL_MAP_UNSYNCHRONIZED_BIT);
    		GLuint vertexIndex = *(GLuint*)p;
    No, it's not valid. The glMapBufferRange() call should generate a GL_INVALID_OPERATION error and return a null pointer, because neither GL_MAP_READ_BIT nor GL_MAP_WRITE_BIT is set in the access parameter.

    Also, you're reading from a mapped region which was mapped without GL_MAP_READ_BIT. Even if the call did actually map the region (i.e. didn't return a null pointer), there's no reason to believe that the region can be read from.
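
The flag rules mentioned in the last two replies can be written down as a small check. This is a sketch, not a GL function: `mapAccessIsValid` and the `k...` constants are hypothetical names, the constants are local copies of the GL bit values so the snippet compiles without GL headers, and only the flag-combination errors listed for glMapBufferRange are modeled (offset, length, buffer-state errors and the persistent/coherent bits are left out).

```cpp
#include <cstdint>

// Local copies of the GL access bits (values as defined in the GL headers).
const std::uint32_t kMapReadBit             = 0x0001; // GL_MAP_READ_BIT
const std::uint32_t kMapWriteBit            = 0x0002; // GL_MAP_WRITE_BIT
const std::uint32_t kMapInvalidateRangeBit  = 0x0004; // GL_MAP_INVALIDATE_RANGE_BIT
const std::uint32_t kMapInvalidateBufferBit = 0x0008; // GL_MAP_INVALIDATE_BUFFER_BIT
const std::uint32_t kMapFlushExplicitBit    = 0x0010; // GL_MAP_FLUSH_EXPLICIT_BIT
const std::uint32_t kMapUnsynchronizedBit   = 0x0020; // GL_MAP_UNSYNCHRONIZED_BIT

// Returns false for flag combinations that should produce GL_INVALID_OPERATION.
inline bool mapAccessIsValid(std::uint32_t access) {
    const bool read  = (access & kMapReadBit) != 0;
    const bool write = (access & kMapWriteBit) != 0;
    // At least one of read or write access must be requested.
    if (!read && !write) return false;
    // Read access cannot be combined with invalidation or unsynchronized mapping.
    const std::uint32_t notWithRead =
        kMapInvalidateRangeBit | kMapInvalidateBufferBit | kMapUnsynchronizedBit;
    if (read && (access & notWithRead) != 0) return false;
    // Explicit flushing only makes sense for write mappings.
    if ((access & kMapFlushExplicitBit) != 0 && !write) return false;
    return true;
}
```

Under these rules, passing only GL_MAP_UNSYNCHRONIZED_BIT is rejected, and so is GL_MAP_READ_BIT combined with GL_MAP_UNSYNCHRONIZED_BIT, which matches the statement that unsynchronized reads are not possible.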

  6. #6
    Junior Member Newbie
    Join Date
    Aug 2015
    Posts
    19
    Quote Originally Posted by Dark Photon View Post
    Are you checking for a NULL pointer?

    Check for GL errors (for development purposes only) after the glMapBufferRange() call. Your call should be throwing one of those. See Errors under glMapBufferRange for details. Also, you cannot do unsynchronized reads, only writes.

    Ooh, if I can't do unsynchronized reads then I guess this problem of mine can't be fixed.

    Just a little curious: why isn't it possible to read unsynchronized? For my problem the uint only changes in very few cases, and I think I could handle any undefined value even if I read the uint while the GPU is processing it.

  7. #7
    Member Regular Contributor
    Join Date
    Dec 2009
    Posts
    251
    Quote Originally Posted by Trionet View Post
    By the way, I read on the wiki that "the smallest required SSBO size is 16MB". Does that mean this single uint will take 16MB of graphics memory?
    No, that means that an OpenGL implementation that advertises SSBO support must return at least 16MB when you ask for GL_MAX_SHADER_STORAGE_BLOCK_SIZE.
    You can assume that the memory allocated for your buffer will be much less than that, though it could still be a few kB.

    Also note that when reading unsynchronized from a buffer object, you have no guarantee that the shader has written to the SSBO yet, so the value you read may be garbage.


  8. #8
    Junior Member Newbie
    Join Date
    Aug 2015
    Posts
    19
    Okay, so I had an idea. I'm using GLFW and have a game loop updating 60 times per second. I won't use my uint value until the frame/update after the one I was rendering. Is it guaranteed that every draw call I have made so far is done when glfwSwapBuffers returns and the next frame/update begins? Because then glMapBuffer would not have to wait for the GPU if I call it directly after glfwSwapBuffers, and everything would run smoothly?

    Or does glMapBuffer wait for all GPU calls from every program on the computer to finish?

  9. #9
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,470
    Quote Originally Posted by Trionet View Post
    Okay, so I had an idea. I'm using GLFW and have a game loop updating 60 times per second. I won't use my uint value until the frame/update after the one I was rendering. Is it guaranteed that every draw call I have made so far is done when glfwSwapBuffers returns and the next frame/update begins?
    No. Buffer swaps can be pipelined.

    Quote Originally Posted by Trionet View Post
    Or does glMapBuffer wait for all GPU calls from every program on the computer to finish?
    If you don't specify an unsynchronised mapping, it will have to wait for any commands which might modify the buffer, and there will be limitations as to how "smart" this is (e.g. if the buffer is bound as an SSBO, the implementation isn't necessarily going to analyse whether a particular shader will write to it).

    You could try using a query or a sync object to detect when specific commands have completed, and poll that to determine whether it's safe to map the buffer.
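
The polling idea can be sketched roughly as follows. To keep the snippet compilable without a GL context, the GL work is hidden behind three callables; `GpuHooks` and `HoverReadback` are hypothetical names, and the comments describe what each hook is assumed to wrap in a real renderer.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical wrappers around the real GL calls. Assumed contents:
//   insertFence: glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT), then
//                glCopyBufferSubData from the SSBO into a separate staging
//                buffer, then glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)
//   isSignaled:  glClientWaitSync(fence, 0, 0) returned GL_ALREADY_SIGNALED
//                or GL_CONDITION_SATISFIED (a zero timeout never blocks)
//   readResult:  glMapBufferRange(..., GL_MAP_READ_BIT) on the staging buffer,
//                copy the uint out, glUnmapBuffer, glDeleteSync(fence)
struct GpuHooks {
    std::function<void()> insertFence;
    std::function<bool()> isSignaled;
    std::function<std::uint32_t()> readResult;
};

class HoverReadback {
public:
    explicit HoverReadback(GpuHooks hooks) : hooks_(std::move(hooks)) {}

    // Call right after the draw call that writes the SSBO.
    void afterDraw() {
        if (!pending_) {  // keep at most one fence in flight
            hooks_.insertFence();
            pending_ = true;
        }
    }

    // Call once per frame. Returns true and fills `out` only when the fence
    // has signaled; otherwise returns false immediately without touching `out`.
    bool poll(std::uint32_t& out) {
        if (!pending_ || !hooks_.isSignaled()) return false;
        pending_ = false;
        out = hooks_.readResult();
        return true;
    }

private:
    GpuHooks hooks_;
    bool pending_ = false;
};
```

The staging copy is a design choice of this sketch: if the SSBO itself were mapped, later draw calls that may write to it could still force the map to wait, even though the fence for the earlier frame has signaled.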

  10. #10
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,155
    Quote Originally Posted by GClements View Post
    Quote Originally Posted by Trionet
    Okay, so I had an idea. I'm using GLFW and have a game loop updating 60 times per second. I won't use my uint value until the frame/update after the one I was rendering. Is it guaranteed that every draw call I have made so far is done when glfwSwapBuffers returns and the next frame/update begins?
    No. Buffer swaps can be pipelined.
    For a desktop GPU, the conventional way to deal with that is to put a glFinish after your SwapBuffers call (and only after your SwapBuffers call). After that glFinish, you know that the GPU has processed all of the calls for the previous frame and performed any post-processing required to present the frame you just submitted to the user.

    This also has the benefit of synchronizing your draw thread with the frame clock, which has some nice benefits in terms of providing consistent end-to-end latency through the system.

    Don't do this on most mobile GPUs though.

    Alternatively, use the sync object method GClements suggested.
