Transform Feedback: batch several feedbacks together



Utumno
05-10-2017, 09:23 AM
Target: OpenGL ES >= 3.0.

Here's what my app does:


generateSeveralMeshes();
setupStuff();

for (each Mesh)
{
    glBindBufferBase(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, myBuf);
    glBeginTransformFeedback(GLES30.GL_POINTS);
    callOpenGLToGetTransformFeedback();
    glMapBufferRange(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, ...); // THE PROBLEM
    computeStuffDependantOnVertexAttribsGottenBack();
    glUnmapBuffer(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER);
    glEndTransformFeedback();
    glBindBufferBase(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, 0);

    renderTheMeshAsNormal();
}

I.e. for each Mesh, the app first uses the Vertex Shader to compute some per-vertex values, reads them back to the CPU, makes some decisions based on them, and only then renders the Mesh.

This works; the problem is speed. We've been testing on several OpenGL ES 3.0, 3.1 and 3.2 devices, and on each one the story is the same: the glMapBufferRange() call cuts the FPS roughly in half!

I suspect that without glMapBufferRange(), OpenGL can render 'lazily', i.e. batch up several renders and execute them at its own convenience, whereas if we call glMapBufferRange(), it really has to render right now, which is probably what makes it slow (the amount of data we get back is quite small, so I really don't think that is the problem).

Thus, I'd like to batch up my Transform Feedback as well, like this:


generateSeveralMeshes();
setupStuff();

for (each Mesh)
{
    glBindBufferBase(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, myLargerBuf);
    glBeginTransformFeedback(GLES30.GL_POINTS);
    setupOpenGLtoSaveTransformFeedbackToSpecificOffset();
    callOpenGLToGetTransformFeedback();
    advanceOffset();
    glEndTransformFeedback();
    glBindBufferBase(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, 0);

    renderTheMeshAsNormal();
}

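glBindBuffer(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, myLargerBuf); // the loop above left the generic binding at 0
glMapBufferRange(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, ...);
computeStuffDependantOnVertexAttribsGottenBackInOneBatch();
glUnmapBuffer(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER);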

The problem is that I don't know how to tell OpenGL to write the Transform Feedback output not at the beginning but at a specific offset in the TRANSFORM_FEEDBACK_BUFFER (so that after the loop I can get at all the captured TF data in one go).

Any advice?

Utumno
05-10-2017, 10:07 AM
Hmm, reading the OpenGL ES spec - wouldn't that simply be a call to 'glBindBufferRange(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, myLargerBuf, offset, bufferSizeGoodForCurrentMesh)' (in place of 'glBindBufferBase()')?
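
A minimal sketch of the offset bookkeeping that approach needs, written as plain Java with no GL calls so it runs anywhere. It assumes GL_POINTS capture in interleaved mode with a fixed number of captured bytes per vertex; the class TfLayout and its fields are hypothetical names. OpenGL ES 3.0 requires the offset and size passed to glBindBufferRange for GL_TRANSFORM_FEEDBACK_BUFFER to be multiples of 4, and the range must be bound before glBeginTransformFeedback, since changing that indexed binding while transform feedback is active is an INVALID_OPERATION.

```java
import java.util.Arrays;

/** Hypothetical helper: where each mesh's capture goes in one shared TF buffer. */
final class TfLayout {
    final long[] offsets;  // byte offset of each mesh's range (glBindBufferRange "offset")
    final long[] sizes;    // byte size of each mesh's range (glBindBufferRange "size")
    final long totalBytes; // bytes to allocate for the shared buffer

    private TfLayout(long[] offsets, long[] sizes, long totalBytes) {
        this.offsets = offsets;
        this.sizes = sizes;
        this.totalBytes = totalBytes;
    }

    /** Rounds value up to the next multiple of alignment (alignment > 0). */
    static long alignUp(long value, long alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    /** Packs the meshes back to back; each vertex yields bytesPerVertex captured bytes. */
    static TfLayout build(int[] vertexCounts, int bytesPerVertex) {
        long[] offsets = new long[vertexCounts.length];
        long[] sizes = new long[vertexCounts.length];
        long cursor = 0;
        for (int i = 0; i < vertexCounts.length; i++) {
            offsets[i] = cursor;
            // ES 3.0: offset and size must be multiples of 4 for this target.
            // Aligning every size keeps every following offset aligned as well.
            sizes[i] = alignUp((long) vertexCounts[i] * bytesPerVertex, 4);
            cursor += sizes[i];
        }
        return new TfLayout(offsets, sizes, cursor);
    }

    public static void main(String[] args) {
        // Three meshes, each capturing one vec4 (16 bytes) per vertex.
        TfLayout layout = build(new int[] {100, 250, 7}, 16);
        System.out.println(Arrays.toString(layout.offsets)); // [0, 1600, 5600]
        System.out.println(Arrays.toString(layout.sizes));   // [1600, 4000, 112]
        System.out.println(layout.totalBytes);               // 5712
    }
}
```

In the per-mesh loop, each pair would then go to GLES30.glBindBufferRange(GLES30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, myLargerBuf, (int) layout.offsets[i], (int) layout.sizes[i]), and totalBytes is the size to allocate for myLargerBuf. With float/int varyings every size is already a multiple of 4, so the rounding only makes the constraint explicit.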

john_connor
05-10-2017, 11:27 AM
for each Mesh, it first uses the Vertex Shader to compute some per-vertex stuff, gets the stuff back to CPU, based on that makes some decisions, and only then renders the Mesh.

This works, the problem is speed. We've been testing on several OpenGL ES 3.0, 3.1, 3.2-based devices, and on each one the story looks the same: the 'glMapBufferRange()' call cuts the FPS to about half!

Downloading data from OpenGL buffers can be very expensive because of implicit synchronization. OpenGL runs mostly asynchronously: when you want to download data from a buffer that you previously rendered into, the buffer MUST reflect the data written by the shaders, so OpenGL is forced to finish the rendering before it hands you the data.

One solution would be double-buffering, not of the framebuffer but of the transform feedback buffer: on each odd frame, capture feedback into buffer A and read from buffer B; on each even frame, capture into buffer B and read from buffer A.
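
A sketch of that double-buffering bookkeeping, kept free of GL calls so it runs as plain Java. The class TfBufferRing is hypothetical, and the ints stand for buffer names from glGenBuffers; 0 means "nothing to read yet", matching GL's convention that 0 is no buffer. With two buffers it is exactly the A/B scheme; allowing more than two is an extension that may help if the driver keeps more than one frame in flight. Either way, the CPU decisions for a frame are based on data captured in an earlier frame, so the app has to tolerate that lag.

```java
/** Hypothetical ring of transform feedback buffers (two = plain double-buffering). */
final class TfBufferRing {
    private final int[] buffers; // GL buffer names, e.g. from glGenBuffers (assumed nonzero)
    private long frame = 0;      // index of the current frame

    TfBufferRing(int... buffers) {
        if (buffers.length < 2) throw new IllegalArgumentException("need at least two buffers");
        this.buffers = buffers.clone();
    }

    /** Buffer to capture into during the current frame. */
    int writeBuffer() {
        return buffers[(int) (frame % buffers.length)];
    }

    /** Buffer written n-1 frames ago (the oldest one); 0 until such a frame exists. */
    int readBuffer() {
        int n = buffers.length;
        if (frame < n - 1) return 0;
        return buffers[(int) ((frame + 1) % n)];
    }

    /** Call once at the end of every frame. */
    void endFrame() {
        frame++;
    }

    public static void main(String[] args) {
        TfBufferRing ring = new TfBufferRing(11, 22); // pretend names for buffers A and B
        for (int f = 0; f < 3; f++) {
            System.out.println("frame " + f + ": write " + ring.writeBuffer()
                    + ", read " + ring.readBuffer());
            ring.endFrame();
        }
        // frame 0: write 11, read 0
        // frame 1: write 22, read 11
        // frame 2: write 11, read 22
    }
}
```

Per frame, the idea is: bind writeBuffer() for capture, and if readBuffer() is nonzero, map it with glMapBufferRange(..., GL_MAP_READ_BIT) outside of any active capture, then call endFrame().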



I suspect that without glMapBufferRange(), OpenGL can render 'lazily' ...

Without glMapBufferRange(), glMapBuffer() or glGetBufferSubData() you're not downloading buffer data (and not forcing GL to synchronize here). (The last two are desktop OpenGL calls; core OpenGL ES 3.x only offers glMapBufferRange().)

https://www.khronos.org/opengl/wiki/Synchronization#Implicit_synchronization

Any attempt to read from a framebuffer to CPU memory (not to a buffer object) will halt until all rendering commands affecting that framebuffer have completed. Most attempts to write to a buffer object, either with glBufferSubData or mapping, will halt until all rendering commands using that part of the buffer object have completed. However, if you invalidate the buffer object before uploading to it, the implementation will be able to allocate new storage for the buffer and simply orphan the old one (deleting it later when it is no longer used). This will allow the buffer object to be immediately available for uploading new data. For more details, see this page on buffer streaming. https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming#Buffer_re-specification



The problem is that I don't know how to tell OpenGL to save the Transform Feedback output not to the beginning, but to a specific offset in the TRANSFORM_FEEDBACK_BUFFER

You could use glBindBufferRange() with the offset when you generate your transform feedback object. By the way, if you don't call glEndTransformFeedback(), all the data will be captured consecutively.
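
Whichever way the captures end up packed (one glBindBufferRange per mesh, or a single capture that stays active across meshes; in ES 3.0, glPauseTransformFeedback/glResumeTransformFeedback allow the normal mesh draw in between without capturing it), the data can be mapped once after the loop and split per mesh. A minimal sketch, assuming tightly packed float-only varyings in mesh order; TfBatchReader and floatsPerVertex are hypothetical. On Android, GLES30.glMapBufferRange returns a java.nio.Buffer, which is assumed here to be cast to ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical helper: splits one mapped readback into per-mesh views. */
final class TfBatchReader {
    private TfBatchReader() {}

    /**
     * @param mapped          the whole mapped range (e.g. the cast result of glMapBufferRange)
     * @param vertexCounts    captured vertices per mesh, in capture order
     * @param floatsPerVertex floats captured per vertex (interleaved mode assumed)
     * @return one FloatBuffer view per mesh; the views share memory with {@code mapped}
     */
    static FloatBuffer[] split(ByteBuffer mapped, int[] vertexCounts, int floatsPerVertex) {
        // GL writes buffer data in the device's native byte order.
        FloatBuffer all = mapped.order(ByteOrder.nativeOrder()).asFloatBuffer();
        FloatBuffer[] views = new FloatBuffer[vertexCounts.length];
        int start = 0;
        for (int i = 0; i < vertexCounts.length; i++) {
            int len = vertexCounts[i] * floatsPerVertex;
            all.clear();            // reset position/limit to the full range
            all.position(start);
            all.limit(start + len); // throws IllegalArgumentException if the data is too short
            views[i] = all.slice(); // zero-copy view of this mesh's floats
            start += len;
        }
        return views;
    }

    public static void main(String[] args) {
        // Stand-in for a mapped buffer: 5 floats, 2 for mesh 0 and 3 for mesh 1.
        ByteBuffer fake = ByteBuffer.allocateDirect(5 * Float.BYTES);
        fake.order(ByteOrder.nativeOrder());
        for (int i = 0; i < 5; i++) fake.putFloat(i * Float.BYTES, i);
        FloatBuffer[] perMesh = split(fake, new int[] {2, 3}, 1);
        System.out.println(perMesh[1].get(0)); // 2.0
    }
}
```

The views alias the mapped memory, so anything needed after glUnmapBuffer() should be copied out first (e.g. with FloatBuffer.get(float[])).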

Utumno
05-10-2017, 05:03 PM
1 solution to that would be double-buffering: not the framebuffer, but the transform feedback buffer. each odd frame, feedback into buffer A and read from buffer B, each even frame feedback into buffer B and read from buffer A


Most excellent idea!

Utumno
05-11-2017, 03:29 AM
John's point about synchronization is very nicely expanded upon here:

https://community.arm.com/graphics/b/blog/posts/the-mali-gpu-an-abstract-machine-part-1---frame-pipelining