Concerning glEndTransformFeedback() performance

I’m experiencing an issue where glEndTransformFeedback() is severely limiting my performance. It appears to block until all previously submitted commands have completed. In fact, if I call glFinish() just before the transform feedback operations, the time spent in glEndTransformFeedback() becomes much more reasonable. I’ve simplified my use of transform feedback so that it depends on no GL state other than a tiny one-element float VBO for input and another one-element float VBO for output. I’m ping-ponging between two copies of these VBOs after each frame.
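A minimal sketch of the ping-pong bookkeeping described above. It assumes the usual arrangement where the buffer captured into on one frame becomes the input on the next frame. The `PingPong` struct and the `pp_*` functions are hypothetical helpers, not part of any GL API, and buffer names are stored as `uint32_t` so the sketch compiles without GL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for two buffer names (in a real program these
 * would be GLuint names returned by glGenBuffers). */
typedef struct {
    uint32_t buffers[2];
} PingPong;

/* Index of the buffer read as vertex input on this frame. */
static unsigned pp_read_index(uint64_t frame) {
    return (unsigned)(frame & 1u);
}

/* Index of the buffer captured into on this frame, e.g. the one passed to
 * glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, ...). It is always the
 * other buffer, so a frame never reads and captures into the same one. */
static unsigned pp_write_index(uint64_t frame) {
    unsigned w = pp_read_index(frame) ^ 1u;
    assert(w != pp_read_index(frame));
    return w;
}

static uint32_t pp_read_buffer(const PingPong *pp, uint64_t frame) {
    return pp->buffers[pp_read_index(frame)];
}

static uint32_t pp_write_buffer(const PingPong *pp, uint64_t frame) {
    return pp->buffers[pp_write_index(frame)];
}
```

How the read buffer is attached as vertex input (for example, one VAO per buffer) is left out here; the point is only that frame N's output feeds frame N+1 and no buffer is both source and destination within a frame.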

I’m running OS X 10.11.6 on an NVIDIA GT 750M. Here’s an example of the timing information I got from OpenGL Profiler on the Mac:

1.44 µs glBindVertexArray(13);
0.52 µs glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER_EXT, 0, 14);
9.62 µs glBeginTransformFeedback(GL_ZERO);

35.86 µs glDrawArrays(GL_POINTS, 0, 1);
23336.92 µs glEndTransformFeedback();

Other calls take less time, but still several milliseconds:
19576.58 µs glEndTransformFeedback();
6494.19 µs glEndTransformFeedback();
6464.62 µs glEndTransformFeedback();
3634.65 µs glEndTransformFeedback();
20847.59 µs glEndTransformFeedback();
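To put the profiler numbers above in one place, here is a small C helper that summarizes the six glEndTransformFeedback() durations reported in this post. The values are copied from the log; `summarize_us` and `TimingSummary` are hypothetical names. For these samples the mean is about 13.4 ms, compared with 35.86 µs for the glDrawArrays() call that precedes it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double min_us;
    double max_us;
    double mean_us;
} TimingSummary;

/* Summarize a list of call durations in microseconds; n must be > 0. */
static TimingSummary summarize_us(const double *samples, size_t n) {
    assert(samples != NULL && n > 0);
    TimingSummary s = { samples[0], samples[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (samples[i] < s.min_us) s.min_us = samples[i];
        if (samples[i] > s.max_us) s.max_us = samples[i];
        sum += samples[i];
    }
    s.mean_us = sum / (double)n;
    return s;
}

/* glEndTransformFeedback() durations from the OpenGL Profiler log above. */
static const double kEndTfSamplesUs[] = {
    23336.92, 19576.58, 6494.19, 6464.62, 3634.65, 20847.59
};
```

Even the fastest sample (about 3.6 ms) is roughly 100 times the 35.86 µs glDrawArrays() call, which is consistent with the call waiting on earlier GPU work rather than doing its own.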

I’m not reading from or writing to the transform feedback buffer on the CPU at all, and I’m ping-ponging between two different buffers each frame. The buffers are created with the GL_STATIC_DRAW usage hint.

Does anyone have any idea why glEndTransformFeedback() appears to make the CPU wait for previously submitted GPU commands to complete? I don’t see this behavior documented anywhere.

Kris

According to the spec, transform feedback should operate asynchronously. From https://www.opengl.org/registry/specs/EXT/transform_feedback.txt:

“This extension introduces new query object support to allow transform feedback mode to operate asynchronously. Query objects allow applications to determine when transform feedback results are complete, as well as the number of primitives processed and written back to buffer objects while in transform feedback mode.”

[QUOTE=Silence;1283528]From the specs, transform feedback should operate asynchronously.[/QUOTE]

Thanks for clarifying. I saw that too. Unfortunately, I think it’s just really bad on the Mac. I created this test case, and someone in ##OpenGL ran it on Linux.

On my machine it takes 5 to 30 milliseconds per iteration; on theirs it took only 55 microseconds.
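For scale, the ratio between the two reported per-iteration times works out as below. The two runs used different hardware, operating systems and drivers, so this is only a rough indication of the gap, not a controlled comparison.

```latex
\frac{5\,\text{ms}}{55\,\mu\text{s}} = \frac{5000}{55} \approx 91,
\qquad
\frac{30\,\text{ms}}{55\,\mu\text{s}} = \frac{30000}{55} \approx 545
```

That is, the Mac runs were roughly 90 to 550 times slower per iteration.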