Transform feedback question

Hello,

I am trying to use a transform feedback buffer to transform vertices that will afterwards be sent for rendering. I want to render points into the feedback buffer, then get the transformed vertices back and render triangles with an empty vertex shader.

The reason is that I have a fairly expensive vertex shader and a lot of triangles. My GPU seems to have a very poor vertex caching system. I ran the following test: I sent an indexed triangle STRIP with 4225 vertices and 8448 indices (into those 4225 vertices) and observed 24000+ vertex shader invocations (about 2.88 invocations per triangle). This is not my actual geometry, only a test case.
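To make the arithmetic behind that ratio explicit, here is a minimal sketch. The function names are hypothetical, and since the post only says "24000+", the lower bound 24000 is used as the invocation count. A strip of n indices describes n - 2 triangles, so 8448 indices give 8446 triangles (degenerate ones included). A cache that shaded every vertex exactly once would need only 4225 invocations, roughly 0.5 per triangle.

```c
#include <assert.h>
#include <stddef.h>

/* A triangle strip with n indices describes n - 2 triangles (for n >= 3),
   degenerate triangles included. */
size_t strip_triangle_count(size_t index_count)
{
    return index_count >= 3 ? index_count - 2 : 0;
}

/* Average number of vertex shader invocations per strip triangle.
   `invocations` is a measured count supplied by the caller. */
double invocations_per_triangle(size_t invocations, size_t index_count)
{
    size_t tris = strip_triangle_count(index_count);
    assert(tris > 0); /* the ratio is undefined for an empty strip */
    return (double)invocations / (double)tris;
}
```

Calling invocations_per_triangle(24000, 8448) gives a little over 2.84, which fits the quoted figure of about 2.88 for a somewhat higher measured count.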

I intend to send the vertices as points with transform feedback active, so that the vertex shader runs and its outputs are stored in the feedback buffer. Then I would bind that output buffer as the vertex source for my triangle list renderer, which uses an empty vertex shader just for primitive assembly.
For this to work, I need the vertices sent to transform feedback via

glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, nFirst, nCount);

to be written out in exactly the same order in which they were received.

Or, in other words, if I wrote gl_VertexID to the output buffer and sent GL_POINTS through transform feedback, then on examining the output buffer I would see the consecutive sequence 0, 1, 2, … nCount - 1
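A minimal sketch of that check, assuming the captured gl_VertexID values have already been copied back to client memory (for example with glGetBufferSubData). The helper name and its `first` parameter are hypothetical. The sequence 0 … nCount - 1 corresponds to nFirst = 0; as far as I know, with glDrawArrays and a nonzero nFirst, gl_VertexID starts at nFirst, so the expected start is passed in explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if ids[i] == first + i for every i in [0, count), else 0.
   `ids` stands for the gl_VertexID values captured by transform feedback
   and read back from the feedback buffer. */
int ids_are_consecutive(const int *ids, size_t count, int first)
{
    assert(ids != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (ids[i] != first + (int)i)
            return 0; /* a vertex was written out of order */
    }
    return 1;
}
```

A readback of {0, 1, 2, 3} passes with first = 0, while {0, 2, 1, 3} fails.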

I want to know whether the OpenGL specification explicitly guarantees that the order written to the feedback buffer is exactly the order in which the vertices are submitted when drawing GL_POINTS in transform feedback mode.

P.S.

"The attributes of the first vertex received after
BeginTransformFeedback are written at the starting offsets of the bound
buffer objects set by BindBufferRange, and subsequent vertex attributes
are appended to the buffer object."

This was confusing. I need an explicit confirmation for my problem.

“2.1 Execution model: … each vertex is processed independently, in order, and in the same way”

So yes, vertices will be exactly in order in the transform feedback buffer.

Hello,

Thank you for the answer. I just ran a test of this and found no performance gain; on the contrary, there was about a 10% drop in speed.

The “2.88” vertex shader invocations per triangle may be explained by the triangle strip containing degenerate triangles (index triples such as 2,3,2); otherwise I would get 3.
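To see how many degenerate triangles a strip actually contains, the index buffer can be scanned on the CPU. This is only a sketch with hypothetical names (StripStats, strip_stats). It treats a strip triangle as degenerate when at least two of its three indices are equal.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t triangles;  /* all triangles described by the strip */
    size_t degenerate; /* triangles with a repeated index, e.g. 2,3,2 */
} StripStats;

/* Walks every window of three consecutive strip indices and counts the
   windows in which an index repeats. */
StripStats strip_stats(const unsigned *idx, size_t n)
{
    StripStats s = {0, 0};
    assert(idx != NULL || n == 0);
    for (size_t i = 0; i + 2 < n; ++i) {
        unsigned a = idx[i], b = idx[i + 1], c = idx[i + 2];
        ++s.triangles;
        if (a == b || b == c || a == c)
            ++s.degenerate;
    }
    return s;
}
```

For two four-vertex strips joined by repeating indices, 0,1,2,3,3,4,4,5,6,7, this reports 8 triangles, 4 of them degenerate.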

Seeing the data of the same vertex output N times does not necessarily mean that the vertex shader was invoked N times; it may be the result of one invocation and N-1 copies of the data in the transform feedback buffer.

I wanted to post this just for clarification.

Arrange your vertices and indices to optimize vertex cache usage, for example see http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html. Remove degenerate primitives. If you need to start a new triangle strip, use primitive restart instead.
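To compare index orderings before and after such an optimization, the post-transform cache can be modelled on the CPU. The sketch below assumes a simple FIFO cache. Real hardware may behave differently, and the function name, the cache_size parameter and the MAX_CACHE limit are all hypothetical. Each miss stands for one modelled vertex shader invocation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE 64 /* upper bound on the simulated cache size */

/* Simulates a FIFO post-transform cache with `cache_size` entries and
   returns the number of misses for an index stream of `n` indices. */
size_t fifo_cache_misses(const unsigned *idx, size_t n, size_t cache_size)
{
    unsigned cache[MAX_CACHE];
    size_t used = 0;   /* number of valid entries */
    size_t next = 0;   /* slot that the next miss overwrites (the oldest) */
    size_t misses = 0;

    assert(cache_size > 0 && cache_size <= MAX_CACHE);
    for (size_t i = 0; i < n; ++i) {
        int hit = 0;
        for (size_t k = 0; k < used; ++k) {
            if (cache[k] == idx[i]) {
                hit = 1;
                break;
            }
        }
        if (!hit) {
            cache[next] = idx[i];
            next = (next + 1) % cache_size;
            if (used < cache_size)
                ++used;
            ++misses;
        }
    }
    return misses;
}
```

For the index stream 0,1,2, 2,1,3, 0,1,2 the model gives 4 misses with a 4-entry cache and 7 misses with a 2-entry cache. Dividing the miss count by the triangle count gives a figure comparable to the invocations-per-triangle number discussed earlier in the thread.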