Tessellation shader: Order of patch processing and data transfer

Hi people!

This is my first thread, so please be patient :wink: I also don't have much experience with OpenGL in general, but had to learn several parts of it for a project at my university, so my problem may seem very basic to you.

I have written a tessellation shader, making use of PN triangles, to make overly coarse meshes look more round and organic. There is a paper called “Curved PN triangles” describing the mathematical background, if anyone is interested. Basically, a given triangle is subdivided into many small triangles, and the new vertices are placed on a curved cubic surface fitted to the original triangle’s edges. It’s basically a Bézier surface patch, if I’m not mistaken. My implementation tessellates the given triangles (or patches) depending on their distance from the camera. This all works great so far, but is very expensive for scenes with many primitives (I work with triangles only, as you might have already guessed ;)) .
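For anyone curious, one of the six edge control points from the paper can be sketched in plain C++ like this (a standalone illustration with a made-up Vec3 type, not my actual shader code; the other edge points follow the same pattern with the indices rotated):

```cpp
struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Edge control point b210 from the "Curved PN triangles" paper:
//   w12  = dot(P2 - P1, N1)
//   b210 = (2*P1 + P2 - w12*N1) / 3
// The w12 term projects the point toward the tangent plane at P1,
// which is what makes the patch bulge away from the flat triangle.
static Vec3 edgeControlPoint(Vec3 p1, Vec3 p2, Vec3 n1) {
    float w12 = dot(sub(p2, p1), n1);
    return scale(sub(add(scale(p1, 2.0f), p2), scale(n1, w12)), 1.0f / 3.0f);
}
```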

Now I want to save some processing power by pre-computing the control points on the CPU and transferring them to the GPU once. Since my mesh does not deform, the control points do not change and thus need not be recomputed by the TCS every frame. My problems are:

  • How can I transfer the array with all my control points (and the control normals) to GPU memory? The array would have numberFaces * 16 * 3 elements, because per triangle I need 10 vec3s for the control points and 6 vec3s for the control normals. Uniforms don’t seem to be able to hold that much data; is a texture buffer the right approach?
  • How can I determine in the TES which triangle (or patch) is being processed? I need this information to address the right elements in my control points array. Is gl_PrimitiveID the right answer?

Well, I hope I explained my problems understandably, and I really appreciate any help.

Regards,
wickeD

How can I transfer the array with all my control points (and the control normals) to the GPU memory?

The same way you currently do. Rendering with tessellation on is like rendering with tessellation off, save for the fact that you’re using the GL_PATCHES primitive type.

The array would have numberFaces * 16 * 3 elements, because per triangle I need 10 float vec3 for the control points and 6 float vec3 for the control normals. Uniforms don’t seem to be able to hold that much data, is a Texture buffer the right approach?

Given that, each patch would need to have 16 vec3 values. So… do that. Each vertex is just one vec3, so your vertex shader has only one input value, and it simply passes the value through (unless you want to transform it in some way before tessellation, in which case you’d need to detect whether the value is a normal or a position).

You would set glPatchParameteri(GL_PATCH_VERTICES, 16), and you would have your glDraw* call render 16 * numberFaces vertices.
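As a rough sketch of what that packing might look like on the CPU side (the Vec3 type and the function name are placeholders, not part of any API; the GL calls in the trailing comment need a live context):

```cpp
#include <array>
#include <vector>

struct Vec3 { float x, y, z; };

// Pack one 16-"vertex" patch per input triangle: the 10 PN control
// points first (slots 0-9), then the 6 control normals (slots 10-15).
std::vector<Vec3> buildPatchBuffer(
        const std::vector<std::array<Vec3, 10>>& ctrlPts,
        const std::vector<std::array<Vec3, 6>>& ctrlNrm) {
    std::vector<Vec3> buf;
    buf.reserve(ctrlPts.size() * 16);
    for (std::size_t f = 0; f < ctrlPts.size(); ++f) {
        for (const Vec3& p : ctrlPts[f]) buf.push_back(p);  // slots 0-9
        for (const Vec3& n : ctrlNrm[f]) buf.push_back(n);  // slots 10-15
    }
    return buf;
}
// Then, with a GL context:
//   glPatchParameteri(GL_PATCH_VERTICES, 16);
//   glDrawArrays(GL_PATCHES, 0, 16 * numberFaces);
```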

How can I determine in the TES, which triangle (or patch) is processed? I need this information to address the right elements in my control points array.

The “elements in my control points array” should be the input values to the TES. Therefore, your TES intrinsically knows which patch it is in.

Each TES invocation in your case would get 16 vec3 values, which represent the data for the entire patch. It gets its current location within the abstract patch from [var]gl_TessCoord[/var].
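For illustration, here is the cubic Bézier-triangle evaluation a TES would perform at gl_TessCoord, written as plain C++ so it can stand alone. The control-point ordering is an assumption of this sketch; adapt it to however you packed your patch:

```cpp
#include <array>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// CPU analogue of the TES: evaluate the cubic Bezier triangle at the
// barycentric coordinate (u, v, w) that gl_TessCoord would supply.
// Assumed ordering: b300, b030, b003, b210, b120, b021, b012, b102,
// b201, b111. The Bernstein weights sum to (u + v + w)^3 = 1.
Vec3 evalPN(const std::array<Vec3, 10>& b, float u, float v, float w) {
    float u2 = u * u, v2 = v * v, w2 = w * w;
    Vec3 p = scale(b[0], u2 * u);
    p = add(p, scale(b[1], v2 * v));
    p = add(p, scale(b[2], w2 * w));
    p = add(p, scale(b[3], 3 * u2 * v));
    p = add(p, scale(b[4], 3 * u * v2));
    p = add(p, scale(b[5], 3 * v2 * w));
    p = add(p, scale(b[6], 3 * v * w2));
    p = add(p, scale(b[7], 3 * u * w2));
    p = add(p, scale(b[8], 3 * u2 * w));
    p = add(p, scale(b[9], 6 * u * v * w));
    return p;
}
```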

So no, a texture buffer isn’t necessary; the correct approach is to use vertex attributes.

Bear in mind that with your approach, some of your control points are positions and some are normals. The vertex shader will need to know which is which (e.g. by using the value of gl_VertexID % 16).

You might be better off passing 10 vertices per patch, each with a position and a normal (4 of the normals would be unused). This way, you only have 10 invocations of the vertex shader per patch rather than 16. The amount of work per invocation probably won’t be any different; if you end up having to distinguish vertices from normals with e.g. “if (gl_VertexID % 16 < 10)”, you’ll probably end up executing both branches for each invocation anyhow.
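That 10-vertices-per-patch layout could be sketched like so (the struct and function names are made up for illustration; the GL calls in the comment need a live context):

```cpp
#include <array>
#include <vector>

// One vertex per control point: position plus normal. Only the first
// six normals carry data; the last four ride along unused.
struct PatchVertex {
    std::array<float, 3> position;
    std::array<float, 3> normal;
};

std::vector<PatchVertex> buildPatches(
        const std::vector<std::array<std::array<float, 3>, 10>>& ctrlPts,
        const std::vector<std::array<std::array<float, 3>, 6>>& ctrlNrm) {
    std::vector<PatchVertex> buf;
    buf.reserve(ctrlPts.size() * 10);
    for (std::size_t f = 0; f < ctrlPts.size(); ++f)
        for (std::size_t i = 0; i < 10; ++i)
            buf.push_back({ctrlPts[f][i],
                           i < 6 ? ctrlNrm[f][i]
                                 : std::array<float, 3>{0, 0, 0}});
    return buf;
}
// Draw setup (needs a GL context):
//   glPatchParameteri(GL_PATCH_VERTICES, 10);
//   glDrawArrays(GL_PATCHES, 0, 10 * numberFaces);
```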

You don’t need to know. All patches should be processed in the same manner. Each invocation of the TES gets all of the outputs from the TCS, both per-patch outputs and per-invocation outputs, and generates one vertex.

You might be better off passing 10 vertices per patch, each with a position and a normal (4 of the normals would be unused). This way, you only have 10 invocations of the vertex shader per patch rather than 16. The amount of work per invocation probably won’t be any different; if you end up having to distinguish vertices from normals with e.g. “if (gl_VertexID % 16 < 10)”, you’ll probably end up executing both branches for each invocation anyhow.

Good idea. Also, normals tend to compress pretty well, so you might even get away with sending them as normalized signed bytes, so that each normal fits into 4 bytes. Each vertex then consists of 16 bytes: three 4-byte floats and three 1-byte signed values, with one byte unused.

That will help minimize the unused space.
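A minimal sketch of that 16-byte layout and the byte quantization (the struct and helper names are invented for illustration; with GL you’d then describe the normal as a normalized GL_BYTE attribute):

```cpp
#include <cmath>
#include <cstdint>

// 16-byte vertex: 12 bytes of position, 3 bytes of normal, 1 pad byte.
struct PackedVertex {
    float px, py, pz;    // position, 3 * 4 bytes
    std::int8_t nx, ny, nz;  // normal components in [-127, 127]
    std::int8_t pad;     // one unused byte to reach 16 bytes
};

// Round-to-nearest mapping of a component in [-1, 1] onto [-127, 127],
// matching what a normalized signed-byte attribute expects.
std::int8_t quantize(float c) {
    return static_cast<std::int8_t>(std::lround(c * 127.0f));
}

float dequantize(std::int8_t b) {
    return static_cast<float>(b) / 127.0f;
}
```

The worst-case per-component error is about 1/254, which is far below what is visible in shading.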