Determining Geometry Shader Output Size

Hello,

Does anyone know how to determine the number of scalars output from a geometry shader? In their programming guide, NVIDIA recommends making this number no greater than 20 for peak performance on the GeForce 8800 GTX. More recent hardware must have a larger upper bound.

I’m not sure if the number of scalars output is as simple as the number of vertices times the size of each vertex. How do you determine the size of each vertex? Does an output with the flat qualifier count? Does every member in gl_PerVertex count? Or just the ones I write to?


out gl_PerVertex {
  vec4 gl_Position;
  float gl_PointSize;
  float gl_ClipDistance[];
};

The reason I ask is that if you are doing something simple like point sprites in a geometry shader, the output is at least gl_Position (4 scalars) and a texture coordinate (2 scalars) per vertex. For the 4 vertices of a sprite quad, that is a total of 24 scalars - above NVIDIA’s limit of 20. Unless you can compute the texture coordinate in the fragment shader, and that results in outputting less.
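
To make that arithmetic concrete, here is a minimal host-side sketch in C. The function name is made up for illustration, and it assumes the count really is just vertices emitted times scalars written per vertex, which is the very thing being asked about.

```c
#include <assert.h>

/* Hypothetical helper: scalars output by one geometry shader run,
 * ASSUMING the total is simply vertices * scalars written per vertex. */
int gs_output_scalars(int vertices, int scalars_per_vertex)
{
    assert(vertices >= 0 && scalars_per_vertex >= 0);
    return vertices * scalars_per_vertex;
}
```

For a point sprite expanded to a 4-vertex quad, gs_output_scalars(4, 4 + 2) gives 24. With only gl_Position written, gs_output_scalars(4, 4) gives 16, which would fit under 20.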

Regards,
Patrick

My geometry shader is:

layout(lines_adjacency, invocations = yyy) in;
// Geometry shader output must be points, line_strip, or triangle_strip.
layout(line_strip, max_vertices = xxx) out;

out geomdata
{
    vec4 color;    // 4 components
    vec4 texcoord; // 4 components
} GeomData;

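void main()
{
    float step = 1.0 / (xxx / 2); // parameter increment per loop iteration
    float t = 0;
    for (int i = 0; i < xxx / 2; i++)
    {
        // First vertex of this iteration, evaluated at parameter t.
        gl_Position = gl_in[0].gl_Position + t * gl_in[1].gl_Position +
            t * t * gl_in[2].gl_Position + t * t * t * gl_in[3].gl_Position;
        GeomData.color = vec4(1, 0, 0, 1);
        GeomData.texcoord = vec4(t, t, 0, 1);
        EmitVertex();

        // Assumed intent: step was declared but never used as posted,
        // so advance t here before emitting the second vertex.
        t += step;

        gl_Position = gl_in[0].gl_Position + t * gl_in[1].gl_Position +
            t * t * gl_in[2].gl_Position + t * t * t * gl_in[3].gl_Position;
        GeomData.color = vec4(1, 0, 0, 1);
        GeomData.texcoord = vec4(t, t, 0, 1);
        EmitVertex();
    }
}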

The total number of output varying components per vertex is:

color (4) + texcoord (4) + gl_Position (4) = 12

Querying MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS returns 1024, so the largest usable value is
max_vertices xxx = MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS / 12 = 85 (1024 / 12, rounded down).
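
The same calculation as a small C sketch, for anyone who wants to compute it on the host. The function and parameter names are hypothetical; the limits are passed in as plain integers that you would obtain with glGetIntegerv. The clamp against GL_MAX_GEOMETRY_OUTPUT_VERTICES is my addition, since that separate limit also caps max_vertices.

```c
#include <assert.h>

/* Hypothetical helper: largest max_vertices that fits the output budget.
 * total_limit:           value of GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS
 * vertex_limit:          value of GL_MAX_GEOMETRY_OUTPUT_VERTICES
 * components_per_vertex: components written per emitted vertex */
int max_vertices_for_budget(int total_limit, int vertex_limit,
                            int components_per_vertex)
{
    assert(components_per_vertex > 0);
    /* Integer division rounds down, which is what we want here. */
    int by_components = total_limit / components_per_vertex;
    return by_components < vertex_limit ? by_components : vertex_limit;
}
```

With the numbers from this post, max_vertices_for_budget(1024, 256, 12) gives 85 (256 is the minimum the spec guarantees for the vertex limit; your driver may report more).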

If you want 240 vertices in total, you can use invocations yyy = 3 with max_vertices xxx = 80 (3 × 80 = 240).
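
A rough C sketch of how that split could be computed, assuming the work divides evenly enough across invocations. The function name and parameters are made up for illustration; max_invocations would come from querying GL_MAX_GEOMETRY_SHADER_INVOCATIONS.

```c
#include <assert.h>

/* Hypothetical helper: split a desired total vertex count across
 * geometry shader invocations.
 * Returns the number of invocations needed and stores the max_vertices
 * to declare per invocation in *per_invocation.
 * Returns 0 if max_invocations is not enough. */
int plan_invocations(int desired_total, int max_per_invocation,
                     int max_invocations, int *per_invocation)
{
    assert(desired_total > 0 && max_per_invocation > 0);
    assert(per_invocation != 0);
    /* Round up: the fewest invocations that can cover the total. */
    int n = (desired_total + max_per_invocation - 1) / max_per_invocation;
    if (n > max_invocations)
        return 0;
    /* Round up again so n invocations cover at least desired_total. */
    *per_invocation = (desired_total + n - 1) / n;
    return n;
}
```

For 240 vertices with at most 85 per invocation, this returns 3 invocations of 80 vertices each, matching the numbers above.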

the output is at least gl_Position (4 scalars) and a texture coordinate (2 scalars) per vertex, for a total of 24 scalars - above NVIDIA’s limit of 20.

That’s more of a GLSL problem. GLSL defines gl_Position as a vec4, while HLSL allows the user to define it as a vec3 (presumably the hardware fills in a 1.0 for you).

Also, 20 is their limit for “peak performance.” So it’s a question of what you’re giving up in one location for another.