Geometry Shader

Geometry Shader
Core in version 4.4
Core since version 3.2
ARB extension ARB_geometry_shader4

A Geometry Shader (GS) is a Shader program written in GLSL that governs the processing of Primitives. Geometry shaders reside between the Vertex Shaders (or the optional Tessellation stage) and the fixed-function Vertex Post-Processing stage.

A geometry shader is optional and does not have to be used.

Geometry shader invocations take a single Primitive as input and may output zero or more primitives. There are implementation-defined limits on how many primitives can be generated from a single GS invocation. GS's are written to accept a specific input primitive type and to output a specific primitive type.

While the GS can be used to amplify geometry, thus implementing a crude form of tessellation, this is generally not a good use of a GS. The main reasons to use a GS are:

  • Layered rendering: taking one primitive and rendering it to multiple images without having to change bound rendertargets and so forth.
  • Transform Feedback: This is often employed for doing computational tasks on the GPU, mainly on implementations that predate Compute Shaders.

In OpenGL 4.0, GS's gained two new features. One was the ability to write to multiple output streams. This is used exclusively with transform feedback, such that different feedback buffer sets can get different transform feedback data.

The other feature was GS instancing, which allows multiple invocations to operate over the same input primitive. This makes layered rendering easier to implement and possibly faster performing, as each layer's primitive(s) can be computed by a separate GS instance.

Note: While geometry shaders have had previous extensions like GL_EXT_geometry_shader4 and GL_ARB_geometry_shader4, these extensions expose the API and GLSL functionality in very different ways from the core feature. This page describes only the core feature.

Primitive in/out specification

Each geometry shader is designed to accept a specific Primitive type as input and to output a specific primitive type. The accepted input primitive type is defined in the shader:

layout(input_primitive) in;

The input_primitive type must match the primitive type used with the rendering command that renders with this shader program. The valid values for input_primitive, along with the valid OpenGL primitive types, are:

GS input             | OpenGL primitives                                    | Vertex count
points               | GL_POINTS                                            | 1
lines                | GL_LINES, GL_LINE_STRIP, GL_LINE_LOOP                | 2
lines_adjacency      | GL_LINES_ADJACENCY, GL_LINE_STRIP_ADJACENCY          | 4
triangles            | GL_TRIANGLES, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN     | 3
triangles_adjacency  | GL_TRIANGLES_ADJACENCY, GL_TRIANGLE_STRIP_ADJACENCY  | 6

The vertex count is the number of vertices that the GS receives per-input primitive.

The output primitive type is defined as follows:

layout(output_primitive, max_vertices = vert_count) out;

The output_primitive may be one of the following:

  • points
  • line_strip
  • triangle_strip

These work exactly the same way their counterpart OpenGL rendering modes do. To output individual triangles or lines, simply call EndPrimitive (see below) after emitting each set of 3 or 2 vertices.

There must be a max_vertices declaration for the output. The number must be a compile-time constant, and it defines the maximum number of vertices that will be written by a single invocation of the GS. It may be no larger than the implementation-defined limit GL_MAX_GEOMETRY_OUTPUT_VERTICES. The minimum value for this limit is 256. See the limitations below.
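The two layout declarations described above can be sketched as a minimal pass-through geometry shader. This is an illustrative sketch only: it assumes the draw call uses a triangle-based primitive type and that the vertex shader has already written a clip-space gl_Position.

```glsl
#version 330 core

// Input: one triangle per GS invocation (3 vertices in gl_in[]).
layout(triangles) in;
// Output: a triangle strip of at most 3 vertices, i.e. a single triangle.
layout(triangle_strip, max_vertices = 3) out;

void main()
{
    // Forward each input vertex unchanged.
    for (int i = 0; i < 3; ++i)
    {
        gl_Position = gl_in[i].gl_Position;
        EmitVertex();
    }
    // Close the strip so the three vertices form one triangle.
    EndPrimitive();
}
```

Because max_vertices is 3, this shader can never emit more than one triangle per input primitive.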

Instancing

GS Instancing
Core in version 4.4
Core since version 4.0
Core ARB extension ARB_gpu_shader5

The GS can also be instanced (this is separate from instanced rendering, as it is local to the GS). This causes the GS to execute multiple times for the same input primitive. Each invocation of the GS for a particular input primitive will get a different gl_InvocationID value. This is useful for layered rendering and outputs to multiple streams (see below).

To use instancing, there must be an input layout qualifier:

layout(invocations = num_instances) in;

The value of num_instances must not be larger than GL_MAX_GEOMETRY_SHADER_INVOCATIONS (this will be at least 32). The built-in value gl_InvocationID specifies the particular instance of this shader; it will be on the half-open range [0, num_instances).

The output primitives from instances are ordered by gl_InvocationID. So if the user renders two primitives and has num_instances set to 3, then the GS will effectively be called in this order: (prim0, inst0), (prim0, inst1), (prim0, inst2), (prim1, inst0), ... The output primitives from the GS's will be ordered based on that input sequence: all output from the invocations for the first input primitive comes before any output from the invocations for the second primitive.
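As a rough illustration of GS instancing, the sketch below runs three invocations per input triangle and uses gl_InvocationID to pick a per-instance offset. The uniform name instanceOffset is hypothetical and exists only for this example.

```glsl
#version 400 core

// Three GS invocations run for every input triangle.
layout(triangles, invocations = 3) in;
layout(triangle_strip, max_vertices = 3) out;

// Hypothetical uniform: one clip-space offset per GS instance.
uniform vec4 instanceOffset[3];

void main()
{
    for (int i = 0; i < 3; ++i)
    {
        // gl_InvocationID is 0, 1 or 2 here.
        gl_Position = gl_in[i].gl_Position + instanceOffset[gl_InvocationID];
        EmitVertex();
    }
    EndPrimitive();
}
```

Note that max_vertices applies to each invocation separately, so each instance may emit up to 3 vertices.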

Inputs

Geometry shaders take a primitive as input; each primitive is composed of some number of vertices, as defined by the input primitive type in the shader.

The outputs of the vertex shader (or Tessellation Stage, as appropriate) are thus fed to the GS as arrays of variables. These can be organized as individual values or as part of an interface block. Each user-defined input will be an array of the length of the primitive's vertex count. The order of vertices in the input arrays corresponds to the order of the vertices specified by prior shader stages.

Geometry shader inputs may have interpolation qualifiers on them. If they do, then the prior stage's outputs must use the same qualifier.
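The array-of-inputs arrangement can be sketched as follows. The block name VertexData, its members, and the output names are hypothetical; the sketch assumes a vertex shader that declares a matching output block.

```glsl
#version 330 core

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

// Hypothetical interface block. The vertex shader is assumed to declare
//   out VertexData { vec3 normal; vec2 texCoord; } vsOut;
// The block name and members must match; the instance name may differ.
in VertexData
{
    vec3 normal;
    vec2 texCoord;
} gsIn[];  // one element per input vertex (3 for triangles)

// Hypothetical outputs consumed by the fragment shader.
out vec3 fragNormal;
out vec2 fragTexCoord;

void main()
{
    for (int i = 0; i < 3; ++i)
    {
        // Copy the per-vertex inputs through to the outputs.
        fragNormal   = gsIn[i].normal;
        fragTexCoord = gsIn[i].texCoord;
        gl_Position  = gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
```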

Geometry Shaders provide the following built-in input variables:

in gl_PerVertex
{
  vec4 gl_Position;
  float gl_PointSize;
  float gl_ClipDistance[];
} gl_in[];

These variables have only the meaning the prior shader stage(s) that passed them gave them.

There are some GS input values that are based on primitives, not vertices. These are not aggregated into arrays. These are:

in int gl_PrimitiveIDIn;
in int gl_InvocationID;  //Requires GLSL 4.0 or ARB_gpu_shader5

  • gl_PrimitiveIDIn: the current input primitive's ID, based on the number of primitives processed by the GS since the current rendering command started.
  • gl_InvocationID: the current instance, as defined when instancing geometry shaders.

Outputs

Geometry shaders can output as many vertices as they wish (up to the maximum specified by the max_vertices layout specification). To provide this, output values in geometry shaders are not arrays. Instead, a function-based interface is used.

GS code writes all of the output values for a vertex, then calls EmitVertex(). This tells the system to write those output values to wherever it is that output vertices get written. After calling this function, all output variables contain undefined values. So you will need to write to them all again before emitting the next vertex (if there is a next vertex).

Note: You must write to each output variable before every EmitVertex() call (or, when using streams, to every output of the target stream before each EmitStreamVertex() call).

The GS defines what kind of primitive these vertex outputs represent. The GS can also end a primitive and start a new one, by calling the EndPrimitive() function. This does not emit a vertex.

In order to write two independent triangles from a GS, you must write three separate vertices with EmitVertex() for the first three vertices, then call EndPrimitive() to end the strip and start a new one. Then you write three more vertices with EmitVertex().
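The two-independent-triangles pattern can be sketched like this. The example expands each input point into a quad built from two separate triangles; the constant halfSize and the helper emitCorner are hypothetical names chosen for illustration.

```glsl
#version 330 core

layout(points) in;
// Two independent triangles need 6 vertices in total.
layout(triangle_strip, max_vertices = 6) out;

// Hypothetical quad half-size, applied in clip space before the perspective divide.
const float halfSize = 0.1;

// Emit one corner of the quad, offset from the input point.
void emitCorner(vec2 offset)
{
    gl_Position = gl_in[0].gl_Position + vec4(offset * halfSize, 0.0, 0.0);
    EmitVertex();
}

void main()
{
    // First triangle: three vertices, then end the strip.
    emitCorner(vec2(-1.0, -1.0));
    emitCorner(vec2( 1.0, -1.0));
    emitCorner(vec2(-1.0,  1.0));
    EndPrimitive();

    // Second triangle: a new strip that shares no vertices with the first.
    emitCorner(vec2( 1.0, -1.0));
    emitCorner(vec2( 1.0,  1.0));
    emitCorner(vec2(-1.0,  1.0));
    EndPrimitive();
}
```

Without the first EndPrimitive() call, the fourth vertex would extend the existing strip instead of starting a new triangle.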

Output variables are defined as normal for GLSL. They can be grouped into interface blocks or be single values, as appropriate. Output variables can be defined with interpolation qualifiers. The Fragment Shader equivalent interface variables should define the same variables with the same qualifiers.

Geometry Shaders have the following built-in outputs.

out gl_PerVertex
{
  vec4 gl_Position;
  float gl_PointSize;
  float gl_ClipDistance[];
};

gl_PerVertex defines an interface block for outputs. The block is defined without an instance name, so that prefixing the names is not required.

The GS is the final Vertex Processing stage. Therefore, unless rasterization is being turned off, you must write to some of these values. These outputs are always associated with stream 0. So if you're emitting vertices to a different stream, you don't have to write to them.

  • gl_Position: the clip-space output position of the current vertex. This value must be written if you are emitting a vertex to stream 0, unless rasterization is off.
  • gl_PointSize: the pixel width/height of the point being rasterized. It is only necessary to write to this when outputting point primitives.
  • gl_ClipDistance: allows the shader to set the distance from the vertex to each clip plane. A positive distance means that the vertex is inside/behind the clip plane, and a negative distance means it is outside/in front of the clip plane. In order to use this variable, the user must manually redeclare it (and therefore the interface block) with an explicit size.
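A redeclaration of the output block with an explicitly sized gl_ClipDistance might look like the sketch below. It targets GLSL 4.10 as a conservative assumption, and the uniform clipPlane (a plane equation assumed to be expressed in clip space) is hypothetical.

```glsl
#version 410 core

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

// Redeclare the built-in output block, giving gl_ClipDistance an explicit size.
out gl_PerVertex
{
    vec4  gl_Position;
    float gl_ClipDistance[1];
};

// Hypothetical uniform: plane equation (a, b, c, d), assumed to be in clip space.
uniform vec4 clipPlane;

void main()
{
    for (int i = 0; i < 3; ++i)
    {
        gl_Position = gl_in[i].gl_Position;
        // Positive values are kept, negative values are clipped away.
        gl_ClipDistance[0] = dot(clipPlane, gl_in[i].gl_Position);
        EmitVertex();
    }
    EndPrimitive();
}
```

On the API side, the corresponding clip distance also has to be enabled (for example with glEnable(GL_CLIP_DISTANCE0)) for the written value to have any effect.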

Certain predefined outputs have special meaning and semantics.

out int gl_PrimitiveID;

The primitive ID will be passed to the fragment shader. The primitive ID for a particular line/triangle will be taken from the provoking vertex of that line/triangle, so make sure that you are writing the correct value for the right provoking vertex.

The meaning of this value is whatever you want it to be. However, if you want to match the standard OpenGL meaning (i.e., what the Fragment Shader would get if no GS were used), just do this for each vertex before emitting it:

gl_PrimitiveID = gl_PrimitiveIDIn;

Layered rendering

Layered rendering is the process of having the GS send specific primitives to different layers of a layered framebuffer. This can be useful for doing cube-based shadow mapping, or even for rendering cube environment maps without having to render the entire scene multiple times.

Layered rendering in the GS works via two special output variables:

out int gl_Layer;
out int gl_ViewportIndex; //Requires GL 4.1 or ARB_viewport_array.

The gl_Layer output defines which layer in the layered image the primitive goes to. Each vertex in the primitive must get the same layer index. Note that when rendering to cubemap arrays, the gl_Layer value represents layer-faces (the faces within a layer), not the layers of cubemaps.

gl_ViewportIndex, which requires GL 4.1 or ARB_viewport_array, specifies which viewport index to use with this primitive.

Note: ARB_viewport_array, while technically a 4.1 feature, is widely available on 3.3 hardware, from both NVIDIA and AMD.

Layered rendering can be more efficient with GS instancing, as separate GS invocations can process the individual layers in parallel. However, while ARB_viewport_array is often implemented in 3.3 hardware, no 3.3 hardware provides ARB_gpu_shader5 support.
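A non-instanced version of layered rendering to the six faces of a cube map can be sketched as follows. The uniform faceViewProj is hypothetical. The sketch also assumes that the vertex shader leaves a world-space position in gl_Position and that a cube map is attached to the framebuffer as a layered image.

```glsl
#version 330 core

layout(triangles) in;
// 6 faces x 3 vertices = 18 vertices per input triangle.
layout(triangle_strip, max_vertices = 18) out;

// Hypothetical uniform: one view-projection matrix per cube face.
uniform mat4 faceViewProj[6];

void main()
{
    for (int face = 0; face < 6; ++face)
    {
        for (int i = 0; i < 3; ++i)
        {
            // Every vertex of the primitive gets the same layer index.
            gl_Layer    = face;
            gl_Position = faceViewProj[face] * gl_in[i].gl_Position;
            EmitVertex();
        }
        // One triangle per face.
        EndPrimitive();
    }
}
```

With GS instancing, the outer loop could instead be replaced by invocations = 6 and gl_InvocationID, at the cost of requiring GL 4.0 or ARB_gpu_shader5.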

Which vertex

gl_Layer and gl_ViewportIndex are per-vertex parameters, but they specify a property that applies to the entire primitive. Therefore, a question arises: which vertex in a particular primitive defines that primitive's layer and viewport index?

The answer is that it is implementation-dependent. However, OpenGL does have two queries to determine which one the current implementation uses: GL_LAYER_PROVOKING_VERTEX and GL_VIEWPORT_INDEX_PROVOKING_VERTEX.

The value returned from glGetIntegerv will be one of the following enumerators:

  • GL_PROVOKING_VERTEX: The vertex used will track the current provoking vertex convention.
  • GL_LAST_VERTEX_CONVENTION: The vertex used will be the one defined by the last-vertex provoking vertex convention.
  • GL_FIRST_VERTEX_CONVENTION: The vertex used will be the one defined by the first-vertex provoking vertex convention.
  • GL_UNDEFINED_VERTEX: The implementation isn't saying.

For maximum portability, you will have to provide the same layer and viewport index to every vertex of a primitive. So if you wanted to output a triangle strip in which different triangles have different indices, you cannot do so portably; you have to split it into separate primitives.

Output streams

Output streams
Core in version 4.4
Core since version 4.0
Core ARB extension ARB_transform_feedback3

When using Transform Feedback to compute values, it is often useful to be able to send different sets of vertices to different buffers at different rates. For example, GS's can send vertex data to one stream, while building per-instance data in another stream. The vertex data and per-instance data will be of different lengths, written at different speeds.

Multiple stream output requires that the output primitive type be points. You can still take whatever input you prefer.

To provide this, output variables can be given a stream index with a layout qualifier:

layout(stream = stream_index) out vec4 some_output;

The stream_index ranges from 0 to GL_MAX_VERTEX_STREAMS - 1.

A default value for the stream can be set with:

layout(stream = 2) out;

All following out variables will use stream 2 unless they specify a stream. The default can be changed later. The initial default is 0.

To write a vertex to a particular stream, the function EmitStreamVertex is used. This function takes a stream index; only the output variables assigned to that stream are written. Similarly, EndStreamPrimitive ends a particular stream's primitive. However, since multiple stream output requires using points primitives, the latter function is not very useful.

Only values passed to stream 0 will actually be rendered; the rest of the streams will only matter if transform feedback is being used. Calling EmitVertex or EndPrimitive is equivalent to calling their stream counterparts with stream 0.
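A two-stream geometry shader might be sketched as below. The names vsAge, particlePosition and particleAge are hypothetical. The sketch assumes the outputs are captured with transform feedback and that rasterization is disabled, which is why gl_Position is never written.

```glsl
#version 400 core

layout(points) in;
// Multiple streams require point output; one vertex goes to each stream.
layout(points, max_vertices = 2) out;

// Hypothetical input, assumed to be written by the vertex shader as "out float vsAge;".
in float vsAge[];

// Hypothetical outputs, each assigned to its own stream.
layout(stream = 0) out vec4  particlePosition;
layout(stream = 1) out float particleAge;

void main()
{
    // Only the stream 0 outputs are recorded by this call.
    particlePosition = gl_in[0].gl_Position;
    EmitStreamVertex(0);

    // Only the stream 1 outputs are recorded by this call.
    particleAge = vsAge[0];
    EmitStreamVertex(1);
}
```

No EndStreamPrimitive call is needed here, since every emitted point is already a complete primitive.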

Output limitations

There are two competing limitations on the output of a geometry shader:

  1. The maximum number of vertices that a single invocation of a GS can output.
  2. The total maximum number of output components that a single invocation of a GS can output.

The first limit, defined by GL_MAX_GEOMETRY_OUTPUT_VERTICES, is the maximum number that can be provided to the max_vertices output layout qualifier. No single geometry shader invocation can exceed this number.

The other limit, defined by GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS, is, in layman's terms, the total amount of stuff that a single GS invocation can write. It is the total number of output values that a single GS invocation can write. (A component, in GLSL terms, is a component of a vector, so a float is one component and a vec3 is 3 components.) This is different from GL_MAX_GEOMETRY_OUTPUT_COMPONENTS (the maximum allowed number of components in out variables). The total output component count is the number of components written per vertex multiplied by the number of vertices written.

For example, if the total output component limit is 1024 (the smallest maximum value from GL 4.3), and the geometry shader writes 12 components per vertex, the total number of vertices that can be written is 1024/12 = 85 (rounded down). This is the absolute hard limit to the number of vertices that can be written. Even if GL_MAX_GEOMETRY_OUTPUT_VERTICES is larger than 85, because this geometry shader writes 12 components per vertex, the true maximum that it can write is 85 vertices. If the geometry shader instead wrote only 8 components per vertex, then it could write 128 (subject to the output vertices limit, of course).
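The arithmetic in this example can be written out as a short worked calculation. The symbols are shorthand introduced here, not GL names: C_total stands for the value of GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS, c for the number of components written per vertex, V_max for the value of GL_MAX_GEOMETRY_OUTPUT_VERTICES, and N_max for the resulting per-invocation vertex limit.

```latex
\begin{align*}
N_{\max} &= \min\left( V_{\max},\ \left\lfloor \frac{C_{\mathrm{total}}}{c} \right\rfloor \right) \\
c = 12:\quad \left\lfloor \frac{1024}{12} \right\rfloor &= 85 \\
c = 8:\quad \left\lfloor \frac{1024}{8} \right\rfloor &= 128
\end{align*}
```

In both cases the result is still capped by V_max, so whichever of the two limits is smaller is the one that applies.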

See also