Difference between revisions of "Transform Feedback"

From OpenGL.org

Revision as of 11:07, 19 August 2013

Transform Feedback
Core in version 4.5
Core since version 3.0
Core ARB extension ARB_transform_feedback2, ARB_transform_feedback3, ARB_transform_feedback_instanced
EXT extension EXT_transform_feedback
Vendor extension NV_transform_feedback

Transform Feedback is the process of altering the rendering pipeline so that primitives generated by Vertex Processing will be written to buffer objects. This allows one to preserve the post-transform rendering state of an object and resubmit this data multiple times.

Note: Mention will be made of various functions that deal with multiple stream output. Feedback into multiple streams requires either OpenGL 4.0, or both ARB_transform_feedback3 and ARB_gpu_shader5. If these are not available to you, ignore any such discussion.

Shader setup

In order to capture primitives, the program containing the final Vertex Processing shader stage must be linked with a number of parameters. These parameters must be set before linking the program, not after. So if you want to use glCreateShaderProgramv, which compiles and links in a single call, you will have to use the in-shader specification API, which is only available with OpenGL 4.4 or ARB_enhanced_layouts.

Note: NV_transform_feedback allows these parameters to be set dynamically after linking.

The only program object that matters for transform feedback is the one that provides the last Vertex Processing shader stage. When using separate programs, these settings can be set on any of the programs, but only the settings on the program that provides the last active Vertex Processing shader stage have any effect.

Transform feedback can operate in one of two capturing modes. In interleaved mode, all captured outputs go into the same buffer, interleaved with one another. In separate mode, each captured output goes into a separate buffer. This must be selected on a program object at link time.

Note: Separate capturing mode does not necessarily mean that outputs are captured into different Buffer Objects. It simply means that they are captured into separate transform feedback binding points. Different regions of the same buffer can be bound to the different binding points.

To define the capture settings for a program, as well as which output variables are captured, use the following function:

void glTransformFeedbackVaryings(GLuint program, GLsizei count, const char **varyings, GLenum bufferMode);

The bufferMode is the capturing mode. It must be GL_INTERLEAVED_ATTRIBS or GL_SEPARATE_ATTRIBS.

The count is the number of strings in the varyings array. This array specifies the names of the output variables of the appropriate shader stage to capture. The order in which the variables are provided defines the order in which they are captured. The names of these variables conform to the standard rules for naming GLSL variables.

There are limitations on the number of outputs that can be captured, as well as how many total components can be captured. Within these limits, any output variables can be captured, including members of interface blocks and structs. Array outputs can be captured too.

Captured data format

Transform feedback explicitly captures Primitives. While it does capture data for each vertex, it does so after splitting each primitive up into its separate base primitives. So if you're rendering using GL_TRIANGLE_STRIP, and you render with 6 vertices, that yields 4 triangles. Transform feedback will capture 4 triangles' worth of data. Since each triangle has 3 vertices, TF will capture 12 vertices, not the 6 you might expect from the rendering command.
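The arithmetic above can be sketched as a small helper. This is only an illustration: the PrimitiveMode enum and the capturedVertexCount function are hypothetical names, not part of OpenGL, and the sketch assumes that no geometry or tessellation shader changes the number of primitives and that the bound buffers are large enough to hold everything.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the GL primitive modes (not the real GLenum values).
enum class PrimitiveMode { Points, Lines, LineStrip, Triangles, TriangleStrip, TriangleFan };

// Number of vertices written by transform feedback for a draw call of
// `vertexCount` vertices. Strips and fans are split into independent
// primitives first; incomplete trailing primitives are dropped.
std::size_t capturedVertexCount(PrimitiveMode mode, std::size_t vertexCount) {
    switch (mode) {
    case PrimitiveMode::Points:
        return vertexCount;
    case PrimitiveMode::Lines:
        return (vertexCount / 2) * 2;
    case PrimitiveMode::LineStrip:
        return vertexCount < 2 ? 0 : (vertexCount - 1) * 2;
    case PrimitiveMode::Triangles:
        return (vertexCount / 3) * 3;
    case PrimitiveMode::TriangleStrip:
    case PrimitiveMode::TriangleFan:
        return vertexCount < 3 ? 0 : (vertexCount - 2) * 3;
    }
    return 0;
}
```

A helper like this is mainly useful for sizing the capture buffer: multiply the result by the per-vertex size of the captured outputs.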

Vertex data is written in the vertex order produced by primitive assembly. This means that, for example, when drawing with triangle strips, the vertex order is switched for every other captured triangle so that all captured triangles keep the same winding. This ensures that Face Culling will still respect the ordering provided in the initial array of vertex data.

Within each vertex, the data is written in the order specified by the varyings array (when capturing in interleaved mode). If an output variable is an aggregate (struct/array), then each member of the aggregate is written in order. Each component of the basic type of the output is written in order. Ultimately, all of the data is written tightly packed together (though capturing double-precision floats may cause padding).

A component will always be a float/double, signed integer, or unsigned integer, using the sizes for GLfloat/GLdouble, GLint, and GLuint. No packing or normalization is performed. The transform feedback system does not have an automated analog to the much more flexible vertex format specification system.

This of course does not prevent you from manually packing bits into unsigned integers in your shader.

Advanced interleaving

Advanced Interleaving
Core in version 4.5
Core since version 4.0
Core ARB extension ARB_transform_feedback3

The above settings only provide two modes: either all attributes go into separate buffers or all attributes go into the same buffer. In many cases, it is useful to be able to write several components to one buffer, while writing other components to others.

Also, captured interleaved data is tightly packed, with each variable's components coming immediately after the previous components. It is often useful to be able to skip over certain data in the buffer, for example when some data changes and other data does not.

These can be achieved by the use of special "varying" names in the varyings array. These special names do not name actual output variables; they only cause some particular effect on subsequent writes.

These names and their effects are:

gl_NextBuffer
This causes all subsequent outputs to be routed to the next buffer index. The buffers start at 0 and increment by one each time this name appears in the varyings list.
There must not be more of these than the number of buffers that can be bound for use in transform feedback. Therefore, the number of these must be strictly less than GL_MAX_TRANSFORM_FEEDBACK_BUFFERS.
gl_SkipComponents#
This causes the system to skip writing # components, where # may be from 1 to 4. The memory covered by the skipped components will not be modified. Each component in this case is the size of a float.
Note that components skipped in this way still count against the limit on the number of components being output.

Output variables in the Geometry Shader can be declared to go to a particular stream. This is controlled via an in-shader specification, but there are certain limitations that affect advanced component interleaving.

No two outputs that go to different streams can be captured by the same buffer. Attempting to do so will result in a linker error. So using multiple streams with interleaved writing requires using advanced interleaving to route attributes to different buffers.

Note that this ability effectively makes separate capture mode superfluous. It is a functional superset of what separate mode can do, since it can capture one output to each buffer individually.
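To make the routing rules for the special names concrete, here is a minimal sketch (C++17) that walks a varyings list and reports which buffer index each real output ends up in. The names routeVaryings and RoutedVarying, and the output names used in the usage note, are hypothetical; the function only mimics the rules described above and does not call OpenGL. In a real program the same list would be passed to glTransformFeedbackVaryings with GL_INTERLEAVED_ATTRIBS, and the buffer limit would be queried as GL_MAX_TRANSFORM_FEEDBACK_BUFFERS.

```cpp
#include <optional>
#include <string>
#include <vector>

// One real (non-special) output and the buffer index it is routed to.
struct RoutedVarying {
    std::string name;
    unsigned buffer;
};

// Walks the varyings list in order. "gl_NextBuffer" advances the buffer
// index; "gl_SkipComponents#" only adds padding and names no output.
// Returns std::nullopt if the list needs more buffers than `maxBuffers`.
std::optional<std::vector<RoutedVarying>> routeVaryings(
        const std::vector<std::string>& varyings, unsigned maxBuffers) {
    if (maxBuffers == 0) return std::nullopt;
    std::vector<RoutedVarying> routed;
    unsigned buffer = 0;
    for (const std::string& name : varyings) {
        if (name == "gl_NextBuffer") {
            ++buffer;
            if (buffer >= maxBuffers) return std::nullopt;
        } else if (name.rfind("gl_SkipComponents", 0) == 0) {
            continue;  // padding only; nothing is captured here
        } else {
            routed.push_back({name, buffer});
        }
    }
    return routed;
}
```

For example, the list "position", "gl_NextBuffer", "normal", "gl_SkipComponents2", "uv" routes position to buffer 0 and both normal and uv to buffer 1, with two float-sized components left untouched between them.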

Doubles and alignment

Double-precision Alignment
Core in version 4.5
Core since version 4.0
Core ARB extension ARB_gpu_shader_fp64

The alignment of single-precision floats and integers is 4 bytes. However, the alignment of double-precision values is 8 bytes. This causes a problem when it comes to capturing transform feedback data.

The alignment of components must be ensured. This is trivially ensured with floats and integers, but doubles require special care. Exactly what you have to do has changed in various OpenGL versions.

From OpenGL 4.0 (when doubles were first introduced to GLSL) to OpenGL 4.3, the specification states that it is up to the user to ensure 8-byte alignment of double precision data. Specifically, you must ensure two things:

  • Every double-precision component begins on an 8-byte boundary. You may need to insert padding, using the spacing functionality described under Advanced interleaving above.
  • All of the vertex data going to a particular buffer that includes a double-precision component must have a total vertex data size aligned to 8 bytes. This ensures that every subsequent vertex will also start on an 8-byte boundary. You may therefore need to add padding to the end of vertex data.

For example, if you want to capture the following, in the order defined here:

out DataBlock
{
  float var1;
  dvec2 someDoubles;
  float var3;
};

This is the sequence of strings that you will need in your varyings data if you want to capture it in the order of definition:

  • "DataBlock.var1"
  • "gl_SkipComponents1": Padding the next component to 8-byte alignment.
  • "DataBlock.someDoubles"
  • "DataBlock.var3"
  • "gl_SkipComponents1": Padding out the entire vertex structure to 8-byte alignment.

If you do not do this, you get undefined behavior. You could avoid the padding just by changing the order in which you capture the outputs, for example by capturing someDoubles first. You don't have to change the order you define them in the shader.
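The two padding rules can be sketched as a helper that inserts the skip entries for you. This is an illustration only: CapturedOutput and padForDoubles are hypothetical names, the caller is assumed to supply the component size and count of each output, and only a single buffer is considered.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Description of one captured output, supplied by the caller.
struct CapturedOutput {
    std::string name;
    std::size_t componentBytes;  // 4 for float/int/uint, 8 for double
    std::size_t componentCount;  // e.g. 2 for a dvec2
};

// Builds the varyings list for one buffer, inserting "gl_SkipComponents1"
// (4 bytes) wherever a double would otherwise start off an 8-byte boundary,
// and once more at the end if the per-vertex size is not a multiple of 8.
std::vector<std::string> padForDoubles(const std::vector<CapturedOutput>& outputs) {
    std::vector<std::string> varyings;
    std::size_t offset = 0;
    bool hasDouble = false;
    for (const CapturedOutput& out : outputs) {
        if (out.componentBytes == 8) {
            hasDouble = true;
            if (offset % 8 != 0) {
                varyings.push_back("gl_SkipComponents1");
                offset += 4;
            }
        }
        varyings.push_back(out.name);
        offset += out.componentBytes * out.componentCount;
    }
    if (hasDouble && offset % 8 != 0) {
        varyings.push_back("gl_SkipComponents1");
    }
    return varyings;
}
```

Fed the DataBlock members in definition order, this produces the five strings listed above. Fed someDoubles first, followed by var1 and var3, it produces only the three output names, since the two floats together fill exactly 8 bytes.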

OpenGL 4.4 does all of this padding automatically. It will insert component skipping and so forth for you.

This will only break your existing 4.3 code if you weren't inserting the needed padding yourself, which means your code was already relying on undefined behavior and was therefore already broken.

In-shader specification

In-shader Specification
Core in version 4.5
Core since version 4.4
Core ARB extension ARB_enhanced_layouts

Shaders can define which outputs are captured by transform feedback and exactly how they are captured. When a shader defines them, querying the program for the mode of transform feedback will return interleaved mode (since the advanced interleaving makes separate mode a complete subset of interleaved mode).

Layout qualifiers can be used to define which output variables are captured in Transform Feedback operations. When these qualifiers are set in a shader, they completely override any attempt to set the transform feedback outputs from OpenGL via glTransformFeedbackVaryings.

Any output variable or output interface block declared with the xfb_offset layout qualifier will be part of the transform feedback output. This qualifier must be specified with an integer byte offset. The offset is the number of bytes from the beginning of a vertex's data in the current buffer to this particular output variable.

The offsets of contained values (whether in arrays, structs, or members of the interface block) are computed, based on the sizes of prior components to pack them in the order specified. Any explicitly provided offsets are not allowed to violate alignment restrictions. So if a definition contains a double (either directly or indirectly), the offset must be 8-byte aligned.

Members of interface blocks can have their offsets specified directly on them, which overrides any computed offsets. Also, not all members of an interface block are required to be captured (though that will happen if you set the xfb_offset on the block itself). Stream assignments for a geometry shader are required to be the same for all members of a block, but offsets are not.

Different variables being captured are assigned to buffer binding indices. Offset assignments are separate for the separate buffers. It is a linker error for two variables captured by the same buffer to have overlapping byte offsets, whether automatically computed or explicitly assigned.
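The overlap rule can be illustrated with a small check over byte ranges. This is a sketch, not what any GLSL linker actually runs: XfbRange, rangesOverlap and hasOverlap are hypothetical names, and each range is assumed to already hold the buffer index, byte offset and byte size of one captured variable.

```cpp
#include <cstddef>
#include <vector>

// Byte range occupied by one captured variable within a vertex's data.
struct XfbRange {
    unsigned buffer;     // buffer binding index
    std::size_t offset;  // byte offset, computed or explicitly assigned
    std::size_t size;    // size of the variable in bytes
};

// Two ranges conflict only if they are in the same buffer and their
// half-open byte intervals [offset, offset + size) intersect.
bool rangesOverlap(const XfbRange& a, const XfbRange& b) {
    return a.buffer == b.buffer &&
           a.offset < b.offset + b.size &&
           b.offset < a.offset + a.size;
}

// True if any pair of captured variables overlaps, which is the
// situation described above as a linker error.
bool hasOverlap(const std::vector<XfbRange>& ranges) {
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        for (std::size_t j = i + 1; j < ranges.size(); ++j) {
            if (rangesOverlap(ranges[i], ranges[j])) return true;
        }
    }
    return false;
}
```

Because offsets are tracked per buffer, a 16-byte variable at offset 0 in buffer 0 and an 8-byte variable at offset 8 in buffer 1 do not conflict, while the same two offsets in a single buffer do.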



Note: When using ARB_enhanced_layouts, if ARB_transform_feedback3 is not also available, you may only output to a single buffer. You can still use offsets to put space between vertex attribute data, but you cannot set xfb_buffer to any value other than 0.

Buffer binding

Note: If the captured values for a buffer include double-precision values, you must make sure that the offset provided to glBindBufferRange is aligned to 8 bytes.
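One simple way to satisfy this requirement is to round the intended byte offset up to the next multiple of 8 before binding. The helper below is a minimal sketch; alignUp is a hypothetical name, and the result is what you would pass as the offset argument of glBindBufferRange.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `value` up to the next multiple of `alignment` (which must be non-zero).
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment != 0);
    return (value + alignment - 1) / alignment * alignment;
}
```

For instance, a region that would otherwise begin at byte 20 of the buffer object would be moved to byte 24 when it is going to receive double-precision data.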

Feedback process

Feedback objects

Transform Feedback Objects
Core in version 4.5
Core since version 4.0
Core ARB extension ARB_transform_feedback2

Feedback rendering

Feedback pausing and resuming

Limitations

When using separate capture, there is a limitation on the total number of variables that can be captured. This is GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS, which will be at least 4. Also, there is a limit to the number of components that any particular variable can contain. This is GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS, which will be at least 4. If these limits are exceeded, a program linking error will result.

When using interleaved capture, the limit is the total number of components that can be captured. This is GL_MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS, which must be at least 64.
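The separate and interleaved limits can be checked ahead of linking with something like the following sketch. FeedbackLimits, CaptureMode and withinLimits are hypothetical names; the default values are the minimums quoted above, and a real program would replace them with the values queried from the implementation.

```cpp
#include <cstddef>
#include <vector>

// Limits for transform feedback capture. The defaults are the minimum
// values quoted in the text, not values from any particular implementation.
struct FeedbackLimits {
    std::size_t separateAttribs = 4;         // GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS
    std::size_t separateComponents = 4;      // GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS
    std::size_t interleavedComponents = 64;  // GL_MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS
};

enum class CaptureMode { Interleaved, Separate };

// `componentsPerVariable` holds the component count of each captured
// variable, e.g. 3 for a vec3 or 16 for a mat4.
bool withinLimits(CaptureMode mode,
                  const std::vector<std::size_t>& componentsPerVariable,
                  FeedbackLimits limits = FeedbackLimits{}) {
    if (mode == CaptureMode::Separate) {
        // Limit on the number of variables, plus a per-variable component limit.
        if (componentsPerVariable.size() > limits.separateAttribs) return false;
        for (std::size_t components : componentsPerVariable) {
            if (components > limits.separateComponents) return false;
        }
        return true;
    }
    // Interleaved: only the total number of components matters.
    std::size_t total = 0;
    for (std::size_t components : componentsPerVariable) total += components;
    return total <= limits.interleavedComponents;
}
```

With the minimum limits, four vec4 outputs fit in separate mode but a single mat4 does not, whereas four mat4 outputs (64 components) still fit in interleaved mode.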

When using advanced interleaving to route different variables to different buffers, the limit on the number of available buffers is GL_MAX_TRANSFORM_FEEDBACK_BUFFERS.

Before OpenGL 4.0 or ARB_transform_feedback3, the limit on the binding index to glBindBufferRange for GL_TRANSFORM_FEEDBACK_BUFFER was GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS, since more than one buffer could only be used with separate capture. With OpenGL 4.0 or ARB_transform_feedback3, it is GL_MAX_TRANSFORM_FEEDBACK_BUFFERS.

Reference