Transform Feedback

From OpenGL Wiki
Transform Feedback
Core in version 4.6
Core since version 3.0
Core ARB extension ARB_transform_feedback2, ARB_transform_feedback3
EXT extension EXT_transform_feedback
Vendor extension NV_transform_feedback

Transform Feedback is the process of capturing Primitives generated by the Vertex Processing step(s), recording data from those primitives into Buffer Objects. This allows one to preserve the post-transform rendering state of an object and resubmit this data multiple times.

Note: Mention will be made of various functions that deal with multiple stream output. Feedback into multiple streams requires access to OpenGL 4.0 or ARB_transform_feedback3 and ARB_gpu_shader5. So if that is not available to you, ignore any such discussion.

Shader setup

In order to capture primitives, the program containing the final Vertex Processing shader stage must be linked with a number of parameters. These parameters must be set before linking the program, not after. So if you want to use glCreateShaderProgramv, which compiles and links in a single call, you will have to use the in-shader specification API, which is only available with OpenGL 4.4 or ARB_enhanced_layouts.

Platform Issue (NVIDIA): NV_transform_feedback allows these parameters to be set dynamically after linking.

The only program object that matters for transform feedback is the one that provides the last Vertex Processing shader stage. When using separate programs in a program pipeline, these settings can be set on any of the programs, but only the settings on the program providing the last active Vertex Processing shader stage will be used.

Transform feedback can operate in one of two capturing modes. In interleaved mode, all captured outputs go into the same buffer, interleaved with one another. In separate mode, each captured output goes into a separate buffer. This must be selected on a program object at link time.

Note: Separate capturing mode does not necessarily mean that outputs must be captured into different Buffer Objects. It simply means that they are captured into separate buffer binding points. Different regions of the same buffer object can be bound to the different binding points, thus allowing you to use separate capture into the same buffer object. This will work as long as the bound regions don't overlap.
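
As a rough sketch of carving one buffer object into non-overlapping regions for separate capture, the helper below computes an offset and size for each captured output. The names tf_region and plan_separate_regions are hypothetical, and the per-vertex byte size of each output is assumed to be known to the caller. The resulting offset/size pairs are what one would pass to glBindBufferRange for binding indices 0, 1, 2, and so on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: one bound range per captured output. */
typedef struct {
    size_t offset; /* byte offset into the shared buffer object */
    size_t size;   /* byte size of the bound range */
} tf_region;

/* Rounds value up to the next multiple of alignment. */
static size_t align_up(size_t value, size_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

/*
 * Lays out one region per output, back to back, so that no two regions
 * overlap. bytes_per_vertex[i] is the captured size of output i, and
 * alignment is 4, or 8 if any output holds double-precision data.
 * Returns the total amount of buffer storage needed.
 */
static size_t plan_separate_regions(const size_t *bytes_per_vertex,
                                    size_t output_count,
                                    size_t max_vertices,
                                    size_t alignment,
                                    tf_region *regions)
{
    size_t cursor = 0;
    assert(alignment == 4 || alignment == 8);
    for (size_t i = 0; i < output_count; ++i) {
        regions[i].offset = align_up(cursor, alignment);
        regions[i].size = bytes_per_vertex[i] * max_vertices;
        cursor = regions[i].offset + regions[i].size;
    }
    return cursor;
}
```

Region i would then be bound to binding index i, matching the order of the outputs in the capture list.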

To define the capture settings for a program, as well as which output variables are captured, use the following function:

void glTransformFeedbackVaryings(GLuint program, GLsizei count, const char **varyings, GLenum bufferMode);

The bufferMode is the capturing mode. It must be GL_INTERLEAVED_ATTRIBS or GL_SEPARATE_ATTRIBS.

The count is the number of strings in the varyings array. This array specifies the names of the output variables of the appropriate shader stage to capture. The order in which the variables are provided defines the order in which they are captured. The names of these variables conform to the standard rules for naming GLSL variables.

There are limitations on the number of outputs that can be captured, as well as how many total components can be captured. Within these limits, any output variables can be captured, including outputs of struct types, arrays, and members of interface blocks.

Note: This function sets all of the feedback outputs for the program. So if you call it again (before linking), prior settings are forgotten.

Captured data format

Transform feedback explicitly captures Primitives. While it does capture data for each vertex, it does so after splitting each primitive up into separate primitives. So if you're rendering using GL_TRIANGLE_STRIP, and you render with 6 vertices, that yields 4 triangles. Transform feedback will capture 4 triangles worth of data. Since each triangle has 3 vertices, TF will capture 12 vertices, not the 6 you might expect from the drawing command.
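
The arithmetic above can be sketched as a small helper. This is only an illustration: the enum is a stand-in for the GL draw modes so that no GL headers are needed, the function name is hypothetical, and it assumes no Geometry Shader, no tessellation, and no primitive restart.

```c
#include <assert.h>

/* Stand-ins for the GL draw modes, so the sketch needs no GL headers. */
typedef enum {
    MODE_POINTS,
    MODE_LINES,
    MODE_LINE_STRIP,
    MODE_TRIANGLES,
    MODE_TRIANGLE_STRIP,
    MODE_TRIANGLE_FAN
} draw_mode;

/*
 * Number of vertices transform feedback writes for a draw of n vertices,
 * after strips and fans are split into independent primitives.
 */
static unsigned captured_vertex_count(draw_mode mode, unsigned n)
{
    switch (mode) {
    case MODE_POINTS:         return n;
    case MODE_LINES:          return (n / 2) * 2;
    case MODE_LINE_STRIP:     return n < 2 ? 0 : (n - 1) * 2;
    case MODE_TRIANGLES:      return (n / 3) * 3;
    case MODE_TRIANGLE_STRIP: /* same count as a fan */
    case MODE_TRIANGLE_FAN:   return n < 3 ? 0 : (n - 2) * 3;
    }
    return 0;
}
```

Multiplying the result by the size of one captured vertex gives a lower bound on the buffer range needed for that draw.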

Each primitive is written in the order it is processed. Within a primitive, the order of the vertex data written is the vertex order after primitive assembly. This means that, when drawing with triangle strips for example, the captured primitive will switch the order of two vertices every other triangle. This ensures that Face Culling will still respect the ordering provided in the initial array of vertex data.

Within each vertex, the data is written in the order specified by the varyings array (when doing interleaved capture). If an output variable is an aggregate (struct/array), then each member of the aggregate is written in order. Each component of the basic type of the output is written in order. Ultimately, all of the data is written tightly packed together (though capturing double-precision floats may cause padding).

A component will always be a float/double, signed integer, or unsigned integer, using the sizes for GLfloat/GLdouble, GLint, and GLuint. No packing or normalization is performed. The transform feedback system does not have an automated analog to the much more flexible vertex format specification system.
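
To make the tight packing concrete, here is a sketch of a CPU-side struct that mirrors one captured vertex. It assumes a hypothetical shader declaring vec3 position, vec2 uv and uint id, captured in interleaved mode in that order; none of these names come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * CPU-side mirror of one captured vertex, assuming the varyings were
 * listed as { "position", "uv", "id" } in interleaved mode (hypothetical).
 */
typedef struct {
    float    position[3]; /* vec3: three float components */
    float    uv[2];       /* vec2: two float components */
    uint32_t id;          /* uint: one unsigned integer component */
} captured_vertex;

/* Six 4-byte components, written back to back with no padding. */
_Static_assert(sizeof(captured_vertex) == 24, "unexpected padding");
_Static_assert(offsetof(captured_vertex, uv) == 12, "uv follows position");
_Static_assert(offsetof(captured_vertex, id) == 20, "id follows uv");
```

A buffer filled by this capture could then be read back as an array of captured_vertex.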

This of course does not prevent you from manually packing bits into unsigned integers in your shader.

Advanced interleaving

Advanced Interleaving
Core in version 4.6
Core since version 4.0
Core ARB extension ARB_transform_feedback3

The above settings only provide two modes: either all captured outputs go into separate buffers or they all go into the same buffer. In many cases, it is useful to be able to write several components to one buffer, while writing other components to others.

Also, captured interleaved data is tightly packed, with each variable's components coming immediately after the previous components. It is often useful to be able to skip writing over certain data, if some data changes and other data does not.

These can be achieved by the use of special "varying" names in the varyings array. These special names do not name actual output variables; they only cause some particular effect on subsequent writes.

These names and their effects are:

gl_NextBuffer
  This causes all subsequent outputs to be routed to the next buffer index. The buffers start at 0 and increment by one each time this is encountered in the varyings list.
  There must not be more of these than the number of buffers that can be bound for use in transform feedback. Therefore, the number of these must be strictly less than GL_MAX_TRANSFORM_FEEDBACK_BUFFERS.
gl_SkipComponents#
  This causes the system to skip writing # number of components, where # may be from 1 to 4. The memory covered by the skipped components will not be modified. Each component in this case is the size of a float.
  Note that components skipped in this way still count against the limitation on the number of components being output.
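
The following sketch walks a varyings array and works out which buffer index and byte offset each real output lands at, treating gl_NextBuffer and gl_SkipComponents# as described. The tf_slot type, resolve_varyings, and the example lookup function are all hypothetical, and the sketch assumes only 4-byte components (no doubles).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of where one captured output lands. */
typedef struct {
    const char *name;
    int buffer; /* buffer binding index */
    int offset; /* byte offset within one captured vertex */
} tf_slot;

/* Hypothetical lookup: number of 4-byte components in a named output. */
typedef int (*component_count_fn)(const char *name);

/* Example lookup for an assumed shader with vec3 position and vec2 uv. */
static int example_components(const char *name)
{
    if (strcmp(name, "position") == 0) return 3;
    if (strcmp(name, "uv") == 0) return 2;
    return 4; /* treat anything else as a vec4 */
}

/*
 * Records the buffer index and byte offset of each real output in an
 * interleaved-mode varyings array. Returns the number of slots written.
 */
static int resolve_varyings(const char *const *varyings, int count,
                            component_count_fn components, tf_slot *slots)
{
    int buffer = 0, offset = 0, written = 0;
    for (int i = 0; i < count; ++i) {
        int skip = 0;
        if (strcmp(varyings[i], "gl_NextBuffer") == 0) {
            ++buffer;   /* later outputs go to the next binding index */
            offset = 0; /* each buffer has its own vertex layout */
        } else if (sscanf(varyings[i], "gl_SkipComponents%d", &skip) == 1) {
            assert(skip >= 1 && skip <= 4);
            offset += 4 * skip; /* skipped memory is left untouched */
        } else {
            slots[written].name = varyings[i];
            slots[written].buffer = buffer;
            slots[written].offset = offset;
            ++written;
            offset += 4 * components(varyings[i]);
        }
    }
    return written;
}
```

For the list { "position", "gl_SkipComponents1", "uv", "gl_NextBuffer", "color" }, uv lands at byte 16 of buffer 0 and color at byte 0 of buffer 1.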

Output variables in the Geometry Shader can be declared to go to a particular stream. This is controlled via an in-shader specification, but there are certain limitations that affect advanced component interleaving.

No two outputs that go to different streams can be captured by the same buffer. Attempting to do so will result in a linker error. So using multiple streams with interleaved writing requires using advanced interleaving to route attributes to different buffers.

Note that this ability effectively makes separate capture mode superfluous. Interleaving with these facilities is a functional superset of what separate mode can do, since it can capture one output to each buffer individually.

Doubles and alignment

Double-precision Alignment
Core in version 4.6
Core since version 4.0
Core ARB extension ARB_gpu_shader_fp64

The alignment of single-precision floats and integers is 4 bytes. However, the alignment of double-precision values is 8 bytes. This causes a problem when it comes to capturing transform feedback data.

The alignment of components must be ensured. This is trivially ensured with floats and integers, but doubles require special care. It is up to the user to ensure 8-byte alignment of all double precision data. Specifically, you must ensure two things:

  • Every double-precision component begins on an 8-byte boundary. You may need to insert padding to achieve this, using the skipping functionality above.
  • All of the vertex data going to a particular buffer that includes a double-precision component must have a total vertex data size aligned to 8 bytes. This ensures that the second vertex will start on an 8 byte boundary. You may therefore need to add padding to the end of vertex data.

For example, if you want to capture the following, in the order defined here:

out DataBlock
{
  float var1;
  dvec2 someDoubles;
  float var3;
};

This is the sequence of strings that you will need in your varyings data if you want to capture it in the order of definition:

const char *varyings[] = {
  "DataBlock.var1",
  "gl_SkipComponents1",     //Padding the next component to 8-byte alignment.
  "DataBlock.someDoubles", "DataBlock.var3",
  "gl_SkipComponents1",     //Padding out the entire vertex structure to 8-byte alignment.
};

If you do not do this, you get undefined behavior. You could avoid the padding just by changing the order you capture them. You don't have to change the order you define them in the shader.
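
A CPU-side view of the two layouts may help. The sketch below mirrors the padded capture described above, and then a reordered capture (doubles first) that needs no padding. The struct names and pad members are illustrative only; the pad members stand in for the skipped components.

```c
#include <assert.h>
#include <stddef.h>

/* Capture in definition order: two skipped components are required. */
typedef struct {
    float  var1;
    float  pad0;           /* skipped: aligns someDoubles to 8 bytes */
    double someDoubles[2]; /* dvec2 */
    float  var3;
    float  pad1;           /* skipped: rounds the vertex size up to 32 */
} padded_vertex;

/* Capture reordered as someDoubles, var1, var3: no padding needed. */
typedef struct {
    double someDoubles[2]; /* dvec2 starts at offset 0 */
    float  var1;
    float  var3;
} reordered_vertex;

_Static_assert(offsetof(padded_vertex, someDoubles) % 8 == 0, "misaligned");
_Static_assert(sizeof(padded_vertex) == 32, "expected 32 bytes");
_Static_assert(sizeof(reordered_vertex) == 24, "expected 24 bytes");
_Static_assert(sizeof(reordered_vertex) % 8 == 0, "vertex size misaligned");
```

The reordered layout saves 8 bytes per vertex while still keeping every double on an 8-byte boundary.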

In-shader specification

In-shader Specification
Core in version 4.6
Core since version 4.4
Core ARB extension ARB_enhanced_layouts

Shaders can define which outputs are captured by transform feedback and exactly how they are captured. When a shader defines them, querying the program for the mode of transform feedback will return interleaved mode (since the advanced interleaving makes separate mode a complete subset of interleaved mode).

Layout qualifiers can be used to define which output variables are captured in Transform Feedback operations. When these qualifiers are set in a shader, they completely override any attempt to set the transform feedback outputs from OpenGL via glTransformFeedbackVaryings.

Any output variable or output interface block declared with the xfb_offset layout qualifier will be part of the transform feedback output. This qualifier must be specified with an integer byte offset. The offset is the number of bytes from the beginning of a captured vertex, as written to the current buffer, to this particular output variable.

The offsets of contained values (whether in arrays, structs, or members of an interface block if the whole block has an offset) are computed, based on the sizes of prior components to pack them in the order specified. Any explicitly provided offsets are not allowed to violate alignment restrictions. So if a definition contains a double (either directly or indirectly), the offset must be 8-byte aligned.

Members of interface blocks can have their offsets specified directly on them, which overrides any computed offsets. Also, not all members of an interface block are required to be captured (though that will happen if you set the xfb_offset on the block itself). Stream assignments for a geometry shader are required to be the same for all members of a block, but offsets are not.

Different variables being captured are assigned to buffer binding indices. Offset assignments are separate for the separate buffers. It is a linker error for two variables captured by the same buffer to have overlapping byte offsets, whether automatically computed or explicitly assigned.

An explicit buffer assignment is made by using the xfb_buffer qualifier on the same declaration as the offset qualifier. This takes an integer which defines the buffer binding index that the captured output(s) is/are associated with. The integer must be less than GL_MAX_TRANSFORM_FEEDBACK_BUFFERS.

Any offsets for global variables or interface blocks that do not specify a buffer explicitly will use the current buffer. The current buffer is set as follows:

layout(xfb_buffer = 1) out;

All following offsets for globals that do not explicitly specify a buffer will use 1 as their buffer. The initial current buffer for a shader is 0.

Variables can have xfb_buffer assigned to them without xfb_offset. This does nothing and will be ignored.

Interface blocks have a special association with buffers. Each interface block is associated with a buffer, regardless of whether any of its members are captured. The buffer is either the current buffer as defined above or a buffer explicitly specified by xfb_buffer.

As previously stated, not all members of a block have to be captured. However, if any members of a block are captured, they must all be captured to the same buffer: the buffer associated with that block. It is an error to use xfb_buffer on a member if the buffer index you provide is different from the index used by the block.

As an example:

layout(xfb_buffer = 2) out; // Default buffer of 2.

out OutputBlock1            // Block buffer index is implicitly 2.
{
  float val1;
  layout(xfb_buffer = 2, xfb_offset = 0) float first;  // The provided index is the same as the block's index.
  layout(xfb_buffer = 1, xfb_offset = 0) float other;  // Compile error, due to changing the buffer index for a block member.
};

Each buffer has the concept of a stride. This represents the byte count from the beginning of one captured vertex to the beginning of the next. It is computed by taking the output with the highest xfb_offset value, adding its size to that offset, and then aligning the computed value to the base alignment of the buffer. The buffer's alignment is 4, unless it captures any double-precision values, in which case it is 8. This means you do not need to manually pad structures for alignment, as you must when setting up capture from outside the shader with glTransformFeedbackVaryings.
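
The stride rule can be sketched as follows. The xfb_output type and implicit_stride function are hypothetical names, and the caller is assumed to supply each output's byte offset, byte size, and whether it contains double-precision data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one captured output in a buffer. */
typedef struct {
    size_t offset;     /* xfb_offset, in bytes */
    size_t size;       /* size of the output, in bytes */
    int    has_double; /* nonzero if it holds double-precision data */
} xfb_output;

/*
 * Implicit stride of a buffer: the end of the furthest output, rounded
 * up to 4 bytes, or to 8 bytes if any captured output holds doubles.
 */
static size_t implicit_stride(const xfb_output *outputs, size_t count)
{
    size_t end = 0, alignment = 4;
    for (size_t i = 0; i < count; ++i) {
        size_t output_end = outputs[i].offset + outputs[i].size;
        if (output_end > end)
            end = output_end;
        if (outputs[i].has_double)
            alignment = 8;
    }
    return (end + alignment - 1) / alignment * alignment;
}
```

For a float at offset 0, a dvec2 at offset 8, and a float at offset 24, the furthest output ends at byte 28, so the stride rounds up to 32.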

The stride for a buffer can also be explicitly set using the xfb_stride layout qualifier. This allows you to add extra space at the end, perhaps to skip data that will not change. A compilation error will result if the stride you specify is:

  • Too small, given the offsets and computed sizes of the captured data for that buffer.
  • Not properly aligned. It must be at least 4 byte aligned, and it must be 8 byte aligned if the buffer captures any double-precision values.

The stride for a buffer is set as follows:

layout(xfb_buffer = 1, xfb_stride = 32) out; // Sets stride of buffer 1 to 32. Also, sets buffer 1 to be current.

Linking errors will result if any captured outputs within a buffer overlap in space or violate padding. For example:

layout(xfb_buffer = 0) out Data
{
  layout(xfb_offset = 0) float val1;
  layout(xfb_offset = 4) vec4 val2;
  layout(xfb_offset = 16) float val3;  // Compiler error. val2 covers bytes on the range [4, 20).
};

Compiler/linker errors will result if you are using Geometry Shader output streams and two outputs from different streams are routed to the same buffer.
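
The overlap rule from the Data example can be checked with a simple interval test. This is a sketch of the kind of check a tool might run before handing a shader to the driver; byte_range and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open byte range [offset, offset + size) of one captured output. */
typedef struct {
    size_t offset;
    size_t size;
} byte_range;

/* True if the two half-open ranges share at least one byte. */
static bool ranges_overlap(byte_range a, byte_range b)
{
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* True if any pair of outputs captured to the same buffer overlaps. */
static bool any_overlap(const byte_range *ranges, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        for (size_t j = i + 1; j < count; ++j)
            if (ranges_overlap(ranges[i], ranges[j]))
                return true;
    return false;
}
```

With val1 at [0, 4), val2 at [4, 20) and val3 at [16, 20), the check reports an overlap; moving val3 to offset 20 clears it.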

Note: When using ARB_enhanced_layouts as an extension (on older hardware), if ARB_transform_feedback3 is not also available, you may only output to a single buffer. You can still use offsets to put space between vertex attribute data, but you cannot set xfb_buffer to any value other than 0.

Buffer binding

Once you have a program with the proper settings to record outputs, you must now set up Buffer Objects to capture these values. Buffer objects and their storage are created in the usual way. What changes is where you use them.

When you wish to begin a transform feedback operation, you must bind one or more buffers to the indexed GL_TRANSFORM_FEEDBACK_BUFFER binding point. This is done through the use of the glBindBufferRange function (or equivalent functions). The offset you provide must be 4-byte aligned, unless the data being captured into a buffer includes a double-precision value. In that case, it must be 8-byte aligned.

The indices you bind the buffer ranges to are the same buffer binding indices used when setting up where different outputs go. In separate output mode, each listed output goes to a different buffer index, assigned sequentially in the order provided by varyings, starting from 0. In interleaved mode, the outputs are all either recorded to one buffer, or to the buffers specified in the more advanced layout specification mechanisms.

Feedback process

Once buffers have been bound, a feedback operation can begin. You do this by calling this function:

void glBeginTransformFeedback(GLenum primitiveMode);

This activates transform feedback mode. While transform feedback mode is active (and not paused), if you execute a drawing command, all outputs that are set to be captured by the final vertex processing stage will be recorded to the bound buffers. These will keep track of the last recorded position, so that multiple drawing commands will add to the recorded data.

Undefined behavior results if your bound buffer ranges are not large enough to hold the recorded data.

All feedback buffer binding indices that have outputs assigned to them by the current program must have valid bindings. If they do not, this function will fail with an OpenGL Error.

The primitiveMode must be one of GL_POINTS, GL_LINES, or GL_TRIANGLES. These modes define what kinds of primitives are captured by the system. They also put restrictions on the primitive type that reaches the final primitive assembly stage: that primitive's basic type must match primitiveMode. The primitive type for a rendering process is defined as follows, in the given order:

  1. If a Geometry Shader is active, then it is the primitive type output by the GS.
  2. If a Tessellation Evaluation Shader is active, then it is the primitive type generated by the tessellation process.
  3. If neither of those is active, then it is the mode parameter of the command used to render.
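
The three rules above can be sketched as a small decision function. The enums are stand-ins for the GL constants and the pipeline_info struct is a hypothetical summary of the current pipeline, so this illustrates the logic rather than any real query API.

```c
#include <assert.h>
#include <stdbool.h>

/* Basic primitive types accepted by glBeginTransformFeedback. */
typedef enum { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES } prim_base;

/* Stand-ins for the mode parameter of a drawing command. */
typedef enum {
    CMD_POINTS, CMD_LINES, CMD_LINE_STRIP, CMD_LINE_LOOP,
    CMD_TRIANGLES, CMD_TRIANGLE_STRIP, CMD_TRIANGLE_FAN
} cmd_mode;

/* Hypothetical summary of the pipeline in use for a draw. */
typedef struct {
    bool      has_geometry_shader;
    prim_base gs_output;  /* only meaningful with a GS */
    bool      has_tess_eval_shader;
    prim_base tes_output; /* only meaningful with a TES */
    cmd_mode  mode;       /* mode of the drawing command */
} pipeline_info;

/* Basic type of a drawing command's mode. */
static prim_base base_of(cmd_mode mode)
{
    switch (mode) {
    case CMD_POINTS:
        return PRIM_POINTS;
    case CMD_LINES:
    case CMD_LINE_STRIP:
    case CMD_LINE_LOOP:
        return PRIM_LINES;
    default:
        return PRIM_TRIANGLES;
    }
}

/* Applies the rules in order: GS first, then TES, then the draw mode. */
static prim_base final_primitive(const pipeline_info *p)
{
    if (p->has_geometry_shader)
        return p->gs_output;
    if (p->has_tess_eval_shader)
        return p->tes_output;
    return base_of(p->mode);
}

/* True if a draw with this pipeline is compatible with primitive_mode. */
static bool feedback_mode_matches(const pipeline_info *p,
                                  prim_base primitive_mode)
{
    return final_primitive(p) == primitive_mode;
}
```

Note that a Geometry Shader emitting points makes GL_POINTS the required primitiveMode even when the draw call itself submits triangles.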

While transform feedback is active (and not paused), there are certain things that you cannot do:

  • Change GL_TRANSFORM_FEEDBACK_BUFFER buffer bindings.
  • Do anything which reads from or writes to any part of these buffers (outside of feedback writes, of course).
  • Reallocate storage for any of these buffers. This includes invalidation.
  • Change the current program. So glUseProgram or glBindProgramPipeline cannot be called. This also includes re-linking the appropriate program, as well as glUseProgramStages if the target pipeline is the bound pipeline.

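To end feedback mode, call this:

void glEndTransformFeedback();

For example, capturing a single drawing command into a bound buffer range looks like this:

glBindBufferRange(GL_TRANSFORM_FEEDBACK_BUFFER, 0, feedback_buffer, buffer_offset, number_of_bytes);
glBeginTransformFeedback(GL_POINTS);
glDrawElements(GL_POINTS, sizeof(pindices)/sizeof(GLushort), GL_UNSIGNED_SHORT, BUFFER_OFFSET(0));
glEndTransformFeedback();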

Feedback objects

Transform Feedback Objects
Core in version 4.6
Core since version 4.0
Core ARB extension ARB_transform_feedback2

The set of state needed to perform transform feedback operations can be encapsulated into an OpenGL Object. Transform feedback objects are container objects. These are created and managed in the usual way, with glGenTransformFeedbacks, glDeleteTransformFeedbacks, and so forth.

To bind a transform feedback object, use this:

void glBindTransformFeedback(GLenum target, GLuint id);

target must always be GL_TRANSFORM_FEEDBACK. You cannot bind a transform feedback object if the current transform feedback object is active and not paused.

The state encapsulated by these objects includes:

  • The generic buffer binding target GL_TRANSFORM_FEEDBACK_BUFFER. So all calls to glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, ...) will attach the buffer to the currently bound feedback object.
  • All of the indexed GL_TRANSFORM_FEEDBACK_BUFFER bindings. So all calls to glBindBufferRange(GL_TRANSFORM_FEEDBACK_BUFFER, ...) (or any equivalent functions) will attach the given region of the buffer to the currently bound feedback object.
  • Whether the transform feedback is active and/or paused.
  • The current count of primitives and so forth recorded in the current feedback operation, if it is active.

So feedback objects store the buffer bindings that are being recorded to. This makes it easy to switch between different sets of feedback buffer bindings, rather than having to bind them all each time through.

The feedback object 0 is a default transform feedback object. You can use it just like any other transform feedback object, except that you can't delete it.

Note: The above restrictions on what you can do with a buffer bound for transform feedback only apply to buffers attached to the feedback object that is itself currently bound. So if you unbind the current transform feedback object, you can change them again.

Feedback pausing and resuming

Transform feedback objects are good for more than just easily swapping different sets of buffers. You can halt a transform feedback operation temporarily, do some rendering that does not get captured, and then resume feedback operations with it. The feedback objects will properly keep track of where vertex data is to be recorded to.

To temporarily pause feedback operations, call this function:

void glPauseTransformFeedback();

At this point, the feedback object can be unbound by binding a different one. The current program can be changed and so forth. Feedback operations can be paused indefinitely, and it is legal to read from buffers that are in a paused feedback operation (though you need to unbind the feedback object first).

To resume a paused feedback operation, bind the paused feedback object again and call this function:

void glResumeTransformFeedback();

Note that the primitive mode will still be the same after resuming.

If you call glEndTransformFeedback on a paused feedback object, it will correctly end the feedback operation for that object. You cannot use glResumeTransformFeedback on a feedback object that has been ended.

Feedback rendering

To render the data captured by a transform feedback operation, you could use a query object to get the number of primitives captured, multiply that by the number of vertices per primitive, and feed that vertex count into a Vertex Rendering function. However, this process requires that the CPU read from the query object. This can provoke Synchronization issues. While Sync Objects could work around some of these, there are better ways.

Feedback objects record the number of vertices that they captured in a feedback operation. Note that a feedback operation is only complete when glEndTransformFeedback is called, so you must first call that before trying to use this data.

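Once the feedback operation is complete, the feedback object has a vertex count. This can be used for rendering purposes by calling one of these functions with the name of the feedback object:

void glDrawTransformFeedback(GLenum mode, GLuint id);
void glDrawTransformFeedbackInstanced(GLenum mode, GLuint id, GLsizei instancecount);
void glDrawTransformFeedbackStream(GLenum mode, GLuint id, GLuint stream);
void glDrawTransformFeedbackStreamInstanced(GLenum mode, GLuint id, GLuint stream, GLsizei instancecount);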

These functions work like glDrawArrays (or glDrawArraysInstanced), except that the vertex count comes from the feedback operation. Note that these functions do not perform any of the Vertex Specification setup work. They don't automatically take the feedback buffers and bind them for use as vertex data; you must do that yourself. Also note that on the first transform feedback pass, a regular (non-feedback) glDraw* function must be used to write the vertex data to the transform feedback buffer, since the transform feedback object does not yet have a vertex count. Once this is done, the glDrawTransformFeedback* functions can be used both during transform feedback and when rendering to the screen.


Limitations

When using separate capture, there is a limitation on the total number of variables that can be captured. This is GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS, which will be at least 4. Also, there is a limit to the number of components that any particular variable can contain. This is GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS, which will be at least 4. If these limits are exceeded, a program linking error will result.

When using interleaved capture, the limit is the total number of components that can be captured. This is GL_MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS, which must be at least 64. Double-precision components count as 2. The number of "components" can be computed for in-shader specification by adding all of the buffers' strides and dividing by 4. That must not exceed this queried value.
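
The component accounting can be sketched with two small helpers, one for a capture list built from the API and one for in-shader strides. Both function names are hypothetical; comparing the result against the queried GL_MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS value is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Components consumed by an interleaved capture: single-precision and
 * integer components count once, doubles count twice, and skipped
 * components still count.
 */
static unsigned component_count(unsigned singles, unsigned doubles,
                                unsigned skipped)
{
    return singles + 2u * doubles + skipped;
}

/* Components implied by in-shader strides: total bytes divided by 4. */
static unsigned components_from_strides(const unsigned *strides, size_t count)
{
    unsigned total_bytes = 0;
    for (size_t i = 0; i < count; ++i) {
        assert(strides[i] % 4 == 0); /* strides are at least 4-byte aligned */
        total_bytes += strides[i];
    }
    return total_bytes / 4;
}
```

For the earlier DataBlock capture (two floats, one dvec2, two skipped components), both approaches give 8 components, matching its 32-byte vertex.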

When using advanced interleaving to route different variables to different buffers, the limit on the number of available buffers is GL_MAX_TRANSFORM_FEEDBACK_BUFFERS.

Before OpenGL 4.0 or ARB_transform_feedback3, the limit on the binding index to glBindBufferRange for GL_TRANSFORM_FEEDBACK_BUFFER was GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS, since more than one buffer can only be used with separate outputs. With OpenGL 4.0 or ARB_transform_feedback3, it is GL_MAX_TRANSFORM_FEEDBACK_BUFFERS.