Tessellation


OpenGL Rendering Pipeline

Tessellation is the Vertex Processing stage in the OpenGL rendering pipeline where patches of vertex data are subdivided into smaller Primitives. This process is governed by two shader stages and a fixedfunction stage.
The tessellation process is divided into three stages which form an optional part of Vertex Processing in the rendering pipeline. Two of the stages are programmable; between them is a fixed function stage. They are described below, in the order they are processed.
Generally, the process of tessellation involves subdividing a patch of some type, then computing new vertex values (position, color, texture coordinates, etc) for each of the vertices generated by this process. Each stage of the tessellation pipeline performs part of this process.
The Tessellation Control Shader (TCS) determines how much tessellation to do (it can also adjust the actual patch data, as well as feed additional patch data to later stages). Therefore, the TCS is primarily responsible for ensuring continuity across patches. So if you have two adjacent patches that need to have different levels of tessellation, the TCS invocations for the different patches need to use their tessellation controls to ensure that the shared edge(s) between the patches use the same level of tessellation. Without this protection, gaps and breaks in what are supposed to be contiguous patches can occur.
The TCS is optional; default tessellation values can be used if no TCS is provided.
The tessellation primitive generator takes the input patch and subdivides it based on values computed by the TCS or provided as defaults.
The Tessellation Evaluation Shader (TES) takes the tessellated patch and computes the vertex values for each generated vertex.
Patches
Tessellation stages operate on patches, a primitive type denoted by the constant GL_PATCHES. A patch primitive is a generalpurpose primitive, where every n vertices is a new patch primitive. The number of vertices per patch can be defined on the applicationlevel using:
void glPatchParameteri(GLenum pname, GLint value);
with GL_PATCH_VERTICES as target and a value which has is on the halfopen range [1, GL_MAX_PATCH_VERTICES). The maximum number of patch vertices is implementationdependent, but will never be less than 32.
Patch primitives are always a sequence of individual patches; there is no such thing as a "patch strip" or "patch loop" or such. So for a given vertex stream, every group of value number of vertices will be a separate patch. If you need to do something like triangle strips, you should use Indexed Rendering to get similar behavior, though it will not reduce the number of vertices in the index list. Fortunately, the PostTransform Cache should deal with any performance impact having more indices.
Tessellation control shader
The first step of tessellation is the optional invocation of a tessellation control shader (TCS). The TCS has two jobs:
 Determine the amount of tessellation that a primitive should have.
 Perform any special transformations on the input patch data.
The TCS can change the size of a patch, adding more vertices perpatch or providing fewer. However, a TCS cannot discard a patch (directly; it can do so indirectly), nor can it write multiple patches. Therefore, for each patch provided by the application, one patch will be provided to the next tessellation stage.
The TCS is optional. If no TCS is active in the current program or program pipeline, then the patch data is passed directly from the Vertex Shader invocations to the tessellation primitive generation step. The amount of tessellation done in this case is taken from default values set into the context. These are defined by the following function:
void glPatchParameterfv(GLenum pname, const GLfloat *values);
When pname is GL_PATCH_DEFAULT_OUTER_LEVEL, values is a 4element array of floats defining the four outer tessellation levels. When pname is GL_PATCH_DEFAULT_INNER_LEVEL, values is a 2element array of floats defining the two inner tessellation levels.
These default values correspond to the TCS perpatch output variables gl_TessLevelOuter[4] and gl_TessLevelInner[2].
Tessellation primitive generation
Primitive generation is a fixedfunction stage responsible for creating a set of new primitives from the input patch. This stage is only executed if a tessellation evaluation shader (TES) is active in the current program or program pipeline. Primitive generation is affected by the following factors:
 The tessellation levels, provided either by the TCS or the default values, as stated above.
 The spacing of the tessellated vertices, as defined by the subsequent TES stage. It may be equal_spacing, fractional_even_spacing, or fractional_odd_spacing.
 The input primitive type defined by the subsequent TES which may be one of triangles, quads or isolines. The TES can also force the generation of the tessellation as a series of points rather than triangles or lines by providing the point_mode primitive.
 The primitive generation order defined by the subsequent TES, which may be cw or ccw. This, coupled with the position of the generated primitives, is used to determine the primitive's Winding Order.
Abstract patch
Notice that the primitive generation is not affected by the userdefined outputs of the TCS (or vertex shader if no TCS is active), the TCS's output patch size, or any perpatch TCS outputs besides the tessellation levels. The primitive generation part of the tessellation stage is completely blind to the actual vertex coordinates and other patch data.
The purpose of the primitive generation system is to determine how many vertices to generate, in which order to generate them, and what kind of primitives to build out of them. The actual pervertex data for these vertices, such as position, color, etc, is to be generated by the TES, based on information provided by the primitive generator.
Because of this dichotomy, the primitive generator operates on what could be considered an "abstract patch". It doesn't look at the patch output from the TCS; it thinks only in terms of tessellating an abstract quad, triangle, or "isoline" block.
Depending on the abstract patch type, the primitive generator evaluates a different number of tessellation levels and applies different tessellation algorithms. Each generated vertex has a normalized position (i.e. the coordinates are on the range [0, 1]) within the abstract patch. This position has two or three components, depending on the type of the patch. The coordinates are provided to the TES via the builtin in vec3 gl_TessCoord input.
Tessellation levels
The amount of tessellation that is done over the abstract patch type is defined by inner and outer tessellation levels. These, as previously stated, are provided either by the TCS or by context parameters specified via glPatchParameter. They are a 4vector of floats defining the "outer tessellation levels" and a 2vector of floats defining the "inner tessellation levels."
The specific interpretation depends on the abstract patch type being used, but the general idea is this. In most cases, each tessellation level defines how many segments an edge is tessellated into; so a tessellation level of 4 means that an edge will become 4 edges (2 vertices become 5). The "outer" tessellation levels define the tessellation for the outer edges of the primitive. This makes it possible for two or more patches to properly connect, while still having different tessellation levels within the patch. The inner tessellation levels are for the number of tessellations within the abstract patch.
Not all abstract patches use the same number of values in the outer/inner tessellation levels data. For example, triangles only uses one inner level and 3 outer levels. The rest are ignored.
The patch can be discarded if any outer tessellation level is 0 or less, but only for tessellation levels that the abstract patch actually uses. The patch can also be discarded if one of these values is a floatingpoint NaN. A patch that is discarded does not get tessellated, and no TES is invoked for it. It is simply swallowed by the system as though it never were.
This allows a TCS to effectively cull patches by passing 0 for a relevant outer tessellation level.
The tessellation levels specified in this way are not directly used. They go through a clamping process to generate the effective tessellation levels that are used to tessellate the primitive. This process depends on the TES's spacing parameter.
In the below discussion, max is the maximum allowed tessellation level, as defined by the GL_MAX_TESS_GEN_LEVEL. It must be at least 64, so you have some room to play with.
The spacing affects the effective tessellation level as follows:
 equal_spacing
 Each tessellation level is individually clamped to the closed range [1, max]. Then it is rounded up to the nearest integer to give the effective tessellation level.
 fractional_even_spacing
 Each tessellation level is individually clamped to the closed range [2, max]. Then it is rounded up to the nearest even integer to give the effective tessellation level.
 fractional_odd_spacing
 Each tessellation level is individually clamped to the closed range [1, max  1]. Then it is rounded up to the nearest odd integer to give the effective tessellation level.
Edge tessellation spacing
At various points in the forthcoming discussion about tessellating the abstract patch, there will be statements that say to tessellate an edge of some primitive. This means to subdivide it into a series of segments according to a particular tessellation level. The total length of these segments is the length of the original segment. However, the length of individual segments relative to one another depends on the spacing parameter specified in the TES and the tessellation level.
If the spacing is set to equal_spacing, then all tessellated segments on an edge will have equal length. Since equal_spacing rounds tessellation levels to the next integer, this means that edges will "pop" as tessellation levels go from one integer to the next.
The other two spacing schemes are intended to provide smooth(er) behavior as the tessellation levels change. These are useful for cases where the amount of tessellation is based on the area being viewed from the camera.
We need to define two values first:
 n shall be the effective tessellation level computed above for the edge in question.
 f shall be the value computed before the rounding step above. Thus, it is a potentially fractional value.
If n is 1, then no tessellation occurs on this edge. If n is exactly 2, then there will be two segments of equal length.
Otherwise, the edge will be divided into two sets of segments. One set will have n  2 segments in it, all of which will have equal length. The other set will have 2 segments, who shall have lengths equal to each other, but not necessarily the segments in set one. The 2 segments will be shorter than the others, so let us call this group the "short segments".
The length of the short segments, relative to the others, is based on (n  f). When this value is exactly 0, then the short segments will be the same length as the others. As this value approaches 2, the short segments' lengths will approach 0.0.
Despite the diagram above, the specification does not require the short segments to be in any particular location relative to the others. The specification only provides the following requirements:
 That the short segments be placed symmetrically across the tessellated edge.
 That the placement of the short segments will be invariant with f, for *any* segment being tessellated.
Note that there is no guarantee that the placement of the short segments will not shift as f changes.
Primitive generation order
When rendering without tessellation, the order of primitives relative to one another is well defined. Rendering requires setting up a vertex stream. This defines an ordered sequence of vertices. The Primitive specified at rendering time determines exactly how the stream is broken down into base primitives. And thus, each primitive is ordered based on the vertices that generate it.
Tessellation makes things less well defined. While the order of vertices within a primitive is well defined (determined by the TES's input layout settings cw or ccw), the order of primitives generated relative to one another is not. Implementations define this ordering, so you cannot rely on any particular order.
Then again, most ways you could even tell rely on incoherent memory accesses...
Tessellating primitives
Triangles
The abstract patch for triangle tessellation is a triangle, naturally. Only the first three outer tessellation levels are used, and only the first inner tessellation level is used. The outer tessellation levels apply to the edges shown in the diagram. Exactly how the inner tessellation level works is less intuitive than it seems.
Each vertex generated and sent to the TES will be given Barycentric coordinates as the gl_TessCoord input. This coordinate defines where this vertex is located within the abstract triangle patch. The barycentric coordinates for the 3 vertices of the abstract triangle patch are shown in the diagram. All other vertices will be specified relative to these.
To perform linear interpolation between 3 values, at each of the three vertices, you would do this:
vec3 accum = vec3(0.0f)
accum += gl_TessCoord[0] * value[0]
accum += gl_TessCoord[1] * value[1]
accum += gl_TessCoord[2] * value[2]
Where value is an array of (in this case, vec3) values that correspond to the three vertices depicted in the diagram.
The algorithm for tessellation of triangles is based on tessellating the edges of the triangle, then building vertices and triangles from them. This makes the behavior of the inner tessellation level a bit unintuitive, since it does not directly correspond to the number of inner triangles.
Step one removes the most degenerate case. If all of the effective outer levels (as computed above) levels are exactly 1.0, and the effective inner level is also 1.0, then nothing is tessellated, and the TES will get 3 vertices and one triangle. And thus, none of the later steps take place.
Next, if the effective inner level is 1.0 (and therefore at least one of the outer levels is > 1.0, otherwise we'd have hit the above condition), then the effective inner level is recomputed as though the user had specified 1.0 + e for the inner level, where e is a number infinitely close to zero. So 1.0 + e is a value just slightly above 1.0.
As a practical matter, this means that if any outer edge tessellation happens, the effective inner tessellation will be no less than 2 or 3, depending on the spacing. This ensures that, if there is any tessellating taking place, then there will always be at least one vertex inside of the triangle.
The next step in tessellation is to subdivide the edges of the abstract triangle patch based on the effective inner level. Yes, the algorithm begins by ignoring the outer tessellation levels; it applies the inner tessellation to all three outer edges:
From here, we build a series of concentric "rings" of inner triangles. At each corner of the outer triangle, the two neighboring subdivided vertices are taken. A vertex is computed from these two by finding the intersection of two perpendicular lines from these vertices. Perpendicular lines from the remaining vertices to the inner triangle's edges define where each edge of the new triangle is subdivided.
This process is repeated, using the new ring triangle to generate the next inner ring, until one of two end conditions is met. If the triangle ring has no subdivided edges (only 3 vertices), then the process stops.
If the triangle ring has exactly 2 subdivided edges (only 6 total vertices), then the "ring" it generates is not a triangle at all. It is a single vertex:
By this algorithm, the number of inner triangle rings is half the effective inner tessellation level, rounded down. Note that if the inner level is even, the innermost ring will be a single vertex.
After producing these inner rings, the tessellation for the main outer triangle is discarded entirely. It is then retessellated in accord with the three effective outer tessellation levels.
With all the vertices generated, triangles must now be generated. The specification outlines a particular algorithm for doing so, but it leaves many of the details to the implementation. The specification guarantees:
 That the area of the triangle, in (u,v,w) space, will be completely covered by the generated triangulation.
 That no part of the area of the triangle will be covered more than once by the generated triangulation.
 That there will be edges between the neighboring vertices of each "ring" of triangles. That is, the dark lines in the above diagram are guaranteed by the specification to be there.
 If a vertex has corresponding vertices on the ring that immediately preceded it, then there will be an edge between that vertex and its corresponding ones (corners of ring triangles have 2 corresponding vertices). Similarly, if a vertex has a corresponding vertex on the ring within it, there will be an edge between them.
The rest of the edges are implementationdefined.
Quads
The abstract patch of a quad is a square, naturally. All 4 outer and 2 inner tessellation levels are used. The outer tessellation levels apply to the four edges shown in the diagram. The two inner tessellation levels apply to the edges depicted in the diagram, but they apply to the edges across from those as well. So inner level 0 applies to both the bottom and top edges. We will explain exactly how this works below.
Each vertex generated and sent to the TES will be provided a normalized 2D coordinate as the gl_TessCoord input, representing the location of that vertex within the abstract patch. The third component of gl_TessCoord will be 0.0.
Quad tessellation works rather like triangle tessellation, only with 4 sides. The tessellation is based on subdividing the edges and building vertices based on the subdivided edges.
Step one removes the most degenerate case. If all of the effective outer levels (as computed above) levels are exactly 1.0, and the effective inner levels are also 1.0, then nothing is tessellated. The TES will get 2 triangles and four vertices (though it may be invoked as much as 6 times). And thus, none of the later steps take place.
Next, if an effective inner level is 1.0 (and therefore at least one of the outer levels is > 1.0, otherwise we'd have hit the above condition), then that effective inner level is recomputed as though the user had specified 1.0 + e for the level, where e is a number infinitely close to zero. So 1.0 + e is a value just slightly above 1.0. This applies to both inner levels.
This ensures that, if there is any tessellating taking place along the outer edge, then there will always be at least one vertex inside of the quad.
The next step in tessellation is to subdivide the edges of the abstract quad patch based on the effective inner levels. Yes, the algorithm begins by ignoring the outer tessellation levels; it applies the inner tessellation to all four outer edges. Inner level 0 is applied to both horizontal edges, while level 1 is applied to the vertical edges.
From here, we build a series of concentric "rings" of inner quads. At each corner of the outer quad, the two neighboring subdivided vertices are taken. A vertex is computed from these two by finding the intersection of two perpendicular lines from these vertices. Perpendicular lines from the remaining vertices to the inner quad's edges define where each edge of the new quad is subdivided.
This process is repeated, using the new ring quad to generate the next inner ring, until one of two end conditions is met. If one or both of the inner rings has only one segment, then that ring is the final ring.
If one or both of of the inner rings has exactly two segments, then the final ring is degenerate. That is, instead of being a quad, it is either a line (using the vertices from the long side to subdivide it) or a point (if both inner levels are identical):
After producing these inner rings, the subdivision for the outer quad is discarded entirely. Each outer edge is then resubdivided in accord with the four effective outer tessellation levels.
With all the vertices generated, the area of the quad must now be broken down into triangles. The specification outlines a particular algorithm for doing so, but it leaves many of the details to the implementation. The specification guarantees:
 That the area of the quad, in (u,v) space, will be completely covered by the generated triangulation.
 That no part of the area of the quad will be covered more than once by the generated triangulation.
 That there will be edges between the neighboring vertices of each "ring" of quads. That is, the dark lines in the above diagram are guaranteed by the specification to be there.
 If a vertex has corresponding vertices on the ring that immediately preceded it, then there will be an edge between that vertex and its corresponding ones (corners of ring quads have 2 corresponding vertices). Similarly, if a vertex has a corresponding vertex on the ring within it, there will be an edge between them.
The rest of the edges are implementationdefined.
Isolines
The space of the abstract patch of isolines is a square, but the abstract patch itself is a series of horizontal lines. Only the first two outer tessellation levels are used. The inner tessellation levels are ignored. The two outer tessellation levels apply to the edges as shown in the diagram.
Each vertex generated and sent to the TES will be provided a normalized 2D coordinate as the gl_TessCoord input. gl_TessCoord.x represents the distance along one of the lines, while gl_TessCoord.y specifies which line the vertex is for. The third component of gl_TessCoord will be 0.0.
The isolines patch is tessellated as follows. The effective tessellation level for outer level 0 is always computed as if you had specified equal_spacing. The vertical edges of the square are tessellated based on this. Each vertex except the top one represents a single line.
The second effective tessellation level specifies how many segments the lines are divided into. It uses the standard rules for spacing to define how many segments the lines all get.
If you want to generate a single isoline, you should pass 1 for gl_TessLevelOuter[0]. However, if you need more tessellated vertices than can be done in a single outer level value, the TES is perfectly capable of stitching multiple separate lines together. The tessellator guarantees that the first and last points for each line will be exactly 0.0 and 1.0.
Remember: the abstract patch is abstract; the TES can position the vertices wherever it wants.
Examples
Isolines with tess levels (4,3)  Isolines with tess levels (6,2) 
Tessellation evaluation shader
The Tessellation Evaluation Shader (TES) is responsible for taking the abstract coordinates generated by the primitive generator, along with the outputs from the TCS (or vertex shader, if no TCS is used), and using them to compute the actual values for the vertices. This is where you code the algorithm that you actually use to compute the new positions/normal/texcoords/etc. The TES is a mandatory part of tessellation; if one is not present, then tessellation doesn't happen.
The TES is rather like a vertex shader, in that each invocation operates on a distinct vertex within the tessellated patch (though, as with a vertex shader, the system may call the TES more than once for the same vertex, so it should be deterministic). Also, the TES cannot cull vertices.
Patch interface and continuity
Tessellating patches so that there are no gaps between primitives generated by adjacent patches is very important. It also is not something that simply happens. Because OpenGL is blind to how the actual tessellation works (the primitive generator only deals with the abstract patch), there are specific things that you must do to ensure gapless tessellation between patches.
Rule 1: The TES implementation of interpolation for the vertices regarding patch edges must receive binaryidentical input values for those edges. The TES defines what the various patch data actually means, as it implements the interpolation. So your TCS, or pervertex patch data in general, must make sure that the data the TES uses to generate those edge positions is the same for the shared edge on the two (or more) patches.
Fortunately, OpenGL's invariance rules means that shader computations that perform the exact same match will produce the exact same results. So usually, this only means making sure that the inputs to the TCS which contribute to interpolation along the edges get the same values from the vertex arrays, as processed through the Vertex Shader. However, different methods of tessellation and interpolation, as implemented by your TES, may have specific needs.
Rule 2: The outer tessellation level for the shared edges must be binaryidentical. While you could simply ensure that the "effective tessellation level" of the edges is identical, the specification says that tessellation invariance is only guaranteed if the given levels are identical. So it's best to be safe by ensuring that the two edges that are supposed to join together get the exact same tessellation level.
The above two rules will ensure that the TES will receive binary identical values for the patch data used to compute shared edge vertices, as well as exactly matching gl_TessCoord values for those vertices along the shared edge. Therefore, the only other rule that must be followed is:
Rule 3: The TES must calculate edge vertices based only on the binary identical data using the exact same math between the shader invocations along the shared edges. OpenGL's standard shader invariance rules work so long as the same codepath is used for both invocations.
The complete invariance rules for tessellation, from OpenGL 4.5, Appendix A.4, page 632, explain how OpenGL provides those guarantees.
Despite the seeming complexity of these rules, they're really not hard to follow. Rule #1 can be hard to satisfy when using unusual tessellation algorithms, but for bezier or NURBS surfaces, so long as the control points are continuous in accord with those surface rules, these will be satisfied.
The most difficult one to satisfy generally would be rule #2, as it complicates the computations for outer tessellation levels. It can't simply be based on the distance to the overall patch; each patch has to know what the neighboring patches used so that they can all match. Essentially, you would have to compute the outer tessellation levels for a patch based on the distance for each edge to the camera. Or some similar algorithm that would be invariant across patches.
Rule 3 simply requires your shader to be reasonably sane, that it doesn't contain conditional logic that causes different patches to be generated using different math. Though if you need continuity between separate programs, that becomes a bit more challenging.
Examples
 OpenGL 4 Tessellation Shader Tutorial: A simple OpenGL tutorial with source code for using Tessellation Shaders. Based on VC++ 2010.
 Drawing curves using Tessellation Shaders: A tutorial for the rasterization of parametric curves using Tessellation Shaders with source code.