Multisampling is a process for reducing aliasing at the edges of rasterized primitives.
Aliasing is a signal-processing term for an effect caused by analog-to-digital conversion. When an analog signal is converted to a digital signal, the value of the analog signal is read at a set of discrete locations called samples. If too few samples are used, the digital version of the analog signal can exhibit unusual patterns. This happens because several different analog signals could have produced the same digital signal under the same sampling pattern. Because the digital signal is not a unique representation of the analog signal, the effects this produces are called "aliasing" (one signal acts as an "alias" for another).
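A small sketch of that ambiguity follows. It uses a hypothetical 1D "stripe" signal and integer positions so the arithmetic is exact; the function names are illustrative only and not part of any graphics API.

```c
#include <stdbool.h>

/* Hypothetical 1D stripe signal: 0 on [0, w), 1 on [w, 2w), 0 on [2w, 3w), ...
   Positions are non-negative integers in some fine unit; w == 0 means a
   flat signal that is always 0. */
static int stripe(int x, int w)
{
    return w == 0 ? 0 : (x / w) % 2;
}

/* Sample the stripe signal of width w at n positions, `step` units apart. */
static void sample_stripes(int w, int step, int n, int *out)
{
    for (int i = 0; i < n; ++i)
        out[i] = stripe(i * step, w);
}

/* True if two sample sequences are identical. */
static bool same_samples(const int *a, const int *b, int n)
{
    for (int i = 0; i < n; ++i)
        if (a[i] != b[i])
            return false;
    return true;
}
```

Stripes 2 units wide, sampled every 4 units, land on a 0 at every sample. They produce exactly the same samples as the flat signal, so the fine pattern "aliases" as a blank one. Sampling every unit is enough to tell the two signals apart.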
In terms of computer graphics, 2D rendering is a form of analog-to-digital conversion. The world of primitives is an analog world described mathematically. Rendering it to a series of discrete pixels creates a digital signal representing that world. If not enough samples are used in creating those discrete pixels, the resulting digital image can exhibit aliasing effects.
Aliasing effects in graphics tend to appear as jagged triangle edges, or as textures that look extremely rough in areas that should appear smooth, particularly when viewed edge-on. Once animation is involved, aliasing becomes extremely noticeable, because aliased pixels tend to change color abruptly from frame to frame instead of smoothly.
Combating the visual effects of aliasing is simple, so long as performance is irrelevant. Aliasing is the effect of not using enough samples in analog-to-digital conversions. Thus, all true antialiasing techniques revolve around increasing the number of samples used.
Texture filtering is a form of antialiasing applied specifically to aliasing caused by accessing textures. Linear filtering blends neighboring texels instead of using just one. Mipmaps pre-compute an approximation of accessing a large area of a texture: each texel in a mipmap level represents the average of several texels from the next larger level. Anisotropic filtering, which fetches from several different locations in a texture, is also a form of antialiasing, combining multiple samples to compute a more reasonable value.
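As a sketch of the mipmap idea, the function below builds the next smaller level of a square, single-channel texture by averaging each 2x2 block of texels (a simple box filter). It is an illustration under that assumption. Real mipmap generation, for example by `glGenerateMipmap`, is free to use a different filter.

```c
/* Build the next (half-size) mipmap level from a square single-channel level
   of size src_size x src_size (src_size must be even). Each destination texel
   is the average of the 2x2 block of source texels it covers. */
static void next_mip_level(const float *src, int src_size, float *dst)
{
    int dst_size = src_size / 2;
    for (int y = 0; y < dst_size; ++y) {
        for (int x = 0; x < dst_size; ++x) {
            const float *row0 = src + (2 * y) * src_size + 2 * x; /* top pair */
            const float *row1 = row0 + src_size;                  /* bottom pair */
            dst[y * dst_size + x] = 0.25f * (row0[0] + row0[1] + row1[0] + row1[1]);
        }
    }
}
```

Applied to a 4x4 black-and-white checkerboard, every texel of the 2x2 result is 0.5. A pattern too fine to be sampled reliably at a distance becomes a uniform gray instead of an aliased mess.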
But texture filtering only deals with aliasing that results from accessing textures and computations based on such accesses. Aliasing at the edge of primitives is not affected by such filtering.
A more general form of antialiasing is the simplest: render at a higher resolution, then compute the final image by averaging the values from the higher resolution image that correspond to each pixel. This is commonly called "supersampling".
In supersampling, each pixel in the eventual destination image gets its data from multiple pixels in the higher resolution image. The high-res pixels that correspond to a particular destination pixel are called "samples". Given this idea, we can think about a supersampled image as having the same pixel resolution as the destination, but with each pixel storing multiple samples of data.
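The sketch below shades one pixel of a hypothetical scene, a white half-plane below the line y = 0.5x on a black background, using an n x n grid of samples spread evenly across the pixel. It then averages them into the destination value. The scene and the regular grid are assumptions chosen to keep the numbers exact.

```c
/* Hypothetical scene: white (1.0) below the line y = 0.5 * x, black (0.0) above. */
static float shade(float x, float y)
{
    return y < 0.5f * x ? 1.0f : 0.0f;
}

/* Supersample pixel (px, py) with an n x n grid of samples spread evenly
   over its area, then average them into the final pixel value.
   n == 1 is ordinary single-sample rendering at the pixel center. */
static float supersample_pixel(int px, int py, int n)
{
    float sum = 0.0f;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            float sx = (float)px + ((float)i + 0.5f) / (float)n;
            float sy = (float)py + ((float)j + 0.5f) / (float)n;
            sum += shade(sx, sy);
        }
    }
    return sum / (float)(n * n);
}
```

With one sample, pixel (1, 0) comes out fully white and pixel (0, 0) fully black: a hard, jagged step. With a 2x2 grid, pixel (1, 0) becomes 0.75. With a 4x4 grid, pixel (0, 0) becomes 0.25. Both match the fraction of each pixel that the shape actually covers.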
An image where each pixel stores multiple samples is a "multisampled image".
When we rasterize with supersampling, the primitive is broken down into multiple samples for each pixel, each taken at a different location within the pixel's area. Each sample therefore gets its own rasterization products, along with everything that follows them in the pipeline. So for each sample in the multisampled image, we must produce a Fragment, execute a Fragment Shader on it to compute colors, perform the remaining per-fragment operations, and write the sample.
As previously stated, each sample within a multisampled pixel comes from a specific location within the area of that pixel. When we attempt to rasterize a primitive for a pixel, we sample the primitive at all of the sample locations within that pixel. If any of those locations fall outside of the primitive's area (because the pixel is at the edge of the primitive), then the samples outside of the area will not generate fragments.
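Here is a sketch of that per-sample test, using the common edge-function formulation of triangle rasterization. The four sample offsets are an assumed rotated-grid pattern for illustration; the actual positions are chosen by the implementation. Samples exactly on an edge count as inside here, whereas real rasterizers apply tie-breaking rules.

```c
/* Assumed 4-sample pattern: offsets within a pixel, in pixel units. */
static const float kSampleOffsets[4][2] = {
    {0.375f, 0.125f}, {0.875f, 0.375f}, {0.125f, 0.625f}, {0.625f, 0.875f}
};

/* Edge function: positive when (px, py) lies to the left of edge a->b. */
static float edge(float ax, float ay, float bx, float by, float px, float py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

/* Bitmask with bit s set when sample s of pixel (x, y) is inside the
   counter-clockwise triangle v. Only those samples generate fragment data. */
static unsigned sample_coverage(const float v[3][2], int x, int y)
{
    unsigned mask = 0;
    for (int s = 0; s < 4; ++s) {
        float px = (float)x + kSampleOffsets[s][0];
        float py = (float)y + kSampleOffsets[s][1];
        if (edge(v[0][0], v[0][1], v[1][0], v[1][1], px, py) >= 0.0f &&
            edge(v[1][0], v[1][1], v[2][0], v[2][1], px, py) >= 0.0f &&
            edge(v[2][0], v[2][1], v[0][0], v[0][1], px, py) >= 0.0f)
            mask |= 1u << s;
    }
    return mask;
}
```

For the triangle (0,0), (4,0), (0,4), pixel (0, 0) is fully covered (mask 0xF) and pixel (5, 5) is not covered at all. Pixel (1, 2) straddles the diagonal edge, so only samples 0 and 2 are inside (mask 0x5).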
So in every way, supersampling renders to a higher-resolution image; it's just easier to talk about it in terms of adding samples within a pixel.
While this is very simple and easy to implement, it's also obviously expensive. It has all of the downsides of rendering at high resolutions: lots of added rasterization, lots of shader executions, and those multisampled images consume lots of memory and therefore bandwidth. Plus, to compute the final image, we have to take time to average the colors from the multisampled image into its final resolution.
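To put rough numbers on the memory cost, the helper below computes the raw storage of a multisampled image. The 8 bytes per sample used in the usage note is an assumption: an RGBA8 color buffer plus a packed 24-bit depth / 8-bit stencil buffer. Implementations may add padding or use compression, so treat the result as a back-of-the-envelope estimate.

```c
#include <stdint.h>

/* Raw bytes of a multisampled image: each pixel stores `samples` samples of
   `bytes_per_sample` bytes. Ignores padding and compression. The first cast
   makes the whole product evaluate in 64 bits, avoiding overflow. */
static uint64_t multisample_bytes(uint32_t width, uint32_t height,
                                  uint32_t samples, uint32_t bytes_per_sample)
{
    return (uint64_t)width * height * samples * bytes_per_sample;
}
```

`multisample_bytes(1920, 1080, 4, 8)` gives 66,355,200 bytes (about 63 MiB), four times the 16,588,800 bytes of the single-sampled equivalent. Under supersampling, every one of those samples also costs a fragment shader invocation.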
Reasonable use of texture filtering can reduce aliasing within the area of a primitive. As such, supersampling's primary value is in dealing with aliasing at the edges of primitives. But the cost of supersampling affects all parts of rasterization, so the costs tend to outweigh the benefits.
Multisampling is a small modification of the supersampling algorithm that is more focused on edge antialiasing.
In multisampling, everything is set up exactly like supersampling. We still render to multisampled images. The rasterizer still generates (most of) its rasterization data for each sample. Blending, depth testing, and the like still happen per-sample.
The only change with multisampling is that the fragment shader is not executed per-sample. It is executed at a lower frequency than the number of samples per pixel. Exactly how frequently it gets executed depends on the hardware. Some hardware may maintain a 4:1 ratio, such that the FS is executed once for each 4 samples in the multisample rendering.
That brings up an interesting question. If the FS is executed at a lower frequency, how do the other samples get their post-FS values? That is, for the samples that don't correspond to a FS execution, from where do they get their fragment values?
The answer is simple: any FS invocations executed in a multisampled pixel will copy their values to multiple samples. So in our 4:1 example above, if the multisample image contains 4 samples per pixel, then those four samples will get the same fragment values.
In essence, multisampling is supersampling where the sample rate of the fragment shader (and all of its attendant operations) is lower than the number of samples per pixel.
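The difference can be modeled for a single 4-sample pixel as follows. `fragment_shader` is a hypothetical stand-in that counts its own invocations. Running the shared invocation at the pixel center is an assumption, since where the hardware evaluates it is implementation-dependent. Setting `per_sample` gives the supersampling behavior for comparison.

```c
#define SAMPLES 4

/* Hypothetical fragment shader stand-in; counts how often it runs. */
static int fs_invocations = 0;
static float fragment_shader(float x, float y)
{
    ++fs_invocations;
    return 0.5f * (x + y);
}

/* Write one fragment into a pixel. Bit s of `coverage` marks sample s as
   covered; pos[s] is that sample's position inside the pixel.
   per_sample != 0 (supersampling): shade every covered sample separately.
   per_sample == 0 (multisampling): shade once, copy to all covered samples. */
static void shade_pixel(float dst[SAMPLES], unsigned coverage,
                        const float pos[SAMPLES][2], int per_sample)
{
    float shared = 0.0f;
    if (!per_sample && coverage != 0)
        shared = fragment_shader(0.5f, 0.5f);   /* assumed: pixel center */
    for (int s = 0; s < SAMPLES; ++s) {
        if (!(coverage & (1u << s)))
            continue;                           /* uncovered: left unchanged */
        dst[s] = per_sample ? fragment_shader(pos[s][0], pos[s][1]) : shared;
    }
}
```

For a fully covered pixel, the multisampled path runs the shader once and writes the same value to all four samples, while the per-sample path runs it four times and writes four different values.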
Note, however, that the depth value is still computed per-sample, not per-fragment. This means that, unless the fragment shader replaces the fragment's depth, different samples covered by the same fragment can receive different depth values, and the samples within a single pixel of the depth buffer can hold different values as well. Depth and stencil tests happen per-sample, and the failure of an individual sample prevents the fragment's values from being written to that sample.
There still remain a few details to cover. When discussing supersampling, it was mentioned that samples that are outside of the area of the primitive being rasterized don't get values. We can think of the samples in a fragment being rasterized as having a binary state of being covered or not covered. The set of samples in a fragment area covered by the primitive represents the "coverage" of that fragment.
When a depth or stencil test fails for a sample, this modifies the coverage for the fragment by turning off that particular sample for the fragment.
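The following sketch shows how a per-sample depth test with `GL_LESS` semantics shrinks a fragment's coverage. The fragment's depth is given per sample, because it is interpolated at each sample location. Depth write masks and stencil are ignored, and the function name is hypothetical.

```c
#define SAMPLES 4

/* For each covered sample, compare the fragment's depth at that sample with
   the stored depth (GL_LESS). Passing samples store the new depth; failing
   samples are removed from the coverage mask, so no color is written there.
   Returns the reduced coverage mask. */
static unsigned depth_test_samples(unsigned coverage,
                                   const float frag_depth[SAMPLES],
                                   float depth_buf[SAMPLES])
{
    for (int s = 0; s < SAMPLES; ++s) {
        unsigned bit = 1u << s;
        if (!(coverage & bit))
            continue;                       /* not covered: untouched */
        if (frag_depth[s] < depth_buf[s])
            depth_buf[s] = frag_depth[s];   /* pass: keep sample, store depth */
        else
            coverage &= ~bit;               /* fail: sample drops out */
    }
    return coverage;
}
```

A fragment whose per-sample depths are 0.2, 0.4, 0.6 and 0.8, drawn over a buffer cleared to 0.5, keeps only samples 0 and 1 (mask 0x3). The same pixel ends up partly covered by this primitive and partly by whatever was there before.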
The concept of coverage is important because it can be accessed and manipulated by a fragment shader.
In OpenGL 4.0 or with ARB_sample_shading, the fragment shader has a bitmask input, gl_SampleMaskIn, that represents the samples covered by that fragment. It also has an output variable, gl_SampleMask, that can be used to set the coverage of the fragment. Both are declared as arrays of int, with one bit per sample. Note that the output sample mask is ANDed with the physical coverage of the primitive, so you cannot use it to write to samples outside the area actually covered by the primitive.
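The rule that the output mask can only remove samples amounts to a bitwise AND. A minimal C model of it follows, covering only the first 32 samples (element 0 of the GLSL arrays). The function name is hypothetical.

```c
/* Coverage actually used for a fragment: the rasterized (physical) coverage
   ANDed with the mask the shader wrote to gl_SampleMask[0]. The shader can
   clear bits but can never set a bit the primitive does not cover. */
static unsigned effective_coverage(unsigned physical, unsigned shader_mask)
{
    return physical & shader_mask;
}
```

Writing all ones leaves the coverage unchanged, and clearing bits turns samples off. A fragment covering samples 0 and 2 (0x5) cannot gain sample 1 even if the shader writes 0xF.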
The alpha value of color number 0 (index 0) written by the fragment shader can also be used to manipulate the coverage mask. This feature predates gl_SampleMask and is less capable and more hardware-dependent, so you should modify the sample mask directly if you can. It is activated with glEnable(GL_SAMPLE_ALPHA_TO_COVERAGE). When activated, the alpha value manipulates the coverage mask such that an alpha of 1.0 represents full coverage and an alpha of 0.0 represents no coverage. The details of how alpha is mapped to coverage are hardware-specific, but the mapping is expected to be linear.
The goal of this tool is to make multisampling act somewhat like alpha blending, with values closer to 1 taking up more samples in an area than values closer to 0.
Much like changing the coverage mask directly, alpha-to-coverage will never make the coverage mask affect samples that aren't physically covered by the primitive area.
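The sketch below shows one plausible linear alpha-to-coverage mapping. It enables round(alpha x samples) samples, lowest-numbered first, and then ANDs the result with the physical coverage. This specific mapping is an assumption for illustration only; hardware picks its own, often dithered, pattern.

```c
/* Hypothetical alpha-to-coverage mapping for up to 32 samples. */
static unsigned alpha_to_coverage(float alpha, int samples, unsigned physical)
{
    if (alpha <= 0.0f)
        return 0u;                    /* alpha 0: no coverage */
    if (alpha > 1.0f)
        alpha = 1.0f;                 /* clamp to [0, 1] */
    int count = (int)(alpha * (float)samples + 0.5f);   /* round to nearest */
    unsigned mask = count >= 32 ? 0xFFFFFFFFu : (1u << count) - 1u;
    return mask & physical;           /* never exceeds the primitive's coverage */
}
```

With 4 samples, alpha 1.0 keeps all covered samples, alpha 0.5 keeps at most two, and alpha 0.0 keeps none. A fully opaque fragment that physically covers only samples 0 and 2 still gets just 0x5.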
Because this functionality uses the alpha value being output, that alpha value is no longer meaningful as a color. If you don't want to rely on Blending or Write Masking to keep it out of the framebuffer, you can glEnable(GL_SAMPLE_ALPHA_TO_ONE). This causes the fragment's alpha to be replaced with 1.0 after it has been used to modify the coverage mask.
There is one major caveat to executing the fragment shader at a rate lower than the number of samples: where within the pixel the FS is executed. In multisampling, because the result of an FS invocation is broadcast to multiple samples, the location of that invocation within the pixel will not always match the locations of the samples that receive its data.
More importantly, if the pixel is at the edge of the primitive, the location used by the FS invocation may lie outside the primitive's area. This is fine for many FS computations, since interpolating values past the edge of a primitive still works mathematically.
What may not be fine is what you *do* with that interpolated value. A particular FS may not function correctly when the interpolated values correspond to a location outside the primitive's area. For example, you might take the square root of an interpolated value, expecting that it will never be negative, but interpolation outside the primitive may make it negative.
This is what the centroid interpolation qualifier fixes. Any fragment shader input declared with this qualifier is interpolated at a location that lies within both the pixel and the area of the primitive.
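To illustrate the problem and the fix, the sketch below interpolates a per-vertex value with barycentric weights. The triangle and attribute are hypothetical, with a value that is 3.75 at one vertex and 0 at the other two, so it is never negative inside the triangle. The centroid strategy shown (use the first covered sample when the pixel is only partly covered) is an assumption: the GL only requires some location inside both the pixel and the primitive.

```c
/* Edge function: positive when (px, py) lies to the left of edge a->b. */
static float edge(const float a[2], const float b[2], float px, float py)
{
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0]);
}

/* Barycentric interpolation of attr[] at (px, py). Outside the triangle at
   least one weight is negative, so the result can leave the vertex range. */
static float interpolate(const float v[3][2], const float attr[3],
                         float px, float py)
{
    float area = edge(v[0], v[1], v[2][0], v[2][1]);
    float w0 = edge(v[1], v[2], px, py) / area;
    float w1 = edge(v[2], v[0], px, py) / area;
    float w2 = edge(v[0], v[1], px, py) / area;
    return w0 * attr[0] + w1 * attr[1] + w2 * attr[2];
}

/* Centroid-style interpolation for pixel (x, y), samples <= 31.
   Fully covered: pixel center. Partially covered: first covered sample. */
static float interpolate_centroid(const float v[3][2], const float attr[3],
                                  int x, int y, unsigned coverage,
                                  const float offsets[][2], int samples)
{
    unsigned all = (1u << samples) - 1u;
    if (coverage != all) {
        for (int s = 0; s < samples; ++s)
            if (coverage & (1u << s))
                return interpolate(v, attr, (float)x + offsets[s][0],
                                   (float)y + offsets[s][1]);
    }
    return interpolate(v, attr, (float)x + 0.5f, (float)y + 0.5f);
}
```

For the triangle (0,0), (3.75,0), (0,3.75), the center of pixel (1, 2) lies just outside the hypotenuse, so plain interpolation there yields about -0.25, and a square root would fail. With the assumed sample pattern, sample 0 of that pixel is covered, and interpolating there gives a positive value.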
Once a multisample image has been rendered, it must be reduced to a single sample per pixel if it is to be viewed. This downsampling step is called the "multisample resolve". In OpenGL, you perform a multisample resolve with a blit operation from a multisampled framebuffer to a single-sampled one. Note that such a resolve blit cannot also rescale the image or change its format.
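Conceptually, a resolve averages each pixel's samples. The CPU sketch below assumes a single-channel image with each pixel's samples stored contiguously. Both the layout and the plain average are assumptions, since GPUs store samples in their own internal formats and the exact resolve filter is up to the implementation.

```c
/* Average-resolve a multisampled single-channel image whose samples are
   stored as src[(y * width + x) * samples + s] into a single-sample image. */
static void resolve_average(const float *src, int width, int height,
                            int samples, float *dst)
{
    for (int p = 0; p < width * height; ++p) {
        float sum = 0.0f;
        for (int s = 0; s < samples; ++s)
            sum += src[p * samples + s];
        dst[p] = sum / (float)samples;   /* mean of the pixel's samples */
    }
}
```

An edge pixel with two of four samples covered by a white primitive resolves to 0.5. In actual OpenGL code the resolve is the blit described above: bind the multisampled framebuffer to `GL_READ_FRAMEBUFFER` and the single-sampled one to `GL_DRAW_FRAMEBUFFER`, then call `glBlitFramebuffer(0, 0, w, h, 0, 0, w, h, GL_COLOR_BUFFER_BIT, GL_NEAREST)` with identical source and destination rectangles.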
If you create a multisampled Default Framebuffer, the back buffer is considered multisampled, but the front buffer is not. So swapping the buffers is equivalent to doing a multisample resolve operation.
The whole idea of multisampling is that the FS should execute at a lower frequency than the sample count. However, it is sometimes useful to force the fragment shader to execute at the sample rate. For example, if you are doing post-processing effects on a multisample image before resolving it, you need to execute those effects on each sample within each pixel, because different samples of the same pixel may hold results from different overlapping primitives and can therefore differ greatly.
This is called "per-sample shading", and it effectively transforms multisampling into supersampling.
Per-sample shading is activated for a shader if you use the sample interpolation qualifier on any FS input value, or if you access gl_SamplePosition or gl_SampleID in the FS.