General OpenGL: Transformations
I can't get transformations to work. Where can I learn more about matrices?
A thorough explanation of basic matrix math and linear algebra is beyond the scope of this FAQ. These concepts are taught in high school math classes in the United States.
Delphi code for performing basic vector, matrix, and quaternion operations can be found here.
Are OpenGL matrices column-major or row-major?
For programming purposes, OpenGL matrices are 16-value arrays with base vectors laid out contiguously in memory. The translation components occupy the 13th, 14th, and 15th elements of the 16-element matrix.
Column-major versus row-major is purely a notational convention. Note that post-multiplying with column-major matrices produces the same result as pre-multiplying with row-major matrices. The OpenGL Specification and the OpenGL Reference Manual both use column-major notation. You can use any notation, as long as it's clearly stated.
Sadly, the use of column-major format in the spec and blue book has resulted in endless confusion in the OpenGL programming community. Column-major notation suggests that matrices are not laid out in memory as a programmer would expect.
A summary of Usenet postings on the subject can be found here.
What are OpenGL coordinate units?
The short answer: Anything you want them to be.
Depending on the contents of your geometry database, it may be convenient for your application to treat one OpenGL coordinate unit as being equal to one millimeter or one parsec or anything in between (or larger or smaller).
OpenGL also lets you specify your geometry with coordinates of differing values. For example, you may find it convenient to model an airplane's controls in centimeters, its fuselage in meters, and a world to fly around in kilometers. OpenGL's ModelView matrix can then scale these different coordinate systems into the same eye coordinate space.
It's the application's responsibility to ensure that the Projection and ModelView matrices are constructed to provide an image that keeps the viewer at an appropriate distance, with an appropriate field of view, and keeps the zNear and zFar clipping planes at an appropriate range. An application that displays molecules in micron scale, for example, would probably not want to place the viewer at a distance of 10 feet with a 60 degree field of view.
How are coordinates transformed? What are the different coordinate spaces?
Object Coordinates are transformed by the ModelView matrix to produce Eye Coordinates.
Eye Coordinates are transformed by the Projection matrix to produce Clip Coordinates.
Clip Coordinate X, Y, and Z are divided by Clip Coordinate W to produce Normalized Device Coordinates.
Normalized Device Coordinates are scaled and translated by the viewport parameters to produce Window Coordinates.
Object coordinates are the raw coordinates you submit to OpenGL with a call to glVertex*() or glVertexPointer(). They represent the coordinates of your object or other geometry you want to render.
Many programmers use a World Coordinate system. Objects are often modeled in one coordinate system, then scaled, translated, and rotated into the world you're constructing. World Coordinates result from transforming Object Coordinates by the modelling transforms stored in the ModelView matrix. However, OpenGL has no concept of World Coordinates. World Coordinates are purely an application construct.
Eye Coordinates result from transforming Object Coordinates by the ModelView matrix. The ModelView matrix contains both modelling and viewing transformations that place the viewer at the origin with the view direction aligned with the negative Z axis.
Clip Coordinates result from transforming Eye Coordinates by the Projection matrix. Clip Coordinate space ranges from -Wc to Wc in all three axes, where Wc is the Clip Coordinate W value. OpenGL clips all coordinates outside this range.
Perspective division performed on the Clip Coordinates produces Normalized Device Coordinates, ranging from -1 to 1 in all three axes.
Window Coordinates result from scaling and translating Normalized Device Coordinates by the viewport. The parameters to glViewport() and glDepthRange() control this transformation. With the viewport, you can map the Normalized Device Coordinate cube to any location in your window and depth buffer.
For more information, see the OpenGL Specification, Figure 2.6.
How do I transform only one object in my scene or give each object its own transform?
OpenGL provides matrix stacks specifically for this purpose. In this case, use the ModelView matrix stack.
A typical OpenGL application first sets the matrix mode with a call to glMatrixMode(GL_MODELVIEW) and loads a viewing transform, perhaps with a call to gluLookAt(). More information is available on gluLookAt().
Then the code renders each object in the scene with its own transformation by wrapping the rendering with calls to glPushMatrix() and glPopMatrix(). For example:
glPushMatrix(); glRotatef(90., 1., 0., 0.); gluCylinder(quad,1,1,2,36,12); glPopMatrix();
The above code renders a cylinder rotated 90 degrees around the X-axis. The ModelView matrix is restored to its previous value after the glPopMatrix() call. Similar call sequences can render subsequent objects in the scene.
How do I draw 2D controls over my 3D rendering?
The basic strategy is to set up a 2D projection for drawing controls. You can do this either on top of your 3D rendering or in overlay planes. If you do so on top of a 3D rendering, you'll need to redraw the controls at the end of every frame (immediately before swapping buffers). If you draw into the overlay planes, you only need to redraw the controls if you're updating them.
To set up a 2D projection, you need to change the Projection matrix. Normally, it's convenient to set up the projection so one world coordinate unit is equal to one screen pixel, as follows:
glMatrixMode (GL_PROJECTION); glLoadIdentity (); gluOrtho2D (0, windowWidth, 0, windowHeight);
gluOrtho2D() sets up a Z range of -1 to 1, so you need to use one of the glVertex2*() functions to ensure your geometry isn't clipped by the zNear or zFar clipping planes.
Normally, the ModelView matrix is set to the identity when drawing 2D controls, though you may find it convenient to do otherwise (for example, you can draw repeated controls with interleaved translation matrices).
If exact pixelization is required, you might want to put a small translation in the ModelView matrix, as shown below:
glMatrixMode (GL_MODELVIEW); glLoadIdentity (); glTranslatef (0.375, 0.375, 0.);
If you're drawing on top of a 3D-depth buffered image, you'll need to somehow disable depth testing while drawing your 2D geometry. You can do this by calling glDisable(GL_DEPTH_TEST) or glDepthFunc (GL_ALWAYS). Depending on your application, you might also simply clear the depth buffer before starting the 2D rendering. Finally, drawing all 2D geometry with a minimum Z coordinate is also a solution.
After the 2D projection is established as above, you can render normal OpenGL primitives to the screen, specifying their coordinates with XY pixel addresses (using OpenGL-centric screen coordinates, with (0,0) in the lower left).
How do I bypass OpenGL matrix transformations and send 2D coordinates directly for rasterization?
There isn't a mode switch to disable OpenGL matrix transformations. However, if you set either or both matrices to the identity with a glLoadIdentity() call, typical OpenGL implementations are intelligent enough to know that an identity transformation is a no-op and will act accordingly.
More detailed information on using OpenGL as a rasterization-only API is in the OpenGL Game Developer’s FAQ.
What are the pros and cons of using absolute versus relative coordinates?
Some OpenGL applications may need to render the same object in multiple locations in a single scene. OpenGL lets you do this two ways:
- Use “absolute coordinates". Maintain multiple copies of each object, each with its own unique set of vertices. You don't need to change the ModelView matrix to render the object at the desired location.
- Use “relative coordinates". Keep only one copy of the object, and render it multiple times by pushing the ModelView matrix stack, setting the desired transform, sending the geometry, and popping the stack. Repeat these steps for each object.
In general, frequent changes to state, such as to the ModelView matrix, can negatively impact your application’s performance. OpenGL can process your geometry faster if you don't wrap each individual primitive in a lot of changes to the ModelView matrix.
However, sometimes you need to weigh this against the memory savings of replicating geometry. Let's say you define a doorknob with high approximation, such as 200 or 300 triangles, and you're modeling a house with 50 doors in it, all of which have the same doorknob. It's probably preferable to use a single doorknob display list, with multiple unique transform matrices, rather than use absolute coordinates with 10-15K triangles in memory.
As with many computing issues, it's a trade-off between processing time and memory that you'll need to make on a case-by-case basis.
How can I draw more than one view of the same scene?
You can draw two views into the same window by using the glViewport() call. Set glViewport() to the area that you want the first view, set your scene’s view, and render. Then set glViewport() to the area for the second view, again set your scene’s view, and render.
You need to be aware that some operations don't pay attention to the glViewport, such as SwapBuffers and glClear(). SwapBuffers always swaps the entire window. However, you can restrain glClear() to a rectangular window by using the scissor rectangle.
Your application might only allow different views in separate windows. If so, you need to perform a MakeCurrent operation between the two renderings. If the two windows share a context, you need to change the scene’s view as described above. This might not be necessary if your application uses separate contexts for each window.
How do I transform my objects around a fixed coordinate system rather than the object's local coordinate system?
If you rotate an object around its Y-axis, you'll find that the X- and Z-axes rotate with the object. A subsequent rotation around one of these axes rotates around the newly transformed axis and not the original axis. It's often desirable to perform transformations in a fixed coordinate system rather than the object’s local coordinate system.
The OpenGL Game Developer’s FAQ contains information on using quaternions to store rotations, which may be useful in solving this problem.
The root cause of the problem is that OpenGL matrix operations postmultiply onto the matrix stack, thus causing transformations to occur in object space. To affect screen space transformations, you need to premultiply. OpenGL doesn't provide a mode switch for the order of matrix multiplication, so you need to premultiply by hand. An application might implement this by retrieving the current matrix after each frame. The application multiplies new transformations for the next frame on top of an identity matrix and multiplies the accumulated current transformations (from the last frame) onto those transformations using glMultMatrix().
You need to be aware that retrieving the ModelView matrix once per frame might have a detrimental impact on your application’s performance. However, you need to benchmark this operation, because the performance will vary from one implementation to the next.
What are the pros and cons of using glFrustum() versus gluPerspective()? Why would I want to use one over the other?
glFrustum() and gluPerspective() both produce perspective projection matrices that you can use to transform from eye coordinate space to clip coordinate space. The primary difference between the two is that glFrustum() is more general and allows off-axis projections, while gluPerspective() only produces symmetrical (on-axis) projections. Indeed, you can use glFrustum() to implement gluPerspective(). However, aside from the layering of function calls that is a natural part of the GLU interface, there is no performance advantage to using matrices generated by glFrustum() over gluPerspective().
Since glFrustum() is more general than gluPerspective(), you can use it in cases when gluPerspective() can't be used. Some examples include projection shadows, tiled renderings, and stereo views.
Tiled rendering uses multiple off-axis projections to render different sections of a scene. The results are assembled into one large image array to produce the final image. This is often necessary when the desired dimensions of the final rendering exceed the OpenGL implementation's maximum viewport size.
In a stereo view, two renderings of the same scene are done with the view location slightly shifted. Since the view axis is right between the “eyes”, each view must use a slightly off-axis projection to either side to achieve correct visual results.
How can I make a call to glFrustum() that matches my call to gluPerspective()?
The field of view (fov) of your glFrustum() call is:
fov*0.5 = arctan ((top-bottom)*0.5 / near)
Since bottom == -top for the symmetrical projection that gluPerspective() produces, then:
top = tan(fov*0.5) * near bottom = -top
Note: fov must be in radians for the above formulae to work with the C math library. If you have comnputer your fov in degrees (as in the call to gluPerspective()), then calculate top as follows:
top = tan(fov*3.14159/360.0) * near
The left and right parameters are simply functions of the top, bottom, and aspect:
left = aspect * bottom right = aspect * top
The OpenGL Reference Manual (where do I get this?) shows the matrices produced by both functions.
How do I draw a full-screen quad?
This question usually means, "How do I draw a quad that fills the entire OpenGL viewport?" There are many ways to do this.
The most straightforward method is to set the desired color, set both the Projection and ModelView matrices to the identity, and call glRectf() or draw an equivalent GL_QUADS primitive. Your rectangle or quad's Z value should be in the range of –1.0 to 1.0, with –1.0 mapping to the zNear clipping plane, and 1.0 to the zFar clipping plane.
As an example, here's how to draw a full-screen quad at the zNear clipping plane:
glMatrixMode (GL_MODELVIEW); glPushMatrix (); glLoadIdentity (); glMatrixMode (GL_PROJECTION); glPushMatrix (); glLoadIdentity (); glBegin (GL_QUADS); glVertex3i (-1, -1, -1); glVertex3i (1, -1, -1); glVertex3i (1, 1, -1); glVertex3i (-1, 1, -1); glEnd (); glPopMatrix (); glMatrixMode (GL_MODELVIEW); glPopMatrix ();
Your application might want the quad to have a maximum Z value, in which case 1 should be used for the Z value instead of -1.
When painting a full-screen quad, it might be useful to mask off some buffers so that only specified buffers are touched. For example, you might mask off the color buffer and set the depth function to GL_ALWAYS, so only the depth buffer is painted. Also, you can set masks to allow the stencil buffer to be set or any combination of buffers.
How can I find the screen coordinates for a given object-space coordinate?
You can use the GLU library gluProject() utility routine if you only need to find it for a few vertices. For a large number of coordinates, it can be more efficient to use the Feedback mechanism.
To use gluProject(), you'll need to provide the ModelView matrix, projection matrix, viewport, and input object space coordinates. Screen space coordinates are returned for X, Y, and Z, with Z being normalized (0 <= Z <= 1).
How can I find the object-space coordinates for a pixel on the screen?
The GLU library provides the gluUnProject() function for this purpose.
You'll need to read the depth buffer to obtain the input screen coordinate Z value at the X,Y location of interest. This can be coded as follows:
GLdouble z; glReadPixels (x, y, 1, 1, GL_DEPTH_COMPONENT, GL_DOUBLE, &z);
Note that x and y are OpenGL-centric with (0,0) in the lower-left corner.
You'll need to provide the screen space X, Y, and Z values as input to gluUnProject() with the ModelView matrix, Projection matrix, and viewport that were current at the time the specific pixel of interest was rendered.
How do I find the coordinates of a vertex transformed only by the ModelView matrix?
It's often useful to obtain the eye coordinate space value of a vertex (i.e., the object space vertex transformed by the ModelView matrix). You can obtain this by retrieving the current ModelView matrix and performing simple vector / matrix multiplication.
How do I calculate the object-space distance from the viewer to a given point?
Transform the point into eye-coordinate space by multiplying it by the ModelView matrix. Then simply calculate its distance from the origin. (If this doesn't work, you may have incorrectly placed the view transform on the Projection matrix stack.)
To retrieve the current Modelview matrix:
GLfloat m; glGetFloatv (GL_MODELVIEW_MATRIX, m);
As with any OpenGL call, you must have a context current with a window or drawable in order for glGet*() function calls to work.
How do I keep my aspect ratio correct after a window resize?
It depends on how you are setting your projection matrix. In any case, you'll need to know the new dimensions (width and height) of your window. How to obtain these depends on which platform you're using. In GLUT, for example, the dimensions are passed as parameters to the reshape function callback.
The following assumes you're maintaining a viewport that's the same size as your window. If you are not, substitute viewportWidth and viewportHeight for windowWidth and windowHeight.
If you're using gluPerspective() to set your Projection matrix, the second parameter controls the aspect ratio. When your program catches a window resize, you'll need to change your Projection matrix as follows:
glMatrixMode(GL_PROJECTION); glLoadIdentity(); gluPerspective(fov, (float)windowWidth/(float)windowHeight, zNear, zFar);
If you're using glFrustum(), the aspect ratio varies with the width of the view volume to the height of the view volume. You might maintain a 1:1 aspect ratio with the following window resize code:
float cx, halfWidth = windowWidth*0.5f; float aspect = (float)windowWidth/(float)windowHeight; glMatrixMode(GL_PROJECTION); glLoadIdentity(); /* cx is the eye space center of the zNear plane in X */ glFrustum(cx-halfWidth*aspect, cx+halfWidth*aspect, bottom, top, zNear, zFar);
glOrtho() and gluOrtho2D() are similar to glFrustum().
Can I make OpenGL use a left-handed coordinate space?
OpenGL doesn't have a mode switch to change from right- to left-handed coordinates. However, you can easily obtain a left-handed coordinate system by multiplying a negative Z scale onto the ModelView matrix. For example:
glMatrixMode (GL_MODELVIEW); glLoadIdentity (); glScalef (1., 1., -1.); /* multiply view transforms as usual... */ /* multiply model transforms as usual... */
How can I transform an object so that it points at or follows another object or point in my scene?
You need to construct a matrix that transforms from your object's local coordinate system into a coordinate system that faces in the desired direction. See this example code to see how this type of matrix is created.
If you merely want to render an object so that it always faces the viewer, you might consider simply rendering it in eye-coordinate space with the ModelView matrix set to the identity.
How can I transform an object with a given yaw, pitch, and roll?
The upper left 3x3 portion of a transformation matrix is composed of the new X, Y, and Z axes of the post-transformation coordinate space.
If the new transform is a roll, compute new local Y and X axes by rotating them "roll" degrees around the local Z axis. Do similar calculations if the transform is a pitch or yaw. Then simply construct your transformation matrix by inserting the new local X, Y, and Z axes into the upper left 3x3 portion of an identity matrix. This matrix can be passed as a parameter to glMultMatrix().
Further rotations should be computed around the new local axes. This will inevitably require rotation about an arbitrary axis, which can be confusing to inexperienced 3D programmers. This is a basic concept in linear algebra.
Many programmers apply all three transformations -- yaw, pitch, and roll -- at once as successive glRotate() calls about the X, Y, and Z axes. This has the disadvantage of creating gimbal lock, in which the result depends on the order of glRotate() calls.
How do I render a mirror?
Render your scene twice, once as it is reflected in the mirror, then once from the normal (non-reflected) view. Example code demonstrates this technique.
For axis-aligned mirrors, such as a mirror on the YZ plane, the reflected scene can be rendered with a simple scale and translate. Scale by -1.0 in the axis corresponding to the mirror's normal, and translate by twice the mirror's distance from the origin. Rendering the scene with these transforms in place will yield the scene reflected in the mirror. Use the matrix stack to restore the view transform to its previous value.
Next, clear the depth buffer with a call to glClear(GL_DEPTH_BUFFER_BIT). Then render the mirror. For a perfectly reflecting mirror, render into the depth buffer only. Real mirrors are not perfect reflectors, as they absorb some light. To create this effect, use blending to render a black mirror with an alpha of 0.05. glBlendFunc(GL_SRC_ALPHA,GL_ONE_MINUS_SRC_ALPHA) is a good blending function for this purpose.
Finally, render the non-reflected scene. Since the entire reflected scene exists in the color buffer, and not just the portion of the reflected scene in the mirror, you will need to touch all pixels to overwrite areas of the reflected scene that should not be visible.
How can I do my own perspective scaling?
OpenGL multiplies your coordinates by the ModelView matrix, then by the Projection matrix to get clip coordinates. It then performs the perspective divide to obtain normalized device coordinates. It's the perspective division step that creates a perspective rendering, with geometry in the distance appearing smaller than the geometry in the foreground. The perspective division stage is accomplished by dividing your XYZ clipping coordinate values by the clipping coordinate W value, such as:
Xndc = Xcc/Wcc Yndc = Ycc/Wcc Zndc = Zcc/Wcc
To do your own perspective correction, you need to obtain the clipping coordinate W value. The feedback buffer provides homogenous coordinates with XYZ in device coordinates and W in clip coordinates. You might also glGetFloatv(GL_CURRENT_RASTER_POSITION,…) and the W value will again be in clipping coordinates, while XYZ are in device coordinates.