From OpenGL Wiki
Revision as of 02:08, 30 June 2006 by SteveBaker (talk | contribs)

This section offers advice on making your OpenGL programs run faster and more smoothly.

Measuring Performance

Perhaps the most common error in measuring the performance of an OpenGL program is to write something like:

 start_the_clock () ;
 draw_a_bunch_of_polygons () ;
 stop_the_clock () ;
 swapbuffers () ;

This fails because OpenGL implementations are almost always pipelined - that is to say, things are not necessarily drawn when you tell OpenGL to draw them - and the fact that an OpenGL call returned doesn't mean it finished rendering.

Typically, there is a large FIFO buffer on the front end of the graphics card. When your application code sends polygons to OpenGL, the driver places them into the FIFO and returns to your application. Some time later, the graphics processor picks up the data from the FIFO and draws it.

Hence, measuring the time it took to pass the polygons to OpenGL tells you very little about performance. It is wise to put a 'glFinish' call between drawing the polygons and stopping the clock. In theory, 'glFinish' does not return until all previously issued rendering has completed, so you get an accurate idea of how long the rendering took. That works to a degree - but experience suggests that not all implementations have literally completed processing by then.

Worse still, there is the possibility that when you started drawing polygons, the graphics card was already busy because of something you did earlier. Even a 'swapbuffers' call can leave work for the graphics system to do that can hang around and slow down subsequent operations in a mysterious fashion. The cure for this is to put a 'glFinish' call just before you start the clock as well as just before you stop it.
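
The bracketing described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive benchmark harness: 'time_drawing', 'now_seconds', 'fake_finish' and 'fake_draw' are hypothetical names introduced here, and the two stand-in functions exist only so that the sketch compiles without OpenGL headers. In a real program you would pass 'glFinish' (or a thin wrapper around it) and your own drawing routine.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Stand-ins (assumptions) so the sketch compiles without OpenGL headers.
   fake_finish only counts how often it was called. */
int finish_calls = 0;
void fake_finish(void) { finish_calls++; }
void fake_draw(void) { /* issue your drawing commands here */ }

/* Times draw() with the pipeline drained on both sides.
   'finish' is expected to block until all pending rendering is done. */
double time_drawing(void (*finish)(void), void (*draw)(void))
{
    double start;

    assert(finish != NULL && draw != NULL);

    finish();                 /* drain work left over from earlier calls */
    start = now_seconds();    /* start the clock */
    draw();
    finish();                 /* wait for the work issued by draw() */
    return now_seconds() - start;
}
```

Calling 'time_drawing (fake_finish, fake_draw)' exercises the control flow only; the number it returns means nothing until real rendering calls are substituted.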

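However, eliminating these overlaps in time can also be misleading, because a real application benefits from exactly that overlap. If you measure the time it takes to draw 1000 triangles (with a 'glFinish' before and after), you'll probably be happy to discover that it doesn't take twice as long to draw 2000 of them, which suggests that part of what you measured was fixed overhead rather than per-triangle work.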
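
One way to read two such measurements is to assume (and it is only an assumption, not something the hardware guarantees) that the time is roughly a fixed overhead plus a cost per triangle. The sketch below fits that straight line through two timings; 'CostModel' and 'fit_cost_model' are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Assumed model: time(n) = overhead + per_triangle * n. */
typedef struct {
    double overhead;      /* fixed cost per measurement, same unit as t1/t2 */
    double per_triangle;  /* extra cost for each additional triangle */
} CostModel;

/* Fits the assumed linear model through two (triangle count, time) samples. */
CostModel fit_cost_model(double n1, double t1, double n2, double t2)
{
    CostModel m;

    assert(n1 != n2);     /* two different triangle counts are required */

    m.per_triangle = (t2 - t1) / (n2 - n1);
    m.overhead = t1 - m.per_triangle * n1;
    return m;
}
```

With made-up timings of 3 ms for 1000 triangles and 5 ms for 2000, the fit gives about 1 ms of fixed overhead and about 0.002 ms per triangle. Real cards are parallel systems and will not follow a straight line exactly, so treat the result as a rough guide.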

Graphics cards are quite complex parallel systems and it's exceedingly hard to measure precisely what's going on.

The best practical solution is to measure the time between consecutive returns from the 'swapbuffers' call with your entire application running. You can then adjust one piece of the code at a time, remeasure and get a reasonable idea of the practical improvements you are getting as a result of your optimisation efforts.
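
A minimal sketch of that measurement is shown below. 'FrameTimer', 'frame_timer_tick' and 'frame_timer_average' are hypothetical names; the sketch assumes the caller reads a wall clock immediately after each buffer swap returns and passes the timestamp in, so it does not depend on any particular timing or windowing API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double last_time;  /* timestamp of the previous swap return, in seconds */
    double total;      /* sum of all measured frame intervals */
    int    frames;     /* number of intervals measured so far */
    int    started;    /* 0 until the first timestamp has been seen */
} FrameTimer;

/* Call once per frame, right after the buffer swap returns. */
void frame_timer_tick(FrameTimer *ft, double now)
{
    assert(ft != NULL);

    if (ft->started) {
        ft->total += now - ft->last_time;  /* interval between two swaps */
        ft->frames++;
    }
    ft->last_time = now;
    ft->started = 1;
}

/* Average frame time in seconds, or 0.0 before two ticks have occurred. */
double frame_timer_average(const FrameTimer *ft)
{
    assert(ft != NULL);
    return ft->frames > 0 ? ft->total / ft->frames : 0.0;
}
```

Start from a zero-initialised 'FrameTimer', change one piece of the application, and compare the average over a few hundred frames before and after the change.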

Understanding where the bottlenecks are

There are generally four things to look at initially.

  1. Your CPU performance. If your code is so slow that it's not feeding the graphics pipe at the maximum rate it could go - then improving the nature of the things you actually draw won't help much.
  2. Bus bandwidth. There is a finite limit to the rate at which data can be sent from the CPU to the graphics card. If you require too much data to describe what you have to draw - then you may be unable to keep the graphics card busy simply because you can't get it the data it needs. Consider using techniques like compiled vertex arrays and display lists to place the bulky data onto the graphics card just once so you can cause it to be rendered with a compact command such as 'glCallList'.
  3. Vertex performance. The first thing of any significance that happens on the graphics card is vertex processing. If you are using the standard pipeline then lighting and vertex transformation are done here - if you are using a vertex shader then running the vertex shader can be a bottleneck. This is usually easy to diagnose by replacing the shader with something simpler (for example, one without lighting calculations) and seeing if things speed up. If they do - then the odds are good that you are vertex limited. If not, the bottleneck is probably elsewhere.
  4. Fragment performance. After vertex processing, the polygons are chopped up into fragments (typically the size of a pixel) and the fragment processing takes over. Fragment processing takes longer for large polygons than for small ones - and generally gets slower the more textures you use - and the more complex your fragment shader is (if you are using one).
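
For the bus bandwidth item above, a back-of-envelope estimate is often enough to tell whether resending the data every frame is plausible. The sketch below is only arithmetic; 'vertex_bytes_per_second' and 'exceeds_bus_budget' are hypothetical names, and the bus figure you compare against is something you must look up for your own hardware.

```c
#include <assert.h>

/* Bytes per second needed if all vertex data is resent every frame. */
double vertex_bytes_per_second(double vertices_per_frame,
                               double bytes_per_vertex,
                               double frames_per_second)
{
    assert(vertices_per_frame >= 0.0);
    assert(bytes_per_vertex >= 0.0);
    assert(frames_per_second >= 0.0);

    return vertices_per_frame * bytes_per_vertex * frames_per_second;
}

/* Returns 1 if the required traffic is more than the bus can carry. */
int exceeds_bus_budget(double bytes_per_second, double bus_bytes_per_second)
{
    return bytes_per_second > bus_bytes_per_second;
}
```

For example, 100,000 vertices of 32 bytes each at 60 frames per second come to 192,000,000 bytes per second. These figures are invented for illustration; if your own estimate approaches what the bus can sustain, keeping the data on the card becomes attractive.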

To check whether you are fragment processing bound, a really simple test is to reduce the size of the window you are rendering into down to the size of a postage stamp and see if your frame rate improves. If it does then you are at least partially fill rate limited - if it doesn't then that's not the problem. This is such a simple test that it should always be the first thing you try.
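
The window-size test can be turned into a simple comparison of two measured frame times. In this sketch 'looks_fill_limited' and its 'min_gain' threshold are assumptions introduced for illustration; the threshold is there only to ignore measurement noise and has no standard value.

```c
#include <assert.h>

/* Compares the frame time at full window size with the frame time in a
   tiny window. Returns 1 if shrinking the window cut the frame time by
   more than the fraction min_gain (e.g. 0.1 for 10 percent). */
int looks_fill_limited(double full_window_ms, double tiny_window_ms,
                       double min_gain)
{
    double gain;

    assert(full_window_ms > 0.0);
    assert(tiny_window_ms >= 0.0);

    gain = (full_window_ms - tiny_window_ms) / full_window_ms;
    return gain > min_gain;
}
```

A frame that drops from 20 ms to 8 ms in the tiny window is a strong hint of fill rate limiting, whereas 20 ms against 19.5 ms points elsewhere.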

There are more subtleties here - but this is a start.

Optimising Performance

Toolkits for understanding Performance