Performance

This section offers advice on making your OpenGL programs go faster and run more smoothly.

Measuring Performance

Perhaps the most common error in measuring the performance of an OpenGL program is to say something like:

 start_the_clock();
 draw_a_bunch_of_polygons();
 stop_the_clock();
 swapbuffers();

This fails because OpenGL implementations are almost always pipelined - that is to say, things are not necessarily drawn when you tell OpenGL to draw them - and the fact that an OpenGL call returned doesn't mean it finished rendering.

Typically, there is a large FIFO buffer on the front end of the graphics card. When your application code sends polygons to OpenGL, the driver places them into the FIFO and returns to your application. Sometime later, the graphics processor picks up the data from the FIFO and draws it.

Hence, measuring the time it took to pass the polygons to OpenGL tells you very little about performance. It is wise to use a 'glFinish' command between drawing the polygons and stopping the clock - this theoretically forces all of the rendering to be completed before it returns - so you get an accurate idea of how long the rendering took. That works to a degree - but experience suggests that not all implementations have literally completed processing by then.

Worse still, there is the possibility that when you started drawing polygons, the graphics card was already busy because of something you did earlier. Even a 'swapbuffer' call can leave work for the graphics system to do that can hang around and slow down subsequent operations in a mysterious fashion. The cure for this is to put a 'glFinish' call before you start the clock as well as just before you stop the clock.
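As a concrete illustration, here is a minimal sketch of the glFinish-bracketed timing described above, assuming a POSIX clock_gettime() timer; draw_a_bunch_of_polygons() is a placeholder for whatever drawing code you want to measure.

 #include <stdio.h>
 #include <time.h>
 #include <GL/gl.h>
 
 extern void draw_a_bunch_of_polygons(void);   /* placeholder: the code being measured */
 
 static double now_seconds(void)
 {
     struct timespec ts;
     clock_gettime(CLOCK_MONOTONIC, &ts);      /* POSIX monotonic clock */
     return ts.tv_sec + ts.tv_nsec * 1e-9;
 }
 
 void time_the_drawing(void)
 {
     glFinish();                               /* drain any work queued earlier  */
     double start = now_seconds();
 
     draw_a_bunch_of_polygons();
 
     glFinish();                               /* wait for this batch to finish  */
     printf("drawing took %.3f ms\n", (now_seconds() - start) * 1000.0);
 }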

However, eliminating these overlaps in time can also be misleading, because a real application keeps the CPU and the graphics card working in parallel, and fully serialised timings hide that overlap. If you measure the time it takes to draw 1000 triangles (with a glFinish in front and behind), you'll probably be happy to discover that it doesn't take twice as long to draw 2000 of them - but such isolated numbers do not extrapolate reliably.

Graphics cards are quite complex parallel systems and it's exceedingly hard to measure precisely what's going on.

The best practical solution is to measure the time between consecutive returns from the swapbuffer command with your entire application running. You can then adjust one piece of the code at a time, remeasure and get a reasonable idea of the practical improvements you are getting as a result of your optimisation efforts.
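A sketch of that approach, again assuming a POSIX timer; swap_buffers() is a placeholder for whatever your windowing layer provides (glutSwapBuffers, SDL_GL_SwapWindow, and so on). In practice you would average over a reasonable number of frames rather than printing every one.

 #include <stdio.h>
 #include <time.h>
 
 extern void swap_buffers(void);        /* placeholder for your windowing layer's swap call */
 
 static double now_seconds(void)
 {
     struct timespec ts;
     clock_gettime(CLOCK_MONOTONIC, &ts);
     return ts.tv_sec + ts.tv_nsec * 1e-9;
 }
 
 static double last_swap = -1.0;
 
 void end_of_frame(void)
 {
     swap_buffers();                    /* measure from return to return of the swap */
     double now = now_seconds();
     if (last_swap >= 0.0)
         printf("frame time: %.2f ms\n", (now - last_swap) * 1000.0);
     last_swap = now;
 }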

Even when you do this, you may have to take a little care - some systems force the swapbuffers command to wait for the next video vertical retrace before performing the swap. If that is the case, you'll only ever see times that are an exact multiple of the video frame time, and it will be impossible to see exactly how much time you are actually consuming. Most PC graphics adaptors do not do this by default, so you would probably have had to take an active step to turn this feature on. If you have (say) a 60Hz monitor and your times all come out at either 16.66ms, 33.33ms or 50ms, suspect that this is what is happening.
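If your measured times do cluster at multiples of the refresh period, one option is to turn swap synchronisation off while profiling. A Windows-only sketch, assuming the WGL_EXT_swap_control extension is available (other platforms have equivalents such as GLX_EXT_swap_control):

 #include <windows.h>
 
 typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);
 
 /* Ask the driver not to wait for vertical retrace while profiling. */
 void disable_vsync_for_profiling(void)
 {
     PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
         (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
     if (wglSwapIntervalEXT != NULL)
         wglSwapIntervalEXT(0);     /* 0 = swap immediately, 1 = wait for retrace */
 }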

FPS vs. Frame Time

It is common for people to measure FPS, which stands for frames per second. Most gamers and new programmers treat this as the measure of rendering speed. New programmers render a cube and see an FPS of 2000. They add a little more complexity and suddenly the FPS drops to 200, and they can't understand what went wrong.

First, FPS is not a great way to measure performance because it is not linear. It is better to measure Frame Time, which is 1/FPS:

at 2000 FPS, the Frame Time is 1/2000 = 0.0005 seconds

at 200 FPS, the Frame Time is 1/200 = 0.0050 seconds

You are in fact already measuring Frame Time, but you are not paying attention to it: you compute FPS = 1/(Frame Time) and turn your attention to that value instead.

Let's take another example: say the FPS is 180. You add a few models to your scene and end up with 160. How bad is that? Yes, you lost 20 frames per second, but how much longer is each frame now taking?

1/180 = 0.00556 seconds

1/160 = 0.00625 seconds

0.00625 - 0.00556 = 0.00069 seconds (the difference)

Let's continue the example. Assume your FPS is 60 and you add some models and your FPS drops to 40. How bad is that?

1/60 = 0.01667 seconds

1/40 = 0.02500 seconds

0.02500 - 0.01667 = 0.00833 seconds (the difference)

Notice how much longer each frame is now taking: this 20 FPS drop costs over ten times as much extra frame time as the 20 FPS drop from 180 to 160.
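The same arithmetic as the worked examples above, as a tiny stand-alone program:

 #include <stdio.h>
 
 /* Convert a frame rate into the time each frame takes, in milliseconds. */
 static double frame_time_ms(double fps)
 {
     return 1000.0 / fps;
 }
 
 int main(void)
 {
     printf("180 -> 160 FPS costs %.2f ms more per frame\n",
            frame_time_ms(160.0) - frame_time_ms(180.0));   /* about 0.69 ms */
     printf(" 60 ->  40 FPS costs %.2f ms more per frame\n",
            frame_time_ms(40.0)  - frame_time_ms(60.0));    /* about 8.33 ms */
     return 0;
 }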

Understanding where the bottlenecks are

There are generally four things to look at initially.

  1. CPU performance: If your code is so slow that it's not feeding the graphics pipe at the maximum rate it could go - then improving the nature of the things you actually draw won't help much.
  2. Bus bandwidth: There is a finite limit to the rate at which data can be sent from the CPU to the graphics card. If you need too much data to describe what you have to draw, you may be unable to keep the graphics card busy simply because you can't get it the data it needs. Consider using techniques like display lists to place the bulky data onto the graphics card just once, so that it can be rendered with a compact command such as 'glCallList' (see the sketch after this list).
  3. Vertex performance: The first thing of any significance that happens on the graphics card is vertex processing. A vertex shader can be a bottleneck. This is usually easy to diagnose by replacing the shader with something simpler (for example, one without lighting calculations) and seeing if things speed up. If they do, the odds are good that you are vertex limited. If not...not.
  4. Fragment performance: After vertex processing, the polygons are chopped up into fragments (typically the size of a pixel) and the fragment processing takes over. Fragment processing takes longer for large polygons than for small ones - and generally gets slower the more textures you use - and the more complex your fragment shader is (if you are using one).
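Here is a minimal sketch of the display-list idea from item 2, assuming the compatibility profile (display lists are not available in the core profile); submit_bulky_geometry() is a placeholder for your own drawing code.

 #include <GL/gl.h>
 
 extern void submit_bulky_geometry(void);   /* placeholder: your glBegin/glVertex/... calls */
 
 static GLuint scene_list;
 
 void build_scene_once(void)
 {
     scene_list = glGenLists(1);
     glNewList(scene_list, GL_COMPILE);      /* record the geometry on the card once */
     submit_bulky_geometry();
     glEndList();
 }
 
 void draw_scene(void)
 {
     glCallList(scene_list);                 /* a single compact command per frame   */
 }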

If you are fragment processing bound, a really simple test is to reduce the size of the window you are rendering into down to the size of a postage stamp and see if your frame rate improves. If it does then you are at least partially fill rate limited - if it doesn't then that's not the problem. This is such a simple test that it should always be the first thing you try.
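If resizing the window is awkward, a rough approximation of the postage-stamp test is to shrink the viewport for a few frames and compare frame times. A sketch (the 64x64 size is arbitrary):

 #include <GL/gl.h>
 
 /* Render into a tiny viewport to slash the number of fragments produced. */
 void use_postage_stamp_viewport(int enable, int window_width, int window_height)
 {
     if (enable)
         glViewport(0, 0, 64, 64);
     else
         glViewport(0, 0, window_width, window_height);   /* restore the full window */
 }

Note that screen clears still cover the full window, so this only approximates the real test.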

There are more subtleties here - but this is a start.

Optimising Performance

Toolkits for understanding Performance

OpenGL Benchmarks

* C vs Perl
* Perl vs Python
* POGL vs SDL::OpenGL
* Windows vs Linux