Vertex Specification Best Practices

From OpenGL Wiki


See VBO for general details.

Size of a VBO/IBO

  • How small or how large should a VBO be?

You can make it as small as you like, but it is better to put many objects into one VBO, which reduces the number of calls you make to glBindBuffer, glVertexPointer and other GL functions.

You can also make it as large as you want, but keep in mind that if it is too large, it might not be stored in VRAM, or the driver might fail to allocate your VBO and report GL_OUT_OF_MEMORY.

According to one nVidia document, 1MB to 4MB is a nice size because the driver can do memory management more easily. The same probably holds for other implementations such as ATI/AMD, Intel and SiS.
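
Packing many objects into one VBO mostly comes down to tracking byte offsets. The following is a minimal sketch of such bookkeeping; the SharedVbo type, shared_vbo_alloc function and VBO_ALLOC_FAILED value are hypothetical names invented for illustration, and no GL calls are made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one large VBO shared by many objects.
   Only byte offsets are tracked here; no GL calls are made. */
typedef struct {
    size_t capacity;  /* total size given to glBufferData, in bytes */
    size_t used;      /* bytes handed out so far */
} SharedVbo;

#define VBO_ALLOC_FAILED ((size_t)-1)

/* Reserve `bytes` for one object, aligned to `alignment` (a power of two).
   Returns the byte offset of the object inside the VBO, or
   VBO_ALLOC_FAILED if the object does not fit. */
static size_t shared_vbo_alloc(SharedVbo *vbo, size_t bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    size_t start = (vbo->used + alignment - 1) & ~(alignment - 1);
    if (start > vbo->capacity || bytes > vbo->capacity - start)
        return VBO_ALLOC_FAILED;
    vbo->used = start + bytes;
    return start;
}
```

The returned offset is what you would pass as the offset to glBufferSubData when uploading the object, and later as the ptr argument of the gl*Pointer() calls when drawing it.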

Formatting VBO Data

VBOs are quite flexible in how you use them. For instance, there are a number of ways you can represent vertex attribute data in VBOs:

One option is to allocate, for each batch (draw call), a separate VBO per vertex attribute. This is certainly possible. If you have vertex, normal, and color as vertex attributes, pictorially this is: (VVVV) (NNNN) (CCCC)
Another approach is to store the vertex attribute blocks for a batch one right after the other in the same VBO. When specifying the vertex attributes via gl*Pointer() calls, you'd pass byte offsets into the VBO as the ptr parameters. Pictorially, this is: (VVVVNNNNCCCC).
Yet another approach is to interleave the vertex attributes for each vertex in a batch and store these interleaved vertex blocks sequentially, again combining all the vertex attributes into a single buffer. As before, you'd pass byte offsets into the VBO as the gl*Pointer() ptr parameters, but you'd also use the stride parameter so that each vertex attribute array access only touches the elements of that attribute array. Pictorially, this option is: (VNCVNCVNCVNC)
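
The three layouts differ only in the byte offset and stride handed to the gl*Pointer() calls. The sketch below computes both for each layout. It assumes every attribute is 3 floats (the text does not say how large the color is), and the AttribRange type and layout_* functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Where one attribute array starts and how far apart its elements are. */
typedef struct {
    size_t offset;  /* byte offset for the gl*Pointer "ptr" argument */
    size_t stride;  /* byte stride for the gl*Pointer "stride" argument */
} AttribRange;

enum { ATTRIB_V = 0, ATTRIB_N = 1, ATTRIB_C = 2, ATTRIB_COUNT = 3 };

/* Assumption for this sketch: every attribute is 3 floats. */
#define ATTRIB_BYTES (3 * sizeof(float))

/* (VVVV) (NNNN) (CCCC): each attribute has its own VBO and starts at 0. */
static AttribRange layout_separate(void)
{
    AttribRange r = { 0, ATTRIB_BYTES };
    return r;
}

/* (VVVVNNNNCCCC): attribute blocks stored back to back in one VBO. */
static AttribRange layout_blocked(int attrib, size_t vertex_count)
{
    assert(attrib >= 0 && attrib < ATTRIB_COUNT);
    AttribRange r = { (size_t)attrib * vertex_count * ATTRIB_BYTES,
                      ATTRIB_BYTES };
    return r;
}

/* (VNCVNCVNCVNC): attributes interleaved per vertex in one VBO. */
static AttribRange layout_interleaved(int attrib)
{
    assert(attrib >= 0 && attrib < ATTRIB_COUNT);
    AttribRange r = { (size_t)attrib * ATTRIB_BYTES,
                      ATTRIB_COUNT * ATTRIB_BYTES };
    return r;
}
```

For a batch of 4 vertices with 4-byte floats, the normals start at byte 48 with a stride of 12 in the blocked layout, and at byte 12 with a stride of 36 in the interleaved layout.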

Now this is just a single batch. There's also nothing stopping you from storing the vertex attribute data for multiple batches inside a single VBO or set of VBOs.

The optimal layout depends on the specific GPU and driver (plus OpenGL implementation), but we can apply a little common sense to help steer our choices. First, vectorize your data, ideally into vectors of four or eight 4-byte (i.e. float or int) components, since that makes them cache-friendly and memory-request-friendly, and makes it easier for modern GPUs to exploit instruction-level parallelism. That means that your 3D position, normal and color information, for example, should be padded into 4D vectors. Choose the padding value to suit the operations you perform: with a last component of 0, operations such as the dot product and vector length give the same result as on the original 3D vector, whereas a nonzero last component changes those results. A position that is multiplied by a 4x4 transformation matrix, on the other hand, needs a last component of 1 for the translation to apply. If you use 3-component vectors instead, some GPU drivers will automatically do the padding for you, while others will simply suffer from reduced throughput and increased cache misses.
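
The padding step described above could be done on the CPU before upload. This is a minimal sketch; the Vec3 and Vec4 types and the pad_vec3_to_vec4 function are hypothetical names, and the choice of fill value is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;     /* 12 bytes with 4-byte floats */
typedef struct { float x, y, z, w; } Vec4;  /* 16 bytes with 4-byte floats */

/* Copy `count` 3-component vectors into 4-component ones, filling the
   extra component with `w` (for example 0 for normals, 1 for positions
   that will be multiplied by a 4x4 matrix). */
static void pad_vec3_to_vec4(const Vec3 *src, Vec4 *dst, size_t count, float w)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[i].x = src[i].x;
        dst[i].y = src[i].y;
        dst[i].z = src[i].z;
        dst[i].w = w;
    }
}
```

The padded array is then what you would upload, with a component count of 4 in the corresponding pointer call.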

Second comes the big question of whether or not to interleave your data. That depends on how the rendering pipeline inside each GPU is set up. Modern GPUs, being SIMD devices, have a limited number of streaming multiprocessors (eight to sixteen, for example), which means they can execute only that many different instructions at the same time. However, each multiprocessor can concurrently run many shader invocations, each performing the same task on a different vertex or pixel.

For example, if the current instruction performs an operation on the vertex normal and you have 100 shader invocations, each of them sends a 16-byte (4 components x 4 bytes) memory fetch request to the memory scheduler. If the requested vertex information is not available in local memory, the scheduler bundles requests to sequential addresses into larger batches before sending them to memory. One request can usually deliver 32, 64, 128 or 256 bytes of sequential data. If the shaders are working on neighboring vertices and the normals are not interleaved with other data, a whole bunch of normals can be retrieved at once (up to 256/16 = 16 normals). If they are interleaved with other data, the data coming back from memory also contains other vertex data, such as position or color, which has to be either discarded or put into a small cache together with the normals. Discarding results in lower throughput, whereas efficient caching implies that both layouts could perform about equally well. There could also be a case where the render pipeline is optimized for the interleaved format, which might yield worse performance for non-interleaved data.

In conclusion, the GPU render pipeline implementation and its use of components such as on-chip shared memory and the geometry and texture caches all play a large role in choosing the right data layout. The best approach is to run benchmarks and then share the results (with GPU name and model) with the community by posting or linking them to this wiki.
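
The 256/16 = 16 figure can be reproduced, and compared with an interleaved layout, using a deliberately simplified model that ignores caches and assumes each request starts at the first vertex. The values_per_request function is a hypothetical helper written for this illustration, not a description of any real memory scheduler.

```c
#include <assert.h>
#include <stddef.h>

/* Number of attribute values that lie completely inside one memory request
   of `request_bytes`, assuming the request starts at the first vertex.
   `offset` is the byte offset of the attribute within a vertex, `size` the
   bytes per attribute value, and `stride` the bytes between vertices. */
static size_t values_per_request(size_t request_bytes, size_t offset,
                                 size_t size, size_t stride)
{
    assert(size > 0 && stride >= size);
    if (request_bytes < offset + size)
        return 0;
    return (request_bytes - offset - size) / stride + 1;
}
```

With non-interleaved 16-byte normals (offset 0, stride 16), a 256-byte request holds 16 normals. If, as an assumed example, each vertex interleaves a 16-byte position, normal and color (normal at offset 16, stride 48), the same request holds only 5 complete normals, and the rest of the fetched bytes belong to other attributes.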

Vertex, normals, texcoords

  • Should you create a separate VBO for each? Would you lose performance?

If your objects are static, then merge them all into as few VBOs as possible for best performance. See the section above for more details on layout considerations.

If only some of the vertex attributes are dynamic, i.e. often changing, placing them in a separate VBO makes updates easier and faster.

For example, if you are simulating water on the CPU, the position of each vertex might change all the time, but its color stays the same.

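EXAMPLE: Multiple Vertex Attribute VBOs Per Batch
  //Bind the VBO that holds only positions (tightly packed, 3 floats each)
  glBindBuffer(GL_ARRAY_BUFFER, vertexVBOID);
  glVertexPointer(3, GL_FLOAT, sizeof(float)*3, NULL);  //Positions start at byte offset 0

  //Bind the VBO that holds interleaved normals and texcoords
  //A stride of 6 floats means 3 normal + 2 texcoord + 1 float of padding per element;
  //use sizeof(float)*5 instead if the data is tightly packed
  glBindBuffer(GL_ARRAY_BUFFER, otherVBOID);
  glNormalPointer(GL_FLOAT, sizeof(float)*6, NULL); //Normals start at byte offset 0
  glTexCoordPointer(2, GL_FLOAT, sizeof(float)*6, (const GLvoid*)(sizeof(float)*3));  //Texcoords start after the 3 normal floats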

Dynamic VBO

  • If the contents of your VBO will be dynamic, should you call glBufferData or glBufferSubData (or glMapBuffer)?

If you will be updating a small section, use glBufferSubData. If you will update the entire VBO, use glBufferData (this information reportedly comes from an nVidia document). However, another approach reputed to work well when updating an entire buffer is to call glBufferData with a NULL pointer, and then glBufferSubData with the new contents. Passing NULL to glBufferData lets the driver know you don't care about the previous contents, so it is free to substitute a totally different buffer, which helps the driver pipeline uploads more efficiently.

Another thing you can do is double-buffer the VBO. This means you make 2 VBOs. On frame N, you update VBO 2 and you render with VBO 1. On frame N+1, you update VBO 1 and you render from VBO 2. This also gives a nice boost in performance on nVidia and ATI/AMD.
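
The alternation between the two VBOs can be driven by the frame counter. This is a minimal sketch of just the selection logic; the DoubleBufferedVbo type and the two functions are hypothetical names, and the buffer names are assumed to come from glGenBuffers elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of buffer names, as returned by glGenBuffers. */
typedef struct {
    unsigned int vbo[2];
} DoubleBufferedVbo;

/* Buffer to write new vertex data into during `frame`. */
static unsigned int vbo_to_update(const DoubleBufferedVbo *d,
                                  unsigned long frame)
{
    assert(d != NULL);
    return d->vbo[frame % 2];
}

/* Buffer to draw from during `frame`: the one that is not being updated. */
static unsigned int vbo_to_render(const DoubleBufferedVbo *d,
                                  unsigned long frame)
{
    assert(d != NULL);
    return d->vbo[(frame + 1) % 2];
}
```

Each frame you would bind the result of vbo_to_update for the upload and the result of vbo_to_render for the draw call, so the two roles swap on the next frame.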

Vertex Layout Specification

A lot of new code gets written this way:

  glBindBuffer(GL_ARRAY_BUFFER, vboID);
  glVertexPointer(3, GL_FLOAT, sizeof(TVertex_VNTWI), info->posOffset);
  glTexCoordPointer(2, GL_FLOAT, sizeof(TVertex_VNTWI), info->texOffset);
  glNormalPointer(GL_FLOAT, sizeof(TVertex_VNTWI), info->nmlOffset);
  int weightPosition = glGetAttribLocation(programID, "blendWeights");
  glVertexAttribPointer(weightPosition, 4, GL_FLOAT, GL_FALSE, sizeof(TVertex_VNTWI), info->weightOffset);
  int indexPosition = glGetAttribLocation(programID, "blendIndices");
  glVertexAttribPointer(indexPosition, 4, GL_UNSIGNED_BYTE, GL_FALSE, sizeof(TVertex_VNTWI), info->indexOffset);
  glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, 0);

and the shader then uses gl_Vertex, gl_Normal and gl_MultiTexCoord0. It is better to use generic vertex attributes for your vertex, normal and texcoord as well, since that is the modern way of specifying your vertex layout. You are already using them for your blendWeights and blendIndices.

In GL 3.1+ core contexts, you are forced to use your own vertex attributes with calls to glVertexAttribPointer.
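
When every attribute is generic, the whole layout can be described as a table of glVertexAttribPointer arguments. The sketch below builds such a table without making GL calls. The source does not show the definition of TVertex_VNTWI, so the struct here is an assumption, and the names inPosition, inNormal and inTexCoord, the AttribDesc type and find_attrib are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed definition; the original TVertex_VNTWI is not shown. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
    float blendWeights[4];
    unsigned char blendIndices[4];
} TVertex_VNTWI;

/* One row per glVertexAttribPointer call. */
typedef struct {
    const char *name;  /* attribute name in the shader */
    int components;    /* "size" argument, 1 to 4 */
    int is_float;      /* 1 for GL_FLOAT, 0 for GL_UNSIGNED_BYTE */
    size_t offset;     /* byte offset of the attribute within the vertex */
} AttribDesc;

static const AttribDesc kAttribs[] = {
    { "inPosition",   3, 1, offsetof(TVertex_VNTWI, position) },
    { "inNormal",     3, 1, offsetof(TVertex_VNTWI, normal) },
    { "inTexCoord",   2, 1, offsetof(TVertex_VNTWI, texcoord) },
    { "blendWeights", 4, 1, offsetof(TVertex_VNTWI, blendWeights) },
    { "blendIndices", 4, 0, offsetof(TVertex_VNTWI, blendIndices) },
};

#define ATTRIB_DESC_COUNT (sizeof(kAttribs) / sizeof(kAttribs[0]))

/* Look up a row by shader attribute name; returns NULL if not found. */
static const AttribDesc *find_attrib(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < ATTRIB_DESC_COUNT; ++i)
        if (strcmp(kAttribs[i].name, name) == 0)
            return &kAttribs[i];
    return NULL;
}

/* With a GL context, each row `a` would be used roughly as:
     GLint loc = glGetAttribLocation(programID, a->name);
     glEnableVertexAttribArray(loc);
     glVertexAttribPointer(loc, a->components,
                           a->is_float ? GL_FLOAT : GL_UNSIGNED_BYTE, GL_FALSE,
                           sizeof(TVertex_VNTWI), (const GLvoid *)a->offset); */
```

Taking the offsets from offsetof keeps the table in step with the struct if fields are added or reordered, and the stride is always sizeof(TVertex_VNTWI).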

See Also