VBO Test - glBufferData vs glBufferSubData vs glMapBufferOES



John
03-31-2011, 09:29 AM
Hi,

I am John, and new to this forum. I want to start by saying, I love OpenGL ES. There are few things that the more I use, the more I fall in love with; C and OpenGL ES are among them.

Here is my question:
I have been running a lot of tests to decide how I should set up the base rendering engine. Right now I am not using buffers, but I have been considering them for quite a while. I have found that glMapBufferOES is slower than glBufferSubData, which is slower than glBufferData, and I am wondering why that is.

My test environment:
OpenGL ES 1.1
iPad (first generation) - PowerVR SGX
iPhone 3G - PowerVR MBX

To my understanding, glBufferSubData should always be faster than glBufferData, because glBufferData reallocates the memory each time it is called. So if your size doesn't change, use glBufferSubData; otherwise use glBufferData. What I have found is that glBufferSubData runs at about 68% of the speed of using only glBufferData.

I also understand that glMapBufferOES is an extension, but I have found that it runs slower than both glBufferSubData and glBufferData. It is, in fact, the slowest way of updating vertex information.

Overall:
iPad speed: glMapBufferOES < glBufferSubData < glBufferData <= No buffers
iPhone speed: glMapBufferOES ≈ glBufferSubData ≈ glBufferData ≈ No buffers

Is this normal?

Thanks for your help!

jpilon
03-31-2011, 04:09 PM
There might be inefficiencies in the implementation of glMapBufferOES and glBufferSubData on the iPad. If things are optimal I would expect:

glMapBufferOES ? glBufferSubData ? glBufferData

Note that some drivers optimize uploads through glBufferData when the size is the same as in the previous call, which allows them to skip the re-allocation. You're likely hitting that optimized case. Otherwise, it would be much slower than glBufferSubData and glMapBufferOES. You can try removing or adding one vertex each frame, and I bet you'd see a difference.

John
04-01-2011, 09:05 AM
Thanks for the reply!

I took your suggestion and randomized the quantity of data I sent to OpenGL (seeding prior to each test, of course). Unfortunately I came up with the same results.

Note: The values below are frames per second, averaged over 300 tests.

Key:


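Update method          Primitive     Texturing
I = Not buffered       P = Points    N = Not textured
M = glMapBufferOES     Q = Quad      Y = Textured
S = glBufferSubData
V = glBufferData

Each column name combines one letter from each group, e.g. MQY = glMapBufferOES, quads, textured.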


iPad



Count IPN IPY IQN IQY MPN MPY MQN MQY SPN SPY SQN SQY VPN VPY VQN VQY
256 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00
384 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00
512 60.00 60.00 60.00 60.00 60.00 51.51 60.00 60.00 60.00 51.79 60.00 60.00 60.00 60.00 60.00 60.00
768 60.00 53.95 60.00 60.00 60.00 40.53 60.00 56.76 60.00 40.54 60.00 57.18 60.00 54.07 60.00 60.00
1024 60.00 42.78 60.00 60.00 60.00 33.26 60.00 48.07 60.00 33.41 60.00 48.31 60.00 42.74 60.00 60.00
1532 60.00 29.96 60.00 48.71 60.00 24.60 60.00 36.71 60.00 24.69 59.87 36.72 60.00 30.09 60.00 48.23
2048 60.00 23.13 60.00 37.88 54.96 19.55 51.04 29.70 55.59 19.65 51.45 29.77 60.00 23.00 60.00 37.86
3072 60.00 15.85 60.00 26.11 43.30 13.82 39.87 21.40 43.71 13.87 39.97 21.35 60.00 15.81 60.00 26.18
4096 54.32 12.09 49.16 18.79 35.81 10.74 32.81 16.83 36.28 10.78 33.06 16.82 54.30 12.08 49.17 19.98
6144 40.51 8.19 33.97 12.68 26.44 7.40 23.86 11.71 26.76 7.43 24.09 11.74 40.36 8.18 36.56 13.60
8192 31.51 6.19 26.08 9.58 21.02 5.65 19.00 9.06 21.27 5.67 19.21 9.08 31.47 6.19 28.72 10.27
10500 25.23 4.85 20.42 7.53 17.02 4.46 15.12 7.14 17.28 4.47 15.31 7.18 25.14 4.86 22.84 8.05



iPad (Count - rand() % 10)



Count IPN IPY IQN IQY MPN MPY MQN MQY SPN SPY SQN SQY VPN VPY VQN VQY
256 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00
384 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00 60.00
512 60.00 60.00 60.00 60.00 60.00 51.56 60.00 60.00 60.00 51.89 60.00 60.00 60.00 60.00 60.00 60.00
768 60.00 54.64 60.00 60.00 60.00 40.56 60.00 56.29 60.00 40.53 60.00 56.74 60.00 53.90 60.00 60.00
1024 60.00 42.79 60.00 60.00 60.00 33.30 60.00 47.73 60.00 33.47 60.00 47.97 60.00 42.92 60.00 60.00
1532 60.00 30.14 60.00 48.39 60.00 24.61 59.68 36.39 60.00 24.70 59.83 36.41 60.00 30.20 60.00 48.81
2048 60.00 22.96 60.00 37.58 54.67 19.54 50.84 29.34 55.50 19.64 51.20 29.48 60.00 23.01 60.00 37.14
3072 60.00 15.88 60.00 25.78 43.05 13.81 39.71 21.14 43.64 13.87 39.71 21.12 60.00 15.87 60.00 25.69
4096 53.66 12.08 48.94 18.50 35.64 10.73 32.70 16.62 36.15 10.77 32.96 16.70 54.47 12.09 48.96 19.65
6144 40.48 8.17 33.59 12.55 26.38 7.40 23.80 11.57 26.68 7.43 23.87 11.61 40.33 8.20 36.45 13.37
8192 31.54 6.20 25.88 9.46 20.93 5.65 18.92 8.93 21.19 5.67 19.09 8.97 31.30 6.18 28.50 10.14
10500 24.97 4.85 20.38 7.43 16.98 4.45 15.12 7.04 17.20 4.46 15.12 7.07 25.24 4.85 22.82 7.94



iPhone



Count IPN IPY IQN IQY MPN MPY MQN MQY SPN SPY SQN SQY VPN VPY VQN VQY
256 30.00 29.23 20.77 20.12 30.00 29.38 20.22 19.18 30.00 29.24 20.22 19.29 30.00 29.33 20.11 20.18
384 30.00 27.21 14.82 14.47 30.00 27.29 15.60 14.46 30.00 27.36 15.41 14.36 30.00 27.12 14.84 14.38
512 30.00 24.60 12.11 11.32 30.00 24.77 12.09 11.29 30.00 24.70 12.10 11.28 30.00 24.63 12.08 11.26
768 28.61 18.41 8.21 7.74 28.58 18.63 8.22 7.74 28.64 18.30 8.21 7.73 28.66 18.30 8.20 7.72
1024 27.97 17.07 6.17 5.85 27.67 17.06 6.17 5.84 27.36 17.08 6.15 5.81 27.29 16.99 6.15 5.81
1532 26.06 12.54 4.06 3.86 25.70 12.53 4.06 3.86 26.19 12.55 4.06 3.85 25.69 12.47 4.05 3.84
2048 21.28 9.79 2.94 2.80 21.40 9.80 2.94 2.79 21.40 9.77 2.93 2.79 21.26 9.76 2.91 2.79
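
To compare the columns above, it can help to turn frames per second into a relative speed and a per-frame time. The following is a minimal sketch, not part of the original test code; the function names are made up for illustration. Values of 60.00 (iPad) and 30.00 (iPhone) look like a frame-rate cap, so a comparison is only meaningful on rows where both columns fall below it.

```c
/* Hypothetical helpers for reading the FPS tables; not from the original post. */

/* Time spent on one frame, in milliseconds, for a given frame rate. */
double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

/* Speed of one method relative to a baseline (1.0 means equal speed). */
double relative_speed(double fps, double baseline_fps)
{
    return fps / baseline_fps;
}

/* Extra milliseconds per frame that a method costs compared to a baseline. */
double extra_ms_per_frame(double fps, double baseline_fps)
{
    return frame_time_ms(fps) - frame_time_ms(baseline_fps);
}
```

For example, in the first iPad table at Count 4096, relative_speed(36.28, 54.30) for SPN against VPN comes out at roughly 0.67, which is in line with the "about 68%" figure mentioned in the first post.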

John
04-01-2011, 09:11 AM
Whoops, hit submit instead of preview and it won't let me edit :-(. I was trying to align the tables properly.

*I should also note that 'quads' refers to GL_TRIANGLE_STRIP with 6 * count - 2 vertices.*
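
One way to arrive at 6 * count - 2 is four corner vertices per quad plus two degenerate vertices joining each pair of consecutive quads in the strip: 4n + 2(n - 1) = 6n - 2. The sketch below assumes that layout, which the post does not spell out; the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: number of vertices in a single GL_TRIANGLE_STRIP that
 * draws quad_count separate quads, ASSUMING 4 vertices per quad and
 * 2 degenerate (repeated) vertices between consecutive quads. */
size_t strip_vertex_count(size_t quad_count)
{
    if (quad_count == 0) {
        return 0; /* nothing to draw */
    }
    /* 4n corner vertices + 2(n - 1) linking vertices = 6n - 2 */
    return 4 * quad_count + 2 * (quad_count - 1);
}
```

Under that assumption, a single quad needs 4 vertices and the 2048-quad test needs 12286.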

I am just really confused why glBufferSubData could ever be slower than glBufferData.

Xmas
04-04-2011, 05:08 AM
John wrote: "I am just really confused why glBufferSubData could ever be slower than glBufferData."
Are you replacing the entire buffer contents or just a subset?

John
04-04-2011, 07:17 AM
Whoops, sorry, I forgot to explain.

The count in my tests is the quantity that I am actually updating and drawing. The real amount in memory is always the "next power of two".

Ex:
Count (update and send) -> in Memory
256 -> 256
384 -> 512
512 -> 512
768 -> 1024
1024 -> 1024
...
etc.

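/* Setup: _sendCount is what gets updated and drawn; the buffer is sized to the next power of two. */
info->_sendCount = count;
info->_maxCount = nextPowerOfTwo(count);


void render()
...
    if (info->_lastMaxCount != info->_maxCount)
    {
        /* Capacity changed: (re)allocate the buffer store at the new size. */
        info->_lastMaxCount = info->_maxCount;
        glBufferData(GL_ARRAY_BUFFER, dataSize * info->_maxCount, info->_vertices, GL_DYNAMIC_DRAW);
    }
    else
    {
        /* Same capacity: overwrite only the first _sendCount elements in place. */
        glBufferSubData(GL_ARRAY_BUFFER, 0, dataSize * info->_sendCount, info->_vertices);
    }
...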
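
The count-to-capacity mapping listed in this post can be reproduced with a small rounding helper. The post only shows the call to nextPowerOfTwo(), so the body below is an assumed implementation, given as a sketch rather than the code actually used in the tests.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed implementation of the nextPowerOfTwo() helper named in the post:
 * returns the smallest power of two that is >= n.
 * Returns 1 for n == 0, and 0 if the result does not fit in size_t. */
size_t nextPowerOfTwo(size_t n)
{
    size_t p = 1;
    while (p < n) {
        if (p > SIZE_MAX / 2) {
            return 0; /* doubling again would overflow */
        }
        p <<= 1;
    }
    return p;
}
```

With this version, 256 stays at 256, 384 rounds up to 512, and 768 rounds up to 1024, matching the examples given in the post.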


This should give glBufferSubData an advantage on half of the tests (unless glBufferData is optimized as jpilon mentioned). Even when it is the full buffer, though, shouldn't it be at least equal in speed?

If you want, I can post the code, I don't mind sharing :-).