Does anybody know how to speed up pixel data transfers to and from the graphics card without spending CPU cycles on them?

I am trying to develop a test application on a TI OMAP3530 BeagleBoard that loads 24-bit BMP images, rotates each image, and displays the rotated result.

I modified one of the OpenVG training samples (OVGPatternFill.cpp) that came with OMAP35x_Graphics_SDK_3_00_00_06. To load BMP files, I added the loadBMP() function from
http://www.videotutorialsrock.com/openg ... t/home.php.
I built and ran the code on the BeagleBoard and found that the bottleneck is the transfer from system memory to GPU memory, which is done by vgImageSubData(). Transferring one 512x512 BMP image from local memory to the GPU took around 500 msec.

I have heard that Pixel Buffer Objects (PBOs) are often suggested for faster image transfers (http://www.songho.ca/opengl/gl_pbo.html), but it seems that PBOs are not supported by OpenGL ES / OpenVG.

Any help would be appreciated!

Thank you.