[QUOTE=hujanais;1272133]Hello there,
I have been reading lots about using PBOs to capture screenshots rapidly and have got it working. Here is my scenario.
Hardware platform : Android with OpenGL 3.0.[/QUOTE]
What embedded platform is this?
What GPU does it have in it?
What speed of DRAM is in the system?
Does it have dedicated GPU memory?
As GClements said, there is an OpenGL ES Forum on Khronos.org, and you should certainly try posting on it. That said, in my experience, the Khronos GL-ES forum is not very active. By contrast, we get all kinds of OpenGL ES questions on the OpenGL.org forums and have lots of folks reading here. So feel free to post here if you don’t get what you need.
glBindBuffer (GL_PIXEL_PACK_BUFFER, pboIndex[index]);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, NULL); // trigger glReadPixels
What values are assigned to width and height?
What is the format of the color buffer you’re reading from (e.g. RGB565, RGBA4, RGB8, RGBA8, etc.)?
Does the format you’re reading from match the format (bit depth, component order, and packing) you’re asking for?
Is the buffer you’re reading from an EGL surface or a color attachment in an FBO?
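To put numbers on why resolution and format matter, here is a rough sketch of the per-frame readback size. The class and method names are made up for illustration, and 1920x1080 is only an assumed size since you haven’t told us yours. It assumes tightly packed pixels with rows rounded up to GL_PACK_ALIGNMENT (default 4).

```java
// Hypothetical helper, not from the original post: estimates how many bytes
// a glReadPixels call has to move for a given size and pixel format.
final class ReadbackSize {
    /** Bytes in one row, rounded up to a multiple of packAlignment. */
    static long rowBytes(int width, int bytesPerPixel, int packAlignment) {
        long raw = (long) width * bytesPerPixel;
        return (raw + packAlignment - 1) / packAlignment * packAlignment;
    }

    /** Bytes in one full frame. */
    static long frameBytes(int width, int height, int bytesPerPixel, int packAlignment) {
        return rowBytes(width, bytesPerPixel, packAlignment) * height;
    }

    public static void main(String[] args) {
        // Assumed example size; substitute your real width and height.
        long rgba8 = frameBytes(1920, 1080, 4, 4);   // GL_RGBA + GL_UNSIGNED_BYTE
        long rgb565 = frameBytes(1920, 1080, 2, 4);  // GL_RGB + GL_UNSIGNED_SHORT_5_6_5
        System.out.println("RGBA8:  " + rgba8 + " bytes per frame");
        System.out.println("RGB565: " + rgb565 + " bytes per frame");
    }
}
```

At that assumed size an RGBA8 frame is about 8.3 MB, so halving the bytes per pixel or the resolution cuts the copy cost roughly in proportion.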
glBindBuffer(GL_PIXEL_PACK_BUFFER, pboIndex[nextIndex]); // I am positive this is not waiting for the previous draw to complete as I have experimented by waiting 3-4 frames to be absolutely certain.
ByteBuffer byteBuffer = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, datasize, GL_MAP_READ_BIT); // read the buffer from the previously frame. the glReadPixel should be done and this will return immediately.
Have you put timing calipers around each call in the section of code above to verify that it is all taking near-zero time? How much time do you measure?
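By timing calipers I mean something like the sketch below. The `Caliper` class is a hypothetical helper, and the lambdas in `main` are stand-ins, since I can’t run your GL code here. In your app you would wrap glBindBuffer, glReadPixels, glMapBufferRange, and the copy separately. Keep in mind this measures CPU-side wall-clock time for each call, which is what matters for finding the stall.

```java
import java.util.function.Supplier;

// Hypothetical timing helper: wraps one step and prints its wall-clock cost.
final class Caliper {
    static <T> T time(String label, Supplier<T> step) {
        long start = System.nanoTime();
        T result = step.get();
        double ms = (System.nanoTime() - start) / 1_000_000.0;
        System.out.printf("%s: %.3f ms%n", label, ms);
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for the real steps (assumption: a 1 MiB buffer).
        byte[] mapped = time("map (stand-in)", () -> new byte[1 << 20]);
        byte[] copy = time("copy", () -> mapped.clone());
        System.out.println("copied " + copy.length + " bytes");
    }
}
```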
// Package byteBuffer into byte array.
// This is where I am having a major slowdown. I have used memcpy in JNI and just straight ByteBuffer.clone. They all work correctly but just too slow.
// When I clone the data from a predefined array of the same datasize, it completes in 5ms but with the buffer pointer coming back from glMapBufferRange, it takes almost 20-30ms. What gives?
You say “package” but then you also say “memcpy” so it’s not clear. Is there any processing involved (e.g. repacking), or is this literally just a memcpy (possibly prepending a header)?
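If it is just a copy, make sure it is a single bulk transfer rather than an element-by-element loop. A minimal sketch, with a direct ByteBuffer standing in for the pointer you get back from glMapBufferRange (the class name and buffer size are made up, and a stand-in buffer will not reproduce the speed of whatever memory your driver hands you):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: two ways to copy a mapped buffer into a byte array.
final class MappedCopy {
    /** One bulk transfer; leaves the source buffer's position untouched. */
    static void bulkCopy(ByteBuffer src, byte[] dst) {
        ByteBuffer view = src.duplicate(); // shares content, own position/limit
        view.clear();                      // position = 0, limit = capacity
        view.get(dst, 0, dst.length);
    }

    /** Element-by-element copy, shown only for comparison. */
    static void perByteCopy(ByteBuffer src, byte[] dst) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = src.get(i);
        }
    }

    public static void main(String[] args) {
        int size = 64 * 64 * 4; // assumed 64x64 RGBA8 image
        ByteBuffer mapped = ByteBuffer.allocateDirect(size);
        for (int i = 0; i < size; i++) {
            mapped.put(i, (byte) i);
        }
        byte[] a = new byte[size];
        byte[] b = new byte[size];
        bulkCopy(mapped, a);
        perByteCopy(mapped, b);
        System.out.println("copies match: " + Arrays.equals(a, b));
    }
}
```

Timing both paths against your real mapped buffer would tell you whether the copy method or the memory itself is the bottleneck.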
As GClements said, GPUs and GPU drivers typically aren’t optimized for the readback case. Some vendors even cripple the readback performance to serve some marketing goal. That said, with knowledge of your GPU and what formats and methods work best with it for readback, you can often increase your readback performance.
If you would, please make sure that the code up to and including the glMapBufferRange() call really is coming back to you in almost zero time. If the previous render and readback haven’t completed, that is where you would expect to see a stall. Readbacks are especially bad on mobile GPUs because most of them have low memory bandwidth and run with an added frame of draw latency to try to cover for the very slow CPU/system memory they’re typically forced to use, and a readback will cause a full pipeline flush and sync, which is particularly time-consuming. Also keep in mind that many GPUs don’t store framebuffer pixel data in the order you want to read it back in, so the driver (and possibly the GPU) often has to do extra work to at least reorder the pixel data, if not also convert the pixel format when what it has and what you want don’t match.
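One way to give the GPU more slack before you map is to rotate through more than two PBOs, so the buffer you map was filled several frames ago. Here is a sketch of just the index bookkeeping. `PboRing` is a hypothetical class, it contains no GL calls, and the ring size of 3 is an arbitrary assumption.

```java
// Hypothetical index bookkeeping for a ring of PBOs (no GL calls here).
final class PboRing {
    private final int count;
    private long frame = 0;

    PboRing(int count) {
        if (count < 2) {
            throw new IllegalArgumentException("need at least 2 buffers");
        }
        this.count = count;
    }

    /** Slot to bind for this frame's glReadPixels. */
    int writeIndex() {
        return (int) (frame % count);
    }

    /** Oldest filled slot (count - 1 frames old), or -1 while the ring fills. */
    int readIndex() {
        return frame < count - 1 ? -1 : (int) ((frame + 1) % count);
    }

    /** Call once at the end of each frame. */
    void advance() {
        frame++;
    }

    public static void main(String[] args) {
        PboRing ring = new PboRing(3);
        for (int f = 0; f < 5; f++) {
            System.out.println("frame " + f + ": write " + ring.writeIndex()
                    + ", read " + ring.readIndex());
            ring.advance();
        }
    }
}
```

With a ring of 2 this reduces to the index/nextIndex scheme you already have. A larger ring trades extra latency and memory for a better chance that the map returns without waiting.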
If it truly is your memcpy out of the mapped PBO that is slow, then short of optimizing the readback format, resolution, and method based on your knowledge of the GPU and GPU driver, you’re somewhat at the mercy of the speed of the memory your GL driver puts that buffer in and the speed of your system memory.
Also check out: