Getting texture data after render to texture

I am trying to use shaders to do some data manipulation. I pass the input data to the shader as a 2D texture and let it render to an output texture through a framebuffer object. The problem is that I don’t know how to get the data out of the output texture. The output texture is bound to the framebuffer object as a color attachment. When I bind the framebuffer object as the current framebuffer, should I be able to call glReadPixels to read the output texture data? I tried this approach, but it does not seem to work.

Is the above approach correct or should I do something different?

First of all, you should say why you want to get the data out. Pulling data back from the GPU to the CPU is never a fast process; aside from bandwidth considerations (which may or may not be relevant depending on the age of your hardware), it requires stalling the pipeline. There may be a way to do whatever you need to do next while still leaving everything on the GPU.

If not, look at glGetTexImage: http://www.opengl.org/sdk/docs/man/xhtml/glGetTexImage.xml

We want to leverage the GPU for some processing that is well suited to a shader. The CPU side is also already very busy. We may not need the data back on the CPU at all, but our main reason for reading it back is to verify that what the shader does is correct.
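For a verification pass like this, one option is to read the output back once and compare it byte by byte against a CPU reference, allowing a small tolerance for rounding differences. Below is a minimal sketch, assuming a tightly packed 8-bit output (for example RGBA8); the helper name `count_mismatches` and the tolerance parameter are hypothetical and not part of any OpenGL API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: counts the bytes in the GPU readback that differ
   from the CPU reference by more than `tolerance`. Both buffers hold
   `count` bytes (width * height * 4 for tightly packed RGBA8). */
size_t count_mismatches(const unsigned char *gpu, const unsigned char *cpu,
                        size_t count, int tolerance)
{
    assert(gpu != NULL && cpu != NULL && tolerance >= 0);
    size_t mismatches = 0;
    for (size_t i = 0; i < count; ++i) {
        /* Widen to int before subtracting so the difference cannot wrap. */
        int diff = abs((int)gpu[i] - (int)cpu[i]);
        if (diff > tolerance)
            ++mismatches;
    }
    return mismatches;
}
```

A result of zero means the shader output matches the reference within the chosen tolerance.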

We can’t use glGetTexImage because it does not seem to be available in OpenGL ES, which is what we are using. Is there another way to do it? Why does glReadPixels not work?

An update: glReadPixels does work, but it is very slow. I think glGetTexImage is similar to glReadPixels, just with simplified parameters? Internally, the data in GPU local memory is copied to main memory, where the CPU can access it.

I am also thinking about using a PBO: do a glReadPixels into the PBO, then call glMapBuffer on the PBO so that the CPU can access the data. But would this have the same performance issue as a plain glReadPixels?

On some GPUs, this is definitely true. On others, it is very fast.

I am also thinking about using a PBO: do a glReadPixels into the PBO, then call glMapBuffer on the PBO so that the CPU can access the data. But would this have the same performance issue as a plain glReadPixels?

A PBO should allow you to pipeline the readback with other operations. Whether you can make use of that or not is your call.

Let me back up and ask this question: do you really want the whole image (e.g. to save to a file), or do you just want to generate statistics based on the image? If the latter, consider using the GPU to do the crunching (reduction) and just read back the reduced result.
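To make the reduction idea concrete, here is a CPU-side model in C of what such passes compute; it is not GPU code. It assumes a sum reduction over a square, power-of-two float image, where each pass writes one output value per 2x2 input block (the work a fragment shader would do per output pixel) and the two buffers are swapped between passes. The names `reduce_pass` and `reduce_sum` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One reduction pass: each output value is the sum of a 2x2 block of the
   input. `w` and `h` are the input dimensions and must be even. */
void reduce_pass(const float *in, float *out, size_t w, size_t h)
{
    assert(w % 2 == 0 && h % 2 == 0);
    size_t ow = w / 2, oh = h / 2;
    for (size_t y = 0; y < oh; ++y) {
        for (size_t x = 0; x < ow; ++x) {
            const float *p = in + (2 * y) * w + 2 * x;
            out[y * ow + x] = p[0] + p[1] + p[w] + p[w + 1];
        }
    }
}

/* Repeats the pass, ping-ponging between two buffers, until a single value
   remains. `n` must be a power of two. `a` holds the n*n input and is
   overwritten; `b` is scratch space for at least (n/2)*(n/2) floats. */
float reduce_sum(float *a, float *b, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    float *src = a, *dst = b;
    while (n > 1) {
        reduce_pass(src, dst, n, n);
        float *t = src; src = dst; dst = t;  /* swap read and write buffers */
        n /= 2;
    }
    return src[0];
}
```

On the GPU the equivalent would render into successively smaller targets, so only the final 1x1 result has to be read back. A CPU model like this can also serve as the reference when checking the shader version.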

In my testing, using just glReadPixels is much slower than using a PBO with mapping. From my understanding, the latter approach keeps a copy of the buffer in GPU local memory only, and this memory is mapped for CPU access.

I have to have the whole image accessible to the CPU because some libraries running on the CPU take this image as input.
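One detail that matters when handing the readback to CPU libraries: glReadPixels returns rows starting from the bottom of the framebuffer, while many image libraries expect the top row first. Below is a minimal sketch of an in-place vertical flip, assuming tightly packed rows (true for RGBA8 with the default pack alignment); the helper name `flip_rows` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: flips an image vertically in place by swapping
   row y with row (height - 1 - y). Rows are assumed tightly packed.
   Returns 0 on success, -1 if the temporary row cannot be allocated. */
int flip_rows(unsigned char *pixels, size_t width, size_t height,
              size_t bytes_per_pixel)
{
    assert(pixels != NULL);
    size_t stride = width * bytes_per_pixel;
    if (stride == 0 || height < 2)
        return 0;  /* nothing to swap */

    unsigned char *tmp = malloc(stride);
    if (tmp == NULL)
        return -1;

    for (size_t y = 0; y < height / 2; ++y) {
        unsigned char *top = pixels + y * stride;
        unsigned char *bottom = pixels + (height - 1 - y) * stride;
        memcpy(tmp, top, stride);
        memcpy(top, bottom, stride);
        memcpy(bottom, tmp, stride);
    }
    free(tmp);
    return 0;
}
```

If the consuming library accepts a bottom-up image or a negative stride, the flip can be skipped entirely.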