But this is still very slow for me because I must read back 2 of these images at each step of the process. So, for the moment, I’m not able to break the “30 FPS limitation”.
Is there any way to achieve a faster read back using anything like CUDA or OpenCL or some OpenGL tricks? (and without involving some new hardware…)
That’s what I am currently using in my code. But, as I said, I have two video streams, and BOTH are processed like this:
Data (in RAM) > PBO > 2D Texture > FBO (1st processing) > Texture > FBO (2nd processing) > Texture
I need to read back the last texture very fast (60 Hz would be very cool). For the moment, each final FBO renders to its own color attachment (one is GL_COLOR_ATTACHMENT0_EXT and the other is GL_COLOR_ATTACHMENT1_EXT). So I have two PBOs for reading per processing line, exactly as you wrote it.
Is there any way to write into a PBO directly, i.e. without using glReadPixels?
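A minimal sketch of the index bookkeeping behind a two-PBO readback, assuming the common ping-pong scheme: on each frame, glReadPixels is issued into one PBO while the other PBO, filled on the previous frame, is mapped and consumed. It makes no GL calls and only models which buffer plays which role. The function name `readback_schedule` is hypothetical.

```python
def readback_schedule(num_frames, num_pbos=2):
    """Model a ping-pong PBO readback (bookkeeping only, no GL calls).

    Returns one tuple per frame:
      (frame, pbo_to_fill, pbo_to_map, frame_available)
    pbo_to_fill     -- PBO that would be bound for glReadPixels this frame
    pbo_to_map      -- PBO that would be mapped this frame (oldest transfer)
    frame_available -- frame whose pixels sit in pbo_to_map, or None
                       while the pipeline is still filling up
    """
    schedule = []
    for frame in range(num_frames):
        pbo_to_fill = frame % num_pbos
        pbo_to_map = (frame + 1) % num_pbos
        oldest = frame - (num_pbos - 1)
        frame_available = oldest if oldest >= 0 else None
        schedule.append((frame, pbo_to_fill, pbo_to_map, frame_available))
    return schedule


if __name__ == "__main__":
    for entry in readback_schedule(4):
        print(entry)
```

The price of this scheme is latency: with two PBOs, the CPU always sees the frame before the one currently being rendered.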
If your pixel processing has a very small, limited kernel, you can use transform feedback and do the processing in a vertex shader, writing directly from one buffer object to another.
If you are willing to use OpenCL or CUDA, both have GL interop capabilities that let a kernel use a GL texture directly. There are some rules the interop must follow to give well-defined results (for example, don’t change the values from GL while an OpenCL or CUDA kernel is using them). CUDA is NVIDIA only. I don’t know how well OpenCL works on ATI, or which generations support it; on NVIDIA, both OpenCL and CUDA need a GeForce 8 or higher.
You are processing some ‘data’, currently via texture objects. Since you are already using PBOs, the texture is not really the natural representation of your data. Generally speaking, you just have a piece of memory and you want to process it into another piece of memory.
For that, instead of copying PBO1->Texture1->Texture2->PBO2, you can simply do PBO1->PBO2 (using transform feedback).
There is a pretty good benchmark tool in the NVIDIA CUDA SDK (bandwidthTest);
with it you can measure readback speed as a function of data block size and memory allocation type.
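To put a measured readback speed in context, it helps to know how much bandwidth the application actually needs. The arithmetic can be sketched as below. The frame size in the usage example (640x480 RGBA) is purely an assumption, since the thread does not state one, and the function name is hypothetical.

```python
def required_readback_mib_per_s(width, height, bytes_per_pixel,
                                images_per_frame, fps):
    """Sustained device-to-host bandwidth needed, in MiB/s."""
    bytes_per_image = width * height * bytes_per_pixel
    bytes_per_second = bytes_per_image * images_per_frame * fps
    return bytes_per_second / (1024 * 1024)


if __name__ == "__main__":
    # Assumed example: two 640x480 RGBA images read back per frame at 60 Hz.
    print(required_readback_mib_per_s(640, 480, 4, 2, 60))
```

If the figure the benchmark reports for your block size is comfortably above this number, raw bus bandwidth is unlikely to be the limit, and a stall or synchronization point in the readback path becomes the more likely suspect.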
I have had good results with GL_RGBA as the format and GL_UNSIGNED_INT_8_8_8_8_REV as the type. That combination even beat GL_BGRA with GL_UNSIGNED_BYTE.
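For reference, here is a small sketch of what GL_RGBA with GL_UNSIGNED_INT_8_8_8_8_REV means for the pixel layout: each pixel is one 32-bit integer with the first component (R) in the lowest 8 bits and the last (A) in the highest. The helper names are hypothetical and no GL is involved.

```python
import struct


def pack_rgba_8888_rev(r, g, b, a):
    """Pack four 8-bit components as GL_RGBA / GL_UNSIGNED_INT_8_8_8_8_REV:
    R in bits 0-7, G in bits 8-15, B in bits 16-23, A in bits 24-31."""
    return r | (g << 8) | (b << 16) | (a << 24)


def unpack_rgba_8888_rev(pixel):
    """Inverse of pack_rgba_8888_rev; returns (r, g, b, a)."""
    return (pixel & 0xFF,
            (pixel >> 8) & 0xFF,
            (pixel >> 16) & 0xFF,
            (pixel >> 24) & 0xFF)


def memory_bytes_little_endian(pixel):
    """Byte order of the 32-bit pixel as stored on a little-endian CPU."""
    return struct.pack("<I", pixel)


if __name__ == "__main__":
    p = pack_rgba_8888_rev(0x11, 0x22, 0x33, 0x44)
    print(hex(p), memory_bytes_little_endian(p).hex())
```

On a little-endian machine the bytes land in memory as R, G, B, A, the same order GL_RGBA with GL_UNSIGNED_BYTE produces. Any speed difference therefore comes from the path the driver takes, not from a different layout on the CPU side.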
Also, don’t forget to set GL_PACK_ALIGNMENT to the highest value that suits your format/type combination.
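One way to read "the highest value that suits" is: the largest of the alignments GL allows (1, 2, 4 or 8) that still divides the row size in bytes, so rows stay tightly packed. A minimal sketch of that choice follows. The function names are hypothetical, and the result would be passed to glPixelStorei(GL_PACK_ALIGNMENT, ...) before the readback.

```python
def best_pack_alignment(width_px, bytes_per_pixel):
    """Largest alignment GL accepts (8, 4, 2, 1) that divides the row size,
    so that no padding is inserted between rows."""
    row_bytes = width_px * bytes_per_pixel
    for alignment in (8, 4, 2, 1):
        if row_bytes % alignment == 0:
            return alignment
    return 1  # not reached: every row size is divisible by 1


def padded_row_bytes(width_px, bytes_per_pixel, alignment):
    """Row stride in bytes once each row start is rounded up to `alignment`."""
    row_bytes = width_px * bytes_per_pixel
    return (row_bytes + alignment - 1) // alignment * alignment


if __name__ == "__main__":
    print(best_pack_alignment(640, 4))   # 4-byte pixels, e.g. RGBA8
    print(padded_row_bytes(641, 3, 4))   # 3-byte pixels with the default alignment of 4
```

The second call shows why the setting matters for odd widths with 3-byte pixels: a mismatched alignment makes GL pad every row, and the destination buffer has to be sized for that padded stride.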