I read everywhere that GPU-to-CPU transfers are horribly slow. I've had my fair share of GPU programming, so I know this is true, but I'm wondering why.
At first I thought it had to do with the bandwidth between the GPU and the CPU. That may matter in some situations, but I ran into one where it shouldn't matter at all. I was doing a number of calculations on textures, mapping them to other textures using FBOs, and after each complete calculation pass I would read back a single pixel that contained my answer. Reading one pixel hardly uses any of the available bandwidth. Yet when I instead wrote that pixel into a separate texture and read back the whole texture after a number of passes, it was faster.
My other idea was that it had something to do with the asynchronous relationship between the GPU and the CPU. Since glReadPixels calls have to happen synchronously, one side has to wait for the other. Let's say the CPU has to wait because the GPU isn't done yet. Then the CPU would still have to wait for the GPU after 10 passes, or any number of passes, since the GPU doesn't magically compute faster when you give it a few more calculations.
So where does the speedup come from?
(I hope everything makes sense btw)
