CopyTexSubImage stalls the CPU?

In my application, I continuously generate procedural textures by first rendering into an FBO and then copying its contents into a texture. I need the extra copy because I want to store the result in a compressed texture.
This texture copy operation showed up as a spike in my profiler. My guess is that the copy is implemented internally as a ReadPixels followed by a TexSubImage. I understand that ReadPixels is normally a blocking call that stalls the CPU, because it has to wait for the GPU to finish rendering before it can read back the pixels… However, since the driver is doing the readback itself, it should be able to do it asynchronously, right?

I’m running a GeForce 6600 with the 81.98 drivers.

thanks,

Andras

Originally posted by andras:
So this texture copy operation came up as a spike in my profiler. My guess is that the copy is implemented internally by first doing a ReadPixels, then a TexSubImage.
Your GPU does not support compressing the texture in hardware. So what the driver has to do is copy the texture to main memory (which is generally slow, and must be synchronous, since the driver is about to do something with the data), compress it on the CPU (not a fast operation), and then upload it back (which could theoretically be async, but by that point it doesn’t matter).
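To give a sense of the CPU-side work described above, here is a minimal sketch of compressing one 4x4 block. The thread does not name the compressed format, so DXT1 (S3TC) is assumed; the encoder is a naive bounding-box one with hypothetical names, not what the NVIDIA driver actually does. A full texture repeats this for every block, and only after the synchronous readback.

```c
#include <assert.h>
#include <stdint.h>

/* One compressed 4x4 block in DXT1 (S3TC) layout. */
typedef struct {
    uint16_t color0;   /* RGB565 endpoint */
    uint16_t color1;   /* RGB565 endpoint */
    uint32_t indices;  /* 16 x 2-bit palette indices, pixel 0 in the low bits */
} Dxt1Block;

static uint16_t to_565(int r, int g, int b) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Expand RGB565 back to 8 bits per channel by bit replication. */
static void from_565(uint16_t c, int rgb[3]) {
    int r = (c >> 11) & 31, g = (c >> 5) & 63, b = c & 31;
    rgb[0] = (r << 3) | (r >> 2);
    rgb[1] = (g << 2) | (g >> 4);
    rgb[2] = (b << 3) | (b >> 2);
}

/* Naive bounding-box encoder. px holds 16 RGB pixels, row-major (not modified). */
Dxt1Block encode_dxt1_block(uint8_t px[16][3]) {
    int lo[3] = {255, 255, 255}, hi[3] = {0, 0, 0};
    for (int i = 0; i < 16; ++i)
        for (int c = 0; c < 3; ++c) {
            if (px[i][c] < lo[c]) lo[c] = px[i][c];
            if (px[i][c] > hi[c]) hi[c] = px[i][c];
        }

    Dxt1Block out;
    out.color0 = to_565(hi[0], hi[1], hi[2]);
    out.color1 = to_565(lo[0], lo[1], lo[2]);
    out.indices = 0;
    /* hi >= lo per channel, so the packed endpoints satisfy color0 >= color1. */
    assert(out.color0 >= out.color1);
    if (out.color0 == out.color1)
        return out;  /* flat block: index 0 everywhere decodes to color0 */

    /* 4-color mode (color0 > color1): c0, c1, (2*c0 + c1)/3, (c0 + 2*c1)/3. */
    int pal[4][3];
    from_565(out.color0, pal[0]);
    from_565(out.color1, pal[1]);
    for (int c = 0; c < 3; ++c) {
        pal[2][c] = (2 * pal[0][c] + pal[1][c]) / 3;
        pal[3][c] = (pal[0][c] + 2 * pal[1][c]) / 3;
    }

    /* Pick the nearest palette entry (squared RGB distance) for each pixel. */
    for (int i = 0; i < 16; ++i) {
        int best = 0, best_d = -1;
        for (int k = 0; k < 4; ++k) {
            int d = 0;
            for (int c = 0; c < 3; ++c) {
                int diff = px[i][c] - pal[k][c];
                d += diff * diff;
            }
            if (best_d < 0 || d < best_d) { best_d = d; best = k; }
        }
        out.indices |= (uint32_t)best << (2 * i);
    }
    return out;
}
```

A block whose top half is white and bottom half black encodes to color0 = 0xFFFF, color1 = 0x0000 and indices 0x55550000. Even this crude version does a palette search per pixel, which is the kind of work the reply means by "not a fast operation".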

No, it’s never a good idea to render to a compressed texture.

I don’t know whether it’s done in hardware or not, but it seems really fast. Even if it weren’t, I wouldn’t care, as long as it didn’t block the CPU. And it doesn’t have to block, because the copy doesn’t need the data until rendering has finished. This operation could be queued the same way all other commands are.
Just to prove that it can be done without blocking: I could do the same thing manually by reading into a PBO (an asynchronous operation) and then, once rendering is done, calling TexSubImage with the PBO as the source (again asynchronous). That gives a completely asynchronous solution. My only problem is knowing when rendering is done: I’m not the driver, so I can’t tell, but I could wait a frame or two just to be sure. It’s a pain to do myself, and since I can’t see the internals of the hardware, I can’t know when to initiate the upload, so I either wait too long or not long enough, in which case it really will have to block…
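The "wait a frame or two" scheme can be modeled without a GL context, as a minimal sketch. Each slot of a hypothetical ring stands in for one PBO: the readback issued in frame f is consumed (uploaded with TexSubImage from that PBO) RING_SIZE frames later. RING_SIZE, PboRing and ring_step are illustrative names, not GL API, and a fixed delay is only a heuristic: it gives the GPU slack but cannot guarantee the data is ready.

```c
#include <assert.h>

/* Hypothetical ring size: frames of slack the GPU gets before a readback is used. */
#define RING_SIZE 3

/* GL-free model: each slot stands in for one PBO. */
typedef struct {
    int issued_frame[RING_SIZE]; /* frame whose readback the slot holds, -1 = empty */
    int frame;                   /* current frame counter */
} PboRing;

void ring_init(PboRing *r) {
    for (int i = 0; i < RING_SIZE; ++i)
        r->issued_frame[i] = -1;
    r->frame = 0;
}

/* Advance one frame. Returns the frame whose readback is consumed now,
   or -1 while the ring is still filling up. */
int ring_step(PboRing *r) {
    int slot = r->frame % RING_SIZE;

    /* "Consume": in real code, glTexSubImage2D sourcing this slot's PBO. */
    int consumed = r->issued_frame[slot];
    if (consumed >= 0)
        assert(r->frame - consumed == RING_SIZE); /* always RING_SIZE frames old */

    /* "Issue": in real code, glReadPixels into this slot's PBO. */
    r->issued_frame[slot] = r->frame;

    r->frame++;
    return consumed;
}
```

With RING_SIZE set to 3, the first three calls to ring_step return -1 and the fourth returns 0, i.e. frame 0's readback is consumed in frame 3. In real code the readback would go through a buffer bound to GL_PIXEL_PACK_BUFFER (GL_PIXEL_PACK_BUFFER_ARB under ARB_pixel_buffer_object) and the upload through the same buffer bound to GL_PIXEL_UNPACK_BUFFER.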

Originally posted by andras:
a proof that it can be done without blocking: I could do the same thing manually, by reading into a PBO (which is an async operation), and then, when the rendering is done
You are contradicting yourself. You (or the driver) need glFinish() to ensure rendering is done, and that cannot be async.

The driver does not need to flush the command queue. It should be able to append the CopyTexImage command right after my last OpenGL command, just like everything else, so that it is executed once everything ahead of it in the queue is done.
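This argument can be illustrated with a toy, GL-free model of an in-order command stream. All names are hypothetical, and this is not how any real driver is structured. A copy appended after a draw picks up that draw's result whenever the "GPU" gets to it, so the CPU never waits; only a readback into client memory (which is what the earlier reply says the driver needs for CPU-side compression) forces the CPU to drain the queue.

```c
#include <assert.h>

typedef enum { CMD_DRAW, CMD_COPY_TO_TEXTURE } CmdKind;

typedef struct {
    CmdKind kind;
    int value;        /* pixel value a draw writes; unused by the copy */
} Cmd;

#define QUEUE_CAP 16

typedef struct {
    Cmd cmds[QUEUE_CAP];
    int head;         /* next command the "GPU" will execute */
    int tail;         /* next free slot the "CPU" appends to */
    int framebuffer;  /* stand-in for the FBO color attachment */
    int texture;      /* stand-in for the destination texture */
    int cpu_waits;    /* number of times the CPU had to block */
} Pipeline;

void pipe_init(Pipeline *p) {
    p->head = p->tail = 0;
    p->framebuffer = p->texture = 0;
    p->cpu_waits = 0;
}

/* CPU side: appending a command never waits for the GPU. */
void submit(Pipeline *p, CmdKind kind, int value) {
    assert(p->tail < QUEUE_CAP);
    p->cmds[p->tail].kind = kind;
    p->cmds[p->tail].value = value;
    p->tail++;
}

/* GPU side: execute up to n queued commands, strictly in submission order. */
void gpu_execute(Pipeline *p, int n) {
    while (n-- > 0 && p->head < p->tail) {
        Cmd c = p->cmds[p->head++];
        if (c.kind == CMD_DRAW)
            p->framebuffer = c.value;
        else
            p->texture = p->framebuffer; /* the copy sees every earlier draw */
    }
}

/* Readback into client memory: the CPU needs the bytes now, so it must
   drain the queue first (the implicit glFinish discussed in the thread). */
int read_pixels_to_client(Pipeline *p) {
    if (p->head < p->tail) {
        p->cpu_waits++;
        gpu_execute(p, p->tail - p->head);
    }
    return p->framebuffer;
}
```

Submitting a draw, a copy and another draw leaves cpu_waits at 0; once the "GPU" runs the queue, the texture holds the first draw's result, not the second's, purely because of command order.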

Yes, if I do it myself, then I need a glFinish(), and it is called implicitly when I either map the PBO or use it as the source for TexSubImage. This is exactly why I said I would have to wait long enough (read: do something else), so that by the time I call TexSubImage (which causes an implicit glFinish()), rendering is already done and the call doesn’t have to block… The driver does not have this disadvantage.

OpenGL is only asynchronous from my (the user’s) perspective. Internally, it executes one command after another, so there’s no need for synchronization!