When I programmed on SGIs years ago there was a fast way to copy a block of pixels from part of the back buffer to another part of the back buffer.
Now I’m programming on PCs with NVIDIA and ATI cards. Today it’s an NVIDIA Quadro4 900 XGL.
I thought glCopyPixels was a fast way to copy the contents of the current read buffer to the current draw buffer at the current raster position.
For some reason I am getting way better performance with glDrawPixels, with an image in host memory, than I am from glCopyPixels, where the from and to are both in AGP mem. That is completely the opposite of what I expected.
A quick review of the docs says that glCopyPixels is changing each pixel color component to float during the transfer, which I’m guessing is the cause of the slowdown.
Is there any way to turn off this conversion to float and just make a fast pixel copy (BLiT) from/to the same buffer in the same format?
I noticed the pbuffer extension spec conformance test lists a step as “Blit from one buffer to the other” so I’m hoping this blit-like copy exists and I’m just missing something obvious.
Thanks.
[This message has been edited by robosport (edited 03-03-2004).]
CopyPixels is fast if there’s no bias, scaling, depth buffering, or something which is not a 1-to-1 copy, that is, if it’s a simple color blit.
There won’t be a conversion to float for such case, that’d be silly.
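The conditions listed above can be written down as a checklist. What follows is only a model in plain C, not driver logic: `CopyState`, `copy_state_defaults` and `is_simple_color_blit` are hypothetical names, and the fields simply mirror state an application would set through `glPixelTransferf`, `glPixelZoom` and `glEnable`/`glDisable`. Which of these a particular driver actually checks before taking a fast path is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of state that can turn a glCopyPixels(GL_COLOR)
   call into something other than a 1-to-1 copy. Not part of OpenGL. */
typedef struct {
    float scale[4];   /* GL_RED_SCALE .. GL_ALPHA_SCALE, GL default 1.0 */
    float bias[4];    /* GL_RED_BIAS .. GL_ALPHA_BIAS, GL default 0.0 */
    float zoom_x;     /* glPixelZoom x factor, GL default 1.0 */
    float zoom_y;     /* glPixelZoom y factor, GL default 1.0 */
    bool  depth_test; /* true if GL_DEPTH_TEST is enabled */
} CopyState;

/* Returns the state as it is after context creation (all GL defaults). */
CopyState copy_state_defaults(void)
{
    CopyState s = {
        {1.0f, 1.0f, 1.0f, 1.0f},
        {0.0f, 0.0f, 0.0f, 0.0f},
        1.0f, 1.0f,
        false
    };
    return s;
}

/* True when nothing in the modeled state alters pixels on the way through:
   no scale, no bias, no zoom, no depth testing. */
bool is_simple_color_blit(const CopyState *s)
{
    assert(s != NULL);
    for (int c = 0; c < 4; ++c) {
        if (s->scale[c] != 1.0f || s->bias[c] != 0.0f)
            return false;
    }
    if (s->zoom_x != 1.0f || s->zoom_y != 1.0f)
        return false;
    return !s->depth_test;
}
```

In a real application the same idea means resetting those values to their defaults and disabling the depth test right before the copy, then checking whether the speed changes.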
>>where the from and to are both in AGP mem.<<
That is impossible. At least one buffer lies in video memory: with glDrawPixels it is the destination, with glReadPixels the source, and with glCopyPixels both.
I think you’d have better luck creating a temporary texture, doing a glCopyTexSubImage2D, and then drawing a 2D quad with this sub-texture where you want to display it.
I don’t think so.
A simple 1-to-1 glCopyPixels from back to front for example should be faster than a glCopyTexSubImage from back and glBegin(GL_QUADS);… to front.
In theory it should be faster, but as I don’t think it’s a code path frequently used by developers, it might not be super optimized by some drivers. To be safe you should test both.
Interesting. Is that pure copyteximage performance or was the texture actually used afterwards?
If not, what remains at the end if the texture is used to draw the exact same rectangle the copypixels blitted?
Originally posted by Relic: Interesting. Is that pure copyteximage performance or was the texture actually used afterwards?
If not, what remains at the end if the texture is used to draw the exact same rectangle the copypixels blitted?
Thanks for the quick replies. That glCopyPixels benchmark is exactly the kind of performance I was expecting (and have seen in the past) from glCopyPixels.
Unfortunately on my Quadro4, with one of the latest drivers, it is really slow (multiple times slower than glDrawPixels).
Am I using glCopyPixels incorrectly? Here is the code for a back buffer to back buffer blit:
// trying to copy a square block from bottom left corner of buffer
int blockWidth = 200;
int blockHeight = 200;
// already set up ortho projection that matches screen/pixel coordinates
glReadBuffer( GL_BACK );
glDrawBuffer( GL_BACK );
// going to blit it just to the right
glRasterPos2i( blockWidth, 0);
glCopyPixels( 0, 0, blockWidth, blockHeight, GL_COLOR );
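For reference, the result the snippet above asks for can be modeled on the CPU. This is a sketch only, not how the driver or the card does it: `reference_blit` is a hypothetical name, pixels are assumed to be 32-bit values stored row by row with row 0 at the bottom (as in OpenGL window coordinates), and the caller is assumed to keep both rectangles inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies a width x height block from (src_x, src_y) to (dst_x, dst_y)
   inside the same buffer. No scaling, bias or format conversion.
   No bounds checking: both rectangles must lie inside the buffer. */
void reference_blit(uint32_t *buf, int buf_width,
                    int src_x, int src_y, int width, int height,
                    int dst_x, int dst_y)
{
    assert(buf != NULL && buf_width > 0 && width >= 0 && height >= 0);

    for (int i = 0; i < height; ++i) {
        /* When the destination is above the source, walk rows top-down so
           overlapping source rows are read before they are overwritten. */
        int row = (dst_y > src_y) ? (height - 1 - i) : i;
        const uint32_t *src = buf + (size_t)(src_y + row) * (size_t)buf_width + src_x;
        uint32_t *dst = buf + (size_t)(dst_y + row) * (size_t)buf_width + dst_x;
        /* memmove handles horizontal overlap within a row. */
        memmove(dst, src, (size_t)width * sizeof *buf);
    }
}
```

With the values from the post this would be `reference_blit(pixels, bufferWidth, 0, 0, 200, 200, 200, 0)`, where `pixels` and `bufferWidth` are placeholder names for a CPU copy of the buffer and its width. Comparing its output against a glReadPixels of the real buffer is one way to confirm the glCopyPixels call lands where intended.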
Must be something else wrong because the code I posted up above is only giving me a dozen blits per second at most from/to back buffer (with no swapping between the blits).
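To put "a dozen blits per second" in perspective, the rate can be converted into bytes moved per second. The helpers below are plain C with hypothetical names (`blit_bytes`, `blit_throughput`), and they assume 4 bytes per pixel (an RGBA8 color buffer), which the post does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes written by one blit of a width x height block. */
size_t blit_bytes(int width, int height, int bytes_per_pixel)
{
    assert(width >= 0 && height >= 0 && bytes_per_pixel > 0);
    return (size_t)width * (size_t)height * (size_t)bytes_per_pixel;
}

/* Bytes written per second at a given blit rate. */
double blit_throughput(int width, int height, int bytes_per_pixel,
                       double blits_per_second)
{
    assert(blits_per_second >= 0.0);
    return (double)blit_bytes(width, height, bytes_per_pixel) * blits_per_second;
}
```

For the 200 x 200 block in the posted code, `blit_bytes(200, 200, 4)` is 160000 bytes, so 12 blits per second works out to about 1.92 million bytes per second. That is tiny for a copy that should stay entirely in video memory, which supports the suspicion that something is forcing a slow path.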
This problem has been around for at least a year now so it seems that there is no easy fix.
This is a major issue since users can virtually break applications that use copypixels and make those using readpixels unusable.
It would be useful to have a document that explains what the problem is, why it is occurring, whether it will ever be resolved, and what developers should do to minimise the issues.
The obvious temporary solution is to switch multisampling off, but this doesn’t prevent users from forcing it on. Granted, users are warned when they force FSAA on that some apps may not work correctly.
It should be mentioned in this FAQ. http://www.nvidia.com/object/General_FAQ.html#p1 in the “I’m using glDrawPixels and glReadPixels in OpenGL. I’m seeing poor performance. What should I do?” section.
I’m interested to know if ATI cards have a similar problem, can someone run the benchmark and report the readpixels/copypixels speed with and without FSAA on? Thx.
Incidentally, on my system CopyTexImage is also affected by FSAA, but not so badly.
NVIDIA’s copypixels benchmark shows the same slow performance with FSAA if I change glCopyPixels(size, 0, size, size, copyType) to glCopyPixels(0, 0, size, size, copyType).
Running NVIDIA’s readpixels benchmark with FSAA initially showed normal performance. I compared their benchmark and mine, and the only difference was that theirs runs in windowed mode. When I added glutFullScreen() and switched on FSAA, the performance was as slow as with my benchmark.
[This message has been edited by Adrian (edited 03-04-2004).]
Clarification… this isn’t just slowing down when FSAA is turned on via driver settings.
This glCopyPixels slowdown also happens when the context is programmatically using multisampled antialiasing, i.e. WGL_SAMPLE_BUFFERS_ARB with GL_MULTISAMPLE_ARB enabled.