tamlin
03-15-2007, 01:46 PM
I just hit an old code path where I did a glCopyPixels on a 256x256 (32bpp) area of the backbuffer. The only potential culprit I can find is my use of glPixelZoom(1,-1), as I needed the rendered part upside-down in a texture (which I later grabbed with glCopyTexImage2D).
I was more than a little surprised to find out that glCopyPixels took almost 50ms (!) on a 7600 (with the 93.71 drivers). That works out to roughly 10MB/s transfer speed! IIRC it ran at least 25-50 times as fast on an age-old ATI 9600.
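For reference, the arithmetic behind that figure can be sketched as below. The function name and the `passes` parameter are made up for illustration, and counting both the read and the write of the area (passes = 2) is an assumption about how the 10MB/s was reached; counting the area only once gives about 5MB/s.

```c
#include <assert.h>

/* Hypothetical helper: effective copy rate in MB/s (1 MB = 1024*1024 bytes).
   passes = 1 counts the area once, passes = 2 counts read plus write. */
double copy_rate_mb_per_s(int width, int height, int bytes_per_pixel,
                          int passes, double seconds)
{
    assert(seconds > 0.0);
    double bytes = (double)width * height * bytes_per_pixel * passes;
    return bytes / (1024.0 * 1024.0) / seconds;
}
```

Called as copy_rate_mb_per_s(256, 256, 4, 2, 0.050), this covers the 256x256 32bpp area and the 50ms from the post.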
I have checked the states and I think they are all as expected, but even if everything and its grandma had been enabled.. reaching a speed as low as 10MB/s? (for the curious - no, I haven't underclocked the card to 2.5MHz :-) )
I did an emergency hack where I now simply glReadPixels the area, flip it in system memory on the CPU, and finally upload the result to the texture. This lame hack turns out to be probably at least two orders of magnitude faster than a simple flip-upside-down blit, which the card should be able to handle, by my estimate, at least 10,000 times per second (and that may be a gross underestimate).
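A minimal sketch of the CPU flip step in that workaround, assuming a tightly packed RGBA8 buffer (256 * 4 bytes per row here, which also satisfies the default GL_PACK_ALIGNMENT of 4). The function name and the scratch-row parameter are hypothetical, not the code actually used; the GL calls appear only in comments, with the texture assumed to be already allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flip an image vertically in place by swapping whole rows.
   pixels:    tightly packed image data
   row_bytes: bytes per row (e.g. 256 * 4 for a 256-wide 32bpp area)
   height:    number of rows
   scratch:   caller-provided buffer of at least row_bytes bytes */
void flip_rows_in_place(unsigned char *pixels, size_t row_bytes,
                        size_t height, unsigned char *scratch)
{
    assert(pixels != NULL && scratch != NULL);
    if (height < 2)
        return; /* nothing to swap, and avoids height - 1 wrapping at 0 */

    for (size_t top = 0, bottom = height - 1; top < bottom; ++top, --bottom) {
        unsigned char *a = pixels + top * row_bytes;
        unsigned char *b = pixels + bottom * row_bytes;
        memcpy(scratch, a, row_bytes);
        memcpy(a, b, row_bytes);
        memcpy(b, scratch, row_bytes);
    }
}

/* Assumed surrounding GL calls (x, y and the bound texture are placeholders):
     glReadPixels(x, y, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
     flip_rows_in_place(pixels, 256 * 4, 256, scratch);
     glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels);
*/
```

For a 256x256 area this is 128 row swaps of 1KB each, so the flip itself should be a negligible part of the readback-and-upload path.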
Can anyone think of some state, or really just anything, that could produce such a truly horrible excuse for performance? I mean, when the most naive approach turns out to be around a hundred times as fast, something must be quite seriously broken here, and I want to get to the root of it.