framebuffer readback performance

hello
I’m currently trying to use a GPU for intensive computation. At some point I need to get the results from the frame buffer back to the CPU using glReadPixels(), and for my computations this is now the bottleneck. With AGP, I get 120 MB/s of bandwidth (Quadro FX 3000).
Everywhere you hear about PCI Express from card/chipset manufacturers, they boast that you can expect 5x the readback bandwidth of AGP.
I have tried my program on a Quadro FX 3400 in an HP xw6200 (Intel E7525) with Windows XP and saw no improvement at all.
It is easy to test the memory bandwidth using the Mesa demo “readpix.exe”.
Has anybody seen different behavior on a PCI Express board?
Is it better on other hardware (Wildcat or ATI cards, or another chipset)?
Is it a driver version problem? (I have 61.71.)
Is it an operating system problem (Linux/Windows 2000)?
Maybe I should try a different approach from readpix (pixel buffer objects?).
Just in case someone from a card manufacturer is listening: your card does not match the specs :mad:

Best case on AGP is roughly 180 MB/s on all GeForce cards up to and including the GeForce FX line. ATI AGP cards are even worse (~100 MB/s).

The 3DLabs Wildcat VP is way beyond that, expect 600+ MB/s.

The Geforce 6 series (still on AGP) is purportedly faster at readback, too. Couldn’t test myself, so no hard numbers atm.

The Quadro FX may not be representative of “real” PCI Express cards. It’s a chip with a native AGP interface, connected to PCI Express through an external bridge chip (NVIDIA’s “HSI”). The bridge can’t fix the chip’s inherent problems with readback performance.

I understand that this is not a “real” PCI Express card. The PNY spec sheet for the 3400 claims that the card can do “4 GB per second in both upstream and downstream data transfers”.
But you can also read that as: the card can use “PCI Express”, which could one day reach “4 GB/s”, but the card itself just sucks like an AGP card.
Just usual lies…
I’ll check whether I can test a Wildcat now.
Thank you

Originally posted by zeckensack:
Best case on AGP is roughly 180 MB/s on all GeForce cards up to and including the GeForce FX line. ATI AGP cards are even worse (~100 MB/s).
This surely depends on the platform and driver optimization; the results are different on Mac OS X. I recently benchmarked glReadPixels performance on several Macintosh models. Here are some numbers for reading back 640x480 32bpp BGRA over AGP with regular no-tricks glReadPixels:

400 MHz G3 iMac r128: 28 MB/sec
1 GHz G4 eMac r7500: 67 MB/sec
800 MHz G4 iMac gf2mx: 92 MB/sec
1.25 GHz G4 eMac r9200: 219 MB/sec
1 GHz G4 PowerBook gf4mx: 152 MB/sec
1.8 GHz G5 iMac gf5200: 165 MB/sec
1.25 GHz G4 PowerBook r9600: 230 MB/sec
2.5 GHz G5 PowerMac gf6800: 275 MB/sec

This is with glFinish before and after the read to time the call accurately; it would be a bit faster if I didn’t force flushing. You can see that the new GeForce cards exceed 180 MB/sec and that the ATI cards are generally better than the NVIDIA ones.
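
For anyone who wants to produce comparable numbers, here is a rough sketch of the arithmetic and the timing pattern in Python. It is not the code behind the figures above. `read_fn` and `sync_fn` are hypothetical stand-ins for the glReadPixels call and for glFinish, and 1 MB is assumed to mean 10^6 bytes, which the numbers above do not specify.

```python
import time


def frame_bytes(width, height, bytes_per_pixel):
    """Size in bytes of one tightly packed frame."""
    return width * height * bytes_per_pixel


def mb_per_sec(num_bytes, seconds):
    """Throughput in MB/s, assuming 1 MB = 10**6 bytes."""
    return num_bytes / seconds / 1e6


def time_readback(read_fn, sync_fn, repeats=10):
    """Average seconds per read, with a sync before and after each read.

    read_fn and sync_fn are hypothetical callables: in a real program they
    would wrap glReadPixels and glFinish respectively.
    """
    total = 0.0
    for _ in range(repeats):
        sync_fn()                      # drain pending work before timing starts
        start = time.perf_counter()
        read_fn()                      # the readback being measured
        sync_fn()                      # make sure the read has really completed
        total += time.perf_counter() - start
    return total / repeats


if __name__ == "__main__":
    size = frame_bytes(640, 480, 4)    # 640x480 at 32 bits per pixel
    # Stand-ins so the sketch runs without a GL context: allocate a buffer
    # of the frame size instead of reading pixels, and sync does nothing.
    seconds = time_readback(lambda: bytearray(size), lambda: None)
    print(f"{size} bytes per frame, {mb_per_sec(size, seconds):.1f} MB/s (stand-in)")
```

By this arithmetic a 640x480 32bpp frame is 1,228,800 bytes, so 120 MB/s works out to roughly 10 ms per frame and 275 MB/s to roughly 4.5 ms.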

On OS X there is another optimized readback path using glGetTexImage with client mapped storage that is supposed to be even faster, but I haven’t benchmarked it yet.

Originally posted by arekkusu:
This surely depends on the platform and driver optimization; the results are different on Mac OS X. I recently benchmarked glReadPixels performance on several Macintosh models. Here are some numbers for reading back 640x480 32bpp BGRA over AGP with regular no-tricks glReadPixels:
<snipped>

Agreed.
I’m on an Athlon 64 2.0 GHz, KT880Pro, dual-channel PC3200 memory, Windows 2000 SP4, latest official drivers for everything (including the chipset, of course).

I was getting very close to my current numbers with an Athlon XP 2400+ on a KT266A chipset and PC2100 memory (same OS). I have an SiS chipset based box in another room; I could check whether it makes a difference.

Originally posted by Envy Dia:
Everywhere you hear about PCI Express from card/chipset manufacturers, they boast that you can expect 5x the readback bandwidth of AGP.
I think the marketing you are referring to was comparing the current GeForces with the previous generation. The GF6 and Quadro 4000 read back at 5x the speed of the previous generation, i.e. the GF5/Quadro 3000. They have new hardware support for fast AGP readback.

From what I’ve heard, there seem to be some issues with readback performance on NVIDIA’s PCI Express cards at the moment.

You need to use PDR/PBO for best performance.

I know of the 5x readback speed on the Quadro4000, although I would rather put it at about 3x.
I haven’t tested the GeForce6800, but I assumed the speedup is only on Quadro boards…