FYI: Readback performance, lots of it

I’ve conducted a little data gathering on another forum, and have collected a lot of readback performance measurements I’d like to share.

Some of the systems were overclocked, so expect the numbers to be a little noisy. I tried to select the “best” measurements for each graphics chipset/mobo chipset combo, whenever I had duplicates. A “good” measurement, in this case, is defined as being generated while running everything at stock speed and with sufficient system information available to figure out the chipset. As you’ll see, the motherboard chipset can make a difference.

The quoted bandwidth figures are for color reads using plain glReadPixels to system memory, from a 1024x768 region (a fullscreen window’s complete viewport). The format/type is either RGBA/unsigned byte or BGRA/unsigned byte, whichever is faster for the given chip. The links lead to more detailed results, including depth and stencil readback performance and small-region readbacks.
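For anyone who wants to reproduce figures like these, here is a minimal sketch of the arithmetic behind a MB/s number. It is not the benchmark used for the results in this thread. The names `frame_bytes`, `bandwidth_mb_per_s` and `measure` are made up for illustration, `read_fn` is a hypothetical stand-in for whatever performs one complete readback (for example a glReadPixels call followed by a finish), and MB is assumed to mean 10^6 bytes.

```python
import time

def frame_bytes(width=1024, height=768, bytes_per_pixel=4):
    """Bytes moved by one readback (4 bytes/pixel for RGBA or BGRA, unsigned byte)."""
    return width * height * bytes_per_pixel

def bandwidth_mb_per_s(total_bytes, seconds, mb=1_000_000):
    """Convert a byte count and an elapsed time to MB/s (MB = 10^6 bytes is an assumption)."""
    return total_bytes / seconds / mb

def measure(read_fn, repeats=100, n_bytes=None):
    """Time `repeats` calls of read_fn and report the average bandwidth in MB/s.

    read_fn is hypothetical: it must block until the pixels are really in
    system memory, otherwise the timing is meaningless.
    """
    if n_bytes is None:
        n_bytes = frame_bytes()
    start = time.perf_counter()
    for _ in range(repeats):
        read_fn()
    elapsed = time.perf_counter() - start
    return bandwidth_mb_per_s(repeats * n_bytes, elapsed)
```

Under these assumptions one 1024x768 readback is 3,145,728 bytes, so 90MB/s works out to roughly 29 full-frame readbacks per second and 920MB/s to roughly 290.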

I’d like to thank the 3dcenter.org community for collecting and providing all this data.

On with it.

1) Radeon 9500/9700/9800 series/AGP
1a) 90MB/s on VIA K8T800
1b) 90MB/s on SiS 746

2) Radeon X800 series/AGP
2a) 80MB/s on NVIDIA NForce 2
2b) 90MB/s on NVIDIA NForce 3
2c) 100MB/s on VIA KT400
2d) 100MB/s on Intel i855
2e) 120MB/s on VIA K8T800
2f) 120MB/s on Intel i845
2g) 120MB/s on Intel i850
2h) 130MB/s on Intel i865

3) Geforce <=FX series/AGP
3a) 180MB/s on Intel i865
3b) 190MB/s on SiS 746

4) Geforce 6600 series/AGP via HSI
4a) 150MB/s on NVIDIA NForce 2
4b) 160MB/s on VIA KT600

5) Geforce 6800 series/AGP
5a) 160MB/s on NVIDIA NForce 2
5b) 1070MB/s on NVIDIA NForce 3 (NOTE: heavily overclocked)

6) Radeon X700 series/PCIe
6a) 320MB/s on NVIDIA NForce 4

7) Radeon X800/X850 series/PCIe
7a) 180MB/s on Intel i915
7b) 230MB/s on VIA K8T890
7c) 350MB/s on NVIDIA NForce 4

8) Geforce 6800 series/PCIe
8a) 900MB/s on Intel i915
8b) 920MB/s on NVIDIA NForce 4

9) Geforce 6600 series/PCIe
9a) 320MB/s on NVIDIA NForce 4 (NOTE: only four lanes!)
9b) 820MB/s on NVIDIA NForce 4 (full 16 lanes)

Sadly missing are Geforce PCX cards. Anybody got one of these? :slight_smile:

Thanks for the numbers, Zeckensack. A painful reminder that my Geforce FX is beginning to show its age.

Entries 8 and 9 are particularly encouraging, roughly a 4x improvement over the FX/AGP setup. The disparity in performance between lane number configurations is interesting–looks like 16 is a good number.
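To put the lane comparison in perspective, here is a rough sketch that compares the two 6600/PCIe results with the theoretical link rate. It assumes these are first-generation PCIe links (250 MB/s per lane per direction after 8b/10b encoding) and that the quoted MB means 10^6 bytes; neither is stated in the results above, and the function names are made up for illustration.

```python
# PCIe 1.x: 2.5 GT/s per lane with 8b/10b encoding = 250 MB/s per direction.
# Assumption: the boards measured above use first-generation PCIe.
PCIE1_MB_PER_S_PER_LANE = 250

def pcie1_peak_mb_per_s(lanes):
    """Theoretical one-way bandwidth of a PCIe 1.x link with the given lane count."""
    return lanes * PCIE1_MB_PER_S_PER_LANE

def fraction_of_peak(measured_mb_per_s, lanes):
    """Share of the theoretical link bandwidth that a measured readback reaches."""
    return measured_mb_per_s / pcie1_peak_mb_per_s(lanes)

# Entries 9a and 9b from the list above.
x4 = fraction_of_peak(320, 4)
x16 = fraction_of_peak(820, 16)
```

If those assumptions hold, the four-lane result uses about 32% of its link and the 16-lane result about 20%, so neither looks limited by raw link bandwidth alone.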

I guess it’s about time to break the old piggy bank … or do without food next month :wink:

Cool, 10 times more bandwidth if I upgrade my PC now!

Really nice to see that PCIe does in fact improve speed. A few months ago every magazine said “PCIe will not improve speed”. Well, it won’t improve speed in general, but certainly some effects will be possible at good framerates, at last.

Jan.

How fast is an old SGI, for example an Onyx 2 with an IR? It is AFAIK very fast.

From memory (and it’s been a while), with Infinite Reality you couldn’t get much more than about 320 MB/sec readback under ideal conditions. It’s an RBUS limitation in the graphics system: even copypixels or copytexsubimage, for example, went back through the GEs via the RBUS for any potential pixel pipeline processing and format conversion, so you still had that limit.

Some of these numbers are incredible and that’s all the way back to the host so it’s pretty impressive stuff.

A lot of the stuff I was using readback for on Infinite Reality you don’t even need to move data for on these cards; you can just draw straight to a texture, for example, and of course there’s no comparison on functionality. PC cards kick ass these days. Infinite Reality doesn’t get a look in, but it’s a very old system; it was good in its day.

Using a double PBO buffer I can achieve 320MB/sec on my system:

CPU P4 2.8 HT
6800U AGP
AGP 8x
MB ABIT IC7-G (Intel 875 chipset)

yooyo

Depending on usage a PBO may eliminate the copy altogether; it’s designed to help with that sort of thing.
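For readers unfamiliar with the double PBO idea, here is a small sketch of the buffer bookkeeping only. It makes no GL calls and is not yooyo's code: `ping_pong_schedule` is a made-up name, and the comments only indicate where the real calls would go in an actual renderer.

```python
def ping_pong_schedule(n_frames):
    """Yield (frame, write_buf, read_buf, ready_frame) for two alternating PBOs.

    write_buf:   buffer that this frame's asynchronous readback is started into
                 (bind it as the pixel pack buffer, then call glReadPixels
                 with a buffer offset instead of a client pointer).
    read_buf:    the other buffer, which would be mapped and consumed now.
    ready_frame: the frame whose pixels read_buf holds, or None on the very
                 first frame when nothing has been read back yet.
    """
    for frame in range(n_frames):
        write_buf = frame % 2
        read_buf = (frame + 1) % 2
        ready_frame = frame - 1 if frame > 0 else None
        yield frame, write_buf, read_buf, ready_frame

for step in ping_pong_schedule(4):
    print(step)
```

The point of the scheme is that the CPU always consumes the previous frame's pixels while the current frame's transfer is still in flight, at the cost of one frame of latency.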

Originally posted by Jan:
Really nice to see that PCIe does in fact improve speed
It is not PCIe that is making the difference, it is the fact that the geforce fx has new hardware support for fast readback. Look at the figures for 5b. The fact the system is overclocked won’t be making that much difference.

I already posted the massive increase in readback speed with the geforcefx series in June last year. Check the last post in this thread. Note the numbers are from an AGP system.
http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=012129

Presumably NVidia’s marketing department only mentioned the readback speed improvement in the Quadro marketing since it wasn’t deemed to be important for the average consumer.

Originally posted by Adrian:
It is not PCIe that is making the difference, it is the fact that the geforce fx has new hardware support for fast readback. Look at the figures for 5b. The fact the system is overclocked won’t be making that much difference.
Typo? You probably meant the Geforce 6.
According to my measurements, the Geforce FX series has the same readback characteristics as Geforce 3 and Geforce 4MX. The big improvement came with the 6800 – AGP or not.

However, PCIe appears to have improved things a bit for ATI, where AGP readbacks are still very slow.

And I’m a bit disappointed that the Geforce 6600/AGP (via bridge chip) didn’t inherit the same readback speed as the Geforce 6800/AGP.

Originally posted by marco:
How fast is an old SGI, for example an Onyx 2 with an IR? It is AFAIK very fast.
Don’t know about the Infinite Reality, but keeping with “Pro” cards, I hit close to 600MB/s on a Wildcat VP560 with a VIA KT266A chipset.
I can’t produce full details (on a more up-to-date platform) right now because I’m moving house and Wildcat isn’t with me atm.

Originally posted by zeckensack:
Typo? You probably meant the Geforce 6.
Sorry, yes I meant the GF6.

Originally posted by zeckensack:

9) Geforce 6600 series/PCIe
9a) 320MB/s on NVIDIA NForce 4 (NOTE: only four lanes!)
9b) 820MB/s on NVIDIA NForce 4 (full 16 lanes)

Hmm… do I understand correctly:

Geforce 6600 series/PCIe should have about the same readback performance as Geforce 6800 series/PCIe? That “bad” result above was only caused by using a PCIe x4 slot instead of an x16 slot, or something like that?

Thanks!

Great data!
Just ran it on a couple of more unusual machines: a dual Xeon on 7275/6300, which gets a peak of 145MPix/sec with BGRA with a PCIe 6600 Ultra.
And, more unusually, a Dothan 2.2GHz on 82855/6300 on AGP4x with an AGP 6600 Ultra getting, would you believe, a 31MPix/sec peak…

Urg.

Oh well, they both do their jobs well. Roll on a competent Dothan (or even Xeon) chipset!
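To compare the MPix/sec figures in this post with the MB/s figures at the top of the thread, a quick conversion sketch follows. It assumes 4 bytes per pixel (BGRA with unsigned byte components) and that "M" means 10^6 in both units; the function name is made up for illustration.

```python
def mpix_to_mb_per_s(mpix_per_s, bytes_per_pixel=4):
    """Convert megapixels/s to MB/s, assuming M = 10^6 in both units."""
    return mpix_per_s * bytes_per_pixel

print(mpix_to_mb_per_s(145))  # dual Xeon, PCIe 6600 Ultra -> 580
print(mpix_to_mb_per_s(31))   # Dothan, AGP 6600 Ultra     -> 124
```

On that basis the PCIe machine is within the range of the other Geforce 6 PCIe results in the list, while the AGP machine falls below even the bridged 6600/AGP entries.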

Originally posted by Adrian:

Presumably NVidia’s marketing department only mentioned the readback speed improvement in the Quadro marketing since it wasn’t deemed to be important for the average consumer.

Well, ‘they’ (meaning everyone in NVidia whom I asked back then) didn’t want to admit anything about the standard 6 series, so I would say it was more a case of wanting to assist the sales of quadro rather than anything else (which I guess is marketing…)

Of course, the cat was really out of the bag a while ago, even though there was a lack of public disclosure in general. And while the results do vary a reasonable amount, the 6 series really does rock (technical term, that :wink:) in a number of interesting ways that are not really mainstream.