NV_pixel_data_range slow on GeForce FX 5650 Go

Hi,
I am currently implementing image filters on NVIDIA hardware (see the cgshaders forum).
As the last step, I want to read the results back from the framebuffer. To do that, I compared plain glReadPixels() without PDR against glReadPixels() into a read PDR. I saw no difference between the two methods.

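// Allocate the readback buffer: width * height * 3 bytes for GL_RGB / GL_UNSIGNED_BYTE
mReadBackBuffer = (uint8*)wglAllocateMemoryNV(mInputImage->getHeight()*mInputImage->getWidth()*3*sizeof(unsigned char),
                                              1.0f,  // readfreq
                                              0.0f,  // writefreq
                                              1.0f); // priority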

.
.
.
// Make glReadPixels write into the PDR memory
glEnableClientState(GL_READ_PIXEL_DATA_RANGE_NV);
glPixelDataRangeNV(GL_READ_PIXEL_DATA_RANGE_NV,
                   mInputImage->getHeight()*mInputImage->getWidth()*3*sizeof(unsigned char),
                   mReadBackBuffer);
glReadPixels(0, 0, mInputImage->getWidth(), mInputImage->getHeight(), GL_RGB, GL_UNSIGNED_BYTE, mReadBackBuffer);
// Block until the pending readback into the range has completed before touching the data
glFlushPixelDataRangeNV(GL_READ_PIXEL_DATA_RANGE_NV);
glDisableClientState(GL_READ_PIXEL_DATA_RANGE_NV);

where mInputImage is the source image of the filter process.
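The byte count passed to wglAllocateMemoryNV and glPixelDataRangeNV has to cover everything glReadPixels writes, and glReadPixels pads each row to a multiple of GL_PACK_ALIGNMENT (4 by default). For 512-wide images width*3 is already a multiple of 4, but for other widths a plain width*height*3 can be too small. Below is a minimal sketch of a safe size calculation, assuming GL_UNSIGNED_BYTE data and no other pack state (row length, skip pixels/rows) changed. PixelFormat, bytes_per_pixel, row_stride and readback_bytes are hypothetical helpers, and the enum is a local stand-in, not the GL enum values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical local enum; these are NOT the OpenGL GLenum values.
enum class PixelFormat { RGB, BGR, RGBA, BGRA };

// Bytes per pixel for GL_UNSIGNED_BYTE data in the given format.
std::size_t bytes_per_pixel(PixelFormat f) {
    switch (f) {
        case PixelFormat::RGB:
        case PixelFormat::BGR:  return 3;
        case PixelFormat::RGBA:
        case PixelFormat::BGRA: return 4;
    }
    return 0; // not reached for valid enum values
}

// Row stride as laid out by glReadPixels for 1-byte components:
// each row is padded up to a multiple of the pack alignment (GL default: 4).
std::size_t row_stride(std::size_t width, PixelFormat f,
                       std::size_t pack_alignment = 4) {
    assert(pack_alignment == 1 || pack_alignment == 2 ||
           pack_alignment == 4 || pack_alignment == 8);
    std::size_t raw = width * bytes_per_pixel(f);
    return (raw + pack_alignment - 1) / pack_alignment * pack_alignment;
}

// Conservative buffer size: every row counted with its padding.
std::size_t readback_bytes(std::size_t width, std::size_t height,
                           PixelFormat f, std::size_t pack_alignment = 4) {
    return row_stride(width, f, pack_alignment) * height;
}
```

For example, a 641-pixel-wide GL_RGB readback uses a 1924-byte row stride under the default alignment rather than 1923 bytes, so a width*height*3 buffer would be short by height-1 bytes unless glPixelStorei(GL_PACK_ALIGNMENT, 1) is set first.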

I use a 2.4 GHz P4, 512 MB RAM, a GeForce FX 5650 Go (ForceWare 52.70) and Windows XP SP1.

Could someone tell me what I am doing wrong, or whether this is a driver issue?

thx

[This message has been edited by Chris Lux (edited 12-01-2003).]


For best performance, use BGRA, not RGB.

You can see what difference PDR makes with this benchmark written by Matt Craighead (source included): http://www.adrian.lark.btinternet.co.uk/PixPerf.zip

Run it with the following command line options:

pixperf -read -type ubyte -format bgra -size 128
pixperf -read -type ubyte -format bgra -size 128 -readpdr

Originally posted by Adrian:
For best performance use BGRA not RGB.

pixperf -read -type ubyte -format bgra -size 128
pixperf -read -type ubyte -format bgra -size 128 -readpdr

OK, I tested it (results below). The only case where PDR makes a big difference is 1024x1024 with the BGRA format. This seems strange to me, because (as you can see) even at 512x512 there is no big change. I would say that PDR has no effect on formats other than BGRA.

C:\_Studium\pixperf\PixPerf\Release>pixperf -read -type ubyte -format rgb -size 512
157.216760 copies/sec
41.213432 Mpixels/sec

C:\_Studium\pixperf\PixPerf\Release>pixperf -read -type ubyte -format rgb -size 512 -readpdr
156.584671 copies/sec
41.047732 Mpixels/sec

C:\_Studium\pixperf\PixPerf\Release>pixperf -read -type ubyte -format bgr -size 512
155.623260 copies/sec
40.795704 Mpixels/sec

C:\_Studium\pixperf\PixPerf\Release>pixperf -read -type ubyte -format bgr -size 512 -readpdr
158.222888 copies/sec
41.477180 Mpixels/sec

C:\_Studium\pixperf\PixPerf\Release>pixperf -read -type ubyte -format rgba -size 512
156.652025 copies/sec
41.065388 Mpixels/sec

C:\_Studium\pixperf\PixPerf\Release>pixperf -read -type ubyte -format rgba -size 512 -readpdr
155.470519 copies/sec
40.755664 Mpixels/sec

C:\_Studium\pixperf\PixPerf\Release>pixperf -read -type ubyte -format bgra -size 512
156.723139 copies/sec
41.084032 Mpixels/sec

C:\_Studium\pixperf\PixPerf\Release>pixperf -read -type ubyte -format bgra -size 512 -readpdr
181.323679 copies/sec
47.532916 Mpixels/sec

C:\_Studium\pixperf\PixPerf\Release>pixperf -read -type ubyte -format bgra -size 1024 -readpdr
166.592028 copies/sec
174.684400 Mpixels/sec
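The Mpixels/sec figures above appear to be copies/sec times the pixel count of the square image, divided by 10^6: for example 157.216760 x 512 x 512 / 10^6 is about 41.2134, and 166.592028 x 1024 x 1024 / 10^6 is about 174.684. This formula is inferred from the posted numbers, not taken from the pixperf source. A small sketch for reproducing the figure and converting it to a byte rate; mpixels_per_sec and megabytes_per_sec are hypothetical helpers, and the byte rate assumes the bytes per pixel of the requested format (3 for rgb/bgr, 4 for rgba/bgra) with no driver-side conversion counted.

```cpp
#include <cassert>
#include <cmath>

// Pixel rate in Mpixels/sec (1 Mpixel = 1e6 pixels) for a width x height
// copy repeated copies_per_sec times per second. Inferred from the posted
// pixperf output, not from pixperf's source.
double mpixels_per_sec(double copies_per_sec, int width, int height) {
    assert(copies_per_sec >= 0.0 && width > 0 && height > 0);
    return copies_per_sec * width * height / 1.0e6;
}

// Byte rate in MB/s (1 MB = 1e6 bytes), assuming bytes_per_pixel bytes
// per pixel end up in client memory (3 for rgb/bgr, 4 for rgba/bgra).
double megabytes_per_sec(double mpixels, int bytes_per_pixel) {
    assert(bytes_per_pixel > 0);
    return mpixels * bytes_per_pixel;
}
```

By this measure, the 1024x1024 BGRA run with PDR corresponds to about 699 MB/s of BGRA data, versus about 164 MB/s for the 512x512 BGRA run without PDR.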

You won't get much of a speed increase from PDR; 10-20% is about right. The main advantage of PDR is that it lets glReadPixels run asynchronously, so you can do other work on the CPU while you wait for glReadPixels to finish.
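The 10-20% figure is a relative gain in copies/sec. A minimal sketch of that calculation; speedup_percent is a hypothetical helper. Applied to the 512x512 BGRA runs posted above for the FX 5650 Go (156.723139 copies/sec without PDR, 181.323679 with), it gives about 15.7%, inside that range.

```cpp
#include <cassert>

// Relative speedup in percent: (with - without) / without * 100.
// Both arguments are rates (e.g. copies/sec), so higher is better.
double speedup_percent(double without_pdr, double with_pdr) {
    assert(without_pdr > 0.0);
    return (with_pdr - without_pdr) / without_pdr * 100.0;
}
```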

Your 1024 test results look wrong, way too high. Is 1024 bigger than your vertical screen resolution?

50 MPixels/sec is about the maximum you will see on an NVIDIA card. If you think that's slow, you should benchmark glReadPixels on an ATI card: it runs at between a half and a third of that speed.

Does anyone know whether the forthcoming PCI Express will speed up framebuffer reads at all?

[This message has been edited by Adrian (edited 12-01-2003).]

Originally posted by Adrian:

Your 1024 test results look wrong, way too high. Is 1024 bigger than your vertical screen resolution?

No, it isn't (1920x1200).

Here are results from a 2.2 GHz P4 with a GeForce4 Ti 4600:

D:\_Temp>PixPerf -read -type ubyte -format rgb -size 512
171.229259 copies/sec
44.886724 Mpixels/sec

D:\_Temp>PixPerf -read -type ubyte -format rgb -size 512 -readpdr
171.689437 copies/sec
45.007356 Mpixels/sec

D:\_Temp>PixPerf -read -type ubyte -format rgba -size 512
169.617180 copies/sec
44.464128 Mpixels/sec

D:\_Temp>PixPerf -read -type ubyte -format rgba -size 512 -readpdr
169.666435 copies/sec
44.477040 Mpixels/sec

D:\_Temp>PixPerf -read -type ubyte -format bgr -size 512
171.207978 copies/sec
44.881144 Mpixels/sec

D:\_Temp>PixPerf -read -type ubyte -format bgr -size 512 -readpdr
171.416678 copies/sec
44.935852 Mpixels/sec

D:\_Temp>PixPerf -read -type ubyte -format bgra -size 512
172.449560 copies/sec
45.206616 Mpixels/sec

D:\_Temp>PixPerf -read -type ubyte -format bgra -size 512 -readpdr
184.243568 copies/sec
48.298344 Mpixels/sec

D:\_Temp>PixPerf -read -type ubyte -format bgra -size 1024
44.146521 copies/sec
46.290984 Mpixels/sec

D:\_Temp>PixPerf -read -type ubyte -format bgra -size 1024 -readpdr
163.300306 copies/sec
171.232784 Mpixels/sec

All at 1600x1200 with ForceWare 52.16.

[This message has been edited by Chris Lux (edited 12-01-2003).]

I just tried 1024 with -readpdr and got a blue screen: DRIVER_CORRUPTED_MMPOOL.

All other combinations work fine.

Another test, this time to get a texture onto the graphics card as fast as possible (for an AR application). To do this, the texture data is read from a PDR.

The results show that with PDR, texture loading takes much longer than without it in every case except BGRA.

512x512 texture:

GeForce FX 5650 Go (128 MB, AGP 4x):
3 channels, RGB: PDR 170 ms, non-PDR 3.6 ms
3 channels, BGR: PDR 114 ms, non-PDR 3.6 ms
4 channels, RGBA: PDR 58 ms, non-PDR 4.1 ms
4 channels, BGRA: PDR 1.5 ms, non-PDR 4.1 ms

GeForce4 Ti 4600 (128 MB, AGP 4x):
3 channels, RGB: PDR 138 ms, non-PDR 2.0 ms
3 channels, BGR: PDR 95 ms, non-PDR 2.0 ms
4 channels, RGBA: PDR 49 ms, non-PDR 2.4 ms
4 channels, BGRA: PDR 1.2 ms, non-PDR 2.0 ms
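To put these upload times on a common scale, they can be converted into throughput. A minimal sketch, assuming each 512x512 texture carries exactly width x height x bytes-per-pixel bytes (3 for RGB/BGR, 4 for RGBA/BGRA) with no row padding, which holds here because a 512-pixel row is already a multiple of 4 bytes; upload_mb_per_sec is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Upload throughput in MB/s (1 MB = 1e6 bytes) for a width x height image
// with the given bytes per pixel, transferred in `milliseconds`.
// Assumes tightly packed rows (no alignment padding).
double upload_mb_per_sec(std::size_t width, std::size_t height,
                         std::size_t bytes_per_pixel, double milliseconds) {
    assert(milliseconds > 0.0);
    double bytes = static_cast<double>(width * height * bytes_per_pixel);
    return bytes / (milliseconds / 1000.0) / 1.0e6;
}
```

On the FX 5650 Go numbers this gives roughly 699 MB/s for BGRA with PDR (1.5 ms), about 256 MB/s for BGRA without PDR (4.1 ms), and under 5 MB/s for RGB with PDR (170 ms).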

Another issue: if PDR is used to upload images and an image library such as DevIL loads them, a memcpy from the library's buffer into the PDR is needed, which means additional memory traffic. I hope to find out more about this (or that ARB_pixel_buffer_object arrives soon).
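Since the data has to be copied from the image library's buffer into the PDR memory anyway, one option is to convert to BGRA during that copy, so the upload uses the format that was fast in the measurements above. A minimal sketch, assuming the library hands back tightly packed 8-bit RGB (check the format it actually returns) and that `dst` points at PDR memory large enough for 4 bytes per pixel; copy_rgb_to_bgra is a hypothetical helper, not a DevIL or OpenGL function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy tightly packed RGB pixels (3 bytes each) into a BGRA buffer
// (4 bytes each), swapping red and blue and writing an opaque alpha.
// In the scenario above, `dst` would be the PDR memory.
void copy_rgb_to_bgra(const std::uint8_t* src, std::uint8_t* dst,
                      std::size_t pixel_count) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < pixel_count; ++i) {
        const std::uint8_t* s = src + 3 * i;
        std::uint8_t* d = dst + 4 * i;
        d[0] = s[2]; // B
        d[1] = s[1]; // G
        d[2] = s[0]; // R
        d[3] = 255;  // A (opaque)
    }
}
```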

P.S. I know what the PDR spec says about accelerated formats, but shouldn't the results for unaccelerated formats be at least as fast as the non-PDR path?

[This message has been edited by Chris Lux (edited 12-02-2003).]
