AGP pixel upload/download

Has anyone successfully gotten the pixel data range extension to work? What are the bandwidth numbers? Does the AGP speed (1x, 2x, 4x, 8x) make a difference?

-Won

I tried it out just last night, as a matter of fact. My glReadPixels() throughput went from 138 MPixels/sec to 745 MPixels/sec! It’s a rather contrived benchmark at the moment, though, as all it does is render a single triangle and then do a glReadPixels() on the whole framebuffer. Nothing is ever done with the result of this readback.

I used (0, 0, 1) as the read frequency, write frequency and priority arguments to wglAllocateMemoryNV(). Other settings result in much poorer performance. I haven’t tried fiddling with my AGP speed yet, nor have I benchmarked glDrawPixels() performance.
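(For reference, a minimal sketch of this kind of setup. The buffer size and the assumption that the extension entry points were already fetched with wglGetProcAddress() are illustrative, not Tom’s actual code:)

/* Allocate with (read freq, write freq, priority) = (0, 0, 1), which
   the driver treats as a request for video memory, and bind it as the
   read pixel data range. */
GLsizei size = 1024 * 1024 * 4; /* hypothetical 1024x1024 BGRA8 framebuffer */
GLubyte *pdr = (GLubyte *) wglAllocateMemoryNV(size, 0.0f, 0.0f, 1.0f);

glPixelDataRangeNV(GL_READ_PIXEL_DATA_RANGE_NV, size, pdr);
glEnableClientState(GL_READ_PIXEL_DATA_RANGE_NV);

/* With the range enabled, the readback can complete asynchronously
   instead of stalling the CPU. */
glReadPixels(0, 0, 1024, 1024, GL_BGRA, GL_UNSIGNED_BYTE, pdr);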

– Tom

That is somewhat encouraging. With those arguments to wglAllocateMemoryNV, aren’t you getting video memory? If so, these results aren’t all that interesting.

138 MPixels/sec => 552 MBytes/sec (assuming 32-bit pixels) is actually pretty good for AGP.

-Won

I now have it working; ReadPixels performance went from 40 MPixels/sec to 50 MPixels/sec. I don’t understand why my figures are so much lower (I haven’t tried the (0, 0, 1) combination yet, though). I have an XP2000, AGP4x and a GF4600, latest detonators. I am using the accelerated format, BGRA, and my memory is 32-byte aligned. Tom, what is the spec of your machine? Can you release the exe so we can benchmark with it?

The async aspect is really cool!


Does anyone know where I can find the latest header file for OpenGL extensions, one that contains the latest ARB ones and the whole new bunch from NVIDIA?

A download link would be very nice, thanks.

Diapolo

Oops. I made a typo: the numbers I gave you aren’t MPixels/sec, they’re MBytes/sec. MPixels/sec would be four times lower, i.e. the throughput went from about 35M to 180M.

Yeah, (0, 0, 1) would give you video memory. I wouldn’t say that it isn’t interesting, though. I’m sure there are plenty of interesting things you can do with the data without ever having to touch it with the CPU. (Render-to-vertex-array?)

Adrian, my machine is an XP1600, GF4 4200, AGPx4, latest drivers. Your 40 MPixels/sec sounds about right, and the smaller increase you get with PDR is probably due to your using AGP memory instead of video memory. Which wglAllocateMemory() arguments did you use?

– Tom

I used (1, 0, 1) and (1, 0, 0.75), and they gave me similar results.

Initially I had problems getting the extension to work.

The first problem was that I was initialising my window with

glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);

This didn’t give me an alpha buffer, so ReadPixels fell back to the unaccelerated synchronous mode.

glutInitDisplayMode(GLUT_SINGLE | GLUT_RGBA); does give an alpha buffer, as does
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA | GLUT_ALPHA);
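(A quick way to check for this pitfall, as a self-contained sketch; the window title and error message are made up, not Adrian’s code:)

#include <stdio.h>
#include <GL/glut.h>

int main(int argc, char **argv)
{
    GLint alphaBits = 0;

    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA | GLUT_ALPHA);
    glutCreateWindow("pdr test");

    /* Without destination alpha, BGRA ReadPixels falls back to the
       unaccelerated synchronous path described above. */
    glGetIntegerv(GL_ALPHA_BITS, &alphaBits);
    if (alphaBits == 0)
        fprintf(stderr, "No alpha buffer - expect the slow path\n");
    return 0;
}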

The second problem was that I was using glFlushPixelDataRangeNV() as a flush to start the ReadPixels. It actually behaves more like glFinish(): it hangs until the ReadPixels has finished. To start the ReadPixels asynchronously I should have used glFlush().

Just thought I’d mention it in case others encounter the same problems.
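(So the asynchronous pattern looks something like the sketch below. Using NV_fence to detect completion is an assumption on my part; the post above only establishes that glFlush() starts the readback without blocking:)

/* 'pdr', 'w' and 'h' are assumed set up as in the earlier allocation
   sketch; 'fence' comes from glGenFencesNV(). */
glReadPixels(0, 0, w, h, GL_BGRA, GL_UNSIGNED_BYTE, pdr); /* returns immediately */
glSetFenceNV(fence, GL_ALL_COMPLETED_NV);
glFlush();              /* kicks off the transfer without blocking */

doOtherWork();          /* hypothetical CPU work to overlap with the readback */

glFinishFenceNV(fence); /* waits only if the readback is still in flight */
consumePixels(pdr);     /* hypothetical consumer of the data */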

I think there is confusion about the arguments to wglAllocateMemoryNV. The read frequency and write frequency are NOT how often the video card will read or write the data; they say how often the program itself is going to read and write it. Both values should be 0 in almost all cases, and I remember reading that anything above 0.25 for either value forces it to use system memory.

Well, about the ReadPixels to video memory not being interesting: I certainly agree that this is a useful feature. What isn’t all that interesting is how fast it is. It is pretty darn fast, but is it as fast as you’d expect it to be, considering the bus it’s going over? I would almost expect those numbers to be significantly higher. Besides, my particular application would require AGP downloads for it to be useful (the data needs to get streamed out of the computer after some processing).

Still, anyone with AGP 8x?

-Won

Originally posted by Diapolo:
Does anyone know where I can find the latest header file for OpenGL extensions, one that contains the latest ARB ones and the whole new bunch from NVIDIA?

A download link would be very nice, thanks.

You can find one here: http://cvs1.nvidia.com/OpenGL/include/glh/GL/glext.h

Great, thank you very much.

Diapolo

Originally posted by Coriolis:
Both values should be 0 in almost all cases, and I remember reading that anything above 0.25 for either value forces it to use system memory.

Although ReadPixels is faster with (0, 0, 1), any processing (by the CPU) on the data is extremely slow, since it is in video memory. The best overall speed comes from using combinations that allocate system memory. Unsurprisingly, if I just use malloc I get the same overall speed. I can’t see how I can benefit from using wglAllocateMemoryNV with normal usage of ReadPixels.


From the extension spec:

*   How should an application allocate its PDR memory?

    The app should use wglAllocateMemoryNV, even for a read PDR in
    system memory.  Using malloc may result in suboptimal
    performance, because the driver will not be able to choose an
    optimal memory type.  For ReadPixels to system memory, you might
    set a read frequency of 1.0, a write frequency of 0.0, and a
    priority of 1.0.  The driver might allocate PCI memory, or
    physically contiguous PCI memory, or cachable AGP memory, all
    depending on the performance characteristics of the device.
    While memory from malloc will work, it does not allow the driver
    to make these decisions, and it will certainly never give you AGP
    memory.

    Write PDR memory for purposes of streaming textures, etc. works
    exactly the same as VAR memory for streaming vertices.  You can,
    and in fact are encouraged to, use the same circular buffer for
    both vertices and textures.

    If you have different needs (not just streaming textures or
    asynchronous readbacks), you may want your pixel data in video
    memory.

In other words, malloc() will guarantee that you will never get faster than PCI readback speeds.

I put these details in the spec for a reason…

  • Matt
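(Concretely, the spec’s suggestion for a ReadPixels-to-system-memory buffer would look something like this sketch; the size is hypothetical:)

/* Read frequency 1.0 (the app will read the data), write frequency 0.0,
   priority 1.0: the driver may back this with PCI or cachable AGP
   memory, which plain malloc() can never give you. */
GLsizei size = 640 * 480 * 4; /* hypothetical BGRA8 buffer */
void *sysmem = wglAllocateMemoryNV(size, 1.0f, 0.0f, 1.0f);

glPixelDataRangeNV(GL_READ_PIXEL_DATA_RANGE_NV, size, sysmem);
glEnableClientState(GL_READ_PIXEL_DATA_RANGE_NV);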

Originally posted by Tom Nuydens:
I’m sure there are plenty of interesting things you can do with the data without ever having to touch it with the CPU. (Render-to-vertex-array?)

Right, this is the primary intended use of read PDR in vidmem. Of course, it’s pretty difficult to use it this way unless you also have float buffers; 8-bit vertices aren’t all that interesting. 16-bit half float vertices, on the other hand…

I would hope you could get better than 745 MB/s doing PDR ReadPixels to vidmem, depending on how much video memory bandwidth you have?

  • Matt

>>> The driver might allocate PCI memory, or physically contiguous PCI memory, or cachable AGP memory, all depending on the performance characteristics of the device. <<<

Does the first one mean that the allocated space may not be contiguous?
And what does "cachable AGP memory" mean? Do you mean the video card will be doing the caching? AFAIK, AGP memory is not cached by the system cache, which I find a little odd.

I should really ask some of this on the hardware groups.

V-man

Originally posted by mcraighead:
Right, this is the primary intended use of read PDR in vidmem. Of course, it’s pretty difficult to use it this way unless you also have float buffers; 8-bit vertices aren’t all that interesting. 16-bit half float vertices, on the other hand…

8-bit vertices aren’t interesting on their own, though 4x8-bit values used as a displacement fed into a vertex program can definitely be interesting.

~Eric
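(A heavily hedged sketch of what Eric describes, assuming NV_vertex_array_range and NV_vertex_program are available; all names and sizes are invented for illustration:)

/* 'vidmem' is assumed to hold a ReadPixels result that stayed in video
   memory, as in the earlier (0, 0, 1) sketch, holding 'count' BGRA8 pixels. */
glVertexArrayRangeNV(size, vidmem);
glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

/* Feed each packed 4x8-bit pixel to a vertex program as generic
   attribute 1; the program itself (not shown) would decode it into a
   displacement. */
glEnableClientState(GL_VERTEX_ATTRIB_ARRAY1_NV);
glVertexAttribPointerNV(1, 4, GL_UNSIGNED_BYTE, 0, vidmem);
glDrawArrays(GL_POINTS, 0, count);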

Originally posted by V-man:
The first one means that the allocated space may not be contiguous?

Not necessarily physically contiguous. Most memory isn’t.

Originally posted by V-man:
cacheable AGP memory means what? You mean the video card will be doing the caching?

No, AGP memory where the pages are marked as cacheable by the CPU. Most AGP memory is marked write-combined and uncacheable; in this case, you’d want it to be cacheable.

  • Matt

You can benchmark glReadPixels performance with the Pixel Data Range extension using this app written by Matt Craighead (source included):
http://planet3d.demonews.com/PixPerf.zip

Run it with the following command line options:

pixperf -read -type ubyte -format bgra -size 128
pixperf -read -type ubyte -format bgra -size 128 -readpdr

On my GF4600 XP2000 AGP4x, latest detonators:
42.5 MPixels/sec without
50.7 MPixels/sec with the extension.

I would be particularly interested to see some AGP8x numbers.


Hi

I ran the app with the following results:

GF4600 XP2400 AGP4x, DET 41.03:
41.6 MPixels/sec without
50.95 MPixels/sec with the extension.

Bye
ScottManDeath

On my other computer, a GF2 GTS, P3 700, AGP2x, latest detonators:
34.1 MPixels/sec without
44.5 MPixels/sec with the extension.