I have searched and read the archives.
I have read “GL_NV_pixel_data_range.txt” (Matt Craighead / NVIDIA).
I have made a lot of progress, but I am still stuck in the ~100 MB / sec range.
What am I trying to do? Pull image data from { sys memory | hard disk | network } and stuff it into textures as fast as possible.
I am running an FX-5950 Ultra card in a 2.53 GHz machine w/ Intel 845G/GL chipset.
Standard glTexSubImage2D with malloc’d RAM yields ~100 MB/sec. No surprise here, I don’t think.
My understanding is that the “proper” way to use the PDR extensions for pushing textures on to the card is:
- Allocate memory with glXAllocateMemoryNV with read/write/priority parameters that will hopefully allocate memory in the AGP aperture.
- Enable the extension, notify the driver where the memory is, and how you plan to use it.
- Do large block copies into this memory.
- Use glTexSubImage2D (or whatever) to get it into a texture. (No borders, pixel transfer operations, or funny formats. GL_BGRA_EXT is what I used.)
Does memcpy() qualify for the third step (the large block copies)?
SO…when I do all of this, I am able to EITHER get fast glTexSubImage2D performance, OR fast memcpy performance, but not both.
I wrote a program which iterates through all possible parameters for the glXAllocateMemoryNV call (in 0.1 increments) and times the various operations (100 cycles for each). I used glFlushPixelDataRangeNV() to block after the glTexSubImage2D call. I am using two textures and two memory ranges (allocated with glXAllocateMemoryNV as one large chunk), alternating between the two.
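For anyone who wants to reproduce the memcpy half of the measurement, here is a minimal sketch of that kind of timing loop. It is not my actual test program: the names CHUNK_BYTES, elapsed_sec and time_memcpy are made up for illustration, and the destination is assumed to be any buffer of at least two chunks (plain malloc'd memory works as a stand-in for the pointer returned by glXAllocateMemoryNV).

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* One 1024x1024 texture at 4 bytes per texel (hypothetical name). */
#define CHUNK_BYTES (1024 * 1024 * 4)

/* Elapsed time between two gettimeofday() readings, in seconds. */
static double elapsed_sec(const struct timeval *start, const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1e6;
}

/* Time `iterations` copies of one chunk from src into dst, alternating
   between the two halves of dst (dst must hold 2 * CHUNK_BYTES).
   Returns the total elapsed time in seconds. */
static double time_memcpy(unsigned char *dst, const unsigned char *src,
                          int iterations)
{
    struct timeval t0, t1;
    int i;

    gettimeofday(&t0, NULL);
    for (i = 0; i < iterations; i++) {
        /* i & 1 selects the first or second memory range in turn. */
        memcpy(dst + (size_t)(i & 1) * CHUNK_BYTES, src, CHUNK_BYTES);
    }
    gettimeofday(&t1, NULL);

    return elapsed_sec(&t0, &t1);
}
```

Calling time_memcpy(dst, src, 100) corresponds to one "memcpy" line in the timer stats; the glTexSubImage2D timing would wrap the GL call and the flush in the same pair of gettimeofday() readings.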
Here are two sample timing sequences:
(All tests are for 1024x1024x4 = 4 MB chunks)
glXAllocateMemoryNV parameters: 0.9, 0, 0
Timer stats after 100 iterations:
glTexSubImage : Elapsed time sec: 4 usec:220798
memset : Elapsed time sec: 0 usec:633433
memcpy : Elapsed time sec: 0 usec:811151
glutSwap : Elapsed time sec: 0 usec:5310
or
glXAllocateMemoryNV parameters: 0, 0.9, 0.5
Timer stats after 100 iterations:
glTexSubImage : Elapsed time sec: 0 usec:609059
memset : Elapsed time sec: 0 usec:605422
memcpy : Elapsed time sec: 4 usec:453805
glutSwap : Elapsed time sec: 0 usec:4046
Every combination of parameters that I tried gave (more or less) the same results as one of the above two tests (or returned a NULL pointer).
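To turn those timer readings into MB/sec: each run moves 100 x 4 MB = 400 MB per operation, so the rate is 400 divided by the elapsed seconds. A small sketch of that arithmetic follows; the helper name throughput_mb_per_sec is made up for illustration, and 1 MB is taken as 2^20 bytes so that a 1024x1024x4 chunk is exactly 4 MB.

```c
#include <stdio.h>

/* Convert a "sec / usec" timer reading into MB/sec for a run that moved
   `iterations` chunks of `mb_per_iteration` megabytes each. */
static double throughput_mb_per_sec(int iterations, double mb_per_iteration,
                                    long sec, long usec)
{
    double elapsed = (double)sec + (double)usec / 1e6;
    return (iterations * mb_per_iteration) / elapsed;
}

/* Print the four transfer timings quoted above as MB/sec. */
static void print_reported_rates(void)
{
    printf("params 0.9, 0, 0     glTexSubImage: %.1f MB/sec\n",
           throughput_mb_per_sec(100, 4.0, 4, 220798));
    printf("params 0.9, 0, 0     memcpy:        %.1f MB/sec\n",
           throughput_mb_per_sec(100, 4.0, 0, 811151));
    printf("params 0, 0.9, 0.5   glTexSubImage: %.1f MB/sec\n",
           throughput_mb_per_sec(100, 4.0, 0, 609059));
    printf("params 0, 0.9, 0.5   memcpy:        %.1f MB/sec\n",
           throughput_mb_per_sec(100, 4.0, 4, 453805));
}
```

The slow case in each run works out to roughly 90 to 95 MB/sec, while the fast cases land in the range of several hundred MB/sec.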
FAST WRITES ARE NOT CURRENTLY ENABLED. I’m not sure if this matters, and I’m fairly confident that I should be able to achieve better performance without them anyway.
It’s pretty clear to me that memcpy CAN be fast. It’s also fairly clear that in some cases my glTexSubImage2D is fast. My 90ish MB/sec bottleneck has me guessing that I’m going over the PCI bus at some point…
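As a sanity check on the PCI guess, here is the back-of-the-envelope peak-bandwidth arithmetic. The bus figures are the nominal ones (32-bit PCI at 33.33 MHz, 32-bit AGP at 66.66 MHz with 1, 2, 4 or 8 transfers per clock); the helper name bus_peak_mb_per_sec is made up, and which AGP rate my board actually negotiates is an assumption I have not verified.

```c
/* Theoretical peak bandwidth of a parallel bus, in MB/sec (10^6 bytes):
   bytes per transfer x clock in MHz x transfers per clock. */
static double bus_peak_mb_per_sec(int width_bits, double clock_mhz,
                                  int transfers_per_clock)
{
    return (width_bits / 8.0) * clock_mhz * transfers_per_clock;
}

/* Nominal peaks for the buses in question. */
static double pci_peak(void)    { return bus_peak_mb_per_sec(32, 33.33, 1); }
static double agp_1x_peak(void) { return bus_peak_mb_per_sec(32, 66.66, 1); }
static double agp_4x_peak(void) { return bus_peak_mb_per_sec(32, 66.66, 4); }
```

Plain PCI peaks at about 133 MB/sec, and sustained rates are always below peak, so a ceiling in the 90s is at least consistent with a PCI-style path. Even AGP 1x peaks at about 266 MB/sec, and 4x at about 1066 MB/sec, so a real AGP transfer should not be pinned that low.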
This would make sense if:
When the glTexSubImage2D is slow, the memory returned by glXAllocateMemoryNV() is just “normal” system memory (hence no fast path to the card).
When the memcpy is slow, the memory returned by glXAllocateMemoryNV() is video memory, hence no fast path for the memcpy().
Which leaves me wondering why I can’t get AGP memory to be allocated, as I’ve tried all (more or less) combinations of parameters to glXAllocateMemoryNV…
Are fast writes necessary? I guess I’m fairly convinced that it’s time to start upgrading my kernel. If that doesn’t work I will be looking for a motherboard which is supported by the NVIDIA AGP drivers, and if necessary I’ll run this project on Windows.
Arg!
Thanks, guys…
-Steve