glVertexArrayRangeNV performance



list67
05-13-2003, 12:50 PM
Hi. I have two questions.

1) I want to know whether I should expect improved performance by using glVertexArrayRangeNV as described below. A chunk of memory is allocated using wglAllocateMemoryNV. Every frame, glVertexArrayRangeNV is called once. Each frame, the following sequence occurs one or more times:

glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
memcpy(nvArray, systemMemoryArray, numSystemMemoryBytes);
glVertexPointer(3, GL_FLOAT, 0, nvArray);                              // float vertices
glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, indices);  // short indices
glFlushVertexArrayRangeNV();
glDisableClientState(GL_VERTEX_ARRAY_RANGE_NV);
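
For reference, the setup behind this is roughly the following; the 0.5 priority is just the value I use to request AGP (rather than video) memory, and the size matches the system-memory copy:

// done once at startup:
nvArray = (float *)wglAllocateMemoryNV(numSystemMemoryBytes, 0.0f, 0.0f, 0.5f);

// the once-per-frame call mentioned above:
glVertexArrayRangeNV(numSystemMemoryBytes, nvArray);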

1b) If there is improved performance with the above, then why would a programmer be required to write the extra code? Why wouldn't the copy from system memory to AGP memory be done automatically for all code that calls glVertexPointer and glDrawElements (assuming that the hardware supports glVertexArrayRangeNV)?


2) I read that arrays allocated using wglAllocateMemoryNV have slower access times than system memory arrays, and that a program should have a duplicate array in system memory if the array is accessed frequently. Is this still true?

Thanks.

Mike

Jan
05-13-2003, 01:22 PM
That

memcpy(nvArray, systemMemoryArray, numSystemMemoryBytes);

is not done every frame! Only dynamic data has to be updated. All the static stuff is uploaded once and never again. And dynamic stuff that didn't change over the last frame doesn't have to be uploaded either.

Anyway, I never really experienced any speedup, although I only have static data. However, I think there is an improvement if you have a really large number of vertices (meaning some tens or hundreds of thousands or so). If you don't have enough vertices, you are not bus-limited ;-)

But from what other users have posted, I have the impression that ARB_vertex_buffer_object (or whatever its exact name is) is faster than NV's range extension anyway. So maybe you should try that out. And it's an ARB extension!
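
For what it's worth, the basic usage of that extension looks roughly like this (the buffer name, sizes and the GL_STATIC_DRAW_ARB hint are just an example):

GLuint vbo;
glGenBuffersARB(1, &vbo);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, numBytes, vertexData, GL_STATIC_DRAW_ARB); // driver picks the memory
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);   // offset into the bound buffer, not a real pointer
glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, indices);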

So to answer your questions:
1) By copying it yourself, you can control when something is updated and what part is updated (so not everything gets re-uploaded even if 90% of it didn't change). Also, if you use fences, there is no way the driver could know on its own what you really want to do next. (See the sketch after answer 2.)

2) Yes, it's still true. And it will certainly (unfortunately) still be true in 20 years (although I hope it won't).
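
To make the fence point concrete, here is a rough sketch of updating only the dynamic part of the range; the fence, offset and size names are placeholders, and the fence is assumed to come from glGenFencesNV (NV_fence extension):

glFinishFenceNV(fence);                                 // wait until the GPU has passed the last fence
                                                        // (skip on the very first frame, before it was set)
memcpy((GLubyte *)nvArray + dynamicOffset,              // re-copy only the sub-range that changed
       (GLubyte *)systemMemoryArray + dynamicOffset,
       numDynamicBytes);
glVertexPointer(3, GL_FLOAT, 0, nvArray);
glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, indices);
glSetFenceNV(fence, GL_ALL_COMPLETED_NV);               // GPU must pass this before the next overwrite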

Jan.

jorge1774
05-13-2003, 10:13 PM
Hi!

Never use that:

glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
glDisableClientState(GL_VERTEX_ARRAY_RANGE_NV);

Instead use :

glEnableClientState(GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV);
glDisableClientState(GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV);

Read the NVIDIA OpenGL extension spec (NV_vertex_array_range2) for more info.

See you.

jorge1774
05-13-2003, 10:20 PM
And one more thing: you don't have to enable/disable the vertex array range, except if you also use glBegin/glEnd and standard OpenGL routines.

On my computer (800 MHz), not enabling/disabling the array range saves me about 10% of my processing time, so you should try it.
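
Roughly, that means something like this (assuming nvArray was allocated with wglAllocateMemoryNV):

// at init:
glVertexArrayRangeNV(numNVBytes, nvArray);
glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);   // leave it enabled

// per frame: just set pointers and draw; only disable the range around
// glBegin/glEnd or ordinary (non-range) vertex arrays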

See you

velco
05-13-2003, 11:28 PM
Originally posted by list67:

memcpy(nvArray, systemMemoryArray, numSystemMemoryBytes);


Do not copy data each frame. If you have to, you'd better use plain vertex arrays - the performance will be the same, but no meddling with extensions.



1b) If there is improved performance with the above, then why would a programmer be required to write the extra code? Why wouldn't the copy from system memory to AGP memory be done automatically for all code that calls glVertexPointer and glDrawElements (assuming that the hardware supports glVertexArrayRangeNV)?


The driver does exactly this - copies the data to GART memory. This is the only memory an AGP card can access (besides its own video memory). That's why the above copying does not buy you anything over plain vertex arrays.



2) I read that arrays allocated using wglAllocateMemoryNV have slower access times than system memory arrays, and that a program should have a duplicate array in system memory if the array is accessed frequently. Is this still true?

Yes. GART memory is not cached (because AGP does not provide cache coherence). Thus reads are *very* slow, and writes should be sequential so that CPU write-combining amortizes the memory access overhead.
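
In practice that means filling the AGP array with sequential writes and never reading it back; keep a normal (cached) system-memory copy for anything you need to read. A sketch, with agpVerts assumed to point into the wglAllocateMemoryNV block and sysVerts a plain malloc'd copy of the same vertex array:

for (i = 0; i < numVerts; ++i)
    agpVerts[i] = sysVerts[i];       // sequential writes -> write-combining amortizes the cost

x = sysVerts[42].x;                  // fine: cached system memory
// x = agpVerts[42].x;               // avoid: uncached read over AGP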

~velco

Ysaneya
05-14-2003, 12:45 AM
Do not copy data each frame. If you have to, you'd better use plain vertex arrays - the performance will be the same, but no meddling with extensions.


Not always true. In a 100% dynamic but multi-pass configuration, you're better off streaming to AGP/video memory and reusing the vertices for the next pass. It saves a lot of bandwidth.
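
In other words, something like this per frame (the state change for the second pass is just a placeholder):

memcpy(nvArray, systemMemoryArray, numSystemMemoryBytes);              // upload the dynamic vertices once
glVertexPointer(3, GL_FLOAT, 0, nvArray);
glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, indices);  // pass 1

// switch blend mode / textures for the second pass ...
glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, indices);  // pass 2: same vertices, no re-copy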

Y.

V-man
05-14-2003, 06:16 AM
>>>>Yes. GART memory is not cached (because AGP does not provide cache coherence). Thus reads are *very* slow, and writes should be sequential so that CPU write-combining amortizes the memory access overhead.

~velco<<<<

This I don't understand. Why the hell is this not cached, and can't it be made cacheable?

Someone said that it could, but I'm not sure he knew what he was talking about.

velco
05-14-2003, 06:51 AM
Originally posted by V-man:
>>>>Yes. GART memory is not cached (because AGP does not provide cache coherence). Thus reads are *very* slow, and writes should be sequential so that CPU write-combining amortizes the memory access overhead.

~velco<<<<

This I don't understand. Why the hell is this not cached, and can't it be made cacheable?

Someone said that it could, but I'm not sure he knew what he was talking about.



It could be made cacheable, of course (it's simply a bit in the PTE), but with disastrous results. GART memory is read and written by both the CPU and the GPU. If they cache accesses to GART memory, there MUST be a mechanism to determine cache-line ownership - like MESI and its variants on SMP systems. The AGP bus provides no such mechanism.

Further info in AGP 2.0 spec, "2.4 Platform Dependencies"

~velco

V-man
05-15-2003, 05:25 AM
So if it were cacheable, then the GPU and CPU would have to behave like an SMP system. I see.

Well, couldn't the GPU send a command to the CPU telling it to flush its cache back to RAM just before it begins reading from AGP memory?

Or perhaps this could be done inside our program.

velco
05-15-2003, 06:34 AM
Originally posted by V-man:
So if it were cacheable, then the GPU and CPU would have to behave like an SMP system. I see.

Well, couldn't the GPU send a command to the CPU telling it to flush its cache back to RAM just before it begins reading from AGP memory?

Or perhaps this could be done inside our program.



Manually maintained cache coherence? Well, it could be done in principle. Many common PCI devices work this way, i.e. the CPU explicitly flushes its cache to memory before initiating a bus-master read by the device (a read from the device's point of view), so the device sees current data. Likewise, on a device write to memory, the CPU invalidates its own cache so that it reads what the device has written instead of the stale data in its cache.

Dunno why drivers/cards are not implemented that way.

~velco