VAR problems when using AGP rather than VRAM

I have a problem with VAR which I was hoping someone might have a useful answer to. I have an app which uses VAR to nicely speed up geometry throughput. When I use:

wglAllocateMemoryNV(16000000, 0.0, 0.0, 1.0)

which, as I understand it, should allocate video memory (VRAM), I get the speed improvements I would expect to see using VAR. If, however, I change this call to use 0.5 for priority (AGP memory), the speed drops to about 25% of the former. All other code is the same, and the geometry I am using is about 7200 triangles, all static.
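For reference, the priority parameter is what selects the memory type. A sketch of the two allocation variants (the 16 MB size is just this thread's number; wglAllocateMemoryNV must be fetched via wglGetProcAddress at runtime, so this fragment only runs with an NV driver present):

```c
GLsizei size = 16000000;

/* priority 1.0 -> driver prefers video memory (VRAM) */
void *vram = wglAllocateMemoryNV(size, 0.0f, 0.0f, 1.0f);

/* priority ~0.5 -> driver prefers AGP memory */
void *agp = wglAllocateMemoryNV(size, 0.0f, 0.0f, 0.5f);

if (vram) {
    /* Point VAR at the allocation and turn it on. */
    glVertexArrayRangeNV(size, vram);
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
}
```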

I’ve tried other values for the other parameters, and disabling texturing etc., but the speed is still some 25% of that with VRAM VAR memory. Ordinarily I would assume something is wrong with the AGP aperture or AGP bus speed; however, nVidia’s VAR demo runs at the same (fast) speed using either value. I don’t (currently) use fences as I don’t yet need them. I have a GeForce3 and am using driver version 23.11.

Any ideas on what could be causing this?

Thanks very much.

You’re not enabling and then disabling VAR every frame, are you?
If you are, then don’t disable VAR.

As far as I can tell from my own experiments, this is to be expected in some situations.

If you are writing to VAR memory at the same time as the GPU is rendering, then the bandwidth available to the GPU might not be sufficient to allow full-speed rendering.

The speedup in the VAR demo comes mainly from the fact that the demo does its CPU calculations in parallel with GPU rendering, whereas with VAR off those calculations have to be done sequentially.
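If dynamic updates ever do overlap rendering, the usual cure is the NV_fence extension: write into one half of the VAR buffer while the GPU reads the other, and wait on a fence before reusing a half. A sketch under assumptions (var_base/half_size describe a hypothetical allocation split in two; write_vertices and draw_from are hypothetical helpers, not from this thread):

```c
/* Double-buffer within one VAR allocation, synchronized with NV_fence. */
GLuint fence[2];
glGenFencesNV(2, fence);

int cur = 0;
for (;;) {  /* per-frame loop */
    /* Wait until the GPU has finished reading this half before rewriting.
     * glIsFenceNV is GL_FALSE until the fence has been set once. */
    if (glIsFenceNV(fence[cur]))
        glFinishFenceNV(fence[cur]);

    write_vertices(var_base + cur * half_size);  /* hypothetical helper */
    draw_from(var_base + cur * half_size);       /* hypothetical helper */

    glSetFenceNV(fence[cur], GL_ALL_COMPLETED_NV);
    cur ^= 1;
}
```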

What kind of setup do you have (AGP type, CPU, …) and what do you do apart from rendering (any other calculations)?

Michael

I’m not disabling VAR at all, i.e. all geometry uses VAR.

I’ve reduced my test case to doing essentially very little except sending primitives to the card. I don’t do any work on the stuff I send to GL.

My PC is a Dell Precision Dual PIII 550 with an Elsa 920 GF3. The AGP is on 2x and I am running W2K with SP2.

With the nVidia demo I disabled the CPU work it had to do to see the effect, and the speed improved massively, since I am clearly CPU-limited at present. However, I don’t understand the huge difference in speeds in my app when the differences in nVidia’s are negligible even with the CPU work reduced.

When VAR is set to priority 0.5 it is about the same speed as pulling from system memory. However I have called
glGetBooleanv(GL_VERTEX_ARRAY_RANGE_VALID_NV, &bValid);

and it is returning true (though I had to call glEnableClientState(GL_VERTEX_ARRAY); first), but this is true of the nVidia demo also. So as far as I can tell it should be using the hardware grabber.

Cheers for the replies though, much appreciated.

[This message has been edited by MattS (edited 03-18-2002).]

7200 triangles

Hint: VAR is good for sending lots of triangles, and 7200 tris isn’t much…

I’ve also an AGP2X setup (P3, GF2U).

From tests, I’d say NV_VAR is always a win with static data in VRAM; static data in AGP, however, was about the same speed as normal vertex arrays (it should do better with AGP 4x, though).

Dynamic data in VRAM should be avoided. Unless you’ve got advanced AGP features such as Fast Writes, writing to VRAM through the AGP bus is quite slow.

Dynamic data in AGP works quite well; I got a nice speed improvement, especially when main memory bandwidth is stressed by the rest of the program.

And as others said, NV_VAR should be used when you’re rendering a lot of triangles (at least 10K of them through NV_VAR).

I must disagree. I get incredible performance increases using agp over normal vertex arrays.

I will try testing my results with larger numbers of triangles and see if the difference comes down. Sometimes, though, we only need to render a small number of primitives, and disabling VAR is expensive, so I’d like to avoid it. Is it the number of primitives or the spread in index values that’s most significant?

Most of my geometry will need to be dynamic, which is why I wanted to use AGP memory rather than VRAM. I found that even if I update every vertex in VRAM (with an SIMD based memcpy), it still blows the AGP data out of the water, which I find odd to say the least.

Cheers.

Again, all of this depends very much on the specific setup, even on the type of primitives used (strips or no strips, for example).

If you have only AGP 2x, this could explain the performance drop. AGP 2x has a maximum bandwidth of 512MB/s, whereas your video memory has more than 2GB/s. Storing vertex data in video memory allows the GPU to pull them much faster.

Do you copy data to VAR memory every frame or only once? If every frame, try doing it only once to see how fast you can go…

If you have fastwrites enabled, copying data to VRAM can be very fast (even faster than copying to AGP if the GPU is using the memory simultaneously)…

Michael

MattS, there is another extension, NV_vertex_array_range2, which allows you to disable VAR without the huge speed penalty of issuing a flush. Have a look into that.
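For reference, NV_vertex_array_range2 adds a separate client-state token so the toggle no longer implies a flush. A sketch (needs a driver exposing the extension):

```c
/* With NV_vertex_array_range2, toggling this token skips the implicit
 * flush that glDisableClientState(GL_VERTEX_ARRAY_RANGE_NV) would cause. */
glEnableClientState(GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV);
/* ... draw ... */
glDisableClientState(GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV);
```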

Try testing your app on a system with AGP4x, and see if it improves performance. If not, then there must be something wrong in your code.

I get very good performance with AGP, and only a few percent further increase from going to VRAM. That’s with AGP 4x on static meshes uploaded only once at startup.

Nutty

Thanks for the help.

Unfortunately I haven’t had much time to look into the problem today. I will try with larger/different data sets and see if the differences come down. One thing I did notice in the data is that the indices forming triangles were far apart from each other, e.g. 5, 2100, 3501, and also that adjacent triangles are far apart. I reordered the indices (based on first index), and this sped the framerate up by about 15% in both versions, proving that what you send is significant. But are there any hard and fast rules?
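There is no hard-and-fast rule beyond "keep the vertices a batch references close together". The reordering described above can be sketched as a plain sort of the triangle list by each triangle's smallest index (the struct layout and names here are mine, not from the thread; sorting by first index, as the post did, is a one-line change to the comparator):

```c
#include <stdlib.h>

/* One triangle = three indices into the vertex array. */
typedef struct { unsigned idx[3]; } Tri;

static unsigned tri_min(const Tri *t) {
    unsigned m = t->idx[0];
    if (t->idx[1] < m) m = t->idx[1];
    if (t->idx[2] < m) m = t->idx[2];
    return m;
}

static int cmp_tri(const void *a, const void *b) {
    unsigned ma = tri_min((const Tri *)a);
    unsigned mb = tri_min((const Tri *)b);
    return (ma > mb) - (ma < mb);
}

/* Reorder triangles so ones referencing nearby vertices are drawn together,
 * improving the locality of the GPU's pulls from AGP/VRAM. */
void sort_tris_by_min_index(Tri *tris, size_t n) {
    qsort(tris, n, sizeof(Tri), cmp_tri);
}
```

This only improves coarse locality; a vertex-cache-aware ordering (keeping strips of adjacent triangles together) generally does better still.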

Thanks again; if I ever solve it I’ll post my findings.