Performance of VA/CVA under Linux/NVidia PC

Hi
While porting my master's thesis subdivision modeling program from Win32 to Linux, I was very confused by the performance difference of VA/CVA between Linux and Win2k.
It seems that VA/CVA are very slow under Linux, while the dl performance is nearly equal…
My system is Athlon 650MHz, KX133 Chipset (ASUS K7V) 256 PC133, MSI GeForce2 MX.
cat /proc/driver/nvidia/agp/status reports:
Enabled, AGPGART, 4x, SBA: Disabled, FW: Enabled
I’m using RedHat 7.1, kernel 2.4.2, XFree86 4.0.3.
So I made a little va/cva/dl test program, which can be downloaded from: http://medo.hit.bg/vatest/vatest.cpp

Any comments about the following fps table are appreciated (Vertices - 30K, Triangles - 30K):
Random indices, cva/va/dl (fps):
Linux: 23.5 / 22.13 / 42
Win2k: 31 / 54 / 47

Ordered indices, cva/va/dl (fps):
Linux: 36.26 / 24.9 / 108
Win2k: 107 / 98 / 171
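
To compare results across machines it can help to turn fps into triangle throughput. A minimal sketch, assuming the 30K-triangle mesh mentioned above; trianglesPerSecond is a hypothetical helper and is not part of vatest.cpp.

```cpp
#include <cassert>

// Hypothetical helper (not from vatest.cpp): convert a measured frame rate
// into triangles per second for a mesh with a fixed triangle count.
double trianglesPerSecond(double fps, long trianglesPerFrame) {
    assert(fps >= 0.0 && trianglesPerFrame >= 0);
    return fps * static_cast<double>(trianglesPerFrame);
}
```

For example, 108 fps on the 30K-triangle mesh corresponds to 3.24 million triangles per second, and 171 fps to 5.13 million.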

If you can, please send me the results of this program on other Linux PCs (and even on Windows, if they are interesting…)

Thanx in advance
Martin

I tried two machines, and two changes.
One change is to use short indices (instead of your longs)
The other, more important, is to use the nVidia VAR (Vertex Array Range) extension.
If you like, I can send you back the modified program.
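
The short-index change can be sketched roughly as follows. This is only an illustration, assuming the original indices are 32-bit unsigned values; toShortIndices is a hypothetical name and not taken from the modified program. With 30K vertices the largest index is well below 65535, so the conversion is lossless.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: narrow 32-bit indices to 16-bit indices.
// Throws if an index does not fit into 16 bits.
std::vector<std::uint16_t> toShortIndices(const std::vector<std::uint32_t>& indices) {
    std::vector<std::uint16_t> out;
    out.reserve(indices.size());
    for (std::uint32_t i : indices) {
        if (i > std::numeric_limits<std::uint16_t>::max()) {
            throw std::out_of_range("index does not fit in 16 bits");
        }
        out.push_back(static_cast<std::uint16_t>(i));
    }
    assert(out.size() == indices.size());  // every index was converted
    return out;
}
```

The narrowed array would then be drawn with GL_UNSIGNED_SHORT instead of GL_UNSIGNED_INT as the index type passed to glDrawElements, which halves the index data sent per frame.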

The first machine is:
GeForce2 Ultra
VIA Apollo chipset
P3 731MHz
nVidia release 23.13
kernel 2.2.14
using nVidia AGP implementation (not AGPGART)
AGP at 4x, no SBA, no FW

legend:
va = vertex array
cva = compiled vertex array
dl = display list
var = using vertex array range extension
short = using short indices (long when not specified)

results:
va 55
cva 85
dl 180
var short va 190
var short cva 156
var short rand va 32
var short rand cva 32
short rand va 30
short rand cva 27

Second machine:
GeForce4 Ti4600
i850 chipset
P4 1700MHz
nVidia release 28.02
kernel 2.2.14
using nVidia AGP implementation (not AGPGART)
AGP at 4x, no SBA, no FW

results:
va 231
cva 186
dl 260
rand va 86
rand cva 35
rand dl 96
short va 230
short cva 200
short dl 260
rand short va 87
rand short cva 35
var short va 250
var short cva 259
var short rand va 36
var short rand cva 36

Comments:

  • nothing beats display lists (I guess you knew that)
  • in my experience, the 23.13 and 28.02 drivers are not much different in this matter
  • in my experience, GeForce 2, 3 and 4 are not much different in AGP speed matters. The 3 and 4 have a bigger post-T&L vertex cache, but I think your program eliminates this (it doesn’t repeat indices)
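
The remark about the post-T&L vertex cache can be illustrated with a small simulation. This is a simplified FIFO model with an arbitrary cache size chosen by the caller, not the actual hardware policy of any GeForce card; countCacheMisses is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simplified FIFO model of a post-T&L vertex cache (an assumption, not the
// real hardware behaviour). Returns the number of misses, i.e. how many
// vertices would have to be transformed.
std::size_t countCacheMisses(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<std::uint32_t> cache;
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(cache.begin(), cache.end(), idx) != cache.end()) {
            continue;  // hit: this vertex is still in the cache
        }
        ++misses;
        cache.push_back(idx);
        if (cache.size() > cacheSize) {
            cache.pop_front();  // evict the oldest entry
        }
    }
    return misses;
}
```

If no index is ever repeated, every index is a miss whatever the cache size, so a larger cache cannot help in that case.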

Hi,
In fact it seems that VAR is the big deal - on my PC it gives the same performance as the dl (uint indices). Randomized indices are useless for this benchmark - no one sends geometry data that is not used.
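
For reference, the "ordered" and "random" index cases can be sketched as below. This assumes the random case is a shuffled permutation, so that each vertex is still referenced exactly once; how vatest.cpp actually builds its indices may differ. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Ordered indices: 0, 1, 2, ..., count-1, i.e. sequential vertex access.
std::vector<std::uint32_t> orderedIndices(std::uint32_t count) {
    std::vector<std::uint32_t> idx(count);
    std::iota(idx.begin(), idx.end(), 0u);
    return idx;
}

// Random indices: the same index set, shuffled with a fixed seed, so the
// vertices are fetched in scattered order (assumed to be a permutation).
std::vector<std::uint32_t> randomIndices(std::uint32_t count, std::uint32_t seed) {
    std::vector<std::uint32_t> idx = orderedIndices(count);
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    assert(idx.size() == count);  // shuffling never changes the length
    return idx;
}
```

Both arrays reference the same vertices; only the memory access pattern differs, which is what separates the two fps tables.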

Moshe Nissim’s version is here: http://medo.hit.bg/vatest/vatest1.cpp
But I still wonder why standard (malloc()) VA/CVA are so slow under Linux…

Martin

Originally posted by martin_marinov:
My system is Athlon 650MHz, KX133 Chipset (ASUS K7V) 256 PC133

You have a VIA chipset, try adding to your /etc/modules.conf:

options NVdriver NVreg_EnableVia4x=1 NVreg_EnableAGPSBA=1 NVreg_EnableAGPFW=1

Originally posted by tfpsly:
You have a VIA chipset, try adding to your /etc/modules.conf:

options NVdriver NVreg_EnableVia4x=1 NVreg_EnableAGPSBA=1 NVreg_EnableAGPFW=1

Hi,
thanks for your help, but I have had this set since I installed the NVIDIA driver - that is how I enabled FW. However, my card doesn’t support SBA (the VIA chipset supports it…)

Regards
Martin
