My OpenGL renderer is as fast as the DX8 renderer on a GeForce2 MX PCI.
But on GeForce 1/2/3 AGP, the DX8 renderer is 20-30% faster.
I use Vertex Array Range with AGP memory (priority 0.5, read frequency 0, write frequency 0).
I also tried an ATI Radeon; it’s so slow that I won’t waste my time talking about that **** (no way to use AGP/video memory).
Any ideas for increasing performance on AGP cards?
I only have static data, so I don’t use GL_FENCE.
Can you use video memory to see if there’s a difference (priority of 0.75 -> 1.0f)? If your data is static, how about testing with display lists? You could also try plain old vertex arrays or CVAs out of interest. Obviously VAR should be fastest, though. Maybe, if you’re sending huge amounts of data, your data is larger than the allocated AGP memory; I don’t know what VAR’s behaviour is in that circumstance. Maybe Matt or Cass can help. When you say that the GeForce 1/2/3 OpenGL app is slower than the equivalent DX8 app, how does the speed compare to the GeForce2 MX performance?
I’ve just read one of the VAR PDFs that I downloaded from nVidia, and it says that you must write data to your arrays sequentially to maximise (memory bandwidth) performance. In one of my apps where I use VAR, I copy to the array from a temp array using memcpy(), and I’ve seen other people do that too.
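A minimal sketch of that pattern: memory returned for AGP use is typically write-combined, so scattered or read-modify writes are slow, while one forward sequential copy keeps the write-combine buffers full. Here plain malloc() stands in for wglAllocateMemoryNV() so the snippet runs anywhere, and the Vertex layout is a hypothetical example, not from the original post.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical vertex layout for illustration only. */
typedef struct {
    float x, y, z;    /* position */
    float nx, ny, nz; /* normal */
} Vertex;

/* Build vertices in a cached system-memory scratch buffer, then push them
 * into the (write-combined) destination with a single sequential memcpy().
 * agp_dst stands in for a pointer returned by wglAllocateMemoryNV(). */
void fill_var_buffer(Vertex *agp_dst, size_t count)
{
    Vertex *scratch = malloc(count * sizeof(Vertex));
    for (size_t i = 0; i < count; ++i) {
        scratch[i].x = (float)i;
        scratch[i].y = 0.0f;
        scratch[i].z = 0.0f;
        scratch[i].nx = 0.0f;
        scratch[i].ny = 1.0f;
        scratch[i].nz = 0.0f;
    }
    /* One forward, sequential copy: this is the access pattern the
     * nVidia VAR document recommends for write-combined memory. */
    memcpy(agp_dst, scratch, count * sizeof(Vertex));
    free(scratch);
}
```

With real VAR you would then point glVertexPointer() into that range and enable GL_VERTEX_ARRAY_RANGE_NV; the copy pattern itself is the part that affects bandwidth.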
[This message has been edited by ffish (edited 05-11-2001).]
On the AGP system, are you putting the VAR in AGP or video? AGP will usually be better.
Will it? In benchmarking I’ve done, I’ve seen that VRAM is faster than AGP by some noticeable factor, and AGP is faster than system RAM by an even more noticeable factor. This is using AllocateMemory() and VertexArrayRange() for static (non-changing) data.
Can you give a better hint on under what conditions AGP will be faster than VRAM?
AGP provides extra bandwidth. Putting vertices in video memory makes us share bandwidth between rendering and vertex pulling.
What is faster depends a lot on the app and on the system.
For example, P4/i850 systems are oozing with excess system memory bandwidth that the CPU has no way to take advantage of. You might as well use it by pulling vertices from AGP.
Some systems do and some systems don’t have AGP fast writes – essential if you want to write data quickly into video memory. Fast writes are broken or crippled on many chipsets.
Same goes for AGP 2x vs. 4x, PC133 vs. DDR memory, etc.
On Geforce2 MX PCI, I use VAR in Video memory.
On Geforce 1/2/3 AGP, I use VAR in AGP memory.
How does DX8 manage its memory? On a PCI card, it uses video memory (I get the same FPS using VAR). But on AGP cards, what does DX8 do?
I tried a read value of 1 for AGP, but then no memory is allocated (with read = 0, I can allocate 30 MB of AGP memory).
Then I copy the vertices to AGP/Video memory using memcpy().
---- other question -----
Is it my PC, or is an ATI Radeon slower than a GeForce1 DDR?
Originally posted by opla: Is it my PC, or is an ATI Radeon slower than a GeForce1 DDR?
I don’t think that should be the case. IMO the Radeon DDR should beat the GeForce 256 DDR, but I can only say this based on game benchmarks I’ve seen. Can’t speak for your code.
The signature is wglAllocateMemoryNV(GLsizei size, GLfloat readFrequency, GLfloat writeFrequency, GLfloat priority). Typical calls use 0 for the read/write frequencies and (as Cass says) a priority in (0.25f, 0.75f] for AGP memory and (0.75f, 1.0f] for video memory.
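A small helper can encode those conventions so the magic floats aren’t scattered through the code. This is a sketch of the values quoted above for static data (read/write frequencies of 0); the enum, struct, and function names are illustrative, not part of any API.

```c
/* Memory placement requested from wglAllocateMemoryNV().
 * Names here are hypothetical, for illustration only. */
typedef enum { VAR_AGP, VAR_VIDEO } VarPlacement;

typedef struct {
    float read_freq;   /* how often the CPU reads the memory back */
    float write_freq;  /* how often the CPU rewrites it */
    float priority;    /* (0.25, 0.75] -> AGP, (0.75, 1.0] -> video */
} VarParams;

VarParams var_params(VarPlacement where)
{
    /* Static data: the CPU neither reads back nor rewrites, so both
     * frequencies are 0, matching the typical calls described above. */
    VarParams p = { 0.0f, 0.0f, 0.0f };
    p.priority = (where == VAR_VIDEO) ? 1.0f : 0.5f;
    return p;
}
```

A call like wglAllocateMemoryNV(size, p.read_freq, p.write_freq, p.priority) would then land the range in the requested memory type, driver permitting.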
For the ATI Radeon benchmark, I used Vulpine® GLmark v1.1 to test performance.
I get an average FPS of 13.3 on the ATI Radeon and 18.8 on a GeForce DDR (without VAR, of course). I get only 10.8 on a GeForce2 MX PCI.
In DX8, when you specify POOL_DEFAULT for your vertex buffer, it’s up to the driver to use either video/AGP/system memory, depending on whether vertex processing is in HW or software…
I feel a wind of stupidity blowing through these boards. This is the second “NV/ATI” war topic, and I don’t think this helps OPENGL ADVANCED CODING! If you want to fight, go to one of those hardware forums.
I’m tired of these closed-minded free-time coders who post just because they don’t have anything else to do.
Some people work and need real help here.
In our engine we use both vertex arrays and display lists.
Performance is similar using any of the methods, on both GeForces and RadeON.
The RadeON’s performance (using either method) is about the same as the GF2 GTS’s. The GF1 is literally crushed by the RadeON, which is about two times faster.
You may have done something… wrong?
EDIT
While working on the engine, I realised a weird thing.
The Detonator drivers sometimes “correct” some bad OpenGL programming. This means something that is supposed to misbehave works fine on NV cards but breaks on any other card (ATI, Kyro, etc.).
This is not really bad in itself, but it shows how dangerous it is to develop using only one brand of card. EDIT
[This message has been edited by paddy (edited 05-14-2001).]
The programmer of the engine’s DX8 renderer told me that the Radeon is as fast as a GeForce1 (sometimes slower).
I get the same results as he does with display lists.
I’ll try some demos with VA on both Radeon and GeForce ASAP.
I’ll need VA soon for dynamic data …
Originally posted by paddy: The Detonator drivers sometimes “correct” some bad OpenGL programming. This means something that is supposed to misbehave works fine on NV cards but breaks on any other card (ATI, Kyro, etc.).
What do you mean by this? Feel free to email me privately if you want…