AGP performance

05-11-2001, 03:23 AM
My OpenGL renderer is as fast as the DX8 renderer on a GeForce2 MX PCI.
But on GeForce 1/2/3 AGP, the DX8 renderer is 20-30% faster.
I use Vertex Array Range with AGP memory. (priority 0.5, read 0, write 0).

I also tried an ATI Radeon; it's so slow that I won't waste my time talking about that **** (no way to use AGP/video memory).

Any ideas for increasing performance on AGP cards?
I only have static data, and I don't use NV_fence.

05-11-2001, 03:39 AM
Can you use video memory to see if there's a difference (priority of 0.75 -> 1.0f)? If your data is static, how about testing using display lists. You could also try plain old vertex arrays out of interest or CVAs. Obviously VAR should be fastest, though. Maybe, if you're sending huge amounts of data, your data is larger than the allocated AGP memory. I don't know what VAR's behaviour is in that circumstance. Maybe Matt or Cass can help. When you say that the GeForce 1/2/3 OpenGL app is slower than the equivalent DX8 app, how does the speed compare to the GeForce 2 MX performance?

I've just read one of the VAR PDFs that I downloaded from nVidia, and it says that you _must_ write data to your arrays sequentially to maximise (memory bandwidth) performance. In one of my apps where I use VAR, I copy to the array from a temp array using memcpy(), and I've seen other people do that too.

[This message has been edited by ffish (edited 05-11-2001).]

05-11-2001, 04:46 AM
I use VAR in video memory (priority 1.0) on the GeForce2 MX PCI and get the same FPS as DX8.

On AGP GeForce cards it's a little faster, but I think that's only because of the bandwidth (I have an AGP 2x PC).

I'll make other tests ASAP.

05-11-2001, 09:36 AM
On the AGP system, are you putting the VAR in AGP or video? AGP will usually be better.

- Matt

05-11-2001, 12:58 PM
On the AGP system, are you putting the VAR in AGP or video? AGP will usually be better.

Will it? In benchmarking I've done, I've seen that VRAM is faster than AGP by some noticeable factor, and AGP is faster than system RAM by an even more noticeable factor. This is using AllocateMemory() and VertexArrayRange() for static (non-changing) data.

Can you give a better hint on under what conditions AGP will be faster than VRAM?

05-11-2001, 01:05 PM
It provides extra bandwidth. Putting vertices in video memory makes us share bandwidth between rendering and vertex pulling.

What is faster depends a lot on the app and on the system.

For example, P4/i850 systems are oozing with excess system memory bandwidth that the CPU has no way to take advantage of. You might as well use it by pulling vertices from AGP.

Some systems do and some systems don't have AGP fast writes -- essential if you want to write data quickly into video memory. Fast writes are broken or crippled on many chipsets.

Same goes for AGP 2x vs. 4x, PC133 vs. DDR memory, etc.

- Matt

05-11-2001, 01:20 PM
On Geforce2 MX PCI, I use VAR in Video memory.
On Geforce 1/2/3 AGP, I use VAR in AGP memory.
How does DX8 manage its memory? On a PCI card it uses video memory (I get the same FPS using VAR). But on AGP cards, what does DX8 do?

I tried using a read value of 1 for AGP, but no memory gets allocated (with read = 0, I can allocate 30 MB of AGP memory).
Then I copy the vertices into AGP/video memory using memcpy().

---- other question -----
Is it my PC, or is an ATI Radeon slower than a GeForce1 DDR?

05-11-2001, 02:00 PM
Originally posted by opla:
Is it my PC, or is an ATI Radeon slower than a GeForce1 DDR?

I don't think that should be the case. IMO the Radeon DDR should beat the GeForce 256 DDR. But I can only say this based on game benchmarks I've seen; I can't speak for your code.

05-12-2001, 04:07 AM
RadeON just crushes the GF256DDR in Quake 3 which uses vertex arrays.
You may have a "NVidia only" optimised code :p

05-12-2001, 08:37 AM
How do you specify AGP and/or vid memory?

05-12-2001, 09:05 AM
With the parameters to wglAllocateMemoryNV(GLsizei size, GLfloat readFrequency, GLfloat writeFrequency, GLfloat priority). Typical calls use 0 for the read/write frequencies and (as Cass says) a priority in (0.25f, 0.75f] for AGP memory, (0.75f, 1.0f] for video memory.

05-14-2001, 12:56 AM
For the ATI Radeon benchmark, I used Vulpine GLmark v1.1 to test performance.
I get an average of 13.3 FPS on the ATI Radeon and 18.8 on a GeForce DDR (without VAR, of course). I get only 10.8 on a GeForce2 MX PCI.

05-14-2001, 02:15 AM
In DX8, when you specify POOL_DEFAULT for your vertex buffer, it's up to the driver to use video, AGP, or system memory, depending on whether vertex processing is in hardware or software...

05-14-2001, 06:47 AM
I made more tests on the ATI radeon.
Using Vertex arrays, I have 40 FPS.
but using Display lists, I have 125 FPS !!!

On GeForce1/2, vertex arrays are 10 FPS faster than display lists !!!

Will I have to check glGetString(GL_VENDOR) to decide between VA and DL?

ATI really sucks with extensions. When I use GL_EXT_compiled_vertex_array and GL_EXT_draw_range_elements, I only have 10 FPS !

05-14-2001, 06:57 AM
I feel a wind of stupidity blowing through these boards. This is the second "NV/ATI" war topic, and I don't think this helps OPENGL ADVANCED CODING! If you want to fight, go to one of those hardware forums.
I'm tired of these closed-minded free-time coders who post just because they have nothing else to do.
Some people work and need real help here.


05-14-2001, 08:02 AM
Well, if I talk about both ATI and nVidia, it's because I want my software to run fast on both.

I tried the same test (previous post) with a GeForce DDR, I get 125 FPS with Display list and 125 FPS with VAR (AGP and Video mem).

So the ATI Radeon is as fast as a GeForce1, but only when using display lists.

That's one of the points of this kind of forum: how to get the maximum FPS on any card.

05-14-2001, 09:10 AM
In our engine we use both vertex arrays and display lists.
Performance is similar using any of the methods, on both GeForces and RadeON.

The RadeON's performance (using either of the two methods) is about the same as the GF2 GTS. The GF1 is literally crushed by the RadeON, which is about two times faster.

You may have done something ... wrong ?

While working on the engine, I noticed a weird thing.
The Detonator drivers sometimes "correct" bad OpenGL programming. This means code that is supposed to break works fine on NV cards but fails on any other card (ATI, Kyro, etc ...).
This is not really bad, but it shows how dangerous it is to develop using only one brand of card.

[This message has been edited by paddy (edited 05-14-2001).]

05-14-2001, 09:29 AM
I would like to know if I made something wrong.

The programmer of the DX8 renderer of the engine told me that the Radeon is as fast as a GeForce1 (sometimes slower).
I get the same results as him with display lists.

I'll try some demos with VA on both Radeon and GeForce ASAP.
I'll need VA soon for dynamic data ...

05-14-2001, 09:37 AM
If you want to compare compiled VA you can start with Quake 3.

05-14-2001, 11:53 AM
Originally posted by paddy:
The Detonator drivers sometimes "correct" bad OpenGL programming. This means code that is supposed to break works fine on NV cards but fails on any other card (ATI, Kyro, etc ...).

What do you mean by this? Feel free to email me privately if you want...

- Matt

05-15-2001, 12:59 AM
I'm not sure nVidia's drivers "correct" bad code. I wrote a program (without GL extensions) that works fine on nVidia's OpenGL and on MS's OpenGL software renderer. But on an ATI Radeon, it crashes when you close it.

I think it's because I don't call DestroyWindow (I didn't check yet). As usual, there's no info on the ATI web site ...

Anyway, I don't consider ATI driver as a reference for good OpenGL programming (I would like to have an SGI OpenGL implementation for that).

05-16-2001, 02:42 AM
to paddy: I tried Quake3 with gltrace, and it's not using CVA on a Radeon! Only VA ....

05-16-2001, 04:59 AM
I tried with the latest version of gltrace; Quake3 does use CVA indeed ...

05-16-2001, 08:08 AM
I asked devrel@ati.com about this difference in FPS between DL and VA. They told me to use VAs of 1024 vertices max, and it works !!! Several VAs of 1024 vertices are faster than one LARGE VA (on the Radeon).

05-16-2001, 08:14 AM
Interesting info ...
But 1024 vertices is really LOW !
Maybe a driver side patch which reduces big VAs into several smaller ones could do the job.
ATI ? Suggestions ?

05-16-2001, 08:42 AM
Small question : in the nvOpenGLspecs.pdf, nVidia says : "The specification of glDrawElements does not allow optimal performance for some OpenGL implementations, however. In particular, it has no restrictions on the number of indices given," (page 74)

But when I call glGetIntegerv(GL_MAX_ELEMENTS_INDICES_WIN, &DrawRangeElementMaxInd);
I get a limit of 65535 indices! (on a Radeon; didn't test on a GeForce)

The official OpenGL 1.2.1 spec says: "Implementations denote recommended maximum amounts of vertex and index data, which may be queried by calling GetIntegerv with the symbolic constants MAX_ELEMENTS_VERTICES and MAX_ELEMENTS_INDICES."

If glDrawElements is limited, why an EXT_draw_range_elements?

05-16-2001, 09:31 AM
I haven't used EXT_draw_range_elements, but all software T&L systems I know of typically transform all vertices in a buffer when the buffer is locked. I would assume EXT_draw_range_elements provides a way to indicate that you are only going to use a subrange of the vertex array, so that only those vertices will be transformed.

As an example, if you have a vertex array that contains 100 different objects, each composed of 100 vertices, and you tried to lock the array to draw just one of those objects, the system would have to transform 9,900 vertices that it will never use. By specifying a range, the system will only transform the 100 vertices that you indicated will be used.