AGP performance

My OpenGL renderer is as fast as the DX8 renderer on a GeForce2 MX PCI.
But on GeForce 1/2/3 AGP, the DX8 renderer is 20-30% faster.
I use Vertex Array Range with AGP memory (priority 0.5, read 0, write 0).

I also tried an ATI Radeon; it’s so slow that I won’t waste my time talking about that **** (no way to use AGP/video memory).

Any ideas for increasing performance on AGP cards?
I only have static data, and I don’t use NV_fence.

Can you use video memory to see if there’s a difference (priority of 0.75 -> 1.0f)? If your data is static, how about testing with display lists? You could also try plain old vertex arrays, or CVAs, out of interest. Obviously VAR should be fastest, though. Maybe, if you’re sending huge amounts of data, your data is larger than the allocated AGP memory; I don’t know what VAR’s behaviour is in that circumstance. Maybe Matt or Cass can help. When you say that the GeForce 1/2/3 OpenGL app is slower than the equivalent DX8 app, how does the speed compare to the GeForce2 MX performance?
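For the display-list option, a minimal sketch for the static-data case might look like this (drawStaticGeometry() is just a placeholder for whatever draw calls the renderer already issues):

```cpp
// Compile the static geometry once at load time.
GLuint list = glGenLists(1);
glNewList(list, GL_COMPILE);     // record the commands without executing them
drawStaticGeometry();            // placeholder: the renderer's usual draw calls
glEndList();

// Each frame, replay the compiled commands.
glCallList(list);

// When the geometry is no longer needed.
glDeleteLists(list, 1);
```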

I’ve just read one of the VAR PDFs that I downloaded from NVIDIA, and it says that you must write data to your arrays sequentially to maximise (memory bandwidth) performance. In one of my apps where I use VAR, I copy into the array from a temporary array using memcpy(), and I’ve seen other people do that too.
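Roughly, the copy looks like this (a sketch; the Vertex layout and names are placeholders, and varMemory is assumed to have come from wglAllocateMemoryNV):

```cpp
#include <cstring>

struct Vertex { float x, y, z, nx, ny, nz, u, v; };   // placeholder layout

// One forward, sequential pass into the uncached AGP/video allocation;
// scattered or read-modify-write access to VAR memory is very slow.
void uploadVertices(void *varMemory, const Vertex *src, size_t count)
{
    std::memcpy(varMemory, src, count * sizeof(Vertex));
}
```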

[This message has been edited by ffish (edited 05-11-2001).]

I use VAR in video memory (priority 1.0) on the GeForce2 MX PCI and get the same FPS as DX8.

On AGP GeForce cards, it’s a little faster, but I think that’s only because of the bandwidth (I have an AGP 2x PC).

I’ll make other tests ASAP.

On the AGP system, are you putting the VAR in AGP or video? AGP will usually be better.

  • Matt

On the AGP system, are you putting the VAR in AGP or video? AGP will usually be better.

Will it? In benchmarking I’ve done, I’ve seen that VRAM is faster than AGP by some noticeable factor, and AGP is faster than system RAM by an even more noticeable factor. This is using AllocateMemory() and VertexArrayRange() for static (non-changing) data.

Can you give a better hint on under what conditions AGP will be faster than VRAM?

It provides extra bandwidth. Putting vertices in video memory makes us share bandwidth between rendering and vertex pulling.

What is faster depends a lot on the app and on the system.

For example, P4/i850 systems are oozing with excess system memory bandwidth that the CPU has no way to take advantage of. You might as well use it by pulling vertices from AGP.

Some systems do and some systems don’t have AGP fast writes – essential if you want to write data quickly into video memory. Fast writes are broken or crippled on many chipsets.

Same goes for AGP 2x vs. 4x, PC133 vs. DDR memory, etc.

  • Matt

On the GeForce2 MX PCI, I use VAR in video memory.
On GeForce 1/2/3 AGP, I use VAR in AGP memory.
How does DX8 manage its memory? On the PCI card, it uses video memory (I get the same FPS using VAR). But on AGP cards, what does DX8 do?

I tried a read frequency of 1 for AGP, but no memory gets allocated (with read = 0, I can allocate 30 MB of AGP memory).
Then I copy the vertices into AGP/video memory using memcpy().

---- other question -----
Is it my PC, or is an ATI Radeon slower than a GeForce1 DDR?

Originally posted by opla:
Is it my PC, or is an ATI Radeon slower than a GeForce1 DDR?

I don’t think that should be the case. IMO the Radeon DDR should beat the GeForce 256 DDR, but I can only say that based on game benchmarks I’ve seen; I can’t speak for your code.

The RadeON just crushes the GF256 DDR in Quake 3, which uses vertex arrays.
You may have “NVIDIA-only” optimised code :stuck_out_tongue:

How do you specify AGP and/or vid memory?

With the parameters to wglAllocateMemoryNV(GLsizei size, GLfloat readFrequency, GLfloat writeFrequency, GLfloat priority). Typical calls use 0 for the read/write frequencies and (as Cass says) a priority in (0.25f, 0.75f] for AGP memory or (0.75f, 1.0f] for video memory.
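A sketch of how those calls are typically used (the wglAllocateMemoryNV/wglFreeMemoryNV and glVertexArrayRangeNV entry points are assumed to have already been fetched with wglGetProcAddress, and size is the total vertex-data size in bytes):

```cpp
#ifndef GL_VERTEX_ARRAY_RANGE_NV
#define GL_VERTEX_ARRAY_RANGE_NV 0x851D   // token from glext.h
#endif

// AGP memory: read/write frequencies 0, priority in (0.25, 0.75].
void *agpMem = wglAllocateMemoryNV(size, 0.0f, 0.0f, 0.5f);
// For video memory instead: same frequencies, priority in (0.75, 1.0], e.g. 1.0f.

if (agpMem != NULL) {
    glVertexArrayRangeNV(size, agpMem);              // point the range at the block
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);   // enable VAR
    // ... memcpy() vertex data in, set glVertexPointer() into agpMem, draw ...

    // Cleanup when finished.
    glDisableClientState(GL_VERTEX_ARRAY_RANGE_NV);
    wglFreeMemoryNV(agpMem);
}
```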

For the ATI Radeon benchmark, I used Vulpine® GLmark v1.1 to test performance.
I get an average of 13.3 FPS on the ATI Radeon and 18.8 on a GeForce DDR (without VAR, of course). I get only 10.8 on the GeForce2 MX PCI.

In DX8, when you specify POOL_DEFAULT for your vertex buffer, it’s up to the driver to use video, AGP, or system memory, depending on whether vertex processing is in hardware or software…
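For comparison, a sketch of that DX8 side (device, Vertex, and the FVF flags are placeholders for whatever the DX8 renderer already uses):

```cpp
// With D3DPOOL_DEFAULT the driver decides whether the buffer lives in
// video, AGP, or system memory.
IDirect3DVertexBuffer8 *vb = NULL;
HRESULT hr = device->CreateVertexBuffer(
    numVertices * sizeof(Vertex),               // size in bytes
    D3DUSAGE_WRITEONLY,                         // static, write-once data
    D3DFVF_XYZ | D3DFVF_NORMAL | D3DFVF_TEX1,   // placeholder vertex format
    D3DPOOL_DEFAULT,                            // let the driver pick the pool
    &vb);
```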

I made more tests on the ATI Radeon.
Using vertex arrays, I get 40 FPS,
but using display lists, I get 125 FPS!!!

On GeForce1/2, vertex arrays are 10 FPS faster than display lists!!!

Will I have to check glGetString(GL_VENDOR) to decide between VA and DL?
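If it comes to that, the check itself is just a string test at startup (a sketch; the names are placeholders, and the NVIDIA driver’s GL_VENDOR string contains “NVIDIA”):

```cpp
#include <cstring>

const char *vendor = (const char *) glGetString(GL_VENDOR);
bool preferVertexArrays = (vendor != NULL && std::strstr(vendor, "NVIDIA") != NULL);

if (preferVertexArrays) {
    // GeForce path: vertex arrays / VAR
} else {
    // Radeon (and other) path: display lists
}
```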

ATI really sucks with extensions. When I use GL_EXT_compiled_vertex_array and GL_EXT_draw_range_elements, I only get 10 FPS!
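For reference, that combination is roughly the following (a sketch; Vertex, vertices, indices, vertexCount and indexCount are placeholders, and the glLockArraysEXT/glUnlockArraysEXT and glDrawRangeElementsEXT entry points are assumed to have been fetched with wglGetProcAddress):

```cpp
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), &vertices[0].x);

glLockArraysEXT(0, vertexCount);          // "compile" the currently enabled arrays

glDrawRangeElementsEXT(GL_TRIANGLES,
                       0, vertexCount - 1,            // min/max vertex index used
                       indexCount, GL_UNSIGNED_SHORT, indices);

glUnlockArraysEXT();
glDisableClientState(GL_VERTEX_ARRAY);
```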

I feel a wind of stupidity blowing through these boards. This is the second “NV/ATI” war topic, and I don’t think it helps OPENGL ADVANCED CODING! If you want to fight, go to one of those hardware forums.
I’m tired of these closed-minded free-time coders who just post because they don’t have anything else to do.
Some people work and need real help here.

Period.

Well, if I talk about both ATI and NVIDIA, it’s because I want my software to run fast on both.

I tried the same test (previous post) with a GeForce DDR: I get 125 FPS with display lists and 125 FPS with VAR (AGP and video memory).

So the ATI Radeon is as fast as a GeForce1, but only when using display lists.

That’s one of the points of this kind of forum: how to get the maximum FPS on any card.

In our engine we use both vertex arrays and display lists.
Performance is similar with either method, on both GeForces and the RadeON.

The RadeON’s performance (with either method) is about the same as the GF2 GTS’s. The GF1 is literally crushed by the RadeON, which is about twice as fast.

You may have done something… wrong?

EDIT
While working on the engine, I realised a weird thing.
The Detonator drivers sometimes “correct” some bad OpenGL programming: code that isn’t supposed to work at all runs fine on NV cards but breaks on any other card (ATI, Kyro, etc.).
This is not really bad in itself, but it shows how dangerous it is to develop using only one brand of card.
EDIT

[This message has been edited by paddy (edited 05-14-2001).]

I would like to know if I did something wrong.

The programmer of the engine’s DX8 renderer told me that the Radeon is as fast as a GeForce1 (sometimes slower).
I get the same results as he does with display lists.

I’ll try some demos with VA on both Radeon and GeForce ASAP.
I’ll need VA soon for dynamic data …

If you want to compare compiled vertex arrays, you can start with Quake 3.

Originally posted by paddy:
The Detonator drivers sometimes “correct” some bad OpenGL programming: code that isn’t supposed to work at all runs fine on NV cards but breaks on any other card (ATI, Kyro, etc.).

What do you mean by this? Feel free to email me privately if you want…

  • Matt