
AGP performance



opla
05-11-2001, 02:23 AM
my OpenGL renderer is as fast as the DX8 renderer on a GeForce2 MX PCI.
But on GeForce 1/2/3 AGP, the DX8 renderer is 20-30% faster.
I use Vertex Array Range with AGP memory (priority 0.5, read 0, write 0).

I also tried with an ATI Radeon; it's so slow that I won't waste my time talking about that **** (there's no way to use AGP/video memory).

Any ideas for increasing performance on AGP cards?
I only have static data, and I don't use NV_fence.
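
For reference, here's roughly what that setup looks like in code. This is only a sketch of the usual NV_vertex_array_range pattern with the parameters quoted above (read 0, write 0, priority 0.5); the typedef names, buffer size and fallback are illustrative, not code from this thread.

  // assumes <windows.h>, <GL/gl.h> and the NV extension tokens (GL_VERTEX_ARRAY_RANGE_NV)
  typedef void * (APIENTRY *ALLOCATEMEMORYNV)(GLsizei size, GLfloat readFreq,
                                              GLfloat writeFreq, GLfloat priority);
  typedef void   (APIENTRY *VERTEXARRAYRANGENV)(GLsizei size, const GLvoid *pointer);

  ALLOCATEMEMORYNV   pwglAllocateMemoryNV;
  VERTEXARRAYRANGENV pglVertexArrayRangeNV;

  void setupVAR(GLsizei bytes)
  {
      pwglAllocateMemoryNV  = (ALLOCATEMEMORYNV)  wglGetProcAddress("wglAllocateMemoryNV");
      pglVertexArrayRangeNV = (VERTEXARRAYRANGENV)wglGetProcAddress("glVertexArrayRangeNV");

      // read 0, write 0, priority 0.5 -> AGP memory, as in the post above
      void *agp = pwglAllocateMemoryNV(bytes, 0.0f, 0.0f, 0.5f);
      if (!agp)
          return;   // allocation failed: fall back to ordinary vertex arrays

      pglVertexArrayRangeNV(bytes, agp);
      glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

      // fill 'agp' with the static vertex data once, then point glVertexPointer() at it
  }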

ffish
05-11-2001, 02:39 AM
Can you use video memory to see if there's a difference (priority of 0.75 -> 1.0f)? If your data is static, how about testing with display lists? You could also try plain old vertex arrays or CVAs, out of interest. Obviously VAR should be fastest, though. Maybe, if you're sending huge amounts of data, your data is larger than the allocated AGP memory; I don't know what VAR's behaviour is in that case. Maybe Matt or Cass can help. When you say that the GeForce 1/2/3 OpenGL app is slower than the equivalent DX8 app, how does the speed compare to the GeForce2 MX performance?

I've just read one of the VAR PDFs I downloaded from nVidia, and it says that you _must_ write data to your arrays sequentially to maximise (memory bandwidth) performance. In one of my apps where I use VAR, I copy to the array from a temp array using memcpy(), and I've seen other people do that too.

[This message has been edited by ffish (edited 05-11-2001).]
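
A minimal sketch of that sequential-write pattern, assuming a hypothetical Vertex struct and a staging buffer in ordinary system memory (not ffish's actual code):

  #include <string.h>

  // Build the vertices in a normal (cached) system-memory array first, then
  // stream them into the VAR memory in a single forward pass with memcpy(),
  // never reading the AGP/video memory back.
  void uploadVertices(void *varMemory,        /* pointer from wglAllocateMemoryNV */
                      const Vertex *staging,  /* hypothetical Vertex struct       */
                      size_t count)
  {
      memcpy(varMemory, staging, count * sizeof(Vertex));
  }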

opla
05-11-2001, 03:46 AM
I use VAR in video memory (priority 1.0) on the GeForce2 MX PCI to get the same FPS as DX8.

On AGP GeForce cards, it's a little faster, but I think that's only because of the bandwidth (I have an AGP 2x system).

I'll run more tests ASAP.

mcraighead
05-11-2001, 08:36 AM
On the AGP system, are you putting the VAR in AGP or video? AGP will usually be better.

- Matt

jwatte
05-11-2001, 11:58 AM
Originally posted by mcraighead:
On the AGP system, are you putting the VAR in AGP or video? AGP will usually be better.

Will it? In benchmarking I've done, VRAM was faster than AGP by a noticeable factor, and AGP was faster than system RAM by an even larger factor. This was using AllocateMemory() and VertexArrayRange() for static (non-changing) data.

Can you give a better hint about the conditions under which AGP will be faster than VRAM?

mcraighead
05-11-2001, 12:05 PM
It provides extra bandwidth. Putting vertices in video memory makes us share bandwidth between rendering and vertex pulling.

What is faster depends a lot on the app and on the system.

For example, P4/i850 systems are oozing with excess system memory bandwidth that the CPU has no way to take advantage of. You might as well use it by pulling vertices from AGP.

Some systems do and some systems don't have AGP fast writes -- essential if you want to write data quickly into video memory. Fast writes are broken or crippled on many chipsets.

Same goes for AGP 2x vs. 4x, PC133 vs. DDR memory, etc.

- Matt

opla
05-11-2001, 12:20 PM
On Geforce2 MX PCI, I use VAR in Video memory.
On Geforce 1/2/3 AGP, I use VAR in AGP memory.
How does DX8 manage its memory? On a PCI card, it uses video memory (I get the same FPS using VAR). But how does DX8 do it on AGP cards?

I tried using a read value of 1 for AGP, but then no memory gets allocated (with read = 0, I can allocate 30 MB of AGP memory).
Then I copy the vertices to AGP/video memory using memcpy().

---- other question -----
Is it my PC, or is an ATI Radeon slower than a GeForce1 DDR?

ET3D
05-11-2001, 01:00 PM
Originally posted by opla:
Is it my PC, or is an ATI Radeon slower than a GeForce1 DDR?

I don't think that should be the case. IMO the Radeon DDR should beat the GeForce 256 DDR, but I can only say that based on game benchmarks I've seen. Can't say for your code.

paddy
05-12-2001, 03:07 AM
The Radeon just crushes the GeForce 256 DDR in Quake 3, which uses vertex arrays.
You may have "nVidia only" optimised code :p

HFAFiend
05-12-2001, 07:37 AM
How do you specify AGP and/or vid memory?

ffish
05-12-2001, 08:05 AM
With the parameters to wglAllocateMemoryNV(GLsizei size, GLfloat readFrequency, GLfloat writeFrequency, GLfloat priority). Typical calls use 0 for the read/write frequencies and (as Cass says) a priority in (0.25f, 0.75f] for AGP memory and in (0.75f, 1.0f] for video memory.
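
As a rough illustration of those conventions (the exact thresholds are driver-dependent), the two allocation calls differ only in the priority argument:

  void *agpMem   = wglAllocateMemoryNV(size, 0.0f, 0.0f, 0.5f);  // priority in (0.25, 0.75] -> AGP
  void *videoMem = wglAllocateMemoryNV(size, 0.0f, 0.0f, 1.0f);  // priority in (0.75, 1.0]  -> video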

opla
05-13-2001, 11:56 PM
For the ATI Radeon benchmark, I used Vulpine GLmark v1.1 to test performance.
I get an average of 13.3 FPS on the ATI Radeon and 18.8 on a GeForce DDR (without VAR, of course). I get only 10.8 on a GeForce2 MX PCI.

beavis
05-14-2001, 01:15 AM
In DX8, when you specify POOL_DEFAULT for your vertex buffer, it's up to the driver to use video, AGP, or system memory, depending on whether vertex processing is in HW or software...
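
As a hedged illustration of what that looks like (standard d3d8 calls; 'device', the vertex count and the FVF are placeholders, not code from this thread):

  IDirect3DVertexBuffer8 *vb = NULL;
  HRESULT hr = device->CreateVertexBuffer(numVerts * sizeof(Vertex),
                                          D3DUSAGE_WRITEONLY,  // static, write-only data
                                          D3DFVF_XYZ,          // placeholder vertex format
                                          D3DPOOL_DEFAULT,     // driver picks video/AGP/system memory
                                          &vb);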

opla
05-14-2001, 05:47 AM
I ran more tests on the ATI Radeon.
Using vertex arrays, I get 40 FPS,
but using display lists, I get 125 FPS!!!

On GeForce1/2, vertex arrays are 10 FPS faster than display lists!!!

Will I have to check glGetString(GL_VENDOR) to decide between VAs and DLs (something like the sketch below)?

ATI really sucks with extensions. When I use GL_EXT_compiled_vertex_array and GL_EXT_draw_range_elements, I only get 10 FPS!
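
For what it's worth, that vendor check would just be string matching on glGetString(GL_VENDOR) -- crude, but the usual approach at the time; the two draw functions below are hypothetical:

  // needs <string.h> for strstr()
  const char *vendor = (const char *)glGetString(GL_VENDOR);
  if (vendor && strstr(vendor, "ATI"))
      drawWithDisplayLists(model);   // Radeon path: display lists are much faster here
  else
      drawWithVertexArrays(model);   // GeForce path: vertex arrays / VAR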

paddy
05-14-2001, 05:57 AM
I feel a wind of stupidity blowing on these boards. This is the second "NV/ATI" war topic, and I don't think this helps OPENGL ADVANCED CODING! If you want to fight, go to one of those hardware forums.
I'm tired of these closed-minded free-time coders who just post because they don't have anything else to do.
Some people work and need real help here.

Period.

opla
05-14-2001, 07:02 AM
Well, if I talk about both ATI and nVidia, it's because I want my software to run fast on both.

I tried the same test (previous post) with a GeForce DDR: I get 125 FPS with display lists and 125 FPS with VAR (AGP and video memory).

So the ATI Radeon is as fast as a GeForce1, but only when using display lists.

That's one of the points of this kind of forum: how to get the maximum FPS on any card.

paddy
05-14-2001, 08:10 AM
In our engine we use both vertex arrays and display lists.
Performance is similar with either method, on both GeForces and the Radeon.

The Radeon's performance (with either method) is about the same as the GF2 GTS. The GF1 is literally crushed by the Radeon, which is about twice as fast.

You may have done something ... wrong?

*EDIT*
While working on the engine, I noticed a weird thing.
The Detonator drivers sometimes "correct" some bad OpenGL programming. This means something that is supposed to work badly works fine on NV cards but badly on any other card (ATI, Kyro, etc.).
This is not really bad, but it shows how dangerous it is to use only one brand of card for development.
*EDIT*


[This message has been edited by paddy (edited 05-14-2001).]

opla
05-14-2001, 08:29 AM
I would like to know if I did something wrong.

The programmer of the engine's DX8 renderer told me that the Radeon is as fast as a GeForce1 (sometimes slower).
I get the same results as him with display lists.

I'll try some demos with VAs on both the Radeon and a GeForce ASAP.
I'll need VAs soon for dynamic data ...

paddy
05-14-2001, 08:37 AM
If you want to compare compiled VAs, you can start with Quake 3.

mcraighead
05-14-2001, 10:53 AM
Originally posted by paddy:
The Detonator drivers sometimes "correct" some bad OpenGL programming. This means something that is supposed to work badly works fine on NV cards but badly on any other card (ATI, Kyro, etc.).


What do you mean by this? Feel free to email me privately if you want...

- Matt

opla
05-14-2001, 11:59 PM
I'm not sure nVidia drivers "correct" bad code. I wrote a program (without OGL extensions) that works fine on nVidia's OpenGL and on MS's software OpenGL renderer. But on an ATI Radeon, the program crashes when you close it.

I think it's because I don't call DestroyWindow (I haven't checked yet). As usual, there's no info on the ATI web site ...

Anyway, I don't consider the ATI driver a reference for good OpenGL programming (I would like to have an SGI OpenGL implementation for that).

opla
05-16-2001, 01:42 AM
To paddy: I tried Quake 3 with gltrace, and it's not using CVAs on a Radeon! Only VAs ...

opla
05-16-2001, 03:59 AM
UPDATE:
I tried with the latest version of gltrace; Quake 3 does use CVAs indeed ...

opla
05-16-2001, 07:08 AM
I asked devrel@ati.com why there's this FPS difference between DLs and VAs. They told me to use VAs of at most 1024 vertices, and it works!!! Several VAs of 1024 vertices are faster than one LARGE VA (on the Radeon).
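
A sketch of that advice: split the draw into batches of at most 1024 vertices (1023 here so GL_TRIANGLES batches stay a multiple of 3). 'verts', 'vertexCount' and the Vertex struct are placeholders, and triangles are assumed not to straddle batch boundaries.

  glVertexPointer(3, GL_FLOAT, sizeof(Vertex), verts);
  const int kBatch = 1023;               // <= 1024 and a multiple of 3
  for (int first = 0; first < vertexCount; first += kBatch)
  {
      int count = vertexCount - first;
      if (count > kBatch) count = kBatch;
      glDrawArrays(GL_TRIANGLES, first, count);   // several small VAs instead of one huge one
  }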

paddy
05-16-2001, 07:14 AM
Interesting info ...
But 1024 vertices is really LOW!
Maybe a driver-side patch that splits big VAs into several smaller ones could do the job.
ATI? Suggestions?

opla
05-16-2001, 07:42 AM
Small question: in nvOpenGLspecs.pdf, nVidia says: "The specification of glDrawElements does not allow optimal performance for some OpenGL implementations, however. In particular, it has no restrictions on the number of indices given," (page 74)

But when I call glGetIntegerv(GL_MAX_ELEMENTS_INDICES_WIN, &DrawRangeElementMaxInd);
I get a limit of 65535 indices! (on a Radeon; I didn't test on a GeForce)

The official OpenGL 1.2.1 spec says: "Implementations denote recommended maximum amounts of vertex and index data, which may be queried by calling GetIntegerv with the symbolic constants MAX_ELEMENTS_VERTICES and MAX_ELEMENTS_INDICES." (page 37)

If glDrawElements is limited this way, why have EXT_draw_range_elements at all?
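
For reference, querying the recommended limits from the spec passage quoted above looks like this (using the core 1.2 token names; older headers expose _EXT/_WIN spellings):

  GLint maxVerts = 0, maxIndices = 0;
  glGetIntegerv(GL_MAX_ELEMENTS_VERTICES, &maxVerts);
  glGetIntegerv(GL_MAX_ELEMENTS_INDICES,  &maxIndices);
  // keep each glDrawRangeElements call within these recommended amounts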

LordKronos
05-16-2001, 08:31 AM
I haven't used EXT_draw_range_elements, but all software T&L systems I know of typically transform all the vertices in a buffer when the buffer is locked. I would assume EXT_draw_range_elements provides a way to indicate that you are only going to use a subrange of the vertex array, so that only those vertices get transformed.

As an example, if you have a vertex array that contains 100 different objects, each composed of 100 vertices, and you tried to lock the array to draw just one of those objects, the system would have to transform 9900 vertices that it will never use. By specifying a range, the system will only transform the 100 vertices that you indicated will be used.
As an example, if you have a vertex array that contains 100 different objects, each composed of 100 verticies, and you tried to lock the array to draw just one of those objects, the system would have to transform 9900 verticies that it will never use. By specifying a range, the system will only transform the 100 verticies that you indicated will be used.