display list performance on radeon



JanHH
03-22-2004, 06:18 AM
I recently purchased a Radeon graphics card and found it to be extremely slow with the program I am working on.. in a certain "boring OpenGL 1.1 mode", my GF FX 5700 runs at about 400 fps, while the Radeon (9600 Pro) renders at 15-30 fps. The program makes heavy use of display lists. I remember reading that display lists rather slow things down on ATI hardware.. is this true? Slow things down to such an extent, when the chipset itself should if anything be faster?

Jan

forgottenaccount
03-22-2004, 07:45 AM
In Homeworld2 we found display lists on all ATI hardware to be slower than vertex arrays. nVidia, Matrox, and Intel all have good display list implementations with their latest drivers.

If you want fast performance on ATI hardware you'll have to use VBOs until they optimize this part of their driver. If you stick to OpenGL 1.1 then I guess you just won't get good performance on ATI.

Use vertex arrays instead of display lists for a bit of a performance increase with current ATI drivers.
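
For reference, switching a static mesh from a display list to a VBO is not much code. A rough sketch (assuming the ARB_vertex_buffer_object entry points are loaded; vertexCount and vertexData are made-up names for your own data):

/* One-time setup: copy the vertex data into a buffer object. */
GLuint vbo;
glGenBuffersARB(1, &vbo);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glBufferDataARB(GL_ARRAY_BUFFER_ARB,
                vertexCount * 3 * sizeof(GLfloat),
                vertexData, GL_STATIC_DRAW_ARB);

/* Every frame: source the vertex array from the buffer (offset 0). */
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);
glDisableClientState(GL_VERTEX_ARRAY);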

JanHH
03-22-2004, 08:12 AM
You are talking about "a bit" of a performance increase.. but performance is about 10-20 times slower than on the GF FX with display lists. When using ARB_vertex_program and ARB_fragment_program on the ATI, the situation does not change, it's still the same speed, whilst on nvidia it slows down to about 60 fps.. making it still about three times faster than on ATI. So can the problem really be the display lists, if it's not a *bit* slower but that much?

thanks
Jan

Ysaneya
03-22-2004, 10:01 AM
If your scene is rendered using only display lists, AND you have not implemented frustum culling in your program, then what you might be seeing is NVidia's drivers doing frustum culling on the display lists.
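
(Per-object culling is cheap to add yourself, by the way. A minimal sketch, assuming you extract the six frustum planes each frame and store a bounding sphere per display list; Segment and planes are made-up names:)

/* Skip a display list whose bounding sphere is entirely outside one
   of the six frustum planes (a*x+b*y+c*z+d, normals pointing inward). */
typedef struct { float cx, cy, cz, radius; GLuint list; } Segment;

void drawSegmentCulled(const Segment *s, const float planes[6][4])
{
    int i;
    for (i = 0; i < 6; ++i) {
        float dist = planes[i][0]*s->cx + planes[i][1]*s->cy
                   + planes[i][2]*s->cz + planes[i][3];
        if (dist < -s->radius)
            return;          /* fully outside: don't even call the list */
    }
    glCallList(s->list);     /* potentially visible: draw it */
}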

Your radeon *is* fast. But in my experience, ATI cards are much more sensitive to "doing the correct thing" (tm) than NVidia. A small mistake, an invalid vertex format, or a bad combination of states enabled, all of that can make your performance suffer a lot.

Y.

forgottenaccount
03-22-2004, 11:55 AM
FYI: I couldn't find the old HW2 numbers so I did some quick tests now with the 4.3 cats and a 9700Pro. Display lists weren't slower than vertex arrays like I previously said (although we had found this with old drivers). VBOs beat both methods.

Ysaneya's post echoes my thoughts, too.

zeckensack
03-22-2004, 02:21 PM
Originally posted by JanHH:
in a certain "boring OpenGL 1.1 mode"

Care to explain?
Are you talking about TexImage calls in display lists? I think you shouldn't do that.

JanHH
03-22-2004, 03:10 PM
no teximage calls inside of display lists, all that happens there is:

- tex coords
- normals
- vertex
- enable/disable texture for certain tex units
- bind texture
- change color

and yes the program does frustum culling itself.

Is there anything that has to be avoided on ati that works fine on nvidia?

"boring OpenGL 1.1 mode" means basically that it uses standard OpenGL lighting (and multitexturing, so it's rather OpenGL 1.2 mode ;) ), no vertex program and no fragment program, but I reworked it to use them for bump mapping. But it still has the old mode available, and on ati its both very very slow, while on NV, old mode = 200 fps and new mode = 60 fps.

I really have no idea what's going wrong.

well, thanks for all your help :)

zeckensack
03-22-2004, 03:30 PM
Originally posted by JanHH:
no teximage calls inside of display lists, all that happens there is:

- tex coords
- normals
- vertex

Same number of each? I seem to remember that knackered, IIRC, had some strange issues with display lists when he tried:

- enable/disable texture for certain tex units
- bind texture

Try not to :)

- change color

Should be okay if you do it only a few times. If you do it very frequently, see above.


Originally posted by JanHH:
Is there anything that has to be avoided on ati that works fine on nvidia?

No idea. However, I can tell you what runs well for me (and has done so for years) with ATI drivers:
- dlists containing texture environment state changes (glActiveTextureARB, glTexEnvi)
- dlists containing pure geometry, all used attributes set for every vertex
- dlists that start with a glColor* call and contain geometry, but with no further glColor calls

I actually never tried mixing state (other than vertex attributes) and geometry in a display list, I just never had a usage model for that.
Maybe you should just split into multiple display lists, so that they are all "pure".
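
Something along these lines (a rough sketch; materialCount, textureForMaterial and colorForMaterial are made-up names for your own data):

/* One-time setup: compile pure-geometry display lists, one per
   material, with all texture binds and color changes kept OUTSIDE. */
GLuint lists = glGenLists(materialCount);
int m;
for (m = 0; m < materialCount; ++m) {
    glNewList(lists + m, GL_COMPILE);
    glBegin(GL_TRIANGLES);
    /* ... glTexCoord2f / glNormal3f / glVertex3f for every vertex
       belonging to material m, and nothing else ... */
    glEnd();
    glEndList();
}

/* Per frame: state changes happen between the calls, not inside. */
for (m = 0; m < materialCount; ++m) {
    glBindTexture(GL_TEXTURE_2D, textureForMaterial[m]);
    glColor3fv(colorForMaterial[m]);
    glCallList(lists + m);
}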

JanHH
03-22-2004, 05:03 PM
Yes, it's in fact planned to use a separate display list for every material, so that color and texture changes do not appear inside of them. However, I cannot see anything logical about texture and color changes being faster outside of display lists than inside. And after all, there are only very few color and texture binding changes at all, so this cannot be the problem. And remember, it is not a subtle thing, it is 400 fps on nvidia vs. 20 fps on ati. So there MUST be something going entirely wrong, I think.

Jared
03-22-2004, 10:56 PM
hmm.. how exactly does your data look? the only time i had really extreme slowdowns (about 1/100th the speed of vertex arrays) was with VBOs and unaligned data. maybe there are similar restrictions for display lists, so along with the advice above you could try:

specify normals etc. (except maybe color) for every vertex, and use floats (or, for colors, if you use bytes, use all 4 components). i tend to think ati loves to sacrifice flexibility for speed.
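
for example, a layout like this keeps every attribute 4-byte aligned (a rough sketch; Vertex and verts are made-up names):

/* Interleaved vertex: floats everywhere, plus a full 4-byte RGBA
   color, so every attribute starts on a 4-byte boundary (36B stride). */
typedef struct {
    GLfloat pos[3];     /* 12 bytes */
    GLfloat normal[3];  /* 12 bytes */
    GLfloat uv[2];      /*  8 bytes */
    GLubyte color[4];   /*  4 bytes: all four components, not three */
} Vertex;

glVertexPointer(3, GL_FLOAT, sizeof(Vertex), &verts[0].pos);
glNormalPointer(GL_FLOAT, sizeof(Vertex), &verts[0].normal);
glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), &verts[0].uv);
glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(Vertex), &verts[0].color);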

JanHH
03-23-2004, 05:20 AM
Vertex, color and normal data are floats, and specified for every vertex.

The data exists in two flavours: it's a segmented ground mesh, one version with 6x6 (=36) segments, the other with 20x20 (=400) segments. Both are otherwise identical, same number of faces, same look, and both run at the same (slow) speed.

I just discovered that when using a vertex and fragment program, it slows down even more, from 20 down to 14 fps, and all the fragment program does is fetch the texel color and write it to result.color.
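
(For reference, that whole fragment program is essentially just the following; a sketch, assuming the ARB_fragment_program entry points are loaded and <string.h> is included; prog is a made-up name.)

/* The trivial fragment program: fetch the texel, write result.color. */
static const char *fp =
    "!!ARBfp1.0\n"
    "TEX result.color, fragment.texcoord[0], texture[0], 2D;\n"
    "END\n";

GLuint prog;
glGenProgramsARB(1, &prog);
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                   (GLsizei)strlen(fp), fp);
glEnable(GL_FRAGMENT_PROGRAM_ARB);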

So there MUST be something wrong.. do you at least agree with me on this?

Jan

Ysaneya
03-23-2004, 06:35 AM
Colors are floats? This should ring your alarm. That's not a "standard" vertex format; you should fix it anyway.

How many polygons per segment?

If you see no difference at all between 36 and 400 segments, this should ring your alarm a second time. The problem is likely not a vertex format/transfer/bandwidth problem. I'm guessing you don't see a difference if you use vertex arrays or even immediate mode, do you?

Y.

Mazy
03-23-2004, 06:47 AM
I think floats as colors are considered standard on ATI as well (I only use that, and I don't have any speed problems).

Have you tried just using VBOs, to see if you get the same strange results?

JanHH
03-23-2004, 07:28 AM
216618 vertices in total, which makes an average of about 6017 vertices per segment in the 6x6 version and about 542 per segment in the 20x20 version.

No, I did not try changing this to immediate mode or vertex arrays/VBOs, because that would be far too much work; the program running on ATI is "nice to have", but in general we are content with nvidia. So it's more my personal interest (and after all, I spent EUR 139,- on the Radeon ;) ) than a necessity to get it running on ATI hardware.

wimmer
03-26-2004, 04:25 AM
I was also surprised at ATI's performance in "dumb" OpenGL modes (no shaders, ...) when I first tried a 9800XT in our renderer. The GeforceFX 5950 proved to be faster by factors of 3-7, depending on the draw mode.

I didn't have a VBO implementation available, but for everything else, ATI was extremely sensitive to the way you send your geometry (triangle strips or not, indexed geometry or not, etc.), while the Geforce provided excellent performance across the board.

And I wasn't even able to compare display list performance: the ATI gave up with an "out of memory" error on a 1 million polygon mesh, whereas the Geforce was able to render 2 million polygons with display lists just beautifully.

I assume that with VBO, you can get the same level of performance on the ATI, but I have to say that for normal "dumb" OpenGL operation, NVIDIA's drivers are much better...

Michael

JanHH
03-26-2004, 04:48 PM
This does not sound very positive for ATI.. if the Radeon chip is that sensitive to all kinds of things and the Geforce does a much better job all in all, it's the exact opposite of the ATI hype in game magazines, where you get the impression that the Geforce FX is really a "loser" chipset and ATI is much better. But it seems that the Radeon is a tailor-made Direct3D chip and is not that convincing in other situations, whilst the Geforce FX is not as bad as those game magazines say.

Korval
03-26-2004, 08:50 PM
Think of it like this: ATi's first priority over the last year has been to stabilize their drivers. So, where do you put your optimizations given limited development time? On code paths that are already in use, or are likely to be used in the future: VBO, VAO (when it mattered), vertex arrays for the formats you support. So what if your card doesn't handle display lists as fast as it could? That doesn't matter much, because DLs aren't in wide use among professional developers. Not that nobody uses them, but those who don't outnumber those who do. Greatest good for the greatest number.

So, they optimize what makes actual functioning programs of importance run fast, as well as the APIs that programs expect to use in the future (VBO).

It isn't that the Radeon is a D3D chip; it can beat nVidia chips in GL programs as well. It's just that GL has numerous data transfer APIs to optimize, while D3D has only one, so you pick the ones you need. And Radeons still handily beat NV3x's at floating-point fragment shaders, so if you're doing something advanced, you'll still want your Radeon.

Basically, the R3xx's, and their drivers, are made for actual game developers and gamers first, and everyone else second.

So, if you find the sub-optimal path for sending vertices to be slow on your Radeons, stop using the sub-optimal path, and start using something real.

Jared
03-26-2004, 10:28 PM
Originally posted by JanHH:
This does not sound very positive for ATI.. if the Radeon chip is that sensitive to all kinds of things and the Geforce does a much better job all in all, it's the exact opposite of the ATI hype in game magazines, where you get the impression that the Geforce FX is really a "loser" chipset and ATI is much better.

ati is fast, but many of their optimizations seem to come with a "do it our way or live with the consequences" policy: a few "right" ways that work well, and many "wrong" ways that kill performance (and on some occasions crash your system).

Originally posted by JanHH:
But it seems that the Radeon is a tailor-made Direct3D chip and is not that convincing in other situations, whilst the Geforce FX is not as bad as those game magazines say.

i don't know if the whole hardware is "aimed" at d3d, but at least point sprites and a few other issues make me feel that d3d comes first when they write drivers. those point sprites even make me wonder if they screw up opengl support on purpose.

seems like a draw: more performance but less flexibility (ati), or less performance but less touchy (nvidia).

JanHH
03-28-2004, 04:03 AM
Well, I'll stick with nvidia then; I am quite content with NV performance when using NV_vertex/fragment_program instead of the ARB versions. And isn't it the case that nvidia IS doing a better job with their drivers, if they support every path, whilst ati only supports "their" path?

I disagree with "more flexible but less speed (nvidia) vs. less flexible but more speed (ati)". I don't think that the one preferred path which is meant to be fastest has to become slower just because other paths are supported as well. It's just more work for the driver writers, I guess ;) . Or maybe also for the hardware designers, to create a chipset that is more flexible.

After all, I think this s*cks, sorry.. one could question whether what ATI ships is a "certified" OpenGL implementation at all, if it goes like, "hey, you can use display lists too, but it will hardly be faster than simply drawing with a software renderer" (which nearly seems to be the case). Maybe consistent speed throughout most of the implementation should be a criterion, too, as well as support for all features.

Jan

Ysaneya
03-28-2004, 04:19 AM
Originally posted by JanHH:
And isn't it the case that nvidia IS doing a better job with their drivers, if they support every path, whilst ati only supports "their" path?

Did you know that clipping planes are done in software on NVidia cards, while they are done in hardware on ATI's? Not to start a flamewar, but you can find tons of differences between these two families of video cards, and after having worked for quite a while with both, it is honestly my opinion that there is no black or white; neither is superior to the other. If you're a serious developer you'll have to live with it and learn the differences...

Y.

JanHH
03-28-2004, 05:53 AM
I absolutely disagree.. if on ATI only *some* things run well while on nvidia *all* things do, I would say that nvidia is superior. And after all, we'll have to wait for the next chipset generation: if nvidia reaches ati's pixel shader performance, so that ati no longer has any performance advantage, and on ati still only some things run well while on nv all things do, I think it is quite clear that nv is better. But in fact another flame war is not really necessary. At least now I know that I would have to rework the program to use VBOs to reach decent performance on ati. So thanks :) .

Jan

Ysaneya
03-28-2004, 07:23 AM
Originally posted by JanHH:
if on ATI only *some* things run well while on nvidia *all* things do, I would say that nvidia is superior.

Yes, but my point was exactly that not everything runs better on NVidia than on ATI. Pixel shaders are already one important counterexample. I gave you another, clipping planes. And VBO support on NVidia cards is still not flawless: I recently had to reimplement one of my programs using VAR because VBOs were not giving good performance. These are just simple examples. I don't see what makes you think "all things run well" on NVidia.

Y.

JanHH
03-28-2004, 03:03 PM
It was simply my impression from the things people said here.. nvidia: broad support for most or all functionality; ati: only very specific things run well. And also my own experience: my program runs well on nvidia and does not on ati ;) .

Jan

Jared
03-29-2004, 05:44 AM
Originally posted by Ysaneya:
And VBO support on NVidia cards is still not flawless: I recently had to reimplement one of my programs using VAR because VBOs were not giving good performance.

Bad example: VBOs on ATI are severely limited by their 4-byte alignment requirement, and since VAO gave me even more trouble, I have to fall back to vertex arrays just when VBOs would be crucial (I can either ask for a 70MB VBO and not get it, or use 17MB and stick to vertex arrays).

Ysaneya
03-29-2004, 06:11 AM
I don't think that's a bad example; I never said ATI didn't have its own problems, I was just saying you still have lots of issues on NVidia cards too.

Y.

OldMan
03-31-2004, 03:51 PM
Come on guys! Everyone knows that NEITHER is perfect. For each flaw found on one side, the other side will eventually find 2 in return (until the NVIDIA and ATI guys discover that they each have a ratio of 150% defective features in their drivers :) )

So let's stop fighting about which has fewer problems.. and talk about how to solve (or at least work around) them?