Display lists and T&L

Do compiled display lists take advantage of hardware T&L?

Of course, it depends on the drivers and the card, but it should. In fact, display lists are the best way to let the driver optimize everything. On my card (a GeForce2), display lists can give you really, really big performance improvements (200 or 300%), even with really high polygon count scenes (200k).

Yes, we optimize geometry inside display lists. It won’t beat a hand-managed VAR heap for static geometry, but it’s about the best we can do without being able to read your app’s mind about what it wants to do.

  • Matt
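For reference, the optimization Matt describes only applies to geometry recorded once with GL_COMPILE and replayed. A minimal, untested sketch (assumes a valid OpenGL context is already current; the vertex data is made up):

```c
/* Record static geometry once with GL_COMPILE (record only, don't
   execute), then replay it each frame with glCallList. */
GLuint list = glGenLists(1);

glNewList(list, GL_COMPILE);
glBegin(GL_TRIANGLE_STRIP);
glVertex3f(-1.0f, -1.0f, 0.0f);
glVertex3f( 1.0f, -1.0f, 0.0f);
glVertex3f(-1.0f,  1.0f, 0.0f);
glVertex3f( 1.0f,  1.0f, 0.0f);
glEnd();
glEndList();

/* Per frame: */
glCallList(list);

/* On shutdown: */
glDeleteLists(list, 1);
```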

OK guys!! I am using display lists, classic vertex arrays, command lists & VAR for the GeForce boards!! My display lists are defined with the GL_COMPILE enum.
I've tested switching my display from one routine to another (using vertex arrays, commands or display lists for the same primitive) and I've only noticed small differences between the first three; of course the VAR gain is tremendous!!
What I mean is there's no 200% or 300% difference from using display lists, coco!! Classic vertex arrays are just a bit slower than display lists! So what's the problem??

And what do ya think of that matt??
If there are no differences between command lists, CVA & display lists, I have to think that they all send vertices on the fly through the bus, and that there is no special case for display list management!! Moreover, if I put in a TnT2 board my raster time should be approximately the same!! Thus I explain the small drop in raster time by the TnT2's fill rate…
So what?? Is there no difference in the display list mechanism between the TnT drivers and the GeForce boards??? I am using the 6.31 drivers under NT4.0…

Finally, I would be rather interested in any piece of code which demonstrates a 200% or 300% performance difference using display lists between TnT2 boards and a GeForce!!
(I am not talking about the VAR business here…)

“It won’t beat a hand-managed VAR heap for static geometry”

what does VAR stand for?

Vertex Array Range. Check out the Nvidia dev site to learn more about it…
But please, VAR is not the subject here…
Any comments, guys, on the previous questions??

About Sylvain:
“there’s no 200% or 300% !=”
Of course, it depends greatly on your geometry, but on my P3 600 with a GeForce2, when throwing more than 100k vertices per frame, it pays. Also, maybe my engine is not 100% optimized yet (though it has been optimized a lot), since I programmed it to be full-featured, and maybe I do a few too many state changes (which can be optimized inside display lists).
It really depends. I guess you can get even higher results from using display lists, and there are even pathological cases where display lists are slower than VARed tristrips.

By 200 or 300% I meant that you can get up to 2x or 3x the performance you have without display lists (depending on a lot of factors).
In a test I did, I got a 2.7x increase (58fps over 21fps) by using display lists instead of glDrawElements/EXT_compiled_vertex_array. The scene contains about 50 objects and 20 different materials/textures, and it sends 200k vertices per frame.
I guess the trick is to include all material/texture state changes in the list too. Of course, that gives you a pretty static scene, but it can be perfect for the architecture in a shooter game, etc.
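The trick coco describes can be sketched like this (untested; `wall_list`, `brick_tex`, `diffuse`, and `drawWallGeometry()` are hypothetical names, not from the thread):

```c
/* Bake the material/texture state changes into the list along with
   the geometry, so replay needs no per-object driver validation. */
glNewList(wall_list, GL_COMPILE);
glBindTexture(GL_TEXTURE_2D, brick_tex);      /* recorded into list */
glMaterialfv(GL_FRONT, GL_DIFFUSE, diffuse);  /* recorded into list */
drawWallGeometry();          /* glBegin/glEnd calls recorded too */
glEndList();
```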

You got it right, Coco! Display lists are damn cool when using static geometry, but what I was trying to say is that there are no big differences when using display lists on TnT2 boards and GF boards… The display lists seem to use intensive CPU work, and that means that vertices are sent on the fly to the GPU… Thus, when you're using a GF board you have to use the VertexArrayRange extension to get these 2x to 3x improvements, depending on the primitive type… What's clouding the subject is the performance difference between the glCommands themselves… but from one Nvidia board to another there are NO significant differences when you're using optimal static geometry like 100% stripped primitives!!

Oops, sorry Paddy…
No, the display lists do not take advantage of the TnL, and more precisely the GPU, because of the Nvidia GeForce implementation, which sucks… That's my opinion, because the display list performance is so far from Vertex Array Range!! I've not tested yet whether display lists take advantage of the hardware lighting, but I think so!! The problem with the display list mechanism is that data is not stored onboard; thus vertices are sent to the GPU by the CPU through the bus, or via the GL fence, which sucks like every caching system, and that's why you get this lack of performance, I suppose… The question should be: why, with the GL_COMPILE enum, are display lists not stored onboard permanently, with a nice vertex structure adapted to the primitive type?? I dunno… Instead of VAR, display lists should be the best way to get the advantages of hardware T&L implementations. For the moment the people at Nvidia use the Microsoft way of life!! Proprietary extensions, commercial extensions & so on!! What I have to say is that they don't conform to the ideals of the OpenGL philosophy.

Sylvain said:

No the display lists do not take advantage of the TnL and more precisely the GPU because of the Nvidia GeForce implementation which sucks

So you say that the GeForce doesn't use hardware T&L with display lists. And mcraighead, who works at nVidia and programs drivers for their cards, says they do.

Who knows more about this?

I have used display lists of about 250,000 vertices, which really is not optimal, and my framerates are still perfectly acceptable. If they weren’t using Hardware T&L for that, the computer would choke.

j

Display lists use T&L.

Vertex arrays use T&L, with or without the use of VAR.

Immediate mode uses T&L.

There are efficiency differences, yes, but in the end, a vertex is a vertex is a vertex. The only difference is in how fast they get to the card.

  • Matt

Sylvain:
I must admit that I haven't tried nv_vertex_array_range or other nvidia extensions, only the compiled vertex arrays, and yes, the difference of 2x to 3x is real for some cases, and a 1.5x performance gain is pretty common, when you can achieve it.

You're probably doing something wrong with display lists. I suggest checking your code.

A few more things:

  • display lists are good ONLY for static geometry

  • display lists accelerate everything that a GeForce can accelerate.

"the problem with the display list mechanism is that data is not stored onboard thus vertices are sent to the GPU by the CPU through the bus or via the GL fence which sucks like every caching system and that’s why you get this lack of performance i suppose… "

  • I didn't write the GeForce drivers, but I know that there's no such lack of performance from using display lists. In fact, they blow away any other method 90% of the time. Yes, the CPU might be involved, but data inside display lists is optimized into internal formats, so it passes to the GPU in the fastest way possible.

“Best way to get advantages of Hardware T&L implementations, for the moment people at Nvidia use the microsoft way of life!! proprietary extensions, commercial extensions”

  • I agree.

“the display lists seem to use intensive CPU work!”

  • I bet DLs do their work in fewer CPU cycles than your code. Inside display lists, driver writers can store all the data in optimized internal formats, so when you call the DL you get top performance.

“thus when you’re using a GF board you have to use VertexArrayRange extension to get this 2x to 3x improvements depending on the primitive type”

  • Or in some cases, display lists.

"NO significant differences when you’re using optimal static geometry like 100% stripped primitives!! "

  • In real-world geometry, there's no such thing as 100% stripped primitives. But yes, there are cases where display lists don't give you 3x performance; still, I have seen that 1.2-1.5x is a very common increase.

“but what i was trying to say was that there are no big differences when using display lists with TnT2 boards and GF boards”

  • Do your testing.

  • Last, not only have I found that in a GeForce2 and a TNT card, display lists can give you great results, but I have also found that similar results happen in ALL cards I have tested (riva 128, Permedia2, Intense3D 3410GT, SGI Visual Workstation, FireGL 4000).

Don't take it personally, but I think you first need to be sure and think about what you're going to write, because it seems like you took your code (which can have bugs, as any code can) with some specific-case geometry and a specific card, and concluded a lot of stuff. I'm not even sure that you properly tested display lists at all.

Anyway, this is my opinion. As always, maybe someone with a reputation, like Matt or Cass (the only card guys here, from nvidia), or someone else who thinks like you or me, can end this discussion.

[This message has been edited by coco (edited 01-23-2001).]

Ok.

Wanna see 100% stripped primitives?? http://oks.merseine.nu (second button on the TV!)

It would be cool if display lists were as fast as VAR, but they are NOT,
as you will see; I've done the testing…

Check out the FPS; the shots are from a TnT2 board… (using display lists with this card!)
Scenes are about 40,000 to 80,000 points of 100% stripped primitives in ONE strip!

Thus this kind of reality now exists!!

Ozzy

VAR allows you to change the geometry between each frame. Display lists don't. Well, you can create and tear down a list for each frame, but the performance will most probably suck.

I'm pretty happy with the performance of simple vertex arrays (not even compiled), but that's probably because I don't iterate over the same geometry multiple times in a single frame.

Oh, and quoting mcraighead:
“it’s about the best we can do without being able to read your app’s mind about what it wants to do.”
What kind of cop-out is that? Huh? :)
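The dynamic-update path described above looks roughly like this with NV_vertex_array_range (an untested sketch; `nverts` is a hypothetical variable, and the wgl* entry point must first be fetched with wglGetProcAddress):

```c
/* Allocate fast (AGP or video) memory and hand the whole range to
   the driver; ordinary vertex arrays placed inside it can then be
   pulled by the GPU directly. */
GLsizei size = nverts * 3 * sizeof(GLfloat);
GLfloat *vmem = (GLfloat *)wglAllocateMemoryNV(size, 0.0f, 0.0f, 0.5f);

glVertexArrayRangeNV(size, vmem);
glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

/* Each frame: rewrite vmem (this is what display lists can't do),
   then draw from it. */
glVertexPointer(3, GL_FLOAT, 0, vmem);
glEnableClientState(GL_VERTEX_ARRAY);
glDrawArrays(GL_TRIANGLE_STRIP, 0, nverts);
```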

For VAR, please read VertexArrayRange, not classic vertex arrays.

Which is the fastest:
CVAs or tristrips, if you only have one strip and both have the same number of triangles and vertices?

[This message has been edited by Osku (edited 01-26-2001).]