
Display lists vs. vertex arrays



Warzywo
09-28-2008, 01:54 AM
Hi, which do you think is faster for rendering? I would like to render complex meshes such as terrain and simpler ones such as solid objects. I'll post my results here once I've run some tests, but I'd like some input first :). Thx

Zengar
09-28-2008, 02:33 AM
Display lists and VBOs should be about the same speed. It is preferable to use VBOs for everything (they allow maximum flexibility), but display lists are still a good choice for static objects.

Warzywo
09-28-2008, 04:44 AM
DLs are easier to implement ;), but vertex arrays are the fastest. I did some tests (without VBOs):

Renderer: ATI Radeon HD 2600 Pro AGP

1. terrain mesh - 1024 vertices:

normal ~ 195 fps
DLs ~ 202 fps
v.arrays ~ 205 fps

2. terrain mesh - 65536 vertices:

normal ~ 49 fps
DLs ~ 87 fps
v.arrays ~ 94 fps

resolution: 1430x940, multisampling 4x
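Frame rates don't add linearly, so differences between these numbers are easier to judge as frame times. A small C sketch (the function names are illustrative, not from the thread) does the conversion: 195 fps vs. 205 fps is only about 5.13 ms vs. 4.88 ms per frame (roughly 0.25 ms), whereas 49 fps vs. 94 fps is about 20.4 ms vs. 10.6 ms (nearly 10 ms saved). That suggests, though it doesn't prove, that on the small mesh most of each frame goes to costs other than vertex submission, such as filling 1430x940 pixels with 4x multisampling.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a frame rate (frames per second) to a frame time in milliseconds. */
double fps_to_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Print a method's frame time and how many milliseconds per frame it saves
   compared with a baseline frame rate (e.g. the "normal" path). */
void report(const char *label, double fps, double baseline_fps)
{
    double ms = fps_to_ms(fps);
    double saved = fps_to_ms(baseline_fps) - ms;
    printf("%-10s %6.1f fps = %6.2f ms/frame (%.2f ms/frame saved)\n",
           label, fps, ms, saved);
}
```

For example, `report("DLs", 87.0, 49.0);` and `report("v.arrays", 94.0, 49.0);` compare the 65536-vertex results against the "normal" path.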

Zengar
09-28-2008, 05:45 AM
Actually, on a reasonable implementation, DLs should be faster than plain client-side vertex arrays.

zeoverlord
09-28-2008, 07:51 AM
I don't know about that. I once tried two different methods of doing shadow volumes, with immediate mode and with VBOs; all the data was rewritten every time, but the VBO version was still much faster.
If I can do that, then surely the driver can also do it with vertex arrays and DLs.
And judging from the numbers Warzywo posted, I'd say that is the case for both.
But vertex arrays are already formatted correctly, so I guess that would account for the difference.

Zengar, I agree with you that it might seem like DLs should be able to reach VBO speed, since there is a simple way to tell whether the data has changed, which means the driver doesn't have to build new VBOs internally every time.
My guess is that they just don't care; DLs have been pretty much deprecated for some time now (for real in OpenGL 3).

Warzywo, would you care to rerun that experiment with VBOs added to the mix? I think you will find that VBOs are the best for static data.

Brolingstanz
09-28-2008, 11:56 AM
The funny thing is that DX11 is now introducing DLs as part of its new multithreading (MT) solution. They amount to single-threaded command buffers, precompiled for deferred execution in the main render thread...
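The idea that display lists and these DX11 command buffers share is "record once, replay later". The C sketch below only illustrates that pattern: the command set, `CmdBuffer`, `record`, `execute`, and the `Device` stand-in are all hypothetical and do not correspond to the Direct3D 11 or OpenGL APIs, and a real driver would encode hardware-specific packets instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command set; a real driver records hardware-specific packets. */
typedef enum { CMD_BIND_BUFFER, CMD_DRAW } CmdOp;

typedef struct { CmdOp op; unsigned arg0, arg1; } Cmd;

#define MAX_CMDS 256

typedef struct { Cmd cmds[MAX_CMDS]; size_t count; } CmdBuffer;

/* Recording only appends to the buffer; nothing is executed yet. */
void record(CmdBuffer *cb, CmdOp op, unsigned arg0, unsigned arg1)
{
    assert(cb->count < MAX_CMDS);
    cb->cmds[cb->count++] = (Cmd){ op, arg0, arg1 };
}

/* Stand-in for the device: tracks the bound buffer and counts submitted vertices. */
typedef struct { unsigned bound_buffer; unsigned long vertices_submitted; } Device;

/* Replay the prerecorded commands (e.g. on the main render thread), as often as needed. */
void execute(const CmdBuffer *cb, Device *dev)
{
    for (size_t i = 0; i < cb->count; ++i) {
        const Cmd *c = &cb->cmds[i];
        switch (c->op) {
        case CMD_BIND_BUFFER:
            dev->bound_buffer = c->arg0;          /* arg0 = buffer name */
            break;
        case CMD_DRAW:
            dev->vertices_submitted += c->arg1;   /* arg0 = first vertex, arg1 = vertex count */
            break;
        }
    }
}
```

The buffer is filled once (possibly on a worker thread) and then replayed any number of times without re-validating each command, which is the property a compiled display list offers too.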

Korval
09-28-2008, 01:47 PM
Quoting Brolingstanz: "They amount to single-threaded command buffers, precompiled for deferred execution in the main render thread..."

Longs Peak Reloaded was going to provide something not entirely unlike display lists too.

knackered
09-29-2008, 12:42 AM
Display lists are notoriously slow on ATI hardware. NVidia display lists are screamingly fast, though (just geometry).

Dark Photon
09-30-2008, 07:21 AM
Quoting knackered: "Display lists are notoriously slow on ATI hardware. NVidia display lists are screamingly fast, though (just geometry)."
I heartily second that on NVidia.

I should qualify that. Our product only uses display lists on NVidia, and only for what was recently called here "geometry-only display lists", i.e. no capture of OpenGL state changes, just the raw batches (vertex array binds/enables and batch submission). NVidia renders these extremely fast. I could not get VBOs to match them (and I tweaked formats and packings like crazy; and yes, I'm using a good indexed-triangle optimizer).

For instance, some past numbers I captured:
* Client arrays: 16.7ms draw
* Server arrays (VBOs): 11.4ms draw (31% faster)
* Geometry-only display lists: 7.4ms draw (56% faster)

This is draw only, which excludes cull and other frame overhead.
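The percentages above are reductions in draw time relative to client arrays: (16.7 - 11.4) / 16.7 ≈ 0.317 and (16.7 - 7.4) / 16.7 ≈ 0.557. Expressed as speedup factors (baseline time divided by new time), the same measurements are about 1.46x for VBOs and 2.26x for geometry-only display lists. A small C sketch of both calculations (function names are illustrative):

```c
#include <assert.h>

/* Fraction of draw time saved relative to a baseline, e.g. 0.317 for "31.7% less time". */
double time_reduction(double baseline_ms, double ms)
{
    assert(baseline_ms > 0.0);
    return (baseline_ms - ms) / baseline_ms;
}

/* Speedup factor: how many times faster the new path is (baseline time / new time). */
double speedup(double baseline_ms, double ms)
{
    assert(ms > 0.0);
    return baseline_ms / ms;
}
```

The two measures diverge quickly: halving the draw time is a 50% reduction but a 2x speedup, so it helps to say which one a percentage refers to.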

Until I can get the same or better performance from VBOs, I don't want to give up OpenGL display lists. OpenGL currently does not expose enough information about the driver's "fast path" to reproduce it, and perhaps not even enough functionality.

CatDog
09-30-2008, 07:30 AM
Do you optimize your arrays anyway, before putting them into the lists? Does this make a difference?

CatDog

Dark Photon
09-30-2008, 09:55 AM
Quoting CatDog: "Do you optimize your arrays anyway, before putting them into the lists? Does this make a difference?"
Yes, I always submit optimized arrays. ACMR is in the 0.7-0.9 range, so they aren't too bad. But you do raise an interesting question: whether NVidia's display list implementation re-optimizes triangle order.
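ACMR (average cache miss ratio) is the number of post-transform vertex cache misses divided by the number of triangles: 3.0 means every vertex of every triangle is transformed again, and for a typical mesh with about twice as many triangles as vertices the lower bound is about 0.5. A minimal sketch of how it can be measured, assuming a simple FIFO cache of a given size (real cache sizes and replacement policies vary between GPUs; `acmr_fifo` is an illustrative name):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE 64

/* Simulate a FIFO post-transform vertex cache of 'cache_size' entries over an
   indexed triangle list and return misses per triangle (the ACMR). */
double acmr_fifo(const unsigned *indices, size_t index_count, size_t cache_size)
{
    unsigned cache[MAX_CACHE];
    size_t filled = 0, next = 0, misses = 0;

    assert(cache_size > 0 && cache_size <= MAX_CACHE);
    assert(index_count > 0 && index_count % 3 == 0);

    for (size_t i = 0; i < index_count; ++i) {
        int hit = 0;
        for (size_t j = 0; j < filled; ++j) {
            if (cache[j] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            ++misses;
            cache[next] = indices[i];          /* overwrite the oldest entry (FIFO) */
            next = (next + 1) % cache_size;
            if (filled < cache_size) ++filled;
        }
    }
    return (double)misses / (double)(index_count / 3);
}
```

For a quad split into two triangles, `{0,1,2, 2,1,3}`, a 16-entry cache misses only on the four distinct vertices, giving an ACMR of 2.0; a triangle optimizer reorders indices so that larger meshes reuse cached vertices far more often.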

CatDog
09-30-2008, 10:08 AM
Yes, that was exactly my question... :)

If I find some time somewhere, I'll give it a try.

CatDog

CatDog
09-30-2008, 05:38 PM
Hmmm, at first glance I cannot confirm your observations about display list speed. Here is what I did:

I'm drawing a cache-optimized static mesh (positions and normals only) using exactly 6 glDrawRangeElements calls: around 5 million triangles and 0.8 million vertices in total, on a GeForce 7950 GX2 under WinXP.

1. All vertices stored in one interleaved VBO. Indices stored in one element VBO.
--> 92 FPS

2. Instead of using VBOs, I wrap the 6 glDrawRangeElements calls in a display list and call that instead.
--> 27 FPS !!!

(And both of my CPU cores jump to max. With VBOs, CPU load is around 30%...)

Huh? I'm stopping here, because my original question (is nVidia optimizing display lists?) becomes irrelevant at this point.

CatDog

knackered
10-01-2008, 04:41 AM
Things I've observed on NV hardware with dlists:
1/ you get better performance if you ACMR-optimize the triangles before creating the list.
2/ you can actually beat NV display lists with VBOs if you pack multiple batches into the same VBO and offset the indices so you don't have to rebind the pointers in between.
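The second point can be sketched in C. This is an illustrative sketch, assuming each batch arrives as its own tightly packed float vertex array plus 32-bit indices; `Batch` and `pack_batches` are hypothetical names. Each batch's indices are rebased by the number of vertices already written, so a single set of attribute pointers covers every batch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical batch: packed float vertices plus indices local to this batch. */
typedef struct {
    const float    *vertices;      /* vertex_count * floats_per_vertex floats */
    size_t          vertex_count;
    const unsigned *indices;       /* indices into this batch's own vertices */
    size_t          index_count;
} Batch;

/* Append all batches into one vertex array and one index array (the caller
   sizes both outputs). Indices are offset by the vertices already written;
   first_index[b] receives where batch b's indices start. Returns the total
   number of vertices written. */
size_t pack_batches(const Batch *batches, size_t batch_count, size_t floats_per_vertex,
                    float *out_vertices, unsigned *out_indices, size_t *first_index)
{
    size_t vbase = 0, ibase = 0;
    assert(floats_per_vertex > 0);
    for (size_t b = 0; b < batch_count; ++b) {
        const Batch *src = &batches[b];
        memcpy(out_vertices + vbase * floats_per_vertex, src->vertices,
               src->vertex_count * floats_per_vertex * sizeof(float));
        for (size_t i = 0; i < src->index_count; ++i)
            out_indices[ibase + i] = src->indices[i] + (unsigned)vbase;  /* rebase */
        first_index[b] = ibase;
        vbase += src->vertex_count;
        ibase += src->index_count;
    }
    return vbase;
}
```

After packing, the two arrays would be uploaded once into a GL_ARRAY_BUFFER and a GL_ELEMENT_ARRAY_BUFFER, the attribute pointers set once, and batch `b` drawn with `glDrawElements(GL_TRIANGLES, (GLsizei)batches[b].index_count, GL_UNSIGNED_INT, (const void *)(first_index[b] * sizeof(GLuint)))`.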

Dark Photon
10-01-2008, 06:16 AM
Quoting knackered: "things i've observed on nv hardware with dlists:
...
2/ you can actually beat nv display lists with vbo if you pack multiple batches into the same vbo and offset the indices so you don't have to re-bind the pointers in between."

I'll have to try that, knackered. In my stats above (which are representative of other tests), I had one batch per display list and one batch per VBO (two VBOs really, one for the indices) to keep things apples-to-apples. However, given your info, who knows: maybe NVidia dlists pack multiple batches into a single VBO pair behind the scenes... I wish this weren't such black magic. (gluOptimizeBatches, anyone? Heck, I'll take gluNVOptimizeBatches.)

knackered
10-01-2008, 06:57 AM
I think they pack them into VBOs based on the order in which the dlists are created. I vaguely remember doing a test and coming to that conclusion: create dlist #1 for an object, then create a lot of redundant dlists, then create the next real dlist #2, then render dlist #1, dlist #2, dlist #1, dlist #2, etc., and you get worse performance than if you hadn't created the ones in between.
The "geometry display lists" idea has more legs than a football team. The IHV is better placed to format my static data than I am.

CatDog
10-01-2008, 06:59 AM
Quoting knackered: "pack multiple batches into the same vbo and offset the indices so you don't have to re-bind the pointers in between."
That's what I did above: six draw calls using different index offsets, with the VBO bound only once per frame.
On the other hand, I also created only one display list, containing the six draw calls. Strange.
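Drawing several sub-ranges of one shared element buffer with glDrawRangeElements needs, per call, the smallest and largest vertex index referenced (`start`, `end`), the index count, and a byte offset into the bound element buffer. A minimal sketch, assuming 32-bit indices (`unsigned` matching GLuint); `RangeDraw` and `make_range_draw` are hypothetical helpers:

```c
#include <assert.h>
#include <stddef.h>

/* Parameters for one glDrawRangeElements call over a sub-range of a shared
   element buffer. Field names are illustrative. */
typedef struct {
    unsigned start;        /* smallest vertex index referenced */
    unsigned end;          /* largest vertex index referenced */
    size_t   count;        /* number of indices to draw */
    size_t   byte_offset;  /* offset into the bound element buffer */
} RangeDraw;

/* Scan indices[first .. first+count-1] for their min/max and compute the offset. */
RangeDraw make_range_draw(const unsigned *indices, size_t first, size_t count)
{
    RangeDraw d;
    assert(count > 0);
    d.start = d.end = indices[first];
    for (size_t i = first + 1; i < first + count; ++i) {
        if (indices[i] < d.start) d.start = indices[i];
        if (indices[i] > d.end)   d.end   = indices[i];
    }
    d.count = count;
    d.byte_offset = first * sizeof(unsigned);   /* 4-byte GL_UNSIGNED_INT indices */
    return d;
}
```

With the element buffer bound, each call would then be `glDrawRangeElements(GL_TRIANGLES, d.start, d.end, (GLsizei)d.count, GL_UNSIGNED_INT, (const void *)d.byte_offset);`, where the last argument is interpreted as a byte offset into that buffer.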

CatDog

blackwind
10-01-2008, 07:46 AM
Quoting Dark Photon:
"For instance, some past numbers I captured:
* Client arrays: 16.7ms draw
* Server arrays (VBOs): 11.4ms draw (31% faster)
* Geometry-only display lists: 7.4ms draw (56% faster)"

Is that for static geometry or dynamic (say, an animated character)?

knackered
10-01-2008, 09:53 AM
Are you using VBOs in the display lists?

CatDog
10-01-2008, 10:12 AM
Me? No. Just the plain client array to compile the display list.

(blackwind, it's all static geometry.)

CatDog

Dark Photon
10-01-2008, 07:01 PM
Quoting blackwind:
"For instance, some past numbers I captured:
* Client arrays: 16.7ms draw
* Server arrays (VBOs): 11.4ms draw (31% faster)
* Geometry-only display lists: 7.4ms draw (56% faster)

is that for static geometry or dynamic (say, an animated character)?"
Static vertex data. Display list compilation time largely rules them out when vertex attributes are dynamic.

Note, however, that this is not the same distinction as static vs. animated characters.

khamoon
10-02-2008, 03:55 AM
Display lists will be removed from OpenGL in the near future, so it's probably not a good idea to start using them now, especially since they could be really hard to remove from the code later on, hard to implement in your OpenGL wrapper, etc. The only situation I have seen where geometry display lists were slightly faster than VBOs is drawing large numbers of really small objects, but that should now be implemented with instancing or pseudo-instancing. And I can confirm that display lists work nicely only on NVidia drivers.

V-man
10-02-2008, 06:08 AM
Quoting khamoon: "Display lists will be gone, removed from OpenGL in some near future"

I didn't see that in the deprecation list.
Even if they were removed, they might add geometry-only DLs.

khamoon
10-02-2008, 06:18 AM
They are clearly deprecated. In the deprecation section the following can be found (page 407 in the basic OpenGL 3.0 spec):
"Display lists - NewList, EndList, CallList, CallLists, ListBase, GenLists, IsList, and DeleteLists (section 5.4); all references to display lists and behavior when compiling commands into display lists elsewhere in the specification; and all associated state."

CatDog
10-02-2008, 06:42 AM
Michael Gold stated that deprecated features may remain available as "core extensions"; vendors are free to keep supporting them if users demand it.

But since I cannot confirm the advantage of display lists on nVidia (at least not on my hardware with my huge meshes), I don't care if they go away. If they were the fastest method, I'd use them.

CatDog

Xmas
10-02-2008, 08:22 AM
The subject line is a bit odd, since display lists and the way vertex data is passed are orthogonal concepts. In principle, display lists still make sense when using the most efficient way to pass vertex data; however, their particular implementation/specification in OpenGL has several shortcomings. I'm hoping for a better way to express state vectors in the future.

tamlin
10-05-2008, 10:55 PM
For batches of many small objects (geometry only, which is quite useful when doing shadows :) ), I think a DL, in the compile-once, use-many case, could always be better than plain VBOs, since the driver knows its hardware and could create a single optimized command buffer for it.

This got me thinking... could it be useful to expose a "special" mode/state/path for (opaque) geometry-only rendering, e.g. depth-buffer rendering, occlusion, and other things I haven't considered?