Display List question

Hello all,

Just a simple question: I’m reading the Red Book, 7th edition, and I’m in the chapter on display lists. At the beginning of the chapter, the author clearly states that display lists are deprecated as of OpenGL 3.1.

However, reading the first few pages, there are a lot of comments about the performance gains you can get by using them. What is the reason for deprecating them? Is there something better replacing them?

Thanks in advance! Any help will be appreciated!

Mick

Every OpenGL call changes the state of the GL context. Some calls require “validation” that is relatively expensive, to make sure everything is set up correctly. So if you had, say, 20 calls, OpenGL would go through them and compute the resulting state. Obviously, if you issue the same 20 calls (or 1000, or whatever) over and over, it’s faster for OpenGL to cache the result once.
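To make that concrete, here’s roughly what that pattern looks like (a minimal fixed-function sketch; the triangle and function names are just placeholders):

```c
#include <GL/gl.h>

static GLuint scene_list;

/* Record the calls once; the driver validates and caches them here. */
void build_scene_list(void)
{
    scene_list = glGenLists(1);
    glNewList(scene_list, GL_COMPILE);   /* record, don't execute yet */
    glEnable(GL_LIGHTING);
    glEnable(GL_LIGHT0);
    glBegin(GL_TRIANGLES);
    glNormal3f(0.0f, 0.0f, 1.0f);
    glVertex3f(-1.0f, -1.0f, 0.0f);
    glVertex3f( 1.0f, -1.0f, 0.0f);
    glVertex3f( 0.0f,  1.0f, 0.0f);
    glEnd();
    glEndList();
}

/* Replay the pre-validated commands every frame. */
void draw_frame(void)
{
    glCallList(scene_list);
}
```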

Now that OpenGL is moving away from fixed-function programming, there are few relevant calls left to cache (turning on lights and such), and more goes on in the shader. As such, caching these calls is less and less relevant. In only a few lines you can set up your shaders and send the geometry down in a VBO (Vertex Buffer Object), which is now the preferred way to do it. As far as I know, that’s why display lists now provide negligible gains and were deprecated.
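Something like this is all the VBO path takes (a minimal sketch; it assumes an extension loader such as GLEW for the buffer entry points and a shader program already bound, and the names are purely illustrative):

```c
#include <GL/gl.h>

static GLuint vbo;

/* Upload the geometry once. */
void upload_geometry(const GLfloat *verts, GLsizeiptr bytes)
{
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* GL_STATIC_DRAW: hint that we write once and draw many times. */
    glBufferData(GL_ARRAY_BUFFER, bytes, verts, GL_STATIC_DRAW);
}

/* Draw it every frame. */
void draw_frame(GLsizei vertex_count)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableVertexAttribArray(0);
    /* Attribute 0: 3 floats per vertex, tightly packed, offset 0. */
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (const void *)0);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
}
```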

Just a short comment…
The speed gains are not negligible on NVIDIA’s hardware (and drivers), but the optimization of DLs is not easy. VBOs are a much “cleaner” mechanism for driver developers to implement and optimize. Furthermore, with the deprecation of immediate mode, DLs become less significant.

And display lists only shine for static geometry, due to their long compile time.
VBOs are much better for dynamic or “semi-static” stuff, like multipass rendering on dynamic meshes.

Thanks for these explanations. But I still don’t understand why removing them is a good idea. Wouldn’t using DLs in some situations and VBOs in others be a perfect fit?

Edit: Oops, for some reason I did not see strattonbrazil’s post. I think this all makes more sense now. Thanks all!

Another reason is that DLs are somewhat hard to optimize, and some vendors felt that deprecating them would make their lives easier.
That is not the case for Nvidia, which has (one of?) the best implementations out there.

A more complete answer for you:

THE POLITICS

Deprecating them came from the pipe dream that if we just nuked a bunch of older functionality in OpenGL, OpenGL drivers from multiple/all vendors would miraculously be much higher quality and easier to write/maintain, and that all maintainers of shipping OpenGL applications would happily spend days/weeks/months porting their applications up to the new “leaner” OpenGL to stay current with the latest features, with no profit to be had for that effort (i.e. eat the cost out of profit, just for the heck of it).

Reality set in, and the ISVs/vendors flat said that plan was ridiculous, so it was canned. Thus the existence of the COMPATIBILITY profiles which don’t deprecate anything. Display lists are still there.

THE TECHNICAL

Display list perf on NVidia far exceeds anything that’s been possible via VBOs for many batch scenarios for a long time (many years). It wasn’t until Bindless Graphics came along (specifically NV_vertex_buffer_unified_memory) that you could make ordinary VBO performance even come close to Display List performance in your app in most cases.

However, the Bindless extensions are currently NVidia-only extensions, so you can’t get this on ATI or Intel. Though I’m hoping bindless batches at least will be pushed up to EXT/ARB soon, as only with one of these extensions does deprecation of display lists make any kind of intelligent sense (and I’m referring to the common case of geometry-only display lists here).

SUMMARY

So to answer your confusion, yes, from a performance standpoint, it made absolutely no sense deprecating display lists when on some vendors they were by far the fastest way in general to submit static geometry to the GPU for rendering, and there was no way to even come close to touching that with other methods in general. And on NVidia, for static geometry, without using vendor-specific extensions, they still are the fastest way to submit static batches to the card. For static geometry, just use them and enjoy the extra frame time to do other things!

So if you’re targeting a COMPATIBILITY profile, Display Lists are still there. Try them, use them, and see what you think. VBOs are more forward-looking, but to even touch Display List perf in some circumstances you’ll need bindless.
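For reference, here’s one way to ask for such a context (a sketch using GLFW’s window hints; the raw WGL/GLX ARB_create_context_profile attributes work the same way):

```c
#include <GLFW/glfw3.h>

int main(void)
{
    if (!glfwInit())
        return 1;

    /* Request a 3.2 COMPATIBILITY context: deprecated features,
       display lists included, remain available. */
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_COMPAT_PROFILE);

    GLFWwindow *win = glfwCreateWindow(640, 480, "compat", NULL, NULL);
    if (!win) { glfwTerminate(); return 1; }
    glfwMakeContextCurrent(win);

    /* ... glNewList/glCallList etc. are legal in this context ... */

    glfwTerminate();
    return 0;
}
```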

But I still don’t understand why removing them is a good idea. Wouldn’t using DLs in some situations and VBOs in others be a perfect fit?

Display lists on ATI hardware don’t provide performance boosts over VBOs. Now, there are two possible reasons for this. Either ATI’s VBO implementation is optimal for their hardware, or ATI simply implements display lists on top of VBOs (or the equivalent back-end). Either way, display lists are no guarantee of acceptable performance.

The technical reason to remove DLs (among other things) is so that there will be a single rendering path for submitting geometry to OpenGL. This means that IHVs can optimize a single basic rendering flow. It means that users know what the optimal path is and don’t have to guess or put in vendor-specific code. And if that path is not optimal for API reasons, then ARB members can’t just gloss over it; they have to recognize the problem and correct it in the API.

Reality set in, and the ISVs/vendors flat said that plan was ridiculous, so it was canned. Thus the existence of the COMPATIBILITY profiles which don’t deprecate anything. Display lists are still there.

By “Reality” of course, you mean that Mark Kilgard said, “this won’t help.” And rather than waiting for the validity of this to actually be tested, ARB_compatibility was shoved through the ARB in 3.1. And then the final sabotaging of removing functionality was complete with the compatibility profile in 3.2.

And I would remind you that Longs Peak, the total API rewrite that would have gone much farther than simple deprecation, was started as a joint ATI/NVIDIA venture.

Also, the use of the term “deprecated” is all wrong. “Deprecation” means that functionality is still available, but it is recommended that it not be used because it is intended to be removed in future versions. In 3.0, display lists are deprecated. In 3.1, display lists are removed, and they remain removed in the core profile of every higher OpenGL version.

Display list perf on NVidia far exceeds anything that’s been possible via VBOs for many batch scenarios for a long time (many years).

Of course, on the ATI side, display lists have never been the preferred path. So if you want a completely inflexible method of vertex submission that will only buy you performance on certain cards (and considering NVIDIA’s recent fumblings, I wouldn’t keep my hopes up for them gaining marketshare in the short-term), then by all means, use display lists.

Also, “far exceeds” is a bit of an exaggeration. NVIDIA display lists are better, but it’s not that huge of a performance difference.

Sorry to burst your bubble, but no, it’s far from an exaggeration.

2X faster or more (i.e. takes “half” the draw time) is pretty huge from any perspective. Not just I but others on this forum have seen this. That is, display lists are sometimes up to (and beyond) 2X faster than standard static VBO batches on NVidia.

Testing and intuition suggest that a lot of this benefit may come from NVidia using GPU addresses instead of VBO handles in their display list implementation (i.e. getting the “bindless batches” optimization – that is, NV_vertex_buffer_unified_memory – inside their display lists). This cool optimization can be hidden quite transparently behind the GL display list abstraction. It’s evidenced by VBOs+bindless coming very, very close to NVidia’s display list performance (for geometry-only display lists) in tests I’ve run recently. However, let me clarify: I’m guessing here. I don’t work for NVidia and have never chatted driver internals with any of their guys.
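For anyone curious, here’s a rough sketch of the NV_vertex_buffer_unified_memory usage I mean (illustrative only, assuming an extension loader provides the NV entry points; check the extension spec before relying on it):

```c
#include <GL/gl.h>

static GLuint64EXT vbo_addr;   /* raw GPU address of the buffer */
static GLsizeiptr  vbo_size;

/* One-time setup: pin the buffer in GPU memory and grab its address. */
void make_vbo_resident(GLuint vbo, GLsizeiptr bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
    glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV,
                                &vbo_addr);
    vbo_size = bytes;
}

/* Per draw: feed the GPU address directly, no VBO handle bind. */
void draw_bindless(GLsizei vertex_count)
{
    glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
    glEnableVertexAttribArray(0);
    /* Attribute 0: 3 floats per vertex, tightly packed. */
    glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE, 0);
    /* Point attribute 0 at the raw address instead of a bound handle. */
    glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0,
                           vbo_addr, vbo_size);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
}
```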

If this assertion is true, then ATI display list perf being no greater than their straight VBO-handle batch perf suggests they may not have implemented this “GPU address” optimization in their GL display list implementation, OR (long shot) cannot benefit from this speed-up due to something about their GPU architecture.

As much as you may wish that batches with VBO handles were and are the fastest (or at least close to the fastest) batch submission method to come along, wishing doesn’t make it so. They’re still not beating display lists, or even coming close, in some scenarios.

Did you actually even test your assertion, or are you just preaching from the spec as if it were divinely-inspired gospel?

But to spin this into a positive note, let us hope that a few vendors or the ARB pushes bindless batches up to EXT/ARB status. With this, VBOs will finally be a sensible performance choice over display lists, on any vendor’s GPUs, for any batch sizes (where currently, without using vendor-specific extensions, they are not!).

2X faster or more (i.e. takes “half” the draw time) is pretty huge from any perspective.

It’s a semantic question. If 2x is “huge”, what do you call 10x? If 2x “far exceeds”, what would 7x be?

If you use large superlatives for a “mere” doubling, that leaves little room for anything else.

If this assertion is true, then ATI display list perf being no greater than their straight VBO-handle batch perf suggests they may not have implemented this “GPU address” optimization in their GL display list implementation.

Or as I suggested before, maybe that’s simply as fast as they can go. It may simply be that VBO speed is as fast as ATI hardware can go. In which case, it is NVIDIA’s drivers that need fixing, not the API.

But to spin this into a positive note, let us hope that a few vendors or the ARB pushes bindless batches up to EXT/ARB status.

That’s not a positive note. A positive note would be finding an appropriate solution to the problem, not merely grabbing the first thing that comes along. If there is in fact an intrinsic problem with buffer object usage.

Ha. That’s the lamest post I’ve read in a while. If 2X speed-up isn’t huge to you, you’re tenured in academia.

If you want to stop putting your foot in your mouth, start citing specific numbers or percentages for the measurements you have actually done, rather than throwing around generalities based on “wishful thinking”.

Ok… so I’m getting that not everyone was happy with the deprecation (or removal) of display lists. I suppose some decisions had to be made and not everyone agreed.

Anyway, thanks again.

Pretty much. But we have the COMPATIBILITY profile. So in practice the deprecation doesn’t matter much…

…except on GPU vendors that have said they’ll only produce GL 3.1+ CORE-profile-only drivers – i.e. no COMPATIBILITY profile or GL 2.x support. However, I don’t think there are any examples of this yet.

Still, it would be nice to deprecate display lists with a general consensus, once they are no longer the perf leader. With VBOs, you have complete control over GPU memory allocation and consumption (an advantage), but with display lists it’s a “black box”. And display lists take a long time to compile, which makes them not-so-ideal for run-time paged (or generated) geometry, whereas VBOs (or client arrays in a pinch) are ideal for this. So once VBO perf is as good as display list perf, VBOs will be the better choice in all areas.
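For instance, per-frame streaming is trivial with VBOs (a sketch; the “orphaning” trick shown is just one common approach to avoid stalling on the GPU):

```c
#include <GL/gl.h>

/* Replace the buffer's contents each frame.  Passing NULL to
   glBufferData first "orphans" the old storage, so the driver can
   hand back fresh memory instead of waiting for the GPU to finish
   reading the previous frame's data. */
void stream_vertices(GLuint vbo, const void *verts, GLsizeiptr bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, bytes, NULL, GL_STREAM_DRAW); /* orphan */
    glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, verts);
}
```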