Display list extension

I’ve been thinking about the way renderers currently handle transparency and complex blending. Usually, the app is forced to do the following:

  1. Draw opaque objects
  2. Sort transparent objects based on depth (requires an extra camera transform)
  3. Draw transparent objects

All of this is done completely on the CPU. The most intensive part is getting the depth of each triangle for sorting.
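
As a point of reference, the CPU-side step 2 might look something like this. This is just a sketch; the struct layout and helper names are assumptions for the example, not from any real engine:

```c
#include <stdlib.h>

/* Sketch of CPU-side transparency sorting: compute each object's
 * camera-space depth, then qsort back-to-front. */
typedef struct {
    float center[3]; /* object-space center */
    float view_z;    /* camera-space depth, filled in by sort_transparent */
} Object;

/* view is a 4x4 column-major matrix, as OpenGL stores them */
static float camera_space_z(const float view[16], const float p[3])
{
    return view[2] * p[0] + view[6] * p[1] + view[10] * p[2] + view[14];
}

static int farthest_first(const void *a, const void *b)
{
    const Object *oa = a, *ob = b;
    /* in OpenGL camera space, more negative Z is farther away */
    if (oa->view_z < ob->view_z) return -1;
    if (oa->view_z > ob->view_z) return 1;
    return 0;
}

void sort_transparent(Object *objs, size_t n, const float view[16])
{
    for (size_t i = 0; i < n; ++i)
        objs[i].view_z = camera_space_z(view, objs[i].center);
    qsort(objs, n, sizeof *objs, farthest_first);
}
```

The per-object transform in that loop is the "extra camera transform" cost mentioned in step 2.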

OK, so why not put sorting into the hardware? This can be done without major changes to OpenGL by extending display lists. Example:

//Let’s draw some transparent objects.
//First, set the sorting func:
glSortFuncEXT(GL_FARTHEST_TO_NEAREST_EXT);
//Now, put all the primitives we want to sort and draw into a list:
glNewList(list, GL_COMPILE_AND_SORT_EXT);
… //put blending changes, texture changes, material changes, and all other drawing stuff
glEndList();
//Then, just use glCallList(list); normally whenever you need to render.
//At this point, the video card (or driver) would transform everything to camera space and sort based on the sorting function.

Of course, there could be lots of different sorting functions…

Anyway, has this been considered before? I think this would be an amazingly useful feature.

-Shard

[This message has been edited by Shard (edited 07-16-2002).]

Sorting of primitives introduces several problems. One that has no solution is three or more mutually overlapping primitives. That is, A overlaps B, B overlaps C, and C overlaps A. You can split one of the primitives, but then you generate other problems, for example precision problems in multipass algorithms. Also, I think the spec says you are not allowed to generate extra geometry. For example, in two-sided lighting, an implementation can’t draw the two different sides in two passes, since that introduces the generation of extra geometry (one triangle passed, two generated by the implementation). But as I said, I’m not sure, and I can’t find it in the spec. I believe the same would apply to sortable display lists if it’s true.

Another problem will be nested display lists. It may not be enough to sort only the individual display lists.

Yet another problem is that the driver has less knowledge about the data being sorted, meaning it has to be very general. For example, if you don’t move the viewpoint much from frame to frame, you can take advantage of that. You don’t have to do a complete resort. Since the final draw order is almost identical between two consecutive frames, you can choose a sorting algorithm that is faster for almost-sorted lists. An unaware implementation can’t make this decision, but you can.
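
To illustrate that last point: insertion sort is O(n²) in general but runs in near-linear time on almost-sorted input, which is exactly the frame-to-frame case described above. A minimal sketch:

```c
#include <stddef.h>

/* Re-sorts last frame's depth order in place. When the camera has
 * moved only slightly, the array is almost sorted already, so the
 * inner loop rarely runs and the cost is close to O(n). */
void resort_depths(float *depth, size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        float key = depth[i];
        size_t j = i;
        while (j > 0 && depth[j - 1] > key) {
            depth[j] = depth[j - 1];
            --j;
        }
        depth[j] = key;
    }
}
```

An application can pick this algorithm because it knows its camera barely moved; a general-purpose driver cannot.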

What you need to perform an efficient sort is knowledge about the scene, both in space and in time. Since OpenGL is an immediate-mode API, it doesn’t know about these things. That requires a scene-graph-like API.

Also, just because what you suggested is implemented as an extension, or into the core, doesn’t mean it will be done by the graphics hardware. Sorting is not an easy process, especially if you intend to keep it general enough to handle most possible cases without special knowledge about them. And to cut the price of the hardware (read: save silicon), you will probably see this feature done in software anyway.

Let me try to address that point by point:

Problem: three or more mutually overlapping primitives.

All right, so make an adjustable metric as well. For example:
glSortMetricEXT(metric); //one of GL_FAR, GL_MIDDLE, GL_NEAR
glSortBiasEXT(bias); //a value in [0.0, 1.0]

Then, the implementation has specific instructions about the comparison. Yes, if the primitives are overlapping in Z, then you have problems. But how often does this really happen? More importantly, how often do you need an exact solution for it?
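
To make the metric concrete, here is a sketch of what the selection could mean per triangle. The enum and function names just mirror the proposed tokens; none of this is a real API:

```c
/* Hypothetical sort-metric semantics: which camera-space Z stands in
 * for a whole triangle when primitives are compared. */
typedef enum { SORT_FAR, SORT_MIDDLE, SORT_NEAR } SortMetric;

/* z holds the triangle's three camera-space Z values
 * (more negative = farther from the camera). */
float sort_key(const float z[3], SortMetric metric)
{
    float zmin = z[0], zmax = z[0];
    for (int i = 1; i < 3; ++i) {
        if (z[i] < zmin) zmin = z[i];
        if (z[i] > zmax) zmax = z[i];
    }
    switch (metric) {
    case SORT_FAR:    return zmin;                        /* farthest vertex */
    case SORT_NEAR:   return zmax;                        /* nearest vertex */
    case SORT_MIDDLE: return (z[0] + z[1] + z[2]) / 3.0f; /* centroid */
    }
    return zmin;
}
```

Triangles whose keys tie under one metric might still be ordered sensibly under another, which is the point of making it adjustable.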

Problem: nested display lists.

OK, first off, the extension may need to disallow nested sorting lists. (You can’t have a GL_COMPILE_AND_SORT_EXT list within another GL_COMPILE_AND_SORT_EXT list.)
Second, handle display lists within the sort like any other geometry. I mean, they ARE eventually converted to primitive drawing operations, right? So sort them as well. Or, if you really want configurability, make a flag:
glEnable(GL_SORT_NESTED_EXT);

Problem: OpenGL doesn’t know about the scene in time so sorting is very inefficient.

Agreed, yes, the algorithm has to be general. But I don’t know how much of a speed hit this will be. Most games I know don’t have THAT many transparent objects to sort in the first place. This is more of a question for hardware vendors.

However, there is a way around this. Since this sorting is accomplished with a display list, remember that each display list has an associated number. The implementation can have the optimizations you mentioned by storing pre-calculated stuff using the display list number. Does that make sense? Here:

int MyListNumber=1;

//init:
glNewList(MyListNumber,GL_COMPILE_AND_SORT_EXT); //So, this is a list of STATIC transparent objects
… //draw stuff
glEndList();

//rendering:
… //draw opaque objects first
//then, transparent objects:
glCallList(MyListNumber);

On the first call to glCallList, OpenGL will sort the list. On the next frame, the list will already be sorted, and GL can use that to its advantage. Nifty, eh? Display lists are quite appropriate for this extension.

Problem: won’t be done by the graphics hardware. Sorting is not an easy process, especially if you intend to keep it general enough to handle most possible cases without special knowledge about it.

Fine, but remember that the graphics driver has access to stuff we don’t. I’m guessing even if this is done in software mode, it will probably be faster than doing it manually. Once again, this is a question for the vendors.

Thanks for the lengthy reply.
Any more thoughts?

-Shard

Yes, if the primitives are overlapping in Z, then you have problems. But how often does this really happen? More importantly, how often do you need an exact solution for it?

If you have 3 primitives overlapping each other, such that each one could be sorted in front of one and behind the other, then there is no solution, per-polygon. And yes, if it happens, you need an exact solution. Otherwise, you would see a pretty nasty artifact on-screen.

Fine, but remember that the graphics driver has access to stuff we don’t. I’m guessing even if this is done in software mode, it will probably be faster than doing it manually.

Based on what do you presume that the driver could perform this operation? To do it, the driver would need access to post-T&L data, which none do, save tile-based renderers that already sort per-pixel. It would have to destroy stripping information, thus slowing down rendering. Lastly, the performance of such things is unknown, as state changes could happen very frequently, due to constantly drawing other primitives.

Your assumption in all of this is, basically, that the hardware could do it if it wanted to. The point is, only a tile-based renderer is set up to do any kind of sorting at all.

Lastly, if, “Most games I know don’t have THAT many transparent objects to be sorted in the first place,” then what’s the point? If it isn’t really an issue, because transparency isn’t an issue most of the time, why should any hardware vendor bother to add this? After all, sorting per-object isn’t really that hard, nor is it particularly time-consuming on the CPU.

If you have 3 primitives overlapping each other, such that each one could be sorted in front of one and behind the other, then there is no solution, per-polygon. And yes, if it happens, you need an exact solution. Otherwise, you would see a pretty nasty artifact on-screen.

Yeah, yeah, but once again, how OFTEN does this really happen?

Based on what do you presume that the driver could perform this operation? To do it, the driver would need access to post-T&L data, which none do, save tile-based renderers that already sort per-pixel. It would have to destroy stripping information, thus slowing down rendering. Lastly, the performance of such things is unknown, as state changes could happen very frequently, due to constantly drawing other primitives.

Hmm…that’s a good point. Not sure about that…

if, “Most games I know don’t have THAT many transparent objects to be sorted in the first place,” then what’s the point? If it isn’t really an issue, because transparency isn’t an issue most of the time, why should any hardware vendor bother to add this? After all, sorting per-object isn’t really that hard, nor is it particularly time-consuming on the CPU.

Well, I don’t know. It IS a speed issue in my engine. It takes me far more time to process the scenegraph with transparent objects. Without going into specifics, if I disable sorting, I go from 60 FPS to 90 FPS (a 33% cut in frame time). Pretty time consuming if you ask me…

-Shard

Why not just glDisable(GL_DEPTH_TEST) when blending? Maybe I have some wild version of the GeForce2, but as far as I can see everything is blended exactly the same way no matter what order it’s drawn in when GL_DEPTH_TEST is disabled…

Maybe I have some wild version of the GeForce2 but as far as I can see everything is blended exactly the same way no matter what order it’s drawn in when GL_DEPTH_TEST is disabled…

One of three things is happening for you:

  1. You aren’t particularly certain what the visual look of “correct” blending is for the scenes you are using.

  2. You just happen to be drawing in the right order.

  3. All of your blending is additive. Non-additive blending is order-dependent.
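
Point 3 is easy to check with a little arithmetic. Here is a sketch of the two blend equations applied to a single color channel over two draws in each order (the helper names are made up for the example):

```c
/* "Over" blending: glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) */
float blend_over(float src, float alpha, float dst)
{
    return src * alpha + dst * (1.0f - alpha);
}

/* Additive blending: glBlendFunc(GL_SRC_ALPHA, GL_ONE) */
float blend_add(float src, float alpha, float dst)
{
    return src * alpha + dst;
}
```

Composing two surfaces with blend_over in opposite orders yields different final colors, while blend_add gives the same result either way, because addition commutes. That is why unsorted blending can look fine as long as everything is additive.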