State changes vs. front-to-back rendering, and other stuff

I posted this some time ago in the beginner’s forum but I got no reply, so I guess it’s an advanced topic.

We know that vendors are claiming two ways to increase performance:
1- Front-to-back rendering. Increases performance through early z-cull, decreasing the fillrate needed.
2- Good state-change management. Increases performance by stalling the pipelines less.

Now, as I see it, those two methods conflict to some degree, so I would like to hear some opinions about that. My feeling is that (2) will be best (if not now, then in the not-so-distant future), but I cannot be sure.

Another thing I would like comments on is the use of a z-only first pass. This way, the overdraw caused by using (2) would be significantly reduced. Has anyone experimented with this?
The nice idea is that by using (2) we may be able to issue occlusion queries (a topic which was posted some time ago) immediately after having rendered the ‘nearest z’, so the queries will hopefully complete before we finish the ‘shade for real’ pass.
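Something along these lines is what I have in mind (only a sketch: Object, drawZOnly and friends are made-up names, and I’m assuming ARB_occlusion_query with the entry points already resolved by an extension loader):

#include <GL/gl.h>
#include <GL/glext.h>
#include <vector>

// Hypothetical per-object interface; nothing here comes from a real engine.
struct Object {
    GLuint query;                    // created earlier with glGenQueriesARB
    void drawZOnly() const;          // positions only, cheapest possible path
    void drawBoundingBox() const;    // cheap proxy geometry for the query
    void drawShaded() const;         // the full 'shade for real' material
};

void renderFrame(std::vector<Object> &scene)
{
    // Pass 1: lay down the nearest z, no color writes.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    for (size_t i = 0; i < scene.size(); ++i)
        scene[i].drawZOnly();

    // Issue the occlusion queries right away, against the z we just wrote,
    // so they can complete while we start shading.
    glDepthMask(GL_FALSE);           // don't let the boxes pollute the z-buffer
    for (size_t i = 0; i < scene.size(); ++i) {
        glBeginQueryARB(GL_SAMPLES_PASSED_ARB, scene[i].query);
        scene[i].drawBoundingBox();
        glEndQueryARB(GL_SAMPLES_PASSED_ARB);
    }

    // Pass 2: 'shade for real', only where the depth matches what we laid down.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthFunc(GL_EQUAL);
    for (size_t i = 0; i < scene.size(); ++i) {
        GLuint samples = 0;
        // Hopefully the early queries are done by now; this stalls otherwise.
        glGetQueryObjectuivARB(scene[i].query, GL_QUERY_RESULT_ARB, &samples);
        if (samples > 0)
            scene[i].drawShaded();
    }
}

The hope is that, since the queries are issued in the same order we shade in, each result is already available by the time we ask for it, so the glGetQueryObjectuivARB call rarely stalls.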

BTW, with (2) we also get a faster application stage (state sorting can be done once, while front-to-back sorting cannot), which, however, is not so important considering most CPUs are quite fast right now (memory bandwidth is not, though, and that may be a plus).

I hope I have explained myself well enough. What is your opinion? As you may have noticed, I have mixed feelings about this.
Thank you in advance!

[This message has been edited by Obli (edited 05-12-2003).]

It depends on the number of state changes in your scene.
If you have lots of objects, each using a ‘material’ (material = group of state changes), and many objects share the same material, then you should sort by material, and within each material group sort by distance from the viewer. You could then order the material list from least expensive to most expensive, so the geometry with the cheapest materials fills the z-buffer first, letting early z-cull reject fragments for the more expensive materials.
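Roughly like this (just a sketch; Item and its fields are placeholder names for whatever your render queue stores):

#include <algorithm>
#include <vector>

// One entry per draw call; all fields are placeholders for the sketch.
struct Item {
    int   materialId;    // index into a material table
    float materialCost;  // rough estimate of the material's per-fragment cost
    float viewerDist;    // distance from the viewer (e.g. to the object's center)
};

// Sort key: cheap materials first (they prime the z-buffer for the expensive
// ones), equal materials kept together, front-to-back within each material.
static bool drawOrder(const Item &a, const Item &b)
{
    if (a.materialCost != b.materialCost) return a.materialCost < b.materialCost;
    if (a.materialId   != b.materialId)   return a.materialId   < b.materialId;
    return a.viewerDist < b.viewerDist;
}

void sortForDrawing(std::vector<Item> &items)
{
    std::sort(items.begin(), items.end(), drawOrder);
}

Then walk the sorted list, binding a material only when materialId changes.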

All right, I have already thought for some time about taking the best of both worlds; however, I would like to point out some things which were somewhat “implicit” in my post above. I wouldn’t have spelled them out, but since you have made a good point, I need to describe my position better.

The first pass is z-only. From that point of view, most ‘materials’ are the same. Leaving out the rare fragment programs which do depth replace (and which can still be handled by the algorithm anyway), we should be able to get the z by just running the vertex programs.
There will be only one material for “position-invariant” VPs and as many materials as there are “position-variant” VPs. In other words, all position-invariant VPs are considered the same, while every position-variant VP makes a material of its own.
In a real-world scenario, I guess most scenes will have only one material (position invariant), and even in the worst case there should be no more than 10 materials in the scene (if it goes higher, state management becomes a bit more complicated, but nothing really problematic).
From this point of view, what you suggest may give nice speedups (and basically, front-to-back rendering will do most of the job), but please keep reading…
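Just to show what I mean by the grouping, here is roughly how I would classify vertex programs for the z-only pass (a naive sketch built around the ARB_position_invariant program option; a real check would parse the program string properly):

#include <cstring>

// Programs declaring "OPTION ARB_position_invariant;" output the same position
// as the fixed pipe, so they can all share one z-pass 'material'; every other
// vertex program becomes a z-pass material of its own.
bool isPositionInvariant(const char *programString)
{
    // Naive substring search; comments inside the program are not skipped.
    return std::strstr(programString, "ARB_position_invariant") != 0;
}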

  1. Consider the case in which, as you said, we are ALSO sorting. In this case we get very good efficiency. Do you think this is good enough to justify the two-pass method? Comments?

  2. The case in which we are NOT sorting.
    Since the pixel pipe is filling only z, it should be quite fast (I guess at least twice as fast as ‘standard’ rendering), so I am not sure sorting the z-only pass front to back gives a serious advantage.
    This is the point I would most like to get ideas about.
    The point is that if the pass is already really fast, making it go faster won’t save much, since other operations will be the bottleneck. Consider that filling a z-only fragment will probably take half the time a standard pixel takes (considering bandwidth only), so being careful would help, but throwing a bunch of z-only fragments in the trash may not hurt performance too much (especially if the real rendering is going to be fragment-program limited).

Another point, just so you understand why I am focusing on the unsorted z-first case (please don’t weigh it too heavily; I mention it only so you can see why I thought of this algorithm), is that the data structure I am going to use very probably does not provide an easy way to sort polygons the way a BSP does, so sorting may become difficult. Again, skipping sorting altogether may give a faster application stage (at least in my case, where sorting may take a while) while slightly slowing down a stage which, in my opinion, is probably not the bottleneck.
Another reason I wanted to hear opinions is that someone told me it would be a good idea to consider a z-only first pass. Maybe it was in an NV whitepaper.

I hope the motivations are clear enough now.

EDIT: added a note about z-only fill and overall pipe performance, from app to fb.

[This message has been edited by Obli (edited 05-12-2003).]

If the first pass is Z only, then there are no state changes. You could then sort your objects ROUGHLY from near to far. If you’re drawing terrain blocks, you might want to generate 4 triangle lists, and choose the one which is sorted the most correctly. Then if you render meshes, sort each major mesh near to far, using center-of-mesh. That ought to get you close enough to ideal to not warrant further improvement.
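Roughly like this for the mesh case (only a sketch; Mesh, center and the eye/viewDir inputs are placeholders):

#include <algorithm>
#include <utility>
#include <vector>

// Placeholder mesh type; only the center matters for the rough sort.
struct Mesh { float center[3]; /* geometry, material, ... */ };

static bool nearestFirst(const std::pair<float, Mesh*> &a,
                         const std::pair<float, Mesh*> &b)
{
    return a.first < b.first;
}

// Rough near-to-far ordering for the z-only pass: sort by center-of-mesh
// projected onto the (normalized) view direction.
void sortRoughFrontToBack(std::vector<Mesh*> &meshes,
                          const float eye[3], const float viewDir[3])
{
    std::vector< std::pair<float, Mesh*> > keyed;
    keyed.reserve(meshes.size());
    for (size_t i = 0; i < meshes.size(); ++i) {
        float d = 0.0f;
        for (int k = 0; k < 3; ++k)
            d += (meshes[i]->center[k] - eye[k]) * viewDir[k];
        keyed.push_back(std::make_pair(d, meshes[i]));
    }
    std::sort(keyed.begin(), keyed.end(), nearestFirst);
    for (size_t i = 0; i < meshes.size(); ++i)
        meshes[i] = keyed[i].second;
}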

Successive passes could then use GL_EQUAL depth testing, and sort by state.

This might give you the best of both worlds!

Anyway, the way to get good performance is to define a performance target, set up good test rigs, and make sure you never fail to meet your target as you add functionality. When you need more functionality, figure out where the bottleneck is, figure out how to remove it, and only then add the additional feature.

Yes, maybe it’s a bit too early to think about that. Thank you for your ideas about sorting; they will turn out really useful (I had something different in mind).

However, I still don’t understand why in z-only mode there would be no state changes. Right, most of the time there won’t be, but I still think I need to change state when switching VPs (position-invariant is an exception, since all position-invariant VPs can be considered the same).
I will come back to the question when I have something up and running, following your suggestion.

Be careful - if you’re rendering any polys whose textures have texels that modulate to zero alpha and you have the alpha test on, your z pass will render them as solid, but in fact they’re supposed to have holes in them.
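In other words, those surfaces have to keep their texture bound and the alpha test enabled even in the z-only pass. A sketch, with grateTexture and drawMaskedGeometry as placeholders:

#include <GL/gl.h>

// Z-only pass for an alpha-tested ('masked') surface: the alpha test must stay
// on so the killed fragments never reach the depth buffer, leaving the holes
// where they belong.
void zPassMasked(GLuint grateTexture, void (*drawMaskedGeometry)())
{
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);  // depth only
    glDepthMask(GL_TRUE);

    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, grateTexture);
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GREATER, 0.5f);   // same reference as in the color pass

    drawMaskedGeometry();

    glDisable(GL_ALPHA_TEST);
    glDisable(GL_TEXTURE_2D);
}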

Sure, and I think it’s even a bit more complicated, since now VPs and FPs may also use alpha in a non-trivial way; however, this is not really a problem to fix.

Thank you!

Originally posted by Obli:
however, this is not really a problem to fix.
Thank you!

How would you fix this without state changes?

Of course, I cannot, but I can figure out which VPs affect alpha by looking at the program string. The same applies to FPs which play with alpha or depth. In that case, I just need to group them correctly and then sort, just as jwatte suggested. The idea is now to sort front-to-back within the same render state, then switch state and continue.

Everything that has a chance of being transparent will be considered transparent and won’t be rendered in the z-only pass, just like things are done now in most games.
BTW, doesn’t the alpha test discard the fragment before it’s written to the depth buffer? If so, those surfaces can be considered non-transparent and still work, since the discarded fragments are really never written. This is of course only when there is just an alphaFunc and no blendFunc (usually the case for grates, for example). What you said applies to surfaces which are truly transparent (examples: glass, water, a lightning bolt in the sky).

A thing may be transparent if:
its vertex color has alpha < 1.0
the vertex program applied plays with alpha (not sure, but 99.9% of the time)
its texture has alpha (no need to check whether there is really a texel with alpha < 1.0; we can just assume so)
the fragment program applied plays with alpha in some way

However, all of the things above may still produce a completely opaque polygon; all it takes is to deactivate the blendfunc (see the sketch below).
This is another problem, which is a bit off topic right now.
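To make the grouping concrete, this is the kind of (very rough) test I have in mind; Surface and its fields are made-up names, and the string scans are deliberately naive:

#include <cstring>

// Rough 'may be transparent' test following the list above. A surface that
// never blends is opaque no matter what its alpha does.
struct Surface {
    float       vertexAlphaMin;     // smallest per-vertex alpha
    bool        textureHasAlpha;    // the texture format carries an alpha channel
    const char *vertexProgram;      // ARB_vertex_program string, or 0
    const char *fragmentProgram;    // ARB_fragment_program string, or 0
    bool        blendEnabled;       // is a blendfunc actually active?
};

bool mayBeTransparent(const Surface &s)
{
    if (!s.blendEnabled)
        return false;                   // blendfunc deactivated -> opaque
    if (s.vertexAlphaMin < 1.0f)
        return true;                    // vertex color alpha < 1.0
    if (s.textureHasAlpha)
        return true;                    // just assume some texel is < 1.0
    // 'Plays with alpha' checks: naive substring scans; a real check would
    // parse the programs and look at the write masks on result.color.
    if (s.vertexProgram && std::strstr(s.vertexProgram, "result.color") != 0)
        return true;
    if (s.fragmentProgram && std::strstr(s.fragmentProgram, "result.color") != 0)
        return true;
    return false;
}

Anything for which this returns true is skipped in the z-only pass; everything else (including the alpha-tested surfaces) goes in.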

EDIT: added an idea about the alpha test.
EDIT: added a point to the algorithm (sort within the same render state).

[This message has been edited by Obli (edited 05-14-2003).]
