Draw order for a game renderer

I’m writing a rendering subsystem for my game and would like to know what I should aim for in terms of drawing order. Some points on the game’s rendering:

- The rendering is pretty simplistic: it just uses textured billboard quads (i.e. rotated to always face the viewer) for every creature / item, each of which appears on a polygonal terrain / surface.
- I don’t need z-buffering since I’m using partly-transparent textured quads to give the illusion of form. I use the painter’s algorithm instead.
- The game is for fairly standard-grade x86/64 desktop systems (standard OpenGL, not ES).

For the render order, so far I have this:

by depth (necessary for an accurate draw order with the painter’s algorithm)
- by program (context changes here must be costly!)
-- by texture within that program (I have heard that texture context changes are costly…)
--- by mesh that uses that texture (glVertexAttribPointer; not horrifically costly?)
---- by discrete entity transform (every entity using this mesh)

Does this order make sense? I have no idea how to go about this decision, except by ordering from what I think is the most expensive context switch (to happen least frequently), down to the least expensive (to happen most frequently). I would love to be able to put program before depth, but obviously that would break the painter’s algorithm! Also I don’t know if “render by texture” is in the most suitable place here. In fact, unless someone shows me a better way, I will probably have just one shader program for all my in-game objects because otherwise it’s going to lead to a LOT of shader program changes. That would lead to:

by program
- by depth
-- by texture
--- by mesh
---- by discrete entity transform

Your second option seems better, a priori.
Texture switches are quite costly too; how many texture switches do you expect? Try to pack textures into atlases or a texture array.

Atlases are a definite. Cheers for the encouragement.

I had a related discussion about the above question elsewhere, centred around using the painter’s algorithm vs using the z-buffer (I was planning on foregoing the latter in favour of the former).

My understanding, now, though, is that the z-buffer does depth ordering correctly EVEN IF we switch shader programs halfway through rendering all our objects. If this is so, I’d like to use the z-buffer… this would also mean I could use approach 2, obviously.

Does this sound about right?

[QUOTE=Nick Wiggill;1241862]Atlases are a definite. Cheers for the encouragement.
[/QUOTE]
Before making this change, try rendering your “world” with one single texture. That will give you an idea of how much you can gain with an atlas. I did that in my voxel-based project and could notice no difference. So the actual gain depends on your situation (as always). And of course the result can be different on different hardware.

Using atlases has disadvantages and requires some effort, so it may be a good idea to do some testing first.

There is the obvious question that should have been asked first: Have you done some performance measurements that show you may have a problem?

If not, don’t worry. It could be much better to implement logic that is easy to understand and follow. To optimize draw order, there are ways of iterating through all the data without drawing, saving what shall be drawn into sorted lists, and then iterating over those lists as a second stage.

In regards to switching textures, I can remember a project I ran on a GeForce 8600M GS mobile GPU - which is obviously not that powerful - where I did something like 130 texture switches per frame and it didn’t make any difference. Really profile that before acting on it!

As always: Start with some order that seems reasonable, run and profile both the CPU and GPU times of at least some coarsely defined sections of your application, find out what’s taking the most time, optimize and repeat the process until you’re either satisfied or can be sure that no further optimization makes sense or is possible. Don’t assume stuff - gather and deal with real performance data.

Re zbuffer or painter’s algorithm :

using partly-transparent textured quads to give the illusion of form

Does that mean you use blending? Or simply alpha-tested shapes?
Because with blending, the z-buffer is mostly unusable.
Without blending, you should absolutely use a z-buffer, and draw sorted by shader then texture!
One could also coarsely sort polygons and draw front-to-back to take advantage of early-z optimisations; useful if you have complex shaders that do not modify the fragment depth.

@Kopelrativ, @thokra: Thanks… To be honest, I was always intending to use a few simplistic textures, so atlases are the natural choice and I’m not too worried about texture context switches.

@ZBuffeR You nailed it. It’s alpha-tested shapes I want, not alpha-blended. So I will definitely be going for z-buffer, sorting by shader then texture. Many thanks!

Re “early-z”… I read this term a lot but am not sure exactly what it is. I looked at this abstract and it says:

“Front to back rendering requires sorting opaque geometry based on distance from the camera and rendering the geometry closest to the camera first. This rendering order will maximize depth test failure without additional D3D API calls.”

…So from what they said and what you said, I understand that it is better to write front/closer polygons’ pixels to the z-buffer earlier, thus allowing fragments that would have been “behind” them to be discarded before any undue processing is done on them… Is this correct? If so, I understand what you mean about the coarse sorting.

Later on, I might like to have billboarded particles that use alpha-blending. How would I go about incorporating these into such a renderer as I’m building here, without depth glitches? Would I need to do manual point-sprite z-sorting? I would want them to be obscured by e.g. terrain, but not be able to obscure terrain themselves (i.e. not affect the z-buffer state).

Early-Z, Z-Cull, call it whatever you want: conceptually it’s nothing more than performing the depth test before running an invocation of the fragment shader for the fragment in question, thus not incurring any fragment processing cost. In this regard, it should be obvious why the preferred rendering order to leverage this optimization is front-to-back for opaque geometry. Current, non-side-effect-free GLSL even permits doing the early depth test explicitly, because simply discarding fragments without making sure other invocations don’t depend on their results might lead to false results.

Anyway, early-Z can really save a lot - if and only if you’ve got expensive fragment shading going on. Otherwise the CPU overhead of sorting all visible elements in your scene may even worsen performance depending on the number of objects and employed sorting algorithm. As I said before: Nothing’s absolute - profile first, then decide what optimization actually made a difference.

@thokra, Thanks, what you said there re the cost of the fragment shading in question is very helpful. As I’ll be using vertex-lit billboards for my characters and items, and fairly simple toon-outlined and either vertex-lit or cel-shaded geometry elsewhere, I’m guessing the fragment shader will not be very expensive as these things go. So as a first approach I will avoid unnecessary sorting and take it from there.

I should correct myself: In GL 4.3 (and I’m pretty certain in 4.2 as well), early fragment tests have to be explicitly enabled to really happen before fragment shader execution. This includes per-fragment depth tests. Otherwise the tests will happen after execution of the fragment shader.

I should correct myself: In GL 4.3 (and I’m pretty certain in 4.2 as well), early fragment tests have to be explicitly enabled to really happen before fragment shader execution.

No. The implementation is allowed to perform early fragment tests, but only so long as it detects that it would get the same answer as from late fragment tests. That is, if it wouldn’t matter when the test happens, then the implementation can freely do the test first.

The explicit early fragment test setting is for those times when you need to force OpenGL to do the test first when it might not otherwise do so. Before image load/store, a fragment shader’s effects were based purely on its outputs. So the OpenGL spec could say that the depth test happens after the fragment shader, but an implementation could optimize it to be early in certain cases where you can’t tell the difference. The only time this mattered before was when you wrote to gl_FragDepth, and the reason that forces the test to be late is obvious.

Once you have arbitrary image and buffer loads/stores, you effectively prevent an implementation from employing the early depth test optimization. At that point, you need an explicit setting, because you need to change observable behavior: you want a depth test to stop the fragment shader from writing things to images/buffers.

So you only need to explicitly specify the early fragment test if you’re doing image loads/stores.
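For reference, the explicit switch being discussed is the GLSL layout qualifier introduced alongside image load/store (GL 4.2 / ARB_shader_image_load_store); a minimal, illustrative fragment shader might look like:

```glsl
#version 420 core
// Force the depth/stencil tests to run before this shader executes,
// so fragments that fail them never reach the imageAtomicAdd below.
layout(early_fragment_tests) in;

layout(binding = 0, r32ui) uniform uimage2D counters;
out vec4 color;

void main() {
    // Side effect: count surviving fragments per pixel.
    imageAtomicAdd(counters, ivec2(gl_FragCoord.xy), 1u);
    color = vec4(1.0);
}
```

Without the qualifier, the stores would be observable even for fragments that later fail the depth test, which is exactly the behavior change described above.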

Is there any mention of it in the spec? Sections 14.9 and 15.2.4 in the GL 4.3 core spec and the small section 4.4.1.3 in the GLSL spec don’t mention anything about that. Those are the places I looked to confirm my argument, because I thought three places would be enough to mention such a detail. :frowning:

Is there any mention of it in the spec?

There doesn’t need to be. The specification defines apparent behavior. If running the depth test before the fragment shader won’t affect anything you see, then it’s OK for the implementation to do so. That’s how implementations get away with guard-band clipping; the spec says that partially-visible triangles need to be clipped into smaller ones, but as long as you can’t tell the difference, it doesn’t matter if the implementation uses a short-cut.

Thus, as long as everything appears as though the depth test came after the fragment shader, the implementation is fine. That’s why it’s turned off when the fragment shader writes to the fragment depth: the appearance still needs to match what the spec says.

So, back to the top post: how important is the order of sorting things if you are not trying to “push it to the limit”? And what is the ideal order, if there is a consensus?

I ask because I have a desktop organization that sees the graphics firstly as instances, so each graphic is like a spreadsheet, so that all of its instances can be rendered simultaneously. Within the graphic the pieces are sorted by material, but it would be kind of annoying to try to cross-reference them by texture instead. I figure a standard graphics card can keep a few textures ready to go, but I do not know. Sorting the graphics weakly by texture overlap would not be a problem.

And then I have per-material shaders, basically, so shaders are associated with textures. So a change of texture might mean a change of programs too. Does any of that sound like something worth wasting time or losing sleep over?