State Management

There seems to be very little info beyond the basic advice about minimising state changes. Is this an area where developers like to protect their secrets? Anyway, I would appreciate any and all comments on the strategy I am planning to implement in my (relatively simple) scenegraph (sg)…

My plan is a stack based state machine. Each node (in reality a selected few) in my sg “pushes” it’s desired states onto the stack. In doing so the current “locked” states will override new ones and “dirty” flags will be set for those states that have actually changed. Only dirty states will actually trigger the relevent OpenGL call. Apart from special conditions, such as at the start of rendering a frame, I have three situations during sg processing where states may be pushed/ popped: Parent-to-Child (PC), Child-to-Child (CC) and Child-to-Parent (CP).

The CC step is more correctly a CPC assuming parent states always override a previous child states (I don’t know if this would be considered “correct” in general sg theory). This I think needs special attention otherwise redundant state changes are possible if a parent sets, say, lighting on as a default for it’s children but C1 sets lighting off and then C2 also sets lighting off.

I’m also hoping that this stack based approach will lend itself to future optimisations, at least identifying candicate states for on-the-fly display list creation (maybe by caching the stack from the previously rendered frame).

Does this strategy sound over-compilicated, over-simplified, flimsy? Am I simply deluding myself and it’s just a short term hack that would have to be scrapped in the near future? I’m not asking anyone to give up hard earned secrets but I would appreciate a nudge in the right direction. Thanks in advance for any advice offered.

Shadowing states in software is generally a good idea, scenegraph or no, to prevent the same state from being called twice for no good reason. But then be very wary of code that doesn’t use your state shadowing mechanism – a typical cause for messed up state is simply losing track.

However, this isn’t what people generally mean by “state minimization.” True minimization requires effectively re-ordering the scenegraph such that objects of similar state are drawn together. Putting explicit state nodes in your scenegraph makes this even harder.

The most common solution (and probably the main reason this post belongs here instead of in the “higher level API” topic) is to use draw bins. The basic idea is that you create N draw bins, each with a unique combination of useful state, including texture, lighting, shader, etc… based on your actual data.

So any objects of combined state A will go in bin A, remembering the full world-space transform for each object. Then, during rendering, you iterate over the bins, where each bin sets up its desired state and in turn iterates over its visible objects, changing only the transform per object.

This truly minimizes state changes and has other benefits. If you can organize the vertex and index data correctly, you can make each draw bin its own vertex array, thereby reducing the number of glBindVertex and VAR/VBO setup calls.

And if you can put some or all vertices in a common coordinate space (if they’re relatively static), you can even reduce the number of separate glDrawElement calls by rendering multiple objects with a single transform. In practice it may be faster to batch and draw whole bins this way than to iterate on only the visible elements.

Anyway, some people have used the binning idea as an argument against using scenegraphs at all. Why put your data in a scenegraph when you need to render in bin-order?

But I always point out that the bin approach is actually an example of a second parallel scenegraph organized by state (where the first is organized spatially, for efficient culling).

Consider that an array of such bins is at least a very flat scenegraph. But most often, you can get some additional state reduction by grouping similar bins together and sharing state between bins. So if bins A-F use standard T&L vs. some shaders for the others, you can effectively create a “standard T&L group” for A-F which sets the state once and iterates from A to F. Similarly, G-M may form a “shader group” and so on. With a couple of different states to sort by, you wind up with a flat/wide tree of states, with your draw bins at the leaf nodes. This is what I’d call a “state graph,” even if you only ever express it as a flattened list that iterates the depth-first traversal order.

This does not negate the desire for having a “normal” scenegraph organized by transform. Those are great for articulating objects and spatial culling. But the most speed critical part is probably going to be the draw-order problem. And so that’s the most important thing to focus on for most apps.

Avi
www.realityprime.com

You can always search for old topics. Here is one discussion.
http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/001787.html

BTW, you dont have to bother with every single state. Just deal with the most expensive ones like rebinding textures, resetting a combiner, rebinding a VP or FP, …

None of this is a secret. We discuss them here.

Thanks both for the info. Very useful and plenty to think about. I had read about the use of bins before but considered them an overkill for a small engine, although it will probably a good “learning” experience to implement anyway. Another reason I had not considered them is it’s difficult to tell when a design benefits from one approach over another. I suppose this is where the “art” of design enters into it and, again, implementing it is the best way to learn. Have to admit I had not considered extraneous state changes, which may happen in my case because I have an option for a node to trigger a call to a “raw opengl” function. Thanks again for your replies.

PS: Yeah, I had trawled all past discussions. Problem is that while there are useful nuggets of info, they are scattered far and wide.

btw since in most scenes the majority of the meshes is static and by that I mean it is added to the scene (not frustum) during initialization and stays there(like terrain, rocks, trees etc. and unlike players, monsters, etc.) wouldn’t it be reasonable to sort the meshes by hand(using a piece of paper not code) and load them into the scene in the ‘correct’ order. Assuming the structure used to hold entities/meshes preserves order (lists,queues,graphs all do that) the meshes should remain ordered pretty well even if some are culled away. As for the dynamic meshes wich get added and removed repeatedly, well they’re not that many (often just a couple visible each frame) so it wouldn’t matter. This scheme might probably not provide the best performance but it will be close and it should the simplest/easiest to code (well you don’t have to code it) and bug-free method possible.

Perhaps this should go in the scene graph forum, but it’s mostly on topic, so what they hey :slight_smile:

Zen: while you could do that, it would be a pain to sort these things by hand, and remember to re-sort them when you change your data (which is ALL THE TIME). Thus, it’s much better to sort on load, or at least sort on data compilation time, as a pre-process. Code tends to do much better at repetetive tasks than humans.

I would sort, in this order:

  1. By (fragment, vertex) program pairs, where fragment programs are the high order bit.
  2. By vertex buffer object (set) (in the case of instancing, assuming there are many multi-issues)
  3. By texture set-ups.

If you’re using multi-pass techniques that use different shaders for the different passes, you might want to consider running each pass as a separate pass over your entire scene (!) Else you might as well not sort.

jwatte: well I may have exaggerated a bit with the paper instead of code bit but what I wanted to make clear is that the sorting should not be done by the engine code. For example say we have a terrain engine. Most of the meshes are the terrain itself trees, rocks, buildings, the sky, etc. Now this is all put together by some sort of map editor. You could let the editor do as detailed a sorting as you wish and use an output file that preserves this sorting e.g. a script like:

beginmap();
loadsurf(mesh1,shader1);
loadsurf(mesh2,shader1);
loadsurf(mesh3,shader2);
loadsurf(mesh3,shader3);
endmap();

These surfaces would stay in sorted order as the only thing that might change their order would be temporary skipping of a few surfaces due to culling. It might not be perfect sorted order but it should be good enough and besides those other methods don’t ensure perfect order as well.
Of course surfaces that get linked/unlinked to/from the scene (not due to culling but because they were not and now are part of the scene, e.g. a new player entered the game) will not be sorted (or they will have to be sorted at insertion time) but they should be relatively few and not hurt performance too much. Does this sound better?

Edit:

Thus, it’s much better to sort on load, or at least sort on data compilation time, as a pre-process. Code tends to do much better at repetetive tasks than humans.

That’s exactly what I mean. And if you load/compile your sata in sorted order and use a structure that preserves this order for traversing and rendering the scene you shouldn’t need to do anything else (like binning or whatever).

[This message has been edited by zen (edited 10-23-2003).]

Originally posted by zen:

That’s exactly what I mean. And if you load/compile your data in sorted order and use a structure that preserves this order for traversing and rendering the scene you shouldn’t need to do anything else (like binning or whatever).

Some older scenegraph systems did rely on the modeler to create the desired rendering order by manually placing objects in the scenegraph in the proper traversal order. Some other scenegraphs did/do sort on the fly for only the visible objects.

But in most applications, the visual database changes over time, either during runtime or at the very least for one or more iterations during the deveopment process.

Do you want to force some poor human (you!) to do work the computer could do in no time at all?

If you create a state vector and sort on load, you’re almost done. You have your state order. Now, just be smart about not setting the same state twice and do skip states which have no visible geometry to them. That works.

But you generally need some way to integrate a cull result (presumably unsorted by state, perhaps not even spatially sorted) and a draw list. Simply iterating over a state-sorted list of objects, skipping the invisible ones, is a losing proposition when the # of total objects grows large. This is why some scenegraphs chose to sort the visible list every frame. In many cases, it was more efficient.

Binning and state graphs have other advantages as well, such as making retargeting to different hardware easier. Setting individual bins to handle multi-pass vs. multi-texture, for example, is pretty simple. Bins also have some advantages when dealing with VAR/VBO, especially if you can combine/coalesce multiple objects with similar state.

Anyway, it all comes down to having the right state order and combining/coalescing where appropriate. Bins, state graphs or no, is a decision of how much intelligence you want to build in to the management of that state order. In a simple app, not much is needed.

Avi

[This message has been edited by Cyranose (edited 10-23-2003).]

I think I read here that the driver resolves redundant state changes.

JD: that depends on what state, and what driver, and what “resolve” actually means.

zen: I bet you a donut that if you run a sorting pass dynamically after gathering all geometry for each rendering pass, it won’t show up in the top 100 functions in a profile. Whether you do it dynamically or on load isn’t much of an issue – structure it the way that works best for you!

If you can do things on load (i e, you have a “loading” screen, not a continuous streaming world) then you can also do things like glom multiple mesh instances together by pre-transforming them into the same model space, to reduce state changes that way. Changing modelview is a moderately expensive operation.

If you want to be really hard core about it, you split your world in 50 meter chunks, and compiles one shader that contains all the geometry and texture state for that chunk (or use a pre-compiled shader with varying texture state). Then each 50 meter chunk is a single vertex array and single render state.

zen: I bet you a donut that if you run a sorting pass dynamically after gathering all geometry for each rendering pass, it won’t show up in the top 100 functions in a profile. Whether you do it dynamically or on load isn’t much of an issue – structure it the way that works best for you!

It is not so much performance I’m worried about but complexity. I guess doing the sort per-frame wouldn’t be that bad but I doubt it would make that big of a difference in many cases.