State Objects

skynet · August 18, 2008, 9:20am

Since the specification of state objects seems to be one of the problems of LP (Longs Peak), I want to add my 2 cents. The whole debate seems to be about which parts of the state objects should be mutable and which immutable.

Immutable state objects offer fast state switching, because once successfully created, a state object is valid and can be used by the driver without a second thought. But they also imply that an application has to keep a prepared state object around for each distinct state it needs, possibly several of them specifying the same state (created in different parts of the application that don't know about each other).

Mutable state objects offer more flexibility on the application’s side, but force the driver to re-check the current state at the next “critical” point (like a call to glDrawElements()), over and over again.

The dispute within the ARB seems to be over which parts of the state change so often that they deserve mutability, and which change rarely enough that making them immutable would be little burden on the application developer.

I propose a third way that lies somewhere in between. It sounds so easy that it might have been suggested before, so please bear with me.

To put it simply, I suggest having fully mutable state objects and introducing explicit “critical checkpoints”. Basically, it’s like telling the driver: “I’m done changing the state of this object, please check it and then take it as the current state.”

It could work like this:



// create and initialize the state object
GLuint depthstencilstate=0;
glGenTemplates(GL_CLASS_DEPTH_STENCIL_STATE, 1, &depthstencilstate);
glSetPropertyi(depthstencilstate, GL_DEPTH_TEST, GL_TRUE);
glSetPropertyi(depthstencilstate, GL_DEPTH_FUNC, GL_LEQUAL);
GLfloat depthrange[2]={0.0f, 1.0f};
glSetPropertyfv(depthstencilstate, GL_DEPTH_RANGE, depthrange);

// now make it "current", this is the explicit checkpoint for
// the validity of the "depthstencilstate" object
glSetState(GL_DEPTH_STENCIL_STATE, depthstencilstate);
assert(glGetError()==GL_NO_ERROR); // new state accepted?

Now, this opens many opportunities. As an application programmer, I might keep several state objects around or manipulate only one object the whole time. State objects may even pass through invalid states until glSetState() is actually called.

If glSetState() kept a ‘validated’ flag, the driver could record whether it has already validated a state object, so keeping multiple validated state objects around could pay off performance-wise for an application.

glSetState() also introduces opportunities for the driver. Since invalid state objects are not accepted, the ‘real’ driver state is (almost) never invalid. This relieves all drawing commands of the burden of checking whether the current state is valid.
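A minimal CPU-side sketch of the checkpoint idea, assuming a hypothetical depth-state object and an assumed validity rule (known depth function, depth range inside [0, 1]). None of these names are real OpenGL; `set_state` only stands in for the proposed glSetState(). It shows the three properties claimed above: setters may leave the object temporarily invalid, validation happens once at the checkpoint and is skipped while the object is unchanged, and a rejected object never reaches the context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU-side model of the proposed checkpoint API; not real OpenGL. */
typedef enum { FUNC_NEVER, FUNC_LESS, FUNC_LEQUAL, FUNC_ALWAYS, FUNC_COUNT } DepthFunc;

typedef struct {
    bool depth_test;
    DepthFunc depth_func;
    float depth_range[2];
    bool validated;            /* set at the checkpoint, cleared by any change */
} DepthStencilState;

typedef struct {
    DepthStencilState current; /* the context's "real" state: only accepted values */
    int validations;           /* counts full validations, for illustration only */
} Context;

/* Setters clear the flag; intermediate values are allowed to be invalid. */
void set_depth_func(DepthStencilState *s, DepthFunc f) {
    s->depth_func = f;
    s->validated = false;
}

void set_depth_range(DepthStencilState *s, float near_val, float far_val) {
    s->depth_range[0] = near_val;
    s->depth_range[1] = far_val;
    s->validated = false;
}

/* Assumed validity rule for this sketch only. */
static bool is_valid(const DepthStencilState *s) {
    return s->depth_func < FUNC_COUNT &&
           s->depth_range[0] >= 0.0f && s->depth_range[0] <= 1.0f &&
           s->depth_range[1] >= 0.0f && s->depth_range[1] <= 1.0f;
}

/* The checkpoint: validate only if the object changed, reject invalid
   objects, otherwise copy the object into the context. */
bool set_state(Context *ctx, DepthStencilState *s) {
    if (!s->validated) {
        ctx->validations++;
        if (!is_valid(s)) return false;  /* context state left untouched */
        s->validated = true;
    }
    ctx->current = *s;
    return true;
}
```

Because `ctx->current` can only ever be written by an accepted object, a draw call in this model has nothing left to check.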

Another advantage is that it would be very easy to create a “delta-state” from the current state (which a particular part of the code is not aware of, or not even interested in) and to implement a very efficient glPushAttrib()/glPopAttrib() replacement. Look at this:


GLuint oldstate=0;
GLuint newstate=0;
glGenTemplates(GL_CLASS_DEPTH_STENCIL_STATE, 1, &oldstate);
glGenTemplates(GL_CLASS_DEPTH_STENCIL_STATE, 1, &newstate);

//retrieve current state and copy it into "oldstate"
glGetState(GL_DEPTH_STENCIL_STATE, oldstate);
//again, retrieve the current state. 
// we could also introduce a "glCopyStateObject" function for this
glGetState(GL_DEPTH_STENCIL_STATE, newstate);

// now change only the depth test without touching the rest of it!
glSetPropertyi(newstate, GL_DEPTH_TEST, GL_FALSE);

// introduce the changed state
glSetState(GL_DEPTH_STENCIL_STATE, newstate);

// draw some stuff
glDrawElements();
...
//now, restore the old state
glSetState(GL_DEPTH_STENCIL_STATE, oldstate);
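The same push/pop pattern can be modelled with plain struct copies. This is a sketch under the assumption that get/set both copy whole objects (as the [Edit] at the end of this post specifies for glSetState()); `get_state`, `set_state`, `push_depth_test` and `pop_state` are hypothetical names, not proposed GL entry points. The point is that the code changing the depth test never has to know the other fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical depth state block; a real one would hold much more. */
typedef struct { bool depth_test; bool depth_write; int depth_func; } DepthState;
typedef struct { DepthState depth; } Context;

/* glGetState()-like: copy the context's current state into an object. */
void get_state(const Context *ctx, DepthState *out) { *out = ctx->depth; }

/* glSetState()-like: copy an object into the context. */
void set_state(Context *ctx, const DepthState *in) { ctx->depth = *in; }

/* "Push": save everything, change only the depth test, return the saved copy. */
DepthState push_depth_test(Context *ctx, bool enabled) {
    DepthState saved, changed;
    get_state(ctx, &saved);
    changed = saved;               /* a plain copy, like the suggested glCopyStateObject */
    changed.depth_test = enabled;  /* the delta: nothing else is touched */
    set_state(ctx, &changed);
    return saved;
}

/* "Pop": restore the saved copy in one call. */
void pop_state(Context *ctx, const DepthState *saved) { set_state(ctx, saved); }
```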

It might even be possible for the OpenGL context to provide “built-in state objects” that allow instant access to some of their properties (either for monitoring or for controlling something). The ability to specify read-only or write-only properties is also a cool feature. glSetProperty/glGetProperty provide a great mechanism for that: they just ignore the call if you use the wrong access mode.
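A small sketch of per-property access modes, assuming the “ignore the call on the wrong access mode” rule described above. The `Property` type and the property names are hypothetical illustrations; a real API might raise an error instead of silently ignoring the call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical access modes for properties of a built-in state object. */
typedef enum { ACCESS_READ = 1, ACCESS_WRITE = 2, ACCESS_READ_WRITE = 3 } Access;

typedef struct {
    const char *name;  /* illustrative property name only */
    Access access;
    int value;
} Property;

/* Writes to a property without write access are silently ignored. */
void set_property(Property *p, int value) {
    if (p->access & ACCESS_WRITE) p->value = value;
}

/* Reads from a property without read access are ignored; *out is left as is. */
void get_property(const Property *p, int *out) {
    if (p->access & ACCESS_READ) *out = p->value;
}
```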

So, ARB guys… what were the problems with the programming models you discussed? I really can’t imagine that these state objects led to the fall of LP.

[Edit]

I forgot to mention: glSetState() is unlike glBindTexture() or glUseProgram(). It actually copies the provided state object into the internal state. After glSetState() returns, you can freely change the state object again without affecting the actual context state.
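The difference between bind semantics and the proposed copy semantics fits in a few lines. Both contexts below are hypothetical models, not GL: one stores a pointer (like a bound texture or program), the other stores its own copy, so only the first sees edits made after the call.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { bool blend; } BlendState;

/* glBindTexture()/glUseProgram()-style: the context keeps a reference. */
typedef struct { const BlendState *bound; } BindContext;

/* Proposed glSetState()-style: the context keeps its own copy. */
typedef struct { BlendState current; } CopyContext;

void bind_state(BindContext *ctx, const BlendState *s) { ctx->bound = s; }
void copy_state(CopyContext *ctx, const BlendState *s) { ctx->current = *s; }
```

Editing the object after `bind_state` changes what the context sees; editing it after `copy_state` does not.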

elFarto · August 18, 2008, 11:03am

It seems like a good idea, but without knowing what problems the ARB had, it’s difficult to know if this is necessary.

Regards
elFarto

JoeDoe · August 18, 2008, 11:27am

For D3D10, some hardware can convert an immutable state object into a chunk of command buffer (at creation time) and push it to the GPU when the application binds this state object to the pipeline. On some hardware an internal GPU state object can even be created, so when the app binds the state object, the driver simply passes a pointer to the state object stored in GPU memory. Thus switching states is more efficient than in D3D9. Your solution doesn't fit this scheme: validation occurs at set time but should occur at creation time, and you change parts of the state object, so the GPU cannot pre-create it, and so on.
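A toy model of the “encode at creation, copy at bind” idea described above. The register numbers and the two-word (register, value) encoding are invented for illustration; real hardware command formats differ and are not described in this thread. The sketch only shows why binding becomes cheap: all translation happens once in `create_depth_stencil`, and binding is a bounded memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register numbers; real layouts are hardware-specific. */
enum { REG_DEPTH_ENABLE = 0x10, REG_DEPTH_FUNC = 0x11, REG_STENCIL_ENABLE = 0x12 };
#define CB_CAPACITY 64

typedef struct { int depth_enable, depth_func, stencil_enable; } DepthStencilDesc;

/* Immutable object: pre-encoded (register, value) pairs. */
typedef struct { uint32_t words[6]; size_t count; } DepthStencilObject;

/* All translation work happens once, at creation time. */
DepthStencilObject create_depth_stencil(const DepthStencilDesc *d) {
    DepthStencilObject o = { {0}, 0 };
    uint32_t regs[3] = { REG_DEPTH_ENABLE, REG_DEPTH_FUNC, REG_STENCIL_ENABLE };
    uint32_t vals[3] = { (uint32_t)d->depth_enable, (uint32_t)d->depth_func,
                         (uint32_t)d->stencil_enable };
    for (int i = 0; i < 3; i++) {
        o.words[o.count++] = regs[i];
        o.words[o.count++] = vals[i];
    }
    return o;
}

typedef struct { uint32_t buf[CB_CAPACITY]; size_t used; } CommandBuffer;

/* Binding is a plain copy of pre-built words: no validation, no translation. */
int bind_depth_stencil(CommandBuffer *cb, const DepthStencilObject *o) {
    if (cb->used + o->count > CB_CAPACITY) return 0;  /* buffer full */
    memcpy(cb->buf + cb->used, o->words, o->count * sizeof o->words[0]);
    cb->used += o->count;
    return 1;
}
```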

Fully immutable state objects have one problem: they're hard for the programmer to manage. It's probably better to keep two distinct mechanisms in a single API: fully immutable state objects for fast rendering code, and per-state switching for convenience where fast rendering is not critical for the application. Which to choose should be the application developer's decision.

Sorry for my English.

Rob_Barris · August 18, 2008, 11:36am

There is more to it - if you examine the history of hardware, over time you can spot things which were once data (simple register write) and how they tend to become state (part of a compiled program).

Every time the boundary moves, your assumptions about what should or should not have been part of an immutable state object definition may be challenged or invalidated.

You hit on another issue which is that with an all-immutable state object model, the work of state vector creation/tracking/purging is outsourced to all applications. For a beginner this is a steep hill to climb. It’s a big permutation space to navigate.

We did discuss a middle ground where state objects could be mutable but at some performance cost, and with opt-in immutability. IIRC this ran into conflict because LP wasn’t really designed with any mutability at all for its objects, and this looked like a wart.

An old tension is “fast enough” vs “optimal”. If you always provide only the optimal way to do something, you can arrive at the belief that only optimal programs will be written. But if you scare off some developers because it’s too hard to get off the ground, there is an unwanted penalty there. A beginner developer might be happy to take the fast enough path at first until they feel confident enough to go down the “optimal” path. Older example: immediate mode vs VBO.

(edit: I see JoeDoe’s comments are in a similar vein)

skynet · August 18, 2008, 11:57am

See, with my proposal it might very well be possible for the driver to create those HW command buffers and attach them to the state object when it is first validated. Any subsequent glSetProperty() on the state object would automatically invalidate the cached data, forcing a rebuild at the next glSetState(). And glSetState() is never called without purpose! At that point, I know that I have ‘fine-tuned’ the state object enough and will now use it for rendering. That is why I suggested making it possible (but not necessary!) for the app to keep many state objects around. And if the compile-at-first-use convention is not preferred, one might even introduce another command that explicitly prepares a state object for use without making it “current” at the same time.
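A sketch of the lazy-rebuild scheme described above, assuming a hypothetical object that caches its “compiled” form. `compile` is a stand-in that packs two fields into an integer rather than real command encoding; the `compile_count` field exists only to make the behaviour observable. Several property changes in a row cost one rebuild, at the next checkpoint.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mutable state object that caches its compiled form. */
typedef struct {
    int depth_func;
    int stencil_ref;
    bool cache_valid;
    unsigned cached_words;   /* stands in for a driver-built command chunk */
    int compile_count;       /* instrumentation for this example only */
} StateObject;

/* Any property change just drops the cache; nothing is compiled yet. */
void set_property_depth_func(StateObject *s, int f) { s->depth_func = f; s->cache_valid = false; }
void set_property_stencil_ref(StateObject *s, int r) { s->stencil_ref = r; s->cache_valid = false; }

/* Stand-in for encoding hardware commands. */
static unsigned compile(const StateObject *s) {
    return ((unsigned)s->depth_func << 8) | (unsigned)(s->stencil_ref & 0xFF);
}

/* glSetState()-like checkpoint: rebuild only if the object changed since last use. */
unsigned set_state(StateObject *s) {
    if (!s->cache_valid) {
        s->cached_words = compile(s);
        s->cache_valid = true;
        s->compile_count++;
    }
    return s->cached_words;
}
```

An explicit “prepare” command, as suggested above, could call the same rebuild path without making the object current.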

Besides that, taking a brief look at the DX10 specs… how do you retrieve, say, the depth-test enable bit that is currently in use? You call OMGetDepthStencilState() and then…?

Korval · August 18, 2008, 12:01pm

The current OpenGL programming model can be seen as “fully immutable” - I can change almost everything anytime.

Um, immutable means “unchangable”. You’ve got it backwards.

skynet · August 18, 2008, 12:38pm

There is more to it - if you examine the history of hardware, over time you can spot things which were once data (simple register write) and how they tend to become state (part of a compiled program).

Every time the boundary moves, your assumptions about what should or should not have been part of an immutable state object definition may be challenged or invalidated.

But what happened now? For fear of unknown hardware in the next 2-4 years, OpenGL 3.0 sticks to a programming model designed around 16-year-old hardware.

A beginner developer might be happy to take the fast enough path at first until they feel confident enough to go down the “optimal” path. Older example: immediate mode vs VBO.

That is true. But it’s a bit like BASIC: easy for beginners, but beginners get “spoiled” easily. The same applies to display lists. Many “professional” OpenGL programmers still believe that DLs are the best and fastest way to achieve something. I recently used GLIntercept to watch a BIG scenegraph API compile a display list for a cube. Yes, they used a very elegant way to achieve “per-face-normal shading” for the sides of the cube, but from the driver writer’s point of view this kind of mixed immediate-mode/vertex-array rendering must be horrible to see.

Better that they learn the “good” way right from the beginning, and the best way to achieve this is to provide only the good way. Convenience libraries can easily be put on top of that.

Anyway… how would my state objects evolve with the hardware?
I suggest, by introducing new classes of templates and declaring old ones deprecated.

In my example… let’s say the new hardware separates depth and stencil state:

glGenTemplates(GL_CLASS_DEPTH_STENCIL_STATE, 1, &object);

would fail with GL_DEPRECATED. Instead one must now call:

glGenTemplates(GL_CLASS_DEPTH_STATE_31, 1, &depthstate);
glGenTemplates(GL_CLASS_STENCIL_STATE_31, 1, &stencilstate);
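A sketch of how deprecating a template class could behave, assuming (this is not in the post) that deprecation is tied to the context's API version. The class and error enums only echo the tokens above; `gen_templates` and the version numbers are hypothetical. Requesting the old combined class on a “3.1” context fails with a deprecation error and hands out no name, while the split classes succeed.

```c
#include <assert.h>

/* Hypothetical tokens echoing the post; not real OpenGL enums. */
typedef enum { CLASS_DEPTH_STENCIL_STATE, CLASS_DEPTH_STATE_31, CLASS_STENCIL_STATE_31 } TemplateClass;
typedef enum { TPL_OK, TPL_ERR_DEPRECATED } TemplateError;

typedef struct { int api_version; } Context;   /* e.g. 30 or 31 (assumed) */

/* Version from which a class is deprecated; 0 means never. */
static int deprecated_since(TemplateClass c) {
    return c == CLASS_DEPTH_STENCIL_STATE ? 31 : 0;
}

TemplateError gen_templates(const Context *ctx, TemplateClass c, int n, unsigned *names) {
    static unsigned next_name = 1;
    int since = deprecated_since(c);
    if (since != 0 && ctx->api_version >= since) return TPL_ERR_DEPRECATED;
    for (int i = 0; i < n; i++) names[i] = next_name++;
    return TPL_OK;
}
```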

elFarto · August 18, 2008, 12:43pm

Perhaps we’re going about this the wrong way. Take the blend modes for example. They have been, and currently are fixed functionality. Now it’s likely that these will become programmable like the rest of the pipeline.

So instead of having a special method for setting it, just make it into another shader. At the moment the creation of this shader would be fairly rigid, with basically srcOp, destOp and equation, but later on this can be upgraded almost seamlessly with Longs Peak’s templates: you would just supply the shader’s text instead.

LP also has the ‘composable’ feature, where you can choose exactly which shaders make up your program, so you would only need to create one of each blend mode.

Depth, stencil and blending can all use this pattern.

This method neatly sidesteps the need for state objects by moving the required functionality into its proper place, rather than having it lumped together.

Regards
elFarto

Xmas · August 19, 2008, 4:05am

This sounds a bit misleading, as on the API level the future direction is clearly “less state, more code and data”. I.e. some shader code plus uniforms/buffers/textures. State and state objects are going to shrink as deprecated features get removed.

But I guess what you meant was that, on the hardware level, state changes that used to be relatively cheap (a simple register write) now require a modification of shader code which may be a lot more expensive.

An old tension is “fast enough” vs “optimal”. If you always provide only the optimal way to do something, you can arrive at the belief that only optimal programs will be written.

I’m not sure that’s how it works. Any API can be used in a non-optimal way. If you only provide immutable objects, some people will end up creating new ones all the time. DX10 at least provides an explicit internal cache so different parts of an application can share the same state object without communicating with each other.
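A sketch of a deduplicating state-object cache of the kind mentioned above. As far as I know, D3D10's state-creation calls return the existing object when given a descriptor identical to one already created; the code below is a hypothetical model of that behaviour, not the D3D10 API. The fixed capacity and the reference count are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only int members, so the struct has no padding and memcmp is a safe equality test. */
typedef struct { int depth_enable; int depth_func; int stencil_enable; } DepthStencilDesc;

typedef struct { DepthStencilDesc desc; int refcount; } DepthStencilState;

#define MAX_STATES 16
static DepthStencilState cache[MAX_STATES];
static size_t cache_size = 0;

/* Return the existing object for an identical descriptor, else create one. */
DepthStencilState *create_depth_stencil_state(const DepthStencilDesc *d) {
    for (size_t i = 0; i < cache_size; i++) {
        if (memcmp(&cache[i].desc, d, sizeof *d) == 0) {
            cache[i].refcount++;
            return &cache[i];
        }
    }
    if (cache_size == MAX_STATES) return NULL;  /* cache full */
    cache[cache_size].desc = *d;
    cache[cache_size].refcount = 1;
    return &cache[cache_size++];
}
```

Two unrelated parts of an application that ask for the same descriptor end up sharing one object without ever talking to each other.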

ID3D10DepthStencilState::GetDesc

I think that’s a bad example because it’s not the fault of display lists. The DL concept, grouping many GL calls into a single call, is a sound one IMO. The OpenGL implementation of DLs is a bit bloated, though, and optimisation potential is missed because they always start with the current state which is unknown at list compile time.

I believe OpenGL is lacking objects that represent the entire GL state. Why does it take more than a single call to set up OpenGL to render a mesh that always uses the same state?

Anyway… how would my state objects evolve with the hardware?
I suggest, by introducing new classes of templates and declaring old ones deprecated.

Is a fixed division into different kinds of state objects required? It makes sense to group blocks of state which exist multiple times, but wouldn’t grouping state into objects best be left to the application in the end?

skynet · August 19, 2008, 5:03am

So instead of having a special method for setting it, just make it into another shader.

In my opinion, one should refuse to spoil the current API for a hardware feature that might come but is not there yet. The best example is fragment shaders: instead of saying “hey, we’ve got register combiners, in 8 years we will have fully programmable hardware, let’s invent GLSL for it now”, the API for programming shaders evolved naturally with the hardware. We got register combiners, texture shaders, fragment programs and finally GLSL, each at the right time, each abstracting the hardware appropriately. Otherwise you end up with missing features, incomplete implementations and software-emulated paths.

When programmable blending is finally available, there will be an extension. The extension (and the hardware) will evolve a bit and get into the core as soon as it is mature.

I think that’s a bad example because it’s not the fault of display lists.

The initial idea may have been good, 16 years ago. People have always been told “DLs are fast” and they still believe it. And since DLs were never removed, the tale was passed on from father to son. Maybe 3.1 will remove them… nobody knows.

I believe OpenGL is lacking objects that represent the entire GL state. Why does it take more than a single call to set up OpenGL to render a mesh that always uses the same state?

Today, almost no mesh is rendered twice with the same state. You’ll always change uniforms, blendmodes, textures, colors etc…

Is a fixed division into different kinds of state objects required? It makes sense to group blocks of state which exist multiple times, …

It makes sense, if that grouping is reasonable for the hardware (we don’t know this, NV and ATI know). But since they acknowledged DX10, I assume that DX10’s grouping is not far away from the hardware.

but wouldn’t grouping state into objects best be left to the application in the end?

So, you suggest I emulate state objects in my App and inside them I use the old finegrained glEnable()/glDepthFunc()/glDepthMask() calls? I could even compile them transparently into display lists for faster state switching.
But then… I’d still use the old API with all its disadvantages for the driver. No. There must be explicit API support if grouped state is the way we should talk to the hardware today.

elFarto · August 19, 2008, 6:12am

‘Shader’ really is the wrong word. Blending (as well as depth and stencil) is just part of the pipeline, just as vertex, geometry and fragment processing are.

All this idea does is evolve the idea of a GLSL program into the pipeline itself, with attachment points for all the different aspects of it.

Also, I’m not talking about creating a language for blend shaders. Just an object to represent that part of the pipeline. E.g:

GLtemplate template = glCreateTemplate(GL_BLEND_OBJECT);
glTemplateAttribt_i(template, GL_SRC_FACTOR, GL_ONE);
glTemplateAttribt_i(template, GL_DST_FACTOR, GL_ONE_MINUS_SRC_ALPHA);
glTemplateAttribt_i(template, GL_EQUATION, GL_FUNC_ADD);

GLobject blend = glCreateBlend(template);

This is basically what you’d need for any state object for blending. There are other parameters for separate colour/alpha values.
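What the three template parameters above compute, as a worked sketch: the fixed-function blend is `result = src * srcFactor + dst * dstFactor` for GL_FUNC_ADD (and `src * srcFactor - dst * dstFactor` for GL_FUNC_SUBTRACT). With GL_ONE and GL_ONE_MINUS_SRC_ALPHA this is the usual premultiplied-alpha “over”. The enum and function names are hypothetical; only a few factors are modelled, the same factors apply to all channels (no separate colour/alpha values), and clamping is omitted.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } Color;
typedef enum { FACTOR_ZERO, FACTOR_ONE, FACTOR_SRC_ALPHA, FACTOR_ONE_MINUS_SRC_ALPHA } Factor;
typedef enum { EQ_ADD, EQ_SUBTRACT } Equation;

/* Fixed-function blend described by three parameters, as in the template above. */
typedef struct { Factor src, dst; Equation eq; } BlendDesc;

/* All modelled factors depend only on the source colour. */
static float factor_value(Factor f, Color src) {
    switch (f) {
    case FACTOR_ZERO:                return 0.0f;
    case FACTOR_ONE:                 return 1.0f;
    case FACTOR_SRC_ALPHA:           return src.a;
    case FACTOR_ONE_MINUS_SRC_ALPHA: return 1.0f - src.a;
    }
    return 0.0f;
}

Color blend(const BlendDesc *d, Color src, Color dst) {
    float fs = factor_value(d->src, src);
    float fd = factor_value(d->dst, src);
    float sign = (d->eq == EQ_ADD) ? 1.0f : -1.0f;  /* EQ_SUBTRACT: src*fs - dst*fd */
    Color out = {
        src.r * fs + sign * dst.r * fd,
        src.g * fs + sign * dst.g * fd,
        src.b * fs + sign * dst.b * fd,
        src.a * fs + sign * dst.a * fd,
    };
    return out;
}
```

Blending a half-transparent premultiplied red `{0.5, 0, 0, 0.5}` over opaque green gives `{0.5, 0.5, 0, 1}`.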

Where my idea comes in, instead of making some new method for combining state objects, we use the existing program object functionality.

Now, if blend shaders do make an appearance, it’s simple to integrate them in, e.g:

GLtemplate template = glCreateTemplate(GL_BLEND_OBJECT);
glTemplateAttribt_o(template, GL_SHADER_SOURCE, source);

GLobject blend = glCreateBlend(template);

Regards
elFarto

Xmas · August 19, 2008, 7:01am

The initial idea is still good. For some reason some people think display lists are an alternative to VBOs. But these are completely orthogonal concepts. Display lists have nothing to do with how you submit your vertices, they’re just a collection of OpenGL calls wrapped into one. You can have display lists that contain state changes only. I think deprecating display lists without providing an alternative to set many states at once is a mistake.
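A miniature model of a state-only display list as described above: a recorded sequence of state-setting commands replayed by one call. The command set and the `StateVector` are hypothetical and much smaller than real GL state. Note that replay only overwrites the fields it recorded, which mirrors the limitation mentioned earlier: the list cannot know the state it will start from.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tiny state vector; not the real GL state. */
typedef struct { bool depth_test; bool blend; int depth_func; } StateVector;

typedef enum { CMD_ENABLE_DEPTH, CMD_DISABLE_DEPTH, CMD_ENABLE_BLEND,
               CMD_DISABLE_BLEND, CMD_DEPTH_FUNC } Op;
typedef struct { Op op; int arg; } Command;

#define MAX_COMMANDS 16
typedef struct { Command cmds[MAX_COMMANDS]; int count; } StateList;

/* Recording: store the call instead of executing it. */
void record(StateList *l, Op op, int arg) {
    if (l->count < MAX_COMMANDS) l->cmds[l->count++] = (Command){op, arg};
}

/* Replay: one call applies every recorded state change in order. */
void call_list(StateVector *s, const StateList *l) {
    for (int i = 0; i < l->count; i++) {
        switch (l->cmds[i].op) {
        case CMD_ENABLE_DEPTH:  s->depth_test = true;           break;
        case CMD_DISABLE_DEPTH: s->depth_test = false;          break;
        case CMD_ENABLE_BLEND:  s->blend = true;                break;
        case CMD_DISABLE_BLEND: s->blend = false;               break;
        case CMD_DEPTH_FUNC:    s->depth_func = l->cmds[i].arg; break;
        }
    }
}
```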

Today, almost no mesh is rendered twice with the same state. You’ll always change uniforms, blendmodes, textures, colors etc…

I don’t consider uniforms state. They’re data and should be stored in a buffer object anyway. Colours are uniforms in a shader environment.

It would be quite unusual for the majority of objects in a virtual world to change their appearance every few frames. How often do you change blend modes or textures for a specific mesh instance?

It makes sense, if that grouping is reasonable for the hardware (we don’t know this, NV and ATI know). But since they acknowledged DX10, I assume that DX10’s grouping is not far away from the hardware.

Surely not all hardware is identical. And what if that grouping is not reasonable for the software? I believe there is no need for the API to exactly map to some specific hardware in this case. What matters is the complete state vector at the time of a draw call, not the bits that make it up.

So, you suggest I emulate state objects in my App and inside them I use the old finegrained glEnable()/glDepthFunc()/glDepthMask() calls?

No, I’m suggesting state objects in the API which can contain any part of the complete state vector the application wishes to manage.
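A sketch of a state object that can hold any application-chosen subset of the state vector, assuming a bitmask records which fields the object carries. `StateVector`, the `HAS_*` flags and `apply` are hypothetical names; applying an object overwrites only the fields in its mask, so the grouping is left to the application rather than fixed by the API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical full state vector (a tiny subset of real GL state). */
typedef struct { bool depth_test; bool depth_write; bool blend; int depth_func; } StateVector;

/* Bitmask naming which fields a state object carries. */
enum { HAS_DEPTH_TEST = 1u << 0, HAS_DEPTH_WRITE = 1u << 1,
       HAS_BLEND = 1u << 2, HAS_DEPTH_FUNC = 1u << 3 };

/* An application-defined group: any subset of the state vector. */
typedef struct { unsigned mask; StateVector values; } StateObject;

/* Applying an object overwrites only the fields it contains. */
void apply(StateVector *s, const StateObject *o) {
    if (o->mask & HAS_DEPTH_TEST)  s->depth_test  = o->values.depth_test;
    if (o->mask & HAS_DEPTH_WRITE) s->depth_write = o->values.depth_write;
    if (o->mask & HAS_BLEND)       s->blend       = o->values.blend;
    if (o->mask & HAS_DEPTH_FUNC)  s->depth_func  = o->values.depth_func;
}
```

For example, a “translucent” object carrying only blend and depth-write leaves the depth test and depth function as they were.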

Korval · August 19, 2008, 10:54am

I don’t consider uniforms state.

Whether you consider them state or not, they are state.

Xmas · August 19, 2008, 4:28pm

That’s correct. What I wanted to express is that uniform values don’t have to be part of a state object. Though the important bit is that uniforms should be mutable, whether they’re part of a state object or not.