How Expensive Are Redundant State Changes?

Title says it all really. Are redundant state changes (i.e. setting state that is already set) caught by the driver before causing a hardware flush? I kind of assume they are, so the only overhead in setting redundant state is the API call. Are there any situations where redundant state changes can be very bad?

Cheers,

Henry

Don’t send redundant state changes. Just don’t.

Either design your code to avoid them or wrap GL functions to mirror state in your app. Test whether you’re doing a redundant state change, and if you are, don’t call into GL.

  • Matt

Ok, deleted the previous answer since it was obviously the wrong one.

Since I’m now forced to write a state mirror layer, I was wondering about the reason why. I would guess that the OpenGL states are mirrored in system memory for fast access, but that the API overhead and maybe some other factors make redundant changes noticeably slower than using your own state checker?

Well, I had a somewhat working prototype for mirroring the states, so I guess it’s time to dust it off and put it back to use.

They can end up being very bad; I know from experience. On the last game I worked on, I was brought in to rewrite the OpenGL engine. When I came on board, the game was getting maybe 2 fps on a PIII/500 with a GeForce256. When I was done writing the new engine, the game stayed pegged at 60 fps on the same setup. The biggest speedup over the old engine came from one thing: the old one made redundant state changes up the yin-yang, and my new engine makes none. So yes, they can be very bad. Never do that!

Thanks for the replies. I’d be interested in knowing, if possible, why they can be so bad.

Thing being, I now may have to litter if…else clauses all around my renderer, or actually write a wrapper for GL, which I don’t really want to have to do - adding another level of indirection to API calls doesn’t seem like a good idea, but I will if redundant state changes turn out to be really bad.

Cheers,

Henry

Writing a wrapper for the needed state changes is probably better than just throwing a bunch of if statements around your code. First of all, the code is easier to read. You also still have to pass the state information around in some meaningful way, and a singleton type of class comes in really handy for that job if you use an object-oriented language.

As for adding another abstraction layer on top of OpenGL, well, I guess it’s pretty much inevitable: even if you keep track of the state directly in your code, from a functional point of view that is no different from an abstraction layer.

And obviously you probably don’t need to mirror all the OpenGL states, just the ones you really need to keep track of. This should speed up writing the wrapper significantly, since you can easily add more mirrored states later. You could also add functions that handle a group of state changes at once, which may reduce the number of case statements.
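A minimal sketch of that idea in C: one struct mirrors only the couple of states the renderer touches, and grouped setters apply several at once. The names StateMirror, set_opaque_pass and set_transparent_pass are hypothetical, and the mock_* functions are stand-ins for the real glEnable/glDisable(GL_BLEND) and glDepthMask calls so the example runs without a GL context.

```c
#include <stdbool.h>

/* Stand-ins for real GL calls (assumption: a real renderer would call
   glEnable/glDisable(GL_BLEND) and glDepthMask here). They only count calls. */
static int mock_gl_calls = 0;
static void mock_set_blend(bool on)       { (void)on; mock_gl_calls++; }
static void mock_set_depth_write(bool on) { (void)on; mock_gl_calls++; }

/* Mirror only the states this renderer actually changes. */
typedef struct {
    bool blend;
    bool depth_write;
} StateMirror;

/* Initial values must match the GL defaults: blending off, depth writes on. */
static StateMirror g_state = { false, true };

static void set_blend(bool on)
{
    if (g_state.blend != on) {
        mock_set_blend(on);
        g_state.blend = on;
    }
}

static void set_depth_write(bool on)
{
    if (g_state.depth_write != on) {
        mock_set_depth_write(on);
        g_state.depth_write = on;
    }
}

/* Grouped state changes: one call per render pass. */
static void set_opaque_pass(void)      { set_blend(false); set_depth_write(true);  }
static void set_transparent_pass(void) { set_blend(true);  set_depth_write(false); }
```

Calling set_transparent_pass() twice in a row reaches the (mock) GL layer only the first time; more states can be added to the struct later as they turn out to matter.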

  • Janne

Every time you call what looks like a “simple” OpenGL function like glBlendFunc, the following would happen with a “typical” driver, not necessarily representative of all drivers:

  • You call into a function pointer table in opengl32.dll. This table jumps to the actual function in the driver.
  • The driver accesses the thread-local state to get the current GL context (GC). This uses direct access to OS data structures; it’s cheap (1 instruction) on NT, but a little more expensive on 9x (3 instructions, I think).
  • The driver checks whether you are inside a Begin/End.
  • The driver checks whether the source blend factor is valid.
  • The driver checks whether the destination blend factor is valid.
  • The driver copies the src and dst blend factors into shadows inside the GC.
  • The driver sets a bit to indicate that some rasterization state has changed.

Later, when you do a Begin, DrawElements, etc.:

  • The driver looks at what dirty bits are set.
  • If any dirty bits are set, the driver makes sure that it can still render in HW.
  • Because the rasterization dirty bit is set, the driver grabs all the rasterization state and munges it into the way the HW expects to be told how to do rasterization (often a packed bitfield or somesuch). The driver sends this to the HW.

Variants include not setting a dirty bit and instead sending the state immediately to the HW, or not shadowing the state in the driver at all, but the summary is that a redundant state change is simply a bad thing.

I haven’t even considered the possibility of a stall inside the HW. We do a pretty good job of making sure that such stalls are minimal, but sometimes they still exist.

If you avoid sending redundant states, on a typical driver, you will avoid at least a check for Begin/End and an error check on each parameter. That by itself is a pretty nice savings.

Most drivers don’t actually look for redundant state changes explicitly because this would penalize smart apps that don’t send redundant state changes to help dumb apps that do.
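To make the sequence above concrete, here is a toy model of such a driver in C. It is not real driver code: ToyContext, toy_blend_func, toy_draw and the counter are made up for illustration, and the TOY_* values are placeholders rather than real GL enumerants. It only shows that a call with unchanged values still pays for the checks, sets the dirty bit, and triggers revalidation at draw time.

```c
#include <stdbool.h>

enum { TOY_ZERO = 0, TOY_ONE = 1, TOY_SRC_ALPHA = 2 };  /* placeholder factors */
enum { DIRTY_RASTER = 1u << 0 };

typedef struct {
    bool     in_begin_end;
    int      blend_src, blend_dst;  /* shadowed state inside the context */
    unsigned dirty;                 /* dirty bits */
    int      validations;           /* times state was repacked for the HW */
} ToyContext;

static bool toy_valid_factor(int f)
{
    return f >= TOY_ZERO && f <= TOY_SRC_ALPHA;
}

/* Follows the listed steps: Begin/End check, two parameter checks,
   shadow copy, dirty bit. There is no "is it the same value?" test. */
static void toy_blend_func(ToyContext *gc, int src, int dst)
{
    if (gc->in_begin_end) return;        /* would be an error in GL */
    if (!toy_valid_factor(src)) return;
    if (!toy_valid_factor(dst)) return;
    gc->blend_src = src;
    gc->blend_dst = dst;
    gc->dirty |= DIRTY_RASTER;
}

/* At draw time, a set dirty bit forces the state to be repacked and sent. */
static void toy_draw(ToyContext *gc)
{
    if (gc->dirty & DIRTY_RASTER) {
        gc->validations++;               /* pack bitfield, send to HW */
        gc->dirty &= ~(unsigned)DIRTY_RASTER;
    }
}
```

In this model, calling toy_blend_func twice with the same factors and drawing after each call repacks the state twice, while drawing twice with no call in between repacks it once.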

  • Matt

I am not very clear about the wrapper Janne referred to. How can those if-else statements be avoided? I must check the current value to see if I can skip setting it. Can you explain it a bit more?
Do on/off state switches, something like glEnable*() and glDisable*(), cost a lot too?

[This message has been edited by Nil_z (edited 01-17-2001).]

Also, when I’m talking about wrappers, it’s nothing more complicated than:

void MyBlendFunc(GLenum src, GLenum dst)
{
    static GLenum curSrc = GL_ONE;
    static GLenum curDst = GL_ZERO;

    if ((src == curSrc) && (dst == curDst)) {
        return;
    }
    curSrc = src;
    curDst = dst;
    glBlendFunc(src, dst);
}

But you can do other clever things too. Here are two examples I can think of:

void MyEnableLights(int n)
{
    int i;
    static int lastN = 0;

    if (n < lastN) {
        for (i = n; i < lastN; i++) {
            glDisable(GL_LIGHT0 + i);
        }
    } else if (n > lastN) {
        for (i = lastN; i < n; i++) {
            glEnable(GL_LIGHT0 + i);
        }
    }
}

enum BLEND_MODE {
    NO_BLENDING,
    TRANSPARENCY,
    TRANSPARENCY_ADDITIVE,
    ADDITIVE,
    SRC_TIMES_DST,
    BLEND_MODE_COUNT,
};

GLenum blendModeTable[BLEND_MODE_COUNT][2] = {
    { GL_ONE,       GL_ZERO },                /* NO_BLENDING */
    { GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA }, /* TRANSPARENCY */
    { GL_SRC_ALPHA, GL_ONE },                 /* TRANSPARENCY_ADDITIVE */
    { GL_ONE,       GL_ONE },                 /* ADDITIVE */
    { GL_ZERO,      GL_SRC_COLOR },           /* SRC_TIMES_DST */
};

static void MyBlendFunc(BLEND_MODE mode)
{
    static BLEND_MODE lastMode = NO_BLENDING;

    if (mode == lastMode) {
        return;
    }

    if (mode == NO_BLENDING) {
        glDisable(GL_BLEND);
    } else {
        if (lastMode == NO_BLENDING) {
            glEnable(GL_BLEND);
        }
        glBlendFunc(blendModeTable[mode][0], blendModeTable[mode][1]);
    }
    lastMode = mode;
}

I take no responsibility for any bugs in this code. You get the idea, though. OpenGL is a low-level graphics API. That gives you the freedom to wrap any higher-level API you like on top of it, and that higher-level API can take responsibility for ensuring that there are no redundant state changes.

  • Matt

Originally posted by mcraighead:
I take no responsibility for any bugs in this code.

Got an error on this line:

GLenum blendModeTable[BLEND_MODE_COUNT][2] =

‘BLEND_MODE_COUNT’: undeclared identifier

Eric

Well, I did catch one bug later, in MyEnableLights: it doesn’t update lastN.
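For reference, a sketch of MyEnableLights with the lastN update added. To keep it runnable without a GL context, mock_enable_light and mock_disable_light are hypothetical stand-ins for glEnable(GL_LIGHT0 + i) and glDisable(GL_LIGHT0 + i), and light_calls just counts them.

```c
static int light_calls = 0;   /* counts calls that would reach GL */

/* Stand-ins (assumption) for glEnable/glDisable(GL_LIGHT0 + i). */
static void mock_enable_light(int i)  { (void)i; light_calls++; }
static void mock_disable_light(int i) { (void)i; light_calls++; }

/* Make exactly lights 0..n-1 enabled, touching only the ones that change. */
static void MyEnableLights(int n)
{
    static int lastN = 0;     /* all lights start out disabled in GL */
    int i;

    /* Runs only when n < lastN: turn off the lights no longer wanted. */
    for (i = n; i < lastN; i++) {
        mock_disable_light(i);
    }
    /* Runs only when n > lastN: turn on the newly wanted lights. */
    for (i = lastN; i < n; i++) {
        mock_enable_light(i);
    }
    lastN = n;                /* the update the original version was missing */
}
```

Without the final assignment, lastN stays at 0 forever, so every call re-enables lights 0..n-1 and nothing is ever disabled.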

Not sure why BLEND_MODE_COUNT is undeclared. It’s in the enum!

In any case, pay attention to the ideas, not the specific code.

  • Matt

My fault, Matt! I had modified the enum (don’t know how…).

Anyway, I understand the principles…

Actually, I had never ever thought about redundant state changes… The only ‘optimization’ I am doing right now is material sorting (i.e. glBindTexture + glMaterial).

Do you think it would be worth checking that type of thing for glMaterial as well? I mean, it can happen that when changing material, you switch texture, ambient and specular but not diffuse (sounds stupid actually, but that’s an example!).

Should I check for (lastDiffuse==newDiffuse), similar to your blend mode?

I guess I have to start writing OpenMY (hehe, replace all “gl” calls by “my” calls!).

Regards.

Eric

Is it really worth spending the time to find out whether a state switch should be done or not? I mean, do state changes cost more time than those ifs and fors?

Is it true that I can leave blending always enabled and, when calling my disableBlending function, set the blend func to ONE, ZERO?
Is it the same in terms of performance?
Or is Matt’s example better?
In the last pages of the Red Book, when talking about invariance, it uses this example, but I don’t know if performance will suffer.
I’m using it and it seems to work properly.

Are redundant glEnable, glDisable, glEnableClientState and glDisableClientState calls that expensive too (besides the DLL overhead)?
The thing is, I want a balance between fast code and well-structured code (I don’t want to have n state variables all over the place in my code).

Yes, I back up what Matt says. I have found a speedup from doing the checking myself, e.g.

instead of going

glEnable(GL_LIGHTING);
/* draw car */
glEnable(GL_LIGHTING);
/* draw house */

using this will result in a speed increase:

if (!GL_STATE_is_the_lighting_enabled) {
    glEnable(GL_LIGHTING);
    GL_STATE_is_the_lighting_enabled = true;
}
/* draw car */
if (!GL_STATE_is_the_lighting_enabled) {
    glEnable(GL_LIGHTING);
    GL_STATE_is_the_lighting_enabled = true;
}
/* draw house */

I know there’s a bit more typing involved, but it’s typing you can do in braindead mode.
I suppose you could make a GL_STATE structure and chuck all the OpenGL state statuses in there,
but personally I prefer globals.

Yes, glEnable/glDisable are also slow. Think of how many enable/disable states there are in OpenGL (in unextended GL, there must be about 50 or so), and then consider that they are not contiguous enumerant values. That means you end up with a big switch statement. Not only that, but sending a redundant Enable or Disable will cause us to set dirty bits, etc., causing more work at a later time.

I will repeat what I said before. If you care about performance, follow the advice of my previous post:

Don’t send redundant state changes. Just don’t.

Either design your code to avoid them or wrap GL functions to mirror state in your app. Test whether you’re doing a redundant state change, and if you are, don’t call into GL.

Obviously, it is preferable to design your code such that you never send a redundant state change in the first place rather than to use a wrapper.

Any wrapper will have a break-even point. If more than X% of the GL calls it replaces would have been redundant, the wrapper will be a speedup. Depending on the implementation and the wrapper, I’d generally peg X at somewhere between 2% and 20%. This is not a scientific estimate. You might want to do your own tests. But if, say, 50% of your Enable or Disable calls are redundant (which could be true if the values were completely random), a wrapper is a CLEAR win.
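One rough way to see where such a break-even point comes from, under the simplifying assumption that every GL call costs the same amount c_gl and the wrapper’s test costs c_w per call (both symbols are introduced here for illustration and are not measured values), with r the fraction of calls that are redundant:

```latex
\begin{aligned}
\text{average cost per call without wrapper} &= c_{gl} \\
\text{average cost per call with wrapper}    &= c_w + (1 - r)\,c_{gl} \\
c_w + (1 - r)\,c_{gl} < c_{gl}
  \;&\Longleftrightarrow\; r > \frac{c_w}{c_{gl}}
\end{aligned}
```

Read this way, a break-even X between 2% and 20% corresponds to the wrapper’s test costing somewhere between 1/50 and 1/5 of a GL call. The model ignores the extra work a redundant call causes later at draw time, which would push the break-even point lower still.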

Note also that you can make wrappers be inline functions, and if you’re wrapping Enable/Disable, you should have a separate function for each enable you wrap so you don’t need a big switch statement:

int lightingEnabled = 0;

inline static void MyEnableLighting(void)
{
    if (!lightingEnabled) {
        glEnable(GL_LIGHTING);
        lightingEnabled = 1;
    }
}

inline static void MyDisableLighting(void)
{
    if (lightingEnabled) {
        glDisable(GL_LIGHTING);
        lightingEnabled = 0;
    }
}

Or, alternately, if you prefer:

int lightingEnabled = 0;

// because this is an inline function, the compiler should optimize the case where enabled is a constant value 0 or 1
inline static void MySetEnableLighting(int enabled)
{
    if (enabled == lightingEnabled) {
        return;
    }
    if (enabled) {
        glEnable(GL_LIGHTING);
    } else {
        glDisable(GL_LIGHTING);
    }
    lightingEnabled = enabled;
}

Watch out: if you sometimes bypass your wrapper, the mirrored state no longer matches what GL has, and your program will get confused. If you wrap a GL state function, ALWAYS use the wrapper.
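One way to catch that kind of bypass during development is to compare the mirrored value against what GL itself reports, for example with glIsEnabled, in debug builds only (querying GL state can itself be slow). The sketch below assumes that approach. The mock_* functions and mirror_in_sync are hypothetical names; the mocks stand in for glEnable(GL_LIGHTING), glDisable(GL_LIGHTING) and glIsEnabled(GL_LIGHTING) so the example runs without a GL context.

```c
#include <stdbool.h>

/* Stand-in for the driver's real lighting state (assumption). */
static bool mock_gl_lighting = false;
static void mock_glEnableLighting(void)    { mock_gl_lighting = true;  }
static void mock_glDisableLighting(void)   { mock_gl_lighting = false; }
static bool mock_glIsLightingEnabled(void) { return mock_gl_lighting;  }

static bool lightingEnabled = false;   /* the app-side mirror */

static void MySetEnableLighting(bool enabled)
{
    if (enabled == lightingEnabled) {
        return;                        /* redundant: skip the GL call */
    }
    if (enabled) {
        mock_glEnableLighting();
    } else {
        mock_glDisableLighting();
    }
    lightingEnabled = enabled;
}

/* Debug-only check: does the mirror still match what GL reports? */
static bool mirror_in_sync(void)
{
    return lightingEnabled == mock_glIsLightingEnabled();
}
```

If some code disables lighting directly after the wrapper enabled it, mirror_in_sync() returns false, and a later MySetEnableLighting(true) is silently skipped because the mirror still says lighting is on.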

  • Matt

I don’t know what my whole obsession with “static” for the functions’ declarations was. By no means must those functions be static.

  • Matt

About “Watch out, if you sometimes bypass your wrapper”: that is the kind of thing I think can make my code less clean and more prone to bugs, but I get the point that state changes are expensive.
I have read a lot of “minimize state changes” advice, but only in this topic did I really get an idea of how much of a performance hit they are.

[This message has been edited by coco (edited 01-18-2001).]

Would that inline be nothing more than

#define EnableLighting(enabled) \
    do { \
        if ((enabled) != lightingEnabled) { \
            if (enabled) glEnable(GL_LIGHTING); \
            else glDisable(GL_LIGHTING); \
            lightingEnabled = (enabled); \
        } \
    } while (0)

Should that work or not? Does the preprocessor do that find and replace faster? I mean, it simply has to cut and paste, whereas the compiler needs to replace the function call with the actual function body. Well, okay…

[This message has been edited by Michael Steinberg (edited 01-18-2001).]