Skeletal animation shader, use one compiled program or multiple?

I have a skeletal animation shader to which I send all bone transforms as uniform matrices.

I use a single shader program right now for multiple skeletons. That means that even if a skeleton doesn’t move, I still need to send its bone matrices to the GPU again, since the single program’s uniforms are shared by every skeleton.

My question: is this how people typically do it? Or do others create multiple instances of the same shader so they can potentially send less data to the GPU if a skeleton isn’t moving?

Mostly I just want to know how most people use shader programs for this (one program vs multiple) so I know whether I am or am not doing something dumb here.

I think most people just use one program and change the uniforms. Whether this is faster than switching programs can’t be answered in general, as it depends on your GPU and driver.

If you don’t have to support old OpenGL, it may be simpler and faster to keep your bones in Uniform Buffer Objects. That way the data is (hopefully) kept in GPU memory and you can reuse it as often as you want, even with different programs.
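As a rough sketch of the bookkeeping this involves, the snippet below lays out several skeletons' bone matrices in one buffer so that each skeleton can later be bound as its own range (for example with `glBindBufferRange`). The names `BoneRange`, `layoutBoneRanges` and `alignUp` are made up for illustration, and the alignment value is assumed to come from querying `GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT` on your driver. No OpenGL calls are made here; it only computes offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One 4x4 float matrix: 16 floats. In a std140 array of mat4 the stride is the same.
constexpr std::size_t kMat4Bytes = 16 * sizeof(float);

// Round `value` up to the next multiple of `alignment`.
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical record: where one skeleton's matrices live inside the shared buffer.
struct BoneRange {
    std::size_t offset;  // byte offset, a multiple of the driver's offset alignment
    std::size_t size;    // byte size of this skeleton's matrix array
};

// Place each skeleton's matrices one after another, padding so that every
// range starts on an offset the driver accepts for a uniform buffer binding.
inline std::vector<BoneRange> layoutBoneRanges(const std::vector<std::size_t>& boneCounts,
                                               std::size_t offsetAlignment) {
    std::vector<BoneRange> ranges;
    std::size_t cursor = 0;
    for (std::size_t count : boneCounts) {
        cursor = alignUp(cursor, offsetAlignment);
        const std::size_t size = count * kMat4Bytes;
        ranges.push_back({cursor, size});
        cursor += size;
    }
    return ranges;
}
```

With a layout like this, a skeleton that hasn't moved needs no upload at all: you only rebind its range before drawing, and only skeletons whose pose changed get their range rewritten.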

Definitely start with only one shader.

You can then take measurements to find out whether it is a performance problem. The test is simple: run the shader without updating the uniforms and compare the timings.
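A minimal timing helper for that kind of comparison might look like the following. `averageMillis` is a hypothetical name, and the callback is assumed to render one frame. Since OpenGL commands are queued asynchronously, the callback would need to end with something like `glFinish()` for a CPU-side clock to mean anything; GPU timer queries are the more precise alternative.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Call `frame` `iterations` times and return the average wall-clock
// time per call in milliseconds.
inline double averageMillis(const std::function<void()>& frame, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        frame();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

You would call it once with a frame function that uploads the bone uniforms and once with one that skips the upload, using a few hundred iterations each, and compare the two averages.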

It is far from trivial, and requires a lot of experience, to guess where the performance bottlenecks are in OpenGL.

I’m unsure why you’re saying that you still need to send the matrices when a skeleton doesn’t move. Let me ask you: At any given time, are your skeletal characters only playing back pre-modeled animation tracks (i.e. sequences of joint pose keyframes)? If so, then this is all uniform data that can be uploaded to the GPU once and left there (in textures, UBOs, etc.). There shouldn’t be any re-uploading involved (unless you’ve got such a huge animation palette that it can’t fit on the GPU). Just re-use it (bind it for your shader to use). With that, all you need per character instance is the tiny amount of data that says where to look up into the joint pose transform palette.
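To make the "tiny amount of data per instance" concrete, here is one possible indexing scheme, assuming the palette stores every clip as consecutive frames of `jointCount` matrices each. The struct and function names are invented for this sketch, and the same arithmetic would normally live in the vertex shader rather than on the CPU.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-instance record: everything a character needs in order
// to find its current pose inside a palette that was uploaded once.
struct InstanceAnimState {
    std::uint32_t clipBase;    // index of the clip's first matrix in the palette
    std::uint32_t jointCount;  // matrices per frame
    std::uint32_t frameCount;  // frames in the clip
    std::uint32_t frame;       // current frame counter (may run past frameCount)
};

// Index of the matrix for `joint` at the instance's current frame,
// assuming frames are stored one after another and the clip loops.
inline std::uint32_t paletteIndex(const InstanceAnimState& s, std::uint32_t joint) {
    assert(s.frameCount > 0);
    assert(joint < s.jointCount);
    const std::uint32_t frame = s.frame % s.frameCount;
    return s.clipBase + frame * s.jointCount + joint;
}
```

Per character that is four integers per frame instead of a full set of bone matrices.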

OTOH, if you are doing dynamically-generated animation using IK, physics, complex blending, etc. then you may need to upload considerably more per character depending on where you do those calculations.

But to your specific question, yeah, generally prefer fewer program changes to fewer uniform changes. Uniform changes are relatively cheap. There are a number of ways to manage that though, such as state sorting. Where you want to think about having separate programs is when you’d otherwise have a lot of divergent fragment shader code across a filled area (think “if” or “for” statements where the value of the condition is not very coherent across polygons). This can result in many of your GPU cores wasting their time in “no-op” mode.
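State sorting, in its simplest form, just means ordering the draw list so that draws sharing a program end up next to each other. The sketch below uses a made-up `DrawCall` record and counts how many program binds a given order would need; it is only meant to show the idea, not how any particular engine does it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical draw record: which program to bind and which skeleton's
// uniforms to set before issuing the draw.
struct DrawCall {
    unsigned program;
    unsigned skeleton;
};

// Group draws by program; stable_sort keeps the original order within a group.
inline void sortByProgram(std::vector<DrawCall>& calls) {
    const auto byProgram = [](const DrawCall& a, const DrawCall& b) {
        return a.program < b.program;
    };
    std::stable_sort(calls.begin(), calls.end(), byProgram);
    assert(std::is_sorted(calls.begin(), calls.end(), byProgram));
}

// Number of program binds needed if the list is drawn in its current order.
inline int countProgramSwitches(const std::vector<DrawCall>& calls) {
    int switches = 0;
    bool haveBound = false;
    unsigned bound = 0;
    for (const DrawCall& c : calls) {
        if (!haveBound || c.program != bound) {
            ++switches;
            bound = c.program;
            haveBound = true;
        }
    }
    return switches;
}
```

For a list that alternates between two programs across four draws, this goes from four binds to two after sorting, while the per-skeleton uniform updates stay the same.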