Bone Skinning!

Hi all!

I hope I'll be clear enough in explaining my question…

The situation is that I'm trying to feed bone weight values to a vertex shader… To do so, I'm using one vertex attribute… It all makes sense to send one set of weights through a vertex attribute… but considering that many bones can affect a single vertex, I'm confused about sending more than one set of weight values.

Which approach should I use:

1 – use one vertex attribute per bone, or
2 – use the same vertex attribute but send more than one set of values in it?

Considering solution number 2, since the vertex attribute values are supposed to be in the same order as the vertices… how do you organize the values? And especially, how do you retrieve them? Perhaps there is a simple solution to this… I'd really appreciate it if someone could help me figure this out.

thx

In general, programs adhere to a maximum of 4 bones affecting one vertex. If you wish to push beyond this limitation (and I strongly suggest that you investigate the reasons why you want to; those wanting the feature should be required to explain themselves, and the explanation had better be good), it's quite simple: use another attribute.

Your code then loops over the components of attribute 1, then loops over the components of attribute 2. If the weight is 0, you do the math anyway (which is one reason to limit it to 4).
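As a rough illustration of that two-attribute loop, here is a CPU-side C++ model of what such a shader computes, assuming bones 0-3 take their weights from the first vec4 attribute and bones 4-7 from the second. The names (skinEightBones, Vec4, Mat4, mul) are made up for this sketch and it is not shader code; it only shows the loop order and that zero-weight slots still cost a matrix multiply.

```cpp
#include <array>

// Minimal stand-ins for GLSL vec4 / mat4. Matrices are column-major like
// OpenGL's: element (row r, column c) lives at m[c * 4 + r].
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;

Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{0.0f, 0.0f, 0.0f, 0.0f};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[c * 4 + r] * v[c];
    return out;
}

// CPU model of an 8-bone vertex shader: bones 0-3 take their weights from
// the first vec4 attribute, bones 4-7 from the second. Every slot is
// transformed, even when its weight is 0, as the unrolled shader loop does.
// matMulCount reports how many mat4 * vec4 products were performed.
Vec4 skinEightBones(const Vec4& vertex, const Vec4& weights0, const Vec4& weights1,
                    const std::array<Mat4, 8>& bones, int& matMulCount) {
    const Vec4* attributes[2] = {&weights0, &weights1};
    Vec4 result{0.0f, 0.0f, 0.0f, 0.0f};
    matMulCount = 0;
    for (int a = 0; a < 2; ++a) {          // attribute 1, then attribute 2
        for (int c = 0; c < 4; ++c) {      // its x, y, z, w components
            const float weight = (*attributes[a])[c];
            const Vec4 p = mul(bones[a * 4 + c], vertex);  // runs even if weight == 0
            ++matMulCount;
            for (int k = 0; k < 4; ++k)
                result[k] += weight * p[k];
        }
    }
    return result;
}
```

With two vec4 attributes the loop always performs 8 matrix products per vertex, whether the vertex uses one bone or eight.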

Got it, thx Korval!

I think I just found something really interesting; if it is true, you guys have probably known this for a while now… A couple of months ago, I tried using vertex attribute ids over 15… and it didn't work on NVIDIA cards back then… I'm currently using the 91.47 driver and it seems to work now… I even tried ids up to 60 and it worked… (good news!!! Oh please don't burst my bubble)… Am I taking crazy pills, or did we get upgraded?

Thx again

Did you actually use 60 indices, or just index #60?

Index #60, as a test with tangent attributes, and it worked…

It is said in the Orange Book that attribute ids should be kept low anyway, for performance reasons.

But in any case, you cannot actually use more ids at once than before. It seems they just enhanced the driver to detect which ids are really used.

Can you think of a more elegant way to use the attributes? I've tried:

the attribute float a_Weights[2]; technique, but it did not work…

attribute float a_Weights0;
attribute float a_Weights1;

uniform mat4 u_BoneMatrix[2];
uniform int u_BoneCount;

void main(void)
{
	vec4 l_position = vec4(0.0, 0.0, 0.0, 0.0);
	for (int i = 0; i < u_BoneCount; ++i)
	{
		if (i == 0)
			l_position += a_Weights0 * (u_BoneMatrix[i] * gl_Vertex);
		else if (i == 1)
			l_position += a_Weights1 * (u_BoneMatrix[i] * gl_Vertex);
	}

	gl_Position = gl_ModelViewProjectionMatrix * l_position;
}

You can pack the weights into one vec2 attribute.

If you are rendering a character that has a skeleton with multiple bones, it is useful to upload several matrices (as many as you can) and send the bone indices together with the weights, either in a single attribute (for the two-bone case) or in a separate attribute (for the four-bone case).

Hi Komat, this is interesting…

it is useful to upload several matrices (as many as you can) and send bone indices together with the weights using a single attribute
Did you mean one matrix per weight value?

attribute mat4 Weights;

Weights[0] = bone1 weight value
Weights[1] = bone2 weight value
Weights[2] = bone3 weight value
…

It brings up another question… I'm using VBOs set to DYNAMIC_DRAW. Is there any way to avoid calling glBufferData each frame when updating the weights? I'm asking because packing the data into a matrix4 each frame would be a somewhat heavy process.

edit: nvm the last question… I just switched from FFP skinning to GLSL… and it no longer needs to be updated each frame…

Originally posted by Golgoth:
Did you mean one matrix per weight value?

attribute mat4 Weights;

Weights[0] = bone1 weight value
Weights[1] = bone2 weight value
Weights[2] = bone3 weight value
Something like:

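attribute vec4 Weights;
attribute vec4 Indices;   // bone ids, sent as normalized GL_UNSIGNED_BYTE

uniform mat4 Matrices[ 20 ] ;

void main( void )
{
   vec4 position = vec4( 0.0 ) ;

   for ( int i = 0 ; i < 4 ; i++ ) {
        // Map the normalized <0,1> value back to a 0-255 bone id;
        // + 0.5 rounds to nearest instead of truncating.
        mat4 matrix = Matrices[ int( Indices[ i ] * 255.0 + 0.5 ) ] ;
        position += Weights[ i ] * ( matrix * gl_Vertex );
   }

   gl_Position = gl_ModelViewProjectionMatrix * position ;
}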

Hmm… it all seems to make sense, but I'm not quite sure what you meant there…

Is the loop above for 4 bones? Did you mean:

bone1 = Weights.x;
bone2 = Weights.y;
bone3 = Weights.z;
bone4 = Weights.w;
Indices = bone ids; ?

What does the 255.0 stand for? An offset of some kind?

Why would you have 20 matrices? Does that bring the possibilities up to 20 bones at once?

Thx for your patience, I guess I'm a slow learner :)

Originally posted by Golgoth:
Is the loop above for 4 bones? Did you mean:

bone1 = Weights.x;
bone2 = Weights.y;
bone3 = Weights.z;
bone4 = Weights.w;
Indices = bone ids; ?
Yes. For vector types, Weights.x is the same as Weights[ 0 ].


What does the 255.0 stand for? An offset of some kind?

This is written for the situation where the bone indices are sent to the shader as four normalized unsigned bytes (GL_UNSIGNED_BYTE) to save space, so the 255 converts the normalized <0,1> range the shader receives back into the original <0,255> range of the unsigned bytes.
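A small C++ sketch of that round trip, assuming the indices are uploaded with normalized set to GL_TRUE so the GPU maps 0-255 onto 0.0-1.0. normalizeUByte and decodeBoneIndex are hypothetical helpers modelling the GPU side and the shader side. The + 0.5 is an addition to the shader expression: it rounds to nearest, so if floating-point error makes the product land slightly below the integer, it is not truncated to the wrong bone.

```cpp
#include <cstdint>

// What the GPU does with a GL_UNSIGNED_BYTE attribute when normalized is
// GL_TRUE: 0..255 is mapped onto 0.0..1.0 (value / 255).
float normalizeUByte(std::uint8_t b) {
    return static_cast<float>(b) / 255.0f;
}

// What the shader does to get the bone index back:
// int( Indices[ i ] * 255.0 + 0.5 ). Adding 0.5 before the conversion
// rounds to nearest instead of truncating.
int decodeBoneIndex(float normalized) {
    return static_cast<int>(normalized * 255.0f + 0.5f);
}
```

Another option is to pass GL_FALSE for normalized in glVertexAttribPointer; the shader then receives 17.0 for byte 17 and only needs the int() conversion, with no scaling.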


Why would you have 20 matrices? Does that bring the possibilities up to 20 bones at once?

It allows you to draw a mesh that has up to 20 bones in one draw call. Each vertex can be skinned by up to four of those 20 bones.
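To make the indexed version concrete, here is a CPU reference of that shader in C++, under the same assumptions: a palette of 20 column-major matrices and four (index, weight) pairs per vertex. SkinnedVertex, skin and kPaletteSize are illustrative names, not part of any API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // column-major: element (r, c) at m[c * 4 + r]

constexpr std::size_t kPaletteSize = 20;  // "uniform mat4 Matrices[ 20 ]"

Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{0.0f, 0.0f, 0.0f, 0.0f};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[c * 4 + r] * v[c];
    return out;
}

// One vertex with four (index, weight) pairs, like the Indices/Weights attributes.
struct SkinnedVertex {
    Vec4 position;
    std::array<std::uint8_t, 4> indices;  // bone ids into the palette
    Vec4 weights;                         // expected to sum to 1
};

// Each vertex picks its own four matrices out of the shared palette,
// so one draw call can cover a mesh with up to kPaletteSize bones.
Vec4 skin(const SkinnedVertex& v, const std::array<Mat4, kPaletteSize>& palette) {
    Vec4 result{0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; ++i) {
        // .at() throws std::out_of_range for an index outside the palette.
        const Vec4 p = mul(palette.at(v.indices[i]), v.position);
        for (int k = 0; k < 4; ++k)
            result[k] += v.weights[i] * p[k];
    }
    return result;
}
```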

Oh, great, great, I'm starting to see it sparkle!

If we send the matrices in the right order, can we expect it to work without sending the indices?

	for (int i = 0; i < u_BoneCount; ++i)
		l_position += Weights[i] * (Matrix[i] * gl_Vertex);

A little off topic, but I'm confused about one thing; I could still use your wisdom here, if it is not too much to ask.

My Envelope class looks like this. These Envelopes are the data holders for the bone weights. Following your proposal, I'll have to go through all of them, pack the data into a Vector4, make a VBO and send it to the shader.

class Envelope: public Object
{
	DeclareClass(Envelope, Object);

	public:

		inline Envelope(): Object(),
			m_Bone(NULL) {}

		inline ~Envelope()
		{
			m_Weights.Delete();
			m_Bone = NULL;
		}
		inline Bone *GetBone()				{return m_Bone;}
		inline Array<Float> &GetWeights()		{return m_Weights;}


		void SetBone(Bone *in_bone);

	protected:

		Bone *m_Bone;
		Array<Float> m_Weights;
};

As you can see, I have no structure of bone chains per se… so I can't really declare several sets of weights in an Array<Vector4> anywhere for now. So I guess I'll have to do it on the fly.

For our case, how would you pack 4 Array<Float>s into one Array<Vector4>? In fact, do I really need to? Is there any way to use my Array<Float>s and send them to the shader as a vec4 without packing them into a Vector4 first? My last resort would be to loop through all of the weight values and assign them to an Array<Vector4>, but I'm hoping to find a smarter way to do that.

thx

Originally posted by Golgoth:

If we send the matrices in the right order, can we expect it to work without sending the indices?

Yes. The main purpose of the indices was to speed up rendering of meshes with a high number of bones and bone combinations.


For our case, how would you pack 4 Array<Float>s into one Array<Vector4>? In fact, do I really need to? Is there any way to use my Array<Float>s and send them to the shader as a vec4 without packing them into a Vector4 first?

Because each bone has its own array of floats, there is no direct way to send the weights for several bones as one vec4. You have to pack the values into a different array; however, you can store that array somewhere within the mesh and update it only when the envelopes change.
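One way to do that repacking, sketched in C++: EnvelopeData is a hypothetical stand-in for the Envelope class (one bone id plus one weight per vertex, with std::vector in place of Array<Float>), and packInfluences builds the per-vertex Weights/Indices layout discussed earlier in the thread. Keeping only the four largest influences and renormalizing them is an assumption of this sketch; it is a common choice, but the thread does not prescribe it.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for Envelope: one bone id plus one weight per vertex.
struct EnvelopeData {
    std::uint8_t boneIndex;
    std::vector<float> weights;  // weights[v] = influence of this bone on vertex v
};

struct PackedInfluences {
    std::array<float, 4> weights{};         // for the vec4 Weights attribute
    std::array<std::uint8_t, 4> indices{};  // for the Indices attribute (GL_UNSIGNED_BYTE)
};

// Rebuild the per-vertex layout when the envelopes change, and keep the
// result with the mesh instead of repacking every frame.
std::vector<PackedInfluences> packInfluences(const std::vector<EnvelopeData>& envelopes,
                                             std::size_t vertexCount) {
    std::vector<PackedInfluences> packed(vertexCount);
    for (std::size_t v = 0; v < vertexCount; ++v) {
        // Gather every non-zero (weight, bone) pair for this vertex.
        std::vector<std::pair<float, std::uint8_t>> influences;
        for (const EnvelopeData& e : envelopes)
            if (v < e.weights.size() && e.weights[v] > 0.0f)
                influences.emplace_back(e.weights[v], e.boneIndex);

        // Keep the four strongest influences.
        std::sort(influences.begin(), influences.end(),
                  [](const auto& a, const auto& b) { return a.first > b.first; });
        if (influences.size() > 4)
            influences.resize(4);

        // Renormalize so the kept weights sum to 1; unused slots stay at
        // weight 0 and index 0.
        float total = 0.0f;
        for (const auto& inf : influences)
            total += inf.first;
        for (std::size_t i = 0; i < influences.size(); ++i) {
            packed[v].weights[i] = influences[i].first / total;
            packed[v].indices[i] = influences[i].second;
        }
    }
    return packed;
}
```

The packed array can be uploaded into a VBO once and rebuilt only when an envelope changes.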

Thank you, O grand wizard :)

I'm still working on a mega shader, so, to cover the widest range of possibilities, let's say that we always send one mat4 instead of a vec4 to store the weight values… which would give a 16-bone cap, even if we're only using 1-4 bones… How bad would the penalties be? None, we can live with it… on the edge, or is it insane, forget about it?

Originally posted by Golgoth:

I'm still working on a mega shader, so, to cover the widest range of possibilities, let's say that we always send one mat4 instead of a vec4 to store the weight values… which would give a 16-bone cap, even if we're only using 1-4 bones… How bad would the penalties be? None, we can live with it… on the edge, or is it insane, forget about it?

You need to measure it in your case to find out whether it is acceptable.

There are several types of cost:

  • Cost of the resources used. The number of attributes and the total size of uniforms accessible from the shader are limited. The number of shader instructions is also limited. With the mega shader, it is possible that you hit such a limit and the shader will fail to compile.

  • Runtime cost on the GPU. If you are not limited by vertex processing speed, doing some unnecessary work inside the shader will probably not matter on hardware with separate vertex shading units. On the latest GPUs with a unified architecture, unnecessary work within the vertex shader might reduce fragment shading performance.

  • CPU/GPU cost associated with API calls and GPU state changes. If you upload many values and use them only for a few vertices, it will consume more CPU time and the GPU might not operate at full efficiency.
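To put a rough number on the mat4-of-weights idea, the arithmetic below counts scalar multiply-adds per vertex for the weighted-matrix loop and the uniform floats a bone palette needs. These are simplified estimates (the compiler may map them onto vec4 instructions differently), the helper names are hypothetical, and the real limits have to be queried from the driver, for example GL_MAX_VERTEX_UNIFORM_COMPONENTS with glGetIntegerv.

```cpp
// Scalar multiply-adds per vertex for the weighted-matrix loop: one
// mat4 * vec4 (16 multiply-adds) plus one weighted accumulate (4) for every
// slot the shader walks through. Zero-weight slots are still counted,
// because the loop does not skip them.
constexpr int madsPerVertex(int slots) { return slots * (16 + 4); }

// Uniform storage for a palette of mat4 bone matrices, in float components.
constexpr int paletteUniformComponents(int bones) { return bones * 16; }

static_assert(madsPerVertex(4) == 80, "4 slots (vec4 of weights)");
static_assert(madsPerVertex(16) == 320, "16 slots (mat4 of weights)");
static_assert(paletteUniformComponents(16) == 256, "16 bone matrices");
```

Under this estimate, a 16-slot shader does about four times the skinning arithmetic of a 4-slot one, even for vertices that only use one bone.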

Alright, I'll run some tests… I just got everything in place and it is working… There is another thing that we discussed in the past.

These combinations work:

for (int i = 0; i < u_BoneCount; ++i)
	if (i == 0)
		l_position += a_Weights[0] * (u_BoneMatrix[i] * gl_Vertex);

for (int i = 0; i < u_BoneCount; ++i)
	if (i == 0)
		l_position += a_Weights.x * (u_BoneMatrix[i] * gl_Vertex);

for (int i = 0; i < 2; ++i)
	if (i == 0)
		l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

but,

for (int i = 0; i < u_BoneCount; ++i)
	if (i == 0)
		l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

returns error C1011: cannot index a non-array value

You'll say it is not too much of a problem, since I can use:

for (int i = 0; i < u_BoneCount; ++i)
	if (i == 0)
		l_position += a_Weights[0] * (u_BoneMatrix[i] * gl_Vertex);
	else if (i == 1)
		l_position += a_Weights[1] * (u_BoneMatrix[i] * gl_Vertex);
and so on…

With all the tests I've made, I quickly came to a conclusion: let's say I use else if (i == 2516) here, which of course will never be met… but, but, but, I've found out that what is inside the condition still takes processing time even though the condition returns false; it takes as much processing as if it were returning true. Weird? Yeah, tell me about it; this is a really painful downside for me. I hope my explanation is clear, because this is really slowing me down in designing the mega shader architecture. Since this issue was tested with gl_light[x] variables, I'll run more tests with the current example, because the same thing will happen even if only one bone is being processed here.

thx again

Originally posted by Golgoth:
These combinations work:

for (int i = 0; i < 2; ++i)
	if (i == 0)
		l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

but,

for (int i = 0; i < u_BoneCount; ++i)
	if (i == 0)
		l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

returns error C1011: cannot index a non-array value
A vec4 is not a real array. a_Weights[0] is just a different way to write a_Weights.x, and the index cannot be chosen dynamically (the hardware does not support that). When the number of iterations is known, as in the first situation, the compiler unrolls the loop the appropriate number of times and inserts the .x/.y/.z/.w masks corresponding to the individual values of i. In the second situation the compiler cannot do that because it does not know how large u_BoneCount can be, so you get that error.
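A loose C++ analogy of that unrolling, assuming C++17: std::get<I> only accepts a compile-time index, much like a .x/.y/.z/.w mask, so a loop over it has to be expanded at compile time from a constant trip count. sumFirst and component are made-up names for this illustration; there is no sumFirst<n>() for a runtime int n, which is the same wall the shader compiler hits with u_BoneCount.

```cpp
#include <array>
#include <cstddef>
#include <utility>

using Vec4 = std::array<float, 4>;

// std::get<I> needs I at compile time, much like a .x/.y/.z/.w mask in GLSL.
template <std::size_t I>
float component(const Vec4& v) {
    return std::get<I>(v);
}

// Expands to component<0>(v) + component<1>(v) + ...: one fixed component
// per "iteration", which is what unrolling a constant-count loop produces.
template <std::size_t... Is>
float sumComponents(const Vec4& v, std::index_sequence<Is...>) {
    return (component<Is>(v) + ... + 0.0f);
}

// N must be a compile-time constant; a runtime value such as a uniform bone
// count cannot be used here, the C++ counterpart of error C1011.
template <std::size_t N>
float sumFirst(const Vec4& v) {
    static_assert(N <= 4, "a vec4 only has four components");
    return sumComponents(v, std::make_index_sequence<N>{});
}
```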


but, but, but, I've found out that what is inside the condition still takes processing time even though the condition returns false; it takes as much processing as if it were returning true. Weird?

This is even worse in fragment shaders. GPUs are based on massive parallelism and have trouble when the instruction flow differs between individual pixels or vertices, so the drivers often try to avoid dynamic jumps: they calculate everything and simply ignore the results of calculations that should not happen.

You are not understanding the process that you’re trying to do.

You have a mesh. Artists created this mesh. They attached bones to this mesh, and weighted the vertices to these bones.

So, what you have in your tool is the following:

A list of bones, 0 to nBones.
A list of vertices, 0 to nVerts.

For each vertex, you have (among other things) a number of Weight/Index pairs. These pairs tell the vertex which bone it uses (through the index) and how much weight to apply to that bone (through the weight).

The number of Weight/Index pairs per-vertex was defined by your artist, but you should enforce a specific maximum limit, nIndices.

When you go to generate your in-application mesh for rendering, you store all of this data. You store the list of bones. You store the vertex data. And, for each vertex, you store the weight/index pairs, up to nIndices per vertex.

Your shader will be given the weight/index pairs as attributes, since they vary per-vertex. The list of bones (generated from your animation system in the order expected by your mesh) is a per-object value, so they should be passed as an array of uniform values.

So, your loop for transforming your vertices looks like this:

attribute vec4 indices;
attribute vec4 weights;

...

for(int i = 0; i < 4 /*nIndices*/; i++)
{
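  mat4 boneToUse = boneArray[int(indices[i])];  // attributes are floats; array indices must be ints
  float weightToUse = weights[i];

  /* Do math for vertex weighting, e.g.
     position += weightToUse * (boneToUse * gl_Vertex); */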
}

That’s why 4 is a reasonable number for nIndices. If your artists disagree, smack them.
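A C++ model of why the fixed 4-slot loop does not need a separate 2-bone shader: if a vertex only uses two bones, its last two weights are simply 0, and running all four slots produces the same position as running two. skinSlots is a hypothetical CPU-side helper, not shader code; the extra slots still cost arithmetic, as noted earlier in the thread, but the result is unchanged.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // column-major: element (r, c) at m[c * 4 + r]

Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{0.0f, 0.0f, 0.0f, 0.0f};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[c * 4 + r] * v[c];
    return out;
}

// Weighted matrix skinning over the first `slots` influences. A shader with
// `i < 4` hard-coded corresponds to slots == 4.
Vec4 skinSlots(const Vec4& position, const std::array<Mat4, 4>& bones,
               const Vec4& weights, int slots) {
    Vec4 result{0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < slots; ++i) {
        const Vec4 p = mul(bones[i], position);
        for (int k = 0; k < 4; ++k)
            result[k] += weights[i] * p[k];  // adds 0 when weights[i] == 0
    }
    return result;
}
```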

This is even worse in fragment shaders. GPUs are based on massive parallelism and have trouble when the instruction flow differs between individual pixels or vertices, so the drivers often try to avoid dynamic jumps: they calculate everything and simply ignore the results of calculations that should not happen.

This is the only reason I can't go further with the mega shader… It is really unfortunate… and I'm going to have to construct 1000s of shaders using string definitions…

Like Korval mentioned:

for(int i = 0; i < 4; i++)
{
}

If only 2 bones are being used, I'll have to make another shader:

for(int i = 0; i < 2; i++)
{
}

I just hate this concept.

Thx for your input, guys, I think I'm all set!

regards