Bones Skinning!

golgoth13 · November 30, 2006, 1:37pm

Hi all!

I hope I'll be clear enough in explaining my question...

The situation is that I'm trying to feed bone weight values to a vertex shader. To do so, I'm using one vertex attribute. It all makes sense to send one set of weights through a vertex attribute, but considering that many bones can affect a single vertex, I'm confused about sending more than one set of weight values.

Should I:

1 - use one vertex attribute per bone, or
2 - use the same vertex attribute but send more than one set of values in it?

Considering solution number 2, since the vertex attribute values are supposed to be in the same order as the vertices, how do you organize the values? And especially, how do you retrieve them? Perhaps there is a simple solution to this... I'd really appreciate it if someone could help me figure this out.

thx

Korval · November 30, 2006, 3:54pm

In general, programs adhere to a maximum of 4 bones affecting one vertex. If you wish to push beyond this limitation (and I strongly suggest that you investigate the reasons why you want to. Those wanting the feature should be required to explain themselves, and the explanation had better be good), it's quite simple: use another attribute.

Your code then loops over the components of attribute 1, then loops over the components of attribute 2. If the weight is 0, you do the math anyway (which is one reason to limit it to 4).
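To make the two-attribute approach concrete, here is a CPU-side C++ sketch (not shader code) of what such a vertex shader computes for up to eight bones: it walks the four components of the first weight/index pair, then the four of the second. The Vec4/Mat4 aliases, the column-major layout and the EightBoneVertex struct are assumptions made for illustration. Note that the matrix product is evaluated even when a weight is 0, which is the wasted work mentioned above.

```cpp
#include <array>
#include <cassert>

// Minimal stand-ins for GLSL's vec4/mat4 (assumed column-major, like OpenGL).
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>; // element (row r, col c) stored at m[c * 4 + r]

Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{0.0f, 0.0f, 0.0f, 0.0f};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[c * 4 + r] * v[c];
    return out;
}

// Hypothetical vertex with up to 8 influences split over two vec4 "attributes".
struct EightBoneVertex {
    Vec4 position;
    Vec4 weights0, weights1;
    Vec4 indices0, indices1; // bone indices stored as floats, as attributes would be
};

Vec4 skin(const EightBoneVertex& v, const Mat4* bones, int boneCount) {
    Vec4 result{0.0f, 0.0f, 0.0f, 0.0f};
    const Vec4* weights[2] = {&v.weights0, &v.weights1};
    const Vec4* indices[2] = {&v.indices0, &v.indices1};
    for (int a = 0; a < 2; ++a) {         // attribute 1, then attribute 2
        for (int i = 0; i < 4; ++i) {     // each component of that attribute
            int bone = static_cast<int>((*indices[a])[i]);
            assert(bone >= 0 && bone < boneCount);
            // Computed even when the weight is 0, as a fixed-size loop would.
            Vec4 p = mul(bones[bone], v.position);
            for (int k = 0; k < 4; ++k)
                result[k] += (*weights[a])[i] * p[k];
        }
    }
    return result;
}
```

For example, blending an identity bone and a bone translated by (2, 0, 0) with weights 0.5/0.5 moves the point (1, 1, 1) to (2, 1, 1).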

golgoth13 · November 30, 2006, 5:41pm

Got it, thx korval!

I think I just found something really interesting; if it is true, you guys have probably known this for a while now... A couple of months ago, I tried using vertex attributes with an index above 15, and it didn't work on NVIDIA cards back then. I'm currently using the 91.47 driver and it seems to work now. I even tried indices up to 60 and it worked... (good news!!! Oh please don't burst my bubble). Am I taking crazy pills, or did we get upgraded?

Thx again

Korval · November 30, 2006, 5:56pm

Did you actually use 60 indices, or just index #60?

golgoth13 · November 30, 2006, 6:41pm

#60, as a test with tangent attributes, and it worked…

ZbuffeR · December 1, 2006, 2:28am

The Orange Book says that attribute indices should be kept low anyway, for performance reasons.

But in any case, you cannot actually use more attributes than before. It seems they just enhanced the driver to detect which indices are really used.

golgoth13 · December 1, 2006, 7:25am

Can you think of a more elegant way to use the attributes? I've tried the attribute float a_Weights[2]; technique, but it did not work..., so for now I'm using two separate attributes:

attribute float a_Weights0;
attribute float a_Weights1;

uniform mat4 u_BoneMatrix[2];
uniform int u_BoneCount;

void main(void)
{
	vec4 l_position = vec4(0.0, 0.0, 0.0, 0.0);
	for (int i = 0; i < u_BoneCount; ++i)
	{
		if (i == 0)
			l_position += a_Weights0 * (u_BoneMatrix[i] * gl_Vertex);
		else if (i == 1)
			l_position += a_Weights1 * (u_BoneMatrix[i] * gl_Vertex);
	}

	gl_Position = gl_ModelViewProjectionMatrix * l_position;
}

Komat · December 1, 2006, 8:56am

You can pack the weights into one vec2 attribute.

If you are rendering a character that has a skeleton with multiple bones, it is useful to upload several matrices (as many as you can) and send bone indices together with the weights, using a single attribute (for the two-bone case) or a separate attribute (for the four-bone case).
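A minimal sketch of the two-bone layout, assuming an interleaved vertex in which the weights and the bone indices share one vec4 attribute laid out as (w0, w1, i0, i1). TwoBoneVertex and packTwoBones are hypothetical names. The stride and offset computed here are the values you would pass to glVertexAttribPointer for that attribute; in the shader the indices would then be read back with something like int(boneData.z).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interleaved vertex for the two-bone case.
struct TwoBoneVertex {
    float position[4];
    float boneData[4]; // x = weight 0, y = weight 1, z = bone index 0, w = bone index 1
};

// Values for glVertexAttribPointer(attrib, 4, GL_FLOAT, GL_FALSE, stride, offset).
constexpr std::size_t kStride = sizeof(TwoBoneVertex);
constexpr std::size_t kBoneDataOffset = offsetof(TwoBoneVertex, boneData);

// Packs two weight/index pairs into the single vec4 attribute.
void packTwoBones(TwoBoneVertex& v, float w0, int bone0, float w1, int bone1) {
    assert(bone0 >= 0 && bone1 >= 0);
    v.boneData[0] = w0;
    v.boneData[1] = w1;
    v.boneData[2] = static_cast<float>(bone0);
    v.boneData[3] = static_cast<float>(bone1);
}
```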

golgoth13 · December 1, 2006, 10:02am

Hi Komat, this is interesting...

it is useful to upload several matrices (as many as you can) and send bone indices together with the weights using a single attribute
Did you mean one matrix per weight value?

attribute mat4 Weights;

Weights[0] = bone1 weight value
Weights[1] = bone2 weight value
Weights[2] = bone3 weight value
...

It brings up another question... I'm using VBOs set to DYNAMIC_DRAW. Is there any way to avoid calling glBufferData every frame when updating the weights? I'm asking because packing the data into a matrix4 every frame would be a somewhat heavy process.

edit: never mind the last question... I just switched from FFP skinning to GLSL, and the buffer no longer needs to be updated each frame...

Komat · December 1, 2006, 12:05pm

Originally posted by Golgoth:
Did you mean one matrix per weight value?

attribute mat4 Weights;

Weights[0] = bone1 weight value
Weights[1] = bone2 weight value
Weights[2] = bone3 weight value
Something like:

attribute vec4 Weights;
attribute vec4 Indices;

uniform mat4 Matrices[ 20 ] ;

void main( void )
{
   vec4 position = vec4( 0.0 ) ;

   for ( int i = 0 ; i < 4 ; i++ ) {
        mat4 matrix = Matrices[ int( Indices[ i ] * 255.0 ) ] ;
        position += Weights[ i ] * ( matrix * gl_Vertex );
   }

   gl_Position = gl_ModelViewProjectionMatrix * position ;
}

golgoth13 · December 2, 2006, 9:27am

Hmm… it all seems to make sense, but I'm not quite sure what you meant there…

Is the loop above for 4 bones? Did you mean:

bone1 = Weights.x;
bone2 = Weights.y;
bone3 = Weights.z;
bone4 = Weights.w;
Indices = bone ids?

What does the 255.0 stand for? An offset of some kind?

Why would you have 20 matrices? Does that make it possible to use 20 bones at once?

Thx for your patience, I guess I'm a slow learner.

Komat · December 2, 2006, 11:59am

Originally posted by Golgoth:
Is the loop above for 4 bones? Did you mean:

bone1 = Weights.x;
bone2 = Weights.y;
bone3 = Weights.z;
bone4 = Weights.w;
Indices = bone ids?
Yes. For vec types, Weights.x is the same as Weights[ 0 ].

What does the 255.0 stand for? An offset of some kind?

This is written for the situation where the bone indices are sent to the shader as four normalized unsigned bytes (GL_UNSIGNED_BYTE) to save space, so the 255 converts the normalized <0,1> range the shader receives back into the original <0,255> range of the unsigned bytes.
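The byte-to-index round trip can be checked on the CPU. This sketch assumes the standard GL rule for normalized unsigned bytes (a byte c arrives in the shader as c / 255) and mirrors the shader's multiply by 255. The + 0.5 before truncating is a defensive addition of ours, not something from the thread: it guards against the product landing just below an integer and int() truncating it to the wrong bone.

```cpp
#include <cassert>
#include <cstdint>

// What the shader receives for a normalized GL_UNSIGNED_BYTE component.
float normalizeUnsignedByte(std::uint8_t c) {
    return static_cast<float>(c) / 255.0f;
}

// Shader-side recovery of the bone index, written in C++.
// Rounding (x * 255 + 0.5) instead of plain truncation is an assumption for robustness.
int recoverBoneIndex(float normalized) {
    assert(normalized >= 0.0f && normalized <= 1.0f);
    return static_cast<int>(normalized * 255.0f + 0.5f);
}
```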

Why would you have 20 matrices? Does that make it possible to use 20 bones at once?

It allows you to draw a mesh that has up to 20 bones in one draw call. Each vertex can be skinned by up to four bones out of those 20.

golgoth13 · December 2, 2006, 2:24pm

Oh, great great, im starting to see it sparkle!

If we send matrices in the right order can we expect it to work without sending the indices?

	for (int i = 0; i < u_BoneCount; ++i)
		l_position += Weights[i] * (Matrix[i] * gl_Vertex);

A little off topic, but I'm confused about one thing; I could still use your wisdom here, if it is not too much to ask.

My Envelope class looks like this. Those Envelopes are the data holders for the bone weights. Following your proposal, I'll have to go through all of them, pack the data into a Vector4, make a VBO and send it to the shader.

class Envelope: public Object
{
	DeclareClass(Envelope, Object);

	public:

		inline Envelope(): Object(),
			m_Bone(NULL) {}

		inline ~Envelope()
		{
			m_Weights.Delete();
			m_Bone = NULL;
		}
		inline Bone *GetBone()				{return m_Bone;}
		inline Array<Float> &GetWeights()		{return m_Weights;}


		void SetBone(Bone *in_bone);

	protected:

		Bone *m_Bone;
		Array<Float> m_Weights;
};

As you can see, I have no structure of bone chains per se... so I can't really declare several sets of weights in an Array<Vector4> anywhere for now. So I guess I'll have to do it on the fly.

In our case, how would you pack four Array<Float>s into one Array<Vector4>? In fact, do I really need to? Is there any way to use my Array<Float>s and send them to the shader as a vec4 without packing them into a Vector4 first? My last choice would be to loop through all of the weight values and assign them to an Array<Vector4>, but I'm hoping to find a smarter way to do that.

thx

Komat · December 2, 2006, 4:55pm

Originally posted by Golgoth:

If we send matrices in the right order can we expect it to work without sending the indices?

Yes. The main purpose of the indices was to speed up rendering of meshes with a high number of bones and bone combinations.

In our case, how would you pack four Array<Float>s into one Array<Vector4>? In fact, do I really need to? Is there any way to use my Array<Float>s and send them to the shader as a vec4 without packing them into a Vector4 first?

Because each bone has its own array of floats, there is no direct way to send weights for several bones as one vec4. You have to pack the values into a different array; however, you can store that array somewhere within the mesh and update it only when the envelopes change.
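A sketch of that one-time repacking, assuming a simplified stand-in for the Envelope class (EnvelopeData: one bone index plus one weight per mesh vertex) and at most four non-zero weights per vertex. EnvelopeData, PackedInfluences and packEnvelopes are hypothetical names, not the poster's engine API. The result would be stored with the mesh and re-uploaded only when an envelope changes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an Envelope: one bone and a weight per vertex.
struct EnvelopeData {
    int boneIndex;
    std::vector<float> weights; // weights[v] = influence on vertex v (0 = none)
};

// Per-vertex vec4s ready for two vertex attributes.
struct PackedInfluences {
    std::vector<std::array<float, 4>> weights;
    std::vector<std::array<float, 4>> indices;
};

// Transposes per-bone weight arrays into per-vertex (weight, index) vec4s.
PackedInfluences packEnvelopes(const std::vector<EnvelopeData>& envelopes,
                               std::size_t vertexCount) {
    PackedInfluences out;
    out.weights.assign(vertexCount, std::array<float, 4>{});
    out.indices.assign(vertexCount, std::array<float, 4>{});
    std::vector<int> used(vertexCount, 0); // slots already filled per vertex

    for (const EnvelopeData& env : envelopes) {
        assert(env.weights.size() == vertexCount);
        for (std::size_t v = 0; v < vertexCount; ++v) {
            float w = env.weights[v];
            if (w == 0.0f) continue; // unused slots keep weight 0
            int slot = used[v]++;
            assert(slot < 4 && "more than four bones influence this vertex");
            out.weights[v][slot] = w;
            out.indices[v][slot] = static_cast<float>(env.boneIndex);
        }
    }
    return out;
}
```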

golgoth13 · December 2, 2006, 5:47pm

Thank you O grand wizard

I'm still working on a mega shader, so to cover the widest range of possibilities, let's say that we'll always be sending one mat4 instead of a vec4 to store weight values, which would give a 16-bone cap, even if we're only using 1-4 bones. How bad would the penalties be? None, we can live with it? On the edge? Or is it insane, forget about it?

Komat · December 3, 2006, 1:50am

Originally posted by Golgoth:

I'm still working on a mega shader, so to cover the widest range of possibilities, let's say that we'll always be sending one mat4 instead of a vec4 to store weight values, which would give a 16-bone cap, even if we're only using 1-4 bones. How bad would the penalties be? None, we can live with it? On the edge? Or is it insane, forget about it?
You need to measure it in your case to find out whether it is acceptable.

There are several types of cost:

Cost of the resources used. The number of attributes and the total size of the uniforms accessible from the shader are limited. The number of shader instructions is also limited. With the mega shader you may hit such a limit, and the shader will then fail to compile.
Runtime cost on the GPU. If you are not limited by vertex processing speed, doing some unnecessary work inside the shader will probably not matter on hardware with separate vertex shading units. On the latest GPUs with a unified architecture, unnecessary work within the vertex shader might reduce fragment shading performance.
CPU/GPU cost associated with API calls and GPU state changes. If you upload many values and use them for only a few vertices, it will consume more CPU power and the GPU might not operate at full efficiency.
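For the first cost, a rough budget estimate helps decide how many bone matrices fit. Each mat4 uniform uses 16 float components; the limit itself can be queried with glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS, ...). The reservedComponents parameter is a hypothetical allowance for the shader's other uniforms (built-in state such as gl_ModelViewProjectionMatrix may also count), and drivers may pack differently, so treat the result as an estimate and let compile/link errors have the final word.

```cpp
#include <cassert>

// Each mat4 uniform occupies 16 float components of the vertex uniform budget.
constexpr int kComponentsPerMat4 = 16;

// Estimate of how many bone matrices fit in the remaining uniform space.
int maxBoneMatrices(int maxVertexUniformComponents, int reservedComponents) {
    assert(reservedComponents >= 0);
    assert(maxVertexUniformComponents >= reservedComponents);
    return (maxVertexUniformComponents - reservedComponents) / kComponentsPerMat4;
}
```

With a hypothetical limit of 1024 components and 64 reserved, this gives 60 matrices; Komat's 20-matrix palette uses 320 components.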

golgoth13 · December 3, 2006, 10:02am

Alright, I'll make some tests... I just got everything in place and it is working. There is another thing that we have discussed in the past.

These combinations work:

for (int i = 0; i < u_BoneCount; ++i)
	if (i == 0)
		l_position += a_Weights[0] * (u_BoneMatrix[i] * gl_Vertex);

for (int i = 0; i < u_BoneCount; ++i)
	if (i == 0)
		l_position += a_Weights.x * (u_BoneMatrix[i] * gl_Vertex);

for (int i = 0; i < 2; ++i)
	if (i == 0)
		l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

but,

for (int i = 0; i < u_BoneCount; ++i)
	if (i == 0)
		l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

returns error C1011: cannot index a non-array value

You'll say it is not too much of a problem, since I can use:

for (int i = 0; i < u_BoneCount; ++i)
	if (i == 0)
		l_position += a_Weights[0] * (u_BoneMatrix[i] * gl_Vertex);
	else if (i == 1)
		l_position += a_Weights[1] * (u_BoneMatrix[i] * gl_Vertex);
	// and so on...

From all the tests I've made, I quickly came to a conclusion: let's say I use else if (i == 2516), a condition that will of course never be met. What is inside the condition still takes processing time even though the condition is false; it takes as much time as if it were true. Weird? Yeah, tell me about it; this is a really painful downside for me. I hope I'm explaining this clearly, because it is really slowing me down in designing the mega shader architecture. Since I originally saw this issue with gl_light[x] variables, I'll run more tests with this current example, because the same thing will happen if only one bone is being processed here.

thx again

Komat · December 3, 2006, 12:16pm

Originally posted by Golgoth:
These combinations work:

for (int i = 0; i < 2; ++i)
	if (i == 0)
		l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

but,

for (int i = 0; i < u_BoneCount; ++i)
	if (i == 0)
		l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);
returns error C1011: cannot index a non-array value
The vec4 is not a real array. a_Weights[0] is just a different way of writing a_Weights.x, and the index cannot be changed dynamically (the hardware does not support that). When the number of iterations is known, as in the first situation, the compiler unrolls the loop the appropriate number of times and inserts the .x|.y|.z|.w masks corresponding to the individual values of i. In the second situation the compiler cannot do that, because it does not know how big u_BoneCount can be, so you get that error.

But, but, but, I've found out that what is inside the condition is going to take processing time even though the condition returns false; it is going to take as much processing as if it were returning true. Weird?

This is even worse in fragment shaders. GPUs are based on massive parallelism and have trouble when the instruction flow differs between individual pixels or vertices, so the drivers often try to avoid dynamic jumps: they calculate everything and simply ignore the results of calculations that should not have happened.
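A conceptual C++ model of what Komat describes, assuming a flattened branch: both sides of the if are evaluated for every vertex and a mask picks the result that is kept. This is only an illustration of why the false branch still costs time; it is not the code any particular driver generates, and the function names are hypothetical.

```cpp
#include <cassert>

// Counts how many branch bodies were executed.
static int g_evaluations = 0;

float expensiveBranch(float x) { ++g_evaluations; return x * x; }
float cheapBranch(float x)     { ++g_evaluations; return x; }

// Flattened "if (condition) expensive else cheap": both sides always run,
// then a 0/1 mask selects which result survives.
float flattenedIf(bool condition, float x) {
    float whenTrue = expensiveBranch(x);
    float whenFalse = cheapBranch(x);
    float mask = condition ? 1.0f : 0.0f;
    return mask * whenTrue + (1.0f - mask) * whenFalse;
}
```

Whatever the condition, the cost is that of both branches, which matches the timing golgoth13 observed.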

Korval · December 3, 2006, 12:17pm

You are not understanding the process that you’re trying to do.

You have a mesh. Artists created this mesh. They attached bones to this mesh, and weighted the vertices to these bones.

So, what you have in your tool is the following:

A list of bones, 0 to nBones.
A list of vertices, 0 to nVerts.

For each vertex, you have (among other things) a number of Weight/Index pairs. These pairs tell the vertex which bone it uses (through the index) and how much weight to apply to that bone (through the weight).

The number of Weight/Index pairs per-vertex was defined by your artist, but you should enforce a specific maximum limit, nIndices.

When you go to generate your in-application mesh for rendering, you store all of this data. You store the list of bones. You store the vertex data. And, for each vertex, you store the weight/index pairs, up to nIndices per vertex.

Your shader will be given the weight/index pairs as attributes, since they vary per-vertex. The list of bones (generated from your animation system in the order expected by your mesh) is a per-object value, so they should be passed as an array of uniform values.

So, your loop for transforming your vertices looks like this:

attribute vec4 indices;
attribute vec4 weights;

...

for(int i = 0; i < 4 /*nIndices*/; i++)
{
  mat4 boneToUse = boneArray[int(indices[i])];
  float weightToUse = weights[i];

  /* Do math for vertex weighting */
}

That’s why 4 is a reasonable number for nIndices. If your artists disagree, smack them.
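Enforcing the nIndices limit is a tool-side step done when the mesh is exported or loaded. One common policy, assumed here rather than taken from the thread, is to keep the largest weights and rescale them so they again sum to 1. Influence and limitInfluences are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One weight/index pair as authored by the artist.
struct Influence {
    int bone;
    float weight;
};

// Keeps at most maxInfluences pairs (nIndices) by dropping the smallest
// weights, then renormalizes the survivors so they sum to 1.
std::vector<Influence> limitInfluences(std::vector<Influence> in,
                                       std::size_t maxInfluences) {
    std::sort(in.begin(), in.end(),
              [](const Influence& a, const Influence& b) { return a.weight > b.weight; });
    if (in.size() > maxInfluences)
        in.resize(maxInfluences);
    float sum = 0.0f;
    for (const Influence& inf : in) sum += inf.weight;
    assert(sum > 0.0f);
    for (Influence& inf : in) inf.weight /= sum;
    return in;
}
```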

golgoth13 · December 3, 2006, 1:28pm

This is even worse in fragment shaders. GPUs are based on massive parallelism and have trouble when the instruction flow differs between individual pixels or vertices, so the drivers often try to avoid dynamic jumps: they calculate everything and simply ignore the results of calculations that should not have happened.

This is the only reason I can't go further with the mega shader... it is really unfortunate... and I'm going to have to construct thousands of shaders using string definitions...

Like Korval mentioned:

for(int i = 0; i < 4; i++)
{
}

If only 2 bones are being used I ll have to make another shader:

for(int i = 0; i < 2; i++)
{
}

I just hate this concept.
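The "string definitions" approach can at least share one shader source: prefix a #define before compiling, so each variant gets a constant loop bound the compiler can unroll. This sketch assumes GLSL 1.10 and a hypothetical BONE_COUNT macro and shader body; since #version must remain the first line, the define is inserted after it. The returned string would then go to glShaderSource as usual.

```cpp
#include <cassert>
#include <string>

// Builds one skinning vertex shader variant with a compile-time bone count.
std::string makeSkinningShader(int boneCount) {
    assert(boneCount >= 1 && boneCount <= 4);
    static const char* kBody =
        "attribute vec4 a_Weights;\n"
        "attribute vec4 a_Indices;\n"
        "uniform mat4 u_BoneMatrix[20];\n"
        "void main(void)\n"
        "{\n"
        "    vec4 position = vec4(0.0);\n"
        "    for (int i = 0; i < BONE_COUNT; ++i)\n"
        "        position += a_Weights[i] * (u_BoneMatrix[int(a_Indices[i])] * gl_Vertex);\n"
        "    gl_Position = gl_ModelViewProjectionMatrix * position;\n"
        "}\n";
    return "#version 110\n#define BONE_COUNT " + std::to_string(boneCount) + "\n" + kBody;
}
```

Caching the compiled program per bone count keeps the number of variants small (one per distinct count actually used).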

Thx for your input, guys, I think I'm all set!

regards