Bones Skinning!

Golgoth
11-30-2006, 01:37 PM
Hi all!

I hope I'll be clear enough in explaining my question…

The situation is that I'm trying to feed bone weight values to a vertex shader… to do so, I'm using one vertex attribute… it all makes sense to send one set of weights through a vertex attribute… but considering that many bones can affect a single vertex, I'm confused about sending more than one set of weight values.

How to:

1 – use one vertex attribute per bone.
2 – use the same vertex attribute but send more than one set of values in it.

Considering solution number 2, since the vertex attribute values are supposed to be in the same order as the vertices… how do you organize the values? And especially, how do you retrieve them? Perhaps there is a simple solution to this… I'd really appreciate it if someone could help me figure this out.

thx

Korval
11-30-2006, 03:54 PM
In general, programs adhere to a maximum of 4 bones affecting one vertex. If you wish to push beyond this limitation (and I strongly suggest that you investigate the reasons why you want to; those wanting the feature should be required to explain themselves, and the explanation had better be good), it's quite simple: use another attribute.

Your code then loops over the components of attribute 1, then loops over the components of attribute 2. If the weight is 0, you do the math anyway (which is one reason to limit it to 4).

Golgoth
11-30-2006, 05:41 PM
Got it, thx Korval!

I think I just found something really interesting; if it is true, you guys have probably known this for a while now… a couple of months ago, I tried using vertex attributes above id 15… and it didn’t work on nvidia cards back then… I'm currently using the 91.47 driver and it seems to work now… I even tried up to id 60 and it worked… (good news!!! Oh, please don’t burst my bubble)… am I taking crazy pills or did we get upgraded?

Thx again

Korval
11-30-2006, 05:56 PM
Did you actually use 60 indices, or just index #60?

Golgoth
11-30-2006, 06:41 PM
#60 as a test with tangent attributes and it worked...

ZbuffeR
12-01-2006, 02:28 AM
It is said in the Orange Book that attribute ids should be kept low anyway for performance reasons.

But in any case, you cannot actually use more ids. It seems they just enhanced the driver to detect which ids are really used.

Golgoth
12-01-2006, 07:25 AM
Can you think of a more elegant way to use the attributes? I've tried the attribute float a_Weights[2]; technique, but it did not work…

attribute float a_Weights0;
attribute float a_Weights1;

uniform mat4 u_BoneMatrix[2];
uniform int u_BoneCount;

void main(void)
{
    vec4 l_position = vec4(0.0, 0.0, 0.0, 0.0);
    for (int i = 0; i < u_BoneCount; ++i)
    {
        if (i == 0)
            l_position += a_Weights0 * (u_BoneMatrix[i] * gl_Vertex);
        else if (i == 1)
            l_position += a_Weights1 * (u_BoneMatrix[i] * gl_Vertex);
    }

    gl_Position = gl_ModelViewProjectionMatrix * l_position;
}

Komat
12-01-2006, 08:56 AM
You can pack the weights into one vec2 attribute.

If you are rendering a character that has a skeleton with multiple bones, it is useful to upload several matrices (as many as you can) and send the bone indices together with the weights, using a single attribute (for the two-bone case) or separate attributes (for the four-bone case).

Golgoth
12-01-2006, 10:02 AM
Hi Komat, this is interesting…

Originally posted by Komat:
it is useful to upload several matrices (as many as you can) and send bone indices together with the weights using a single attribute

Did you mean one matrix per weight value?

attribute mat4 Weights;

Weights[0] = bone1 weight value
Weights[1] = bone2 weight value
Weights[2] = bone3 weight value

It brings up another question… I'm using VBOs… set to DYNAMIC_DRAW. Is there any way to avoid calling glBufferData each frame when updating the weights? I'm asking because packing the data into a matrix4 each frame would be a somewhat heavy process.

edit: nvm the last question... I just switched from FFP skinning to GLSL... and it does not need to be updated each frame anymore...

Komat
12-01-2006, 12:05 PM
Originally posted by Golgoth:
Did you mean one matrix per weight value?

attribute mat4 Weights;

Weights[0] = bone1 weight value
Weights[1] = bone2 weight value
Weights[2] = bone3 weight value

Something like:

attribute vec4 Weights;   // up to four bone weights per vertex
attribute vec4 Indices;   // matching bone indices, sent as normalized unsigned bytes

uniform mat4 Matrices[ 20 ] ;

void main( void )
{
    vec4 position = vec4( 0.0 ) ;

    for ( int i = 0 ; i < 4 ; i++ ) {
        // Scale the normalized index back to the 0..255 byte range.
        mat4 matrix = Matrices[ int( Indices[ i ] * 255.0 ) ] ;
        position += Weights[ i ] * ( matrix * gl_Vertex );
    }

    gl_Position = gl_ModelViewProjectionMatrix * position ;
}

Golgoth
12-02-2006, 09:27 AM
Hmm... it all seems to make sense, but I'm not quite sure what you meant there...

Is the loop above for 4 bones?... Did you mean:

bone1 = Weights.x;
bone2 = Weights.y;
bone3 = Weights.z;
bone4 = Weights.w;
Indices = bone ids; ?

What does the 255.0 stand for? An offset of some kind?

Why would you have 20 matrices? Does that bring the possibilities up to 20 bones at once?

Thx for your patience, I guess I'm a slow learner :)

Komat
12-02-2006, 11:59 AM
Originally posted by Golgoth:
Is the loop above for 4 bones?... Did you mean:

bone1 = Weights.x;
bone2 = Weights.y;
bone3 = Weights.z;
bone4 = Weights.w;
Indices = bone ids; ?

Yes. For vec values, Weights.x is the same as Weights[ 0 ].

Originally posted by Golgoth:
What does the 255.0 stand for? An offset of some kind?

This is written for the situation where the bone indices are sent to the shader as four normalized unsigned bytes (GL_UNSIGNED_BYTE) to save space, so the 255 converts from the normalized <0,1> range the shader receives back into the original <0,255> range of the unsigned bytes.

Originally posted by Golgoth:
Why would you have 20 matrices? Does that bring the possibilities up to 20 bones at once?

It allows you to draw a mesh that has up to 20 bones in one draw call. Each vertex can be skinned by up to four of those 20 bones.
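On the 255.0 conversion above, here is a minimal CPU-side sketch in C++ of the round trip a bone index goes through when it is stored as a byte and delivered to the shader as a normalized float (for example via glVertexAttribPointer with the normalized flag set to GL_TRUE). The function names are hypothetical, and the + 0.5 rounding is an extra precaution assumed here, not something the shader in this thread does; plain truncation may land one index too low if the multiplication comes out slightly under the integer.

```cpp
#include <cassert>
#include <cstdint>

// What the shader would receive for a byte sent as a normalized attribute.
inline float normalizedFromByte(std::uint8_t b) {
    return static_cast<float>(b) / 255.0f;
}

// Mirrors the shader expression int(Indices[i] * 255.0), with 0.5 added
// before truncation (an assumption of this sketch) to guard against
// floating-point results that fall just below the intended integer.
inline int boneIndexFromNormalized(float n) {
    return static_cast<int>(n * 255.0f + 0.5f);
}
```

With this rounding, every byte value from 0 to 255 maps back to itself.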

Golgoth
12-02-2006, 02:24 PM
Oh, great great, I'm starting to see it sparkle!

If we send the matrices in the right order, can we expect it to work without sending the indices?

for (int i = 0; i < u_BoneCount; ++i)
    l_position += Weights[i] * (Matrices[i] * gl_Vertex);

A little off topic, but I'm confused about one thing, and I could still use your wisdom here if it is not too much to ask.

My Envelope class looks like this. Those Envelopes are the data holders for the bone weights. Regarding your proposal, I'll have to go through all of them, pack the data in a vector4, make a VBO and send them to the shader.

class Envelope: public Object
{
    DeclareClass(Envelope, Object);

public:

    inline Envelope(): Object(),
        m_Bone(NULL) {}

    inline ~Envelope()
    {
        m_Weights.Delete();
        m_Bone = NULL;
    }
    inline Bone *GetBone() {return m_Bone;}
    inline Array<Float> &GetWeights() {return m_Weights;}

    void SetBone(Bone *in_bone);

protected:

    Bone *m_Bone;
    Array<Float> m_Weights;
};

As you can see, I have no structure of bone chains per se… so I can't really declare several sets of weights in an Array<Vector4> anywhere for now. So I guess I'll have to do it on the fly.

For our case, how would you pack 4 Array<Float> into one Array<Vector4>? In fact, do I really need to? Is there any way to use my Array<Float>s and send them to the shader as vec4 without packing them into a Vector4 first? My last resort would be to loop through all of the weight values and assign them to an Array<Vector4>, but I'm hoping to find a smarter way to do that.

thx

Komat
12-02-2006, 04:55 PM
Originally posted by Golgoth:
If we send the matrices in the right order, can we expect it to work without sending the indices?

Yes. The main purpose of the indices was to increase the rendering speed of meshes with a high number of bones and bone combinations.

Originally posted by Golgoth:
For our case, how would you pack 4 Array<Float> into one Array<Vector4>? In fact, do I really need to? Is there any way to use my Array<Float>s and send them to the shader as vec4 without packing them into a Vector4 first?

Because each bone has its own array of floats, there is no direct way to send the weights for several bones as one vec4. You have to pack the values into a different array; however, you can store that array somewhere within the mesh and update it only when the envelopes change.
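The repacking described above could be sketched in C++ roughly as follows. This is only an illustration under assumptions: std::vector<float> stands in for the Array<Float> held by each Envelope, Vec4 stands in for Vector4, and packWeights is a hypothetical helper, not part of anyone's engine here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;  // stand-in for the engine's Vector4

// perBone[b][v] is the weight of bone b on vertex v (one array per envelope).
// Returns packed[v], holding the weights of bones 0..3 for vertex v;
// components for missing bones stay at zero.
std::vector<Vec4> packWeights(const std::vector<std::vector<float>>& perBone,
                              std::size_t vertexCount) {
    assert(perBone.size() <= 4);  // one vec4 attribute carries at most four weights
    std::vector<Vec4> packed(vertexCount, Vec4{0.0f, 0.0f, 0.0f, 0.0f});
    for (std::size_t b = 0; b < perBone.size(); ++b) {
        assert(perBone[b].size() == vertexCount);
        for (std::size_t v = 0; v < vertexCount; ++v) {
            packed[v][b] = perBone[b][v];
        }
    }
    return packed;
}
```

The packed array can then be kept with the mesh and rebuilt only when an envelope changes, as suggested above.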

Golgoth
12-02-2006, 05:47 PM
Thank you, O grand wizard :)

I'm still working on a mega shader, so, to cover the widest range of possibilities, let's say that we'll always be sending 1 mat4 instead of a vec4 to store the weight values... which would be a 16-bone cap, even if we're only using 1-4 bones... how bad would the penalties be? None, we can live with it... on the edge... or it is insane, forget about it?

Komat
12-03-2006, 01:50 AM
Originally posted by Golgoth:
I'm still working on a mega shader, so, to cover the widest range of possibilities, let's say that we'll always be sending 1 mat4 instead of a vec4 to store the weight values... which would be a 16-bone cap, even if we're only using 1-4 bones... how bad would the penalties be? None, we can live with it... on the edge... or it is insane, forget about it?

You need to measure it in your case to find out whether it is acceptable.

There are several types of cost:
* Cost of the resources used. The number of attributes and the total size of uniforms accessible from the shader are limited. The number of shader instructions is also limited. It is possible that with the mega shader you will hit such a limit and the shader will fail to compile.

* Runtime cost on the GPU. If you are not limited by vertex processing speed, doing some unnecessary work inside the shader will probably not matter on HW with separate vertex shading units. On the latest GPUs with a unified architecture, unnecessary work within the vertex shader might reduce the performance of fragment shading.

* CPU/GPU cost associated with API calls and GPU state changes. If you upload many values and use them for only a few vertices, it will consume more CPU power and the GPU might not operate at full efficiency.

Golgoth
12-03-2006, 10:02 AM
Alright, I'll run some tests... I just got everything in place and it is working... there is another thing that we have discussed in the past.

These combinations work:

for (int i = 0; i < u_BoneCount; ++i)
    if (i == 0)
        l_position += a_Weights[0] * (u_BoneMatrix[i] * gl_Vertex);

for (int i = 0; i < u_BoneCount; ++i)
    if (i == 0)
        l_position += a_Weights.x * (u_BoneMatrix[i] * gl_Vertex);

for (int i = 0; i < 2; ++i)
    if (i == 0)
        l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

but,

for (int i = 0; i < u_BoneCount; ++i)
    if (i == 0)
        l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

returns error C1011: cannot index a non-array value

You'll say it is not too much of a problem since I can use:

for (int i = 0; i < u_BoneCount; ++i)
    if (i == 0)
        l_position += a_Weights[0] * (u_BoneMatrix[i] * gl_Vertex);
    else if (i == 1)
        l_position += a_Weights[1] * (u_BoneMatrix[i] * gl_Vertex);

and so on… With all the tests I've made, I came to the conclusion very quickly that there is a catch. Let's say I use else if (i == 2516) here, which I'm never going to hit of course… but, but, but, I've found out that what is inside the condition is going to take processing time even though the condition returns false; it takes as much processing as if it were returning true. Weird? Yeah, tell me about it; this is a really painful downside for me. I hope I'm clear enough in my explanation, because this is really slowing me down regarding the mega shader design architecture. Since this issue was tested with gl_light[x] variables, I'll try more tests with this current example, because the same thing will happen if only one bone is being processed here.

thx again

Komat
12-03-2006, 12:16 PM
Originally posted by Golgoth:
These combinations work:

for (int i = 0; i < 2; ++i)
    if (i == 0)
        l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

but,

for (int i = 0; i < u_BoneCount; ++i)
    if (i == 0)
        l_position += a_Weights[i] * (u_BoneMatrix[i] * gl_Vertex);

returns error C1011: cannot index a non-array value

A vec4 is not a real array. a_Weights[0] is just a different way of writing a_Weights.x, and the index cannot be changed dynamically (the hw does not support that). When the number of loop iterations is known, as in the first situation, the compiler will unroll the loop the appropriate number of times and insert the .x|.y|.z|.w masks corresponding to the individual values of i. In the second situation the compiler cannot do that because it does not know how big u_BoneCount can be, and you get that error.

Originally posted by Golgoth:
but, but, but, I've found out that what is inside the condition is going to take processing time even though the condition returns false; it takes as much processing as if it were returning true. Weird?

This is even worse in fragment shaders. GPUs are based on massive parallelism and have trouble when the instruction flow differs between individual pixels or vertices, so the drivers often try to avoid dynamic jumps, calculate everything and simply ignore the results of calculations that should not happen.

Korval
12-03-2006, 12:17 PM
You are not understanding the process that you're trying to do.

You have a mesh. Artists created this mesh. They attached bones to this mesh, and weighted the vertices to these bones.

So, what you have in your tool is the following:

A list of bones, 0 to nBones.
A list of vertices, 0 to nVerts.

For each vertex, you have (among other things) a number of Weight/Index pairs. These pairs tell the vertex which bone it uses (through the index) and how much weight to apply to that bone (through the weight).

The number of Weight/Index pairs per-vertex was defined by your artist, but you should enforce a specific maximum limit, nIndices.

When you go to generate your in-application mesh for rendering, you store all of this data. You store the list of bones. You store the vertex data. And, for each vertex, you store the weight/index pairs, up to nIndices per vertex.

Your shader will be given the weight/index pairs as attributes, since they vary per-vertex. The list of bones (generated from your animation system in the order expected by your mesh) is a per-object value, so they should be passed as an array of uniform values.

attribute vec4 indices;
attribute vec4 weights;

...

for(int i = 0; i < 4 /*nIndices*/; i++)
{
    // indices holds floats, so convert before indexing the bone array
    mat4 boneToUse = boneArray[int(indices[i])];
    float weightToUse = weights[i];

    /* Do math for vertex weighting */
}

That's why 4 is a reasonable number for nIndices. If your artists disagree, smack them.

Golgoth
12-03-2006, 01:28 PM
Originally posted by Komat:
This is even worse in fragment shaders. The GPUs are based on massive parallelism and do have troubles when instruction flow differs between individual pixels or vertices so the drivers often try to avoid dynamic jumps, calculate everything and simply ignore results of calculations that should not happen.

This is the only reason I can't go further with the mega shader… it is really unfortunate… and I'm going to have to construct 1000s of shaders using string definitions…

Like Korval mentioned:

for(int i = 0; i < 4; i++)
{
}

If only 2 bones are being used, I'll have to make another shader:

for(int i = 0; i < 2; i++)
{
}

I just hate this concept.

Thx for your input guys, I think I'm all set!

regards

Korval
12-03-2006, 04:51 PM
Originally posted by Golgoth:
If only 2 bones are being used, I'll have to make another shader:

Um, no.

The number of bones used per-vertex is a... well, per-vertex quantity. It is not a uniform, nor is it something that you can bake into a shader at compile time. It is either an attribute or an understood constant (as in my example).

If a particular vertex doesn't actually use all 4 bones that it could, it simply uses a 0 weight for the others (and a valid bone index, like 0). This is a common technique in vertex program-based skinning, and you should use it.
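To make the zero-weight padding concrete, here is a small CPU reference of the fixed four-influence blend, written in C++ as a sketch. Vec4, Mat4, translation, transform and skinVertex are all hypothetical stand-ins for illustration; the shader itself would do the same arithmetic on the GPU.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][column]

// Hypothetical helper: a bone matrix that only translates.
Mat4 translation(float x, float y, float z) {
    Mat4 m{};  // value-initialized to all zeros
    for (std::size_t i = 0; i < 4; ++i) m[i][i] = 1.0f;
    m[0][3] = x;
    m[1][3] = y;
    m[2][3] = z;
    return m;
}

Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}

// Always blends exactly four influences, like the fixed-count shader loop.
// Unused slots carry weight 0 and any valid bone index, so they add nothing.
Vec4 skinVertex(const Vec4& position, const Vec4& weights,
                const std::array<int, 4>& indices,
                const std::vector<Mat4>& bones) {
    Vec4 result{};
    for (std::size_t i = 0; i < 4; ++i) {
        assert(indices[i] >= 0 &&
               static_cast<std::size_t>(indices[i]) < bones.size());
        const Vec4 p = transform(bones[static_cast<std::size_t>(indices[i])], position);
        for (std::size_t k = 0; k < 4; ++k) result[k] += weights[i] * p[k];
    }
    return result;
}
```

A vertex influenced by only two bones would pass weights such as (0.5, 0.5, 0, 0) and indices such as (0, 1, 0, 0); the last two iterations still run but contribute zero.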

Golgoth
12-03-2006, 05:02 PM
Originally posted by Korval:
it simply uses a 0 weight for the others

Oh, I see the light now... awesome!

merci!

Golgoth
12-03-2006, 05:09 PM
Is it right to say that it is going to take pretty much the same processing to calculate 1 or 4 bones?

Golgoth
12-04-2006, 07:35 AM
Originally posted by Korval:
That's why 4 is a reasonable number for nIndices. If your artists disagree, smack them.

I don’t know how you rig a character with only 4 bones, but slapping the artists would make me look like a fool, that’s for sure.

Since a vertex attribute only takes 4 components, I guess I'll have to use other attributes to send more bones? Damn, my faith in technology is starting to fade...

Komat
12-04-2006, 09:01 AM
Originally posted by Golgoth:
I don’t know how you rig a character with only 4 bones, but slapping the artists would make me look like a fool, that’s for sure.

These are 4 bones influencing one vertex, not 4 bones for the entire character. Of course you have to split the character into parts by groups of bones and draw each part independently. The indexed method mentioned above will allow you to have reasonably sized parts. Without it you will have to use a separate draw for each combination of bones, and you might even be unable to skin some triangles if the number of distinct bones influencing their vertices is higher than the number of bones you combine together.

It seems that you are thinking about the skinning in the following way:

On GPU:
foreach vertex {
    foreach bone_in_character {
        apply bone
    }
}

What you need to do is:
* For each vertex, select the four bones with the highest weight and report an error if there are more bones with significant weight. It is good to allow the artists to have more bones assigned to the vertex inside the modeling tool if only four of them have significant weight (e.g. >0.01).
* Construct groups of bones. The maximal size of one group depends on the number of matrices you can upload to the uniform variables (e.g. 20 from my post).
* Assign triangles to the groups. A triangle can be assigned to a group which contains all bones selected for the triangle's vertices. Actually, the construction and assignment need to operate simultaneously so that each triangle can be assigned to one group and the groups do not contain bones that are not used by any triangle assigned to them. Of course this means that one bone will very likely end up in several groups; this is unavoidable.
* During the assignment you will generate four bone indices for each vertex. They represent the index of the corresponding bone within the group.

You will then draw the mesh using the following:

On CPU:
foreach group {
    upload bone matrices for the group to the GPU
    draw triangles from the group
}

On GPU:
foreach vertex within the group {
    foreach bone_index_assigned_to_vertex (there are 4 of them) {
        apply bone with specified index inside the group
    }
}
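The first step above (keep the four strongest influences per vertex and flag vertices that lose a significant one) might look like this in C++. It is a sketch under assumptions: Influence, PackedInfluences and selectTopFour are hypothetical names, the 0.01 threshold is the example value from the post, and renormalizing the kept weights so they sum to 1 is an addition of this sketch that the post does not mention.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Influence {
    int bone;
    float weight;
};

struct PackedInfluences {
    std::array<int, 4> bones{{0, 0, 0, 0}};                   // unused slots: bone 0
    std::array<float, 4> weights{{0.0f, 0.0f, 0.0f, 0.0f}};   // unused slots: weight 0
    bool droppedSignificant = false;  // true if a discarded weight exceeded the threshold
};

PackedInfluences selectTopFour(std::vector<Influence> influences,
                               float significant = 0.01f) {
    // Strongest influences first.
    std::sort(influences.begin(), influences.end(),
              [](const Influence& a, const Influence& b) { return a.weight > b.weight; });

    PackedInfluences out;
    float sum = 0.0f;
    for (std::size_t i = 0; i < influences.size(); ++i) {
        if (i < 4) {
            out.bones[i] = influences[i].bone;
            out.weights[i] = influences[i].weight;
            sum += influences[i].weight;
        } else if (influences[i].weight > significant) {
            out.droppedSignificant = true;  // the caller may report this as an error
        }
    }
    // Assumption of this sketch: rescale so the kept weights sum to 1.
    if (sum > 0.0f) {
        for (float& w : out.weights) w /= sum;
    }
    return out;
}
```

The bone ids stored here are still skeleton-wide; remapping them to indices within a group would happen later, during the triangle assignment step.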

Korval
12-04-2006, 09:11 AM
Originally posted by Komat:
Of course you have to split the character into parts by groups of bones and draw each part independently.

Note: you only need to do that if the total number of bones needed for the mesh exceeds the number of uniforms that you wish to devote to the shader. With quaternion-based bones (rather than matrices), and cards with 256 (or more?) vec4 uniforms, there's substantially less reason to run into effective limits. 96 bones * 2 is often a good number of uniforms to work with on said hardware; that still gives you 64 uniforms for other needs.
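The uniform budget in that note can be checked with a few lines of C++. This is just the arithmetic spelled out; the constant names are hypothetical, and the two-vec4-per-bone layout (for instance a quaternion plus a translation) and four-vec4-per-mat4 cost are assumptions consistent with the "96 bones * 2" figure, not something the hardware documentation in this thread confirms.

```cpp
#include <cassert>

constexpr int kVec4Uniforms = 256;      // budget named in the post
constexpr int kVec4PerMatrixBone = 4;   // assumption: a mat4 costs four vec4 slots
constexpr int kVec4PerQuatBone = 2;     // assumption: rotation quaternion + translation

// Uniform slots left over after storing the given number of bones.
constexpr int uniformsLeft(int bones, int vec4PerBone) {
    return kVec4Uniforms - bones * vec4PerBone;
}

// 96 quaternion-based bones use 192 slots and leave 64 for other needs.
static_assert(uniformsLeft(96, kVec4PerQuatBone) == 64, "budget from the post");
// With full mat4 bones, the same 256 slots would hold at most 64 bones.
static_assert(kVec4Uniforms / kVec4PerMatrixBone == 64, "matrix-bone ceiling");
```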

Originally posted by Komat:
On CPU during mesh load:

For performance-critical applications, this sort of stuff should be done as a preprocessing step and the results stored in a properly formatted file, not something that happens anytime you load a mesh.

Komat
12-04-2006, 09:33 AM
Originally posted by Korval:
For performance-critical applications, this sort of stuff should be done as a preprocessing step and the results stored in a properly formatted file, not something that happens anytime you load a mesh.

It depends on the nature of the application. If it needs to stream the data on the fly, the preprocessing is a must. If it loads all skinned meshes during level load and the algorithm is sufficiently fast (e.g. something greedy), doing the operation at runtime allows you to adapt the group size to the capabilities of the current hw without needing files for several prepared combinations.
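One possible reading of "something greedy" is a first-fit pass over the triangles, sketched here in C++. It is only an illustration: TriangleBones, BoneGroup and buildGroups are hypothetical names, the input is assumed to already list the bones referenced by each triangle's three vertices, and a first-fit strategy is this sketch's choice rather than anything the thread prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using TriangleBones = std::set<int>;  // bones referenced by a triangle's vertices

struct BoneGroup {
    std::set<int> bones;                  // bones whose matrices are uploaded for this group
    std::vector<std::size_t> triangles;   // indices of the triangles drawn with this group
};

// First-fit greedy grouping: put each triangle into the first group that can
// absorb its bones without exceeding maxBones, otherwise open a new group.
std::vector<BoneGroup> buildGroups(const std::vector<TriangleBones>& triangles,
                                   std::size_t maxBones) {
    std::vector<BoneGroup> groups;
    for (std::size_t t = 0; t < triangles.size(); ++t) {
        assert(triangles[t].size() <= maxBones);  // otherwise the triangle cannot be skinned
        bool placed = false;
        for (BoneGroup& g : groups) {
            std::set<int> merged = g.bones;
            merged.insert(triangles[t].begin(), triangles[t].end());
            if (merged.size() <= maxBones) {
                g.bones.swap(merged);
                g.triangles.push_back(t);
                placed = true;
                break;
            }
        }
        if (!placed) {
            groups.push_back(BoneGroup{triangles[t], {t}});
        }
    }
    return groups;
}
```

Because bones are added to a group only when a triangle needs them, no group ends up holding a bone that none of its triangles uses, and a bone can appear in several groups, as described earlier in the thread. A per-vertex bone index would then be the bone's position within its group's set.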

Korval
12-04-2006, 10:06 AM
Originally posted by Komat:
If it loads all skinned meshes during level load and the algorithm is sufficiently fast

That means that tri-stripping has to be a load-time thing too. This means that you can't use the most effective algorithm, which means you're going to lose runtime performance just from standard rendering calls.

Also, making a performance-critical application run slower at load for no real reason is not a good idea. It's better to properly format your data as needed, perhaps at install time.

Like I said, for performance-critical applications, it should be a preprocess.

Golgoth
12-04-2006, 05:28 PM
Ok, I think you guys went ahead of what I planned to do!

First, I totally agree with the 4 bones per vertex… it totally makes sense and I won't go beyond this limitation. Which btw is due to the max number of components per vertex attribute, if my understanding is correct.

I'm the one-pass-only type, so I'll avoid multiple passes at all costs. I just hate this concept. Once again, if I get this right, the groups of bones and/or caching into a file technique is one way to find which bone matrices go with which vertex. My current choice would be to process this when importing the data: parse all vertices and create the bone indices on the fly.

Then again you guys rock. Thx for sharing your knowledge with me, I greatly appreciate it.