Vertex weighting ideas

I have implemented skeletal animation in my engine, and I am thinking about ideas to make it faster. Currently I can display one Doom 3 model at 420 FPS. This is on a Pentium 2.53, so it's a little slow.

Since graphics cards are faster for matrix calculations, I tried uploading the bone matrix, transforming every vertex attached to the bone, then getting the result for each. This gained about 10 FPS in a 500 FPS simulation, so the speed gains were pretty negligible, and would probably be even less so on a faster CPU.

The test I am running is transforming 1300 vertices each frame. I have also implemented smarter updating, so it only transforms vertices attached to bones that have actually moved, but for testing purposes I am updating the whole mesh each frame.
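The "smarter updating" described above might look roughly like the sketch below. It is only an illustration, not the engine's actual code: the names Vec3, Mat34, Bone and updateMovedBones are hypothetical, and it assumes each vertex is attached to exactly one bone (with several weights per vertex, a vertex would need re-skinning whenever any of its bones moved).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 3x4 affine transform: 3x3 rotation/scale plus a translation column.
using Mat34 = std::array<float, 12>;

Vec3 transformPoint(const Mat34& m, const Vec3& p) {
    return { m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3],
             m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7],
             m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11] };
}

// Hypothetical bone record: the animation code is assumed to set `moved`
// whenever it changes `matrix`.
struct Bone {
    Mat34 matrix;
    bool moved;
    std::vector<std::size_t> vertices;  // indices of vertices attached to this bone
};

// Re-transforms only the vertices of bones flagged as moved.
// Returns the number of vertices that were updated.
std::size_t updateMovedBones(std::vector<Bone>& bones,
                             const std::vector<Vec3>& restPositions,
                             std::vector<Vec3>& skinnedPositions) {
    assert(restPositions.size() == skinnedPositions.size());
    std::size_t updated = 0;
    for (Bone& bone : bones) {
        if (!bone.moved) continue;  // keep last frame's result for this bone
        for (std::size_t v : bone.vertices) {
            skinnedPositions[v] = transformPoint(bone.matrix, restPositions[v]);
            ++updated;
        }
        bone.moved = false;  // clean until the animation touches it again
    }
    return updated;
}
```

The returned count makes it easy to check how much work a given animation actually saves compared with updating the whole mesh.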

I know it is possible to do this in a vertex program. Since characters usually use one specific texture/shader I don’t see a problem with having a special shader written for characters. What would this involve? Are there any other ideas I could try to increase my performance on the CPU? I might even try cycling through the vertices and only updating a limited number of vertices each frame. This would cause visual glitches, but at 100+ FPS I am not sure if it would be very noticeable.
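The idea of cycling through the vertices with a fixed per-frame budget could be sketched as below. The function name nextBatch and its parameters are hypothetical; the sketch only picks which vertex indices to refresh on a given frame and leaves the actual skinning to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the vertex indices to refresh this frame and advances `cursor`,
// wrapping around at the end of the mesh.
std::vector<std::size_t> nextBatch(std::size_t vertexCount,
                                   std::size_t budget,
                                   std::size_t& cursor) {
    assert(vertexCount > 0 && cursor < vertexCount);
    if (budget > vertexCount) budget = vertexCount;  // never visit a vertex twice per frame
    std::vector<std::size_t> batch;
    batch.reserve(budget);
    for (std::size_t i = 0; i < budget; ++i) {
        batch.push_back(cursor);
        cursor = (cursor + 1) % vertexCount;
    }
    return batch;
}
```

With 1300 vertices and a budget of 325, every vertex would be refreshed once every four frames, which is where the visual glitches mentioned above would come from.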

Since graphics cards are faster for matrix calculations, I tried uploading the bone matrix, transforming every vertex attached to the bone, then getting the result for each.
That’s not how you do skinning on the GPU. Since the purpose of the GPU is to render the mesh, there is no point in “getting the result” back. Simply do the skinning on the GPU, and let the rendering take care of it. The main CPU doesn’t need to know or care about the result of the skinning.

The test I am running is transforming 1300 vertices each frame.
That’s not even a real number these days. You’re not bottlenecked on anything.

Try shoving a 100,000 polygon skinned mesh through the system.

I know what I tried isn’t GPU skinning, I just (correctly) thought it might provide a small speed increase.

How would you do the skinning on the GPU? The data uploading of matrices would be pretty complicated for a shader. How would you store that data?

Originally posted by Korval:
That’s not how you do skinning on the GPU. Since the purpose of the GPU is to render the mesh, there is no point in “getting the result” back. Simply do the skinning on the GPU, and let the rendering take care of it. The main CPU doesn’t need to know or care about the result of the skinning.

For some rendering algorithms the CPU needs to know the skinned positions, for example to construct additional geometry based on the silhouette (e.g. shadow volumes or smoothies). In that case, using the GPU as a “matrix multiplication accelerator” might be advantageous, provided the delay needed to retrieve the data does not negate the gains.

I kind of like keeping it on the CPU, because then I don’t have to have two routines for shader and non-shader cards, but I’ll probably end up doing a vert shader.

Are there any third-party libs to handle this? I’m open to anything that might be faster. My animation routine is very very fast, and I can’t imagine a third-party lib being any faster, but the skin weighting is definitely a bottleneck. Has anyone tried Cal3D?

Cal3d is CPU based…

Here is an example of how a shader can be used for all those transformations:
http://lumina.sourceforge.net/index.php?id=27
The CPU only has to calculate the bone matrices (or quaternions and joints).

Originally posted by Leadwerks:

How would you do the skinning on the GPU? The data uploading of matrices would be pretty complicated for a shader. How would you store that data?

The vertex shader has a uniform variable that contains an array of matrices, one for each bone.
GPU skinning is usually limited to a maximum of four bones per vertex, so each vertex has eight additional values passed through the generic attributes or texture coordinates. Four of those values are indices into the array of matrices, and the remaining four are the weights of the individual bones. The shader then transforms the position with all four matrices and combines the results using a weighted sum (or calculates the weighted sum of the matrices and transforms the position once). If the vertex is influenced by fewer than four matrices, the excess weights will be zero and the corresponding indices should point to some valid matrix (usually the first one).
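To make the four-bone weighted sum concrete, here is a CPU-side sketch of the same arithmetic the shader would perform per vertex. The type and function names (SkinnedVertex, skinVertex, Mat34) are hypothetical, and the row-major 3x4 matrix layout is just an assumption chosen for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 3x4 affine transform: 3x3 rotation/scale plus a translation column.
using Mat34 = std::array<float, 12>;

// Per-vertex data: position plus the eight extra values described above.
struct SkinnedVertex {
    Vec3 position;
    std::array<int, 4> boneIndex;   // indices into the bone matrix array
    std::array<float, 4> weight;    // unused slots carry weight 0 and a valid index
};

Vec3 transformPoint(const Mat34& m, const Vec3& p) {
    return { m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3],
             m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7],
             m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11] };
}

// Transforms the position by each of the four bones and blends the results
// with the per-bone weights (which are expected to sum to 1).
Vec3 skinVertex(const SkinnedVertex& v, const std::vector<Mat34>& bones) {
    Vec3 result{0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; ++i) {
        assert(v.boneIndex[i] >= 0);
        const std::size_t index = static_cast<std::size_t>(v.boneIndex[i]);
        assert(index < bones.size());
        const Vec3 p = transformPoint(bones[index], v.position);
        result.x += v.weight[i] * p.x;
        result.y += v.weight[i] * p.y;
        result.z += v.weight[i] * p.z;
    }
    return result;
}
```

Because the unused slots still index a valid matrix, the loop needs no branching; a zero weight simply contributes nothing to the sum.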

what i still dislike about this approach is that you have to re-upload all the bone matrices for each mesh for each pass. i didn’t notice any speed increase over cpu skinning when i tried myself.

what i still dislike about this approach is that you have to re-upload all the bone matrices for each mesh for each pass.
Then stop making so many passes. It’s one of the reasons I have a strong dislike for any multipass technique.

Though with the ability to buffer post-vert/geom output in the new hardware, there’s no real need to worry. Just capture the T&L into a vertex buffer.

Originally posted by Vexator:
i didn’t notice any speed increase over cpu skinning when i tried myself.
This depends on many factors. How powerful your CPU is and what tasks it needs to do in addition to the skinning (e.g. run physics simulation, do AI decisions, find paths). How complex your skinned character is (e.g. number of bones per vertex, number of vertices). What the bottleneck in your program is (e.g. pixel shaders, vertex shaders, rendering command queueing, CPU).

One additional advantage of GPU skinning is that the geometry data for the character is static, so it can easily be stored in a static VBO and shared by every instance of the character regardless of which animation it plays.

You can optimise the code by building a single skin matrix per vertex:

attribute vec4 vWeight;
attribute vec4 vBoneIndex;

uniform mat4 bones[30];

varying vec4 col;

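mat4 BuildSkinMatrix()
{
	vec4 b = vBoneIndex;
	vec4 w = vWeight;

	// Start from a zero matrix; an uninitialised mat4 has undefined contents.
	mat4 result = mat4(0.0);
	for (int i = 0; i < 4; i++)
	{
		// Weighted sum of the (up to four) bone matrices influencing this vertex.
		result += w[i] * bones[int(b[i])];
	}

	return result;
}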

void main(void)
{
	vec4 vtx;
	vec4 nrm;

	mat4 skinmatrix = BuildSkinMatrix();
	
	vtx = skinmatrix * gl_Vertex;
	nrm = skinmatrix * vec4(gl_Normal, 0.0);
	nrm = vec4(gl_NormalMatrix * nrm.xyz, 0.0);
	col = nrm;
	gl_Position = gl_ModelViewProjectionMatrix * vtx;
}

You have to upload uniform mat4 bones[30] for each character.
bones[index] = frame_matrix[index] * bind_pose_matrix_inv[index];
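The palette formula above can be sketched on the CPU side roughly as follows. This is an illustration only: Mat4, multiply and buildBonePalette are hypothetical names, and the matrices are assumed to be stored column-major, as OpenGL expects.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major 4x4 matrix: element (row, col) lives at index col * 4 + row.
using Mat4 = std::array<float, 16>;

// Standard matrix product a * b for column-major storage.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            r[col * 4 + row] = sum;
        }
    }
    return r;
}

// bones[i] = frameMatrix[i] * bindPoseInverse[i]
// The inverse bind pose moves a vertex from model space into the bone's
// local space; the frame matrix then moves it to the animated pose.
std::vector<Mat4> buildBonePalette(const std::vector<Mat4>& frameMatrix,
                                   const std::vector<Mat4>& bindPoseInverse) {
    assert(frameMatrix.size() == bindPoseInverse.size());
    std::vector<Mat4> bones(frameMatrix.size());
    for (std::size_t i = 0; i < frameMatrix.size(); ++i) {
        bones[i] = multiply(frameMatrix[i], bindPoseInverse[i]);
    }
    return bones;
}
```

Since the result is a contiguous array of 16-float matrices, it could presumably be handed to glUniformMatrix4fv in one call per character, with the bone count as the count argument and transpose set to GL_FALSE.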

If you have multipass rendering in your app, this shader is not a good idea, because the GPU has to skin the same character in each pass. In that case it’s better to skin the model on the CPU and upload the skinned vertices to a VBO.

It’s better to cast the skinmatrix down to a mat3. (It can save some instructions)
And it’s possible to save one vertex stream if the weights are added to the indices. Splitting costs only 1 instruction, but it saves bandwidth and ram.

And it’s possible to save one vertex stream if the weights are added to the indices.
How do you go about unsplitting them? I don’t recall an instruction that turns a float into an integer (floor operation).

The int() cast works like floor() for non-negative values. To get the weight it’s enough to use fract(). If a weight is 1.0 (or greater than 0.9999), it needs to be split into two weights of 0.5.
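A small CPU-side sketch of that packing scheme, mirroring what int()/floor() and fract() would do in the shader. The names packInfluence, unpackInfluence and packFullWeight are hypothetical, and the sketch assumes non-negative bone indices and weights strictly below 1.0.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Stores the bone index in the integer part and the weight in the fraction.
float packInfluence(int boneIndex, float weight) {
    assert(boneIndex >= 0);
    assert(weight >= 0.0f && weight < 1.0f);  // 1.0 would spill into the next index
    return static_cast<float>(boneIndex) + weight;
}

struct Influence {
    int boneIndex;
    float weight;
};

// Equivalent of int(floor(packed)) and fract(packed) in the shader.
Influence unpackInfluence(float packed) {
    const float whole = std::floor(packed);
    return { static_cast<int>(whole), packed - whole };
}

// A full weight of 1.0 cannot be stored in the fraction, so it is split
// into two influences of 0.5 on the same bone, as described above.
std::array<float, 2> packFullWeight(int boneIndex) {
    return { packInfluence(boneIndex, 0.5f), packInfluence(boneIndex, 0.5f) };
}
```

One trade-off to keep in mind: the index and the weight share the same float mantissa, so the larger the bone index, the fewer bits remain for the weight.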

Well, here is what I came up with for weighting vertices on the CPU. It’s pretty fast, considering it is updating 1700 vertices each frame:
http://www.leadwerks.com/post/anim2.zip