Skeleton Animation Retargeting, Orientation only, same skeleton.

I’m trying to understand the 3D formulas to retarget an animation authored for a Source skeleton onto a very similar Target skeleton (humanoid, identical bone hierarchy, 1-to-1 mapping, so the easy case).

I’m not interested in properly retargeting translations for now, only rotations.

Let’s say the Source skeleton S has its rest/bind pose “Sb” and a neutral T-pose “St”.
The same goes for the Target skeleton, with its rest/bind pose “Tb” and its T-pose “Tt”.
So:

Sb = source rest / bind pose
St = source neutral T-pose
Tb = target rest / bind pose
Tt = target neutral T-pose

In local space (before computing the global-space transforms of the animation) I want to convert / retarget the source animation, and the formula should look like this (for every joint):

Tb * D * inverse(Sb) * ( Sb * A )

Where D is the matrix I should find, and ( Sb * A ) is the keyframe of the animation.
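
In code, per joint, what I mean is roughly this (a minimal numpy sketch just to illustrate the formula; the function and variable names are made up):

```python
import numpy as np

def retarget_joint(Tb, D, Sb, A):
    """Per-joint local retarget: Tb * D * inverse(Sb) * (Sb * A).

    Tb, Sb : 4x4 local bind-pose matrices of the target / source joint
    D      : the unknown correction matrix I am trying to find
    A      : the animation part of the keyframe, so Sb @ A is the exported keyframe
    """
    keyframe = Sb @ A                              # ( Sb * A ), the local keyframe
    return Tb @ D @ np.linalg.inv(Sb) @ keyframe   # the retargeted local transform
```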

  1. If Tb == Tt and Sb == St, clearly D should be the Identity.

  2. If Tb == Tt but Sb =/= St I suspect the formula should be one of the following:

a) D = inverse(St) * Sb
b) D = inverse(Sb) * St

Which one is correct?

  3. If Tb =/= Tt and Sb == St

D should be one of the following, but which one?
a) D = inverse(Tt) * Tb
b) D = inverse(Tb) * Tt

Thanks in advance for any response!

I think I understand what you’re trying to do. But I don’t understand how you’re trying to accomplish it.

First, let me restate what I think you’re trying to accomplish, both to make sure we’re on the same page and to make the solution more obvious.

I believe you are starting with two joint skeletons with 1) the same number of joints (bones) and 2) the same relative connections between those joints, but which have different translations between those joints. Also, each of these two joint skeletons has its own bind pose (that is, the pose in which the mesh was rigged to the joint skeleton), and these bind poses may be very different. Further, you have the animation transforms for each joint skeleton to re-pose them from their own unique bind pose into a T-pose. And in this T-pose, I infer (from your #2) that you’re presuming that the 3D object-space orientations (rotation only) of all the joints in both joint skeletons are the same. You’re also presuming that retargeting the animations like this won’t (due to different translations between the joints, or different mesh sizes) result in mesh interference (e.g. limbs penetrating the torso).

Is this right?

If so, then (high-level) what we conceptually want to do is convert the animation transforms for skeleton S (which re-pose skeleton S from its own bind pose into a T-pose) into animation transforms for skeleton T (which re-pose skeleton T from its own bind pose into a T-pose). And we’re going to do this through the T-pose (because in the T-pose, the object-space rotations of the joints in the two skeletons are the same).

That is:

GS * AS == GT * AT

so…

inverse(GT) * GS * AS == AT

Where AS = the animation transforms for skeleton S, GS = the joint global transforms for skeleton S, and GT = the joint global transforms for skeleton T … except that we’ve omitted the translation component of each joint orientation transform when building the two G transforms. That is, this equation is rotations only! (no translations).

In other words, what this equation does is:

Skeleton S’s animation transforms (joint space)
-> object-space (T-pose)
-> Skeleton T’s animation transforms (joint space)

For a terminology reference, see this post and the one two up from it in the same thread.

Now you might be thinking “How in the heck am I supposed to know GT, because that contains the animation transforms for skeleton T (AT), which is what we’re trying to find?!” Answer: Run this equation starting at the skeleton root joint first, and then iterate down to the leaf joints in topological order. By the time you need to know an AT to compute a GT, you will.
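
Here’s a minimal sketch of how that root-to-leaves iteration could look, using plain 3x3 rotation matrices so the “rotations only” restriction holds by construction (numpy, made-up names; joints are assumed to be sorted so parents come before children):

```python
import numpy as np

def retarget_rotations(parent, bindS, bindT, animS):
    """Solve AT = inverse(GT) * GS * AS joint by joint, root to leaves.

    parent[i] : index of joint i's parent (-1 for the root)
    bindS[i]  : local bind-pose rotation (3x3) of joint i in skeleton S
    bindT[i]  : local bind-pose rotation (3x3) of joint i in skeleton T
    animS[i]  : local animation rotation AS (3x3) of joint i
    Returns animT, the retargeted local animation rotations AT.
    """
    n = len(parent)
    globS = [None] * n   # animated global rotation of each joint in S
    globT = [None] * n   # animated global rotation of each joint in T
    animT = [None] * n

    for i in range(n):
        pS = np.eye(3) if parent[i] < 0 else globS[parent[i]]
        pT = np.eye(3) if parent[i] < 0 else globT[parent[i]]

        GS = pS @ bindS[i]   # global of joint i in S, minus its own animation
        GT = pT @ bindT[i]   # same for T; needs the ancestors' AT, already computed

        animT[i] = GT.T @ GS @ animS[i]   # AT = inverse(GT) * GS * AS (transpose == inverse for rotations)
        globS[i] = GS @ animS[i]          # finish this joint's globals before descending
        globT[i] = GT @ animT[i]          # ends up equal to globS[i]: the object-space rotations match
    return animT
```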

Now as to your #3, that makes no sense to me. If your bind pose transforms are the same (rotations only), then your animation transforms are the same, and thus the target poses are the same (rotations only). This is the simplest case, because there’s no retargeting work to do here (with the assumptions stated above).

Thank you for your response.
I think I found the solution I was searching for.

I believe you are starting with two joint skeletons with 1) the same number of joints (bones) and 2) the same relative connections between those joints, but which have different translations between those joints. Also, each of these two joint skeletons has its own bind pose (that is, the pose in which the mesh was rigged to the joint skeleton), and these bind poses may be very different.

Yes, the skeletons are similar but with different proportions. Just imagine different types of humanoid characters.

To use the same animation clip for different skeletons at runtime the local joint transforms must be converted anyway, so I think it would be nice to support models rigged differently, as long as I can import the T-pose as a reference. I have never used Unity, but I think that is what the engine does.

I don’t quite understand why you use the global transforms. I don’t walk the hierarchy during the retarget, because I transform each joint’s local transform separately.

Using your terminology:

I import animation keyframes that are exported in the bind pose of the source skeleton.
When the process of fetching and interpolation is complete I have an array of O*A transforms, one for every joint.

For the source skeleton I can use that directly and build the global transforms and then the palette for rendering.

Keyframes of the source animation are, for every joint, local transforms of the form L = O * A.
Global transform for joint i (indices run along the chain from the root down to joint i):
G(i) = L(0) * L(1) * L(2) * … * L(i)
Palette (skinning) transform for joint i:
F(i) = G(i) * inverse( O(0) * O(1) * … * O(i) )
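
Written out per joint with a parent index, instead of spelling the chain products out, that is roughly (numpy sketch, made-up names):

```python
import numpy as np

def build_palette(parent, O, A):
    """Build the global and palette (skinning) matrices from local keyframes.

    parent[i] : index of joint i's parent (-1 for the root), parents listed first
    O[i]      : local rest/bind transform of joint i (4x4)
    A[i]      : local animation transform of joint i (4x4), so L = O @ A
    """
    n = len(parent)
    G = [None] * n       # G(i) = L(0) * ... * L(i) along the parent chain
    bindG = [None] * n   # O(0) * ... * O(i) along the parent chain
    F = [None] * n       # palette: F(i) = G(i) * inverse(bind-pose global)

    for i in range(n):
        L = O[i] @ A[i]
        pG = np.eye(4) if parent[i] < 0 else G[parent[i]]
        pB = np.eye(4) if parent[i] < 0 else bindG[parent[i]]
        G[i] = pG @ L
        bindG[i] = pB @ O[i]
        F[i] = G[i] @ np.linalg.inv(bindG[i])
    return F
```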

For the other skeletons, the target skeletons, I must transform the local transform Ls of each joint into a new Lt.

I use the following formula that apparently works:

Lt = OtB * ( inverse(OtB) * OtT ) * ( inverse(OsT) * OsB ) * inverse(OsB) * Ls

OsB: source bind local transform
OsT: source T-pose local transform
OtB: target bind local transform
OtT: target T-pose local transform

That, to my surprise, simplifies to the elegant:

Lt = OtT * inverse(OsT) * Ls
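
As a quick sanity check, here are the long form and the simplified form side by side (numpy sketch, made-up names); the OtB * inverse(OtB) and OsB * inverse(OsB) pairs cancel, which is where the short form comes from:

```python
import numpy as np

inv = np.linalg.inv

def retarget_long(OsB, OsT, OtB, OtT, Ls):
    # the formula as I first wrote it down
    return OtB @ (inv(OtB) @ OtT) @ (inv(OsT) @ OsB) @ inv(OsB) @ Ls

def retarget_simplified(OsT, OtT, Ls):
    # OtB * inverse(OtB) and OsB * inverse(OsB) cancel out
    return OtT @ inv(OsT) @ Ls
```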

I use the global transforms because you have to account for parent joints that were oriented (rotated) differently in the skeleton S bind pose and in the skeleton T bind pose.

For instance, consider if skeleton S was rigged with the right arm sticking straight out (e.g. like a T-pose), but skeleton T was rigged with the right arm pointing diagonally down (e.g. like an A-pose). In the posed result, you want to ensure that (for example) the pointer finger on that hand points the same direction in object-space on both joint skeletons. So you have to account for the parent transforms when computing the child transform.

Oops.
Yeah, my solution only works if the local orientations of the individual joints match, so it’s not so generic…
Thanks Dark Photon!

Sure thing! Good luck!