I think I understand what you’re trying to do. But I don’t understand how you’re trying to accomplish it.
First, let me restate what I think you’re trying to accomplish, both to make sure we’re on the same page and to make the solution more obvious.
I believe you are starting with two joint skeletons with 1) the same number of joints (bones) and 2) the same relative connections between those joints, but which have different translations between those joints. Also, each of these two joint skeletons has its own bind pose (that is, the pose in which the mesh was rigged to that joint skeleton), and these bind poses may be very different. Further, you have the animation transforms for each joint skeleton to re-pose it from its own unique bind pose into a T-pose. And in this T-pose, I infer (in your #2) that you’re presuming that the 3D object-space orientation (rotation only) of every joint is the same in both joint skeletons. You’re also presuming that retargeting the animations like this won’t (due to different translations between the joints, or different mesh sizes) result in mesh interference (e.g. limbs penetrating the torso).
Is this right?
If so, then (high-level) what we conceptually want to do is walk the animation transforms for skeleton A (which re-pose skeleton A from its own bind pose into a T-pose) “into” animation transforms for skeleton B (which re-pose skeleton B from its own bind pose into a T-pose). And we’re going to do this through the T-pose, because in the T-pose the object-space rotations of the joints in the two skeletons are the same.
That is:
GS * AS == GT * AT
so…
GT^-1 * GS * AS == AT
Where AS = the animation transforms for skeleton S, AT = the animation transforms for skeleton T (what we’re solving for), GS = the joint global transforms for skeleton S, and GT = the joint global transforms for skeleton T … except that we’ve omitted the translation component of each joint orientation transform when building the two G transforms. That is, this equation is rotations only! (no translations).
In other words, what this equation does is:
Skeleton S’s animation transforms (joint space)
-> object-space (T-pose)
-> Skeleton T’s animation transforms (joint space)
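To make the single-joint step concrete, here is a minimal sketch in plain Python. It assumes rotations are stored as 3x3 matrices (nested lists) that multiply column vectors, so the inverse of a rotation is just its transpose. The helper names (`rot_x`, `rot_z`, `matmul`, `transpose`, `retarget_joint`, `close`) and the example angles are made up for illustration; they are not from any particular library.

```python
import math

def rot_x(deg):
    """3x3 rotation about the X axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(deg):
    """3x3 rotation about the Z axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    """Transpose of a 3x3 matrix (equals the inverse for a pure rotation)."""
    return [list(row) for row in zip(*m)]

def retarget_joint(g_s, a_s, g_t):
    """Solve GS * AS == GT * AT for AT (rotations only)."""
    return matmul(transpose(g_t), matmul(g_s, a_s))

def close(a, b, tol=1e-9):
    """True if two 3x3 matrices agree element-wise within `tol`."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(3) for j in range(3))

# Hypothetical rotations for one joint; any valid rotations would do.
g_s = rot_z(30.0)    # GS: global rotation of the joint in skeleton S
a_s = rot_x(60.0)    # AS: animation rotation for that joint in skeleton S
g_t = rot_x(-45.0)   # GT: global rotation of the same joint in skeleton T

a_t = retarget_joint(g_s, a_s, g_t)

# Both sides of GS * AS == GT * AT should now agree.
print(close(matmul(g_s, a_s), matmul(g_t, a_t)))
```

The only design choice here is using the transpose as the inverse, which is valid only because the translation component has been dropped and the matrices are pure rotations.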
For a terminology reference, see this post and the one two up from it in the same thread.
Now you might be thinking “How in the heck am I supposed to know GT, because that contains the animation transforms for skeleton T (AT), which is what we’re trying to find?!” Answer: Run this equation starting at the skeleton root joint first, and then iterate down to the leaf joints in topological order. By the time you need a parent joint’s AT to compute a joint’s GT, you will already have computed it.
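The root-to-leaf iteration could be sketched roughly like this. This is only an illustration under assumptions of mine, not a definitive implementation: joints are listed so that every parent comes before its children (root has parent -1), each joint has a rotation-only orientation transform relative to its parent (`o_s`, `o_t`), and a joint's global transform G is its parent's posed global transform times its own orientation transform. All names and angles are hypothetical.

```python
import math

def rot_x(deg):
    """3x3 rotation about the X axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(deg):
    """3x3 rotation about the Z axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    """Transpose of a 3x3 matrix (equals the inverse for a pure rotation)."""
    return [list(row) for row in zip(*m)]

def close(a, b, tol=1e-9):
    """True if two 3x3 matrices agree element-wise within `tol`."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(3) for j in range(3))

def retarget(parents, o_s, a_s, o_t):
    """Compute AT for every joint, visiting parents before children.

    parents[j] is the parent index of joint j (-1 for the root), and is
    assumed to be smaller than j, i.e. the list is in topological order.
    """
    n = len(parents)
    posed_s = [None] * n   # GS * AS per joint
    posed_t = [None] * n   # GT * AT per joint
    a_t = [None] * n
    for j in range(n):
        p = parents[j]
        # G = posed global transform of the parent * this joint's orientation.
        g_s = o_s[j] if p < 0 else matmul(posed_s[p], o_s[j])
        g_t = o_t[j] if p < 0 else matmul(posed_t[p], o_t[j])
        posed_s[j] = matmul(g_s, a_s[j])
        # AT = GT^-1 * GS * AS; the parent's AT is already folded into g_t.
        a_t[j] = matmul(transpose(g_t), posed_s[j])
        posed_t[j] = matmul(g_t, a_t[j])
    return a_t, posed_s, posed_t

# Hypothetical three-joint chain: root -> joint 1 -> joint 2.
parents = [-1, 0, 1]
o_s = [rot_z(0.0), rot_z(90.0), rot_x(10.0)]     # skeleton S bind orientations
a_s = [rot_x(20.0), rot_z(-90.0), rot_x(-10.0)]  # skeleton S animation transforms
o_t = [rot_z(15.0), rot_x(40.0), rot_z(-30.0)]   # skeleton T bind orientations

a_t, posed_s, posed_t = retarget(parents, o_s, a_s, o_t)

# Every joint should end up with the same object-space rotation in both skeletons.
print(all(close(posed_s[j], posed_t[j]) for j in range(len(parents))))
```

Because each joint's GT is built from its parent's already-posed transform, the loop never needs an AT it has not yet computed, which is the point of going root first.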
Now as to your #3, that makes no sense to me. If your bind pose transforms are the same (rotations only), then your animation transforms are the same, and thus the target poses are the same (rotations only). This is the simplest case, because there’s no retargeting work to do here (with the assumptions stated above).