MD5 Animation. Performance issues

The time has come for me to add animation to my engine. I started with the standard MD5 mesh/anim format since I find it easy to understand. I managed to implement a simple MD5 loader, and I can load MD5 meshes along with their animations. The animations work fine; I am using GPU skinning and CPU interpolation. Here comes my problem: performance (you can say I am obsessed with it). Currently I use the OGLDev tutorial, which implements MD5 skeletal animation, as an example. As I said, the animations work fine, but performance is horrible. With a single animated MD5 model in my scene I get ~60-80 FPS. If I add more models (~3-4) FPS drops to ~12-20. I know what causes that drop, since I am doing a lot of CPU-side computation every frame: interpolating, filling bone matrices, etc. Below is the hefty CPU-side part which runs every frame to compute each bone's transformation matrix; from there I upload these bone transformation matrices to the shader and apply the skinning.


void SkinnedMesh::BoneTransform(float TimeInSeconds, vector<Matrix4f>& Transforms)
{
    Matrix4f Identity;
    Identity.InitIdentity();

    float TicksPerSecond = (float)(m_pScene->mAnimations[0]->mTicksPerSecond != 0 ? m_pScene->mAnimations[0]->mTicksPerSecond : 25.0f);
    float TimeInTicks = TimeInSeconds * TicksPerSecond;
    float AnimationTime = fmod(TimeInTicks, (float)m_pScene->mAnimations[0]->mDuration);

    ReadNodeHeirarchy(AnimationTime, m_pScene->mRootNode, Identity);

    Transforms.resize(m_NumBones);

    for (uint i = 0; i < m_NumBones; i++) {
        Transforms[i] = m_BoneInfo[i].FinalTransformation;
    }
}

The function above fills a vector with one transformation matrix per bone (I am working with a model that has ~71 bones).

The following function calculates the interpolations and builds each node's transform:


void SkinnedMesh::ReadNodeHeirarchy(float AnimationTime, const aiNode* pNode, const Matrix4f& ParentTransform)
{
    string NodeName(pNode->mName.data);

    const aiAnimation* pAnimation = m_pScene->mAnimations[0];

    Matrix4f NodeTransformation(pNode->mTransformation);

    const aiNodeAnim* pNodeAnim = FindNodeAnim(pAnimation, NodeName);

    if (pNodeAnim) {
        // Interpolate scaling and generate scaling transformation matrix
        aiVector3D Scaling;
        CalcInterpolatedScaling(Scaling, AnimationTime, pNodeAnim);
        Matrix4f ScalingM;
        ScalingM.InitScaleTransform(Scaling.x, Scaling.y, Scaling.z);

        // Interpolate rotation and generate rotation transformation matrix
        aiQuaternion RotationQ;
        CalcInterpolatedRotation(RotationQ, AnimationTime, pNodeAnim);
        Matrix4f RotationM = Matrix4f(RotationQ.GetMatrix());

        // Interpolate translation and generate translation transformation matrix
        aiVector3D Translation;
        CalcInterpolatedPosition(Translation, AnimationTime, pNodeAnim);
        Matrix4f TranslationM;
        TranslationM.InitTranslationTransform(Translation.x, Translation.y, Translation.z);

        // Combine the above transformations
        NodeTransformation = TranslationM * RotationM * ScalingM;
    }

    Matrix4f GlobalTransformation = ParentTransform * NodeTransformation;

    if (m_BoneMapping.find(NodeName) != m_BoneMapping.end()) {
        uint BoneIndex = m_BoneMapping[NodeName];
        m_BoneInfo[BoneIndex].FinalTransformation = m_GlobalInverseTransform * GlobalTransformation * m_BoneInfo[BoneIndex].BoneOffset;
    }

    for (uint i = 0; i < pNode->mNumChildren; i++) {
        ReadNodeHeirarchy(AnimationTime, pNode->mChildren[i], GlobalTransformation);
    }
}

I won't post the functions that ReadNodeHeirarchy calls, but you get the idea: the per-frame CPU calculations are substantial, and as a result I get terrible performance. I am quite new to animation and skinning. I read somewhere about using quaternions instead of bone transformation matrices. I am open to any suggestions on how I might improve or even change my animation technique.

PS: Vertex Skinning Shader Code:


#version 330

layout (location = 0) in vec3 Position;
layout (location = 1) in vec2 TexCoord;
layout (location = 2) in vec3 Normal;
layout (location = 3) in ivec4 BoneIDs;
layout (location = 4) in vec4 Weights;

out vec2 TexCoord0;
out vec3 Normal0;
out vec3 WorldPos0;

const int MAX_BONES = 120;

uniform mat4 gWVP;
uniform mat4 gWorld;
uniform mat4 gBones[MAX_BONES];

void main()
{
    mat4 BoneTransform = gBones[BoneIDs[0]] * Weights[0];
    BoneTransform     += gBones[BoneIDs[1]] * Weights[1];
    BoneTransform     += gBones[BoneIDs[2]] * Weights[2];
    BoneTransform     += gBones[BoneIDs[3]] * Weights[3];

    vec4 PosL    = BoneTransform * vec4(Position, 1.0);
    gl_Position  = gWVP * PosL;
    TexCoord0    = TexCoord;
    vec4 NormalL = BoneTransform * vec4(Normal, 0.0);
    Normal0      = (gWorld * NormalL).xyz;
    WorldPos0    = (gWorld * PosL).xyz;
}

[QUOTE=Asmodeus;1265988]The animations work fine; I am using GPU skinning and CPU interpolation.

Here comes my problem: performance … performance is horrible. With a single animated MD5 model in my scene I get ~60-80 FPS. If I add more models (~3-4) FPS drops to ~12-20.[/QUOTE]

Have you disabled VSync? If not, do so. All of the above are nice, round numbers that are precise multiples of 60Hz VSync intervals (16.6ms), suggesting you have VSync enabled. Always disable VSync when profiling rendering performance.

After that, I suggest that you report performance with frame time (in ms – aka milliseconds). FPS is a really poor way to measure performance (for instance, read this).

[QUOTE=Asmodeus;1265988]I know what causes that drop, since I am doing a lot of CPU-side computation every frame: interpolating, filling bone matrices, etc. … the hefty CPU-side part which runs every frame to compute each bone's transformation matrix; from there I upload these bone transformation matrices to the shader and apply the skinning. … the per-frame CPU calculations are substantial, and as a result I get terrible performance.[/QUOTE]

You can pretty easily move the keyframe interpolation to the GPU. Been there, done that. Just upload all your animation track joint transforms to the GPU in a texture (all joints, all timesteps), and then sample and interpolate the appropriate joint transforms in your shader before skinning the vertex. When animating a character with a single pre-modeled skeletal animation track, this completely removes the need to 1) perform any CPU-side joint transform computations and 2) perform CPU-to-GPU joint transform palette uploads.
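To make that concrete, here is a rough GLSL sketch of the sampling/interpolation step. It assumes a made-up layout where each keyframe is one row of an RGBA32F texture and each joint occupies three texels of that row (a 3x4 transform); the uniform names (gAnimTex, gFrame0, gFrame1, gLerp) are placeholders, so adapt them to however you actually pack your track data:

// Sketch only: one keyframe per texture row, three RGBA32F texels per joint.
uniform sampler2D gAnimTex;   // all joints, all keyframes (placeholder name)
uniform int   gFrame0;        // keyframe at or before the current animation time
uniform int   gFrame1;        // keyframe after it
uniform float gLerp;          // blend factor between the two keyframes

mat4 FetchJoint(int joint, int frame)
{
    vec4 r0 = texelFetch(gAnimTex, ivec2(joint * 3 + 0, frame), 0);
    vec4 r1 = texelFetch(gAnimTex, ivec2(joint * 3 + 1, frame), 0);
    vec4 r2 = texelFetch(gAnimTex, ivec2(joint * 3 + 2, frame), 0);
    // r0..r2 are the rows of a 3x4 transform; expand to a column-major mat4
    return mat4(vec4(r0.x, r1.x, r2.x, 0.0),
                vec4(r0.y, r1.y, r2.y, 0.0),
                vec4(r0.z, r1.z, r2.z, 0.0),
                vec4(r0.w, r1.w, r2.w, 1.0));
}

mat4 InterpolatedJoint(int joint)
{
    // naive component-wise lerp of matrices; fine as a first step, switch to
    // (dual) quaternions later for proper rotation blending
    return FetchJoint(joint, gFrame0) * (1.0 - gLerp) + FetchJoint(joint, gFrame1) * gLerp;
}

You would compute gFrame0/gFrame1/gLerp once per frame on the CPU (or derive them in the shader from a single time uniform) and call InterpolatedJoint() in place of indexing gBones[] in your skinning code.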

You can represent these joint transforms in whatever form is convenient for you. I’d recommend you use Dual Quaternions: they have some very nice advantages (search the archives of these forums for details: link). Quaternion/Translation form is another option, but it’s not as flexible and it requires some expensive special handling you’d probably rather just skip. That said, both of these are better than Matrices (in skinning quality and size), but I’d start with Matrices since you’ve got those handy and you know how they work. Add Dual Quaternions once you get GPU-side keyframe interpolation working with Matrices.
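For reference, dual quaternion skinning in the vertex shader boils down to something like the sketch below. This is the standard dual quaternion linear blending (Kavan et al.), not anything specific to your code; gBoneDQ is an assumed uniform with the rotation quaternion (x,y,z,w) in column 0 of each mat2x4 and the dual part in column 1, while BoneIDs/Weights/MAX_BONES are the ones from your shader above:

// Sketch only: dual quaternion linear blending of the four bone influences.
uniform mat2x4 gBoneDQ[MAX_BONES];

vec3 DualQuatSkin(vec3 p)
{
    mat2x4 dq0 = gBoneDQ[BoneIDs[0]];
    mat2x4 dq1 = gBoneDQ[BoneIDs[1]];
    mat2x4 dq2 = gBoneDQ[BoneIDs[2]];
    mat2x4 dq3 = gBoneDQ[BoneIDs[3]];

    // flip signs so all rotations lie in the same hemisphere as the first one
    if (dot(dq0[0], dq1[0]) < 0.0) dq1 = -dq1;
    if (dot(dq0[0], dq2[0]) < 0.0) dq2 = -dq2;
    if (dot(dq0[0], dq3[0]) < 0.0) dq3 = -dq3;

    mat2x4 blend = Weights[0] * dq0 + Weights[1] * dq1
                 + Weights[2] * dq2 + Weights[3] * dq3;
    blend /= length(blend[0]);        // normalize by the real (rotation) part

    vec4 q  = blend[0];               // blended rotation
    vec4 qe = blend[1];               // blended dual part (encodes translation)

    vec3 rotated     = p + 2.0 * cross(q.xyz, cross(q.xyz, p) + q.w * p);
    vec3 translation = 2.0 * (q.w * qe.xyz - qe.w * q.xyz + cross(q.xyz, qe.xyz));
    return rotated + translation;
}

As said above, get the plain matrix path working on the GPU first; switching the payload to dual quaternions afterwards is mostly a change of packing plus this blend function.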

I should also mention that if interpolating joint transforms for 3-4 models with 71 joints each on the CPU is bringing your render to its knees (12-20 fps = 50-83ms = 4-5 full 60Hz VSync intervals!!!), then even with CPU-side interpolation, something is likely very wrong in your code. On a decent CPU, you can perform interpolations for 100+ characters without slowing your frame rate below 16.6ms (60 FPS). I would do some CPU-side profiling to check the cost of each phase of your CPU processing (time them in ms). It’s also possible that the method you are using to upload data to the GPU is slow. I’d profile that too (disable the GPU upload and see how performance changes).
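A minimal sketch of that kind of per-phase timing with std::chrono, assuming the SkinnedMesh/Matrix4f types from your code above; UploadBones() is a hypothetical stand-in for whatever glUniformMatrix4fv / buffer-upload call you actually make:

// Sketch only: time the CPU interpolation and the GPU upload separately.
#include <chrono>
#include <cstdio>
#include <vector>

void ProfileBoneUpdate(SkinnedMesh& mesh, float timeInSeconds)
{
    using Clock = std::chrono::steady_clock;
    std::vector<Matrix4f> transforms;

    auto t0 = Clock::now();
    mesh.BoneTransform(timeInSeconds, transforms);   // interpolation + hierarchy walk
    auto t1 = Clock::now();
    UploadBones(transforms);                         // hypothetical: your GPU upload call
    auto t2 = Clock::now();

    std::printf("interpolate: %.3f ms   upload: %.3f ms\n",
                std::chrono::duration<double, std::milli>(t1 - t0).count(),
                std::chrono::duration<double, std::milli>(t2 - t1).count());
}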

Thank you very much, that is useful information.
I realize that FPS is not an appropriate way to measure performance, but for the sake of argument I will use it in the example below.
First off, this piece of code is straight from OGLDev Tutorial 38, and the first tests I made were directly on his tutorial. VSync is disabled (I presume), because if I render other static objects the FPS goes above 100. Also, when I implemented this in my engine (where I use GLFW and have disabled VSync with glfwSwapInterval(0)), I can render static objects at ~1000 FPS; if I add the animated model, FPS drops to ~70. I have not touched the code in any way, it is straight from OGLDev, where he gets the same low performance. I do not pretend to understand everything in his code, but at first glance anyone can see that it performs hefty CPU tasks every frame.
Two days ago I decided to ditch his tutorial and look around the internet, and I found an example which uses quaternions. This example does not use shaders (pure old fixed pipeline), so the calculations are again performed CPU-side. In that particular example I render the very same MD5 model at a steady 1700+ FPS. I have looked through the code, but sadly I am not very familiar with the quaternion math. My assumption, though, is that I can move those calculations into the shaders and make the GPU move its a*s a bit.
My only conclusion is that OGLDev's tutorial is just a simple example of skinning (skeletal animation) and not something you can implement straight away from there (maybe just guidelines).
Here are some snippets from the MD5 fixed-pipeline example I mentioned above:


void md5load::InterpolateSkeletons (const struct md5_joint_t *skelA, const struct md5_joint_t *skelB, int num_joints, float interp, struct md5_joint_t *out)
{
  int i;

  for (i = 0; i < num_joints; ++i)
    {
      /* Copy parent index */
      out[i].parent = skelA[i].parent;

      /* Linear interpolation for position */
      out[i].pos[0] = skelA[i].pos[0] + interp * (skelB[i].pos[0] - skelA[i].pos[0]);
      out[i].pos[1] = skelA[i].pos[1] + interp * (skelB[i].pos[1] - skelA[i].pos[1]);
      out[i].pos[2] = skelA[i].pos[2] + interp * (skelB[i].pos[2] - skelA[i].pos[2]);

      /* Spherical linear interpolation for orientation */
      Quat_slerp (skelA[i].orient, skelB[i].orient, interp, out[i].orient);
    }
}

void md5load::Animate (const struct md5_anim_t *anim, struct anim_info_t *animInfo, double dt)
{
  int maxFrames = anim->num_frames - 1;

  animInfo->last_time += dt;

  /* move to next frame */
  if (animInfo->last_time >= animInfo->max_time)
    {
      animInfo->curr_frame++;
      animInfo->next_frame++;
      animInfo->last_time = 0.0;

      if (animInfo->curr_frame > maxFrames)
	animInfo->curr_frame = 0;

      if (animInfo->next_frame > maxFrames)
	animInfo->next_frame = 0;
    }
}

/**
 * Prepare a mesh for drawing.  Compute mesh's final vertex positions
 * given a skeleton.  Put the vertices in vertex arrays.
 */
void md5load::PrepareMesh (const struct md5_mesh_t *mesh, const struct md5_joint_t *skeleton)

{
  int i, j, k;

  /* Setup vertex indices */
  for (k = 0, i = 0; i < mesh->num_tris; ++i)
    {
      for (j = 0; j < 3; ++j, ++k)
	vertexIndices[k] = mesh->triangles[i].index[j];
    }

  /* Setup vertices */
  for (i = 0; i < mesh->num_verts; ++i)
    {
      vec3_t finalVertex = { 0.0f, 0.0f, 0.0f };

      /* Calculate final vertex to draw with weights */
      for (j = 0; j < mesh->vertices[i].count; ++j)
	{
	  const struct md5_weight_t *weight
	    = &mesh->weights[mesh->vertices[i].start + j];
	  const struct md5_joint_t *joint
	    = &skeleton[weight->joint];

	  /* Calculate transformed vertex for this weight */
	  vec3_t wv;
	  Quat_rotatePoint (joint->orient, weight->pos, wv);

	  /* The sum of all weight->bias should be 1.0 */
	  finalVertex[0] += (joint->pos[0] + wv[0]) * weight->bias;
	  finalVertex[1] += (joint->pos[1] + wv[1]) * weight->bias;
	  finalVertex[2] += (joint->pos[2] + wv[2]) * weight->bias;
	}

      vertexArray[i][0] = finalVertex[0];
      vertexArray[i][1] = finalVertex[1];
      vertexArray[i][2] = finalVertex[2];
	  vertexArray[i][3] = mesh->vertices[i].st[0];
	  vertexArray[i][4] = 1.0f - mesh->vertices[i].st[1];
    }

  
}

void md5load::draw (float x, float y, float z, float scale)
//void md5load::draw ()
{
  int i;
  static float angle = 0;
  static double curent_time = 0;
  static double last_time = 0;

  last_time = curent_time;
  curent_time = (double)glutGet (GLUT_ELAPSED_TIME) / 1000.0;

  glLoadIdentity ();

  if (drawTexture == true)
  {
	glPolygonMode (GL_FRONT_AND_BACK, GL_FILL);
  }
  else
  {
	glPolygonMode (GL_FRONT_AND_BACK, GL_LINE);
  }

  glTranslatef (x, y, z);
  //glTranslatef(0.0f, -35.0f, -150.0f);

  glRotatef (-90.0f, 1.0, 0.0, 0.0);

  glScalef(scale, scale, scale);

  if (rotate == true)
  {
	glRotatef (angle, 0.0, 0.0, 1.0);
  }

  angle += 25 * (curent_time - last_time);

  if (angle > 360.0f)
    angle -= 360.0f;

  if (animated)
    {
 //     /* Calculate current and next frames */
      Animate (&md5anim, &animInfo, curent_time - last_time);

 //     /* Interpolate skeletons between two frames */
      InterpolateSkeletons (md5anim.skelFrames[animInfo.curr_frame],
			    md5anim.skelFrames[animInfo.next_frame],
			    md5anim.num_joints,
			    animInfo.last_time * md5anim.frameRate,
			    skeleton);
    }
  else
    {
      /* No animation, use bind-pose skeleton */
      skeleton = md5file.baseSkel;
    }


 // /* Draw skeleton */
  if (drawSkeleton == true)
  {
	DrawSkeleton (skeleton, md5file.num_joints);
  }


 //// /* Draw each mesh of the model */
  for (i = 0; i < md5file.num_meshes; ++i)
    {

	  glBindTexture(GL_TEXTURE_2D, modeltexture);

      PrepareMesh (&md5file.meshes[i], skeleton);

      glVertexPointer (3,GL_FLOAT,sizeof(GL_FLOAT)*5,vertexArray);

	  char *evilPointer = (char *)vertexArray;
	  evilPointer+=sizeof(GL_FLOAT)*3;
	  glTexCoordPointer(2,GL_FLOAT,sizeof(GL_FLOAT)*5,evilPointer);

      glDrawElements (GL_TRIANGLES, md5file.meshes[i].num_tris * 3, GL_UNSIGNED_INT, vertexIndices);
    }

}

I assume that I can (??!!) move some of the above per-frame calculations to the shader, which would give the CPU some breathing room.
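Something like the following is what I have in mind; just a sketch, assuming each joint transform can be decomposed into a rotation quaternion plus a translation (i.e. no scaling), with gJointRot/gJointPos as made-up uniform names and BoneIDs/Weights/MAX_BONES the same as in my matrix shader above:

// Sketch only: per-joint quaternion + translation instead of a matrix palette.
uniform vec4 gJointRot[MAX_BONES];   // rotation quaternion (x, y, z, w)
uniform vec3 gJointPos[MAX_BONES];   // translation

vec3 RotateByQuat(vec4 q, vec3 v)
{
    // same operation as Quat_rotatePoint() in the fixed-pipeline example
    return v + 2.0 * cross(q.xyz, cross(q.xyz, v) + q.w * v);
}

vec3 QuatSkin(vec3 p)
{
    vec3 r = vec3(0.0);
    r += (gJointPos[BoneIDs[0]] + RotateByQuat(gJointRot[BoneIDs[0]], p)) * Weights[0];
    r += (gJointPos[BoneIDs[1]] + RotateByQuat(gJointRot[BoneIDs[1]], p)) * Weights[1];
    r += (gJointPos[BoneIDs[2]] + RotateByQuat(gJointRot[BoneIDs[2]], p)) * Weights[2];
    r += (gJointPos[BoneIDs[3]] + RotateByQuat(gJointRot[BoneIDs[3]], p)) * Weights[3];
    return r;
}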

UPDATE: I gathered some numbers in milliseconds.
1. My engine: matrix skinning

  • Rendered objects: terrain, trees, player -> ~2 ms
  • Rendered objects: only the animated MD5 mesh -> ~18 ms

2. OGLDev Tutorial 38: matrix skinning

  • Rendered objects: none -> ~1 ms
  • Rendered objects: only the animated MD5 mesh -> ~17 ms
  • Rendered objects: animated mesh x5 -> ~47 ms (lag)

3. MD5 fixed-pipeline skinning: CPU quaternion calculation

  • Rendered objects: none -> ~15 ms (strange, I get this delay without rendering anything)
  • Rendered objects: only the animated MD5 mesh -> ~16 ms
  • Rendered objects: animated mesh x20 -> ~16 ms
    (for some god-forsaken reason this does not move even slightly; I tried rendering 20 animated models and it is still 16 ms, with FPS dropping to 140)
  • Rendered objects: animated mesh x110 -> ~47 ms

I am confused!

UPDATE 2: For some other god-forsaken reason, in Release mode, after rendering my entire scene plus the animated model, performance sits at a steady 1 ms (in my engine, that is) - a big improvement indeed. If I try to render my scene plus 110-120 animated models I get about 17-18 ms.
I am still sure I can squeeze even more performance out of this.

PS: I just want to ask a question to straighten this out in my head, please forgive my ignorance.
So far I have seen two methods to implement skinning:

  • Quaternions
  • Matrices

My questions are:

  1. Is it good practice (or possible at all) to upload all the information to the shader and do ALL the calculations ONLY there?
  2. Since OGLDev's tutorial does a lot of CPU math before uploading to the shader, is it okay to move most of this into the shader, or is that not how it's done?
  3. The same question applies to quaternions. Can't I just upload all the static data and calculate whatever is needed in the shader, or will this lead to unwanted CPU->GPU transfers of huge amounts of data? (Right now I am uploading ~72 matrices to the shader every frame for the skinning.)

For some reason I just want to avoid having the animation-related calculations on the CPU.

[QUOTE=Asmodeus;1266009]So far I have seen two methods to implement skinning:

  • Quaternions
  • Matrices

My questions are:

[ol]
[li]Is it good practice (or possible at all) to upload all the information to the shader and do ALL the calculations ONLY there?[/li]
[li]Since OGLDev's tutorial does a lot of CPU math before uploading to the shader, is it okay to move most of this into the shader, or is that not how it's done?[/li]
[li]The same question applies to quaternions. Can't I just upload all the static data and calculate whatever is needed in the shader, or will this lead to unwanted CPU->GPU transfers of huge amounts of data? (Right now I am uploading ~72 matrices to the shader every frame for the skinning.)[/li]
[/ol]
For some reason I just want to avoid having the animation-related calculations on the CPU.
[/QUOTE]
Re #1 and #2, if you need more performance (lower time consumption per frame so you can fit more content in), you do whatever you need to do to optimize your bottleneck. If you’re CPU bound, then you look at decreasing your CPU consumption by offloading calculations to the GPU. And it’s definitely practical (and possible), as I’ve done it.

Re 3, yes. Same question AFAICT. Further, matrix uploads are 12 floats per element, whereas Quaternion-Translation is 7 floats and Dual Quaternion is 8 floats. So you save with the latter two. But in this case, you’re only uploading in prep/setup, so the upload cost is less of an issue. However, smaller size per element nets you better texture cache coherency on the GPU, which affects run-time performance when you’re sampling and interpolating transforms on the GPU.

Very nice, thanks. I will try my best and see what comes out. I was wondering if someone knows how to load multiple animations for MD5 models with Assimp. So far I am only able to load one animation, and only if the mesh name matches the animation name. I had a look at the /code folder in Assimp; there I found a simple MD5 importer, and from inspecting the source I can clearly see that the LoadAnim method loads one animation with the same name as the mesh.

If it helps, I just preload all of them when I load my models. I’m using a convention of “one assimp scene <-> one model (with multiple meshes)” so all of the animations belong to a single model anyway:

for (size_t i = 0; i < scene->mNumAnimations; ++i) {
    if (scene->mAnimations[i]->mDuration > 0.0f) {
        // add the animations
        _animations.push_back(AnimEvaluator(scene->mAnimations[i]));
    }
}

Are you sure you are talking about MD5 models? Because I cannot load more than one animation; the number of animations is always 1.

Are you certain the file has more than one animation?

I am using Doom 3 models as samples. As far as I know, MD5 does not have anything in the .md5mesh file specifying the names of the animations (similar to how OBJ specifies an MTL file), or does it? I am not certain here.
PS: How would you specify that a model uses multiple animations? The MD5 documentation is somewhat poor, and Assimp's documentation on MD5 isn't any better.

After looking at assimp’s documentation, it’s coded to only load one XYZ.md5anim file corresponding to the XYZ.md5mesh file. However, nothing’s stopping you from manually loading all of the animation files yourself.
Use the importer to manually import each md5anim file and extract the animations from the returned aiScene pointer. Should work.

L.E.: Here’s the interesting bit (search for “InternReadFile”). It’s perfectly happy to load just an md5mesh, md5camera, or md5anim file, or any combination of them.
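A rough sketch of that, assuming Assimp's C++ API; the file list is hypothetical, and you should check the Importer docs for the exact ownership rules of GetOrphanedScene() before relying on this:

// Sketch only: load additional .md5anim files with their own Importer instances.
#include <assimp/Importer.hpp>
#include <assimp/scene.h>
#include <string>
#include <vector>

std::vector<const aiScene*> LoadExtraAnims(const std::vector<std::string>& animFiles)
{
    std::vector<const aiScene*> animScenes;
    for (const std::string& file : animFiles) {
        Assimp::Importer importer;
        const aiScene* scene = importer.ReadFile(file, 0);   // no post-processing needed for anim-only files
        if (scene && scene->mNumAnimations > 0) {
            // take ownership so the animation data outlives this Importer;
            // the caller is responsible for deleting these scenes later
            animScenes.push_back(importer.GetOrphanedScene());
        }
    }
    return animScenes;
}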

Very nice, I am now able to load multiple animations. I was just wondering whether I should stick with OGLDev's tutorial, because with it I get horrible performance in debug mode. I realize I should not care about debug that much anyway, but it is strange and ANNOYING.
So, bottom line: should I use it in my engine, is it efficient enough or not?

PS: I have also tried running the OGLDev loader on my old laptop with an Intel Core 2 Duo and an old x4000 Radeon with 512 MB. Strangely enough, I get exactly the same performance as on my desktop PC with a quad-core AMD chip and an x7000 HD Radeon: 17-18 milliseconds to render the scene in debug, with only the model being rendered.

I have an update here. I played around with the OGLDev loader and implemented it in my game engine with multiple textures/materials and handling of multiple animations. I now notice a very ANNOYING BUG, and I'd really appreciate it if SOMEONE could help (my last post went unanswered!?).
I now render the frame (with the model only) in ~17 ms, but if I move the GLFW window around my screen it starts lagging and stuttering. It seems that if the window is near the top left of the screen everything is fine, but if I move it anywhere else on the screen it starts lagging and stuttering, like WTF???
I also really need an answer to my LAST post, since I feel like this piece of code is nowhere near OPTIMAL. However, I have seen multiple people recommend OGLDev's tutorial on skinning.
Thanks in advance

OK, for your debug post:
Me, and I assume most users who still post around here, either haven't touched an OpenGL tutorial in a long time, so we have no idea what OGLDev's code is all about or how fast it is, or have written their own tutorials and will obviously tell you to ditch OGLDev and use the links they will surely provide instead. Either way, it's hard to get a clear answer.
As to the question itself, everyone gets bad performance in debug builds, but do the basic stuff: disable iterator debugging, secure SCL, the debug heap (if you're brave), and so forth. But you will lose all the good bits about a debug build. Create a profile build (release with symbols) and profile the code. Look for obvious bottlenecks. You're surely CPU-bottlenecked (and your code is definitely single-threaded), so a Core 2 Duo and the quad-core AMD will perform close to each other.
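For MSVC, the usual knobs are preprocessor definitions along these lines; set them project-wide in the Debug configuration so every translation unit agrees (exact macros depend on your toolset version, so treat this as a starting point):

// MSVC-specific, Debug configuration "Preprocessor Definitions" (project-wide, not per-file):
//   _ITERATOR_DEBUG_LEVEL=0      // disables checked/debug iterators (newer toolsets)
//   _HAS_ITERATOR_DEBUGGING=0    // older toolsets
//   _SECURE_SCL=0                // older toolsets
// Mixing these settings across libraries in the same build can cause link/runtime errors.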

Now, to the update:
Is it lagging and stuttering after you drop it off at the target position? Try setting the window, in your code, to a position where it stutters, and see whether it's related to moving the window or just to displaying at a certain location (neither should happen, but hey, GLFW performs fine on my end).

OK, I will try what you suggest.
About OGLDev: I have looked at the code countless times. The basic thing it does (per frame) is position/rotation (quaternion)/scaling interpolation. I have found that, for a particular model with 71 bones, a certain function runs ~74 times in total (for all parents and children). That function performs the interpolations and calculates the final matrices before they are sent to the shader, and it performs 3 separate matrix calculations per iteration => 74 x 3 = 222 matrix calculations CPU-side every frame for a single model. That is roughly how the code works.
Here is how it looks. It is a recursive function; it starts with the parent and walks down to the children.


void MD5Import::ReadNodeHeirarchy(GLuint &anim_index, float AnimationTime, const aiNode* pNode, const aiMatrix4x4& ParentTransform)
{
	string NodeName(pNode->mName.data);
	const aiAnimation* pAnimation = m_pAnim[anim_index]->mAnimations[0];
	const aiNodeAnim* pNodeAnim = FindNodeAnim(pAnimation, NodeName);
	aiMatrix4x4 NodeTransformation(pNode->mTransformation);

	if (pNodeAnim) {
		aiVector3D Scaling;
		aiVector3D Translation;
		aiQuaternion RotationQ;

		CalcInterpolatedScaling(Scaling, AnimationTime, pNodeAnim);
		CalcInterpolatedRotation(RotationQ, AnimationTime, pNodeAnim);
		CalcInterpolatedPosition(Translation, AnimationTime, pNodeAnim);

		NodeTransformation.Scaling(Scaling, NodeTransformation);
		NodeTransformation.Translation(Translation, NodeTransformation);
		NodeTransformation *= aiMatrix4x4(RotationQ.GetMatrix());

	}
	aiMatrix4x4 GlobalTransformation = ParentTransform*NodeTransformation;

	if (m_BoneMapping.find(NodeName) != m_BoneMapping.end()) {
		uint BoneIndex = m_BoneMapping[NodeName];
		m_BoneInfo[BoneIndex].FinalTransformation = m_GlobalInverseTransform * GlobalTransformation * m_BoneInfo[BoneIndex].BoneOffset;
	}

	for (uint i = 0; i < pNode->mNumChildren; i++) {
		ReadNodeHeirarchy(anim_index,AnimationTime, pNode->mChildren[i], GlobalTransformation);
	}
}

The three matrix calculations per call are:

  1. NodeTransformation = aiMatrix4x4(RotationQ.GetMatrix());
  2. aiMatrix4x4 GlobalTransformation = ParentTransform * NodeTransformation;
  3. m_BoneInfo[BoneIndex].FinalTransformation = m_GlobalInverseTransform * GlobalTransformation * m_BoneInfo[BoneIndex].BoneOffset;
I don't know why, but 222 multiplications of 4x4 matrices still seems very expensive to me!

Now add multiple, different models, each with their own unique animations, all in view. It won't be pretty.
I highly recommend taking a look at Scott's assimp animation importer. His approach is basically to cache animations at load time and just look up the relative transforms for the current timestamp of the current animation index. You'll be trading memory usage for performance, but if you handle animation lifetimes manually, that shouldn't be much of an issue.
The ultimate goal would be to just store dual-quaternion rotations and compute the animation transforms on the GPU, but using the above code I'm nowhere near bottlenecked by animation code.
Be warned though, it's pretty complicated and in-depth.

P.S.: The code was originally written for a D3D renderer, but porting it to OpenGL just required a couple of transposes here and there.

Thanks, I will have a look.
So you share my view that OGLDev's code (at least that snippet, which is the main bottleneck) is non-optimal and inefficient?

I don't think any tutorial's main concern is efficiency so much as readability. Add animations into the mix, which are math-intensive, and it becomes really hard to have code that is both easy to read and fast to run.

The snippet does what it is supposed to do: it shows you how to properly calculate all of the bone transforms for the current timestamp. Nothing more. You could cache the results yourself at a higher level, and suddenly it becomes better. For example, take the animation frame and use it as a key into a map that holds all of the transforms for that frame, calculated with that function.
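A very rough sketch of that caching idea, reusing the names from your code above; GetCachedPose() and the BoneTransform() wrapper around ReadNodeHeirarchy() are hypothetical, and the quantization step is up to you:

// Sketch only: one cache entry per (animation index, quantized frame), built lazily.
#include <map>
#include <utility>
#include <vector>

static std::map<std::pair<GLuint, int>, std::vector<Matrix4f>> g_PoseCache;

const std::vector<Matrix4f>& GetCachedPose(MD5Import& model, GLuint animIndex,
                                           float animationTimeInTicks)
{
    int frame = (int)animationTimeInTicks;              // quantize to whole ticks
    auto key  = std::make_pair(animIndex, frame);

    auto it = g_PoseCache.find(key);
    if (it == g_PoseCache.end()) {
        std::vector<Matrix4f> pose;
        model.BoneTransform(animIndex, (float)frame, pose);   // hypothetical wrapper around ReadNodeHeirarchy
        it = g_PoseCache.emplace(key, std::move(pose)).first;
    }
    return it->second;
}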

So basically what you mean is to pre-calculate and cache (store) all the needed transforms for each frame, push whichever set of transforms is needed to the shader each frame, and let the shader do the skinning as it currently does?

Yes.
Avoid calculating all of that stuff multiple times if the result is always the same and your RAM allows it.

Okay, after trying to cache the matrices I ran into a rather strange problem: the animation renders, but it plays very fast. (I am now caching matrices; for example, for an animation with a duration of 17 ticks at 24 ticks per second I generate 140 matrices, and in the game loop I increment the index into that stack every iteration.) I assume that now that it no longer recalculates everything over and over, the upload to the shader happens almost instantaneously.
I have come up with something like the code below. Keep in mind it is just an example (which I tested) and not something that is going to stick; it is just for the sake of the test.


if (delay % 13 == 0)
{
    -> Start shader
    -> Load current matrices to shader
    -> Stop shader
}
delay++;
RenderMesh();

The code above gives me pretty satisfying results in terms of slowing the animation down without slowing down the entire pipeline or thread. The animation is also very smooth, I don't see any stuttering, and the scene renders in 0-1 ms (over 1000 FPS). Is this a good solution, or should I look for something else?
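For comparison, I am also considering deriving the index from elapsed time instead of the delay counter, something like the sketch below; cachedPoses, UploadBones() and the 24 ticks/sec / 17-tick duration are from my setup above, and glfwGetTime() is what I already use for timing:

// Sketch only: pick the cached palette from elapsed time so playback speed is
// independent of the frame rate.
#include <cmath>
#include <cstddef>
#include <vector>

size_t PoseIndexForTime(double timeInSeconds, double ticksPerSecond,
                        double durationInTicks, size_t numCachedPoses)
{
    double timeInTicks   = timeInSeconds * ticksPerSecond;
    double animationTime = std::fmod(timeInTicks, durationInTicks);   // loop the clip
    double t             = animationTime / durationInTicks;           // 0..1 through the clip
    return (size_t)(t * numCachedPoses) % numCachedPoses;
}

// usage each frame (names hypothetical):
//   const std::vector<Matrix4f>& pose =
//       cachedPoses[PoseIndexForTime(glfwGetTime(), 24.0, 17.0, cachedPoses.size())];
//   UploadBones(pose);
//   RenderMesh();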