MD5 Animation. Performance issues



Asmodeus
04-25-2015, 04:35 AM
The time has come to add animation to my engine. I started with the standard MD5 mesh/anim format since I find it easy to understand. I implemented a simple MD5 loader, and I can load MD5 meshes along with their animations. Animations work fine; I am using GPU skinning and CPU interpolation. Here comes my problem: performance (you can say I am obsessed with it). Currently I use the OglDev tutorial that implements MD5 skeletal animation as an example. As I said, animations work fine, but performance is horrible. With a single animated MD5 model in my scene I get ~60-80 FPS. If I add more models (~3-4 models), FPS drops to ~12-20. I know what causes the FPS drop: I am doing a lot of computation on the CPU every frame (interpolating, filling bone matrices, etc.). Here is the hefty CPU-side part, which runs every frame to compute each bone's transformation matrix. From there I upload these bone transformation matrices to the shader and apply the skinning.



void SkinnedMesh::BoneTransform(float TimeInSeconds, vector<Matrix4f>& Transforms)
{
    Matrix4f Identity;
    Identity.InitIdentity();

    float TicksPerSecond = (float)(m_pScene->mAnimations[0]->mTicksPerSecond != 0 ? m_pScene->mAnimations[0]->mTicksPerSecond : 25.0f);
    float TimeInTicks = TimeInSeconds * TicksPerSecond;
    float AnimationTime = fmod(TimeInTicks, (float)m_pScene->mAnimations[0]->mDuration);

    ReadNodeHeirarchy(AnimationTime, m_pScene->mRootNode, Identity);

    Transforms.resize(m_NumBones);

    for (uint i = 0 ; i < m_NumBones ; i++) {
        Transforms[i] = m_BoneInfo[i].FinalTransformation;
    }
}


The function above fills a vector with one transformation matrix per bone (I am working with a model that has ~71 bones).

The next function calculates the interpolations, etc.:



void SkinnedMesh::ReadNodeHeirarchy(float AnimationTime, const aiNode* pNode, const Matrix4f& ParentTransform)
{
    string NodeName(pNode->mName.data);

    const aiAnimation* pAnimation = m_pScene->mAnimations[0];

    Matrix4f NodeTransformation(pNode->mTransformation);

    const aiNodeAnim* pNodeAnim = FindNodeAnim(pAnimation, NodeName);

    if (pNodeAnim) {
        // Interpolate scaling and generate scaling transformation matrix
        aiVector3D Scaling;
        CalcInterpolatedScaling(Scaling, AnimationTime, pNodeAnim);
        Matrix4f ScalingM;
        ScalingM.InitScaleTransform(Scaling.x, Scaling.y, Scaling.z);

        // Interpolate rotation and generate rotation transformation matrix
        aiQuaternion RotationQ;
        CalcInterpolatedRotation(RotationQ, AnimationTime, pNodeAnim);
        Matrix4f RotationM = Matrix4f(RotationQ.GetMatrix());

        // Interpolate translation and generate translation transformation matrix
        aiVector3D Translation;
        CalcInterpolatedPosition(Translation, AnimationTime, pNodeAnim);
        Matrix4f TranslationM;
        TranslationM.InitTranslationTransform(Translation.x, Translation.y, Translation.z);

        // Combine the above transformations
        NodeTransformation = TranslationM * RotationM * ScalingM;
    }

    Matrix4f GlobalTransformation = ParentTransform * NodeTransformation;

    if (m_BoneMapping.find(NodeName) != m_BoneMapping.end()) {
        uint BoneIndex = m_BoneMapping[NodeName];
        m_BoneInfo[BoneIndex].FinalTransformation = m_GlobalInverseTransform * GlobalTransformation * m_BoneInfo[BoneIndex].BoneOffset;
    }

    for (uint i = 0 ; i < pNode->mNumChildren ; i++) {
        ReadNodeHeirarchy(AnimationTime, pNode->mChildren[i], GlobalTransformation);
    }
}

I won't post the other functions that are called from ReadNodeHeirarchy, but you get the idea. The CPU calculations per frame are A LOT, thus I get terrible performance. I am quite new to animation and skinning. Somewhere I read about using quaternions instead of bone transformation matrices. I am open to any suggestions on how I may improve or even change my animation technique.

PS: Vertex Skinning Shader Code:


#version 330

layout (location = 0) in vec3 Position;
layout (location = 1) in vec2 TexCoord;
layout (location = 2) in vec3 Normal;
layout (location = 3) in ivec4 BoneIDs;
layout (location = 4) in vec4 Weights;

out vec2 TexCoord0;
out vec3 Normal0;
out vec3 WorldPos0;

const int MAX_BONES = 120;

uniform mat4 gWVP;
uniform mat4 gWorld;
uniform mat4 gBones[MAX_BONES];

void main()
{
mat4 BoneTransform = gBones[BoneIDs[0]] * Weights[0];
BoneTransform += gBones[BoneIDs[1]] * Weights[1];
BoneTransform += gBones[BoneIDs[2]] * Weights[2];
BoneTransform += gBones[BoneIDs[3]] * Weights[3];

vec4 PosL = BoneTransform * vec4(Position, 1.0);
gl_Position = gWVP * PosL;
TexCoord0 = TexCoord;
vec4 NormalL = BoneTransform * vec4(Normal, 0.0);
Normal0 = (gWorld * NormalL).xyz;
WorldPos0 = (gWorld * PosL).xyz;
}

Dark Photon
04-25-2015, 06:50 PM
Animations work fine , i am using GPU skinning and CPU interpolation.

Here comes my Problem: Performance ... performance is horrible. With one Single MD5 animated Model in my scene i get ~60-80 FPS. If i add more models (~3-4 models) FPS drops to ~12-20.

Have you disabled VSync? If not, do so. All of the above are nice, round numbers that are precise multiples of 60Hz VSync intervals (16.6ms), suggesting you have VSync enabled. Always disable VSync when profiling rendering performance.

After that, I suggest that you report performance with frame time (in ms -- aka milliseconds). FPS is a really poor way to measure performance (for instance, read this (http://www.humus.name/index.php?page=Comments&ID=279)).
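
To make the FPS-versus-frame-time point concrete, here is a minimal sketch of the conversion. The helper names `frameTimeMs` and `addedCostMs` are made up for illustration and are not part of any library. It shows why equal-looking FPS drops can hide very different costs.

```cpp
#include <cassert>

// Hypothetical helper: converts a frames-per-second reading to milliseconds per frame.
inline double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame when the rate falls from fpsBefore to fpsAfter.
inline double addedCostMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

With these, a drop from 1000 to 500 FPS costs 1 ms per frame, while a drop from 60 to 30 FPS costs about 16.7 ms, even though both halve the frame rate.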


I know what causes that FPS drop , since i am doing a lot of computations CPU side every frame: Interpolating, filling Bone Matrices etc. ... The Hefty CPU side part which runs every frame to compute Each Bone Transformation Matrices , from there i upload this Bone transformation Matrices to The shader and apply the skinning. ...The CPU calculations per frame are A LOT. Thus i get terrible performance.

You can pretty easily move the keyframe interpolation to the GPU. Been there, done that. Just upload all your animation track joint transforms to the GPU in a texture (all joints, all timesteps), and then sample and interpolate the appropriate joint transforms in your shader before skinning the vertex. When animating a character with a single pre-modeled skeletal animation track, this completely removes the need to 1) perform any CPU-side joint transform computations and 2) perform CPU-to-GPU joint transform palette uploads.
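
As a rough illustration of the "all joints, all timesteps" upload, the sketch below flattens a set of per-keyframe poses into one float array. The layout is an assumption chosen for the example (3x4 matrices, one RGBA float texel per matrix row, joints contiguous within a keyframe), and the names `JointMatrix3x4`, `firstTexelOf` and `packAnimation` are hypothetical. The actual texture creation and the shader-side fetch are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A 3x4 affine joint transform: three rows of four floats (12 floats).
struct JointMatrix3x4 { float rows[3][4]; };

// Assumed layout: one RGBA float texel per matrix row.
constexpr std::size_t kTexelsPerJoint = 3;

// Index of the first texel holding the given joint at the given keyframe.
inline std::size_t firstTexelOf(std::size_t keyframe, std::size_t joint,
                                std::size_t numJoints) {
    assert(joint < numJoints);
    return (keyframe * numJoints + joint) * kTexelsPerJoint;
}

// Flattens poses[keyframe][joint] into a float array suitable for a
// one-time texture upload at load time.
inline std::vector<float> packAnimation(
        const std::vector<std::vector<JointMatrix3x4>>& poses) {
    std::vector<float> texels;
    if (poses.empty()) return texels;
    const std::size_t numJoints = poses.front().size();
    texels.reserve(poses.size() * numJoints * kTexelsPerJoint * 4);
    for (const auto& pose : poses) {
        assert(pose.size() == numJoints);  // every keyframe covers every joint
        for (const JointMatrix3x4& m : pose)
            for (int r = 0; r < 3; ++r)
                for (int c = 0; c < 4; ++c)
                    texels.push_back(m.rows[r][c]);
    }
    return texels;
}
```

The vertex shader would use the same index arithmetic to find the rows for the two keyframes surrounding the current time and blend them before skinning.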

You can represent these joint transforms in whatever form is convenient for you. I'd recommend you use Dual Quaternions: they have some very nice advantages (search the archives of these forums for details: link (https://www.google.com/search?q=%22dual+quaternion%22+site%3Aopengl.org)). Quaternion/Translation form is another option, but it's not as flexible and it requires some expensive special handling you'd probably rather just skip. That said, both of these are better than Matrices (in skinning quality and size), but I'd start with Matrices since you've got those handy and you know how they work. Add Dual Quaternions once you get GPU-side keyframe interpolation working with Matrices.

I should also mention that if interpolating joint transforms for 3-4 models with 71 joints each on the CPU is bringing your render to its knees (12-20 fps = 50-83ms = 3-5 full 60Hz VSync intervals!!!), then even with CPU-side interpolation, something is likely very wrong in your code. On a decent CPU, you can perform interpolations for 100+ characters without pushing your frame time above 16.6ms (60 FPS). I would do some CPU-side profiling to check the cost of each phase of your CPU processing (time them in ms). It's also possible that the method you are using to upload data to the GPU is slow. I'd profile that too (disable the GPU upload and see how performance changes).
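
A minimal sketch of timing each CPU phase in milliseconds, using only `std::chrono`. `PhaseTimer` and `PhaseStats` are hypothetical helper names; the idea is to wrap the interpolation step and the uniform upload separately and compare their averages over many frames.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: wall-clock time of one phase, in milliseconds.
class PhaseTimer {
    using Clock = std::chrono::steady_clock;
public:
    PhaseTimer() : start_(Clock::now()) {}
    void reset() { start_ = Clock::now(); }
    // Milliseconds since construction or the last reset().
    double elapsedMs() const {
        return std::chrono::duration<double, std::milli>(Clock::now() - start_).count();
    }
private:
    Clock::time_point start_;
};

// Accumulates samples so one noisy frame does not mislead you.
struct PhaseStats {
    double totalMs = 0.0;
    int samples = 0;
    void add(double ms) { totalMs += ms; ++samples; }
    double averageMs() const {
        assert(samples > 0);
        return totalMs / samples;
    }
};

// Intended use inside the frame loop (function names taken from the posts above):
//   PhaseTimer t;
//   mesh.BoneTransform(seconds, transforms);   // CPU interpolation phase
//   interpolationStats.add(t.elapsedMs());
//   t.reset();
//   /* upload transforms to the shader */      // upload phase
//   uploadStats.add(t.elapsedMs());
```

Comparing the two averages in a release build shows which phase, if either, is actually responsible for the frame time.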

Asmodeus
04-26-2015, 01:02 AM
Thank you very much. Useful information.
I realize that FPS is not an appropriate way to measure performance, but for the sake of argument I will use it for the example below.
First off, this piece of code is straight from OGL Dev Tutorial 38. The first tests I made were directly on his tutorial. VSync is disabled (I presume), because if I render other static objects FPS goes above 100. Also, when I implemented this in my engine (where I am using GLFW and have disabled VSync with glfwSwapInterval(0)), I can render static objects at ~1000 FPS; if I add the animated model, FPS drops to ~70. I have not touched the code in any way; it is straight from OGL Dev, where it gets the same low performance. I do not pretend that I understand everything in his code, but at first glance anyone can see that it performs hefty CPU tasks every frame.
Two days ago I decided to ditch his tutorial and look around the internet, and I found an example which uses quaternions. This example does not use shaders (pure old fixed pipeline), so the calculations are again performed on the CPU. In that particular example I render the very same MD5 model at more than 1700 k steady FPS. I have looked through the code, but sadly I am not very familiar with quaternion math. My assumption is that I can move those calculations to the shaders and make the GPU move its a*s a bit.
My only conclusion is that OGL Dev's tutorial is only a simple example of skinning (skeletal animation) and not something you can implement straight away from there (maybe just guidelines).
Here are some snippets from the MD5 fixed-pipeline example I was talking about above:


void md5load::InterpolateSkeletons (const struct md5_joint_t *skelA, const struct md5_joint_t *skelB, int num_joints, float interp, struct md5_joint_t *out)
{
int i;

for (i = 0; i < num_joints; ++i)
{
/* Copy parent index */
out[i].parent = skelA[i].parent;

/* Linear interpolation for position */
out[i].pos[0] = skelA[i].pos[0] + interp * (skelB[i].pos[0] - skelA[i].pos[0]);
out[i].pos[1] = skelA[i].pos[1] + interp * (skelB[i].pos[1] - skelA[i].pos[1]);
out[i].pos[2] = skelA[i].pos[2] + interp * (skelB[i].pos[2] - skelA[i].pos[2]);

/* Spherical linear interpolation for orientation */
Quat_slerp (skelA[i].orient, skelB[i].orient, interp, out[i].orient);
}
}

void md5load::Animate (const struct md5_anim_t *anim, struct anim_info_t *animInfo, double dt)
{
int maxFrames = anim->num_frames - 1;

animInfo->last_time += dt;

/* move to next frame */
if (animInfo->last_time >= animInfo->max_time)
{
animInfo->curr_frame++;
animInfo->next_frame++;
animInfo->last_time = 0.0;

if (animInfo->curr_frame > maxFrames)
animInfo->curr_frame = 0;

if (animInfo->next_frame > maxFrames)
animInfo->next_frame = 0;
}
}

/**
* Prepare a mesh for drawing. Compute mesh's final vertex positions
* given a skeleton. Put the vertices in vertex arrays.
*/
void md5load::PrepareMesh (const struct md5_mesh_t *mesh, const struct md5_joint_t *skeleton)

{
int i, j, k;

/* Setup vertex indices */
for (k = 0, i = 0; i < mesh->num_tris; ++i)
{
for (j = 0; j < 3; ++j, ++k)
vertexIndices[k] = mesh->triangles[i].index[j];
}

/* Setup vertices */
for (i = 0; i < mesh->num_verts; ++i)
{
vec3_t finalVertex = { 0.0f, 0.0f, 0.0f };

/* Calculate final vertex to draw with weights */
for (j = 0; j < mesh->vertices[i].count; ++j)
{
const struct md5_weight_t *weight
= &mesh->weights[mesh->vertices[i].start + j];
const struct md5_joint_t *joint
= &skeleton[weight->joint];

/* Calculate transformed vertex for this weight */
vec3_t wv;
Quat_rotatePoint (joint->orient, weight->pos, wv);

/* The sum of all weight->bias should be 1.0 */
finalVertex[0] += (joint->pos[0] + wv[0]) * weight->bias;
finalVertex[1] += (joint->pos[1] + wv[1]) * weight->bias;
finalVertex[2] += (joint->pos[2] + wv[2]) * weight->bias;
}

vertexArray[i][0] = finalVertex[0];
vertexArray[i][1] = finalVertex[1];
vertexArray[i][2] = finalVertex[2];
vertexArray[i][3] = mesh->vertices[i].st[0];
vertexArray[i][4] = 1.0f - mesh->vertices[i].st[1];
}


}

void md5load::draw (float x, float y, float z, float scale)
//void md5load::draw ()
{
int i;
static float angle = 0;
static double curent_time = 0;
static double last_time = 0;

last_time = curent_time;
curent_time = (double)glutGet (GLUT_ELAPSED_TIME) / 1000.0;

glLoadIdentity ();

if (drawTexture == true)
{
glPolygonMode (GL_FRONT_AND_BACK, GL_FILL);
}
else
{
glPolygonMode (GL_FRONT_AND_BACK, GL_LINE);
}

glTranslatef (x, y, z);
//glTranslatef(0.0f, -35.0f, -150.0f);

glRotatef (-90.0f, 1.0, 0.0, 0.0);

glScalef(scale, scale, scale);

if (rotate == true)
{
glRotatef (angle, 0.0, 0.0, 1.0);
}

angle += 25 * (curent_time - last_time);

if (angle > 360.0f)
angle -= 360.0f;

if (animated)
{
// /* Calculate current and next frames */
Animate (&md5anim, &animInfo, curent_time - last_time);

// /* Interpolate skeletons between two frames */
InterpolateSkeletons (md5anim.skelFrames[animInfo.curr_frame],
md5anim.skelFrames[animInfo.next_frame],
md5anim.num_joints,
animInfo.last_time * md5anim.frameRate,
skeleton);
}
else
{
/* No animation, use bind-pose skeleton */
skeleton = md5file.baseSkel;
}


// /* Draw skeleton */
if (drawSkeleton == true)
{
DrawSkeleton (skeleton, md5file.num_joints);
}


//// /* Draw each mesh of the model */
for (i = 0; i < md5file.num_meshes; ++i)
{

glBindTexture(GL_TEXTURE_2D, modeltexture);

PrepareMesh (&md5file.meshes[i], skeleton);

glVertexPointer (3,GL_FLOAT,sizeof(GLfloat)*5,vertexArray);

char *evilPointer = (char *)vertexArray;
evilPointer+=sizeof(GLfloat)*3;
glTexCoordPointer(2,GL_FLOAT,sizeof(GLfloat)*5,evilPointer);

glDrawElements (GL_TRIANGLES, md5file.meshes[i].num_tris * 3, GL_UNSIGNED_INT, vertexIndices);
}

}


I assume that I can (??!!) move some of the above calculations (executed per frame) to the shader, which will give the CPU some breathing room.

UPDATE: Gathered some measurements in milliseconds (ms per frame)
1. My Engine: Matrix Skinning
- Rendered Objects: Terrain, Trees, Player -> ~2 ms
- Rendered Objects: Only Animated MD5 Mesh -> ~18 ms

2. OGL Dev Tutorial 38: Matrix Skinning
- Rendered Objects: None -> ~1 ms
- Rendered Objects: Only Animated MD5 Mesh -> ~17 ms
- Rendered Objects: Animated Mesh x5 -> ~47 ms (lag)

3. MD5 Fixed Pipeline Skinning: CPU Quaternion Calculation
- Rendered Objects: None -> ~15 ms (strange: here I get a delay without any rendering)
- Rendered Objects: Only Animated MD5 Mesh -> ~16 ms
- Rendered Objects: Animated Mesh x20 -> ~16 ms
(for some godforsaken reason this does not MOVE even slightly; I tried to render 20 animated models and it is still 16 ms. FPS drops to 140.)
- Rendered Objects: Animated Mesh x110 -> ~47 ms

I am confused!

UPDATE 2: For some other godforsaken reason, in Release mode, rendering my entire scene plus the animated model takes a steady 1 ms (in my engine, that is), a big improvement indeed. If I render my scene plus 110-120 animated models I get about 17-18 ms.
Still, I am sure that I can squeeze even more performance out of this.


PS: I just want to ask a few questions to straighten this out in my head; please forgive my ignorance.
So far I have seen two methods to implement skinning:
- Quaternions
- Matrices
My questions are:
1. Is it good practice (or possible at all) to upload all the information to the shader and make ALL calculations ONLY there?
2. Since OGL Dev's tutorial does a lot of CPU math before uploading to the shader, is it okay if I move most of this to the shader, or is that not how it's done?
3. The same question applies to quaternions. Can't I just upload all static data and calculate whatever is needed in the shader, or will this lead to unwanted CPU->GPU transfers of huge data? (Right now I am uploading ~72 matrices to the shader every frame for the skinning technique.)
For some reason I just want to avoid having animation-related calculations on the CPU.

Dark Photon
04-28-2015, 07:29 AM
So far i have seen two methods to implement skinning
- Quaternions
- Matrices

My Questions is:


1. Is it a good practice (or possible at all) to upload all the information to the shader and make ALL calculations ONLY there?
2. Since OGL dev's tutorial does a lot of CPU math before he uploads to the shader , is it okay if i move most of this to shader or thats not how its done ?
3. Same Question applies for Quaternions. Can't i just upload all static data and calculate what ever needed in the shader or this will lead to unwanted CPU->GPU transfer of huge data? (now i am uploading ~72 matrices every frame to the shader needed for the skinning technique)
Just for some reason i want to avoid having calculations connected with animation on the CPU



Re #1 and #2, if you need more performance (lower time consumption per frame so you can fit more content in), you do whatever you need to to optimize your bottleneck. If you're CPU bound, then you look at decreasing your CPU consumption by offloading calculations to the GPU. And it's definitely practical (and possible) as I've done it.

Re #3, yes. Same question AFAICT. Further, matrix uploads are 12 floats per element (a 3x4 affine matrix), whereas Quaternion-Translation is 7 floats and Dual Quaternion is 8 floats. So you save with the latter two. But in this case, you're only uploading in prep/setup, so the upload cost is less of an issue. However, smaller size per element nets you better texture cache coherency on the GPU, which affects run-time performance when you're sampling and interpolating transforms on the GPU.
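
To illustrate the 7-float and 8-float sizes, here is a small sketch that converts a quaternion plus translation into a dual quaternion and back. The struct and function names are hypothetical. It assumes a unit-length rotation and the common "rotate, then translate" convention (real part = q, dual part = 0.5 * (0, t) * q); other conventions exist.

```cpp
#include <cassert>

struct Quat { float w, x, y, z; };
struct Vec3 { float x, y, z; };
struct QuatTranslation { Quat rotation; Vec3 translation; };  // 7 floats
struct DualQuat { Quat real; Quat dual; };                    // 8 floats

static_assert(sizeof(QuatTranslation) == 7 * sizeof(float), "expected 7 packed floats");
static_assert(sizeof(DualQuat) == 8 * sizeof(float), "expected 8 packed floats");

// Hamilton product a * b.
inline Quat multiply(const Quat& a, const Quat& b) {
    return {
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w
    };
}

// Rotation followed by translation: real = q, dual = 0.5 * (0, t) * q.
inline DualQuat toDualQuat(const QuatTranslation& qt) {
    const Quat& q = qt.rotation;
    const float lengthSq = q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
    assert(lengthSq > 0.99f && lengthSq < 1.01f);  // rotation must be (nearly) unit length
    const Quat t{0.0f, qt.translation.x, qt.translation.y, qt.translation.z};
    const Quat d = multiply(t, q);
    return {q, {0.5f * d.w, 0.5f * d.x, 0.5f * d.y, 0.5f * d.z}};
}

// Recovers the translation: (0, t) = 2 * dual * conjugate(real).
inline Vec3 translationOf(const DualQuat& dq) {
    const Quat conj{dq.real.w, -dq.real.x, -dq.real.y, -dq.real.z};
    const Quat t = multiply(dq.dual, conj);
    return {2.0f * t.x, 2.0f * t.y, 2.0f * t.z};
}
```

Because MD5 joints are already stored as a position plus an orientation quaternion, this conversion could be done once at load time before the data is uploaded.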

Asmodeus
04-28-2015, 11:05 AM
Very nice, thanks. I will try my best and see what comes out. I was wondering if someone knows how to load multiple animations for MD5 models with assimp. So far I am only able to load one animation, and only if the mesh name matches the animation name. I had a look at the /code folder in assimp, where I found a simple MD5 importer. From its source I can clearly see that the LoadAnim method loads one animation with the same name as the mesh.

IonutCava
04-28-2015, 12:13 PM
If it helps, I just preload all of them when I load my models. I'm using a convention of "one assimp scene <-> one model (with multiple meshes)" so all of the animations belong to a single model anyway:

for (size_t i = 0; i < scene->mNumAnimations; ++i) {
    if (scene->mAnimations[i]->mDuration > 0.0f) {
        // add the animations
        _animations.push_back(AnimEvaluator(scene->mAnimations[i]));
    }
}

Asmodeus
04-28-2015, 01:55 PM
Are you sure you are talking about md5 models? Because I cannot load more than one animation. The number of animations is always 1.

Alfonse Reinheart
04-28-2015, 01:59 PM
Are you certain the file has more than one animation?

Asmodeus
04-28-2015, 02:03 PM
I am using Doom 3 models as samples. As far as I know, the .md5mesh file does not contain anything that specifies the names of animations (similar to how an obj file specifies its mtl file), or does it? I am not certain here.
PS: How would you specify that a model should use multiple animations? The MD5 documentation is rather poor, and assimp's documentation on MD5 isn't any better.

IonutCava
04-28-2015, 03:00 PM
After looking at assimp's documentation, it's coded to only load one XYZ.md5anim file corresponding to the XYZ.md5mesh file. However, nothing's stopping you from manually loading all of the animation files yourself.
Use the importer to manually import each md5anim file and extract the animations from the returned aiScene pointer. Should work.

L.E.: Here's the interesting bit (https://github.com/assimp/assimp/blob/master/code/MD5Loader.cpp) (search "InternReadFile"). It's perfectly happy to load just a md5mesh, md5camera or md5anim file, or any combination of them.

Asmodeus
04-28-2015, 11:47 PM
Very nice, I am now able to load multiple animations. I was just wondering whether I should stick with OGL Dev's tutorial, because with it I get horrible performance in debug mode. I realize that I should not care about debug that much anyway, but it is strange and ANNOYING.
So, bottom line: should I use it in my engine? Is it efficient enough or not?


PS: I have also tried running OGL Dev's loader on my old laptop with an Intel Core 2 Duo and an old x4000 Radeon 512 MB. Strangely enough, I get exactly the same performance as on my desktop PC with a quad-core AMD chip and an x7000 HD Radeon: 17-18 ms to render the scene in debug with only the model being rendered.

Asmodeus
05-01-2015, 02:00 AM
I have an update here. I played around with the OGL Dev loader and implemented it in my game engine with multiple textures/materials and multiple animation handling. I now notice a very ANNOYING BUG, and I'd really appreciate it if SOMEONE could help. (My last post was left unanswered!?)
Now I render the frame (with the model only) in ~17 ms, but if I move the GLFW window around my screen it starts lagging and stuttering. It seems that if the window is close to the top left of the screen everything is fine, but if I move it anywhere else on the screen it starts lagging and stuttering, like WTF ??????
Also, I really need an answer to my LAST post, since I feel like this piece of code is nowhere near OPTIMAL. However, I have seen multiple people recommending OGL Dev's tutorial on skinning.
Thanks in advance

IonutCava
05-01-2015, 07:01 AM
OK, for your debug post:
Me, and I assume most users that still post around here, either haven't touched an OpenGL tutorial in a long time, so we have no idea what OGL's code is all about or how fast it is, or have written their own tutorials and will obviously tell you to ditch OGL and use the links they will surely provide instead. Either way, it's hard to get a clear answer.
As to the question itself, everyone gets bad performance in debug builds, but do the basic stuff: disable iterator debugging, secure SCL, the debug heap (if you're brave), and so forth. But you will lose all the good bits of a debug build. Create a profile build (release with symbols) and profile the code. Look for obvious bottlenecks. You're surely CPU bottlenecked (and your code is definitely single-threaded), so a Core 2 Duo and the quad-core AMD will perform close to each other.

Now, to the update:
Is it lagging and stuttering after you drop it off at the target position? Try setting the window to a position that it stutters at in your code and see if it's related to moving the window or just displaying at a certain location (both shouldn't happen, but hey, GLFW performs fine on my end).

Asmodeus
05-01-2015, 07:20 AM
Ok I will try what you suggest.
About the OGL Dev code: I have looked at it countless times. The basic thing it does (per frame) is position/rotation (quaternion)/scaling interpolation. I have found that for a particular model with 71 bones, a certain function runs ~74 times in total (for all parents and children). That function performs the interpolations and calculates the final matrices before they are sent to the shader. It performs 3 separate matrix calculations per call => 74 x 3 = 222 matrix calculations on the CPU every frame for a single model. That is (roughly) how the code works.
Here is how it looks. It is a recursive function that starts at the parent and works down to the children:



void MD5Import::ReadNodeHeirarchy(GLuint &anim_index, float AnimationTime, const aiNode* pNode, const aiMatrix4x4& ParentTransform)
{
string NodeName(pNode->mName.data);
const aiAnimation* pAnimation = m_pAnim[anim_index]->mAnimations[0];
const aiNodeAnim* pNodeAnim = FindNodeAnim(pAnimation, NodeName);
aiMatrix4x4 NodeTransformation(pNode->mTransformation);

if (pNodeAnim) {
aiVector3D Scaling;
aiVector3D Translation;
aiQuaternion RotationQ;

CalcInterpolatedScaling(Scaling, AnimationTime, pNodeAnim);
CalcInterpolatedRotation(RotationQ, AnimationTime, pNodeAnim);
CalcInterpolatedPosition(Translation, AnimationTime, pNodeAnim);

NodeTransformation.Scaling(Scaling, NodeTransformation);
NodeTransformation.Translation(Translation, NodeTransformation);
NodeTransformation *= aiMatrix4x4(RotationQ.GetMatrix());

}
aiMatrix4x4 GlobalTransformation = ParentTransform*NodeTransformation;

if (m_BoneMapping.find(NodeName) != m_BoneMapping.end()) {
uint BoneIndex = m_BoneMapping[NodeName];
m_BoneInfo[BoneIndex].FinalTransformation = m_GlobalInverseTransform * GlobalTransformation * m_BoneInfo[BoneIndex].BoneOffset;
}

for (uint i = 0; i < pNode->mNumChildren; i++) {
ReadNodeHeirarchy(anim_index,AnimationTime, pNode->mChildren[i], GlobalTransformation);
}
}


1. NodeTransformation *= aiMatrix4x4(RotationQ.GetMatrix());
2. aiMatrix4x4 GlobalTransformation = ParentTransform * NodeTransformation;
3. m_BoneInfo[BoneIndex].FinalTransformation = m_GlobalInverseTransform * GlobalTransformation * m_BoneInfo[BoneIndex].BoneOffset;

I don't know why, but 222 multiplications of 4x4 matrices per frame still seem very expensive to me!
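
One way to check whether 222 products are really the problem is to measure them in isolation. The sketch below is a hypothetical micro-benchmark with a naive 4x4 multiply; it does not use the assimp or OglDev matrix types. Each product is 64 multiply-adds, so 222 products are 14,208 multiply-adds per model per frame.

```cpp
#include <cassert>
#include <chrono>

struct Mat4 { float m[4][4]; };

constexpr int kMulAddsPerProduct = 4 * 4 * 4;  // 64 multiply-adds per 4x4 product

inline Mat4 identity() {
    Mat4 r{};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

// Naive row-by-column product a * b.
inline Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Runs `count` chained products and returns the elapsed milliseconds.
// The result is written to `sink` so the compiler cannot discard the work.
inline double timeProductsMs(int count, Mat4& sink) {
    assert(count >= 0);
    Mat4 acc = identity();
    Mat4 step = identity();
    step.m[0][3] = 0.001f;  // small translation so the result keeps changing
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i) acc = multiply(acc, step);
    const auto end = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

If `timeProductsMs(222, sink)` comes out far below the ~17 ms measured per frame, the time is going somewhere else (for example string construction and name lookups per node, or debug-build overhead), which is worth confirming with a profiler.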

IonutCava
05-01-2015, 08:18 AM
Now add multiple, different nodes, each with their unique animations all in view. Won't be pretty.
I highly recommend taking a look at Scott's assimp animation importer (http://nolimitsdesigns.com/game-design/animation-code-update/). His approach is basically to cache animations at load time and just look up the relative transforms for the current timestamp and the current animation index. You'll be trading memory usage for performance, but if you handle animation lifetime manually, that shouldn't be much of an issue.
The ultimate goal would be to just store dual-quaternion rotations and compute animation transforms on the GPU, but using the above code, I'm nowhere near bottlenecked by animation code.
Be warned though, it's pretty complicated and in depth.

P.S.: The code was originally written for a D3D renderer, but porting to OGL (https://www.youtube.com/watch?v=4YTDHwacPmA) just required a couple of transposes here and there.

Asmodeus
05-01-2015, 08:29 AM
Thanks i will have a look.
So you share my thoughts on OGL Dev's code (at least the snippet which is the main bottleneck) being non-optimal and inefficient?

IonutCava
05-01-2015, 08:49 AM
I don't think any tutorial's main concern is efficiency as much as readability. Add animations into the mix, which are math intensive and it becomes really hard to have both easy to read code and fast to run code.

The snippet does what it's supposed to do: it shows you how to properly calculate all of the bone transforms for the current timestamp. Nothing more. You could cache it yourself at a higher level and suddenly it becomes better. For example, get the animation frame and use it as a key in a map that holds all of the transforms for that frame, calculated with that function.

Asmodeus
05-01-2015, 09:04 AM
Basically what you mean is to pre-calculate and Cache (Store) all needed transformations for each frame and push whatever set of transformations needed to the shader each frame and let the shader do the skinning as it currently does ?

IonutCava
05-01-2015, 09:22 AM
Yes.
Avoid calculating all of that stuff multiple times if the result is always the same and your RAM allows it.
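
A minimal sketch of that caching idea, under the assumption that the animation is sampled at whole frames. `PoseCache` and `Mat4` are hypothetical names; the `evaluate` callback stands in for whatever produces the final bone matrices for one frame (for example a call like BoneTransform at that frame's time).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Mat4 { float m[16]; };
using Pose = std::vector<Mat4>;  // one final transform per bone

// Hypothetical cache: runs the expensive per-bone evaluation once per
// frame at load time, then only does lookups while rendering.
class PoseCache {
public:
    PoseCache(std::size_t numFrames,
              const std::function<Pose(std::size_t)>& evaluate) {
        poses_.reserve(numFrames);
        for (std::size_t f = 0; f < numFrames; ++f)
            poses_.push_back(evaluate(f));
    }

    // Returns the cached pose; the index wraps for looping animations.
    const Pose& at(std::size_t frame) const {
        assert(!poses_.empty());
        return poses_[frame % poses_.size()];
    }

    std::size_t frameCount() const { return poses_.size(); }

private:
    std::vector<Pose> poses_;
};
```

Memory use grows with frames x bones x 64 bytes per animation, which is the RAM-for-speed trade mentioned above.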

Asmodeus
05-02-2015, 01:13 PM
Okay, after trying to cache the matrices I ran into a rather strange problem. The animation renders, but it plays very fast. (I am now caching matrices: for example, for an animation with duration 17 at 24 ticks per second I generate 140 matrices, and on each pass through the game loop I increment the index into the matrix stack.) I assume that, now that it does not calculate all of that over and over, the upload to the shader happens almost instantaneously.
I have come up with something like the code below. Keep in mind it is just a test example and not something that is going to stick.


if (delay % 13 == 0)
{
    // -> Start Shader
    // -> Load Current Matrix to Shader
    // -> Stop Shader
}
delay++;
RenderMesh();

The code above gives me pretty satisfying results in terms of slowing down the animation without slowing down the entire pipeline or thread. The animation is also very smooth: I do not see any stuttering, and the scene renders in 0-1 ms (over 1k FPS). Is this a good solution, or should I look for something else?

Dark Photon
05-02-2015, 05:44 PM
It's not completely clear what you're doing. But as always, if your performance is sufficient, it's only an academic exercise to optimize further.

I see you loading matrices in your game loop, and that tells me that you haven't uploaded your full joint transform palette to the GPU (one transform per animation/keyframe/joint) as a preprocess, and pushed "game time" transform sampling and keyframe interpolation to the GPU. That's just fine if you don't need more speed.

Re your comment about the animation rendering very quickly and having to "slow it down". What you're doing works but is fragile. If your rendered frame rate changes, you've got to go tweak all those delays (expressed as "number of frames").

I suggest you instead use wall clock time as a global clock to gate the playback speed of your animations. Basically, there is a global time (wall clock time) measured in seconds. Your animation has a specified start time which is given in global time. Based on the current time you can compute how long the animation has been running (seconds). Your skeletal animation has a specified play speed (keyframes per second). So with a simple product you can compute which keyframe you should be on "this frame" (mod this product by the number of keyframes in the animation if this is a looping animation). Use this to determine which two keyframes you need to interpolate between to get a pose which is "in between" the poses specified (the fractional part of this keyframe gives you the blend percentage).

If you use this method, then your animation always updates at the appropriate "real-world" rate, regardless of whether your renderer is rendering frames at 1000Hz, 60Hz, 30Hz, or some other random frame rate -- even a variable one.
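
The steps above can be sketched as a single function. `KeyframeSample` and `sampleLooping` are hypothetical names, and the sketch assumes a looping animation whose last keyframe blends back into the first.

```cpp
#include <cassert>
#include <cmath>

struct KeyframeSample {
    int   frameA;  // keyframe at or before the current time
    int   frameB;  // next keyframe (wraps to 0 at the end of the loop)
    float blend;   // 0 = exactly frameA, 1 = exactly frameB
};

// Maps global (wall clock) time to the two keyframes to interpolate between.
inline KeyframeSample sampleLooping(double nowSeconds, double startSeconds,
                                    double keyframesPerSecond, int numKeyframes) {
    assert(numKeyframes > 0 && keyframesPerSecond > 0.0);
    double elapsed = nowSeconds - startSeconds;
    if (elapsed < 0.0) elapsed = 0.0;  // animation has not started yet

    // Position in keyframe units, wrapped into [0, numKeyframes).
    const double position = std::fmod(elapsed * keyframesPerSecond,
                                      static_cast<double>(numKeyframes));
    const int frameA = static_cast<int>(position);  // floor, since position >= 0
    const int frameB = (frameA + 1) % numKeyframes;
    const float blend = static_cast<float>(position - frameA);
    return {frameA, frameB, blend};
}
```

Because the result depends only on elapsed seconds, the playback speed stays the same whether the loop runs at 1000 Hz or 30 Hz, so a frame-count delay such as `delay % 13` is no longer needed.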

If you have Jason Gregory's "Game Engine Architecture" book, read up on the global clock animation method there (see time or clock in the index).

Asmodeus
05-03-2015, 12:31 PM
Got it working using the clock. Now the animation runs smoothly and equally fast everywhere. Very nice, thanks again for the information. In your post you mentioned "... if this is a looping animation". It is indeed a looping animation, but is it possible to make a simple walk animation play continuously while I hold a key? At the moment I have managed to make the animation play when I press a key, but when it reaches the end of the walk animation it loops back to the beginning and the model translates back (takes back several steps). Basically, the idea is to make him walk indefinitely while a key is pressed, without looping back to his starting position.

Dark Photon
05-04-2015, 07:07 AM
I assume you're talking about the character's pose, and not just the position at which the character is centered (its object space) as you can keep track of the accumulated position internal to your animation state.

Yeah, in the general case, one animation track isn't sufficient. You need feathered blending between animation tracks to do the start and stop smoothly (legs winding up and winding down). Once you're ready for this added realism, read up on Animation State Machines (aka Action State Machines), or ASMs. These let your friendly content modeler build these so your program can just read them and use them to generate smoothly-varying animation states.

Asmodeus
05-04-2015, 10:03 AM
I don't think we are talking about the same thing here. I think you are talking about smooth transitions between different animations (Idle/Walk/Attack etc.), which is on my TODO list as well. But I am talking about something else.
Let's take my walk animation, for example. It has a certain number of matrices generated, and the vertex shader interpolates and does the skinning. Let's assume that the model starts at point A; when the whole walk animation has played, the model is at point B. Now the animation loops and the model translates back to starting point A, repeating the process over and over. My question was how to make the model advance in translation (in other words, make it actually walk instead of returning to its starting position). I assume that after one animation cycle has played I should translate the model to the furthest location the animation reaches (position B). Then when the walk starts again the model will be at position B, and when the animation finishes it will be at position C. Then I can simply repeat the process and the model will actually walk ahead??
Am I right or horribly wrong?

Dark Photon
05-04-2015, 10:10 AM
Ah, I see! You have root motion baked into your animation track. You need to extract that and use it to move your character's object space around (i.e. use this to update the character's modeling transform).

(Sorry, I did misunderstand what you were describing.)

Asmodeus
05-04-2015, 11:07 AM
That's right! Can you explain in more depth, with some example steps? What does "root motion baking" stand for? I am not very familiar with all the graphics terms yet, thanks.

Dark Photon
05-04-2015, 02:09 PM
If you want a good, coherent write-up of this and many other skeletal concepts, I highly recommend you pick up a copy of:

* Game Engine Architecture, 1st edition (http://www.amazon.com/Game-Engine-Architecture-Jason-Gregory/dp/1568814135/ref=sr_1_2/182-2595976-7181522?ie=UTF8&qid=1430769759&sr=8-2&keywords=game+engine+architecture) (Jason Gregory)

It's worth your time and money. There's a 2nd edition; it'll probably work too; but I don't have it.

Or for starters, just websearch it (e.g. root motion, root bone motion, motion extraction, etc.). Combine with the names of various skeletal tools/exporters to get targeted hits (Granny3D, Havok Behavior, Unity3D, Blender, etc.). Or do a targeted search on gamedev.net (site:gamedev.net). You'll find hits in many places. Any decent skeletal animation publishing tool will describe this.

But briefly, root motion is where the motion of the character through the world (translation, turning, rise-and-fall, etc.) is "baked" (pre-encoded) into the skeletal joint transform data somehow. Sometimes this is placed on the root joint. But depending on the DCC (3D modeling/animation) tools used, a child joint which encodes the "local origin" of the entity might be used instead. In any case, the animation has some joint through which you can determine how the character needs to move (translate+rotate) through the world as time progresses.
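To illustrate the translate+rotate case, here is a rough sketch on the ground plane only. It assumes the motion-carrying joint has been reduced to a position (x, z) and a heading angle about the +Y axis in a right-handed frame; `RootSample`, `Character` and `ApplyRootMotion` are made-up names for this example, and a real exporter may use different axes.

```cpp
#include <cmath>

// Pose of the motion-carrying joint in animation space (hypothetical layout):
// position on the ground plane plus heading in radians about +Y.
struct RootSample { float x, z, yaw; };

// Where the character currently is in the world.
struct Character { float x, z, yaw; };

// Advance the character by the motion the joint made between two samples.
// The animation-space displacement is rotated into world space first, because
// the character may be facing a different way than the clip was authored.
void ApplyRootMotion(Character& c, const RootSample& prev, const RootSample& cur) {
    float dx = cur.x - prev.x;
    float dz = cur.z - prev.z;

    // Animation heading prev.yaw corresponds to world heading c.yaw, so the
    // rotation from animation space to world space is their difference.
    float a = c.yaw - prev.yaw;
    float cs = std::cos(a);
    float sn = std::sin(a);

    // Rotation about +Y: x' = x*cos + z*sin, z' = -x*sin + z*cos.
    c.x += cs * dx + sn * dz;
    c.z += -sn * dx + cs * dz;

    // Turning baked into the clip carries over to the character's heading.
    c.yaw += cur.yaw - prev.yaw;
}
```

Working with per-frame deltas like this, rather than absolute joint positions, is what lets the clip loop without the character snapping back to where the cycle started.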

Asmodeus
05-06-2015, 02:49 AM
Good God, I am so frustrated. I have been looking for a good MD5 importer/exporter for more recent versions of Blender, but I cannot find one that works. I found a working MD5 mesh/anim importer, but it is for Blender 2.49b. I can import the animation and mesh, but I have no idea where the dope sheet is (if it even exists in 2.49b?). Here is a solution for removing the root motion from animations, but it is for recent versions of Blender (2.7+): Removing Root Motion on Blender: https://www.youtube.com/watch?v=WXw73tBd1PM.

PS: I found a very nice simple model loading tool, but I get strange results when I try to import the md5mesh (the tool is based on Assimp). [attachment 1794]

PS2: Managed to find a workaround using 3DS Max and importing/exporting the models from there. Still trying to figure out how to extract the root motion, though!

Alfonse Reinheart
05-06-2015, 07:52 AM
"Good God, I am so frustrated. I have been looking for a good MD5 importer/exporter for more recent versions of Blender, but I cannot find one that works."

Perhaps you should take that as a sign about how widely used MD5 is as a model format.

Asmodeus
05-06-2015, 08:03 AM
Yeah, I realized that long before I started implementing the loader in my engine. But I find the format fairly easy to understand, and I found a really nice tutorial on how to implement animation. In the future I plan to use Collada (.dae) as the main file format for animated models. Another cool thing is that you can find tons of MD5 test assets in Doom 3.


UPDATE: Ditched the MD5 files from Doom 3 and tried to use other models. Since I have LoL on my PC, I exported a model, converted it to FBX, and then from FBX to MD5. It's working fantastically; the animation and mesh look fine, but there is a small hitch. The picture below shows the model. The model is supposed to be holding the axe in its right hand, but the axe appears to be drawn at the origin (at the center of the model). The picture is from 3DS Max (I tried to load the model in my engine and exactly the same problem appears). My guess is that this happens during the conversion from FBX to MD5 (I do this conversion in Blender).
[attachment 1799]
PS: The model looks fine in FBX; both the mesh and the animations are fine.

Dark Photon
05-06-2015, 09:43 AM
Unless you're on a tight budget, you might consider purchasing Granny3D (http://www.radgametools.com/granny.html) and using its GR2 format for skeletal assets (and GState for Animation State Machines). If your needs are similar (and it's sounding like they are), it'll save you a lot of trouble publishing and loading skeletal assets.

Asmodeus
05-06-2015, 11:36 AM
Well, I have seen it but never really considered the idea. Thanks.
Any suggestions why the axe mesh would appear at the origin (i.e. local space 0,0,0)?

Dark Photon
05-06-2015, 05:36 PM
Just a semi-random guess: could be the axe is built to be a skeletal attachment. That is, it could be you're supposed to decide when the axe is rendered with the character, and when it is, position it at the appropriate attachment joint in the character model (e.g. hand, chest mount, back mount, etc.).
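If that guess is right, placing the axe would look roughly like the sketch below. It assumes column-vector matrices (v' = M * v) stored row-major, and that you can get the attachment joint's transform in model space, i.e. the joint's global transform before the inverse bind (offset) matrix is applied. `Mat4`, `Multiply`, `Translation` and `AttachmentWorldMatrix` are illustrative helpers, not functions from your engine or from Assimp.

```cpp
#include <array>

// 4x4 matrix, row-major storage, column-vector convention (v' = M * v),
// so the translation sits in elements 3, 7 and 11.
using Mat4 = std::array<float, 16>;

Mat4 Identity() {
    return {1, 0, 0, 0,
            0, 1, 0, 0,
            0, 0, 1, 0,
            0, 0, 0, 1};
}

Mat4 Translation(float x, float y, float z) {
    Mat4 m = Identity();
    m[3] = x;
    m[7] = y;
    m[11] = z;
    return m;
}

Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized accumulator
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
    return r;
}

// World matrix for an attached prop such as the axe:
//   characterModel    : places the character in the world
//   jointInModelSpace : animated transform of the attachment joint (e.g. hand)
//   localOffset       : how the prop sits relative to that joint (grip offset)
Mat4 AttachmentWorldMatrix(const Mat4& characterModel,
                           const Mat4& jointInModelSpace,
                           const Mat4& localOffset) {
    return Multiply(Multiply(characterModel, jointInModelSpace), localOffset);
}
```

The axe is then drawn as an ordinary rigid mesh with this matrix as its model matrix, recomputed every frame from the current pose of the hand joint.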

Asmodeus
05-09-2015, 12:19 PM
I was wondering if someone here could point me to a good working converter from .dae to MD5? I tried to use the Blender MD5 exporter in Blender 2.73, but I get some errors during conversion (however, using the same exporter I can convert .fbx to .md5 without any problems, except for the axe problem). So I CAN convert .fbx to MD5, but as stated in my other post the character's axe is misplaced. I assume the problem occurs while converting to MD5, because the axe problem is not present in the .dae or .fbx files.

PS: I think I have figured out why the axe is misplaced. The reason is that the model has a root bone and the axe has a root bone as well, and we cannot have two root bones in an MD5.

PS2: Confirmed that models with one root bone work fine. I will try to stick to single-root-bone models until I find a workaround or implement a Collada loader.

PS3: I just tried to load a .dae file with my MD5 loader and it loaded perfectly, with all animations working. No asserts, no exceptions; it loaded two different .dae models dead on.