Velocity map for a hardware-skinned mesh

Please excuse my bad English.

After playing Resident Evil 5 and Killzone 2 on PS3, I was amazed at how much motion blur can make a low framerate look smooth and acceptable.

So I decided to try implementing motion blur, starting with the most basic method: the camera blur described in GPU Gems 3.

http://http.developer.nvidia.com/GPUGems3/gpugems3_ch27.html

The result looks good, but it cannot blur individual moving objects.

So I tried the velocity map method.

This method handles the movement of rigid bodies fine (using their local transformation matrices).

The only thing left is how to handle motion blur for animated meshes.

Since I use hardware skinning, there is no way I can get the transformed position of each vertex, or the matrices used to transform them, back to the CPU side.

Even if I managed to get the transform matrices or positions of those vertices back, it would still be too expensive to send them as attribute variables for each vertex.

Can someone with experience on this topic give me some advice?

You keep the mesh instance’s previous matrices/dual-quats. It’s just extra uniforms, no extra vertex-attribs necessary.


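uniform mat4 FrameBones[256];
uniform mat4 mvp; // model_view_proj

uniform mat4 prev_FrameBones[256]; // add this: last frame's bone matrices
uniform mat4 prev_mvp; // add this: last frame's model_view_proj

in vec4 glVertex; // bind-pose position
in ivec4 attribIndices; // 4 bone indices per vertex
in vec4 attribWeights; // 4 bone weights per vertex


// GLSL needs a sized array for a function parameter
vec3 AnimateVertex(vec3 bindPosition, mat4 bones[256], ivec4 indices, vec4 weights){ // by 4 bones
	...
	// uses FrameBones[]  or prev_FrameBones[]
}


out vec4 curPosition, prevPosition; // send to frag-shader, for clipspace-to-screenspace projection and thus to find the delta motion-vectors


void main(){
	// AnimateVertex returns a vec3, so promote it to vec4 (w = 1) before the mat4 multiply
	curPosition = mvp * vec4(AnimateVertex(glVertex.xyz, FrameBones, attribIndices, attribWeights), 1.0);
	prevPosition = prev_mvp * vec4(AnimateVertex(glVertex.xyz, prev_FrameBones, attribIndices, attribWeights), 1.0);

	gl_Position = curPosition;
}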
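To make the "clipspace-to-screenspace projection" step concrete, here is a minimal CPU-side sketch in C++ of what the fragment shader would compute from curPosition and prevPosition. The Vec2/Vec4 structs and the function names are hypothetical, and the [0,1] screen range is an assumption; it only illustrates the math, not anyone's actual shader.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec4 { float x, y, z, w; };

// Clip space -> [0,1] screen space: perspective divide, then remap NDC [-1,1] to [0,1].
// Assumes clip.w != 0 (the point is not on the camera plane).
Vec2 clipToScreen(const Vec4& clip) {
    return { (clip.x / clip.w) * 0.5f + 0.5f,
             (clip.y / clip.w) * 0.5f + 0.5f };
}

// Screen-space motion vector: where the point is now minus where it was last frame.
Vec2 motionVector(const Vec4& curClip, const Vec4& prevClip) {
    Vec2 cur  = clipToScreen(curClip);
    Vec2 prev = clipToScreen(prevClip);
    return { cur.x - prev.x, cur.y - prev.y };
}
```

The divide has to happen per fragment, after interpolation, which is why both positions are passed down as full vec4 values instead of being divided in the vertex shader.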

You can try moving the skinning to the fragment shader. The idea is to store the vertex positions, weights, indices and matrices of all models in textures.


Texture 1 with vertices rgb32f:
AAAAAAAAAAAAAAAAAAAAAAAAAA
AAAABBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBCCCCCCCCCDDDDD
DDDDDDDDDEEEEEEEEEEEEEEEEE
EEEFFFFFFFFFF...
A, B, C, D, E, F are the vertices of animated models A, B, C, D, E, F, packed one model after another.
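As a rough illustration of this packing, the sketch below maps a vertex to its texel, assuming models are stored back to back in row-major order. The names vertexTexel and modelOffset are made up for the example; modelOffset is simply the number of vertices stored before this model.

```cpp
#include <cassert>

struct Texel { int x, y; };

// Row-major packing: vertex `vertexIndex` of a model whose first vertex sits at
// linear position `modelOffset` lands at this texel of a `texWidth`-wide texture.
Texel vertexTexel(int modelOffset, int vertexIndex, int texWidth) {
    int linear = modelOffset + vertexIndex;
    return { linear % texWidth, linear / texWidth };
}
```

In the diagram above the rows are 26 texels wide and model A has 30 vertices, so the first vertex of model B lands at column 4 of the second row.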

Texture 2, rgba32f weight map. Contains up to 4 weighting coefficients for each vertex:
AAAAAAAAAAAAAAAAAAAAAAAAAA
AAAABBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBCCCCCCCCCDDDDD
DDDDDDDDDEEEEEEEEEEEEEEEEE
EEEFFFFFFFFFF...


Texture 3, rgba16i bone index map. Contains the bone indices (or mapping coordinates) per vertex.

Texture 4, rgba32f. Contains the bone matrices of all models in their current animation state.
The texture can be 1D or 2D (2048 x 1).
For example:
1. tex = 2048 x 1; 4 sequential texels contain one 4x4 matrix, max 512 bones
2. tex = 2048 x 4; 4 vertical texels contain one 4x4 matrix, max 2048 bones
3. tex = 4 x 2048; 4 horizontal texels contain one 4x4 matrix, max 2048 bones
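To show how a bone index turns into texture coordinates for layouts 1 and 2, here is a small C++ sketch. The function names are hypothetical, and it assumes normalized coordinates that address texel centers (hence the + 0.5), which is what an unfiltered lookup needs.

```cpp
#include <cassert>
#include <cmath>

struct TexCoord { float u, v; };

// Layout 2: 2048 x 4 texture, one bone per column, matrix row r stored in texel row r.
TexCoord boneTexel2048x4(int bone, int row) {
    const float width = 2048.0f, height = 4.0f;
    return { (bone + 0.5f) / width, (row + 0.5f) / height };
}

// Layout 1: 2048 x 1 texture, 4 sequential texels per matrix, so at most 512 bones.
TexCoord boneTexel2048x1(int bone, int row) {
    const float width = 2048.0f;
    return { (bone * 4 + row + 0.5f) / width, 0.5f };
}
```

Layout 3 (4 x 2048) is layout 2 with u and v swapped.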


In the fragment shader, do something like this:



vec4 pos = texture2D(tex1, texcoord);
vec4 weight = texture2D(tex2, texcoord);
vec4 indices = texture2D(tex3, texcoord);
mat4 skinmatrix = GetSkinMatrix(tex4, weight, indices);

vec4 result = skinmatrix * pos;

gl_FragColor = MV * result;


In GetSkinMatrix you need to fetch 4 texels from tex4, based on the bone index, to get one matrix. There are up to 4 influence matrices per vertex, which means 16 fetches per vertex.

mat4 GetSkinMatrix(sampler2D m, vec4 weight, vec4 indices)
{
  mat4 ret = mat4(0.0); // start from zero so the weighted sum is well defined
  for (int i=0; i<4; i++) // loop through indices
  {
    // bone index -> normalized u at the texel center, assume 2048x4 texture
    float u = (indices[i] + 0.5) / 2048.0;
    mat4 smat;
    // now we need to fetch 4 texels and build the matrix. Depending on how the matrices are stored in the texture, the code below needs to change.
    smat[0] = texture2D(m, vec2(u, 0.5/4.0)); // float division; 0/4 would be integer 0
    smat[1] = texture2D(m, vec2(u, 1.5/4.0));
    smat[2] = texture2D(m, vec2(u, 2.5/4.0));
    smat[3] = texture2D(m, vec2(u, 3.5/4.0));

    // now we have one influence matrix. multiply it by its weight and add it to the final (per vertex) skin matrix
    ret += smat * weight[i];
  }
  return ret;
}
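For checking the shader output against known values, a CPU reference of the same weighted sum can help. This is only a sketch: Mat4 as a flat array of 16 floats and the name blendBones are assumptions made for the example, and the accumulator starts at zero so that only the weighted bones contribute.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;

// Weighted sum of up to 4 bone matrices (linear blend skinning).
// `bones` is the full bone palette; `indices` selects 4 of them.
Mat4 blendBones(const Mat4* bones, const int indices[4], const float weights[4]) {
    Mat4 ret{}; // value-initialized: all 16 entries are zero
    for (int i = 0; i < 4; ++i) {
        const Mat4& b = bones[indices[i]];
        for (int k = 0; k < 16; ++k)
            ret[k] += b[k] * weights[i];
    }
    return ret;
}
```

Unused influences can point at any valid bone as long as their weight is 0, and the weights are expected to sum to 1.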


You must turn off texture filtering and pass correct mapping coordinates.

To use this shader you need to render a screen-aligned quad into an offscreen rgba32f render target. The destination texture will contain the skinned vertices of all models.
Run the shader again with the previous frame's animation matrix texture and previous modelview, and store the result in another offscreen texture.
You can reuse the result from the previous frame instead of running the shader twice.

The per-texel difference between the two textures is the per-vertex 3D motion vector.

You can read back both textures into two PBOs, rebind the PBOs as VBOs and use them in the vertex shader in the real rendering pass.

This is useful in multipass rendering, when you have to deal with lighting and shadows and want to avoid skinning the same model multiple times per frame. With some smart optimisation you can even get instancing for free.

The shader work can be extended to offscreen MRT, and if you have enough MRTs you can output pos, normal, tangent, binormal and even prev_pos in a single (per frame) precalc pass.

You can also consider an OpenCL/CUDA version, which can be easier to develop.

Sorry for my late reply.

I will try the method suggested by llian dinev first, since it is easier to implement in my framework and shouldn't affect performance much (I think the application is already at its fillrate limit from all the deferred shading and post-processing).

Yooyo, the hardware skinning caching method you suggest is very cool, and yes, I am already facing the multiple-skinning problem when dealing with shadow maps.

Is the method you suggest much faster than doing CPU skinning?

After getting the transformed-vertex texture's data back via a PBO (I have never tried it; does it allow direct access to texture data?), I would still have to convert it to a VBO.

Maybe I should try to learn OpenCL and move all the CPU skinning code there.

You need to copy the texture data to a PBO and then rebind that PBO as a VBO. It's legal and it should work. I don't know about the performance.
You can try transform feedback too, but it is only available on newer hardware.

Just to complete the picture: consider using Transform Feedback for your skinning. It's easier than the texture approach, but requires more careful driver support (Catalyst 9.10+ for ATI).
The idea is simple: send the transformed vertex attributes to VBO memory instead of (or in addition to) the rasterizer.