Vertex skinning problem

I have the following code in my vertex shader:

  
mat4 BuildSkinMatrix()
{
 vec4 b = vBoneIndex; // vertex attribute
 vec4 w = vWeight;  // vertex attribute
 mat4 skinmatrix;
 for (int i=0; i<4; i++)
 {
  //if (w[i] != 0.0)  // <- look at this line
  skinmatrix = skinmatrix + (w[i] * bones[int(b[i])]);
 }
	
 return skinmatrix;
}

If I comment out the if (w[i] != 0.0) statement, the shader works perfectly. I want to optimize a bit, though: a lot of vertices have 0.0 weight for bone indices 1, 2 and 3, and I would like to avoid the unneeded 0.0 * bone_matrix calculations. But when I uncomment the if (w[i] != 0.0) statement, the shader doesn’t work. Why?

yooyo

If the loop doesn’t work, does this?

 
mat4 BuildSkinMatrix()
{
  // vBoneIndex; // vertex attribute
  // vWeight; // vertex attribute
  mat4 skinmatrix; // Not initialized?

  if (vWeight.x != 0.0)
    skinmatrix += vWeight.x * bones[int(vBoneIndex.x)];
  if (vWeight.y != 0.0)
    skinmatrix += vWeight.y * bones[int(vBoneIndex.y)];
  if (vWeight.z != 0.0)
    skinmatrix += vWeight.z * bones[int(vBoneIndex.z)];
  if (vWeight.w != 0.0)
    skinmatrix += vWeight.w * bones[int(vBoneIndex.w)];

  return skinmatrix; // undefined if all weights are 0.0
}

It doesn’t want to work at all. :frowning:

Here is a slightly modified version:

  
mat4 BuildSkinMatrix()
{
  mat4 skinmatrix = vWeight.x * bones[int(vBoneIndex.x)];
  if (vWeight.y != 0.0) skinmatrix += vWeight.y * bones[int(vBoneIndex.y)];
  if (vWeight.z != 0.0) skinmatrix += vWeight.z * bones[int(vBoneIndex.z)];
  if (vWeight.w != 0.0) skinmatrix += vWeight.w * bones[int(vBoneIndex.w)];

  return skinmatrix; 
}

Note that vWeight.x != 0.0 for all vertices.
btw… I’m using an FX5900 and FW 66.81.

yooyo
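A side note on the first two variants above: their accumulator is declared as a plain mat4 and then added to, and GLSL does not give an uninitialized local a defined value. The following is a minimal sketch of the same loop starting from an explicit zero matrix. Whether this changes the behavior on the FX5900 is an assumption and has not been tested here; the attribute and uniform names are taken from the posted shader.

```glsl
attribute vec4 vWeight;     // per-vertex bone weights
attribute vec4 vBoneIndex;  // per-vertex bone indices, stored as floats

uniform mat4 bones[30];

mat4 BuildSkinMatrix()
{
    // mat4(0.0) puts 0.0 on the diagonal and zeros everywhere else,
    // so the accumulator starts as the all-zero matrix.
    mat4 skinmatrix = mat4(0.0);

    for (int i = 0; i < 4; i++)
    {
        // Skip bones that do not influence this vertex.
        if (vWeight[i] != 0.0)
            skinmatrix += vWeight[i] * bones[int(vBoneIndex[i])];
    }

    return skinmatrix;
}

void main(void)
{
    gl_Position = gl_ModelViewProjectionMatrix * (BuildSkinMatrix() * gl_Vertex);
}
```

With a defined starting value, the result no longer depends on what the compiler decides to do with an undefined register, with or without the if statement.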

Just a general question: How is weighted adding of matrices supposed to work for skinning?
For example, you shouldn’t add two rotation matrices. Well, you can, but the result is not what you wanted.
The last time I did skinning with two matrices, some years ago, it was the resulting positions that were weighted.
If the bones are mat4, what exactly should weight * mat4 be? If it’s mat4(weight) * mat4, then only the diagonal is initialized. I think you (and the compiler) will do a component-wise multiplication mat4(weight, …(another 15 times)) * mat4.
Still I wouldn’t add matrices. :confused:

Indeed, the correct formula is:

u(t) = SUM {i=0, n-1} ( w[i] * B[i] * M^-1[i] * p )
where SUM {i=0, n-1} ( w[i] ) = 1 and w[i] >= 0

w[i] = i-th weight
B[i] = i-th bone animation matrix in world space (so taking into account hierarchy and local animation)
M^-1[i] = inverse of the i-th initial bone coord to world coord matrix
p = point/vertex in world coords
t = time

The normal must be transformed with the same equation with a slight change: it uses the transpose of the inverse of the ( B[i] * M^-1[i] ) matrix.
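As a rough illustration of the two sums above, here is a minimal GLSL sketch that weights the transformed positions and normals per bone. It assumes bones[i] already holds B[i] * M^-1[i], and it assumes a second, hypothetical uniform array boneNormals (not part of the original shader) that the application fills with the inverse transpose of the upper 3x3 part of each bones[i]. Note that this second array costs additional uniform storage.

```glsl
attribute vec4 vWeight;     // w[i]
attribute vec4 vBoneIndex;  // bone indices stored as floats

uniform mat4 bones[30];        // assumed to hold B[i] * M^-1[i]
uniform mat3 boneNormals[30];  // hypothetical: inverse transpose of the 3x3 part of bones[i]

varying vec3 nrm;

void main(void)
{
    vec4 pos = vec4(0.0);
    vec3 n   = vec3(0.0);

    for (int i = 0; i < 4; i++)
    {
        int idx = int(vBoneIndex[i]);

        // Weighted sum of the per-bone transformed positions.
        pos += vWeight[i] * (bones[idx] * gl_Vertex);

        // Same sum for the normal, using the inverse-transpose matrices.
        n += vWeight[i] * (boneNormals[idx] * gl_Normal);
    }

    nrm = normalize(gl_NormalMatrix * n);
    gl_Position = gl_ModelViewProjectionMatrix * pos;
}
```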

The sum of all weights is 1.0. The skinning formula looks like:

(*) vertex_weight[i] is a float
(*) all matrices are mat4

skinned_vertex = sum{i in bones} (vertex_weight[i] * bone_matrix[i] * bind_pose_matrix_inv[i] * bind_pose_vertex)

We can optimize the formula by precalculating the matrices on the CPU. So…

bone_matrix_composition[i] = bone_matrix[i] * bind_pose_matrix_inv[i]

and the new formula looks like:

skinned_vertex = sum{i in bones} (vertex_weight[i] * bone_matrix_composition[i] * bind_pose_vertex)

Because the matrix-vector product is linear, bind_pose_vertex can be factored out of the sum:

skinned_vertex = sum{i in bones} (vertex_weight[i] * bone_matrix_composition[i]) * bind_pose_vertex

and we can calculate the skin matrix like:

skin_matrix = sum{i in bones} (vertex_weight[i] * bone_matrix_composition[i])

Finally, the formula is:

skinned_vertex = skin_matrix * bind_pose_vertex

If we are sure that we don’t have any non-uniform scale in the bone matrices, this can also be applied to normals, tangents and binormals. These attributes are usually vec3.

skinned_normal = skin_matrix * vec4(bind_pose_normal, 0.0)
skinned_tangent = skin_matrix * vec4(bind_pose_tangent, 0.0)
skinned_binormal = skin_matrix * vec4(bind_pose_binormal, 0.0)

After all that, we have to transform skinned_vertex by gl_ModelViewProjectionMatrix and the skinned normal, tangent and binormal by gl_NormalMatrix.
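The step that pulls bind_pose_vertex out of the sum follows from the linearity of the matrix-vector product. Written out, with shorthand symbols introduced here only for illustration (w_i for vertex_weight[i], C_i for bone_matrix_composition[i], p for bind_pose_vertex and S for skin_matrix):

```latex
\begin{aligned}
\text{skinned\_vertex}
  &= \sum_{i} w_i \,\bigl(C_i\, p\bigr) \\
  &= \sum_{i} \bigl(w_i\, C_i\bigr)\, p \\
  &= \Bigl(\sum_{i} w_i\, C_i\Bigr)\, p \\
  &= S\, p, \qquad S = \sum_{i} w_i\, C_i
\end{aligned}
```

So adding the weighted matrices first and transforming once gives the same position as transforming per bone and weighting the results.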

Please read THE OPENGL® SHADING LANGUAGE (GLSLangSpec.Full.1.10.59.pdf) page 32, last paragraph.

If one operand is scalar and the other is a vector or matrix, the scalar is applied component-wise to the vector or matrix, resulting in the same type as the vector or matrix.
yooyo
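To make the quoted rule concrete, here is a small sketch comparing the scalar * matrix form with the mat4(weight) * matrix form asked about earlier. The uniforms bone and weight and the varying diff are hypothetical names used only for this illustration.

```glsl
uniform mat4 bone;     // hypothetical single bone matrix
uniform float weight;  // hypothetical scalar weight

varying vec4 diff;

void main(void)
{
    // Scalar * matrix: each of the 16 components is multiplied by weight.
    mat4 scaled = weight * bone;

    // mat4(weight) has weight on the diagonal and zeros elsewhere.
    // Matrix * matrix is the linear-algebraic product, so this is
    // (weight * identity) * bone.
    mat4 viaDiagonal = mat4(weight) * bone;

    // Both matrices should agree, so diff is expected to be zero
    // up to rounding.
    diff = (scaled - viaDiagonal) * gl_Vertex;

    gl_Position = gl_ModelViewProjectionMatrix * (scaled * gl_Vertex);
}
```

A component-wise product of two matrices is a different operation; GLSL provides the built-in matrixCompMult for that.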

Ok, cool.
Thanks for the reminder, no excuses here like “Ah, that was new in the 059 spec”, I have that, too. :wink:

So what exactly does “It doesn’t want to work at all” mean?
Have you tried downloading the nvemulate tool from http://developer.nvidia.com/object/nvemulate.html
and looking at the shader’s assembly?

Interesting optimisation you’ve got there; I hadn’t thought of it (but I haven’t coded a skinning shader yet either).

Next time I’ll wait until my brain exits power-saving mode before answering. I just noticed that you are using an FX5900 with FW 66.81, and I’m not sure whether that hardware has branching support, or which kind it supports (dynamic or static).

Try to check that, maybe it’s your problem.

[edit]
OK, clearing things up.
Static branching: branching based on a constant/uniform.
Dynamic branching: branching based on a per-vertex/per-fragment value (depending on whether it’s a VS or FS).

I don’t think either the GFFX or the ATI R3xx/R4xx supports dynamic branching, which is what you are using.
[/edit]
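The distinction can be sketched in a few lines of GLSL. The attribute and uniform names follow the shader posted in this thread, except useSecondBone, which is a hypothetical uniform added only to show a static branch.

```glsl
attribute vec4 vWeight;
attribute vec4 vBoneIndex;

uniform mat4 bones[30];
uniform bool useSecondBone;  // hypothetical flag set by the application

void main(void)
{
    mat4 skin = vWeight.x * bones[int(vBoneIndex.x)];

    // Static branch: the condition is a uniform, so it has the same
    // value for every vertex of a draw call.
    if (useSecondBone)
        skin += vWeight.y * bones[int(vBoneIndex.y)];

    // Dynamic branch: the condition depends on a vertex attribute,
    // so it can differ from vertex to vertex.
    if (vWeight.z != 0.0)
        skin += vWeight.z * bones[int(vBoneIndex.z)];

    // Branch-free: a zero weight simply adds a zero matrix.
    skin += vWeight.w * bones[int(vBoneIndex.w)];

    gl_Position = gl_ModelViewProjectionMatrix * (skin * gl_Vertex);
}
```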

Originally posted by yooyo:
[b]

  
 ...
 vec4 b = vBoneIndex; // vertex attribute
 ...
  skinmatrix = skinmatrix + (w[i] * bones[int(b[i])]);
 ...

[/b]
Why do you use this conversion to int? Just declare vBoneIndex as ivec4 - not as vec4.

There is a glsl_skinning demo somewhere on developer.nvidia.com; it has the same float-to-int conversion. I just tried removing this conversion and using the ivec4 type for the index attribute: the resulting assembly is much smaller (69 instructions and 11 R-regs with the float-to-int conversion versus 34 instructions and 10 R-regs for the ivec4 version) and the demo runs 1.5X faster.
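A minimal sketch of that change, applied to the shader from this thread, might look as follows. One caveat: the GLSL 1.10 specification restricts the attribute qualifier to floating-point types, so an ivec4 attribute relies on the compiler accepting it, as reported above for the NVIDIA driver; treat this as an assumption rather than portable code.

```glsl
attribute vec4 vWeight;
attribute ivec4 vBoneIndex;  // integer attribute: not allowed by strict GLSL 1.10

uniform mat4 bones[30];

mat4 BuildSkinMatrix()
{
    mat4 result = mat4(0.0);  // start from the all-zero matrix

    for (int i = 0; i < 4; i++)
    {
        // No int() conversion needed: the index is already an integer.
        result += vWeight[i] * bones[vBoneIndex[i]];
    }

    return result;
}

void main(void)
{
    gl_Position = gl_ModelViewProjectionMatrix * (BuildSkinMatrix() * gl_Vertex);
}
```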

@Relic:
If I remove the if statement, the shader works perfectly, but with the if statement it doesn’t work. :frowning:

There is a similar example in the NVSDK (glsl_skinning), but without my optimization, and it works with or without the if statement!

I know about nvemulate and I have looked at the precompiled shader. It uses a BRA instruction for the for loop, but I didn’t see any BRA instruction for the if statement.

@Ingenu
The FX5900 reports the NV_vertex_program2 extension. NVIDIA introduced branching and a few more things in this extension.

Here is my vertex shader code that works correctly, with the asm output:

  
attribute vec4 vWeight;
attribute vec4 vBoneIndex;

uniform mat4 bones[30];
const vec4 nula = vec4(0.0, 0.0, 0.0, 0.0);

varying vec4 col;

mat4 BuildSkinMatrix()
{
	vec4 b = vBoneIndex;
	vec4 w = vWeight;	

	mat4 result;
	int i;
	for (i=0; i<4; i++)
	{
	 result = result + (w[i] * bones[int(b[i])]);
	}
	
	return result;
}

void main(void)
{
	vec4 vtx;
	vec4 nrm;

	mat4 skinmatrix = BuildSkinMatrix();
	
	vtx = skinmatrix * gl_Vertex;
	nrm = skinmatrix * vec4(gl_Normal, 0.0);
	nrm = vec4(gl_NormalMatrix * nrm.xyz, 0.0);
	col = nrm;
	gl_Position = gl_ModelViewProjectionMatrix * vtx;
}

!!VP2.0
# cgc version 1.3.0001, build date Sep 30 2004 14:14:01
# command line args: -q -profile vp30 -entry main -oglsl -D__GLSL_CG_DATA_TYPES -D__GLSL_CG_STDLIB -D__GLSL_SAMPLER_RECT
#vendor NVIDIA Corporation
#version 1.0.02
#profile vp30
#program main
#semantic gl_ModelViewProjectionMatrix
#semantic gl_NormalMatrix
#semantic bones
#var float3 gl_Normal : $vin.ATTR2 : ATTR2 : -1 : 1
#var float4 gl_Vertex : $vin.ATTR0 : ATTR0 : -1 : 1
#var float4 gl_Position : $vout.HPOS : HPOS : -1 : 1
#var float4x4 gl_ModelViewProjectionMatrix :  : c[0], 4 : -1 : 1
#var float3x3 gl_NormalMatrix :  : c[4], 3 : -1 : 1
#var float4 vWeight : $vin.ATTR13 : ATTR13 : -1 : 1
#var float4 vBoneIndex : $vin.ATTR14 : ATTR14 : -1 : 1
#var float4x4 bones[0] :  : c[7], 4 : -1 : 1
#var float4x4 bones[1] :  : c[11], 4 : -1 : 1
#var float4x4 bones[2] :  : c[15], 4 : -1 : 1
#var float4x4 bones[3] :  : c[19], 4 : -1 : 1
#var float4x4 bones[4] :  : c[23], 4 : -1 : 1
#var float4x4 bones[5] :  : c[27], 4 : -1 : 1
#var float4x4 bones[6] :  : c[31], 4 : -1 : 1
#var float4x4 bones[7] :  : c[35], 4 : -1 : 1
#var float4x4 bones[8] :  : c[39], 4 : -1 : 1
#var float4x4 bones[9] :  : c[43], 4 : -1 : 1
#var float4x4 bones[10] :  : c[47], 4 : -1 : 1
#var float4x4 bones[11] :  : c[51], 4 : -1 : 1
#var float4x4 bones[12] :  : c[55], 4 : -1 : 1
#var float4x4 bones[13] :  : c[59], 4 : -1 : 1
#var float4x4 bones[14] :  : c[63], 4 : -1 : 1
#var float4x4 bones[15] :  : c[67], 4 : -1 : 1
#var float4x4 bones[16] :  : c[71], 4 : -1 : 1
#var float4x4 bones[17] :  : c[75], 4 : -1 : 1
#var float4x4 bones[18] :  : c[79], 4 : -1 : 1
#var float4x4 bones[19] :  : c[83], 4 : -1 : 1
#var float4x4 bones[20] :  : c[87], 4 : -1 : 1
#var float4x4 bones[21] :  : c[91], 4 : -1 : 1
#var float4x4 bones[22] :  : c[95], 4 : -1 : 1
#var float4x4 bones[23] :  : c[99], 4 : -1 : 1
#var float4x4 bones[24] :  : c[103], 4 : -1 : 1
#var float4x4 bones[25] :  : c[107], 4 : -1 : 1
#var float4x4 bones[26] :  : c[111], 4 : -1 : 1
#var float4x4 bones[27] :  : c[115], 4 : -1 : 1
#var float4x4 bones[28] :  : c[119], 4 : -1 : 1
#var float4x4 bones[29] :  : c[123], 4 : -1 : 1
#var float4 col : $vout.TEX0 : TEX0 : -1 : 1
#const c[127] = 4 0
BB1:
MOV   o[TEX0].w, c[127].y;
FLR   R1.x, v[14].w;
FLR   R1.w, v[14].x;
FLR   R1.y, v[14].z;
FLR   R1.z, v[14].y;
MUL   R1.z, R1, c[127].x;
MUL   R1.y, R1, c[127].x;
MUL   R1.w, R1, c[127].x;
MUL   R1.x, R1, c[127];
ARL   A0.x, R1;
ARL   A0.w, R1;
ARL   A0.y, R1;
ARL   A0.z, R1;
MAD   R2, v[13].x, c[A0.w + 7], R0;
MAD   R3, v[13].x, c[A0.w + 8], R0;
MAD   R1, v[13].x, c[A0.w + 9], R0;
MAD   R0, v[13].x, c[A0.w + 10], R0;
MAD   R0, v[13].y, c[A0.z + 10], R0;
MAD   R1, v[13].y, c[A0.z + 9], R1;
MAD   R3, v[13].y, c[A0.z + 8], R3;
MAD   R2, v[13].y, c[A0.z + 7], R2;
MAD   R2, v[13].z, c[A0.y + 7], R2;
MAD   R3, v[13].z, c[A0.y + 8], R3;
MAD   R1, v[13].z, c[A0.y + 9], R1;
MAD   R0, v[13].z, c[A0.y + 10], R0;
MAD   R0, v[13].w, c[A0.x + 10], R0;
MAD   R1, v[13].w, c[A0.x + 9], R1;
MAD   R3, v[13].w, c[A0.x + 8], R3;
MAD   R2, v[13].w, c[A0.x + 7], R2;
MUL   R4.xyz, v[2].y, R3;
MUL   R3, v[0].y, R3;
MAD   R3, v[0].x, R2, R3;
MAD   R2.xyz, v[2].x, R2, R4;
MAD   R2.xyz, v[2].z, R1, R2;
MAD   R1, v[0].z, R1, R3;
MAD   R0, v[0].w, R0, R1;
ADD   R2.xyz, R2, c[127].y;
MUL   R1, R0.y, c[1];
MAD   R1, R0.x, c[0], R1;
MUL   R3.xyz, R2.y, c[5];
MAD   R3.xyz, R2.x, c[4], R3;
MAD   R1, R0.z, c[2], R1;
MAD   o[HPOS], R0.w, c[3], R1;
MAD   o[TEX0].xyz, R2.z, c[6], R3;
END
# 44 instructions, 5 R-regs

This shader doesn’t work. The only difference is the if statement in the BuildSkinMatrix function.

  
attribute vec4 vWeight;
attribute vec4 vBoneIndex;

uniform mat4 bones[30];
const vec4 nula = vec4(0.0, 0.0, 0.0, 0.0);

varying vec4 col;

mat4 BuildSkinMatrix()
{
	vec4 b = vBoneIndex;
	vec4 w = vWeight;	

	mat4 result;// = w[0] * bones[int(b[0])];
	int i;
	for (i=0; i<4; i++)
	{
	 if (w[i] != 0.0)
		 result = result + (w[i] * bones[int(b[i])]);
	}
	
	return result;
}

void main(void)
{
	vec4 vtx;
	vec4 nrm;

	mat4 skinmatrix = BuildSkinMatrix();
	
	vtx = skinmatrix * gl_Vertex;
	nrm = skinmatrix * vec4(gl_Normal, 0.0);
	nrm = vec4(gl_NormalMatrix * nrm.xyz, 0.0);
	col = nrm;
	gl_Position = gl_ModelViewProjectionMatrix * vtx;
}

!!VP2.0
# cgc version 1.3.0001, build date Sep 30 2004 14:14:01
# command line args: -q -profile vp30 -entry main -oglsl -D__GLSL_CG_DATA_TYPES -D__GLSL_CG_STDLIB -D__GLSL_SAMPLER_RECT
#vendor NVIDIA Corporation
#version 1.0.02
#profile vp30
#program main
#semantic gl_ModelViewProjectionMatrix
#semantic gl_NormalMatrix
#semantic bones
#var float3 gl_Normal : $vin.ATTR2 : ATTR2 : -1 : 1
#var float4 gl_Vertex : $vin.ATTR0 : ATTR0 : -1 : 1
#var float4 gl_Position : $vout.HPOS : HPOS : -1 : 1
#var float4x4 gl_ModelViewProjectionMatrix :  : c[0], 4 : -1 : 1
#var float3x3 gl_NormalMatrix :  : c[4], 3 : -1 : 1
#var float4 vWeight : $vin.ATTR13 : ATTR13 : -1 : 1
#var float4 vBoneIndex : $vin.ATTR14 : ATTR14 : -1 : 1
#var float4x4 bones[0] :  : c[7], 4 : -1 : 1
#var float4x4 bones[1] :  : c[11], 4 : -1 : 1
#var float4x4 bones[2] :  : c[15], 4 : -1 : 1
#var float4x4 bones[3] :  : c[19], 4 : -1 : 1
#var float4x4 bones[4] :  : c[23], 4 : -1 : 1
#var float4x4 bones[5] :  : c[27], 4 : -1 : 1
#var float4x4 bones[6] :  : c[31], 4 : -1 : 1
#var float4x4 bones[7] :  : c[35], 4 : -1 : 1
#var float4x4 bones[8] :  : c[39], 4 : -1 : 1
#var float4x4 bones[9] :  : c[43], 4 : -1 : 1
#var float4x4 bones[10] :  : c[47], 4 : -1 : 1
#var float4x4 bones[11] :  : c[51], 4 : -1 : 1
#var float4x4 bones[12] :  : c[55], 4 : -1 : 1
#var float4x4 bones[13] :  : c[59], 4 : -1 : 1
#var float4x4 bones[14] :  : c[63], 4 : -1 : 1
#var float4x4 bones[15] :  : c[67], 4 : -1 : 1
#var float4x4 bones[16] :  : c[71], 4 : -1 : 1
#var float4x4 bones[17] :  : c[75], 4 : -1 : 1
#var float4x4 bones[18] :  : c[79], 4 : -1 : 1
#var float4x4 bones[19] :  : c[83], 4 : -1 : 1
#var float4x4 bones[20] :  : c[87], 4 : -1 : 1
#var float4x4 bones[21] :  : c[91], 4 : -1 : 1
#var float4x4 bones[22] :  : c[95], 4 : -1 : 1
#var float4x4 bones[23] :  : c[99], 4 : -1 : 1
#var float4x4 bones[24] :  : c[103], 4 : -1 : 1
#var float4x4 bones[25] :  : c[107], 4 : -1 : 1
#var float4x4 bones[26] :  : c[111], 4 : -1 : 1
#var float4x4 bones[27] :  : c[115], 4 : -1 : 1
#var float4x4 bones[28] :  : c[119], 4 : -1 : 1
#var float4x4 bones[29] :  : c[123], 4 : -1 : 1
#var float4 col : $vout.TEX0 : TEX0 : -1 : 1
#const c[127] = 0
BB1:
MOV   R3.xyz, v[2];
MOVC  CC.x, v[13];
MOV   R0, v[0];
MOV   R1.yzw, v[13];
BRA   BB3 (EQ.x);
BB2:
MOV   R0, v[0];
MOV   R3.xyz, v[2];
BB3:
MOVC  CC.x, R1.y;
BRA   BB5 (EQ.x);
BB4:
BB5:
MOVC  CC.x, R1.z;
BRA   BB7 (EQ.x);
BB6:
BB7:
MOVC  CC.x, R1.w;
BRA   BB9 (EQ.x);
BB8:
BB9:
MOV   o[TEX0].w, c[127].x;
MUL   R2.xyz, R3.y, R0;
MUL   R1, R0.y, R0;
MAD   R1, R0.x, R0, R1;
MAD   R2.xyz, R3.x, R0, R2;
MAD   R2.xyz, R3.z, R0, R2;
MAD   R1, R0.z, R0, R1;
MAD   R0, R0.w, R0, R1;
ADD   R2.xyz, R2, c[127].x;
MUL   R1, R0.y, c[1];
MAD   R1, R0.x, c[0], R1;
MUL   R3.xyz, R2.y, c[5];
MAD   R3.xyz, R2.x, c[4], R3;
MAD   R1, R0.z, c[2], R1;
MAD   o[HPOS], R0.w, c[3], R1;
MAD   o[TEX0].xyz, R2.z, c[6], R3;
END
# 29 instructions, 4 R-regs


yooyo
