knackered

04-26-2002, 06:17 AM

If I do this:-

glLoadMatrixf(matrix1);

glMultMatrixf(matrix2);

instead of this:-

MatMult(matrix1, matrix2, matprod);

glLoadMatrixf(matprod);

I get a *significant* performance boost.

(My matrix multiply code is shown below.)

Why is this?

The reason I ask is that I gathered from matt and cass that all matrix multiplications are done on the CPU, and the resulting matrix is then uploaded to the GPU. So surely some mad assembly version couldn't be *that* much faster than this simple bit of C?

outputmatrix[0]=matrix1[0]*matrix2[0]+matrix1[1]*matrix2[4]+matrix1[2]*matrix2[8]+matrix1[3]*matrix2[12];

outputmatrix[1]=matrix1[0]*matrix2[1]+matrix1[1]*matrix2[5]+matrix1[2]*matrix2[9]+matrix1[3]*matrix2[13];

outputmatrix[2]=matrix1[0]*matrix2[2]+matrix1[1]*matrix2[6]+matrix1[2]*matrix2[10]+matrix1[3]*matrix2[14];

outputmatrix[3]=matrix1[0]*matrix2[3]+matrix1[1]*matrix2[7]+matrix1[2]*matrix2[11]+matrix1[3]*matrix2[15];

outputmatrix[4]=matrix1[4]*matrix2[0]+matrix1[5]*matrix2[4]+matrix1[6]*matrix2[8]+matrix1[7]*matrix2[12];

outputmatrix[5]=matrix1[4]*matrix2[1]+matrix1[5]*matrix2[5]+matrix1[6]*matrix2[9]+matrix1[7]*matrix2[13];

outputmatrix[6]=matrix1[4]*matrix2[2]+matrix1[5]*matrix2[6]+matrix1[6]*matrix2[10]+matrix1[7]*matrix2[14];

outputmatrix[7]=matrix1[4]*matrix2[3]+matrix1[5]*matrix2[7]+matrix1[6]*matrix2[11]+matrix1[7]*matrix2[15];

outputmatrix[8]=matrix1[8]*matrix2[0]+matrix1[9]*matrix2[4]+matrix1[10]*matrix2[8]+matrix1[11]*matrix2[12];

outputmatrix[9]=matrix1[8]*matrix2[1]+matrix1[9]*matrix2[5]+matrix1[10]*matrix2[9]+matrix1[11]*matrix2[13];

outputmatrix[10]=matrix1[8]*matrix2[2]+matrix1[9]*matrix2[6]+matrix1[10]*matrix2[10]+matrix1[11]*matrix2[14];

outputmatrix[11]=matrix1[8]*matrix2[3]+matrix1[9]*matrix2[7]+matrix1[10]*matrix2[11]+matrix1[11]*matrix2[15];

outputmatrix[12]=matrix1[12]*matrix2[0]+matrix1[13]*matrix2[4]+matrix1[14]*matrix2[8]+matrix1[15]*matrix2[12];

outputmatrix[13]=matrix1[12]*matrix2[1]+matrix1[13]*matrix2[5]+matrix1[14]*matrix2[9]+matrix1[15]*matrix2[13];

outputmatrix[14]=matrix1[12]*matrix2[2]+matrix1[13]*matrix2[6]+matrix1[14]*matrix2[10]+matrix1[15]*matrix2[14];

outputmatrix[15]=matrix1[12]*matrix2[3]+matrix1[13]*matrix2[7]+matrix1[14]*matrix2[11]+matrix1[15]*matrix2[15];
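
For comparison, here is a minimal sketch of a loop-based multiply that should produce the same matrix that glLoadMatrixf(a) followed by glMultMatrixf(b) leaves on the stack, assuming both arrays use OpenGL's column-major layout (element at row r, column c stored at index c*4 + r). The name mat4_mul_colmajor is made up for illustration and is not part of OpenGL. One thing worth checking before comparing timings: the unrolled code above indexes the arrays as if they were row-major, so read as column-major data it works out to matrix2 * matrix1 rather than matrix1 * matrix2, and the two code paths may not be loading the same matrix.

```c
#include <stddef.h>

/* Hypothetical helper (not an OpenGL function): out = a * b for 4x4
   matrices stored column-major, i.e. element (row r, column c) is at
   index c*4 + r. 'out' must not alias 'a' or 'b'. */
static void mat4_mul_colmajor(const float a[16], const float b[16], float out[16])
{
    for (size_t c = 0; c < 4; ++c) {          /* column of the result */
        for (size_t r = 0; r < 4; ++r) {      /* row of the result */
            float sum = 0.0f;
            for (size_t k = 0; k < 4; ++k)
                sum += a[k * 4 + r] * b[c * 4 + k];
            out[c * 4 + r] = sum;
        }
    }
}
```

With a as a translation and b as a uniform scale, the result keeps the translation unscaled in elements 12 to 14, which is the behaviour expected from glLoadMatrixf(a); glMultMatrixf(b).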

[This message has been edited by knackered (edited 04-26-2002).]
