Here is a 3DNow! vector normalization:
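```c
#include <AMD3D/amd3dx.h>   // 3DNow! opcode macros

void Normalize3f_3DNow(float *vec)
{
    _asm {
        femms                       // fast switch into MMX/3DNow! state
        mov       eax, dword ptr [vec]
        movq      mm0, [eax]        // mm0 = | y | x |
        movq      mm3, mm0          // keep a copy of x, y
        pfmul     (m0,m0)           // mm0 = | y*y | x*x |
        movd      mm1, [eax+8]      // mm1 = | 0 | z |
        movq      mm4, mm1          // keep a copy of z
        pfmul     (m1,m1)           // mm1 = | 0 | z*z |
        pfacc     (m0,m0)           // mm0 = | x*x+y*y | x*x+y*y |
        pfadd     (m0,m1)           // low dword = x*x+y*y+z*z
        pfrsqrt   (m1,m0)           // mm1 = X0, ~15-bit 1/sqrt(low dword), both halves
        movq      mm2, mm1
        pfmul     (m2,m2)           // mm2 = X0*X0
        pfrsqit1  (m2,m0)           // Newton-Raphson refinement of 1/sqrt, first step
        pfrcpit2  (m2,m1)           // second step: low dword is now full-precision 1/length
        punpckldq mm2, mm2          // copy low dword to high (high half only saw x*x+y*y)
        pfmul     (m3,m2)           // x, y scaled by 1/length
        movq      [eax], mm3
        pfmul     (m4,m2)           // z scaled by 1/length
        movd      [eax+8], mm4
        femms                       // leave MMX state so x87 code can run again
    }
}
```

I believe I simply copied that routine from the 3DNow! SDK. To get maximum performance when working on a bunch of vectors, it is much better to inline most of the above code and use prefetch (or prefetchw) to fetch the next vector into the cache while working on the current one. Oh, and you can drop the pfrsqit1, the pfrcpit2, and one pfmul (the pfmul (m2,m2) that squares the estimate) if 15-bit precision is good enough.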
[This message has been edited by DFrey (edited 02-13-2001).]



) state. I understand perfectly why it is at the end, and I thought it odd at first when I saw it at the start too. But the white paper on it says the FEMMS instruction is meant to facilitate "Faster Enter/Exit of MMX or floating-point state".
