Hello!
Could somebody tell me why my SSE code for vector and matrix operations is slower than the same code without SSE? I must be doing something wrong, but I don't know what. Here is my vector addition, for example
(I'm using intrinsics from xmmintrin.h with VC++6 SP5):
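Data type:
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps

typedef union
{
    __m128 data;        // the whole vector as one SSE value
    float elements[4];  // the same 16 bytes viewed as x, y, z, w
} vector4d;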
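Here is a minimal sketch of how the union's layout maps onto the SSE initializers. It assumes the usual x86 lane order, in which elements[0] is the lowest lane of the __m128. The helper names make_xyzw and make_xyzw_reversed_args are hypothetical. _mm_setr_ps and _mm_set_ps are the standard xmmintrin.h initializers, and they take their arguments in opposite orders.

```c
#include <xmmintrin.h>

/* Same layout as the post's vector4d: the float array aliases the SSE value. */
typedef union
{
    __m128 data;
    float elements[4];
} vector4d;

/* Hypothetical helper: builds a vector4d with one intrinsic.
   _mm_setr_ps takes its arguments in memory order, so x lands in elements[0]. */
static inline vector4d make_xyzw(float x, float y, float z, float w)
{
    vector4d v;
    v.data = _mm_setr_ps(x, y, z, w);
    return v;
}

/* Hypothetical helper: _mm_set_ps takes the highest lane first, so the
   arguments are reversed to build the same vector as make_xyzw(x, y, z, w). */
static inline vector4d make_xyzw_reversed_args(float x, float y, float z, float w)
{
    vector4d v;
    v.data = _mm_set_ps(w, z, y, x);
    return v;
}
```

Both helpers return the same vector. Mixing up the two argument orders is an easy way to end up with x and w swapped.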
With SSE (using intrinsics):
inline vector4d add(vector4d a, vector4d b)
{
    vector4d c;
    c.data = _mm_add_ps(a.data, b.data);  // adds all four lanes at once
    return c;
}
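For comparison, here is a sketch of the same _mm_add_ps applied to whole arrays of floats. It uses explicit loads and stores instead of going through the union on every call. The function name add_arrays is hypothetical. The sketch also assumes that all three arrays are 16-byte aligned, which _mm_load_ps and _mm_store_ps require, and that the length is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical batch version of add(): out[i] = a[i] + b[i] for n floats.
   Assumes a, b and out are 16-byte aligned and n is a multiple of 4. */
static void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i;
    assert(((uintptr_t)a & 15) == 0 && ((uintptr_t)b & 15) == 0 && ((uintptr_t)out & 15) == 0);
    assert(n % 4 == 0);
    for (i = 0; i < n; i += 4)
    {
        __m128 va = _mm_load_ps(a + i);             /* aligned load of 4 floats */
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));  /* 4 additions, 1 aligned store */
    }
}
```

A call such as add_arrays(a, b, out, 1024) runs the loop 256 times, handling four floats per iteration. Whether this beats the per-vector add() with a particular compiler is something to measure, not assume.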
Without SSE:
inline vector4d add_sisd(vector4d a, vector4d b)
{
    // scalar version: adds x, y, z one at a time and sets w to 0
    return set(a.elements[0] + b.elements[0],
               a.elements[1] + b.elements[1],
               a.elements[2] + b.elements[2],
               0);
}
(set() simply returns a vector4d with the parameters as its elements.)
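The post does not show the body of set(), so this sketch assumes a hypothetical one that writes its four parameters into elements[0..3]. It puts that set() together with both add functions from the post, so they can be compared on the same inputs.

```c
#include <xmmintrin.h>

typedef union
{
    __m128 data;
    float elements[4];
} vector4d;

/* Hypothetical set(): assumed to store the four parameters element by element. */
static inline vector4d set(float x, float y, float z, float w)
{
    vector4d v;
    v.elements[0] = x;
    v.elements[1] = y;
    v.elements[2] = z;
    v.elements[3] = w;
    return v;
}

/* SSE version from the post: adds all four lanes in one instruction. */
static inline vector4d add(vector4d a, vector4d b)
{
    vector4d c;
    c.data = _mm_add_ps(a.data, b.data);
    return c;
}

/* Scalar version from the post: adds x, y, z and forces w to 0. */
static inline vector4d add_sisd(vector4d a, vector4d b)
{
    return set(a.elements[0] + b.elements[0],
               a.elements[1] + b.elements[1],
               a.elements[2] + b.elements[2],
               0);
}
```

The two functions agree on x, y and z but not on w: add() sums the fourth lane as well, while add_sisd() always writes 0 there.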