I’m writing a number-crunching function for the CPU and I was wondering how the two functions below would compare:
inline void fa(float& t)
{
    t = t * t * t;
}

inline float fb(float t)
{
    return t * t * t;
}
Both functions do the same thing, but I wonder which one usually results in faster code (say, with g++). I’ve read that it is probably fb, since it takes t from the stack, while fa has to dereference a reference. On the other hand, the compiler might optimize the reference in fa away, but the same might be true for the copy in fb. What do you think?
EDIT: I should have tested it myself and I did:
#include <cstdio>

inline void fa(float& t)
{
    t = t * t * t;
}

inline float fb(float t)
{
    return t * t * t;
}

int main(int argc, char* argv[])
{
    float a(2);
    fa(a);
    float b(fb(2));
    std::printf("%f %f", a, b);
}
g++ tmp.cpp -g -O3 -march=native -o tmp
Produced this code:
Dump of assembler code for function main(int, char**):
0x0000000000400650 <+0>: sub $0x8,%rsp
0x0000000000400654 <+4>: mov $0x40076c,%esi
0x0000000000400659 <+9>: movsd 0x117(%rip),%xmm0 # 0x400778
0x0000000000400661 <+17>: mov $0x1,%edi
0x0000000000400666 <+22>: movapd %xmm0,%xmm1
0x000000000040066a <+26>: mov $0x2,%eax
0x000000000040066f <+31>: callq 0x400528 <__printf_chk@plt>
0x0000000000400674 <+36>: xor %eax,%eax
0x0000000000400676 <+38>: add $0x8,%rsp
0x000000000040067a <+42>: retq
It seems the compiler constant-folded everything: 2³ = 8 was computed at compile time (the movsd loads that precomputed constant, which is then passed to printf for both a and b), so after inlining both functions leave no trace and produce identical code. I consider the syntax of fb clearer, though, and think it is therefore probably the better choice.
I wonder what would happen if we were dealing with a custom floating-point type (say, an arbitrary-precision float like those provided by GMP).