I'm writing some number-crunching functions for the CPU and I was wondering how the two functions below compare.
Code :
inline void fa(float& t) { t = t * t * t; }
inline float fb(float t) { return t * t * t; }
Both functions compute the same value, but I wonder which one usually results in faster code (say, when using g++). I've read that it is probably fb, since it takes t by value while fa has to dereference a reference. On the other hand, the reference in fa might be "compiled away" by the compiler once the function is inlined, and the same might be true for the copy in fb. What do you think?
EDIT: I should have tested it myself and I did:
Code :
#include <cstdio>

inline void fa(float& t) { t = t * t * t; }
inline float fb(float t) { return t * t * t; }

int main(int argc, char* argv[])
{
    float a(2);
    fa(a);
    float b(fb(2));
    std::printf("%f %f", a, b);
}
Code :g++ tmp.cpp -g -O3 -march=native -o tmp
This produced the following code:
Code :
Dump of assembler code for function main(int, char**):
   0x0000000000400650 <+0>:     sub    $0x8,%rsp
   0x0000000000400654 <+4>:     mov    $0x40076c,%esi
   0x0000000000400659 <+9>:     movsd  0x117(%rip),%xmm0        # 0x400778
   0x0000000000400661 <+17>:    mov    $0x1,%edi
   0x0000000000400666 <+22>:    movapd %xmm0,%xmm1
   0x000000000040066a <+26>:    mov    $0x2,%eax
   0x000000000040066f <+31>:    callq  0x400528 <__printf_chk@plt>
   0x0000000000400674 <+36>:    xor    %eax,%eax
   0x0000000000400676 <+38>:    add    $0x8,%rsp
   0x000000000040067a <+42>:    retq
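It seems the compiler evaluated t^3 at compile time: the result is loaded as a constant (movsd), copied into the second argument register (movapd) and passed straight to printf, so neither function is actually called and both produce the same code. I find the syntax of fb clearer, though, so I think it is the better choice.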
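Since both inputs above are literals, the test mostly shows constant folding. A minimal sketch of a variant that feeds both functions a value the compiler cannot know in advance might look like this (the Cubes struct and cube_both helper are names I made up for illustration):

```cpp
#include <cstdio>
#include <cstdlib>

inline void fa(float& t) { t = t * t * t; }
inline float fb(float t) { return t * t * t; }

// Hypothetical helper type: holds the result of each variant.
struct Cubes {
    float via_ref;    // result of fa (in-place, by reference)
    float via_value;  // result of fb (by value, returned)
};

// Hypothetical helper: cube x with both variants.
// The idea is to call this with a value read at run time (for example
// parsed from argv with std::strtof), so the multiplications cannot be
// folded into a constant at compile time.
Cubes cube_both(float x)
{
    float a = x;
    fa(a);            // modifies a in place
    float b = fb(x);  // returns a new value
    return {a, b};
}
```

Calling cube_both(std::strtof(argv[1], nullptr)) from main and disassembling the result, as above, should then show the actual multiply instructions for each variant instead of a preloaded constant.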
I wonder what would happen if we were dealing with a custom floating-point type (say, an arbitrary-precision float like those provided by GMP).
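To get a feeling for that case without pulling in GMP, here is a rough sketch with a toy type. BigFloat and g_copies are hypothetical stand-ins I invented for illustration, not the GMP API. The type owns heap storage and counts how often it is copied, which is where I would expect by-value and by-reference to start to differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical global counter of BigFloat copies (illustration only).
static std::size_t g_copies = 0;

// Toy stand-in for an arbitrary-precision float. NOT the GMP API.
// It owns heap storage, so a copy is not free.
struct BigFloat {
    std::vector<double> limbs;  // pretend these are the digits

    explicit BigFloat(double v) : limbs(1, v) {}
    BigFloat(const BigFloat& other) : limbs(other.limbs) { ++g_copies; }
    BigFloat(BigFloat&&) noexcept = default;  // moves are not counted
    BigFloat& operator=(const BigFloat& other)
    {
        limbs = other.limbs;
        ++g_copies;
        return *this;
    }
    BigFloat& operator=(BigFloat&&) noexcept = default;

    BigFloat& operator*=(const BigFloat& rhs)
    {
        limbs[0] *= rhs.limbs[0];
        return *this;
    }
    double value() const { return limbs[0]; }
};

// In-place variant, like fa: t is overwritten with t^3.
inline void fa(BigFloat& t)
{
    const BigFloat original = t;  // copy: t itself is about to change
    t *= original;                // t^2
    t *= original;                // t^3
}

// By-value variant, like fb: the caller's object is left untouched.
inline BigFloat fb(BigFloat t)    // copy when the caller passes an lvalue
{
    const BigFloat original = t;  // second copy
    t *= original;                // t^2
    t *= original;                // t^3
    return t;                     // a returned parameter is moved, not copied
}
```

In this toy, cubing a named variable costs one copy with fa and two with fb, although fa destroys its input, so a caller who still needs the original has to make a copy first. How a real GMP type behaves would depend on its own copy and move semantics, which I have not checked here.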



