I’m writing a number-crunching function for the CPU and I was wondering how the two functions below would compare:
inline void fa(float& t)
{
    t = t * t * t;
}

inline float fb(float t)
{
    return t * t * t;
}
Both functions do the same thing, but I wonder which one usually results in faster code (say, with g++). I’ve read that it is probably fb, since it takes t from the stack, while fa has to dereference a reference. On the other hand, the compiler might optimize the reference in fa away, but the same might be true for the copy in fb. What do you think?
EDIT: I should have tested it myself and I did:
#include <cstdio>

inline void fa(float& t)
{
    t = t * t * t;
}

inline float fb(float t)
{
    return t * t * t;
}

int main(int argc, char* argv[])
{
    float a(2);
    fa(a);
    float b(fb(2));
    std::printf("%f %f", a, b);
}
g++ tmp.cpp -g -O3 -march=native -o tmp
Produced this code:
Dump of assembler code for function main(int, char**):
0x0000000000400650 <+0>: sub $0x8,%rsp
0x0000000000400654 <+4>: mov $0x40076c,%esi
0x0000000000400659 <+9>: movsd 0x117(%rip),%xmm0 # 0x400778
0x0000000000400661 <+17>: mov $0x1,%edi
0x0000000000400666 <+22>: movapd %xmm0,%xmm1
0x000000000040066a <+26>: mov $0x2,%eax
0x000000000040066f <+31>: callq 0x400528 <__printf_chk@plt>
0x0000000000400674 <+36>: xor %eax,%eax
0x0000000000400676 <+38>: add $0x8,%rsp
0x000000000040067a <+42>: retq
It seems the compiler constant-folded everything: 2³ = 8 was computed at compile time (the movsd loads that precomputed constant, which is then passed to printf for both a and b), so after inlining both functions leave no trace and produce identical code. I consider the syntax of fb clearer, though, and think it is therefore probably the better choice.
I wonder what would happen if we were dealing with a custom floating-point type (say, an arbitrary-precision float like those provided by GMP).