Performance questions

Hi there,

just a few questions related to GLSL performance on modern graphics cards:

  1. Are branches still evil (no branch prediction)?

  2. Are loops still slow when using a uniform as max iteration value and not a constant (because they are not unrolled)?

  3. Is there a performance difference when doing calculations on variables marked as in/out compared to doing the same calculations on temporary variables? Or is there no technical difference (memory location, write/read access performance)? I would assume the latter.

  4. What about function calls? Will the code in the function just be inlined or is there a real code jump including pushing some variables on a stack etc?

Thanks!

Are branches still evil (no branch prediction)?

Branches weren’t “evil” due to lack of branch prediction. Branches were not “evil” at all.

The performance issue with branches was due primarily to hardware that didn’t actually have branching (i.e., all pre-HD Radeons). Such hardware had to execute both halves of the branch.
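To illustrate what "both halves" means, here is a minimal sketch in Python (a toy model, not GLSL, and not a description of how any particular driver compiles shaders). Both sides of the branch are always evaluated and the condition only selects which result is kept. The function names `bright_path`, `dark_path` and `flattened_branch` are hypothetical.

```python
def bright_path(x):
    # Hypothetical "then" side of the branch.
    return x * 2.0


def dark_path(x):
    # Hypothetical "else" side of the branch.
    return x * 0.5


def flattened_branch(cond, x):
    # Both sides are always evaluated, whatever cond is;
    # the condition only selects which result is kept.
    a = bright_path(x)
    b = dark_path(x)
    return a if cond else b
```

In this model the cost is always the sum of both sides, which is why a large branch was expensive on hardware of that kind.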

In any case, the main problem with branches is coherency. Branching is fine if every invocation takes the same branch. But if each shader invocation arbitrarily takes one branch or another, you can end up with significantly lower utilization of the available shader resources.
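The coherency point can be sketched with a toy cost model in Python. It assumes that invocations run in lock-step groups and that a side of the branch is issued for the whole group as soon as one invocation needs it. That is an assumption about the hardware, not something stated above, and the group size of 8 and the costs of 10 are arbitrary illustration values.

```python
def group_cost(takes_then, then_cost, else_cost):
    """Instruction slots issued for one lock-step group (toy model).

    takes_then holds one bool per invocation in the group. A side of the
    branch is issued once if at least one invocation needs it.
    """
    cost = 0
    if any(takes_then):
        cost += then_cost
    if not all(takes_then):
        cost += else_cost
    return cost


def utilization(takes_then, then_cost, else_cost):
    # Useful work divided by issued work across all lanes.
    # Assumes a non-empty group and non-zero costs.
    useful = sum(then_cost if t else else_cost for t in takes_then)
    issued = group_cost(takes_then, then_cost, else_cost) * len(takes_then)
    return useful / issued


coherent = [True] * 8          # every invocation takes the same side
divergent = [True, False] * 4  # invocations disagree within the group

print(group_cost(coherent, 10, 10), utilization(coherent, 10, 10))
print(group_cost(divergent, 10, 10), utilization(divergent, 10, 10))
```

Under these assumptions the divergent group issues both sides, so half of the issued work is wasted, while the coherent group pays for only one side.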

And it also depends on how much of a branch it is. Is it a ?: kind of thing? Are the two branches just computing the same variable in different ways? Or are they really running different code? Is one of the branches just discarding the fragment?

All of these things can, and still do, have an impact on performance.

Are loops still slow when using a uniform as max iteration value and not a constant (because they are not unrolled)?

They’re certainly not going to be unrolled now. I don’t recall uniform-based loops being slow. They were either “possible” or “not possible”, depending on the hardware.

They’re possible now.

Is there a performance difference when doing calculations on variables marked as in/out compared to doing the same calculations on temporary variables?

Profile it and find out. It will be different for different compilers, hardware, and hardware generations. There’s no set guideline.

Will the code in the function just be inlined or is there a real code jump including pushing some variables on a stack etc?

GLSL does not have a stack (hence no recursion). The compiler may inline functions. It may not. That’s up to the compiler’s internal algorithms.

You can assume that the compiler will do what it feels is best, given what you asked it to do.

I think data access, such as sampler reads and reading interpolated data from the vertex and geometry stages, is more likely to hurt your shader performance. You can hide a lot of computation behind that latency.

Thanks a bunch Alfonse! :)
