It depends on what you mean by “transformed differently”. It could mean that you have the same program logic for every object, e.g. a chain of multiple matrices where only the matrices themselves change. Or you could have entirely different transformation logic, which might additionally be data-dependent. The same goes for other properties like colors etc.
For the first case, where the logic is identical, you don’t need any branching in the shader code because, well, the logic is identical. Just change the uniform values or, better yet, stick them in a uniform buffer object and have the GL point to a different location in the data store with glBindBufferRange(). This also works well with instancing; in that case you’d draw a number of instances with the same set of uniforms before changing the bound buffer range. The same approach applies to other areas such as material definitions, light source properties and so on. I like uniform buffers a lot.
If the logic is supposed to be different, there are actually three ways to change the code path:
[ul]
[li]dynamic branching, which is what you suggested first, but less desirable IMO [/li][li]different shader programs, which work well, but you need to take care at the application level not to have too many program switches (i.e. you need to batch hard) [/li][li]shader subroutines, effectively switching portions of functionality at runtime depending on what logic is needed for the current object [/li][/ul]
I can’t speak to what works best in your particular case. For a small number of objects it’s irrelevant anyway. If you go to hundreds or thousands of objects, it’s a different story.
Technically, dynamic branching is the least favorable approach, because if things go wrong you lose performance, mostly through divergence: invocations that execute in lockstep but take different paths can force the hardware to execute additional branches you’re not even trying to reach. That applies at least to the fragment shader. I’m not certain how much impact dynamic branching has on current HD7000 or GTX600 series GPUs in worst-case scenarios.
With different programs (or separable programs, for that matter) you get what you want, but you have to be careful not to introduce too many program switches, i.e. repeated calls to glUseProgram() or glUseProgramStages(). Each switch alters program state and has to make the executables current for their respective stages, which takes some time and introduces overhead in the application. If you can aggregate multiple objects into a group that uses the same program or pipeline, you can reduce this overhead. Batching is a good idea in general to reduce API overhead. Do it, and do it hard if possible.
To me, dynamically switching functionality at runtime immediately brings to mind virtual functions or function pointers in C/C++. Shader subroutines are technically similar to function pointers: you define multiple subroutines that offer different logic and choose the appropriate one at runtime. Subroutines are basically nothing but functions that a pointer with a specified name points to at a given point in time. In the shader, this “pointer” is declared as a uniform, which can be queried and altered by the application. The drawbacks, at least in principle, are an additional indirection and, of course, a function call, which probably cannot be inlined by the compiler. Depending on the number of shader invocations, I assume this can have a noticeable impact. However, I’m not aware of any actual performance comparisons with real-world code that prove or disprove the value of subroutines in a high-performance context. I’m pretty sure, though, that there aren’t many real applications out there that actually use subroutines anyway, or most of the GL4 stuff, for that matter.
In the end, the only thing that’s gonna give you certainty, at least for the platform you’re developing on, is implementing multiple methods and simply profiling the results. Still, as I said, what works well on your platform is not guaranteed to run as fast on other platforms, and a method that is slow on your platform may well be fast elsewhere. That’s the price we pay for cross-platform graphics programming.
HTH.