- Register pressure. GPUs usually allocate registers statically, so the number of registers reserved per shader invocation depends on the worst-case path through the shader.
Well, that seems like a quality-of-implementation issue. After all, it's not like the compiler can't see a uniform branch in the code; it's right there. So the compiler ought to be perfectly capable of realizing that if one branch is taken, the other will not be, for every invocation in the draw command. That information ought to be factored into register allocation. Registers are obviously assigned statically, but there are ways to reuse the same registers in different, mutually exclusive branches.
And with more developers using ubershaders, there is every reason for IHVs to take that information into account.
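To make the "same registers in mutually exclusive branches" point concrete, here is a toy model, not how any real driver allocates registers. Each branch arm is described by the sets of branch-local temporaries live at each program point (an assumption for illustration), and the function compares an allocator that lets the arms share physical registers against a naive one that gives every arm its own.

```python
def branch_peak(live_sets):
    """Peak number of simultaneously live temporaries within one branch arm."""
    return max((len(s) for s in live_sets), default=0)


def registers_needed(common_live, branches, share_across_branches=True):
    """Registers reserved for an if/else whose arms are mutually exclusive.

    common_live: values live across the whole if/else (e.g. results used after it).
    branches: one entry per arm, each a list of live-sets of arm-local temporaries.
    This is a toy model for illustration only.
    """
    peaks = [branch_peak(arm) for arm in branches]
    if share_across_branches:
        # Only one arm runs per invocation, so the arms can reuse the same registers;
        # the static reservation is driven by the heaviest single arm.
        local = max(peaks, default=0)
    else:
        # Naive allocation: every arm gets its own disjoint registers.
        local = sum(peaks)
    return len(common_live) + local


# Hypothetical example: a heavy lighting arm and a lighter fallback arm.
branches = [
    [{"n", "l"}, {"n", "l", "h"}],  # peak of 3 live temporaries
    [{"x"}, {"x", "y"}],            # peak of 2 live temporaries
]
print(registers_needed({"color"}, branches))                               # shared arms
print(registers_needed({"color"}, branches, share_across_branches=False))  # disjoint arms
```

Note that even with sharing, the reservation is still set by the worst arm, which is exactly the quoted concern; sharing just keeps the unused arm from adding to it.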
- Unused in/out variables that the linker can’t optimise away, potentially increasing bandwidth (internal and/or external, depending on the GPU architecture and shader stage) and cache requirements.
Errr… I’d want to see some evidence for that.
Remember: what defines the logic for what gets pulled from buffers is VAO state, not shader state. Yes, even on AMD hardware, where vertex pulling happens via shader logic. What they have to do is modify the shader in situ by adding some prefix code to handle the vertex pulling. But that shader prefix is defined by the VAO state (since it has to respect the vertex format). So if an input isn't being fed by the VAO, there's no reason for the vertex pulling logic to pull it.
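A minimal sketch of that idea, assuming a driver that builds the fetch prefix from VAO state. The `VertexAttrib` record and `build_fetch_prologue` function are hypothetical, and the "instructions" are just tuples. The point is that memory fetches are emitted only for attributes the VAO actually enables; a shader input with no enabled attribute reads the current generic attribute value (the `glVertexAttrib*` value in GL), modelled here as a `"constant"` entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VertexAttrib:
    """Hypothetical per-attribute VAO state."""
    location: int
    enabled: bool
    components: int  # e.g. 3 for a vec3 position
    offset: int      # byte offset within one vertex
    stride: int      # bytes between consecutive vertices


def build_fetch_prologue(vao_attribs, shader_input_locations):
    """Return the 'instructions' a driver-generated vertex-pulling prefix might emit."""
    by_location = {a.location: a for a in vao_attribs}
    prologue = []
    for loc in sorted(shader_input_locations):
        attrib = by_location.get(loc)
        if attrib is not None and attrib.enabled:
            # Format comes from the VAO, not the shader: offset, stride, component count.
            prologue.append(("fetch", loc, attrib.offset, attrib.stride, attrib.components))
        else:
            # Not fed by the VAO: no memory traffic, just the current attribute value.
            prologue.append(("constant", loc))
    return prologue


vao = [
    VertexAttrib(0, True, 3, 0, 32),   # position
    VertexAttrib(1, True, 2, 12, 32),  # texcoord
    VertexAttrib(2, False, 4, 0, 0),   # disabled attribute
]
# The ubershader declares inputs 0..3, but only two are fed by this VAO.
for op in build_fetch_prologue(vao, {0, 1, 2, 3}):
    print(op)
```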
- The effect of statically using some shader features on the rest of the pipeline, such as clip distances, discard, early fragment tests, or writing gl_FragDepth.
The goal of ubershaders is not to reduce the number of shaders to 1. It’s to reduce it to a fixed, preferably small, number of shaders, so as to minimize shader construction and state changes. You want to render lots of objects with an ubershader, but that doesn’t mean you don’t have specific ubershader variants.
So an engine might have four actual ubershader variants, each handling a different set of resource-consuming features. Clip distances and depth writing would be such variants, as only very specialized objects generally need those features. These are generally defined by the nature of the object itself.
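One way to picture this, as a sketch only: assume (hypothetically) that the two features named above are the only variant-defining ones, so two bits give four variants. Each object's variant key comes from properties of the object itself, and the draw list is grouped by key so each variant's program is bound once. The `Feature` flags, the object dictionaries and their field names are all made up for illustration.

```python
from collections import defaultdict
from enum import IntFlag


class Feature(IntFlag):
    """Hypothetical variant-defining features; two bits give four variants."""
    NONE = 0
    CLIP_DISTANCES = 1  # writes gl_ClipDistance
    DEPTH_WRITE = 2     # writes gl_FragDepth


def variant_key(obj):
    """Pick the ubershader variant from properties intrinsic to the object."""
    key = Feature.NONE
    if obj.get("uses_clip_planes"):
        key |= Feature.CLIP_DISTANCES
    if obj.get("writes_depth"):
        key |= Feature.DEPTH_WRITE
    return key


def batch_by_variant(objects):
    """Group objects by variant so each variant's program is bound once."""
    batches = defaultdict(list)
    for obj in objects:
        batches[variant_key(obj)].append(obj["name"])
    return sorted(batches.items())


scene = [
    {"name": "rock"},
    {"name": "water", "uses_clip_planes": True},
    {"name": "tree"},
    {"name": "impostor", "writes_depth": True},
]
for key, names in batch_by_variant(scene):
    print(int(key), names)
```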
discard is the one most likely to vary based on arbitrary elements of the object's data, rather than being intrinsic to the object itself. You're more likely to want to use discard for things like alpha testing, which depends on properties of the texture, not the object.
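To illustrate why discard follows the texture rather than the object, here is a small CPU-side emulation of the common GLSL alpha-test idiom `if (texColor.a < cutoff) discard;`. The cutoff of 0.5 and the texel values are assumptions for the example. Two objects can use the same shader, and whether any fragment is actually discarded depends purely on the alpha values in their textures.

```python
def alpha_test(texel_alphas, cutoff=0.5):
    """Per-fragment discard decisions, emulating `if (a < cutoff) discard;`."""
    return [a < cutoff for a in texel_alphas]


def texture_needs_discard(texel_alphas, cutoff=0.5):
    """True if any texel would be discarded; this depends on texture content only."""
    return any(alpha_test(texel_alphas, cutoff))


# Hypothetical texture contents: same object type, same shader, different data.
chain_link_fence = [1.0, 0.0, 0.8, 0.2]
brick_wall = [1.0, 1.0, 1.0, 1.0]

print(alpha_test(chain_link_fence))
print(texture_needs_discard(chain_link_fence), texture_needs_discard(brick_wall))
```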