Efficiency of separate shaders

Is it efficient to treat vertex and fragment shaders as separate?

If I have 2 shaders in which the vertex part is the same and the fragment part is different, is it better to share the vertex shader, or does it not matter what I do?

Glslang couples vertex and fragment compiled shaders together into a single program object. Technically, at that point, you can use those compiled shaders in other programs, but you can’t quickly mix-and-match them on the fly like you could with ARB_vp/fp. There is a fairly hefty link process involved in creating a usable program.
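For reference, a minimal sketch of the link-centric workflow being described, using the GL 2.0-style entry points (the ARB_shader_objects versions carry ARB suffixes); the GLSL sources here are trivial placeholders:

    // Two tiny stages; note that the varying declaration must match across them.
    const GLchar* vertexSource =
        "varying vec3 lightVector;\n"
        "void main() {\n"
        "    lightVector = gl_NormalMatrix * gl_Normal;\n"
        "    gl_Position = ftransform();\n"
        "}\n";
    const GLchar* fragmentSource =
        "varying vec3 lightVector;\n"   // must match the vertex shader's declaration
        "void main() {\n"
        "    gl_FragColor = vec4(normalize(lightVector), 1.0);\n"
        "}\n";

    // Compile each stage once...
    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, &vertexSource, NULL);
    glCompileShader(vs);

    GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fs, 1, &fragmentSource, NULL);
    glCompileShader(fs);

    // ...but every vertex/fragment pairing must go through a link step
    // before it can be used.
    GLuint prog = glCreateProgram();
    glAttachShader(prog, vs);
    glAttachShader(prog, fs);
    glLinkProgram(prog);

    GLint linked = 0;
    glGetProgramiv(prog, GL_LINK_STATUS, &linked);
    if (linked)
        glUseProgram(prog);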

Originally posted by Korval:
Glslang couples vertex and fragment compiled shaders together into a single program object. Technically, at that point, you can use those compiled shaders in other programs, but you can’t quickly mix-and-match them on the fly like you could with ARB_vp/fp. There is a fairly hefty link process involved in creating a usable program.
This is one of the most pathetic limitations of GLSL, IMO. What’s the use if I can’t decouple shaders and mix and match them? Is this the case with Cg and HLSL too?

I would not do it on the fly because I have no need to.

If someone is targeting older hw and needs to update the shader because a new light source popped into the region, then I guess he is screwed.

I was wondering if sharing shaders has advantages like not wasting as many resources, faster switches between program X and program Y, and…

If the hw we have now requires updating both then it wouldn’t make sense for me to change my code to add the sharing flexibility.

This is one of the most pathetic limitations of GLSL, IMO. What’s the use if I can’t decouple shaders and mix and match them? Is this the case with Cg and HLSL too?
You can mix (share) your shaders, but if you are using varying variables, for example, both shaders must match. That’s another issue.

HLSL and Cg are built on top of the assembly shaders, so they can mix and match and all that.

Yeah, at least in HLSL you use semantics to identify which input/output registers variables are bound to in the pixel/vertex shaders, so there is no required link process.

Thus you can share shaders as long as the output of the vertex shader corresponds to the input of the pixel shader.

Originally posted by Fastian:
This is one of the most pathetic limitations of GLSL, IMO. What’s the use if I can’t decouple shaders and mix and match them? Is this the case with Cg and HLSL too?
In 99% of the cases, a vertex shader and a fragment shader go together. So I think this makes a lot of sense. Since there’s nothing preventing you from linking the same shader into many program objects, there’s no real loss of functionality either. The only thing is that the storage space needed on the driver side may be larger, but that’s hardly ever going to be a problem.

But there are plenty of things on the plus side. The most important thing is that it helps a lot in debugging. If the shaders don’t match, you’ll know it. The driver can also optimize away vertex shader outputs that are never read in the fragment shader, plus give you a warning to hint at it, and give errors when a fragment shader reads from a vertex shader output that’s never written. Also, the driver gets additional flexibility in selecting which interpolators to write to, and it can freely pack outputs into interpolators any way it wants. A vec3 and a float can go together, without you explicitly declaring it as a vec4, avoiding ugly code like,

varying vec4 lightVector_plus_scale;

as opposed to,

varying vec3 lightVector;
varying float scale;

Also, you don’t need as much run-time verification. The driver knows that linked shaders will match.
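To illustrate the point about linking the same shader into many program objects, a minimal sketch in the same GL 2.0-style API; sharedVS, fsA and fsB are assumed to be already-compiled shader objects:

    // The shared vertex shader object is compiled once, elsewhere.
    GLuint progA = glCreateProgram();
    glAttachShader(progA, sharedVS);   // the same vertex shader object...
    glAttachShader(progA, fsA);
    glLinkProgram(progA);

    GLuint progB = glCreateProgram();
    glAttachShader(progB, sharedVS);   // ...attached to a second program
    glAttachShader(progB, fsB);
    glLinkProgram(progB);
    // The compile work for sharedVS is not repeated, but each pairing
    // still pays for its own link step.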

In 99% of the cases, a vertex shader and a fragment shader go together. So I think this makes a lot of sense.
Then you’re wrong.

Since there’s nothing preventing you from linking the same shader into many program objects, there’s no real loss of functionality either. The only thing is that the storage space needed on the driver side may be larger, but that’s hardly ever going to be a problem.

Really? Maybe for demos it won’t be a problem, or games that use a single “world-shader”, but for games that use lots of different shaders, and different permutations of shaders, this can be significant.

While it’s easy enough to say that start-up time is irrelevant, in real applications, there are limits. Are you willing to wait 40 seconds for a level to load just because the game has to link dozens of possible shader combinations? And what about the code bloat to manage all these programs? With ARB_vp/fp, I could just have compiled vertex and fragment programs, and put them together whenever I wished. I could pass around two lightweight objects and slap them together at runtime. Now, I have to have some manager that does a lookup into a remapping table that maps from vertex/fragment shaders to linked programs.

It doesn’t take very many shaders before the combinatorial explosion of shaders becomes pretty large. 10 vertex shaders and 5 fragment shaders make 50 total linked programs that I need. Linking isn’t a fast process, because I might be linking these shaders together from multiple vertex and fragment shaders (more than one of each type), so it has to do significant work like the linker of a regular program. Each time I add a new fragment shader, that’s 10 more linkages.
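A sketch of the kind of lookup such a manager ends up doing; this is a minimal illustration only, where linkProgram() is a hypothetical helper that creates a program object, attaches both shaders, and calls glLinkProgram:

    #include <map>
    #include <utility>
    // GLuint and the GL entry points come from the platform's GL headers.

    typedef std::pair<GLuint, GLuint> ShaderPair;   // (vertex shader, fragment shader)
    std::map<ShaderPair, GLuint> programCache;      // pairing -> linked program object

    GLuint getProgram(GLuint vs, GLuint fs)
    {
        ShaderPair key(vs, fs);
        std::map<ShaderPair, GLuint>::iterator it = programCache.find(key);
        if (it != programCache.end())
            return it->second;                      // this pairing was already linked

        GLuint prog = linkProgram(vs, fs);          // hypothetical helper (attach + link)
        programCache[key] = prog;                   // pay the link cost once per pairing
        return prog;
    }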

Shader LODs just became that much more painful to implement.

The linking stuff itself, being able to link multiple compiled shader objects into a single program, is nice. But being forced to link a vertex and fragment program together is very much not. And, if a 3rd type of program opens up, like primitive programming, then we will have to link those together too.

If the shaders don’t match, you’ll know it.
If I want debugging help, I’ll write/use a real shader debugger. Otherwise, I don’t want them sucking my performance down a hole needlessly. But, that’s 3DLabs for you.

The driver can also optimize away vertex shader outputs that are never read in the fragment shader, plus give you a warning to hint at it
I can do that too. I’m a grown programmer.

and give errors when a fragment shader reads from a vertex shader output that’s never written.
Same as for the other one.

Also, the driver gets additional flexibility in selecting which interpolators to write to, and it can freely pack outputs into interpolators any way it wants. A vec3 and a float can go together, without you explicitly declaring it as a vec4, avoiding ugly code like,
Performance > cleanliness of code.

Also, you don’t need as much run-time verification.
You don’t need any runtime verification. Any invalid shader settings will create undefined results.

Originally posted by Korval:
Then you’re wrong.
Well, enlighten me then about how I and the ARB are so non-typical. I have yet to see a situation where the shaders didn’t go together one-to-one. Even back when I did ARB_vp/ARB_fp, it never happened even once. In the few cases where I have read code that shared a shader, it has typically been bad coding style, like reusing a vertex shader for many fragment shaders, with some of them not even reading everything that the vertex shader outputs.

Originally posted by Korval:
Now, I have to have some manager that does a lookup into a remapping table that maps from vertex/fragment shaders to linked programs.
Or you can just do it right from the beginning, coding for the target shader paradigm.

Originally posted by Korval:
It doesn’t take very many shaders before the combinatorial explosion of shaders becomes pretty large. 10 vertex shaders and 5 fragment shaders make 50 total linked programs that I need. Linking isn’t a fast process, because I might be linking these shaders together from multiple vertex and fragment shaders (more than one of each type), so it has to do significant work like the linker of a regular program. Each time I add a new fragment shader, that’s 10 more linkages.
LOL. I’m sure ALL your vertex shaders are used with ALL your fragment shaders. :eek:
If all your shaders even match, then you’ve got a serious problem with your shader writing style.

Originally posted by Korval:
If I want debugging help, I’ll write/use a real shader debugger. Otherwise, I don’t want them sucking my performance down a hole needlessly. But, that’s 3DLabs for you.

I can do that too. I’m a grown programmer.
Well, I assume then, since you’re such a grown-up programmer, that you compile everything to release and never use debug. You probably never left an unused variable in your code either. And never mismatched datatypes or data sizes.

The thing is, this is not only a debug help, it’s also a performance booster. When you get the full context, you have much better abilities to optimize.

Originally posted by Korval:
Performance > cleanliness of code.
The point is that you get both.

I have yet to see a situation where the shaders didn’t go together one-to-one.
That’s because you write demos, not games. You don’t have to care about efficiency or other such things; your job is to create a visual effect.

But, if you want an example, I can give you a theoretical one off the top of my head: low-LOD shaders.

At low LODs, you’re not going to do bumpmapping. Your per-fragment operation looks very much like what a GeForce 1 or 2 can do: a small set of basic color combining operations and texture accesses.

At low LODs, we only have one texture, so that’s not a problem. However, we’d still like to maintain some semblance of high quality even at low LOD, so we do more work per vertex.

Let’s say that I have a variety of effects to apply to low LOD shaders. Wood, metal, skin, cloth, and hair. That’s 5 vertex shaders.

But it’s only 2 fragment shaders. The outputs of the vertex programs are really simple; basically, what standard OpenGL 1.2 would use. A primary color, a secondary color, and a texture coordinate. One fragment shader doesn’t use the specular color, the other does.

Well, specular may sometimes be important, and sometimes, it may not. I may decide that on some metals, the specular is too small to matter. The same may go for any of the others.

With ARB_vp/fp, I need 7 programs (5 vertex + 2 fragment). With glslang, I need 10 (each of the 5 vertex shaders linked against each of the 2 fragment shaders). I add another vertex shader, I need 2 more programs. Do you see where this is going?

Indeed, something similar happens at the high end for shaders, only backwards. High-quality rendering tends to move most of the logic into the fragment shader, reducing per-vertex operations to little more than passing data along and doing transformations: GL 1.2-style stuff, with programmable position transforms (re: skinning). So you have a few basic vertex programs, for the various numbers of texture coordinate sets you might use, but any semantic meaning is only given to them by which fragment program you use.

Vertex and fragment programs are as tightly coupled as the user wants them to be; it has nothing to do with any specific nature of them.

Or you can just do it right from the beginning, coding for the target shader paradigm.
Please get out of the Ivory Tower. Shader code was written before the advent of glslang, and it is perfectly reasonable to now want the features of glslang, but without the massive engine re-write.

Well, I assume then, since you’re such a grown-up programmer, that you compile everything to release and never use debug. You probably never left an unused variable in your code either. And never mismatched datatypes or data sizes.
This isn’t a generic question on debugging; this is a specific case where it is pretty easy to prevent such things from happening, and easy to catch them when they do without API assistance.

The thing is, this is not only a debug help, it’s also a performance booster. When you get the full context, you have much better abilities to optimize.
A theoretical performance boost. There are no practical situations where this is valuable on modern hardware, with the exception of culling vertex shader code that generates a parameter that the fragment shader doesn’t use. But, really, if I do that, I have decided that I’d rather pay the performance cost per-vertex and let it be a constant performance penalty than pay it as a sudden lurch in the framerate when I link a shader at runtime, or as a 40 second “load” time as I go about linking hundreds of program combinations.

Originally posted by Korval:
The outputs of the vertex programs are really simple; basically, what standard OpenGL 1.2 would use.
And therein lies the problem as I see it. Fine, in some limited, simple cases sharing shaders makes sense. In the general case, though, shaders don’t crossbreed very well.

Originally posted by Korval:
Indeed, something similar happens at the high end for shaders, only backwards. High-quality rendering tends to move most of the logic into the fragment shader, reducing per-vertex operations to little more than passing data along and doing transformations: GL 1.2-style stuff, with programmable position transforms (re: skinning). So you have a few basic vertex programs, for the various numbers of texture coordinate sets you might use, but any semantic meaning is only given to them by which fragment program you use.
Fine. But with both your examples, you’re still not close to the massive permutation explosion you’re talking about. In fact, there are two likely situations, the ones you mentioned:

Fragment shader A,B,C,D used together with vertex shader X.
Or vertex shader A,B,C,D used together with fragment shader X.

Both these cases generate at worst twice the number of shaders. Even if, contrary to expectations, things hit the worst case, it’s still well worth it for the benefits it brings, IMHO.

And still, I consider this to be “today’s problem” and nothing that we’ll see down the road. The more advanced the shaders get, the less likely you are to be able to crossbreed shaders in any useful way.

Oh, and when did I last hear the “permutation explosion” argument? Yup, when programmable shaders were introduced. Wanna go back to the fixed-function / register combiner configuration model?

Originally posted by Korval:
Please get out of the Ivory Tower. Shader code was written before the advent of glslang, and it is perfectly reasonable to now want the features of glslang, but without the massive engine re-write.
New features sometimes force engine rewrites. Nothing new under the sun. It doesn’t mean it’s a fundamentally flawed design; it just means that existing engines were never written against the new spec. It doesn’t mean the old engine was badly designed, or that the new spec is badly designed.

This is not the first paradigm shift when it comes to shaders. Engines designed around GL_NV_register_combiners didn’t necessarily have an easy time going over to the program-object-based model in GL_ATI_fragment_shader and GL_ARB_fragment_program. That doesn’t mean the program-based model didn’t make sense, or was a bad design decision. Quite the contrary. I can imagine quite a few developers complaining at first since they could no longer just toggle a single operation or input in the middle of a register combiner setup, and instead had to upload a totally different program object. But today I’d guess the people preferring the old register combiner model are a dying breed.

I expect the same thing to happen with GLSL shaders, both in the lost flexibility of going high level and in the lost flexibility of going to shader linking. But down the road, people will likely appreciate the benefits of this model more than any temporary gains in current engines from using the old paradigm.

Originally posted by Korval:
This isn’t a generic question on debugging; this is a specific case where it is pretty easy to prevent such things from happening, and easy to catch them when they do without API assistance.
Not in my experience. And I don’t consider myself sloppy. I’m happy for all the aid I get from the compiler. In fact, after playing quite a lot with HLSL in DirectX, I really appreciate any help I can get. Hunting errors caused by mismatching types or semantics can easily consume hours.

Originally posted by Korval:
But, really, if I do that, I have decided that I’d rather pay the performance cost per-vertex and let it be a constant performance penalty than pay it as a sudden lurch in the framerate when I link a shader at runtime, or as a 40 second “load” time as I go about linking hundreds of program combinations.
Linking things in your rendering loop is of course a stupid thing to do. If the load time ever becomes a problem, that’s easily solved with an intermediate language extension where you can just read back a compiled shader in a binary format and cache it to disc. That should solve any load-time problems.

Fine, in some limited, simple cases sharing shaders makes sense. In the general case, though, shaders don’t crossbreed very well.
Whose “general case”? People developing shaders that can only be run on NV40s? The “general case” for people developing shaders for common systems will be running low-LOD shaders most of the time, so that they can afford the high-LOD ones. If I’ve got a crowd of 50 people, I’m not running a 40-instruction fragment program on each one.

Once again, we’re talking about performance applications, not demos.

But with both your examples, you’re still not close to the massive permutation explosion you’re talking about.
That all depends. While the number of incoming texture coordinates will definitely force a requirement on which fragment shaders one can use with which vertex shaders, a significant difference in the transformation of positions can cause a jump in the number of linked programs.

Let’s say that you come up with a new method of skinning, that uses splines instead of bones or something. Or you want to apply some kind of mesh deformation technology instead of the regular skinning code. In either case, you’re only doing it for the highest LOD’s.

If you had 6 separate fragment shaders for a skinned mesh (different kinds of clothing/skin; we’re talking about high quality here, and technically 6 is a bit low for high-quality character rendering), then adding either of the two methods will double the number of programs.

It all depends on how funky your position/normal transformations are.

But today I’d guess the people preferring the old register combiner model are a dying breed.
That’s the thing. I wanted an object-based model from day 1 of NV_RC. I was very happy when the object based extensions came out.

Also, the analogy itself is invalid.

Anyone who thought about programmable shaders for any real length of time can see the inherent limitations of the register_combiner method, both in terms of a lack of object facilities and a lack of flexibility for the driver/compiler. The minor “nicety” of being able to dramatically change a shader by altering a single register combiner pales in comparison to the burden of a driver/compiler on more advanced underlying hardware.

Nothing similar exists here. No driver developers are complaining about not being able to implement more advanced shaders because of a lack of vertex/fragment program linking. There’s some minor ability to optimize, but it rarely comes up, and is usually due to dubious practices in writing your shader. The best argument for vertex/fragment linking is for debugging.

Note that if another programmable realm opens up, like a programmable texture unit or a programmable primitive processor, the number of valid permutations increases, not decreases. So this “feature” isn’t nearly as future-proof as you make it out to be.

The only way that such linking is useful in the future is if all programs are literally sharing the same functional units. This is not something I would suggest for any kind of graphics hardware, as it kind of defeats the purpose of dedicated graphics hardware; such a chip is just a bunch of small CPUs on a PCIe slot.

If the load time ever becomes a problem, that’s easily solved with an intermediate language extension where you can just read back a compiled shader in a binary format and cache it to disc.
Consider this: if a system is becoming so onerous that you have to violate one of its founding principles (that the user doesn’t need access to the compiled code) in order to allow it to continue to function, perhaps there’s something fundamentally wrong with the system. Basically, if it comes to pass that such an ARB extension ever exists, glslang will have failed in its design. ARB_vp/fp would never need such a system; only glslang would.

It’s like patching the holes in a rusty ship; it doesn’t mean the ship isn’t rusty.

Originally posted by Korval:
Whose “general case”? People developing shaders that can only be run on NV40s? The “general case” for people developing shaders for common systems will be running low-LOD shaders most of the time, so that they can afford the high-LOD ones. If I’ve got a crowd of 50 people, I’m not running a 40-instruction fragment program on each one.

Once again, we’re talking about performance applications, not demos.
No, we’re talking about the future. GLSL isn’t about low LOD shaders on today’s or even last generation hardware. If you’re not willing to run 40 instruction fragment shaders, you may just as well keep to ARB_fp. The benefits of high level shaders won’t do much for you anyway.

Originally posted by Korval:
That’s the thing. I wanted an object-based model from day 1 of NV_RC. I was very happy when the object based extensions came out.
That’s you. How about everyone else? There was resistance from the fixed-function people, including myself to some extent, until I learned the benefits. There was resistance against using a text interface too; we remember the ugly GL_EXT_vertex_shader that just plainly sucked. Today a text interface is the only way to go. I’m not surprised there is resistance against program linking today, but I expect it to go away when people see the benefits.

Originally posted by Korval:
Also, the analogy itself is invalid.

Anyone who thought about programmable shaders for any real length of time can see the inherent limitations of the register_combiner method, both in terms of a lack of object facilities and a lack of flexibility for the driver/compiler. The minor “nicety” of being able to dramatically change a shader by altering a single register combiner pales in comparison to the burden of a driver/compiler on more advanced underlying hardware.
No, it’s a perfectly valid analogy. And the minor nicety of mixing shaders is comparable to the minor nicety of altering opcodes in the middle of a shader.

Originally posted by Korval:
Nothing similar exists here. No driver developers are complaining about not being able to implement more advanced shaders because of a lack of vertex/fragment program linking.
Just like no one complained about how the register combiner model couldn’t be used for more advanced shaders. Because it could. The program object model is rather more limited, but so much more convenient. Same with shader linking.

Originally posted by Korval:
Note that if another programmable realm opens up, like a programmable texture unit or a programmable primitive processor, the number of valid permutations increases, not decreases. So this “feature” isn’t nearly as future-proof as you make it out to be.
If it makes sense, it could be linked too. Or if it doesn’t, it doesn’t have to. So it’s a non-argument.

Originally posted by Korval:
Consider this: if a system is becoming so onerous that you have to violate one of its founding principles (that the user doesn’t need access to the compiled code) in order to allow it to continue to function, perhaps there’s something fundamentally wrong with the system. Basically, if it comes to pass that such an ARB extension ever exists, glslang will have failed in its design. ARB_vp/fp would never need such a system; only glslang would.

It’s like patching the holes in a rusty ship; it doesn’t mean the ship isn’t rusty.
I don’t expect this to ever happen for any normal application. And if it did happen, it’s not going to be because of linking, but rather because compile times for high-level shaders are higher than for low-level ones. If you’re ever in a situation where compile time is a major bottleneck, you should ask yourself if you’re really doing things in the best way.

Further, since when is caching not a valid way of doing things? This is analogous to being able to upload precompressed textures rather than letting the driver compress them.

No, we’re talking about the future. GLSL isn’t about low LOD shaders on today’s or even last generation hardware.
Then glslang is useless, and should be shelved until it becomes useful. Either the tool can be used on problems, or it can’t. A tool for a problem that doesn’t exist yet isn’t a tool; it’s a paperweight.

And the minor nicety of mixing shaders is comparable to the minor nicety of altering opcodes in the middle of a shader.
The big problem with the analogy is that, if we stayed with NV_RC-style shaders, we would lose all kinds of both shader functionality and performance. If we don’t allow program linking, we lose a slight bit of debuggability. There is no corresponding significant loss from not allowing linking.

And here’s the thing: the spec didn’t have to force linking on us. It could have made vertex/fragment program linking purely optional, by allowing a vertex-only program to be resident with a fragment-only program.
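For illustration only, a sketch of roughly what such an optional-linking model could look like. It borrows the entry points of the much later GL_ARB_separate_shader_objects extension, which did not exist when this was written; the *Source variables are assumed GLSL strings:

    // Each stage becomes its own single-stage program object.
    GLuint vsProg  = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vertexSource);
    GLuint fsProgA = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fragmentSourceA);
    GLuint fsProgB = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fragmentSourceB);

    // A pipeline object mixes and matches stages with no per-combination link step.
    GLuint pipeline = 0;
    glGenProgramPipelines(1, &pipeline);
    glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT,   vsProg);
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fsProgA);
    glBindProgramPipeline(pipeline);

    // Later, swap just the fragment stage on the fly.
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fsProgB);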

Further, since when is caching not a valid way of doing things? This is analogous to being able to upload precompressed textures rather than letting the driver compress them.
This analogy doesn’t work either. The precompressed format is specified explicitly, and will not change. The format of a compiled shader, however, is implementation specific. Basically, you ask the driver for a block of bits, and you have to reupload those bits exactly as they were saved.

If the driver changes significantly, or if the user installs a new, more powerful, card behind the driver’s back, then there is a problem. Either the spec allows for this kind of program loading to fail, or driver developers have to write porting code to port written programs from older driver code/hardware to newer code/hardware. As if writing an OpenGL driver weren’t hard enough, now they have to support legacy shaders from virtually any driver revision.

This analogy doesn’t work either. The precompressed format is specified explicitly, and will not change. The format of a compiled shader, however, is implementation specific. Basically, you ask the driver for a block of bits, and you have to reupload those bits exactly as they were saved.
A glGet* function was on the table, and from what I remember, it was noted that it might be a good idea to offer it.

What is required to make this materialize is to have a virtual machine-code version. The driver would process it during upload, and I’m sure it would be 100 times more efficient than compiling text-based shaders.

If this feature was made available, I would not object.

Originally posted by Korval:
Then glslang is useless, and should be shelved until it becomes useful. Either the tool can be used on problems, or it can’t. A tool for a problem that doesn’t exist yet isn’t a tool; it’s a paperweight.
Then I guess OpenGL was useless too when it was introduced. 95% of the features were never supported by the hardware anyway.

Aiming for the future is a valid way to do things.

Originally posted by Korval:
The big problem with the analogy is that, if we stayed with NV_RC-style shaders, we would lose all kinds of both shader functionality and performance.
Such as …?

Originally posted by Korval:
This analogy doesn’t work either. The precompressed format is specified explicitly, and will not change. The format of a compiled shader, however, is implementation specific. Basically, you ask the driver for a block of bits, and you have to reupload those bits exactly as they were saved.

If the driver changes significantly, or if the user installs a new, more powerful, card behind the driver’s back, then there is a problem. Either the spec allows for this kind of program loading to fail, or driver developers have to write porting code to port written programs from older driver code/hardware to newer code/hardware. As if writing an OpenGL driver weren’t hard enough, now they have to support legacy shaders from virtually any driver revision.
First of all, you can use a generic compressed format, then query what format the driver chose, read back the compressed texture, and store it to disk, all without knowing anything about any specific format. So it’s perfectly analogous.

With a shader you could do the same thing. Just read back a bunch of bits and bytes, not knowing what they represent for the driver. Then reuse it at a later time. Of course the driver should be able to fail if it doesn’t support that particular format anymore, just like it could do on compressed textures.
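A sketch of how such a readback-and-cache path could look; the entry points shown are the ones the much later GL_ARB_get_program_binary extension ended up providing, prog is an already-linked program, and saveCachedBinary/compileAndLinkFromSource are hypothetical helpers:

    #include <vector>

    // Saving: after a successful link, read back the opaque, driver-specific blob.
    GLint length = 0;
    glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &length);   // assume length > 0 here
    std::vector<char> blob(length);
    GLenum binaryFormat = 0;
    glGetProgramBinary(prog, length, NULL, &binaryFormat, &blob[0]);
    saveCachedBinary("shader.bin", binaryFormat, blob);        // hypothetical helper

    // Loading: hand the blob back, but be prepared for the driver to reject it
    // (new driver, different hardware) and fall back to compiling from source.
    GLuint cached = glCreateProgram();
    glProgramBinary(cached, binaryFormat, &blob[0], (GLsizei)blob.size());
    GLint ok = 0;
    glGetProgramiv(cached, GL_LINK_STATUS, &ok);
    if (!ok)
        cached = compileAndLinkFromSource();                   // hypothetical fallback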

Then I guess OpenGL was useless too when it was introduced. 95% of the features were never supported by the hardware anyway.
And those “features” were dead-weight. Did the accumulation buffer support matter at all before there was hardware support? No; it went unused because everyone knew not to rely on it.

Aiming for the future is a valid way to do things.
Sure. If the future turns out how you think it will. Otherwise, you get modern OpenGL: 4 APIs for everything, but only 1 of them is what you should use.

Such as …?
After thinking about it, it doesn’t make things functionally impossible, but it makes using the system, from both the user-side and the driver-side, harder. Sure, you can screw in a screw with your fingernail too, but why would you want to? It is very clearly not the correct API for anything fairly complicated; that’s why nVidia created NV_Parse.

Of course the driver should be able to fail if it doesn’t support that particular format anymore, just like it could do on compressed textures.
Oh, yeah. As a developer, I love to hear things like, “Oh, this may not work if we decide to do something in our drivers.” :rolleyes:

What something like that means is that you either never use the feature (at which point, it may as well not exist), or you have a very specific understanding that the formats in question will not change often. S3TC isn’t going anywhere, and everybody knows it.

Isn’t OpenGL 2.0 Pure somewhat the answer to your problems?
Having a major rewrite of the API for more orthogonality and simplicity, as well as being future-proof.

I’m not yet ready to believe a statement along the lines that the OpenGL 2.0 specification requires a significantly heavier-weight link step than the alternatives do.

Some things to note:

  • If you have multiple compilation units on the same side, e.g. two fragment shaders that have to be linked together, then there is a valid use of a heavy-weight link process to do global optimization between these modules. This time may be well spent if the resulting shader runs faster. It is, of course, an implementation choice to do this.

  • An implementation may have reasons to do such a heavy-weight step at link time, even with a single shader per side, but those reasons may also apply to non-OpenGL 2.0 shaders too.

  • The results of the above heavy-weight steps can be cached if these same shaders are combined with another vertex shader in another program. For example, global optimization applies within a side (fragment or vertex), not across sides. The time does not have to be spent again just because of mixing. It’s possible that current immature implementations don’t do this yet.

  • Perhaps an implementation must do state specific or implementation specific modifications to a shader at link time or use time, and these need the heavy-weight involvement of compilation or optimization. If so, I’m not sure why they would be language specific, the alternatives may need them as well.

I think the specification does allow combining 10 fragment shaders with 5 vertex shaders, without the memory footprint of 50 pairs of executables. This depends on your hardware and the trade-offs being made by your driver/compiler implementation. Fundamentally, I believe this, because the expensive content of, say, a fragment shader binary is basically static with respect to what vertex shader it is linked against.

JohnK

If you have multiple compilation units on the same side, e.g. two fragment shaders that have to be linked together, then there is a valid use of a heavy-weight link process to do global optimization between these modules. This time may be well spent if the resulting shader runs faster. It is, of course, an implementation choice to do this.
Nobody is arguing against linking shaders of the same type together. It’s the forced linking of a vertex shader to a fragment shader that causes the problem.

I think the specification does allow combining 10 fragment shaders with 5 vertex shaders, without the memory footprint of 50 pairs of executables.
If the driver can treat them separately, then why can’t the user?

Originally posted by Korval:
Nobody is arguing against linking shaders of the same type together. It’s the forced linking of a vertex shader to a fragment shader that causes the problem.
Good, I was just exploring “heavy-weight” linking issues, which was the perceived problem I was responding to.

Yes, there is a required link step, which serves several purposes, but it need not be heavy-weight.

If the driver can treat them separately, then why can’t the user?
From an informational/validation/state perspective, they have to be looked at together by a link step. The spec allows this to be light-weight. From a heavy-weight perspective (compilation/optimization/memory footprint) they can be treated separately, and the link step can stay light-weight.

JohnK