Mix and match shaders à la GL_EXT_separate_shader_objects.

GL 3.2 came out yesterday, and in the past few days so did the extension

GL_EXT_separate_shader_objects

which is a way to mix and match shaders. However, it has a few issues:

  1. It uses gl_TexCoord[]; this is mostly a cosmetic issue, but it forces one to use compatibility contexts.
  2. There are issues with setting the interpolation type, i.e. flat, centroid, etc.
  3. There are issues for fragment shaders with respect to multiple render targets (see issue 16 of the GL_EXT_separate_shader_objects spec).

But it is really, really close to being what I want. Below is my suggestion for a slightly tweaked version of it, which I hope addresses its shortcomings and is not so nasty for driver writers to implement:

In GLSL, introduce a way to specify what resource an in or out variable of a shader is bound to (i.e. the linker no longer decides). Let’s be lazy and reuse layout (maybe a pragma would be better, or a new keyword?):


layout(location, N) in vec4 myinput;

would declare that the input variable myinput uses the N’th location. In a vertex shader that specifies the index of the attribute; in a fragment shader it specifies the index of the interpolator (essentially gl_TexCoord[N]). And


layout(location, N) out vec4 myoutput;

would be similar: for a vertex shader it specifies which interpolator (again, essentially gl_TexCoord[N]); for a fragment shader, which buffer to draw to; etc.

By doing it this way, at least from a syntax point of view, it is easy to add interpolation qualifiers for fragment inputs:


layout(location, N) flat in vec4 flatFragInput;

and different types:


layout(location,N) out ivec4 fragIntOutput;


After looking into using the SSO extension, I have to agree with kRogue: something needs to be done to remove the restrictions this extension has.

Adding the location value would help, but you would still need some sort of validation stage to ensure that the vertex and fragment shaders you’ve chosen match (i.e. the vertex shader writes everything the fragment shader is expecting). Whether this is done by the location semantic above or by name doesn’t really matter.

Perhaps a new method is needed:

UseShaderProgramvEXT(uint count, enum *types, uint *programs);

This allows you to set all the shaders in one go, and the GL can check that they are compatible. The GL should be clever enough to know if one of the shaders isn’t being changed (i.e. rebinding the current VS with a different PS), and it’s not too much work for the application to keep track of.

Other than that, the only other idea I can think of is to have some new object representing the interface the VS or PS exports/imports, and to check the shaders up front. E.g.

interface1 = getInterfaceFromProgram(vs);
checkCompatible(ps, interface1);

I haven’t thought this idea through as much, although I prefer the idea of an up-front cost rather than a check every time you bind the shaders.

Regards
elFarto

Because this would incur unnecessary overhead, it should only be performed in a “debug” context, which OpenGL does not have yet. However, it’s a neat idea and a motivation to introduce such a debug context, having properly defined what must be checked.

Maybe the issue of validating interfaces could be solved with something based on C++ inheritance.

Let’s say we have an “interface inheritance diagram” like so:

I_Top <= I_Middle <= I_Bottom

I_Top has [varying vec4, varying float]
I_Middle has I_Top + [varying vec3]
I_Bottom has I_Middle + [varying int]

Let’s assume we only have vertex and pixel shaders to simplify the example.

The idea is that the input interface of a PS would have to be an interface class which is the same as the output interface of the VS or a base class of it. For example, if the VS output interface is I_Middle, the PS input interface could be either I_Middle or I_Top.

This ensures that the inputs of the PS are a subset of the outputs of the VS. The VS can output stuff that isn’t consumed by the PS, but the PS cannot try to read input which isn’t produced by the VS.

This is quite different from anything I have yet seen in OpenGL but it’s food for thought.

Another easy (easier) way is to obviously do as suggested before and have fixed (numbered?) binding points for the shader inputs and outputs.

One of the points of GL_EXT_separate_shader_objects was “rendezvous by resource” rather than by name. In all honesty, the easiest thing to do would be this:

expanding on the layout() idea I posted above: for a vertex shader one can query which locations are written to (and with what types!) after the compile stage; for geometry shaders, both ins and outs could be queried; and likewise ins and outs for fragment shaders (the outs determining which buffers are reasonable to write to).

This way, the application code can detect whether they match. Following this up, one sees that the data describing what the shaders take as input and output is then obvious, and the driver could do the check as well, with almost zero performance cost.

Why “rendezvous by API resource” is bad:

1: You can’t use it in core GL 3.1 or greater. NVIDIA seems to have this belief that core GL doesn’t actually exist; but it won’t exist until they start making it exist. And this means not pushing for things that don’t work with the core.

2: The whole point of a hardware abstraction is to abstract the hardware. If I have to call a vertex shader output a “texture coordinate” when I have no intention of using it as such, that’s not a hardware abstraction. That’s a blatant hack.

Why “rendezvous by name” does not have to be a performance problem:

1: The implicit link step consists of a series of string comparisons. It is very, very easy to turn this into a single string comparison by ordering all of the outputs/inputs in some fixed order, then concatenating their strings. If the strings do not match exactly, then raise an error. All the spec needs to say is that the inputs and outputs have to match exactly. A single string comparison per draw call is not an onerous burden to bear.

2: Assuming that #1 is somehow performance-critical, it can be easily encapsulated into a new object type. A Program Set Object, if you will. Pre-rendering, you bind your set of programs to it and build the object. Like FBOs, you call a function to check “completeness”. Once it is “complete”, you can render with it.

Even NVIDIA is not suggesting that the “mini-link” time of attaching these programs together is anything remotely like the full-link time of a regular Program Object. The only drawback of method 2 is creating these (pure-client) objects ahead of time.

The thing I find disconcerting about this extension is how little effort NVIDIA seems to have put into making it work with GL 3.0. They don’t know how to make it work with generic inputs/outputs, so they just punt on the issue entirely.

Is this how NVIDIA plans to proceed with OpenGL support? Making extensions that work against whatever APIs they don’t happen to like? And daring to call them “EXT” extensions at that.

Perhaps this is why the extension is not in the OpenGL registry, but is instead on NVIDIA’s website.

I hope this extension works something like the old EXT_render_texture extension that NVIDIA proposed when the ARB logjammed on render-to-texture functionality. That extension never made it past experimental, but its concepts were folded into FBO less than 9 months later. I hope that this is a starting point for the ARB and not an ending point.

Ouch.

I was having an e-mail exchange with someone at nVidia; he had a much better and simpler suggestion than my layout above: just use the interface blocks that are in GL 3.2, which handle the in/out between shader stages. It also turns out one can bind an output fragment destination with GL_EXT_separate_shader_objects; it just requires some care.

However, I disagree with you that “rendezvous by API resource” is bad:

1: You can’t use it in core GL 3.1 or greater. NVIDIA seems to have this belief that core GL doesn’t actually exist; but it won’t exist until they start making it exist. And this means not pushing for things that don’t work with the core.

nVidia is made up of lots of people, so saying that nVidia seems to believe something is kind of fishy… for what it is worth, the fellow I had the e-mail exchange with did not seem particularly happy with how the extension turned out either.

2: The whole point of a hardware abstraction is to abstract the hardware. If I have to call a vertex shader output a “texture coordinate” when I have no intention of using it as such, that’s not a hardware abstraction. That’s a blatant hack.

I totally agree that forcing to use gl_TexCoord between shaders is a BAD thing, but rendezvous by resource can take several forms:

  1. use shader blocks, where the order of the elements in the block determines how the GLSL compiler packs it, i.e. analogous to declaring a structure in C.
  2. the layout() deal I suggested above, which gives the application writer control over how the shaders are linked together; this is necessary to allow shaders to be interchangeable. Rendezvous by name, but not within a dedicated type, has a dangling feeling to me. Also, depending on how heavily the shaders are intermixed, naming conventions are not a good thing; for example, if you want to insert a geometry shader between a vertex and fragment shader, having just in/out is going to kill you, but having something like this:

struct fromVertexShader
{
stuff;
};

and vertex shader:

out struct fromVertexShader outStuff;

and geometry shader:
in struct fromVertexShader inRawData;
out struct fromVertexShader outData;

and fragment shader:
in struct fromVertexShader inputData;

(this is not the block as found in GLSL 1.5, though) lets one freely insert stuff without worrying about whether the names line up correctly. It also has an eye to the future if there are additional stages beyond what we have now: vertex-geometry-fragment.

  3. another method the nVidia fellow suggested, à la Cg:
    out type someinterpolator:ATTR0;

The issue of rendezvous by name being a performance killer I actually doubt; after all, how hard is it to make a std::map&lt;std::string, int&gt;? And the check is not per draw call, it is per shader change (which is less frequent!). The issue to me is clarity and the ability to precisely specify how shaders link together: we can control which attribute slots the inputs of a vertex shader are bound to, and we can control which channel a fragment shader’s output writes to, so we should be able to control the in-between stages too. That is rendezvous by resource, just like binding a particular vertex input to a particular attribute and binding a particular frag out to a particular channel.

As for the spec not working in GL 3.x core (non-compatibility) contexts: yes, that smells bad. My thought is that they did not want to add to the GLSL language, so they punted and used gl_TexCoord; by doing that, the extension can also be used in GL 2.x, which is still a big deal for lots of folks… compromise.

Also, considering that on the desktop, nVidia has by far the best GL implementation, I tend to think that nVidia really, really likes GL and if you look at the credits for the specification, you can clearly see that nVidia has a big hand in making them.

  1. use shader blocks, where the order of the elements in the block determines how the GLSL compiler packs it, i.e. analogous to declaring a structure in C.

Forcing a specific order on the user doesn’t make the implementation faster. For any list of strings (i.e. the names of the ins/outs), you can just use the standard less-than ordering to sort them. If two sets of strings are identical (and thus can link together), they will sort into the same order.

Doing this makes the whole test boil down to a single, long string equality test. You can even encode the type with an 8-bit integer that’s just another character to test equals with.

Look at the implementation. If programs on average have 6 in/out pairs, and each name is 8 characters long on average, and you can fit all of the types in one byte, then each name+type is 9 characters. The total list is 54 characters. Since all you want is an equals/unequal test, you don’t have to go character-by-character; simply compare whole 32 or 64-bit integers at a time. That makes 7 comparisons to test if these are equal; 8 if you count adding the size of the respective strings to the beginning.

The real performance issue and potential killer is part of the reasoning behind bindless graphics: cache thrashing. Accessing both objects, and reading the values is almost certainly going to cause a lot of uncached read operations.

Of course any kind of linkage checking will require that. Just not quite as much as 54+ bytes worth for each shader stage.

Rendezvous by name, but not within a dedicated type, has a dangling feeling to me. Also, depending on how heavily the shaders are intermixed, naming conventions are not a good thing; for example, if you want to insert a geometry shader between a vertex and fragment shader, having just in/out is going to kill you, but having something like this:

Rendezvous by name does not prevent one from defining a particular interface “struct”. All it does is give the user the freedom to not have to. EXT_separate_shader_objects is not about compiling programs with a single shader and a single string. You can still use the separate functionality when compiling programs with multiple shaders, or compiling shaders from multiple strings.

It just seems silly to have to change how you write shaders just because you want to use them differently.

  3. another method the nVidia fellow suggested, à la Cg:
    out type someinterpolator:ATTR0;

Right now, you have to set attribute indices on inputs and draw buffer indices on outputs manually, in C++ code. If this were used as a generic solution to that problem, I’d be willing to accept having to enumerate all inputs/outputs manually, and thus rendezvous by resource.

I don’t like it; I would rather that strings were used instead of attribute and draw buffer indices. But I’m willing to go halfway on it.

However, they should specifically be numbers, not strings like “ATTR0”. So it would look like:


in vec3 position : 0;
out vec4 color : 2;

This might pose a problem for compatibility contexts. I’d be willing to accept that such contexts could define specific strings for their fixed-function attributes. Such as:

in vec3 position : vertex

I’m not fond of it, but I don’t use the fixed attributes anymore either, so I don’t have to use it :wink:

I tend to think that nVidia really, really likes GL and if you look at the credits for the specification, you can clearly see that nVidia has a big hand in making them.

Of course they like OpenGL. But they often treat it as their own personal playground for high-performance NVIDIA-only stuff. Vertex array range comes to mind, as does bindless graphics. OpenGL’s greatest strength is being cross-platform.

The main part I do not like about doing this for shaders:

vertex shader:
in type inattribute;
out type outinterpolator;

fragment shader:
in type outinterpolator;

is that it makes inserting a geometry shader between a vertex and fragment shader impossible, since the geometry shader would have to declare both:
in type outinterpolator;
out type outinterpolator;

whereas wrapping it into a block avoids that, and specifying the attribute avoids it entirely. This is my main beef with rendezvous by name. And once you do blocks:

vertex shader:
out
{
type0 outV0;
type1 outV1;
} outblock;

fragment shader
in
{
type0 inV0;
type1 inV1;
} inblock;

geometry shader:
in
{
type0 inV0;
type1 inV1;
} inblock[];

where then outV0 --> inV0, outV1 --> inV1…

so your string deal is not the interpolator names, but the variable types and the order they come in.

Interface blocks already exist in GLSL 1.5. They’re optional, so if you don’t need them, you don’t have to use them. But if you want them, then you can use them.

Interface blocks already exist in GLSL 1.5. They’re optional, so if you don’t need them, you don’t have to use them. But if you want them, then you can use them.

Yes, and now they are already in 1.50 (which came after the separate shader spec, really). What I am leaning towards at this point is to make them required, not optional; this way rendezvous by resource is hidden behind the block syntax, and doing so allows the insertion of shaders between shaders (for now it is really just a matter of having or not having a geometry shader, but who knows what GL4 will bring us from DX11).

I don’t see how interface blocks make rendezvous by resource any more or less hidden. One string of inputs is just a string of inputs. A string of outputs is a string of outputs. You compare them. If they’re equal, then they interface. If not, then they don’t.

Whether you define them in an interface block or not doesn’t matter. The driver knows what the input and outputs are for each shader stage.

And as you point out, interface blocks are basically required to allow shaders to be inserted between shaders. So users already have a reason to use them when this becomes important.

After re-reading the GL_EXT_separate_shader_objects spec, I’ve realized that an important part of the spec is dealing with mis-matched programs. That is, programs that only partially line up, where inputs in one stage aren’t fed by outputs from another, and vice versa.

If the real version of this is to provide equivalent functionality, then it must also provide this behavior. When it was a simple binary “match or can’t use” setup as I originally thought, rendezvous by name was perfectly adequate. The comparison test could be done easily and swiftly.

Once you need to allow mis-matches and the filling in of missing inputs, rendezvous by name doesn’t make as much sense. It can certainly work, but since the mis-matching process requires a lot more tests, string name comparisons will start being a factor in program-binding calls.

If rendezvous by resource is the only way to go, then the user needs to be able to specify those resources in the shader itself. And these resources should double as attribute and drawbuffer indices, where appropriate.

The syntax I suggested above would work fine.