GLslang: 'promises'

I think it may be useful for shaders to contain additional usage information in the form of application ‘promises’.

I’ll give a brief snippet with a first-draft syntax:

//constants and stuff goes here

promise      //typeless? perhaps, as it should be a keyword
{            //usual block notation to compound multiple things
  alpha_test(greater,0.0);
  //relational expressions of type bool, promised to be true
  0 == (tex_coord0.p);
  1 == (tex_coord0.q , tex_coord1.q);
}

void
main()
{
  <...>
}

This would then signal to the compile step that this shader will only be used with an active alpha test. It would also signal that texture coordinate set 0 will always be of the form (s,t,0,1). To extend the idea, I’ve also added a constraint on texture coordinate set 1, joined by a comma.
This could be extended to full relational expressions, all of which would, by virtue of the ‘promise’, be guaranteed to evaluate to true. I’m not quite sure how to make a nice syntax for that. These should be applicable to uniforms as well.
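
For instance (a purely hypothetical sketch extending the draft syntax above; ‘fog_density’ is a made-up uniform name), a promise on a uniform could look like this:

uniform float fog_density;

promise
{
  //relational expressions on a uniform, promised to hold for every draw
  fog_density >= 0.0;
  fog_density <= 1.0;
}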

If one specifies conflicting promises, or links shader code with conflicting promises, this would have to be detected and reported as an error I think.

The promise concept could be extended ad infinitum, once it’s in place

Now, what’s the use?
Promising the alpha test condition as above would allow hardware that doesn’t natively support ‘discard’, but does support conditional moves to compile and execute corresponding fragment shaders. ‘discard’ could then be implemented as a conditional move to the output’s alpha channel.
If the application’s promise is broken, results would be undefined, of course.
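
To make the alpha test case concrete, here’s a rough sketch of the kind of lowering I have in mind (my own illustration, not spec language; the sampler/uniform names are invented):

uniform sampler2D base_map;
uniform float threshold;

void main()
{
  vec4 texel = texture2D(base_map, gl_TexCoord[0].st);

  //as the application wrote it:
  //  if (texel.a < threshold)
  //    discard;
  //  gl_FragColor = texel;

  //lowered under the alpha_test(greater, 0.0) promise: a conditional move
  //into alpha. Fragments that would have been discarded get alpha 0.0 and
  //fail the promised alpha test instead.
  float keep = step(threshold, texel.a); //1.0 if texel.a >= threshold, else 0.0
  gl_FragColor = vec4(texel.rgb, texel.a * keep);
}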

Likewise, hardware that has no hardwired zero/one constants and no other way to generate them on the fly could make use of the ‘promised’ input conditions. Or, to make it a little harder, hardwired 0.5 constants …

All of this would have no other purpose than allowing compilation of more complex shaders to a broader range of targets, by providing compile-time usage hints that can’t be elegantly expressed otherwise. I believe maximum target support would be a good thing for a high level language to tackle.

Thoughts?


Nothing? Well, actually I take it as a good sign; usually threads in this forum are dismissed within a couple of days (and I’m more often than not dismissing ideas myself …).

I’ll rephrase, while I’m at it:
‘promises’ provide a means to supply the compile step with usage hints, which may be used to simplify/improve resource usage beyond the information that could’ve been extracted from the shader code itself. This is achieved by reducing the possible state/input combinations (currently: everything) that a shader must work with to a subset.

glHint() is not well suited to this task, as there’s no way to prevent changes to hint state after compiling. If the usage hints are to be used for compiling better shaders, a change would incur a recompile, which is probably bad (I favor aggressively optimizing compilers, which are expensive to restart all the time).

‘promises’ would apply to one shader, because they are used for compiling that one shader. There’s not much point in making them context-global state.

Also, the hint mechanism is much too simplistic; text has much better expressive power. The compiler already includes a tokenizer/parser … so the right place to add this is the shader code.

A shader with promises can also be trivially reduced to a shader without promises. An application may even wish to maintain otherwise identical shaders (if there is a possible benefit).

‘promises’ can easily be implemented as doing nothing at first, so IMO it wouldn’t increase compiler complexity beyond reason. You just ignore all promises, and if you later on feel a need to extract a bit more goodness, look at the promises. It would still be nice to have the syntax in place, so people can get used to it (even without immediate benefit).

The only required immediate implementation burden would be checking for conflicting promises, which may be either very hard or trivial, depending on the allowed complexity of relational expression promises.

I like the idea, but I don’t like the syntax. I would rather see something like:

ASSUME tex_coord0.p = 0;
ASSUME tex_coord0.q = 1;
// or…
ASSUME tex_coord0.pq = {0, 1};

This is almost like declaring a constant in C/C++. Also, I like the word assume better than the word promise, because you’re telling the compiler to assume that a certain input or state variable has a certain value.

Originally posted by Aaron:
I like the idea, but I don’t like the syntax. I would rather see something like:

ASSUME tex_coord0.p = 0;
ASSUME tex_coord0.q = 1;
// or…
ASSUME tex_coord0.pq = {0, 1};

This is almost like declaring a constant in C/C++. Also, I like the word assume better than the word promise, because you’re telling the compiler to assume that a certain input or state variable has a certain value.

Fair enough.
I like ‘promise’ because it strongly implies a responsibility … and that something bad may happen if the promise is broken

All of this would have no other purpose than allowing compilation of more complex shaders to a broader range of targets, by providing compile-time usage hints that can’t be elegantly expressed otherwise.

Consider that only Radeon 9500+ and GeForceFX hardware (from the big 2) could even consider implementing glslang. The glslang spec requires looping in the vertex shader, which earlier hardware does not have.

Anyone who comes along later with hardware will understand what the basic level of functionality needs to be in terms of shader processing, so there won’t be a problem with someone building new technology that doesn’t implement something that glslang needs.

You’re proposing that high level shading languages should never be available on anything below this year’s hardware?

Why?

If hardware resources run out, I’m fine with failed compilation. OTOH I have lots of very simple shaders I’d like to funnel through a unified interface … I currently funnel a lot through my brain to generate NV_rc/ATI_fs/ARB_tec code.
Consider what happens if a shader definition gets changed (say, if you work with artists …).

Imagine what would’ve happened if the implementors of the GLQuake days had said “No, we don’t have transform hardware, we can’t do that”.
An API of broad scope on limited hardware. Same argument, basically.

I know I’m not the only one who wants this; assembly interfaces just suck (from a longevity of code point of view).

You’re proposing that high level shading languages should never be available on anything below this year’s hardware?
Why?

First of all, I’m not proposing anything. I’m not even assigning a value judgement to either side. I’m simply explicitly stating what is currently a fact.

Glslang cannot, given its current spec, run on a GeForce3/4, or even a Radeon 8500. This is simply how it is. The requirements for glslang are too steep. As such, if you want a glslang that can run on older hardware, you will have to make lots of changes to the glslang spec; your “Promise” method is barely a beginning.

I know I’m not the only one who wants this; assembly interfaces just suck (from a longevity of code point of view).

If you go back far enough in computing days, there were systems that were just not capable of running C. They just didn’t have the fundamental features that a C program requires.

Granted, C was invented late enough in the computer realm that it could be implemented on the available hardware. However, given that C is a guide to virtually all high level shader languages, you’re seeing the evolution of C in shader languages much sooner than it appeared in previous times. The idea of trying to run glslang on a GeForce3/4 is like trying to get C to run on ENIAC; it just can’t do it.

And, this is probably for the best. While it is unfortunate that slightly older hardware can’t run modern shader languages, it is probably better that modern shader languages aren’t weighed down by the baggage that GeForce3/4 hardware brings to the table. Look at NV20-based Cg shaders; you don’t want to have that kind of baggage wandering around.

Originally posted by Korval:
First of all, I’m not proposing anything. I’m not even assigning a value judgement to either side. I’m simply explicitly stating what is currently a fact.
Yes, it is a fact. It’s not a fact I particularly appreciate, which is why I started thinking in this direction. If time permits I’ll try and demonstrate that this is unnecessary (i.e. I’m deep in the design stage of a GLslang compiler for all kinds of low-capability targets).

Originally posted by Korval:
Glslang cannot, given its current spec, run on a GeForce3/4, or even a Radeon 8500. This is simply how it is. The requirements for glslang are too steep. As such, if you want a glslang that can run on older hardware, you will have to make lots of changes to the glslang spec; your “Promise” method is barely a beginning.
Nope, I don’t think so. I take issue with these “requirements”; they actually don’t exist.

Just for the sake of prose: GLslang is a human readable description language for setting up pipeline resources. It’s aware of advanced functionality, i.e. it has sufficient expressive power for that, but you don’t need to use it. The beauty is that you can detect cases where you oversubscribe resources; this can automatically apply to non-present features (e.g. dependent reads). 3DLabs wisely made provisions for that.

Now, what if that happens? Just like today, if the shader doesn’t want to run, you trim down until you find something that does (or you drop through your minimum reqs and give up). If you do this with GLslang (purely academic at the moment), you use one interface, for all vendors, for all tech levels.
Seen this guy? He could have been saved …

To reverse the argument, I can auto-generate GLslang code from all texture env/reg combiner/fragment shading extensions known to mankind. I hope this is obvious enough so that I need not prove it.
So far that’s useless because I can’t compile them back. But these ‘shaders’ are guaranteed to fit the resource restrictions. And they are valid GLslang code.

Originally posted by Korval:
If you go back far enough in computing days, there were systems that were just not capable of running C. They just didn’t have the fundamental features that a C program requires.

Granted, C was invented late enough in the computer realm that it could be implemented on the available hardware. However, given that C is a guide to virtually all high level shader languages, you’re seeing the evolution of C in shader languages much sooner than it appeared in previous times. The idea of trying to run glslang on a GeForce3/4 is like trying to get C to run on ENIAC; it just can’t do it.
This comparison isn’t particularly relevant, because you’re mixing up runtime compilation (which comes with inherent fallback provisions) and (typically) offline compilation from a single code base (because all C targets can reasonably be assumed to support the same language features; graphics chips are vastly different).

An ENIAC is probably not capable of running any compiler itself, so this argument is kind of moot. If the ENIAC is the most powerful machine at your disposal, well … you don’t even have a compiler. If you have another machine that can run a C compiler, why would you still need the ENIAC?

With graphics chips you always have a host processor nearby that can tackle such tasks.

I’d also like to point out that C development for general purpose machines shares the resource oversubscription issue, although on a different scale. If you try to compile huge programs for an 8086, you’ll eventually run out of code segment space, even though the code may have been perfectly valid. So you must try again with less code, or you give up.

Originally posted by Korval:
And, this is probably for the best. While it is unfortunate that slightly older hardware can’t run modern shader languages, it is probably better that modern shader languages aren’t weighed down by the baggage that GeForce3/4 hardware brings to the table. Look at NV20-based Cg shaders; you don’t want to have that kind of baggage wandering around.
I hate this. In essence it’s just whining and laziness. If GLslang becomes the standard shading language, vendors will have to write (or license, buy, whatever) the compilers anyway for their ‘bigger’ products. So they presumably have the parser/tokenizer/code generation covered. Plop in a different back end, and you’re good to go.

In my mind, it can’t really harm to start building the required expertise with simpler architectures. You know, learn to walk before you run. It’s really shameful to see all these cheap excuses for not doing it. It will have to be done anyway, why not start now?

I think I should elaborate on why I want broad high level shading language support so badly, but maybe it’s better to do that somewhere else. I’ll just quote two bullet points from the 3DLabs GDC pres, that IMO sum it up nicely.

Key benefits:
Shader source is highly portable
No need to change app when compiler improvements occur

GLslang is a human readable description language for setting up pipeline resources.

That’s one way of looking at it. Another way of looking at it is as a language that is compiled into GPU opcodes to be run on various programmable processors on a graphics chip. Obviously, if the hardware has no programmable processors, glslang is hardly appropriate for it.

The beauty is that you can detect cases where you oversubscribe resources; this can automatically apply to non-present features (e.g. dependent reads).

Looping isn’t a resource. Conditional branches aren’t resources. They are fundamental parts of a language without which, you’re just programming in a glorified assembly. Loops, branches, and functions are what separates C from assembly. So, what good does it do to use C on a system that fails to compile if you use loops, branches, functions, etc? You may as well be using assembly, or even NV_register_combiner/texture_shader*.

Just like today, if the shader doesn’t want to run, you trim down until you find something that does (or you drop through your minimum reqs and give up).

There’s a difference between restructuring your code to use fewer temporaries and not using loops/etc. Clearly, if your algorithm called for a loop, you really needed to loop over some quantity. Simultaneously, if your algorithm called for a conditional branch, you really did need to branch based on that.

This is the equivalent of saying, “Well, we have these language features, but they may or may not always work.” If you can’t tell beforehand whether or not the feature was going to work, what’s the point of having the feature to begin with? Indeed, what’s the point of the higher-level language?

That’s like saying that texture objects are a suggested feature of OpenGL 1.1, that an implementation is OK to not apply them. No, it’s not OK for a 1.1 implementation to not have texture objects; the spec says those functions are there, so any 1.1 implementation will have these functions.

The glslang spec says that glslang can have loops; therefore, it will have loops. Compilers that fail because of loops are not glslang-compilers; they are compilers for some other, proto-glslang, shader language.

The purpose of this is so that there is absolutely no incentive to programming using a subset of the full language. If you can use the language at all, you can use the full power of that language. Having sub-languages, where using the full language is forbidden or discouraged for whatever reason, is a bad idea.

I hate this. In essence it’s just whining and laziness.

That’s one way to look at it. Another way might be that this is a rational idea, founded in the understanding that, in order to progress, we must leave the old ideas behind us. Do you want glslang to look like NV20-based Cg shaders?

In my mind, it can’t really harm to start building the required expertise with simpler architectures. You know, learn to walk before you run. It’s really shameful to see all these cheap excuses for not doing it. It will have to be done anyway, why not start now?

Why will it have to be done? In 2 years, nobody will be using GeForce 3/4 hardware. It’ll be obsolete, and all the work done to make them use glslang will be for naught.

I’ll just quote two bullet points from the 3DLabs GDC pres, that IMO sum it up nicely.

I’d like to point out that the glslang 3DLabs proposed was more flexible, and had even less of a chance of being used in low-end machines. Indeed, a GeForceFX or Radeon9500+ couldn’t hope to run the full language.

Korval,

That’s one way of looking at it. Another way of looking at it is as a language that is compiled into GPU opcodes to be run on various programmable processors on a graphics chip. Obviously, if the hardware has no programmable processors, glslang is hardly appropriate for it.
The difference between ‘programmable’ and ‘configurable’ is purely artificial. Look at Geforce 3, where programmability was claimed by marketing, while in reality it’s just a configurable machine. The only definition of ‘true programmability’ I would accept is the ability to fetch and execute opcode streams from memory. This mainly serves the purpose of allowing arbitrary program length. A Geforce 3 obviously doesn’t offer this.

Looping isn’t a resource. Conditional branches aren’t resources. They are fundamental parts of a language without which, you’re just programming in a glorified assembly.
Loop control is a hardware resource, because the availability of opcodes suitable for looping is a hardware trait. Look at the limitations given in the NV30 launch information, and you’ll see that loops can also be oversubscribed. Duh. In fact, there’s sufficient evidence pointing towards a lack of conditional jump instructions on NV30 (‘kil’ doesn’t save cycles), which kind of rules out loops and branches, no?

Conditional branches can also be expressed without explicit hardware support. Keyword: predication. Ever heard of Itanium? Oh, btw, Itanium has limited resources for predication, you should try not to oversubscribe them. Better hope your compiler sorts that out for you.

Predication can even be simulated on simpler SIMD machines. And it can certainly be implemented on a Geforce 3 (within the limits of program length of course). Beg your pardon, but I know what’s possible and what’s not possible, I’ve been there.

There’s a difference between restructuring your code to use fewer temporaries and not using loops/etc. Clearly, if your algorithm called for a loop, you really needed to loop over some quantity. Simultaneously, if your algorithm called for a conditional branch, you really did need to branch based on that.
There is no difference. Fixed count loops can be unrolled, branches can be inlined, conditional branches can be predicated. This fits the term ‘restructuring’ quite well, I’d say.
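
A tiny sketch of what I mean (illustrative only; the variable names are invented): a conditional assignment as written, and the predicated select a back end for branch-less hardware could emit instead.

uniform float water_level;
varying float height;

void main()
{
  vec3 water_color = vec3(0.1, 0.3, 0.6);
  vec3 grass_color = vec3(0.2, 0.6, 0.1);

  //branching form, as written:
  //  vec3 color;
  //  if (height > water_level)
  //    color = grass_color;
  //  else
  //    color = water_color;

  //predicated form: both sides are computed, the result is selected arithmetically
  float above = step(water_level, height); //1.0 if height >= water_level, else 0.0
  vec3 color = mix(water_color, grass_color, above);

  gl_FragColor = vec4(color, 1.0);
}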

This is the equivalent of saying, “Well, we have these language features, but they may or may not always work.” If you can’t tell beforehand whether or not the feature was going to work, what’s the point of having the feature to begin with? Indeed, what’s the point of the higher-level language?
It doesn’t work this way. Try another.
Counterquestion: What’s the point of implementing an NV30 path, if that may not run on the user’s machine (because he doesn’t have an NV30)? What’s the point in supporting register combiners, or ATI_fragment_shader?

It’s an optional path, if you detect that you can’t use it, you fall back. Doing this on top of a common interface would be better, not worse than anything OpenGL programmers currently need to go through.

The glslang spec says that glslang can have loops; therefore, it will have loops. Compilers that fail because of loops are not glslang-compilers; they are compilers for some other, proto-glslang, shader language.
We’re talking about runtime compilation. You cannot even tell beforehand whether anything will fail. You can write shaders today that won’t compile anywhere but that might compile and run fine on hardware in two years, without touching the code. Even on the ‘proper’ GLslang targets that aren’t yet on the market (your view, not mine), compilation is bound to fail with too much shader code. Does this really defeat GLslang?

You somehow pretend that GLslang - because it resembles C - means ‘unlimited resources’, which is clearly not the intent. Every machine is limited, and it’s clearly the compiler’s job to hide these limitations from you. I repeat myself, there are mechanisms to detect compilation (!=compiler) failure, and they are surely there for a purpose.

I wish to do this with ‘proto’ GLslang because it’s reasonable to concentrate research efforts on one interface, one that will eventually become very important, rather than scattering time and energy over different, syntactically incompatible things (nvparse, Cg, PS1.1, whatever). I’d like to see the most versatile and well-designed choice go forward as fast as possible, and that also requires creating an incentive to use the interface. That’s all.

That’s one way to look at it. Another way might be that this is a rational idea, founded in the understanding that, in order to progress, we must leave the old ideas behind us. Do you want glslang to look like NV20-based Cg shaders?
GLslang can express all NV20 shaders. Why would I want to divert development efforts toward Cg, when I know it’s destined to eventually die? (my take on some of NV’s official Cg statements)

Why will it have to be done? In 2 years, nobody will be using GeForce 3/4 hardware. It’ll be obsolete, and all the work done to make them use glslang will be for naught.
Two part rebuttal:
1) You’re dreaming. Nobody uses a Geforce2MX anymore, right? Half Life 2 is slated for release this year. The official statement is that RivaTNT class hardware will be fully supported.
2) You again miss the point of runtime high level compilers. Even if the hardware around then will be much more capable, you don’t need to patch old applications to fully utilize it. Frankly, you shouldn’t bother; it’s not your responsibility to patch your old stuff for higher efficiency on new hardware. The only ones interested in that sort of thing are the hardware vendors, so they should carry this burden. It’s not much of a burden anyway, if we could manage to consolidate on a single codepath that can be optimized to hell and back with all the saved manpower that isn’t required anymore for maintaining the old clutter.

Korval,
we’re drawing way off topic I think. Is there maybe some older “Pros and cons of high level shading” thread that we might use for our little battle?

I want to add one small example of what can be done with predication. I’ve posted some x86/SSE code here: http://www.forum-3dcenter.org/vbulletin/showthread.php?postid=993203#post993203

It uses emulated predication for a loop that has an unknown and dynamic iteration count. This way, it can execute four of these unknown-length loops ‘in parallel’. It computes the Mandelbrot fractal, btw.

I’ve cheated of course, because the loop is not unrolled (which it would have to be in a shader back end), but that’s trivial as long as the allowed program length is sufficient.

I’ve cheated even more, because every 16 iterations, the parallel predicates are accumulated and a check is made whether it’s possible to leave the loop. In a shader, a true early-out scheme can’t be done (which is also the way the NV30 seems to handle things). In this case the loop would run for a fixed maximum iteration count, which causes a performance loss. At least it doesn’t interfere with the computed results.

Well, it’s still x86 assembly, but it should demonstrate what can be done with predication.
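
For the curious, here’s roughly how the same idea would look in GLslang syntax, with the loop kept at a fixed maximum count (an illustrative sketch only; ‘center’ and ‘scale’ are made-up uniforms, and a real back end would unroll the loop):

uniform vec2 center;
uniform float scale;

void main()
{
  vec2 c = center + gl_TexCoord[0].st * scale;
  vec2 z = vec2(0.0);
  float alive = 1.0;      //predicate: 1.0 while |z| <= 2.0, sticky 0.0 afterwards
  float iterations = 0.0;

  for (int i = 0; i < 32; i++)     //fixed maximum count, trivially unrollable
  {
    vec2 z_new = vec2(z.x * z.x - z.y * z.y, 2.0 * z.x * z.y) + c;
    z = mix(z, z_new, alive);      //predicated update: freeze z once it has escaped
    alive *= step(dot(z, z), 4.0); //escape test folded into the predicate
    iterations += alive;           //count iterations only while still 'alive'
  }

  gl_FragColor = vec4(vec3(iterations / 32.0), 1.0);
}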

The benefit lies in the parallel computation of multiple samples using a single instruction flow. This potentially saves huge gobs of logic. I’d rather see graphics hardware moving in that direction; eight independent control units with branch prediction and other crud known from the x86 world are certainly big-time die space consumers. Sharing these resources across all pipelines will be much simpler to implement.

we’re drawing way off topic I think. Is there maybe some older “Pros and cons of high level shading” thread that we might use for our little battle?

I don’t see that this is off-topic. The topic is whether or not a high level shading language should be compiled for systems that cannot actually implement the language itself.

Do you agree with me that an OpenGL 1.1 implementation that does not support texture objects is a broken implementation? Also, do you agree that a C compiler that did not allow recursion was also broken?

If you don’t agree on these points, then it is clear that you have no particular respect for specs. In which case, knock yourself out with your pseudo-glslang compiler.

If you agree on these points, if you agree that a spec for something defines what that something both is and is not, then it is important for you to understand that low-end hardware (hardware that doesn’t support ARB_fragment_program) simply cannot implement any form of glslang.

Here are a few violations of the glslang spec that you will be imposing on code compiled with your faulty compiler:

  1. Floating-point accuracy. The spec requires something like a dynamic range on the order of 2^32, with a decimal accuracy equal to or better than 1x10^-5. Pre-ARB-fragment-program hardware just can’t do this without implicit multipass (which requires a lot more than implementing the glslang extension specs).

  2. Conditional branches. The spec requires a compiled version to implement arbitrary conditional branches, and most C flow-control structures. You can’t do this with older hardware. At least, not guaranteed in every case. The spec doesn’t allow for failure due to conditional branches (though it can fail because an unrolled loop goes past the instruction limit). Failing due to the use of a loop is like failing due to adding two numbers; it’s a fundamental feature of the language that is required to work.

  3. A minimum of 32 varying floats. Older hardware just doesn’t support this many. At best, you have the 8500 which supports 6. You’re not getting around this limitation without implicit multipass.

Also, note that the spec specifically requires that this is the minimum. An implementation that violates this minimum is as broken as the 1.1 implementation that doesn’t have texture objects.

Let’s analyze the third violation. Why did the ARB decide on this absolute minimum? After all, it clearly restricts glslang from any older card. So, what was the purpose?

Well, the only thing that this limit does is force a restriction on what hardware can run glslang. After all, if the minimum was something like 8 floats (two texture coordinates), somebody might get it into their heads to make a glslang compiler for TNT’s. The ARB has no interest in it because it would create a fundamental dichotomy in the language. There would be the official spec’d language, and then there would be the ‘language-in-use’.

That is the danger in creating a compiled language where you allow the compiler to fail for apparently random or arbitrary reasons. What can happen is that people start writing to the lowest-common denominator. As such, they want to make the LCD as high-end as possible. Which would be around the Radeon 9500+ level, with its 4-texture dependency issue.

When people use glslang, the ARB wants them to understand that the language provides them more than they ever had before. As such, it should be restricted from running on hardware that does not provide that power. What good does it do?

There are perfectly good interfaces for getting at per-vertex and per-pixel computation in older hardware. ARB_vertex_program can be run on anything. And the tex_env combine stuff is relatively powerful (though lacking texture-coordinate accessing functionality).

As such, the ARB, with this restriction, has stated that there is pre-GL2.0 hardware and post-GL2.0 hardware. Post-2.0 hardware can use glslang. Pre-2.0 hardware can’t. This is a simple, and very good, restriction. Whenever you find that you can use glslang, you will find that you have access to a great deal of power and functionality. If you can’t, you automatically know you don’t have that kind of functionality and power.

When you use OpenGL 1.0, there is the expectation of a certain level of functionality. That is, the implementation must provide 8 lights, regardless of how this may impact hardware. An implementation must provide a matrix stack of 32 elements or greater. Etc, etc. If your hardware can’t handle it, you have two options: do it in software, or don’t make a GL implementation. The third option, make it anyway and just let people use a subset of the spec, is not an acceptable option.

Glslang was not designed with the intention of ever being run on all possible hardware, just as GL 1.0 was not. Instead, it was designed to run on a certain level of hardware, with fundamental assumptions about what that hardware can and cannot do.


Korval, I see we won’t be agreeing on this.
I’ll cut this a bit short, before I clarify what I actually want to do.

Originally posted by Korval:
I don’t see that this is off-topic. The topic is whether or not a high level shading language should be compiled for systems that cannot actually implement the language itself.
You’re concentrating on the language, I’d like to concentrate on individual programs.

Do you agree with me that an OpenGL 1.1 implementation that does not support texture objects is a broken implementation? Also, do you agree that a C compiler that did not allow recursion was also broken?
Yes.

… especially when I know that the target would easily handle it (okay, I admit that was mean).

If you don’t agree on these points, then it is clear that you have no particular respect for specs. In which case, knock yourself out with your pseudo-glslang compiler.
Is that so? Have I been demanding that GLslang should be supported by RivaTNT drivers?

I have suggested a mechanism to inject optional usage information into GLslang code, which implementations are free to ignore. This is what I would like added to the GLslang specs, nothing else.

This ‘pseudo-GLslang compiler’, which I’ll be very happy knocking myself out with, thank you, is a separate issue. I still believe there are benefits to be had from my suggestion beyond that.

The crossover point between these two is code compatibility and the general usage model. Otherwise I’d be wasting my time reinventing nvparse. I will hopefully not need to explain again why I feel this ‘pseudo’ code is desirable.

If you agree on these points, if you agree that a spec for something defines what that something both is and is not, then it is important for you to understand that low-end hardware (hardware that doesn’t support ARB_fragment_program) simply cannot implement any form of glslang.
The spec thing should already be clear by now. I’d happily call any usage of GLslang, even a one-liner a “form of glslang”, but hey, maybe that’s just me.

<snip>
Let’s analyze the third violation. Why did the ARB decide on this absolute minimum? After all, it clearly restricts glslang from any older card. So, what was the purpose?

Well, the only thing that this limit does is force a restriction on what hardware can run glslang. After all, if the minimum was something like 8 floats (two texture coordinates), somebody might get it into their heads to make a glslang compiler for TNT’s. The ARB has no interest in it because it would create a fundamental dichotomy in the language. There would be the official spec’d language, and then there would be the ‘language-in-use’.
Both would be compatible. What’s the issue? I can today claim a ‘dichotomy’ between the specced language (ARB_fragment_program) and the language-in-use (NV_register_combiners). Notice how these two are distinct, incompatible interfaces? What about distinct compatible interfaces, wouldn’t that be nicer?

The ARB’s stance encourages IHVs to go nuts with their hardware implementations, which is a good thing. The ARB also encourages floating point computation. If you don’t have the hardware, you can’t claim GL2. I like that.
But that’s a long way down the line. What if I want something that works now and doesn’t further encourage writing multitexturing code that’ll be obsolete in a couple of years?

<…>
What can happen is that people start writing to the lowest-common denominator.
People do that all the time. ‘High tech’ for marketing, ‘LCD’ for the average user. OpenGL’s current state encourages skipping intermediate tech levels.

When people use glslang, the ARB wants them to understand that the language provides them more than they ever had before. As such, it should be restricted from running on hardware that does not provide that power.
The ARB can restrict everything they like, they are the IHVs after all and they’ll have to provide the implementations.
You may restrict yourself to whatever you like.
And I’ll layer on top of implementations whatever I like.

What good does it do?
You did read this thread, didn’t you?

There are perfectly good interfaces for getting at per-vertex and per-pixel computation in older hardware.
<…>
Yes, and these things make nice backends for runtime compilers.

As such, the ARB, with this restriction, has stated that there is pre-GL2.0 hardware and post-GL2.0 hardware. Post-2.0 hardware can use glslang. Pre-2.0 hardware can’t. This is a simple, and very good, restriction.
There’s no transition path. As long as there are no ‘true’ implementations, there won’t be any GLslang code. People are reluctant to create stuff that won’t run anywhere, they’ll just be chugging out yet more ARB_texture_env_combine code. Bleh.

Why do you have to be so negative about high level ‘low end’ shaders? If you think it’s pointless, don’t do it.

I have suggested a mechanism to inject optional usage information into GLslang code, which implementations are free to ignore. This is what I would like added to the GLslang specs, nothing else.

You said it yourself, “All of this would have no other purpose than allowing compilation of more complex shaders to a broader range of targets, by providing compile-time usage hints that can’t be elegantly expressed otherwise.” Considering that the ARB doesn’t intend for glslang to compile on these platforms, nor does it want to encourage it (the reason they choose 32-varying floats, when they could have made more accepting choices), there’s no point in requesting the feature.

Also, considering that your compiler can’t compile actual glslang (only a modified form of the language, because it can’t guarantee that you have the minimum number of varying floats or even the internal accuracy), why not simply make your modified spec official by adding keywords to it? If you really want, you can put them in specialized comment blocks (like /** or /// at the beginning of a line means a special keyword) that real glslang compilers will simply ignore.

The spec thing should already be clear by now. I’d happily call any usage of GLslang, even a one-liner a “form of glslang”, but hey, maybe that’s just me.

You haven’t answered the question: how does your compiler not violate the glslang spec? Especially in terms of the minimum number of floats or the accuracy of computations?

And a 1-liner shader from your compiler isn’t a compiled form of glslang, because it doesn’t follow the spec. That it just happens to produce the same result, well, that’s a fortunate thing.

It is not very difficult to write a shader that, semantically, compiles under real glslang and your pseudo-glslang, but doesn’t produce anywhere near the same result. Simply make a value that goes above 1.0 (an additive blend of 2 textures), then multiply it with some other value (say, a vertex color). Your compiled shader will have no choice but to clamp it at 1.0 (on a GeForce3/4. A Radeon 8500+ can handle it). However, glslang mandates that the result be allowed to go above 1.0. Wildly different results.

What you don’t want is for someone to write something on your limited shader, thinking they’re getting the same power as a real glslang shader, and then have it compile on a real glslang compiler to produce completely different results. That’s what specs are written to prevent. And that is why willfully violating these specs is such a bad thing.

Originally posted by Korval:
You said it yourself, “All of this would have no other purpose than allowing compilation of more complex shaders to a broader range of targets, by providing compile-time usage hints that can’t be elegantly expressed otherwise.” Considering that the ARB doesn’t intend for glslang to compile on these platforms, nor does it want to encourage it (the reason they choose 32-varying floats, when they could have made more accepting choices), there’s no point in requesting the feature.
There’s still a point. Getting semantics into shaders is always a potential benefit for optimizations. Let me refer you to the section where ‘reflect’ is explained:
"
genType reflect (genType I, genType N)

For the incident vector I and surface orientation N,
return the reflection direction R = I - 2 * dot (N,
I)*N.
Note N must be have unit length for this to work.
"

Obviously there’s an issue with this function. Implicitly there’s already a ‘promise’ attached to its parameter. I can make this promise explicit. Further, I can potentially deduce correct operation from promises on inputs, something the standard language cannot currently do.
When in doubt, you’d want to use reflect(normalize(whatever)) in your shader code.

If the reflect operation were to work on a cross product, and I know (by promise) that both inputs to the cross are already unit length, I can safely skip a following normalize (this idea extends beyond reflect, of course). If the shader programmer put it there because he didn’t want to take a risk, a standard compiler would have to leave it in (or gather similar knowledge from preceding operations in the shader; the collected data would be very similar to ‘promises’, so if that point is ever reached it would be no big burden to also add the explicit form; however, pure code analysis can never extend beyond the shader and won’t be as accurate).

If the ‘offending’ code is from a different code segment (via the attach mechanism), the issue should become more obvious.

My best guess is that a normalize uses an extra register and three cycles on many shader implementations (dot3, inverse square root, mul). Easy savings.
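
Here’s a concrete sketch of the kind of saving I mean, using the hypothetical promise syntax drafted earlier (‘light_dir’ and ‘surf_normal’ are invented names):

uniform vec3 light_dir;
varying vec3 surf_normal;

promise
{
  //the application promises a unit-length light vector
  1.0 == dot(light_dir, light_dir);
}

void main()
{
  //a cautious author writes normalize(light_dir) "just in case";
  //under the promise, the compiler may drop it and save the
  //dot3 / reciprocal square root / mul sequence mentioned above
  vec3 n = normalize(surf_normal);
  float diffuse = max(dot(n, normalize(light_dir)), 0.0);
  gl_FragColor = vec4(vec3(diffuse), 1.0);
}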

Originally posted by Korval:
You haven’t answered the question: how does your compiler not violate the glslang spec? Especially in terms of the minimum number of floats or the accuracy of computations?
It’s supposed to be a compiler for GLslang syntax, not a straight implementation of the spec and won’t be presented as such. Therefore, though it does not guarantee the same set of functionality as the GLslang spec, it’s technically not in violation. It just borrows the syntax and some aspects of the API. Users will be able to distinguish ‘fake’ and ‘real’.

And a 1-liner shader from your compiler isn’t a compiled form of glslang, because it doesn’t follow the spec. That it just happens to produce the same result, well, that’s a fortunate thing.
I didn’t mean to claim that. When I said “usage of GLslang” I referred to the syntax (as in shading language), not to the complete surrounding spec. Perhaps I should have been more precise.

It is not very difficult to write a shader that, semantically, compiles under real glslang and your pseudo-glslang, but doesn’t produce anywhere near the same result. Simply make a value that goes above 1.0 (an additive blend of 2 textures), then multiply it with some other value (say, a vertex color). Your compiled shader will have no choice but to clamp it at 1.0 (on a GeForce3/4. A Radeon 8500+ can handle it). However, glslang mandates that the result be allowed to go above 1.0. Wildly different results.
I’m aware of the issue. The resolution is to require explicit clamping for the lowest end targets. It looks like this:
min(1.0, max(0.0, value)) //clamp to [0..1]
Or one can use the builtin clamp function, which is equivalent.
So this can be solved within the language, without requiring any extensions or hackery. Code that follows this rule will not cause any issues when moved over to a GL2 target.
Code that doesn’t clamp will require an ARB_fragment_program (or similar) target to successfully compile. And then there are intermediate steps, R200 probably being the most complicated one, but possible.
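
As a sketch (the sampler names are invented), the portable, explicitly clamped form of the two-texture example from above would be:

uniform sampler2D tex0;
uniform sampler2D tex1;

void main()
{
  //additive blend, explicitly clamped to [0,1] before further use;
  //identical results on fixed-range and full-range targets
  vec4 sum = clamp(texture2D(tex0, gl_TexCoord[0].st) +
                   texture2D(tex1, gl_TexCoord[1].st), 0.0, 1.0);
  gl_FragColor = sum * gl_Color;
}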

What you don’t want is for someone to write something on your limited shader, thinking they’re getting the same power as a real glslang shader, and then have it compile on a real glslang compiler to produce completely different results. That’s what specs are written to prevent. And that is why willfully violating these specs is such a bad thing.
Very true. Believe me, I’m doing my best to take care of any issues that would ultimately counter my motivation. It’s in my own interest.

Obviously there’s an issue with this function. Implicitly there’s already a ‘promise’ attached to its parameter. I can make this promise explicit…
<section removed>
pure code analysis can never extend beyond the shader and won’t be as accurate).

First, to correct something. The cross product of two unit vectors is not guaranteed to be unit length, as the resulting vector’s length is |A|*|B|*sin(theta), where A and B are the two vectors, and theta is the angle between them.

However, I understand your point. Here’s my counterpoint.

For over 20 years, programmers in C, and later C++, have not found the need for any kind of promise mechanism. Even more modern languages, like Java, Visual Basic, or C#, don’t have such facilities. Why? Because the semantics of a function are solely the responsibility of the coder.

If I pass the reflect function a non-normalized vector, I have clearly done something wrong, semantically. Syntactically, it’s perfectly fine. However, semantically, it is wrong.

If I, as a coder, write a function F that takes a float on the range [0, 1], and I pass it 25, I have done something semantically wrong. The language should not have facilities for understanding the semantics of my code. Why?

Besides the fact that few, if any, languages have these facilities built directly into the syntax, speedy compilation is of paramount importance to glslang. If, in the middle of a frame, I suddenly find the need to link some different shaders together, I don’t want my compilation taking longer because of this data verification mechanism.

A good programmer can deal with minor issues like, “is this data normalized”, or “does this parameter fall into the given range.” Programmers have been doing this for years; there’s no need to give them any help in this area.

Even if the compiler could, intrinsically, understand which operations produce what kind of output (whether or not certain operations will de-normalize normals, etc), its only real use is for debugging. If I, as a programmer, can’t determine whether a normal may, or may not, be normalized at a point in my computations, I’m probably not that good of a programmer.

Korval,
thanks for the correction about the cross product thing. Shame on me

Re the cost of aggressively optimizing compilers: it’s probably an issue of design philosophy and can’t be nailed down one way or the other.

I believe it’s ultimately more efficient to invest the extra time once per program object, if it can save you, say 10% in instruction count, which will affect every fragment you push. Even if compilation takes ten times longer (I don’t know yet whether that’s a generous or a conservative estimate). 3DLabs’ proposed interface makes it wonderfully easy to do semi-offline compilation (at level load time or something) and to quickly reidentify shaders and avoid any repeated work.

This is just my opinion, nothing more. If you don’t agree, that’s fine

I want to comment on the C code comparison, though. Any C program is monolithic: it has a “main()” starting point and exits somewhere. Everything the program’s going to do will be known to the compiler, if it decides to look hard enough. There are issues with external data (files, user input) that may be bogus or corrupted, and ambiguity created by language features (pointer aliasing) that cannot be overcome, but my point is that a shader program cannot stand alone, while a C program can. Whole-program analysis like that isn’t really done in practice for C, because it would buy little and still cost loads of (compile) time and memory.

OTOH a shader is a single element in a chain. It’s also a very critical element, as it’s executed very, very often, just like most time in larger C programs is spent in relatively small portions of code. There’s little incentive toward doing the most aggressive optimizations imaginable in a complete C program; you concentrate on the inner loops and optimize there. I like to think of shaders as a kind of inner loop.

I believe it’s ultimately more efficient to invest the extra time once per program object, if it can save you, say 10% in instruction count, which will affect every fragment you push. Even if compilation takes ten times longer (I don’t know yet whether that’s a generous or a conservative estimate). 3DLabs’ proposed interface makes it wonderfully easy to do semi-offline compilation (at level load time or something) and to quickly reidentify shaders and avoid any repeated work.

The point is that, for a reasonably decent programmer, it isn’t going to cost any time. Said programmer knows when his vectors are normalized and when they aren’t. He knows whether or not the data coming in is normalized. He knows which functions can unnormalize vectors. As such, he will immediately recognise whether or not he needs to use “reflect(normalize(X))” or simply “reflect(X)”. So, your verification mechanism saves him no time.