Supporting offline GLSL compilation basically just means revivifying the ARB assembly syntax, updated to modern shader features.
Your use of “just” there suggests that this is a simple undertaking. Everyone understands what would have to be done. The question is whether it would be worthwhile.
smaller program sizes. Yes, this really matters. We’ve recently measured that disk seek and read time is the big killer in loading times for even simple games.
Simple games don’t care about loading times; they’re not spending a lot of time loading things because they don’t have a lot to load. They are, as you say, “simple games.”
Cutting a file down from several KB to a few hundred bytes makes a big difference, especially if you have a lot of files like that.
I agree in principle, but the problem with your assertion is that you haven’t demonstrated that GLSL shaders are significantly larger than ARB assembly shaders that do the same thing. Note that ARB assembly already allows you to use reasonable variable names (no r1, r21, etc. nonsense), so reasonably-written code will use them.
For example, here’s a fairly lengthy GLSL shader of mine:
#version 330
in vec4 diffuseColor;
in vec3 vertexNormal;
in vec3 cameraSpacePosition;
out vec4 outputColor;
uniform vec3 modelSpaceLightPos;
uniform vec4 lightIntensity;
uniform vec4 ambientIntensity;
uniform vec3 cameraSpaceLightPos;
uniform float lightAttenuation;
const vec4 specularColor = vec4(0.25, 0.25, 0.25, 1.0);
uniform float shininessFactor;
float CalcAttenuation(in vec3 cameraSpacePosition, out vec3 lightDirection)
{
    vec3 lightDifference = cameraSpaceLightPos - cameraSpacePosition;
    float lightDistanceSqr = dot(lightDifference, lightDifference);
    lightDirection = lightDifference * inversesqrt(lightDistanceSqr);
    return (1.0 / (1.0 + lightAttenuation * sqrt(lightDistanceSqr)));
}

void main()
{
    vec3 lightDir = vec3(0.0);
    float atten = CalcAttenuation(cameraSpacePosition, lightDir);
    vec4 attenIntensity = atten * lightIntensity;
    vec3 surfaceNormal = normalize(vertexNormal);
    float cosAngIncidence = dot(surfaceNormal, lightDir);
    cosAngIncidence = clamp(cosAngIncidence, 0.0, 1.0);
    vec3 viewDirection = normalize(-cameraSpacePosition);
    vec3 halfAngle = normalize(lightDir + viewDirection);
    float blinnTerm = dot(surfaceNormal, halfAngle);
    blinnTerm = clamp(blinnTerm, 0.0, 1.0);
    blinnTerm = cosAngIncidence != 0.0 ? blinnTerm : 0.0;
    blinnTerm = pow(blinnTerm, shininessFactor);
    outputColor = (diffuseColor * attenIntensity * cosAngIncidence) +
                  (specularColor * attenIntensity * blinnTerm) +
                  (diffuseColor * ambientIntensity);
}
The ARB-assembly version would still use the same long variable names. It would have “MUL” in place of every *, “ADD” in place of every +, etc. It would also have to explicitly declare a lot of temporary variables that GLSL lets me leave implicit.
The main difference would be the lack of a function call. And that’s assuming that you disallow functions in this advanced version of ARB-assembly.
I’d bet that the ARB version would be larger in terms of byte size than the GLSL version. And I can make the GLSL version smaller by using larger expressions. I could compute “blinnTerm” in a single line rather than 4 (though this would make the code less readable). You can’t do that in the ARB-assembly version.
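To make that bet concrete, here’s a rough back-of-the-envelope comparison. The ARB-assembly translation below is my own hand-written approximation of the final outputColor expression, not actual compiler output, so treat the exact numbers as illustrative:

```python
# Rough byte-size comparison of the final outputColor expression in GLSL
# versus a hand-written ARB-assembly approximation of the same math.
# The ARB translation is illustrative, not real compiler output.

glsl_expr = (
    "outputColor = (diffuseColor * attenIntensity * cosAngIncidence) +\n"
    "              (specularColor * attenIntensity * blinnTerm) +\n"
    "              (diffuseColor * ambientIntensity);"
)

# Each * and + becomes an explicit MUL/MAD instruction, and the
# intermediate results need explicitly declared temporaries.
arb_expr = (
    "TEMP tmp0, tmp1;\n"
    "MUL tmp0, diffuseColor, attenIntensity;\n"
    "MUL tmp0, tmp0, cosAngIncidence;\n"
    "MUL tmp1, specularColor, attenIntensity;\n"
    "MAD tmp0, tmp1, blinnTerm, tmp0;\n"
    "MAD result.color, diffuseColor, ambientIntensity, tmp0;"
)

print(len(glsl_expr), len(arb_expr))  # the ARB version comes out larger
```

Same names, same math; the explicit instruction mnemonics and temporaries are pure overhead.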
Now, let’s say you found a way of encoding your ARB-assembly shaders into some kind of smaller, binary format. That’s basically just employing a compression algorithm. There’s no reason you couldn’t do something similar with GLSL shaders. And there’s no reason to expect that shaders wouldn’t zip-compress well, particularly if you stick them in a TAR (or something similar) before compressing them. That way, the same names in different shader files will be compressed with the same bitpattern. You’ll get smaller files, and you can decompress them all-at-once.
While ARB-assembly shaders might benefit from compression more than GLSL shaders do (thanks to the repeated use of a small instruction set), I doubt they would benefit so much more that the final size difference would be substantial.
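The shared-dictionary point is easy to demonstrate with zlib. The shader fragments here are small stand-ins, assuming the kind of identifier overlap you’d typically see between shaders in one project:

```python
import zlib

# Two small shaders that share most of their identifiers, as shaders
# in the same project typically do. Stand-in fragments for illustration.
shader_a = (
    "uniform vec4 lightIntensity;\n"
    "uniform vec4 ambientIntensity;\n"
    "uniform vec3 cameraSpaceLightPos;\n"
    "void main() { outputColor = diffuseColor * lightIntensity; }\n"
)
shader_b = (
    "uniform vec4 lightIntensity;\n"
    "uniform vec3 cameraSpaceLightPos;\n"
    "uniform float lightAttenuation;\n"
    "void main() { outputColor = diffuseColor * ambientIntensity; }\n"
)

# Compressing each file separately vs. compressing them as one stream
# (what you effectively get by TAR-ing before compressing).
separate = (len(zlib.compress(shader_a.encode())) +
            len(zlib.compress(shader_b.encode())))
together = len(zlib.compress((shader_a + shader_b).encode()))

# Repeated names across files get encoded with the same bitpatterns,
# so the combined stream is smaller than the two separate streams.
print(separate, together)
```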
Loading a PNG and decompressing it on the CPU, for instance, is way faster than loading a TGA; all the time the CPU ‘wastes’ decompressing the image data is more than made up for by how much faster the image loads from disk, even for small images.
Perhaps (depending on the CPU in question), but the fact that you’re loading a TGA instead of an S3TC- or BPTC-compressed texture means that you’re losing quite a bit of memory. Yes, your load times may be faster, but you’re using ~4x the memory of compressed textures. PNG doesn’t work with texture compression, and good S3TC compressors take longer to run than it takes to read their output from disk, so you’re not going to be compressing on the fly.
So either your data is PNG compressed on disk, or it’s S3TC compressed on disk. You need to make a decision: runtime performance in both memory and actual texture access speed, or load-time performance?
For most applications, you’re going to use a texture far more than you’re going to load it. And you’re really going to want the extra texture space you get from compressed texture formats.
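The memory argument is simple arithmetic. S3TC stores 4x4 texel blocks in 8 bytes (DXT1) or 16 bytes (DXT5); the 1024x1024 texture size here is just an example:

```python
# Memory footprint of a 1024x1024 texture in different GPU formats.
# S3TC encodes 4x4 texel blocks: DXT1 in 8 bytes, DXT5 in 16 bytes.
width, height = 1024, 1024

rgba8 = width * height * 4                # 4 bytes per texel, uncompressed
dxt1  = (width // 4) * (height // 4) * 8  # 0.5 bytes per texel
dxt5  = (width // 4) * (height // 4) * 16 # 1 byte per texel

print(rgba8 // dxt5, rgba8 // dxt1)  # 4x and 8x savings, respectively
```

That’s per mip level, so the savings compound across a full mipmap chain.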
It also makes it possible for a well-known GLSL compiler library to exist to help minimize all those pesky NVIDIA/AMD GLSL compiler differences by simply removing them from the high-level language part of compilation.
And that’s where you’re going to run afoul of two simple facts.
1: OpenGL is, and always will be, backwards-compatible. GLSL is core now. Therefore, it will always be core. NVIDIA and AMD will have to support it. And support it in addition to whatever enhanced ARB-assembly language you suggest. Even if they stopped extending GLSL and froze it at 4.1, the IHVs would still have to support it.
You will not make drivers more conformant by giving IHVs more test cases and a larger specification to conform to.
2: Most of the “pesky compiler differences” would still remain in an ARB-assembly language. Sure, you would lose some, like NVIDIA’s slightly more lax GLSL front-end. But the most egregious would remain, as those tend to be in the lower-level optimizing routines.
Let’s look at some GLSL-based driver bugs reported on these very forums.
AMD: struct array in UBO: This is based on the internal accessing code of UBOs. It has nothing to do with the compilation of GLSL code, and it would have occurred in AMD’s implementation of an enhanced ARB-assembly language that supported UBO-like constructs.
AMD: textureOffset flickering and texelFetch crash: It’s not clear if the second part of this bug (the glCompileShader crash) would be solved by your suggestion, but the first part would certainly be unaffected. The enhanced ARB-assembly would have similar texturing functions, and AMD would be just as capable of screwing up this code in ARB-assembly form as in GLSL form.
AMD: The craziest bug I have even seen! Oo: This one would be solved, since ARB-assembly probably wouldn’t have function overloading.
So that’s not many bugs fixed.