glsl "offline" compilation

i don’t know whether this has already been discussed, but some sort of offline compilation for shaders would be really helpful

currently, compiling sometimes takes ages, especially when you have many different shader variations for different materials/lights

atm i have to compile all variations at program start, even though more than half of the shaders might never get used (in a “streaming” world)
loading/compiling on the fly is currently impossible (stalls!)

cheers

There has been a discussion about the possibility of downloading/uploading compiled shaders from/to the GPU.
Offline compilation is, in my opinion, not the way to go, since a compiled shader is valid only for the GPU/driver it was compiled for.
Some DirectX games compile shaders during first run and store them on disk for later use. OpenGL does not allow that yet, but I’m sure it’s being considered.
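To make that concrete, here is a rough sketch of the “compile on first run, cache to disk” idea. GL_PROGRAM_BLOB_SIZE, glGetProgramBlob and glProgramBlob are purely hypothetical names standing in for whatever download/upload API the ARB might eventually specify:

    // Rough sketch only; assumes the GL headers and the hypothetical
    // entry points are available.
    #include <cstdio>
    #include <vector>

    void CacheCompiledProgram(GLuint program, const char* cachePath)
    {
        GLint size = 0;
        glGetProgramiv(program, GL_PROGRAM_BLOB_SIZE, &size);   // hypothetical query
        if (size <= 0)
            return;

        std::vector<char> blob(size);
        glGetProgramBlob(program, size, &blob[0]);              // hypothetical download

        if (FILE* f = std::fopen(cachePath, "wb"))
        {
            std::fwrite(&blob[0], 1, blob.size(), f);
            std::fclose(f);
        }
    }

On later runs the application would read the file back, hand it to a matching (equally hypothetical) glProgramBlob() upload call, and fall back to compiling the GLSL source only if the driver rejects the blob.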

I believe the OpenGL GLSL compiler generates assembly code for a pseudo-architecture. The exact code it generates is indeed implementation dependent AFAIK, but it must be recompiled anyway into native machine code in order to run on the GPU.

We should indeed be able to query that pseudo-assembly code and store it in a binary file. It is a very good idea.

I believe the OpenGL GLSL compiler generates assembly code for a pseudo-architecture.
No, it does not. First, there is no single OpenGL glslang compiler, so something that doesn’t exist can’t do anything. Each implementation is free to build its compiler however it sees fit.

nVidia prefers to compile their glslang to their ARB/NV assembly languages before using their pre-existing assembly compiler. ATi instead goes straight to the hardware assembly.

Originally posted by k_szczech:
snip
Some DirectX games compile shaders during first run and store them on disk for later use. …

that’s exactly what i’m hoping for :slight_smile:

Since each IHV is free to implement GLSL in a manner of its choosing, offline compilation into a single binary format would be next to impossible (you’d probably need one for each IHV). This is one of the reasons an IHV-neutral IL (intermediate language) would be beneficial.

As shaders become larger and more complicated, I imagine the need for something like this will only grow over time. And I wouldn’t want my GL implementation cutting corners in optimization strategies, just to keep compile times down (i.e., compiling my shaders during app startup is not my first choice).

OpenGL ES 2 supports offline shader compilation, but it’s looking at different tradeoffs; the hardware is more likely to be a) fixed and b) a crappy platform to compile on.

Don’t know whether that helps you at all.

I think OpenGL needs offline shader compilation like DX9/DX10 does.

  1. To protect my intellectual property.

  2. Will save you warning and error headaches. If you exceed the instruction count, it won’t compile. If you make an illegal cast, it will complain. If you use a reserved word that is not 100% GLSL compatible and happens to work on NVIDIA but not on ATI, it will protest.
    Will make the shader syntax stricter, so fewer shader errors will be present in the drivers.

  3. Allows you to see what kind of mess the compiler is making, so you can change the design to get better results.

  4. Will reduce driver size and complexity, because drivers won’t need to include a shader compiler like the 3D Labs one.

  5. Will reduce OpenGL initialization time in your engine (because it won’t need to compile the shaders on the fly).

So I vote yes :stuck_out_tongue:

Since each IHV is free to implement GLSL in a manner of its choosing, offline compilation into a single binary format would be next to impossible (you’d probably need one for each IHV).
The idea is that the different IHVs would be able to look at a well-formatted header in the blob and say, “Yes, I can use this” or “No, I can’t use this”. The ARB need not specify the details of the binary data outside of the header.

Endian and 32/64 bit issues are a couple of the problems that immediately jump out from this.

This is one of the reasons an IHV-neutral IL (intermediate language) would be beneficial.
Yes, but there are many difficulties involved in such a thing.

First, that means GL implementations need to support 2 compilers. They’re doing a crappy job with just one.

Second, in order to properly maintain performance for all hardware (not just hardware that maps well onto D3D’s shader model), the intermediate language needs to maintain many of the semantics of glslang. Functions can’t be inlined. Assignment of variables to temporaries and so forth can’t happen. Struct definitions need to remain in place.

Really, at that point, what you save is having a simpler parser (expressions can be unfolded into simple instructions). The actual compilation duties aren’t that different. This isn’t a trivial savings, mind you, but it’s not terribly huge either. Particularly if you still need to write the glslang-to-IL compiler into your driver.

So, either the intermediate form won’t be too far from glslang, or it will lose important semantic information that a compiler could use to optimize a piece of code. I live in the realm where performance is king, so I’m willing to accept that starting the application will be slow.

I think OpenGL needs offline shader compilation like DX9/DX10 does.
Offline compilation, as being discussed on this thread, is one thing. Offline compilation “like DX9/DX10” (ie, compiling into a specific intermediate language) is quite another.

To protect my intellectual property.
It does not, in any way, protect your IP. Because the compiled form is well-specified (unlike the “binary blobs” being mentioned above), someone can very easily figure out what your initial source code was.

Furthermore, if you really want to protect your IP, you would encrypt your files. Anything less is just a minor hindrance to someone looking for your shader code.

Will save you warning and error headaches.
Compiler bugs are compiler bugs, whether on glslang or an intermediate language. Granted, an intermediate language compiler is easier to write and test, thus resulting in fewer bugs, but that will not in any way fix nVidia vs. ATi errors.

Will reduce driver size and complexity, because drivers won’t need to include a shader compiler like the 3D Labs one.
OK, the ship has sailed. This is not being changed.

Glslang is in drivers and it’s not going to be removed. I didn’t like it, and I argued against it. So did nVidia. But it is done; there’s no point in arguing the point further. It has its advantages and disadvantages.

Originally posted by Korval:
Offline compilation, as being discussed on this thread, is one thing. Offline compilation “like DX9/DX10” (ie, compiling into a specific intermediate language) is quite another.

Oh well, let me clarify then: I think OpenGL needs offline compilation, compiling GLSL to a commonly defined set of assembly instructions (in the same manner HLSL compiles into dp3, mad, mul, texkill, cmp, etc.)

Originally posted by Korval:
It does not, in any way, protect your IP. Because the compiled form is well-specified (unlike the “binary blobs” being mentioned above), someone can very easily figure out what your initial source code was.

Furthermore, if you really want to protect your IP, you would encrypt your files. Anything less is just a minor hindrance to someone looking for your shader code.

Nothing can protect my IP, indeed.
Encryption is not enough, because somebody could write a fake OpenGL.dll and intercept my complete GLSL shader source just by overriding/exposing the glShaderSourceARB function.

On the other hand, if it is compiled into assembly, somebody can still “figure out” what my code does, but it will be harder to understand… that’s a form of “obfuscation”, so it protects my intellectual property a bit more than the current method.

Guess what this code does:

Originally posted by Korval:
Compiler bugs are compiler bugs, whether on glslang or an intermediate language. Granted, an intermediate language compiler is easier to write and test, thus resulting in fewer bugs, but that will not in any way fix nVidia vs. ATi errors.

It won’t fix them, but it will minimize them. There are more opportunities for bugs in high-level GLSL code than in opcode-level code (mad, texkill, etc.). That way the driver can concentrate on the asm-to-native GPU instruction compiler/optimizer, and we can skip the lex/yacc/GLSL syntax parsing step (which still has to be done, but by another entity like Khronos, not by the OpenGL driver maker).

If you reduce the assembly instruction set enough and make it very strict, compatibility should be better. Asm opcode instructions are simpler than HLSL and not very flexible, so there are fewer ways to get them wrong.

Originally posted by Korval:
OK, the ship has sailed. This is not being changed.

Glslang is in drivers and it’s not going to be removed. I didn’t like it, and I argued against it. So did nVidia. But it is done; there’s no point in arguing the point further.
Unfortunately, to keep backwards compatibility it should stay there, indeed. With time it could disappear, though (if nobody uses it because the offline path is much better, I think). But well, I think the mistake was to adopt dynamically compiled GLSL from the start… Now we have to pay the consequences.

Oh well, let me clarify then: I think OpenGL needs offline compilation, compiling GLSL to a commonly defined set of assembly instructions
Very well. You’re wrong; it does not.

Guess what this code does:
PS3.0 is not an appropriate intermediate language. As I pointed out, a proper intermediate language does not lose the semantics of the original glslang. Those semantics are vital for optimizing; it was one of the arguments for glslang being in the driver, and it’s very true.

Without those semantics (function calls, structs, etc.), the final compiler cannot determine when to inline and when not to, which is something that can change based on the hardware. For example, Intel’s new Larrabee x86-style GPU would probably not want to do nearly as much inlining (since it has an actual stack) as an nVidia GPU. But PS3.0 takes this decision out of Intel’s hands.

It is a crappy intermediate language.

Now, perhaps if you were to show a good intermediate language, then we could talk.

If you reduce the assembly instruction set enough and make it very strict, compatibility should be better.
And in doing so strip the ability of the compiler to perform proper optimizations.

I would rather have the application start a bit slower than sacrifice per-frame performance. And if it means my shaders are out there in a form that can be easily seen, so be it.

The ability to decompile Java or .NET into human-readable and comprehensible forms hasn’t stopped these languages from becoming exceptionally popular. The tiny IP protection that a crappy intermediate form would provide is meaningless next to the performance gains of retaining the original semantics.

In short, there’s a reason that C looks like C rather than assembly.

Originally posted by Korval:
…The idea is that the different IHVs would be able to look at a well-formatted header in the blob and say, “Yes, I can use this” or “No, I can’t use this”. The ARB need not specify the details of the binary data outside of the header.

exactly. the so-called “binary blob” should be strictly tied to the one specific machine (+ driver version) it was compiled on.
changing the driver or hardware should automatically invalidate the binary blob and require a new compilation of the shader at program start :slight_smile:

a common intermediate language would only complicate things without much benefit (besides that tiny ip-protection thingy mentioned earlier)
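something like this header is all the ARB would really have to standardize (all field names made up); the driver just checks it and rejects the blob if anything no longer matches:

    // rough sketch, field names made up: the ARB specifies only this header,
    // everything after it stays vendor-specific
    struct ShaderBlobHeader
    {
        unsigned int magic;          // marks the file as a shader blob
        unsigned int vendorId;       // PCI vendor ID of the GPU the blob targets
        unsigned int deviceId;       // PCI device ID
        unsigned int driverVersion;  // driver build that produced the blob
        unsigned int blobSize;       // size in bytes of the vendor-specific payload
    };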

I would just like to agree with santyhamer. I think a common intermediate language is a good thing.

On my current project (not OpenGL based), if we had to compile the shaders on startup, it would literally take hours to start. (LOTS of shader combinations.)

Even a “binary blob” solution would not really be a good fit. Imagine seeing “Please wait, compiling shaders” for a few hours when you first load the game.

I am not convinced that current OpenGL shaders are any faster than D3D shaders that use an intermediate language. (In fact, OpenGL might even be slower, as implementations don’t have much time to optimize.)

I also think the next generation of graphics should focus on accuracy and predictability (I sometimes get frustrated with GLSL’s flakiness between vendors), and a common intermediate language would go a long way towards that.

It will be interesting to see what Longs Peak does…

On my current project (not OpenGL based), if we had to compile the shaders on startup, it would literally take hours to start. (LOTS of shader combinations.)

Same problem here. I use 218 SM2 combinations. Takes almost 10s to initialize. The offline DX9 version of my program initializes in 0.1 seconds.

    Even a "binary blob" solution would not really be a good fit. Imagine seeing "Please wait, compiling shaders" for a few hours when you first load the game.

Not a very good idea, indeed… That’s where an offline compiler can help.

I am not convinced that current OpenGL shaders are any faster than D3D shaders that use an intermediate language. (In fact, OpenGL might even be slower, as implementations don’t have much time to optimize.)

In all the programs I’ve written, the DX9 shaders (with almost the same shader code) were 200-500% faster than the GLSL ones. Using a profiler, I discovered that OpenGL is way faster than DX9 in the draw calls. Using a shader profiler, I could see that the problem is, in fact, the shaders. So, if you think runtime-compiled GLSL can optimize better, something is wrong there (or perhaps it’s the lack of half precision in the GLSL model that slows things down).

I also think the next generation of graphics should focus on accuracy and predictability (I sometimes get frustrated with GLSL’s flakiness between vendors), and a common intermediate language would go a long way towards that.

Same here. I am tired of getting stupid warnings and errors from syntactically correct GLSL that fails unexpectedly with some graphics drivers. I have never had the same problem with DX9, btw.

It will be interesting to see what Longs Peak

Well, I heard OpenGL ES was getting an offline compiler…
Also, a little bird told me a DX9/DX10-like system is coming with glFX… wait and see!

Originally posted by Korval:
Without those semantics (function calls, structs, etc), the final compiler cannot determine when to inline and when not to, which is something that can change based on the hardware.

Well, ps3.0 does include a “call” asm instruction. In practice it inlines everything, which can be bad…
I wish new shader models included some kind of C stack so functions could be called recursively.

As for structs, they are just a linear block of data with member offsets applied. Perhaps the intermediate language could preserve the struct “structure” as you mention, but the members should be “obfuscated” to hide their meaning from curious people. Something like:

Original struct:

        struct VSIn
        {
           vec3 pos, normal;
           vec2 uv;
        };

Obfuscated struct:

        struct VSIn
        {
           vec3 a123, b5087;
           vec2 c38024;
        };

We could use some kind of “reflection” in the shaders that takes the obfuscation system into account. The code obfuscator would replace every VSIn.pos with a123 throughout the code (which is not really a problem and does not affect the speed of the shader), as sketched below.
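A rough sketch of such a renaming pass as an offline tool might implement it (a real tool would work on tokens rather than raw substrings, so it doesn’t rename parts of longer identifiers):

    #include <map>
    #include <string>

    // Replaces every occurrence of an original identifier with its obfuscated name.
    std::string ObfuscateIdentifiers(std::string source,
                                     const std::map<std::string, std::string>& renames)
    {
        for (std::map<std::string, std::string>::const_iterator it = renames.begin();
             it != renames.end(); ++it)
        {
            std::string::size_type pos = 0;
            while ((pos = source.find(it->first, pos)) != std::string::npos)
            {
                source.replace(pos, it->first.length(), it->second);
                pos += it->second.length();
            }
        }
        return source;
    }

    // Usage with the struct above:
    //   renames["pos"] = "a123"; renames["normal"] = "b5087"; renames["uv"] = "c38024";
    //   shaderSource = ObfuscateIdentifiers(shaderSource, renames);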

For the uniforms (like uniform vec3 g_vLightPos), the offline compiler would need to obfuscate the name into something like “AHDKDSF” and tell us the constant index assigned in a table. This is what DX9 does when you compile an effect:

        vertexshader = 
                    asm {
                    //
                    // Generated by Microsoft (R) D3DX9 Shader Compiler 
                    //
                    // Parameters:
                    //
                    //   float4x4 g_mNegZInvCamProjTM;
                    //   float4 g_vCamPos;
                    //
                    //
                    // Registers:
                    //
                    //   Name                Reg   Size
                    //   ------------------- ----- ----
                    //   g_mNegZInvCamProjTM c0       4
                    //   g_vCamPos           c4       1
                    //

See, it assigned g_vCamPos to c4.
It also saves some kind of obfuscated “reflection” information into the compiled shader… so afterwards we can do:

m_pFX->SetVector("g_vCamPos",1.0f,-17.0f,-32.0f);

which internally goes to the obfuscated reflection table, looks up “g_vCamPos”, gets the c4 register back, and assigns the value. It is also possible to get a “g_vCamPos” handle up front to avoid the string map lookup.
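Roughly, the lookup behind that SetVector call could look like this on the application side (names and layout are made up; the D3DX effect framework does the equivalent internally):

    #include <map>
    #include <string>

    class EffectConstants
    {
    public:
        // Filled from the reflection table stored with the compiled shader,
        // e.g. registers["g_vCamPos"] = 4, meaning register c4.
        std::map<std::string, int> registers;

        void SetVector(const char* name, float x, float y, float z, float w = 1.0f)
        {
            std::map<std::string, int>::const_iterator it = registers.find(name);
            if (it == registers.end())
                return;                        // unknown name or optimized away
            const float v[4] = { x, y, z, w };
            UploadConstant(it->second, v);     // writes register c<N> via the 3D API
        }

    private:
        void UploadConstant(int reg, const float* v)
        {
            // API-specific; under D3D9 this would end up in something like
            // device->SetVertexShaderConstantF(reg, v, 1).
            (void)reg; (void)v;
        }
    };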

Btw, note that I only brought up PS3.0 as an example of code obfuscation, not as an example of a good intermediate language!

ps: What’s wrong with the forum’s edit system? I edit the post and it puts old data into the textbox!

Offline compilation would/will require a standard for an intermediate language and/or byte code, as some have deduced.

Caching or read-back of compiled shaders at application runtime doesn’t require such standardization; however, it would not speed up the first-run case. On the other hand, read-back might also allow for capturing forms of shaders that are closer to the final hardware form, whereas vendor-neutral offline compilation would not.

Both approaches have their own theoretical advantages.

The blob approach does not lose any information that was present in the original source code, so the resulting microcode might be better regardless of GPU architecture, and, once created, “unblobbing” might be much faster than compiling from the intermediate language.

The DX approach has an advantage in the first-run scenario if a huge number of shaders is involved and that compilation cannot be done in parallel during installation, or if the driver/hw changes in a way that would invalidate the blob.

The issue of the offline compiler doing better optimization can be “solved” by providing the compiler with information about how thorough it should be. Of course, at the price of increased compilation time.

Then there is the implementation and practical side. Creating a good optimizing compiler takes time and experience; Microsoft has been creating compilers for many years, while the IHVs probably have not. There is also the price/performance thing. If an IHV creates a bad DX driver, it is in serious trouble. If it creates a bad OpenGL driver, then the “only thing that matters” is how fast it can run the latest Carmack engine. Given that, the IHV will likely put most of its work into the DX driver plus some Carmack-specific optimizations. Both things might be the reason why, even with the theoretical advantage, GLSL performed worse for santyhamer.

I once tried to work around the long shader compilation times (I have several thousand shader combinations, so it takes several minutes to initialize) by compiling the shaders to assembly (using the Cg compiler). The shaders loaded much faster; however, they did not work entirely correctly on ATI hw, and because I had no time to find where the problem is, I have abandoned that path for now. This shows that even the presence of an intermediate language does not guarantee flawless operation if most of the IHV’s work is put somewhere else.

Even a “binary blob” solution would not really be a good fit. Imagine seeing “Please wait, compiling shaders” for a few hours when you first load the game.
Two things.

One, the binary blobs can be created during installation. So, unless driver revisions force a recompile, running the program should be relatively fast.

Two, if you honestly have enough shaders that it takes hours to compile them… you’ve got problems. If it takes longer to compile your shaders than it does to compile your game’s source code from scratch, that’s an issue I have no sympathy for you over. You clearly have too many shaders and should look into that.

I mean, a few minutes for a large number of shaders, I can understand. But over an hour on a reasonably performant computer? That sounds very much like you’re doing something pathological with the API.

Both things might be the reason why, even with the theoretical advantage, GLSL performed worse for santyhamer.
Undoubtedly. However, that won’t get better by forcing IHVs to support two languages in their drivers. What will help is Longs Peak, and the comparative ease with which implementations can be written under it. That should allow IHVs to focus more resources on their glslang compilers and less on basic development and bug fixing.

Originally posted by Komat:
I once tried to work around the long shader compilation times (I have several thousand shader combinations, so it takes several minutes to initialize) by compiling the shaders to assembly (using the Cg compiler). The shaders loaded much faster; however, they did not work entirely correctly on ATI hw, and because I had no time to find where the problem is
Cg + ATI = bad :stuck_out_tongue: Cg is NVIDIA exclusive. Don’t expect it to work on non-NVIDIA cards…
If it works, it’s due to 1) your amazing skills, or 2) luck :stuck_out_tongue:

I think GLSL is slower in my program because: 1) half is a reserved keyword in GLSL; I optimized the DX9 shaders a lot using half precision where applicable (I can’t turn on partial precision for the entire shader because it crashes). 2) The IHV GLSL compilers are not as mature/optimized as the current DX9 shader compiler (fxc.exe), or the drivers are not well tuned… 3) Other programmers here are having the same speed issues, so it’s not the code. The profiler is very clear… the bottleneck is in the shader+driver (and the GLSL log told me everything is running in HW, so…)

Whatever it is, anybody could see my shaders by hooking the OpenGL DLL (again, encryption is not enough). If you don’t like the intermediate language, then I would consider the obfuscation approach… it won’t affect speed and will greatly enhance the IP protection.

As for the binary blob thing, it’s OK if you don’t have a lot of shaders to compile. Personally I would require SM3.0 and use dynamic branching for the shader options instead of SM2.0 + linking separate shaders.

Finally, although dynamically compiled GLSL is in theory better positioned to optimize, you can clearly see that the DX9 offline model is, today, much faster in practice. It also offers better IP protection, and it has the effect system, which is >>>>>> GLSL at the moment. I hope Mount Evans/Longs Peak will solve this.

Originally posted by Korval:

I mean, a few minutes for a large number of shaders, I can understand. But over an hour on a reasonably performant computer?

The number of shaders can explode very fast, very easily. For example, some shaders I am using always calculate several special effects and ignore their results most of the time, simply because I cannot afford to quadruple the number of shaders and the application startup time to have variants for all combinations of enabled effects; see the sketch below. (ADDED: I do not trust the GLSL compiler with uniform-based branching.)
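To put numbers on that: with N independent on/off effects, one shader per combination means 2^N variants. A rough sketch (names are illustrative) of how such permutations are typically generated by prepending #defines to a single source file:

    #include <string>
    #include <vector>

    // Builds one #define prefix per combination of enabled features: 2^N strings.
    std::vector<std::string> BuildVariantPrefixes(const std::vector<std::string>& features)
    {
        std::vector<std::string> prefixes;
        const size_t count = size_t(1) << features.size();    // 2^N combinations
        for (size_t mask = 0; mask < count; ++mask)
        {
            std::string prefix;
            for (size_t i = 0; i < features.size(); ++i)
                if (mask & (size_t(1) << i))
                    prefix += "#define " + features[i] + "\n";
            prefixes.push_back(prefix);
        }
        return prefixes;
    }

Four features (fog, shadows, specular, normal mapping) already give 16 variants; add lights and materials and the shader counts quoted earlier in the thread appear very quickly.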

That should allow IHVs to focus more resources on their glslang compilers and less on basic development and bug fixing.
Unless they decide that it is more cost-effective to use the freed-up resources to improve their DX drivers.

will greatly enhance the IP protection.
Right. Because nobody has ever reverse-engineered shaders from assembly forms before :rolleyes:

Give up the ridiculous idea that your shader IP is safe. It’s not, in any way, shape, or form. If someone wants your “precious” shader logic, it’s going to be theirs. Trying to fight it is futile.

I hope Mount Evans/Longs Peak will solve this.
As I said, the ship has sailed on the whole “intermediate language” issue. You’re 3 years too late to argue the point.

As for an effect system, you can pretty much forget about it. There’s COLLADA FX, but it will never be something that gets incorporated into OpenGL itself.

Unless they decide that it is more cost-effective to use the freed-up resources to improve their DX drivers.
Yes, but they could decide to do that anyway. Thus it makes no difference one way or the other.