Support for precompiled shaders not needed!

From time to time people say that they want support for precompiled shaders in the OpenGL API. The main argument they bring forth is reducing the loading times of their applications.

I think this advantage of precompiled shaders can be had without support for it in the API: it can be implemented transparently in the driver.

When the application passes a shader source to the driver for compilation, the driver calculates a cryptographic hash of the source and of relevant settings such as compile hints. The driver then looks in its cache of compiled shaders for one with the same hash value. If there is one, it uses it; no need to compile the source again. If there is none, the driver compiles the source and stores the result in the cache along with the hash value.
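To make the idea concrete, here is a minimal sketch of what such a transparent cache could look like inside the driver. Everything here is hypothetical: FNV-1a merely stands in for a real cryptographic hash such as SHA-256, the “compiler” is a stub, and a real driver would also mix the driver version into the hash so that a driver update invalidates the cache.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* FNV-1a, standing in for a real cryptographic hash such as SHA-256. */
static uint64_t fnv1a(const void *data, size_t len, uint64_t h)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; ++i)
        h = (h ^ p[i]) * 1099511628211ull;
    return h;
}

/* Stub for the driver's real GLSL compiler. */
static void *compile_from_source(const char *source, size_t *out_len)
{
    *out_len = strlen(source);
    void *bin = malloc(*out_len);
    memcpy(bin, source, *out_len);   /* pretend the "binary" is the source */
    return bin;
}

/* Look up a compiled shader by hash; compile and cache it on a miss. */
static void *get_compiled_shader(const char *source, int hints, size_t *out_len)
{
    uint64_t h = fnv1a(source, strlen(source), 14695981039346656037ull);
    h = fnv1a(&hints, sizeof hints, h);   /* mix in compile-relevant settings */

    char path[32];
    snprintf(path, sizeof path, "%016llx.bin", (unsigned long long)h);

    FILE *f = fopen(path, "rb");
    if (f) {                              /* cache hit: no recompilation */
        fseek(f, 0, SEEK_END);
        long len = ftell(f);
        fseek(f, 0, SEEK_SET);
        void *bin = malloc((size_t)len);
        if (fread(bin, 1, (size_t)len, f) == (size_t)len) {
            fclose(f);
            *out_len = (size_t)len;
            return bin;
        }
        free(bin);                        /* unreadable entry: fall through */
        fclose(f);
    }

    void *bin = compile_from_source(source, out_len);   /* cache miss */
    f = fopen(path, "wb");
    if (f) {
        fwrite(bin, 1, *out_len, f);
        fclose(f);
    }
    return bin;
}
```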

There is no additional work for the application programmers to do. They will automatically receive the benefit of faster application startup. The driver knows best when it has to recompile (e.g. after a driver update). The cache size is expected to be rather small, and today’s hard-disks are large.

The cryptographic hash ensures that no application will intentionally or accidentally use shaders from another application.

Let the driver do the compiled shader caching instead of burdening API and application with it!

Philipp

If I am not mistaken, this was suggested before…

Sorry, I must have missed it (there are times when I do not check these forums regularly). I’ve just read the recent threads and noticed that people still want API support for precompiled shaders to reduce startup times.

You could store the full shader source along with the hash value to resolve hash collisions (the same hash value for different shaders; unlikely, but not impossible).
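A tiny, equally hypothetical extension of the cache sketch from the first post: store the source next to each binary, and treat a hash match as a hit only if the sources are identical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical collision guard: a hash hit counts only if the cached
 * source matches the incoming source byte for byte; otherwise fall
 * back to a fresh compile. */
static int is_genuine_hit(const char *cached_source, size_t cached_len,
                          const char *source)
{
    return cached_len == strlen(source) &&
           memcmp(cached_source, source, cached_len) == 0;
}
```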

However, that does not solve the point that the source is human readable and not “copy-protected” at all.
(I don’t think this is an issue for graphical effect shaders, but for GPGPU algorithms it might be.)

A 256-bit cryptographic hash should be sufficient.
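(Back-of-the-envelope, for scale: by the birthday bound, the chance of any collision among n distinct shaders under a 256-bit hash is roughly n^2 / 2^257. Even an absurd cache of a billion shaders, n ≈ 2^30, gives about 2^60 / 2^257 = 2^-197.)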

[QUOTE]However, that does not solve the point that the source is human readable and not “copy-protected” at all.
(I don’t think this is an issue for graphical effect shaders, but for GPGPU algorithms it might be.)[/QUOTE]

Well, someone who uses some mechanism to intercept calls to the GL to get at the source would probably use some mechanism for decompilation to GL_ARB_*_program, too.
Furthermore, this problem affects a much smaller part of GL users than the speed issue does. Nearly everyone wants speed. A few want code obfuscation.

Philipp

Thanks, but no. OpenGL drivers are horrible enough already. Putting more stuff that has to be done in the driver, so they have even more places to screw up?

Nah. The driver should do the minimal amount of stuff required to interface with the hardware. Yes, the GLSL parser should be outside of the driver. Yes, legacy API support (display lists, selection, etc.) should be outside of the driver. Yes, precompiling shaders should be outside of the driver. In my ideal world, at least :)

The new OpenCL spec has clGetProgramInfo to obtain a compiled binary, and clCreateProgramWithBinary to load it again, so OpenGL 3.1 will almost certainly get something similar, as both APIs are going to be implemented in the GPU drivers and will share a lot of internal driver code with each other.
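For the curious, a rough single-device sketch of the save/load path those two calls enable (error handling omitted; note that clBuildProgram is still required after loading a binary):

```c
#include <CL/cl.h>
#include <stdlib.h>

/* Fetch the compiled binary of a built program (single-device case). */
static unsigned char *get_program_binary(cl_program prog, size_t *size)
{
    clGetProgramInfo(prog, CL_PROGRAM_BINARY_SIZES, sizeof *size, size, NULL);
    unsigned char *bin = malloc(*size);
    /* CL_PROGRAM_BINARIES takes an array of pointers, one per device. */
    clGetProgramInfo(prog, CL_PROGRAM_BINARIES, sizeof bin, &bin, NULL);
    return bin;   /* write this to disk for the next run */
}

/* Recreate the program from a stored binary on a later run. */
static cl_program load_program_binary(cl_context ctx, cl_device_id dev,
                                      const unsigned char *bin, size_t size)
{
    cl_int status, err;
    cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &size, &bin,
                                                &status, &err);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);   /* still required */
    return prog;
}
```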

Interestingly, it also has clUnloadCompiler, which suggests that the compiler will usually be a separate DLL that is only loaded by the driver when it needs to compile some source code.

[QUOTE]Thanks, but no. OpenGL drivers are horrible enough already. Putting more stuff that has to be done in the driver, so they have even more places to screw up?

Nah. The driver should do the minimal amount of stuff required to interface with the hardware. Yes, the GLSL parser should be outside of the driver. Yes, legacy API support (display lists, selection, etc.) should be outside of the driver. Yes, precompiling shaders should be outside of the driver. In my ideal world, at least :)[/QUOTE]

In my experience the problems in OpenGL drivers do not lie in the upper layers or in software features like the shader caching I proposed.
Most driver problems seem to be rooted in insufficiently documented hardware and in hardware bugs.

Philipp

I don’t care who takes care of the caching (me or the driver), so long as shaders upload quicker and it is in some way specified in the OpenGL spec that this will be the case for a compliant driver, the way VBOs/PBOs give us this assurance (though not a guarantee), for example.
The problem is clearly stated as “GLSL shaders are slow to upload; we need to expose an acceleration point to fix this”. It is as much of a fact as that streaming textures was slow and PBOs were needed to fix it.
Making it transparent to OpenGL makes it difficult to spec.
Also, I might want the first play-through to be smooth, but I don’t want to have to pre-compile at first run every-single-shader-combination-that-might-be-needed-depending-on-the-user’s-actions.

Probably the most reliable way to get shaders to upload quicker would be benchmarks (or important games, though there is a chicken-and-egg problem there) that change shaders more often (and modify uniforms and state often while shaders are active). Then caching shaders would translate into higher benchmark scores, and vendors would be more interested in it.

Philipp

Agreed! If it were, we could toss the load/compile in a background thread on a separate core, like we can now with Cg.

Why can’t you do it now by creating another context and sharing the GL programs? This should let you load, compile and link in a separate thread.

I actually tried to do exactly this, and it doesn’t work as expected. I wrote a test application that created two threads and two shared contexts. One thread was rendering while the other was compiling shaders. It does work (using wglShareLists), but linking in one context completely blocks operations in the other context used for rendering. There seems to be some kind of global mutex in the driver.
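For reference, a rough outline of that setup (WGL flavour, since wglShareLists is the call in question; window and pixel-format creation, GL entry-point loading and error handling are all omitted):

```c
#include <windows.h>
#include <GL/gl.h>

static HDC   dc_main, dc_worker;      /* device contexts; setup omitted */
static HGLRC ctx_render, ctx_compile;

static void create_shared_contexts(void)
{
    ctx_render  = wglCreateContext(dc_main);
    ctx_compile = wglCreateContext(dc_worker);
    wglShareLists(ctx_render, ctx_compile);   /* share shader/program objects */
}

static DWORD WINAPI compile_thread(LPVOID arg)
{
    (void)arg;
    wglMakeCurrent(dc_worker, ctx_compile);
    /* glCreateShader / glCompileShader / glLinkProgram (entry points
     * loaded via wglGetProcAddress) would run here. In the test
     * described above, glLinkProgram on this thread stalled the
     * context on the render thread, suggesting a global lock inside
     * the driver. */
    wglMakeCurrent(NULL, NULL);
    return 0;
}
```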

Precompiled opaque shader binaries are a simple solution that would have been easy to implement and use, had any implementors deemed it worthwhile. Programs often have some existing pipeline state baked into them that affects optimizations, so it is difficult to truly eliminate compile/link overhead without also sacrificing JIT performance that we are probably relying on implicitly.
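As an aside on what such an opaque-binary API would look like: the calls below are those of GL_ARB_get_program_binary, which later provided exactly this (core in OpenGL 4.1). Disk I/O and error handling are omitted.

```c
#include <GL/gl.h>
#include <GL/glext.h>   /* prototypes, assuming GL_GLEXT_PROTOTYPES */
#include <stdlib.h>

/* Save the opaque binary of a linked program. Returns the blob; *fmt
 * and *len are the driver-specific format token and size, both of
 * which must be stored alongside the blob. */
static void *save_binary(GLuint prog, GLenum *fmt, GLint *len)
{
    glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, len);
    void *bin = malloc((size_t)*len);
    glGetProgramBinary(prog, *len, NULL, fmt, bin);
    return bin;
}

/* Reload it on a later run; on rejection (new driver or GPU), the
 * caller must fall back to compiling from source. */
static int load_binary(GLuint prog, GLenum fmt, const void *bin, GLint len)
{
    GLint ok = 0;
    glProgramBinary(prog, fmt, bin, len);
    glGetProgramiv(prog, GL_LINK_STATUS, &ok);
    return ok == GL_TRUE;
}
```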

Hopefully by the time we kick this issue around for a few more years the problem will simply have gone away. Perhaps better concurrency in compiler/driver architectures alone will save the day in the end.