Effects of driver-side program re-linking

Hello everyone,

this is an old chestnut, but since I’m currently working on the shader infrastructure parts of a rendering system, my question is if and when it may happen that programs are re-linked without user intervention, and what happens to the program state when they are. I’m fairly sure that the driver cannot do what glLinkProgram can, i.e. re-assign locations, reset default values (if any) and so on, and since there is no way to detect a driver-side re-link, I can only assume that the state of the program remains intact.

What I’m missing is the sweet piece of normative language I so yearn for. I cannot find anything related to this in the specs or in the issues sections of the original GLSL-related extensions. So, does anyone have something more reliable here?

Cheers!

Just to make sure I understand, you’re talking about “shader patching”, right (sometimes called shader re-optimization)?

That is, the driver (way after glLinkProgram is called) occasionally going out to lunch while processing a draw call to re-write, re-optimize, and/or re-link your shader for various reasons – to the detriment of your application’s performance.

As far as I’ve experienced and read, it occurs in the draw call. It’s only then that the driver knows for sure that you’re going to render with a particular shader+state combination, after which it will go bake some GPU+driver-specific subset of that state into the shader code (if it hasn’t already).

I’ve never seen a comprehensive list for any GPU for all of the GL state which may be “baked” into the shader (though you can find tidbits scattered around the net), but the kinds of things that trigger it are: varying one or more of 1) vertex attribute formats, 2) blend modes, 3) framebuffer formats, 4) write masks, 5) shader uniform values, and 6) texture formats (which ones depends on the GPU+driver). However, as far as I know, these don’t actually trigger shader patching until the referencing draw call.

I’ve hit this “shader patching” and the stuttering it causes during rendering too, and it’s really annoying. Pre-rendering with your shaders (aka pre-warming) on startup can help, just like we have to do with textures to force the driver to actually upload them to the GPU. The vendor’s implementation of an on-disk optimized shader cache can help as well, but only after the first use (which may come at a very bad time). But if there’s specific state X which (possibly unbeknownst to you) is baked into your shaders on a specific GPU+driver, and you’re using multiple permutations of X with the same linked shader program P, then prepare for driver “shader patching” and the stuttering that goes with it. Depending on how your driver handles this case, it may work best to track these X permutations along with your program objects and keep a separate GL program object per permutation – identical as far as the GL interface is concerned, but not under the hood to the driver.
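
For what it’s worth, here is roughly what I mean by pre-warming against the tracked permutations, as a rough and untested sketch (the struct, the function name and the choice of blend state as the “X” are all made up for illustration; which state actually matters is GPU+driver specific):

```cpp
#include <vector>
// Any GL loader (glad, GLEW, ...) is assumed to be included and initialized.

// Hypothetical description of one state permutation we expect to render with.
struct StatePermutation {
    bool   blendEnabled;
    GLenum srcBlend;   // e.g. GL_SRC_ALPHA
    GLenum dstBlend;   // e.g. GL_ONE_MINUS_SRC_ALPHA
};

// Issue one throwaway draw per program + state permutation at load time, so any
// driver-side patching cost is hopefully paid up front instead of mid-frame.
void prewarmProgram(GLuint program, GLuint dummyVao, GLuint offscreenFbo,
                    const std::vector<StatePermutation>& perms)
{
    glBindFramebuffer(GL_FRAMEBUFFER, offscreenFbo); // formats should match the real render target
    glUseProgram(program);
    glBindVertexArray(dummyVao);                      // attribute formats should match real usage

    for (const StatePermutation& p : perms)
    {
        if (p.blendEnabled) glEnable(GL_BLEND); else glDisable(GL_BLEND);
        glBlendFunc(p.srcBlend, p.dstBlend);

        // The driver only sees the full shader+state combination at the draw call,
        // so a tiny throwaway draw is enough to make that combination "hot".
        glDrawArrays(GL_TRIANGLES, 0, 3);
    }

    glFinish(); // crude, but forces the work (and hopefully any patching) to happen now
}
```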

In my opinion, this is one area where GL seems to have fallen behind the curve with regards to GPU shaders and realtime (60Hz+) applications. It would help to have an extension where we could query what GL state will be baked into the shaders by the driver, and then on the app side manage creating separate programs for each such state permutation.

Vulkan and Metal seem to have solved this problem with pipelines.

If anyone has more information or pointers on driver shader patching (particularly for desktop GL drivers such as on nVidia, AMD, and Intel), please do follow up!

Pretty much - I would call it whatever standard terminology had been bestowed upon said phenomenon, but alas. :slight_smile: I was merely looking for guarantees that stuff happening under the hood will not have consequences I might not be able to properly track - which shouldn’t happen anyway, but I wanted it in cold print.

My question does not stem from me being wary of spikes in frame times - this is not a problem for our customers (delivery and the usual multi-platform hassles are of much greater concern). However, I am of course personally interested in anything regarding this.

Up until now, the only time I’ve seen the driver emit corresponding debug output was when using Nsight in conjunction with a GTX 1060 on Windows - why is still unclear to me. Anyway, thanks for the insight and hopefully we can assemble some more information. It’s probably worthwhile checking out Vulkan’s approach to this.

You know, it dawned on me later today that there is something in OpenGL available through the GL API that could help with this (i.e. avoiding the gotchas associated with run-time shader patching). In fact, it looks like it may have been at least partly engineered to help avoid them. But I haven’t personally verified that it does yet.

And that is NVidia’s NV_command_list extension:

Check out pages 30-33 in the presentation, which talk about “State Objects”. This is a similar concept to Vulkan’s “Pipeline Objects”. In both cases, these encapsulate a bunch of GL/Vulkan state along with the shader to be used. This in principle lets the driver know in advance what permutation of a shader it’s going to need to generate and use.
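
As a very rough, untested sketch of what that state capture looks like (going off the extension spec here, not code I’ve run; the program handle and the chosen fixed-function state are placeholders):

```cpp
// Rough sketch of NV_command_list state capture, based on the extension spec
// (untested). A GL loader exposing the NV entry points is assumed.
// 'program' is a placeholder handle; error handling is omitted.
GLuint captureTriangleState(GLuint program)
{
    GLuint stateObj = 0;
    glCreateStatesNV(1, &stateObj);

    // Set up the exact program + fixed-function combination you intend to draw with.
    glUseProgram(program);
    glDisable(GL_BLEND);
    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_TRUE);

    // Snapshot the current state for triangle rendering. Draws issued later via
    // glDrawCommandsStatesNV() reference this object instead of whatever state
    // happens to be bound at draw time, so the driver knows the combination up front.
    glStateCaptureNV(stateObj, GL_TRIANGLES);
    return stateObj;
}
```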

And in the extension spec, you can even find a mention of shader patching and how it was considered when the extension was drafted:

Searching the net on a combination of these terms (state object pipeline vulkan) turns up an interesting web page I’d forgotten I’d read:

This goes on to describe how to avoid the gotchas associated with shader patching:

Now that I’m back working with GL again, I need to use NV_command_list.

Thank you for pointing this out! I remember a cursory reading of this page, but frankly, I had forgotten about it. I’m gonna go through the material again and I’ll also finally need to really dive into Vulkan to get a different view.

If this extension were cross-vendor I’d be more than happy to use it. Still, if you can pin your target down to NVIDIA hardware, sure - otherwise one needs to implement multiple paths again. I remain reluctant to do the latter just to exploit this stuff on the desktop, because I’m currently struggling enough with a unified pipeline and architecture for proper cross-device support (i.e. desktop, mobile and web … expletive that expletive :dejection: ).

I’d also love to see GL move to a consistent, completely bindless approach. If I’m not mistaken, some of the NV_bindless_* stuff is still superior; for instance, GL_ARB_bindless_texture does not seem to permit samplers in shader storage blocks and forces you to employ uniform blocks with uniform buffer object backing. That may not be an issue, but it would sure be nice to back shaders entirely with shader storage buffers (IIRC, the original GDC AZDO presentation advised the latter). We can already map a lot of these techniques to existing functionality, but it seems we’re not quite there yet.
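
To make the uniform-block route concrete, here’s an untested sketch of what I mean (names and the array size are made up); the handles come from glGetTextureHandleARB() and have to be made resident before use:

```cpp
// Untested sketch of the uniform-block route with GL_ARB_bindless_texture
// (names and array size are made up). A GL loader is assumed to be set up.
// Fragment shader side:
const char* kBindlessFragSrc = R"GLSL(
#version 450
#extension GL_ARB_bindless_texture : require

layout(std140, binding = 0) uniform MaterialTextures {
    sampler2D textures[256];   // 64-bit handles fed from a UBO
};

in vec2 uv;
flat in int materialIndex;
out vec4 fragColor;

void main()
{
    fragColor = texture(textures[materialIndex], uv);
}
)GLSL";

// C++ side: acquire a resident handle and write it into the backing UBO.
void storeHandle(GLuint texture /* plus UBO details, omitted here */)
{
    GLuint64 handle = glGetTextureHandleARB(texture);
    glMakeTextureHandleResidentARB(handle);
    // ... write 'handle' into the UBO slot for the corresponding material ...
}
```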

Maybe working with Vulkan will put things into perspective.