NVidia driver-internal on-disk shader cache (and draw-time shader recompilation)
(I'll give some background in a second, but first my questions:)
Has anyone developed some experience with this driver feature that they'd care to share as far as avoiding the driver "writing to its disk cache at render time"?
And do you know some examples of what GL state is in the "shader key" for an internal precompiled shader?
Even with prerendering with shaders/materials in an initialization phase, I'm still seeing some first-render spikes when rendering for the user on the first app run after building a shader for the first time (or when the shader cache has been nuked beforehand). Obviously, this is very undesirable for performance.
The NVidia driver-internal on-disk GL precompiled "shader cache" is something NVidia added in the 290.03 beta drivers on Oct 21, 2011 (Phoronix post) to speed up subsequent compilation and rendering with that same shader.
For direct rendering contexts (the usual case), the driver by default writes precompiled shaders to a database under $HOME/.nv/GLCache/ (on Linux). There is an analogous cache for OpenCL/CUDA kernels at $HOME/.nv/ComputeCache/. These paths, as well as the enabled state of these caches, can be changed by various means. The driver appears to store the compiled assembly code, along with some interface uniforms/attributes and such in text format, in this cache. Having the cache enabled, present, and up-to-date on startup improves performance for second and subsequent runs of an application.
The trick is figuring out how to prod/trick the driver into offloading all the "shader compiling/optimization" to startup/init time, so that it does none of this at render time. More on that:
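One of the "various means" is environment variables. As far as I know, NVIDIA's Linux driver README documents __GL_SHADER_DISK_CACHE (0 or 1) and __GL_SHADER_DISK_CACHE_PATH for this. Check the README for your driver version before relying on them. A minimal sketch, with a hypothetical helper name, of setting them from inside the application before the GL context is created:

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() under strict C modes */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of any NVIDIA API).
 * Enables or disables the driver's GL shader disk cache and, if `dir`
 * is non-NULL, redirects it to `dir`. The variable names are assumed
 * from NVIDIA's driver README. Call this BEFORE creating the GL
 * context, since the driver presumably reads them at initialization.
 * Returns 0 on success, -1 if the environment could not be modified. */
int configure_nv_shader_cache(const char *dir, int enabled)
{
    if (setenv("__GL_SHADER_DISK_CACHE", enabled ? "1" : "0", 1) != 0)
        return -1;
    if (dir != NULL && setenv("__GL_SHADER_DISK_CACHE_PATH", dir, 1) != 0)
        return -1;
    return 0;
}
```

The same effect should be available without code changes by exporting the variables in the shell that launches the app. Pointing the cache at an empty scratch directory is also a convenient way to get a "nuked cache" run without touching $HOME/.nv.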
I'd thought until recently that the whole "dynamic recompilation of shaders at render time" business was a distant memory left over from GeForce 7 days due to hardware architecture limitations on older GPUs. Not so! Come to find out, even on very recent top-of-the-line GPUs, the driver is still dynamically recompiling shaders at render time, and not only that, writing to disk (the shader cache) in such cases. This is of course very bad for performance and something to be avoided.
How do I know?
- Nuke the on-disk shader cache
- Run GL app in a DEBUG context, with a glDebugMessageCallbackARB() callback plugged in. Print out everything.
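A minimal sketch of the callback side of that second step. To keep it compilable without GL headers, the logic lives in plain C functions (hypothetical names) and the GL hookup is shown only in a comment. The match strings are assumptions taken from the warning text quoted in this thread, so other driver versions may word it differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a driver debug message looks like
 * the NVIDIA "shader key" recompile warning, else 0. */
int is_shader_recompile_warning(const char *message)
{
    if (message == NULL)
        return 0;
    return strstr(message, "recompiled") != NULL &&
           strstr(message, "shader key") != NULL;
}

/* Number of recompile warnings seen while in_render_phase was nonzero. */
static unsigned render_time_recompiles = 0;

/* Prints every message ("print out everything") and counts recompile
 * warnings that arrive after the init phase is over. */
void log_debug_message(const char *message, int in_render_phase)
{
    fprintf(stderr, "[%s] %s\n",
            in_render_phase ? "RENDER" : "INIT",
            message ? message : "(null)");
    if (in_render_phase && is_shader_recompile_warning(message))
        ++render_time_recompiles;
}

unsigned render_time_recompile_count(void)
{
    return render_time_recompiles;
}

/* GL hookup, assuming ARB_debug_output and a debug context (some older
 * headers declare the last parameter as non-const):
 *
 *   static int g_in_render_phase = 0;
 *
 *   static void APIENTRY on_gl_debug(GLenum source, GLenum type, GLuint id,
 *                                    GLenum severity, GLsizei length,
 *                                    const GLchar *message, const void *user)
 *   {
 *       log_debug_message(message, *(const int *)user);
 *   }
 *
 *   glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB);
 *   glDebugMessageCallbackARB(on_gl_debug, &g_in_render_phase);
 */
```

Synchronous output makes the callback fire inside the GL call that triggered the message, so a breakpoint in it shows which draw call the driver is complaining about.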
On this run (where it's having to rebuild the on-disk shader cache from scratch) at application render time you may see one or more PERFORMANCE warnings about:
"Fragment Shader is going to be recompiled because the shader key based on GL state mismatches"
(Though no hint is given as to what GL state is in the "shader key".) Not only that, but the modification timestamps on the NV shader cache disk files indicate that these files are being written not only during startup (shader compilation occurs here) but also after startup, during render time in front of the user.
These seem to correspond to the occurrence of first-render spikes on the first run with a new shader (or a removed shader cache). I've seen a lot of "what is this PERF warning and how do I fix it" posts on the net, but no decent responses offering a resolution -- most just ignore it because there's no concrete suggestion provided. Removing the shader cache before startup at least makes the appearance of this warning consistent, however.
Now by prerendering with shader/state combinations before "render time in front of the user", I can move the occurrence of this warning to an "init" phase. However, I'm still seeing some first-render frame spikes when rendering for the user on the first run, which do not reappear on subsequent app runs (this suggests state persistence, à la the shader cache, which makes me suspect it, but that's just a guess at this point).
So before I dig further on this, I thought I'd check with the group, offer what I've found out so far, and see if anyone else has additional info on:
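The timestamp check described above can be automated. This is a minimal POSIX sketch with hypothetical function names. It assumes the cache is a directory tree of regular files and that comparing mtimes at one-second granularity is good enough.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Returns the newest modification time of any regular file under `dir`
 * (searched recursively), or 0 if there is none or `dir` cannot be read. */
time_t newest_file_mtime(const char *dir)
{
    time_t newest = 0;
    DIR *d = opendir(dir);
    if (d == NULL)
        return 0;

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                 /* path too long: skip this entry */

        struct stat sb;
        if (lstat(path, &sb) != 0)    /* lstat: do not follow symlinks */
            continue;

        time_t t = 0;
        if (S_ISDIR(sb.st_mode))
            t = newest_file_mtime(path);
        else if (S_ISREG(sb.st_mode))
            t = sb.st_mtime;
        if (t > newest)
            newest = t;
    }
    closedir(d);
    return newest;
}

/* Returns 1 if any file under `cache_dir` was modified at or after
 * `render_start`, else 0. With one-second timestamps, a write late in
 * the init phase can land in the same second and be reported too. */
int cache_written_since(const char *cache_dir, time_t render_start)
{
    return newest_file_mtime(cache_dir) >= render_start;
}
```

Record time(NULL) when the init phase ends, then call cache_written_since() on the GLCache directory after a suspicious frame spike. A result of 1 suggests the driver touched its cache during rendering.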
- avoiding "driver on-disk shader cache writes" at render time, or
- getting rid of "dynamic recompilation of shaders" at render time
scenarios with the NVidia drivers.
Last edited by Dark Photon; 06-18-2013 at 04:26 AM.
I'm also having trouble with this. If I compile the shaders at startup, I don't catch all the state that's probably in the "shader key", and I get bad performance.
Even if I let the program run a very long time my shaders don't recompile either and I'm left with some badly optimized shaders running at half speed.
A possibly related problem occurred when I switched to the 320.18 driver and the shader cache probably needed to be recompiled (I'm guessing).
Then my program ran at 25% speed. I ran the program several times to make sure the cache had a chance to be updated, but that didn't help.
When I uninstalled the driver and installed it again everything worked normally.
So I'm also interested how to manage this situation.
More on this after some deeper digging.
Originally Posted by Dark Photon
These first-render spikes seem to always correspond to the NV driver writing to its on-disk shader cache at draw time.
I have been unable to figure out how to "trick" the NV driver into moving these shader cache disk writes into a pre-render phase during startup (merely pre-rendering with shaders and textures doesn't cut it) -- partly because I have nothing to go on to help determine why it's writing the shader cache at render time in the first place (i.e. what made it decide it needed to). I would think that, if nothing else, these disk writes would generate PERFORMANCE warnings in a GL DEBUG context (if anything should, these should!), but right now nothing is generated when the driver-internal on-disk shader cache is written, which is when the spikes occur.
So I'd like to make the following requests to NVidia:
- Please provide a PERFORMANCE warning in a GL DEBUG context when the on-disk shader cache is either read or written.
- As a bonus, please give the developer a good hint as to "why" the shader cache was read/written -- in other words, what they need to do to avoid that cache read/write from happening there in the future (e.g. what GL state change prompted this very expensive operation).
- Provide a GL enable which will allow an application to suspend and resume writes to the on-disk shader cache around perf-critical sections, so we can get these in-driver disk writes out of our realtime rendering loops (for instance, a hypothetical glEnable/glDisable( GL_SHADER_CACHE_WRITING_NV )).
Thanks! If you have any questions or need more detail, please let me know.
Last edited by Dark Photon; 06-24-2013 at 05:32 PM.
Originally Posted by Dark Photon
I would tend to assume that it's the compilation which takes the time rather than the write, and compilation probably can't be postponed (at least, not without drastically altering the way the driver works).
Originally Posted by Dark Photon
As for what triggers recompilation, I can only guess. But my guess would be that any "qualitative" change to a uniform variable risks the shader being re-compiled with the uniform as a constant. Changes to the dimensions, format or filter/border modes of a bound texture would also be candidates.