AMD glLinkProgram Performance Tips?
I am having trouble with glLinkProgram on AMD drivers (both Windows and Linux). The compilation time is absolutely absurd, and it is killing my game engine. I am seeing 5 to 10 seconds for single shader compiles, whereas on NV drivers it is effectively instant. The shaders in question are generated by the engine and are quite math-heavy.
Does anyone have general tips or advice for speeding up shader compilation on AMD? For example, should I try hand-inlining heavily nested function calls, hand-unrolling loops, etc.? Given my lack of knowledge about shader compilers, I don't really know how to proceed with making my shaders more compiler-friendly.
Thanks for any tips!
Hmm, I don't know about the GLSL compilers, but C/C++ compilers can sometimes run into performance issues with very large functions (blocks, really) that have many variables, for example due to algorithms that are quadratic in the number of instructions or variables.
Haven't used AMD cards in a while, but 5-10 secs sounds outrageously long to me. Are you using a debug context? You could try program binaries (GL_ARB_get_program_binary, core since OpenGL 4.1, if your hardware supports them) that you cache on disk; that way you only pay the link-time penalty once.
Sometimes drivers trick you: even when compilation looks virtually instant, the driver may simply have handed the actual compilation job off to a separate thread, so your code isn't blocked until you actually try to use the shader (the latency didn't disappear, it just got deferred).
So, at first, I would make sure you measure compilation time properly. In order to do so, do the following:
1. Compile your shaders
2. Render some simple primitive using the shaders (e.g. a point)
3. Use glReadPixels or another synchronizing mechanism to make sure the rendering actually happened and wasn't deferred as well
4. Measure the time of all three steps together; this gives you a better estimate of how much time the compilation actually required.
Technical Blog: http://www.rastergrid.com/blog/
Thanks aqnuep, but the shaders are used immediately to generate geometry, so I am quite sure of the compilation time. On NV they compile and start displaying the geometry with virtually no delay, so I do think it's actually the AMD compiler. But it is very surprising to me that the difference is so dramatic...
Originally Posted by aqnuep
Unless you are nuking the NV-driver-internal on-disk precompiled GL shader cache before doing this test, don't be so sure.
Originally Posted by xeonxt
If you've run with that shader before, the driver is probably just loading a precompiled version off disk (or more likely, from a memory cache of that on-disk data, thanks to the OS caching disk accesses, which is why it's blindingly fast), not actually compiling it on-the-fly. There are precompiled caches for OpenCL/CUDA kernels as well.
On Linux, the default paths for these caches are: $HOME/.nv/GLCache and $HOME/.nv/ComputeCache, respectively.
On Windows, %APPDATA%\NVIDIA\GLCache and %APPDATA%\NVIDIA\ComputeCache, respectively.
Websearch these paths for hits. For more info, see:
* NVIDIA's OpenGL Shader Disk Cache For Linux
* NVidia Linux Driver README - Chapter 11 (see the bottom section here)
* CUDA Pro Tip: Understand Fat Binaries and JIT Caching
* NVidia driver-internal on-disk shader cache (and draw-time shader recompilation)
I don't know if AMD has a similar mechanism in-place. Check their driver docs.
Last edited by Dark Photon; 06-22-2013 at 10:06 AM.