AMD glLinkProgram Performance Tips?



xeonxt
06-20-2013, 05:28 PM
Hi,

I am having trouble with glLinkProgram on AMD drivers (both Windows and Linux). The compilation time is absolutely absurd, and is killing my game engine. I am seeing 5 to 10 seconds for single shader compiles, whereas on NV drivers the time is too short to measure. The shaders in question are generated by the engine and are quite math-heavy.

Does anyone have general tips / advice for speeding up shader compilation on AMD? For example, should I try hand-inlining heavily nested function calls, hand-unrolling loops, etc.? Given my lack of knowledge about shader compilers, I don't really know how to proceed with making my shaders more compiler-friendly.

Thanks for any tips!

carsten neumann
06-20-2013, 09:40 PM
Hmm, I don't know about the GLSL compilers, but those for C/C++ can sometimes run into performance issues with very large functions (blocks really) that have many variables, due to the use of algorithms that are quadratic in the number of instructions or variables for example.

Haven't used AMD cards in a while, but 5-10 secs sounds outrageously long to me. Are you using a debug context? You could try shader (program) binaries via glGetProgramBinary / glProgramBinary, if your driver supports them, and cache them on disk; that way you only pay the link-time penalty once.
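To make the disk-caching idea concrete, here is a minimal C++17 sketch of just the file side of such a cache. Everything in it (ProgramBlob, hashSource, cachePath, saveBlob, loadBlob, the size cap) is a hypothetical name made up for illustration, and the file layout is an assumption, not something any driver defines. The GL calls themselves are only mentioned in comments so the block compiles without a GL context.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>
#include <vector>

// Hypothetical holder for what glGetProgramBinary hands back:
// a driver-specific format enum plus an opaque byte blob.
struct ProgramBlob {
    std::uint32_t format = 0;   // the GLenum binaryFormat
    std::vector<char> bytes;    // opaque program binary
};

// Arbitrary sanity cap (assumption) so a corrupt file cannot trigger a huge allocation.
constexpr std::uint64_t kMaxBlobBytes = 256ull * 1024 * 1024;

// FNV-1a over the shader source text; used only to name the cache file.
std::uint64_t hashSource(const std::string& src) {
    std::uint64_t h = 14695981039346656037ull;
    for (unsigned char c : src) { h ^= c; h *= 1099511628211ull; }
    return h;
}

std::string cachePath(const std::string& dir, const std::string& src) {
    return dir + "/" + std::to_string(hashSource(src)) + ".bin";
}

// Layout (assumption): format (4 bytes), size (8 bytes), then the blob.
bool saveBlob(const std::string& path, const ProgramBlob& blob) {
    assert(!path.empty());
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    const std::uint64_t size = blob.bytes.size();
    out.write(reinterpret_cast<const char*>(&blob.format), sizeof blob.format);
    out.write(reinterpret_cast<const char*>(&size), sizeof size);
    out.write(blob.bytes.data(), static_cast<std::streamsize>(size));
    return static_cast<bool>(out);
}

// Returns nothing if the file is missing, truncated, or implausibly large.
std::optional<ProgramBlob> loadBlob(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::nullopt;
    ProgramBlob blob;
    std::uint64_t size = 0;
    in.read(reinterpret_cast<char*>(&blob.format), sizeof blob.format);
    in.read(reinterpret_cast<char*>(&size), sizeof size);
    if (!in || size > kMaxBlobBytes) return std::nullopt;
    blob.bytes.resize(static_cast<std::size_t>(size));
    in.read(blob.bytes.data(), static_cast<std::streamsize>(size));
    if (!in) return std::nullopt;
    return blob;
}
```

Roughly how it would be wired up: on startup, call loadBlob; on a hit, pass the format and bytes to glProgramBinary and check GL_LINK_STATUS. If that fails (drivers are allowed to reject old binaries, e.g. after an update) or there is no file, compile and link as usual, fetch the binary with glGetProgramBinary (length from GL_PROGRAM_BINARY_LENGTH), and saveBlob it. The file is only meaningful on the machine and driver that produced it.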

xeonxt
06-21-2013, 02:07 PM
Thanks very much Carsten, I was completely unaware of the GL binary facilities! Wish I had known about these sooner :) That will certainly help. Still open to compilation insights if anyone has them; in the meantime, I am sure binaries will lift a lot of the load.

And no, it's not a debug context. It's pretty terrible because, as I said, on NV drivers it's virtually instant...ouch, come on now AMD... :dejection:

aqnuep
06-21-2013, 04:04 PM
Sometimes drivers trick you: even when the compilation looks virtually instant, the driver may just have handed the actual compilation job to a separate thread, so your code is not blocked until you actually try to use the shader (the latency didn't disappear, it just got deferred).

So, first, I would make sure you measure compilation time properly. To do so:
1. Compile your shaders
2. Render some simple primitive using the shaders (e.g. a point)
3. Use glReadPixels or another mechanism to make sure the rendering actually happened and was not deferred as well
4. Measure the total time of steps 1 to 3; it will give you a better estimate of how much time the compilation actually required.
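
The steps above can be sketched as a small C++ timing harness. This is only an illustration: timeCompileToPixelsMs and its three callback parameters are hypothetical names, and the GL calls are left to the caller (mentioned in comments) so the block stands alone without a GL context.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs the three steps back to back and returns the wall-clock time,
// in milliseconds, for all of them together.
double timeCompileToPixelsMs(const std::function<void()>& compileAndLink,
                             const std::function<void()>& drawPoint,
                             const std::function<void()>& readBack) {
    assert(compileAndLink && drawPoint && readBack);
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes
    const auto start = clock::now();
    compileAndLink();  // step 1: e.g. glCompileShader + glLinkProgram
    drawPoint();       // step 2: e.g. glUseProgram + glDrawArrays(GL_POINTS, 0, 1)
    readBack();        // step 3: e.g. glReadPixels on one pixel, which waits for the draw
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The point of timing all three together is that a driver that defers compilation to first use gets charged for it here, instead of looking instant at the glLinkProgram call.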

xeonxt
06-21-2013, 08:35 PM
Thanks aqnuep, but the shaders are used immediately to generate geometry, so I am quite sure of the compilation time. On NV they are compiled and able to start displaying the geometry with virtually no delay, so I do think it's actually the AMD compiler :( But it is very surprising to me that the difference is so dramatic...

Dark Photon
06-22-2013, 10:59 AM
xeonxt wrote: "...the shaders are used immediately to generate geometry, so I am quite sure of the compilation time. On NV they are compiled and able to start displaying the geometry with virtually no delay"

Unless you are nuking the NV-driver-internal on-disk precompiled GL shader cache before doing this test, don't be so sure.

If you've run with that shader before, it's probably just loading a precompiled version off-disk (or more likely, from a memory cache of that on-disk data thanks to the OS caching of disk accesses, so it's blindingly fast), not actually compiling it on-the-fly. There are precompiled caches for OpenCL/CUDA kernels as well.

On Linux, the default paths for these caches are: $HOME/.nv/GLCache and $HOME/.nv/ComputeCache, respectively.

On Windows, %APPDATA%\NVIDIA\GLCache and %APPDATA%\NVIDIA\ComputeCache, respectively.
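
For the Linux default path above, a minimal C++17 sketch of wiping the cache before a timing run could look like this. The function names (defaultNvGlCacheDir, clearCacheDir) are hypothetical, and it assumes the cache really lives at the default location; if the path has been overridden on your system, this will not find it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Default Linux location quoted above: $HOME/.nv/GLCache.
// Returns nothing if HOME is unset or empty.
std::optional<fs::path> defaultNvGlCacheDir() {
    const char* home = std::getenv("HOME");
    if (home == nullptr || *home == '\0') return std::nullopt;
    return fs::path(home) / ".nv" / "GLCache";
}

// Deletes the directory and everything under it.
// Returns the number of entries removed (0 if it did not exist), or -1 on error.
long long clearCacheDir(const fs::path& dir) {
    assert(!dir.empty());
    std::error_code ec;
    const std::uintmax_t removed = fs::remove_all(dir, ec);
    return ec ? -1 : static_cast<long long>(removed);
}
```

Run clearCacheDir on the result of defaultNvGlCacheDir before starting the application under test, so the first link of each program cannot be served from a previous run.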

Websearch these paths for hits. For more info, see:

* NVIDIA's OpenGL Shader Disk Cache For Linux (http://www.phoronix.com/scan.php?page=news_item&px=MTAwNDk)
* NVidia Linux Driver README - Chapter 11 (see the bottom section here) (ftp://download.nvidia.com/XFree86/Linux-x86/304.32/README/openglenvvariables.html)
* CUDA Pro Tip: Understand Fat Binaries and JIT Caching (https://developer.nvidia.com/content/cuda-pro-tip-understand-fat-binaries-and-jit-caching)
* NVidia driver-internal on-disk shader cache (and draw-time shader recompilation) (http://www.opengl.org/discussion_boards/showthread.php/181986-NVidia-driver-internal-on-disk-shader-cache-%28and-draw-time-shader-recompilation%29?p=1251853)

I don't know if AMD has a similar mechanism in place. Check their driver docs.