A short while ago there was a thread on measuring the performance of different shader subroutines.
I have been having a look at nVidia's NVPerfKit SDK. It appears to have counters that give the instruction rate by shader stage, i.e. vertex, tessellation, geometry and fragment.
I have not used this. Has anyone else looked at this data?
I have been using NV PerfKit for a long time, but only the first release (called PerfSDK), which does not support Fermi/Kepler. Version 2.2 was released a year ago, but I had trouble interpreting the values retrieved from the counters, so I stopped experimenting.
At the end of April, NV released ver. 188.8.131.5223. The link was valid for just one day, and unfortunately I didn't download it. Obviously there was some problem.
As you can read from the user guide, *_shader_instruction_rate retrieves "the % of all shader instructions seen on the first SM unit that were executing defined shader(s)". How do you plan to use it in measuring performance of different shader routines?
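Read literally, the user-guide wording quoted above describes a share rather than an absolute count. A minimal formalization, assuming $N_s$ is the number of instructions seen on the first SM that belong to the selected shader stage $s$ and $N_{\text{total}}$ is the number of all shader instructions seen on that SM over the same sampling interval (both symbols are my own labels, not NVPerfKit names):

```latex
\mathrm{rate}_s = 100 \times \frac{N_s}{N_{\text{total}}}
```

Because it is a ratio, the value depends on what the other stages are doing during the same interval, not only on the cost of the shader being measured.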
I was wondering whether, if I selected a different subroutine using dynamic subroutines, a change in the instruction rate would correspond to a better (or worse) implementation.
I have started using version 2.2. It seems to return 0 for the "OGL ..." variables on my Quadro 5000 (although it finds the variable name), but they are OK on my GeForce card; the GPU values seem OK on both.
Another solution, which is less complicated, might be to grab a copy of nVidia's nvEmulate. You can set it to print out the assembly instructions for your shader. If you retool the shader and end up with fewer instructions, then you probably have a more efficient shader. I suspect that this assumption will not always hold true, but it likely will in most cases.
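Comparing two assembly dumps by hand gets tedious, so here is a rough sketch of the "count the instructions" heuristic. It assumes the dump uses the ARB/NV_gpu_program text format (one statement per line, ending in ';', '#' starting a comment, declarations such as TEMP/PARAM/OPTION). The function name `count_instructions` and the keyword list are my own, not part of nvEmulate, and the count ignores loops, branches and instruction cost, so treat it only as a first approximation.

```python
# Hypothetical helper, not part of nvEmulate: counts executable statements in an
# NV/ARB-style assembly listing. Assumes one statement per line ending in ';'.
DECLARATION_KEYWORDS = {
    "OPTION", "TEMP", "PARAM", "ATTRIB", "OUTPUT", "ADDRESS",
    "SHORT", "LONG", "INT", "UINT", "FLAT", "CENTROID", "NOPERSPECTIVE",
}

def count_instructions(listing: str) -> int:
    """Return the number of non-declaration statements in an assembly dump."""
    count = 0
    for raw in listing.splitlines():
        # Drop trailing '#' comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and the program header such as "!!NVfp5.0".
        if not line or line.startswith("!!"):
            continue
        # Skip declarations, identified by their first token.
        if line.split()[0] in DECLARATION_KEYWORDS:
            continue
        # Labels ("main:") and "END" have no ';', so only statements are counted.
        if line.endswith(";"):
            count += 1
    return count

# Two dumps of the same fragment shader before and after retooling.
original = """!!NVfp5.0
OPTION NV_shader_buffer_load;
TEMP R0;
MUL.F R0, fragment.color, {0.5};
MOV.F result.color, R0; # write output
END
"""
retooled = """!!NVfp5.0
MOV.F result.color, fragment.color;
END
"""
print(count_instructions(original), count_instructions(retooled))
```

A lower count for the retooled version is the signal the heuristic looks for; a profiler counter remains the better judge when the two versions differ in loops or texture fetches.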