A short while ago there was a thread on measuring the performance of different shader subroutines.
I have been looking at NVIDIA's NVPerfKit SDK. It appears to have counters that report the instruction rate per shader stage, i.e. vertex, tessellation, geometry and fragment.

I have not used these counters myself. Has anyone else looked at this data?