Sure. While I appreciate that there are generally fewer compiler-related bugs in DX thanks to the intermediate language, it’s also the case that trying to optimize in DX is a bigger pain in the butt when you have two levels of compilers to go through before you end up at the final hardware code. When the HLSL compiler does the wrong thing, you’re SOL, and no driver update will sort the issue out. And Microsoft generally doesn’t have the same incentive to improve their compiler as IHVs have.
I think it is a bit unfair to compare compiler bugs that prevent your app from running at all, against bugs that prevent maximum speed.
At least when the HLSL compiler does the “wrong thing” you can view the output and see what is going on. In fact, I recall reading some slides by you where you suggest this.
It’s not like bugs that prevent your app from running at all don’t exist with HLSL. For instance, the HLSL compiler’s memory usage sometimes balloons way out of proportion when you unroll loops, to the point where it compiles for minutes until it finally runs out of virtual memory.
The main difference is that those bugs are consistent because that stage is shared across IHVs. Yes, you can view the output of HLSL (assuming the “wrong thing” didn’t include crashing), but you can do the same with the hardware assembly using tools like GPU ShaderAnalyzer. The difference is that with HLSL you have to check at both the HLSL stage and the hardware stage and hope neither screwed up.
No offense Humus, but GPU ShaderAnalyzer is a joke. For example, try to copy/paste this shader:
#version 120
uniform vec4 kernelOffsets[32];
uniform sampler2D sceneTex;
void main()
{
    vec4 scene = vec4(0.0);
    float weight = 1.0;
    for (int i = 0; i < 2; i++)
    {
        vec2 uv = gl_TexCoord[1].xy + kernelOffsets[i].xy;
        scene += texture2D(sceneTex, uv);
        weight++;
    }
    gl_FragColor = scene / weight;
}
And be amazed at how even a Radeon HD 2900 fails to compile it.
Now reduce the size of kernelOffsets from 32 to 2: it works on the Radeon HD but not on an X1900. Yay!
The best part is the answer from GPU Tools support when contacted about the issue: they replied that the results were what they expected because “looping and lookup tables aren’t supported in earlier generations of hardware” … I would have cried.
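For context on why that answer is hard to accept: the loop in the shader above has a fixed trip count of 2, so it can be fully unrolled, which leaves only constant array indices and no looping at all. A hand-unrolled sketch of the same shader, written purely for illustration and not run through GPU ShaderAnalyzer, might look like this:

```glsl
#version 120
uniform vec4 kernelOffsets[32];
uniform sampler2D sceneTex;
void main()
{
    // Loop unrolled by hand: both array indices are compile-time constants,
    // so no dynamic indexing or hardware looping is needed.
    vec4 scene = texture2D(sceneTex, gl_TexCoord[1].xy + kernelOffsets[0].xy);
    scene += texture2D(sceneTex, gl_TexCoord[1].xy + kernelOffsets[1].xy);
    // weight starts at 1.0 and is incremented once per iteration (twice).
    gl_FragColor = scene / 3.0;
}
```

Whether a given compiler actually performs this unrolling on older targets is a separate question; the sketch only shows that the shader does not inherently require loop or lookup-table support.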