GLSL inline assembly

What about adding inline assembly to GLSL?
That way, we would no longer need GL_ARB_vertex_program, GL_ARB_fragment_program, and their extensions.
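For context, here is roughly what those assembly-style extensions look like today: a trivial ARB_fragment_program that modulates the interpolated color by a constant (a minimal sketch, though the syntax is real ARBfp1.0):

```
!!ARBfp1.0
# Modulate the interpolated color by a constant tint.
# Roughly equivalent to the GLSL line: gl_FragColor = gl_Color * tint;
PARAM tint = { 1.0, 0.5, 0.25, 1.0 };
MUL result.color, fragment.color, tint;
END
```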

Since every chip executes shaders in a different way, you would still need a translator in between, so this wouldn’t result in great performance improvements…

The ARB doesn’t want to support low-level shaders.
Besides, why mix the two?

But in the long run, you may need it. For example, CPUs have a lot of programming flexibility, yet modern compilers still support inline assembly for extended instruction sets like MMX, SSE, SSE2, 3DNow!, and so on, which can’t be reached from a common high-level language. The assembly language itself need not be defined by the ARB; it can be defined by the hardware vendors. The ARB would only have to give an “asm” keyword in the GLSL specification. (The current GLSL specification already lists “asm” as a reserved keyword.)
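To make that concrete, here is a purely hypothetical sketch. The spec only reserves the keyword, so the block syntax and the NRM instruction below are invented for illustration, not anything a vendor actually ships:

```
// HYPOTHETICAL: GLSL only reserves "asm" today; this block syntax and
// the NRM instruction are invented, vendor-defined examples.
varying vec3 normal;
void main()
{
    vec3 n;
    asm
    {
        NRM n, normal;   // imaginary single-instruction fast normalize
    }
    gl_FragColor = vec4(n * 0.5 + 0.5, 1.0);
}
```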

Look at what Nvidia is doing. They have their own GPU features, and they expose them in GLSL through their Cg extensions.
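For example (from memory, so treat the details as approximate): their GLSL compiler is built on the Cg front end and accepts Cg-isms such as the half-precision types, which are not part of the GLSL specification:

```
// Non-standard: the "half" types come from Cg and are accepted by
// NVIDIA's GLSL compiler, but they are not in the GLSL spec.
uniform sampler2D tex;
void main()
{
    half4 c = half4(texture2D(tex, gl_TexCoord[0].xy));
    gl_FragColor = vec4(c);
}
```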

But in the long run, you may need it. For example, CPUs have a lot of programming flexibility, yet modern compilers still support inline assembly for extended instruction sets like MMX, SSE, SSE2, 3DNow!, and so on…

Hello - shaders are quite different from normal CPU executables.
Besides that, look what this did to the x86 architecture: with a new, clean design we could get twice as much performance out of the same amount of silicon.

If OpenGL specified a low-level assembly language, GPU designers could not redefine the way their GPUs work; they would always need to stay compatible -> compatibility kills innovation!

With GLSL they can just recompile your GLSL code to the new GPU assembly and everything works fine again at double speed.

lg Clemens

If OpenGL specified a low-level assembly language, GPU designers could not redefine the way their GPUs work; they would always need to stay compatible
Not true. Even ARB_vertex/fragment_programs are “compiled” into hardware instructions. Nobody is forcing vendors to actually have hardware that mirrors the assembly, and in several cases (3DLabs, for instance), their hardware looks nothing like the assembly language.

In any case, the primary impetus for having inline assembly is platform-specific performance tweaking. And this is usually because, quite frankly, glslang compilers suck at optimizing. As long as humans can do a better job than the compiler, there is a need for inline assembly (or assembly in general).
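As a toy illustration, assuming a literal-minded compiler (the function name is made up):

```
// Illustration only: a literal-minded compiler may emit four scalar
// multiply/adds for the expanded form, while dot() maps straight onto
// a single DP4-class instruction on most hardware.
float brightness(vec4 a, vec4 b)
{
    // return a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;  // naive form
    return dot(a, b);                                 // hand-optimized form
}
```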

compatibility kills innovation!
That’s not true at all. Look how innovative Intel and AMD have become in disguising the true nature of their hardware, and in rapidly translating x86 instructions into their real hardware instructions. Do you think it’s easy to come up with a high-performing chip that processes x86 instructions?

Granted, it may not be innovative in the direction you want to go, but it is still innovative :wink:

and in several cases (3DLabs, for instance), their hardware looks nothing like the assembly language.
They could have, but they decided to concentrate on GLSL, so that they would have a good compiler.

Hi again!

In any case, the primary impetus for having inline assembly is platform-specific performance tweaking. And this is usually because, quite frankly, glslang compilers suck at optimizing. As long as humans can do a better job than the compiler, there is a need for inline assembly (or assembly in general).

Well, then it’s an implementation problem, not a problem by design :wink:
Of course I understand your idea, and maybe we had an impedance mismatch, because I meant hardware assembly while you meant something like assembler-style instructions that are translated to GPU instructions at runtime, right?
Or maybe I misunderstood you entirely…

That’s not true at all. Look how innovative Intel and AMD have become in disguising the true nature of their hardware, and in rapidly translating x86 instructions into their real hardware instructions. Do you think it’s easy to come up with a high-performing chip that processes x86 instructions?

Well, the instruction decoders and low-level code optimizers take up a large part of their die, leading to higher costs and higher (!!) power consumption.
Another fact is that, thanks to x86 compatibility, you cannot guarantee how your code will be optimized.
You create assembly optimized for the Pentium 3 (short load/store times, maximum possible parallel execution), run it on a P4, and it completely sucks, although the CPU looks exactly the same to the program. The same was true for PMMX -> P2 (== P3).
With a more advanced instruction set like the Itanium’s, such optimizations would be much easier to maintain.

However, peace on earth :wink:

lg Clemens

Well, then it’s an implementation problem, not a problem by design :wink:
I don’t quite accept that. Many people, myself included, were concerned that glslang would be difficult to implement back when it was being discussed. Apparently our concerns were justified, for here we are.