GLSL inline assembly

What about adding inline assembly to GLSL?
That way, we would no longer need GL_ARB_vertex_program, GL_ARB_fragment_program, and their extensions.
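For context, here is roughly what those assembly-style extensions look like today: a trivial ARB_fragment_program that modulates the interpolated color by a constant (a minimal sketch, though the syntax is real ARBfp1.0):

```
!!ARBfp1.0
# Modulate the interpolated color by a constant tint.
# Roughly equivalent to the GLSL line: gl_FragColor = gl_Color * tint;
PARAM tint = { 1.0, 0.5, 0.25, 1.0 };
MUL result.color, fragment.color, tint;
END
```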

Since every chip executes shaders in a different way, you would still need a translator in between, so this wouldn’t result in great performance improvements…

The ARB doesn’t want to support low-level shaders.
Besides, why mix the two?

But in the long run, you may need it. For example, CPUs have a lot of programming flexibility, yet modern compilers still support inline assembly for extended instruction sets like MMX, SSE, SSE2, 3DNow!, and so on, which can’t be reached from a common high-level language. The assembly language itself need not be defined by the ARB; it can be defined by the hardware vendors. The ARB would only have to give an “asm” keyword in the GLSL specification. (The current GLSL specification already lists “asm” as a reserved keyword.)
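To make that concrete, here is a purely hypothetical sketch. The spec only reserves the keyword, so the block syntax and the NRM instruction below are invented for illustration, not anything a vendor actually ships:

```
// HYPOTHETICAL: GLSL only reserves "asm" today; this block syntax and
// the NRM instruction are invented, vendor-defined examples.
varying vec3 normal;
void main()
{
    vec3 n;
    asm
    {
        NRM n, normal;   // imaginary single-instruction fast normalize
    }
    gl_FragColor = vec4(n * 0.5 + 0.5, 1.0);
}
```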

Look at what Nvidia is doing. They have their own GPU features, and they expose them in GLSL through their Cg extensions.
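For example (from memory, so treat the details as approximate): their GLSL compiler is built on the Cg front end and accepts Cg-isms such as the half-precision types, which are not part of the GLSL specification:

```
// Non-standard: the "half" types come from Cg and are accepted by
// NVIDIA's GLSL compiler, but they are not in the GLSL spec.
uniform sampler2D tex;
void main()
{
    half4 c = half4(texture2D(tex, gl_TexCoord[0].xy));
    gl_FragColor = vec4(c);
}
```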

But in the long run, you may need it. For example, CPUs have a lot of programming flexibility, yet modern compilers still support inline assembly for extended instruction sets like MMX, SSE, SSE2, 3DNow!, and so on…

Hello - shaders are quite different from normal CPU executables.
Besides that, look what this did to the x86 architecture: with a new, clean design we could get twice as much performance out of the same amount of silicon.

If OpenGL specified a low-level assembly language, GPU designers could not redefine the way their GPUs work; they would always need to stay compatible -> compatibility kills innovation!

With GLSL they can just recompile your GLSL code to the new GPU assembly and everything works fine again at double speed.

lg Clemens

If OpenGL specified a low-level assembly language, GPU designers could not redefine the way their GPUs work; they would always need to stay compatible
Not true. Even ARB_vertex/fragment_programs are “compiled” into hardware instructions. Nobody is forcing vendors to actually have hardware that mirrors the assembly, and in several cases (3DLabs, for instance), their hardware looks nothing like the assembly language.

In any case, the primary impetus for having inline assembly is platform-specific performance tweaking. And this is usually because, quite frankly, glslang compilers suck at optimizing. As long as humans can do a better job than the compiler, there is a need for inline assembly (or assembly in general).
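As a toy illustration, assuming a literal-minded compiler (the function name is made up):

```
// Illustration only: a literal-minded compiler may emit four scalar
// multiply/adds for the expanded form, while dot() maps straight onto
// a single DP4-class instruction on most hardware.
float brightness(vec4 a, vec4 b)
{
    // return a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;  // naive form
    return dot(a, b);                                 // hand-optimized form
}
```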

compatibility kills innovation!
That’s not true at all. Look how innovative Intel and AMD have become in disguising the true nature of their hardware, and in rapidly translating x86 instructions into their real hardware instructions. Do you think it’s easy to come up with a high-performing chip that processes x86 instructions?

Granted, it may not be innovative in the direction you want to go, but it is still innovative :wink:

and in several cases (3DLabs, for instance), their hardware looks nothing like the assembly language.
They could have, but they decided to concentrate on GLSL, so that they would have a good compiler.

Hi again!

In any case, the primary impetus for having inline assembly is platform-specific performance tweaking. And this is usually because, quite frankly, glslang compilers suck at optimizing. As long as humans can do a better job than the compiler, there is a need for inline assembly (or assembly in general).

Well, then it’s an implementation problem, not a problem by design :wink:
Of course I understand your idea, and maybe we had an impedance mismatch, because I meant hardware assembly while you meant something like assembler-style instructions that are translated to GPU instructions at runtime, right?
Or maybe I misunderstood you entirely…

That’s not true at all. Look how innovative Intel and AMD have become in disguising the true nature of their hardware, and in rapidly translating x86 instructions into their real hardware instructions. Do you think it’s easy to come up with a high-performing chip that processes x86 instructions?

Well, the instruction decoders and low-level code optimizers take up a large part of their die, leading to higher costs and higher (!!) power consumption.
Another fact is that, thanks to x86 compatibility, you cannot guarantee how your code will be optimized.
You create assembly optimized for the Pentium 3 (short load/store times, maximum possible parallel execution), run it on a P4, and it completely sucks, although the CPU looks exactly the same to the program. The same was true for PMMX -> P2 (== P3).
With a more advanced instruction set like the Itanium’s, such optimizations would be much easier to maintain.

However, peace on earth :wink:

lg Clemens

Well, then it’s an implementation problem, not a problem by design :wink:
I don’t quite accept that. Many people, myself included, were concerned that glslang would be difficult to implement back when it was being discussed. Apparently our concerns were justified, for here we are.