HLL vs HLSL

Many people have been arguing for or against Cg. Matt said the GL2.0 shading language has no place in the core since it’s a High-Level language (and that’s against the low-level philosophy of OpenGL).

I would like to share my opinion: sure, Cg and the GL2.0 SL are High-Level languages. But they are not High-Level shading languages. What’s the difference between Cg, the GL2.0 SL, and the current asm-like APIs? They all provide the same low-level functionality. They use the same low-level shading approach: a shader is a combination of a vertex program and a fragment program (which is the natural way of designing a low-level API for a stream processor).
A High-Level shading language, however, doesn’t split a shader into vertex-specific and fragment-specific parts. It uses High-Level concepts from real-world lighting. One typically writes light shaders and surface shaders. The “Shader” (with a big S) can be seen as the combination of one or several light shaders and one surface shader.
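To make the structural difference concrete, here is a rough sketch in plain C. The types and names are purely hypothetical, not any real API: the low-level model exposes the two pipeline stages directly, while the high-level model composes surface and light shaders and leaves the stage mapping to the system.

    /* Hypothetical types, for illustration only -- not a real API. */
    typedef struct { float position[4], normal[3], color[4]; } Vertex;
    typedef struct { float color[4]; } Fragment;

    /* Low-level model (Cg, GL2 SL, ARB programs, register combiners):
     * the programmer writes the two pipeline stages explicitly. */
    typedef void (*VertexProgram)(const Vertex *in, Vertex *out);
    typedef void (*FragmentProgram)(const Vertex *interpolated, Fragment *out);

    typedef struct {
        VertexProgram   vertex;
        FragmentProgram fragment;
    } LowLevelShader;

    /* High-level model (RenderMan-like): the programmer writes surface and
     * light shaders; how they map onto vertex/fragment stages is the
     * system's problem, not the programmer's. */
    typedef void (*LightShader)(const Vertex *p, float light_color[4]);
    typedef void (*SurfaceShader)(const Vertex *p, const float light_color[4],
                                  Fragment *out);

    typedef struct {
        SurfaceShader surface;
        LightShader   lights[8];
        int           num_lights;
    } Shader;   /* the "Shader" with a big S */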
I don’t know of any realtime HLSL except Stanford’s (which is still quite buggy).
With Cg, GL2SL, ARB_vertex_program, register combiners, ATI_fragment_shader, etc., programmers must think the same way, no matter the language.
So I think the Cg/GL2SL flamewar is just stupid…

Julien.
PS> About Matt’s argument about the GL philosophy (the low-level thing): GL has always been intended to provide low-level functionality, but that does not mean low-level APIs… As an analogy, I would say you’re not bound to asm when writing an OS: you can write it in C.

You create a cool effect with ASM shaders. Because it doesn’t fit within the limits of the current hardware (number of operations, number of texture accesses, number of registers), this effect requires 5 passes. By the time your title ships, the next-generation hardware is out. It has more registers, can do more operations, can do more texture accesses, and has new, more advanced and more efficient instructions… Because of this your effect now requires only 1 pass. But it is still done in 5 suboptimal passes.
With OpenGL 2.0 this would not be a problem. Not only will the effect work in just 1 pass, it will be fully optimized for hardware that did not exist when the effect was created. None of the 3D apps and games that I play today are optimized for the next-gen hardware that will come out in a few months. Very few of them are optimized for the current crop of hardware (GF3). So in the end, the ASM approach is slower.

Originally posted by GeLeTo:
Because of this your effect now requires only 1 pass. But it is still done in 5 suboptimal passes.
(…)
So in the end, the ASM approach is slower.

I don’t see any relation between the language (being asm-like or c-like) and the multipass abstraction you’re talking about.
If transparent multipass is possible (Matt said it would be difficult to achieve, and I agree with him on that), then it’s possible with any language… Remember, a language does not provide any functionality by itself; it only allows the programmer to use it…

Julien.
(edit: typos)

“I don’t see any relation between the language (being asm-like or c-like) and the multipass abstraction you’re talking about.”

True, if you assume that the asm language has an unlimited number of registers, an unlimited number of operations, and the ability to do as many texture reads as you like… Well, then how would that be different from a C-like language? The biggest advantage of the asm-like languages is that you know when you hit the limitations. As for optimal code, the compiler should deal with that quite well.
Also, if the asm language does not support some function (for instance noise), you can use a lookup table/texture or some algorithm to implement it. But the next-gen hardware may support it natively.
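As a rough illustration of the lookup-table fallback, here is a minimal sketch in plain C (hypothetical helper names); on the GPU the table would simply be a texture sampled with linear filtering instead of an array:

    #include <stdlib.h>

    #define NOISE_SIZE 256
    static float noise_table[NOISE_SIZE];

    /* Fill the table once with pseudo-random values in [0,1]. */
    void noise_init(unsigned seed)
    {
        srand(seed);
        for (int i = 0; i < NOISE_SIZE; ++i)
            noise_table[i] = (float)rand() / (float)RAND_MAX;
    }

    /* 1D noise lookup with linear interpolation between table entries;
     * in a fragment program this becomes a single filtered texture fetch. */
    float noise1(float x)
    {
        int   ix = (int)x;
        float t  = x - (float)ix;               /* fractional part (assumes x >= 0) */
        int   i0 = ix & (NOISE_SIZE - 1);
        int   i1 = (i0 + 1) & (NOISE_SIZE - 1);
        return noise_table[i0] * (1.0f - t) + noise_table[i1] * t;
    }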

As for transparent multipass being difficult to implement - I am not so sure. The compiler will have to:

  1. Group instructions that rely on each other.
  2. Use a different var for each assignment (i.e. don’t reuse variables).
  3. At each step of the program that is not inside a conditional statement or a loop, check how much data is stored in live variables (i.e. vars that hold data that will be used later). If that data fits in what your intermediate frame/alpha/aux/f/p-buffer can store, it is safe to split the program at that point (see the sketch after this list).
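A rough sketch of the test in step 3, in plain C. The data structures here are entirely hypothetical; a real compiler would take the live-variable counts from its own intermediate representation:

    /* One point in the shader program where a pass split is considered. */
    typedef struct {
        int live_floats;     /* float components live (needed by later code) */
        int inside_control;  /* nonzero if inside a loop or conditional      */
    } ProgramPoint;

    /* buffer_capacity_floats: how many float components one pixel of the
     * intermediate frame/alpha/aux/p-buffer can hold between passes
     * (e.g. 4 for a single RGBA target). */
    int can_split_at(const ProgramPoint *pt, int buffer_capacity_floats)
    {
        if (pt->inside_control)
            return 0;   /* rule 3: never split inside a loop or conditional */
        return pt->live_floats <= buffer_capacity_floats;
    }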

My suggestion is that transparent multipass should be handled either at the scene graph or the HLSL level. I think it can be done more efficiently by a scene graph, because the scene graph knows relevant semantic information like “if I render this object with a second pass of DepthFunc(EQUAL), it’s OK if that results in double blending on a pixel where two triangles have the same Z by accident.”

The driver, on the other hand, can’t make those assumptions without creating obscure but nevertheless real bugs.
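For concreteness, the kind of pass being described looks roughly like this in standard OpenGL 1.x (a minimal sketch; draw_object stands in for whatever geometry submission the app uses). A scene graph issuing these calls knows whether accidental double blending is acceptable for this object; a driver splitting a shader behind the application’s back cannot know that.

    #include <GL/gl.h>

    /* Second pass over geometry whose depth was laid down by the first pass. */
    void draw_second_pass(void (*draw_object)(void))
    {
        glDepthMask(GL_FALSE);        /* don't rewrite depth                 */
        glDepthFunc(GL_EQUAL);        /* only touch pixels that won pass one */
        glEnable(GL_BLEND);
        glBlendFunc(GL_ONE, GL_ONE);  /* add this pass's contribution        */

        draw_object();

        glDisable(GL_BLEND);
        glDepthFunc(GL_LESS);         /* restore defaults                    */
        glDepthMask(GL_TRUE);
    }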

  • Matt

It’s kind of hard to figure out where to put a reply with all of the shader language threads around now, but I think this is the most appropriate one…

Most of the arguments made thus far about “high level” versus “low level” shading languages have assumed that assembler-type shading languages are in the “low level” category. I don’t agree that this is necessarily the case.

Most assembly languages have a very close mapping between assembly language instructions and hardware opcodes, so much so that in their simplest form all an assembler has to do is translate each assembly language instruction into the corresponding opcode. The concept of an “optimizing assembler”, which tries to optimize an assembly language program during the assembly process, is somewhat unheard of.

Interestingly enough, though, this is what is going to have to happen if one assembly-type language is going to be supported across multiple platforms. Unless all hardware vendors support the same hardware architecture and instruction set, “low level” assembler-type instructions are going to need to be modified during the assembly process by the driver to fit the target architecture and instruction set. The end result is that by writing in a standard low-level shading language you have all the disadvantages of writing in a low-level language (difficulty of implementation and maintenance of your shader) with none of the benefits (you still don’t know exactly what is going on in the driver to make your shader work). I don’t think it’s even possible to come up with an assembly language standard that is anything other than a “high level” assembly language.
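A toy sketch of the point in plain C (the instruction names and targets are entirely made up): the same “portable asm” instruction may be one native opcode on one chip and a sequence on another, so the driver’s assembler is already doing translation and scheduling work, i.e. compilation.

    #include <stdio.h>

    typedef enum { GEN_MUL, GEN_ADD, GEN_MAD } GenericOp;  /* portable "asm" ops */

    /* Pretend target A has a native multiply-add instruction and target B doesn't. */
    void assemble(GenericOp op, int target_has_mad)
    {
        switch (op) {
        case GEN_MAD:
            if (target_has_mad)
                printf("MAD r0, r1, r2, r3\n");              /* one native op  */
            else
                printf("MUL t0, r1, r2\nADD r0, t0, r3\n");  /* two native ops */
            break;
        case GEN_MUL:
            printf("MUL r0, r1, r2\n");
            break;
        case GEN_ADD:
            printf("ADD r0, r1, r2\n");
            break;
        }
    }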

I’m still undecided as to whether this is any better or worse than a higher-level C-like language, but I don’t think it’s justified to make an assembler-type language part of the API standard just because it is “lower level” than some other shading language.

I suppose I should say that these opinions are mine and not those of my employer, before I get myself in trouble…

Originally posted by GeLeTo:
Because of this your effect now requires only 1 pass. But it is still done in 5 suboptimal passes.

I think this is the wrong problem: when you develop a game/app, you typically target some kind of hardware.

When doing that, I think you will rarely target “future” products (unless you know what’s coming, cf. Carmack and the like).

Then, you develop so that the game/app runs well on your target hardware.

In your example, when the new card is released and it can do everything in one pass instead of five, it won’t bring you anything better than the original target: if the developers had known they would have this extra time in a frame, they would have used it to do more things, but the problem is… they didn’t know!

The only applications that I can see getting immediate benefits from this kind of evolution are the likes of 3DS Max and Maya, but in general, I think old products won’t get anything out of it (games specifically).

Anyway, I think that was quite an OT remark…

Regards.

Eric

Originally posted by deepmind:
Many people have been arguing for or against Cg. Matt said the GL2.0 shading language has no place in the core since it’s a High-Level language (and that’s against the low-level philosophy of OpenGL).

I would like to share my opinion: sure, Cg and the GL2.0 SL are High-Level languages. But they are not High-Level shading languages. What’s the difference between Cg, the GL2.0 SL, and the current asm-like APIs? They all provide the same low-level functionality. They use the same low-level shading approach: a shader is a combination of a vertex program and a fragment program (which is the natural way of designing a low-level API for a stream processor).
A High-Level shading language, however, doesn’t split a shader into vertex-specific and fragment-specific parts. It uses High-Level concepts from real-world lighting. One typically writes light shaders and surface shaders. The “Shader” (with a big S) can be seen as the combination of one or several light shaders and one surface shader.
I don’t know of any realtime HLSL except Stanford’s (which is still quite buggy).
With Cg, GL2SL, ARB_vertex_program, register combiners, ATI_fragment_shader, etc., programmers must think the same way, no matter the language.
So I think the Cg/GL2SL flamewar is just stupid…

This is definitely wrong. Both the Stanford SL and the OpenGL 2.0 SL make a clear separation between per-vertex and per-fragment calculations. Only the syntax differs: the Stanford SL uses variable keywords and casts, while OpenGL 2.0 uses different functions for per-vertex and per-fragment calculations.

Note that Stanford first tried to define the shading language without distinguishing between per-vertex and per-pixel calculations, but it did not work out. Instead, the mechanism of per-vertex and per-fragment calculations plays a central role in their SL. (See their publications for details.)

Furthermore, as already pointed out by others, there are fundamental differences between Cg and the OpenGL 2.0 SL: the idea of OpenGL 2.0 is to run every shader program on every piece of hardware, and different hardware “only” gives different performance. The idea of Cg is simply to write shader programs which may run on some hardware but not on others, depending on the features you use.

Originally posted by Eric:
I think this is the wrong problem: when you develop a game/app, you typically target some kind of hardware.

When doing that, I think you will rarely target “future” products (unless you know what’s coming, cf. Carmack and the like).

Then, you develop so that the game/app runs well on your target hardware.

In your example, when the new card is released and it can do everything in one pass instead of five, it won’t bring you anything better than the original target: if the developers had known they would have this extra time in a frame, they would have used it to do more things, but the problem is… they didn’t know!

The only applications that I can see getting immediate benefits from this kind of evolution are the likes of 3DS Max and Maya, but in general, I think old products won’t get anything out of it (games specifically).

Anyway, I think that was quite an OT remark…

Regards.

Eric

I disagree completely:
If new hardware can do the same thing in one pass instead of five, you get a performance boost without adapting your software. In short: using the same 3D engine, you can create much more complex and detailed 3D worlds. If the new hardware still uses 5 passes, you have to rewrite your software, rewrite your vertex and pixel programs, etc.

It is a crucial aspect of high-level software development that the software runs on any hardware, and the “only” difference is performance.

In other words:
When developing a game engine, you should NOT target particular hardware (as far as possible). Only your content should target particular hardware.