ARB June Meeting



Korval
07-25-2003, 10:42 AM
Well, the June meeting notes are up. Comments:

NV_occlusion_query and NV_point_sprite get into the core, straight from NV extensions. Cool.

VBOs remain an extension. Also fine. It's best to take it slow with such a fundamental API (and performance) issue; you don't want something in the core that you'll have to replace because you missed something in the design.

What I want to know is: what is "ARB_texture_non_power_of_two"? Can hardware even implement it?

Also, they had a quick line about the super-buffers WG, but they didn't say whether or not the extension is coming together. Obviously, it's not ready yet, but is it close?

Nutty
07-25-2003, 10:44 AM
What I want to know is: what is "ARB_texture_non_power_of_two"? Can hardware even implement it?

Probably similar to NV_texture_rectangle.

SirKnight
07-25-2003, 11:01 AM
I'm quite curious about the HLSL they put in OpenGL 1.5 now. I'm guessing it's the same as the OpenGL 2.0 glslang. I just hope I can use it on the GeForce 3 and 4 Ti level of hardware by perhaps setting some profile similar to Cg. If not, well, I'll still continue to use Cg. :)

-SirKnight

al_bob
07-25-2003, 11:02 AM
VBO's remain an extension

That wasn't my impression of it...



ARB_vertex_buffer_object

ISVs are using this and seem happy with it. Only negative feedback was from someone using thousands of tiny VBOs and unhappy with performance implications on some platforms.

VOTE for inclusion in the core: 10 Yes / 0 Abstain / 0 No, PASSES unanimously.



What I want to know is: what is "ARB_texture_non_power_of_two"? Can hardware even implement it?
It's basically a texture with non-power-of-2 dimensions. The extension is actually a group of 4 mini-extensions (each can be separately supported or not) for non-power-of-2 1D, 2D, 3D and cubemap textures.

Edit: typo

[This message has been edited by al_bob (edited 07-25-2003).]

davepermen
07-25-2003, 11:27 AM
nothing new about superbuffers/uberbuffers :( :( oh i'm just waiting for those..

Korval
07-25-2003, 11:33 AM
VOTE for inclusion in the core: 10 Yes / 0 Abstain / 0 No, PASSES unanimously.

Well, I guess it's unanimous: I'm an idiot ;) I have no idea what I was even looking at that made me think they hadn't moved VBO into the core.

Though I still think they should probably wait a few more months. Just to be safe.


The extension is actually a group of 4 mini-extensions (each can be separately supported or not) for non-power-of-2 1D, 2D, 3D and cubemap textures.

Well, yes, but NV_texture_rectangle defines a special type of texture. It specifies that texture coordinates are not normalized to [0, 1], but run over [0, w] and [0, h]. Also, since texture rectangles are a separate target from cubemaps and other textures, you can't have a cubemap that is non-power-of-2. Also, texture rectangles can't have mipmaps.

That's why I asked if this was implementable in hardware. It's clear that the NV_texture_rectangle extension defines behavior that nVidia hardware seems to like. After all, if those restrictions weren't necessary, nVidia wouldn't have placed them there. So, can this extension be implemented in modern hardware?

It's easy enough to write a spec for texturing that transparently supports non-power-of-two textures. But, does such hardware exist? Or is it going to be another of OpenGL's wonderful games of, "Find the set of state that works on your hardware of choice."
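For comparison, here is roughly what the two paths look like from the application side (just a sketch in C; 'pixels' is a hypothetical 640x480 RGBA image, and the rectangle tokens come from NV_texture_rectangle):

#include <GL/gl.h>
#include <GL/glext.h>   /* GL_TEXTURE_RECTANGLE_NV */

static void upload_both(GLuint rectTex, GLuint npotTex, const void *pixels)
{
    /* NV_texture_rectangle: a separate target, no mipmaps, and at draw time
       the texture coordinates are addressed in texels, i.e. [0,640]x[0,480].
       The default mipmapped MIN_FILTER is invalid here, so pick GL_LINEAR. */
    glBindTexture(GL_TEXTURE_RECTANGLE_NV, rectTex);
    glTexParameteri(GL_TEXTURE_RECTANGLE_NV, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_RECTANGLE_NV, 0, GL_RGBA8, 640, 480, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);

    /* A transparent non-power-of-two extension: no new target at all; the
       ordinary GL_TEXTURE_2D path would simply accept 640x480, keep [0,1]
       coordinates, and keep mipmapping. */
    glBindTexture(GL_TEXTURE_2D, npotTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 640, 480, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
}

The question is whether that second path can actually be made fast in silicon, or whether it just looks clean on paper.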

jra101
07-25-2003, 12:11 PM
The ARB_texture_non_power_of_two extension basically just relaxes the "width and height must be powers of 2" restriction for all texture targets. Texture coordinates are still in the [0..1] range and mipmapping is still supported.

No current hardware (that I know of) supports this extension; it's meant for future hardware.
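When hardware does show up, usage would presumably be as boring as this (a sketch in C; the 640x480 size and 'pixels' buffer are made up, and GL_GENERATE_MIPMAP is the OpenGL 1.4 automatic-mipmap path):

#include <GL/gl.h>
#include <GL/glext.h>
#include <string.h>

static int npot_supported(void)
{
    /* naive substring check of the extension string */
    const char *ext = (const char *)glGetString(GL_EXTENSIONS);
    return ext && strstr(ext, "GL_ARB_texture_non_power_of_two") != NULL;
}

static void upload_npot(GLuint tex, const void *pixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_GENERATE_MIPMAP, GL_TRUE);   /* mipmaps still work */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 640, 480, 0,          /* no power-of-2 needed */
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
}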

dorbie
07-25-2003, 04:24 PM
I'd like to know who voted against the shading language as an ARB extension. Depending on the answer it might cause me some concern.

SirKnight
07-25-2003, 04:45 PM
Originally posted by dorbie:
I'd like to know who voted against the shading language as an ARB extension. Depending on the answer it might cause me some concern.

I apologize if I'm wrong, but I have a hunch it was nvidia, simply because of one thing: Cg. They want everyone to use Cg (I do, and I think it's great, btw). So I can see them not liking an HLSL that is different from Cg being ratified as an ARB extension. Then again, maybe I'm totally wrong, but this would be my first guess.


-SirKnight

Korval
07-25-2003, 04:46 PM
No current hardware (that I know of) supports this extension; it's meant for future hardware.

So, what you're saying is that the ARB wasted some time that could have been spent on the uber-buffer's extension?

If somebody writes an extension spec, but it is not implemented, then they have wasted their time. They should have just promoted EXT_texture_rectangle to ARB status, and when the new non-power-of-2 stuff is actually available, then release the extension spec (along with appropriate drivers that implement it).


I'd like to know who voted against the shading language as an ARB extension. Depending on the answer it might cause me some concern.

Oh, come on. Didn't you read some of the issues (and who brought them up)? It's obvious who voted against glslang.

Granted, I agree with some of their reasons for it. I seriously doubt it is an attempt to do anything underhanded.

This, coupled with the non-power-of-2 thing, shows a shift of the ARB back towards what makes OpenGL bad: not knowing the right API to use.

Should I really need a document somewhere to tell me, "Yes, I know glslang says that feature X exists, but never use it because nobody implements it at anything anyone would reasonably call 'fast'." Such things are a major impediment towards the growth of the API's usage.

Sure, it's nice to see "texture" access facilities in vertex shaders (though I would have used a very different kind of API, one that is quite distinct from textures, since uploading modifications is more likely for vertex shaders than for regular textures in fragment shaders), but there's no reason to require it at a point where no hardware can use it. If hardware can't use it, it is a mis-feature; that is, a feature that exists technically, but not in any usable fashion. The same goes for this non-power-of-2 extension; there's no reason to even have the spec if nobody can use it.

Meanwhile, we're still waiting on the super-buffers extension, which is set to provide real, useful power and functionality that the API really needs.

It's this kind of backwards thinking that allows an API like Direct3D to become more efficient; at least it doesn't have 1001 functions that you should never use, but have to be there because some moron on the ARB wanted them.

[This message has been edited by Korval (edited 07-25-2003).]

al_bob
07-25-2003, 06:26 PM
If somebody writes an extension spec, but it is not implemented, then they have wasted their time.
Extensions aren't only written because current hardware can expose the functionality. It's nice to be forward-looking every now and then.


It's clear that the NV_texture_rectangle extension defines behavior that nVidia hardware seems to like. After all, if those restrictions weren't necessary, nVidia wouldn't have placed them there.
Looking at the date at the top of the extension spec, NV_texture_rectangle was first drafted in 2000. In that time frame, the GeForce 2 was released. Needless to say, hardware has improved a bit since.


Meanwhile, we're still waiting on the super-buffers extension, which is set to provide real, useful power and functionality that the API really needs.
Better to have a *good* ARB extension as opposed to a set of not-so-good vendor-specific extensions, no? I'd rather they take their time and get it right than have to deal with 3 versions of the same extension. Besides, if the extension is an ARB one, there's a good chance it gets promoted to the core, which is why it's all the more important to get it right the first time.

NitroGL
07-25-2003, 09:34 PM
Originally posted by jra101:
No current hardware (that I know of) supports this extension, its meant for future hardware.

Doesn't D3D allow non-Power of 2 textures with mipmapping?

Roderic (Ingenu)
07-26-2003, 12:06 AM
Originally posted by Korval:
NV_point_sprite get into the core, straight from NV extensions. Cool.

That's not what I read, they say it'll be modified before entering the core.

dorbie
07-26-2003, 02:04 AM
SirKnight, I can speculate as well as the next guy, but I'd like to know.

Mezz
07-26-2003, 07:55 AM
dorbie, whatever the answer actually is, why would you be concerned?

-Mezz

PS: If you don't want to say then I apologise for asking.

SirKnight
07-26-2003, 12:18 PM
To me it really doesn't matter who it was, because it's not like we HAVE to use it. There is always Cg and the other HLSLs out there. Actually I'm quite happy with the way Cg is right now, except for the fp20 profile. It doesn't optimize well enough. I sent an email to nvidia showing this, though I never heard back. What it will do is use a general combiner for some operation when it didn't need to. For example, it could perform this op in the final combiner yet it chooses against this. But other than that Cg is fine, and you can use it on any other card as long as the card supports ARB_vp and/or ARB_fp. Anyway, however this new HLSL in OpenGL 1.5 turns out, I'm sure it's pretty good. I'm hoping it's the same as the OpenGL 2.0 glslang. Maybe someone who knows for sure could come on here and talk about it? ;)


-SirKnight

dorbie
07-26-2003, 02:51 PM
Ahhh SirKnight, you're warming the cockles of NVIDIA's heart.

Of course it matters, this goes to the heart of shader compilation and optimization for different targets.

If NVIDIA voted no, it wouldn't bode well for ARBslang support from them; it's optional because of the two related votes in that meeting. We may end up with Cg vs. ARBslang (we're already there, but only by default).

In case you missed the recent debacle w.r.t. cheats & optimization of shaders, this stuff is important, and the control of legitimate compiler optimization requires that manufacturers be active in developing the compiler you are using.

We could end up in the middle of a shader 'war', and that's not good for developers.

Of course we still don't know where the no vote came from, or what NVIDIA ARBslang support will look like either way.

dorbie
07-26-2003, 03:07 PM
SirKnight, this is the glslang of OpenGL 2.0, although it may change by then.

The issue of 1.5 vs 2.0 is misleading; 1.5 has many 2.0 features. Major features of 2.0 have been included in 1.5 as extensions, and this was the intent of some people trying to head off a monolithic 2.0 introduction that would have included breaking compatibility.

There was another vote on glslang in that meeting that kept glslang out of the 1.5 core. The hope was that it would make it into the 1.5 core. Holding out the mythical GL2 as a carrot for glslang core inclusion is a joke. GL2 is a name, a concept, and 1.5 has some of its features; because of this vote it has one less feature, although it's there as an ARB extension. The distinction will ultimately depend on whether it's supported where you need it.

Korval
07-26-2003, 04:21 PM
The hope was that this would make it into 1.5 core.

Why? The language isn't even in real use yet. As long as it is an extension, there is the opportunity to change it if it turns out that it doesn't provide the functionality in a reasonable way (I'm still uncertain as to how the whole shader-linking thing works out, and I really don't like the idea of 'texture' accesses in vertex shaders). Once it makes it into the core, you have to live with it, no matter what. Changing core features is a different matter from changing extensions.

Features should be extensions before becoming core so that you can make sure that they serve everyone's needs correctly.

[This message has been edited by Korval (edited 07-26-2003).]

SirKnight
07-26-2003, 05:18 PM
Dorbie, I think you misunderstood what I said. I said it didn't matter to me who that one company was who voted no. The reason is that it passed, so now this extension is in the core and all hardware companies have to support it, even the company who said no. If not, then we can say screw you and not support them, and we all know they don't want that. And the way things are right now, we don't have to use it if it's not all that great. But since it's the same as OpenGL 2.0's glslang, I'm quite sure I'll like it. :D

Now, I don't know the details about why, according to the notes, NVIDIA said they would vote no if some things stayed the way they were, but if it has anything to do with having an HLSL "built in", so to speak, to the OpenGL core, then I would have to agree with them. I don't think it makes much sense to have an HLSL built into OpenGL. There should only be assembly-language-like shading extensions (like ARB_vp and ARB_fp) in the core, and the HLSL should be "outside", like how Cg is. To me this seems like an obvious thing to do, but some don't see it that way. OpenGL should be kept a "low level" graphics API, and anything else you need, any kind of helpers like HLSLs, should be utilities outside the API that compile to what is in the core. I'm no expert on Direct3D, but from what I have seen, that's pretty much how D3D is: all of these extra things are part of the D3DX library of helpers.

Having an HLSL built into the core makes about as much sense as having C++ built into our CPUs. No, what we have is an assembly language defined for our processors which has a 1:1 mapping to its machine code instructions, which is what the CPU understands; it doesn't know wtf "mov" is, but it does know what a number is and what to do with it. And it turns out, luckily, that there is a number that corresponds to "mov". Well, most of these instructions have this mapping anyway, afaik. Then we have all of these high level languages that compile down to this assembly language, then to these machine code instructions for our CPU, so the program can run.

The way HLSLs are right now, i.e. Cg, is how it should be. You have an assembly language defined for the GPUs which has a 1:1 mapping (or close enough for it to work anyway) with the GPU's machine code, just like with our CPUs, and our HLSL like Cg or whatever will compile to this assembly language, and from there that gets turned into the GPU's machine code to be executed. Of course we need a standard assembly language defined that all graphics hardware will work with, i.e. ARB_vp. Which it turns out we do, so it's all good. :D
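In code, the model I'm describing is roughly this (just a sketch; 'skin.cg' and the variable names are made up, and the ARB_vertex_program entry points are assumed to have been fetched already):

/* Offline, outside the driver: something like  cgc -profile arbvp1 -o skin.vp skin.cg */
/* At run time the driver only ever sees the "assembly" text in skin.vp:               */

#include <GL/gl.h>
#include <GL/glext.h>
#include <string.h>

static GLuint load_arb_vp(const char *asmText)   /* contents of skin.vp */
{
    GLuint prog;
    glGenProgramsARB(1, &prog);
    glBindProgramARB(GL_VERTEX_PROGRAM_ARB, prog);
    glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(asmText), asmText);
    /* on error, query GL_PROGRAM_ERROR_POSITION_ARB to see where it choked */
    return prog;
}

Whether the compile step lives in cgc or in some other offline tool doesn't really matter; the point is that the driver only has to understand ARB_vp.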

Also, what Korval said is 100% correct. All of these new features should be out and in use for a while, 6 months to a year I'm guessing, to get all the issues ironed out and to a point where everyone, or mostly everyone, is happy with them. THEN, and only then, should they become part of the core. How things have been going in the ARB just doesn't make sense. What are these guys thinking? Don't make something a part of the core before us developers have been using it for a while, to allow these potential core features to be "broken in", if you will.

-SirKnight


[This message has been edited by SirKnight (edited 07-26-2003).]

SirKnight
07-26-2003, 05:40 PM
Now, the issue of each vendor having their own extensions to expose features common only to their hardware will always be there. Writing some code specific to a particular card will never go away. Most of it should be standard, the basic core stuff; the deal with ARB_vp and ARB_fp is great. It's a standard, and all cards, if they want to be worth a poo, will support them. But still having a few extra vendor-specific extensions is not a bad thing. It's just that whole entire code paths should not be vendor specific, like how things had to be done in Doom 3. This kind of thing even goes on in the CPU world (I don't mean the entire-code-path thing). We have Intel and AMD fighting each other just like we have NVIDIA and ATI. Like at one point we had these AMD chips with 3DNow!. Oh boy, stuff to help applications run faster and better if you were running an AMD chip, while the people on Intel would just be running through the standard path. Dang. But don't despair just yet, Intel had their own fancy "extensions" like MMX, SSE, and the like. So here we have two CPUs with the same basic instruction set, but at the same time each had their own enhancements for those who wanted to use them for extra performance to make their program better than the rest. This is just like it is right now with GPUs. And this kind of trend I do not see dying off any time soon. It's actually a good thing in many ways.


-SirKnight

davepermen
07-26-2003, 05:45 PM
Originally posted by Korval:
and I really don't like the idea of 'texture' accesses in vertex shaders

hm.. why? you don't like displacement mapping? or other features? you can have a full lodded terrain and just map your texture on it and voilà, always good heightmapped..

or hw animated water texture can get used for the displacement, too..


tons of features actually!

and for hw vendors it will be more useful for the future, too.. why? because in the end, vs and ps should and so could be implemented the very same way.. that could mean for example in a fillrate intensive situation, you could use 6 of your 8 pipelines for the pixelshading, and 2 for the vertexshading.. in a vertexintensive situation you could use 5 for vs, 3 for ps..


and, if you want to use hw to assist raytracing for example, you could use all 8 for raytracing..

this could come, the pipeline-"sharing".. and could be quite useful for gaining performance.. reusing resources that is :D


just think of it.. all your vertexshaders would just go and support the pixelshaders...

woooooooohhhhh now THAT would rock :D

dorbie
07-26-2003, 06:59 PM
Well you have fewer interpolators than texture units so what's your option?

Anyhoo, to correct your Intel analogy: Intel won't be adding 3DNow! instruction support to their compiler any time soon. The analogy is strained anyway, because other vendors could optimize if they chose to (or at least had a choice once); they won't, of course, because they're firmly in a different shader camp. Reasoning by analogy is rarely very useful.

I know you have ARB_fp etc, but that doesn't mean all hardware does the same thing, or has 100% instruction match (esp into the future) nor the same optimal program length or register use or texture unit count. This is all pretty obvious.

You have to be pretty blinkered to think that compiler multiple choice between partial hardware coverage or proprietary half-hearted support is a good thing.

IF that's how things pan out.

dorbie
07-26-2003, 07:22 PM
SirKnight, this shader extension is NOT in the core, it's an optional ARB extension; it never got enough votes in the second vote to put it in the core.

This probably puts things in a different light for you.

zeckensack
07-26-2003, 09:35 PM
Originally posted by SirKnight:
I don't think it makes much sense to have an HLSL built into OpenGL. There should only be assembly-language-like shading extensions (like ARB_vp and ARB_fp) in the core, and the HLSL should be "outside", like how Cg is. To me this seems like an obvious thing to do, but some don't see it that way. OpenGL should be kept a "low level" graphics API, and anything else you need, any kind of helpers like HLSLs, should be utilities outside the API that compile to what is in the core.

NVIDIA's company line. Gosh. Allow me to disagree (as I do all the time now, it seems) ;)

Nailing down the 'assembly' language is bad. If you require a certain assembly interface for the high level, layered compiler to work, you restrict hardware implementations to that exact assembly interface. Hardware is too diverse to do that. Much more than in CPU land.

Remember ATI_fragment_shader vs NV_register_combiners? You'd need one of these to make full use of the NV2x and R200 generation. What you're proposing is somewhat akin to restricting yourself to ARB_texture_env_combine. You gain portability but lose flexibility on both targets.

One of the very reasons for high level languages is the opportunity to eliminate diverse middle interfaces, to say goodbye to multiple codepaths, and still get the best possible hardware utilization.

This is why, IMO, the assembly style interface should best be hidden and never even be considered for exposure again.

I'm no expert on Direct3D, but from what I have seen, that's pretty much how D3D is: all of these extra things are part of the D3DX library of helpers.

And the DX Graphics model didn't work out too well. Futuremark, anyone? MS subsequently did a new PS2_a profile. Guess why that just had to happen ...

Having an HLSL built into the core makes about as much sense as having C++ built into our CPUs. No, what we have is an assembly language defined for our processors which has a 1:1 mapping to its machine code instructions <...>

Yeah, right. Try running an x86 executable on a Mac. Then come back and try a more appropriate analogy for the point you wish to make. Sheesh.

Meanwhile, I'll take your analogy and use it for my own POV:
C++ can be compiled for x86, for PowerPC, for Sparc. If the code in question doesn't touch upon OS peculiarities, it's all a matter of selecting the right compiler target.
You don't compile C++ to x86 ASM, and then try and do a second compile step to produce a Mac binary.

There is no industry standard assembly representation that'd do justice to all targets. Everyone who tells you otherwise must have been smoking something hallucinogenic.

Korval
07-27-2003, 12:07 AM
you don't like displacement mapping?

What displacement mapping?

Real displacement mapping involves shifting the location of objects per-fragment. Doing it per-vertex is nothing more than some kind of hack.

In any case, vertex shaders can't do the really hard part of displacement mapping anyway: the tessellation. And, if they're smart, it never will (tessellation should go into a 3rd kind of program that feeds vertices to a vertex program, so that they can run async). So, in order to do automatic displacement mapping, you still have to do a render-to-vertex-array to tessellate it. Since you're writing vertex data from your fragment program, you may as well use its texturing facilities to do the displacement.

Now, I do like the idea of binding arbitrary memory to a vertex shader. However, this is different from a texture access.

If you have a 16x1 texture, accessing the texel value 3.5 has some meaning. With bilinear filtering, that means accessing the blend of 1/2 of pixel 3 and 1/2 of pixel 4.

This has absolutely no meaning for the kind of memory I'm talking about. For example, let's say I bind a buffer of memory that contains matrices for skinning to a vertex shader. The way this should work is that it only takes integer values as arguments. Matrix 3.5 has no meaning. And a blend of the 16-float values that matrix 3.5 represents would be the absolute wrong thing to do.

Also, textures are not updated frequently. And, when they are, they are usually updated via a render-to-texture or a copy texture function, not from main memory data. However, 9 times out of 10, memory bound to a vertex shader is updated every time the shader is used. So, you don't want to use the texture accessing functionality with it; instead, you want an API more akin to VBO (you could even use a buffer object for the binding, since the API works so very well for the kinds of things you'll try to do).
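To make that concrete with a CPU-side sketch (the types and names here are made up purely for illustration): integer indexing into a palette of bone matrices is meaningful; a texture-style filtered fetch at "3.5" would hand back a component-wise blend of two matrices, which is generally not a valid transform at all.

typedef struct { float m[16]; } Mat4;

/* What a vertex shader actually wants from a bound buffer of bone matrices:
   plain integer indexing. Bone 3 or bone 4 -- never bone "3.5". */
static const Mat4 *fetch_bone(const Mat4 *palette, int bone)
{
    return &palette[bone];
}

/* What a texture-style filtered fetch at coordinate 3.5 would amount to:
   a per-component lerp of matrix 3 and matrix 4, which in general is no
   longer a rigid transform. */
static Mat4 filtered_fetch(const Mat4 *palette, float coord)
{
    int   i = (int)coord;
    float t = coord - (float)i;
    Mat4  out;
    int   k;
    for (k = 0; k < 16; ++k)
        out.m[k] = (1.0f - t) * palette[i].m[k] + t * palette[i + 1].m[k];
    return out;
}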


because in the end, vs and ps should and so could be implemented the very same way.. that could mean for example in a fillrate intensive situation, you could use 6 of your 8 pipelines for the pixelshading, and 2 for the vertexshading.. in a vertexintensive situation you could use 5 for vs, 3 for ps..

Yeah, this makes sense. Especially considering how they are on very different ends of the pipeline. And they would be processing data fed from different places. And 1001 other major differences between vertex and fragment shaders that make this a horrible idea from a hardware implementation standpoint.

Plus, for optimal performance, you want to pipeline vertex shaders like a CPU: deep pipelining with a sequence of instructions all being processed at once. For a fragment shader, you want to pipeline like pixel pipes: wide pipelining, with multiple copies of the same instruction being called at the same time. Why?

Because vertex programs must operate sequentially. The setup unit has to get each vertex in turn. It does no good to spit out 2 or 3 vertices at once; indeed, this is incredibly bad for a short vertex shader (shorter than it takes the setup unit to process 2 or 3 verts). Also, it complicates the setup logic, as it now must somehow know the order of these triangles. Each fragment of a single triangle, however, is completely independent of the others, so it makes sense to do them in parallel.


Nailing down the 'assembly' language is bad.

Odd. Intel, apparently, thought that this was a very good idea (until recently with IA64, but AMD is taking up the reins). Allow me to explain.

A CISC chip like most Intel chips works by emulating an instruction set. The P4 reads x86 instructions, converts them (using a program written in what they call 'microcode') into native instructions, and then executes those native instructions.

The thought behind this concept is so that you can compile programs to a single assembly language that can be run on multiple generations of a processor. Which is why code compiled for a 286 still runs on a Pentium 4 (to the extent that the OS'es allow it).

If you take the analogy to graphics cards, the card itself would be the underlying generation of hardware. The assembly would represent the x86 instruction set. The microcode is the assembler that runs when you create the program. So, really, SirKnight's idea is nothing more than a modern instruction set.

Granted, ARB_vertex_program and ARB_fragment_program are not quite good enough to immortalize as a finalized ISA-equivalent. However, it is hardly fair to say that this idea is bad; after all, it is the basis of why your computer works today (unless you're not using a PC).

You might say that graphics hardware is evolving faster than CPUs did. However, wanting to stick to the x86 ISA didn't stop Intel from exposing MMX or SSE instructions; they were simply extensions to the ISA. Unless there is a foreseeable change that fundamentally alters how the assembly would look (outside of merely adding new opcodes), there isn't really a problem.

However, there is one really good thing that comes out of glslang being part of drivers: shader linking. Supposedly, you can compile two vertex shaders and link them such that one shader will call functions in the other. In a sense, compiled shaders are like .obj files, and the fully linked program is like a .exe.

Of course, with a rich enough assembly spec (richer than either of the ARB extensions), you could still have this facility, where you would give the driver an array of shaders to compile together. The assembly would have to retain function names in some specified fashion. At that point, granted, nobody will want to write code in the assembly anymore, but that's OK.
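For reference, that linking model in the ARB extensions looks roughly like this (a sketch using the ARB_shader_objects / ARB_vertex_shader entry points; 'mainSrc' and 'libSrc' are hypothetical source strings, error checking is omitted, and the entry points are assumed to have been fetched via the usual GetProcAddress route):

#include <GL/gl.h>
#include <GL/glext.h>

static GLhandleARB link_two_shaders(const char *mainSrc, const char *libSrc)
{
    GLhandleARB vsMain = glCreateShaderObjectARB(GL_VERTEX_SHADER_ARB);
    GLhandleARB vsLib  = glCreateShaderObjectARB(GL_VERTEX_SHADER_ARB);
    GLhandleARB prog   = glCreateProgramObjectARB();

    glShaderSourceARB(vsMain, 1, &mainSrc, NULL);   /* "compile" step: like building a .obj */
    glShaderSourceARB(vsLib,  1, &libSrc,  NULL);
    glCompileShaderARB(vsMain);
    glCompileShaderARB(vsLib);

    glAttachObjectARB(prog, vsMain);                /* "link" step: like producing the .exe */
    glAttachObjectARB(prog, vsLib);
    glLinkProgramARB(prog);
    return prog;
}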


One of the very reasons for high level languages is the opportunity to eliminate diverse middle interfaces, to say goodbye to multiple codepaths, and still get the best possible hardware utilization.

So, why do you support glslang, when it clearly doesn't offer this (as I have mentioned in other threads)? Outside of that library of yours that you are writing, which has very little difference from Cg's low-end profiles, ultimately.

[This message has been edited by Korval (edited 07-27-2003).]

MZ
07-27-2003, 05:24 AM
Originally posted by Korval:
For example, let's say I bind a buffer of memory that contains matrices for skinning to a vertex shader. The way this should work is that it only takes integer values as arguments. Matrix 3.5 has no meaning. And a blend of the 16-float values that matrix 3.5 represents would be the absolute wrong thing to do.

Also, textures are not updated frequently. And, when they are, they are usually updated via a render-to-texture or a copy texture function, not from main memory data. However, 9 times out of 10, memory bound to a vertex shader is updated every time the shader is used. So, you don't want to use the texture accessing functionality with it; instead, you want an API more akin to VBO (you could even use a buffer object for the binding, since the API works so very well for the kinds of things you'll try to do).

Your skinning example supports the validity of GL2's multi-index-array concept (criticized, and now abandoned). You could have 2 index arrays: one for vertices, and one for matrices, effectively sharing each matrix between a group of vertices. It could be great for batching (as the GDC'03 document states, we may expect the importance of batching to increase with each new HW generation), and more powerful than packing the matrices into spare constant regs.

I'm with those who are waiting for süperb buffers. IMO it should be prioritized even over GLslang.

SirKnight
07-27-2003, 09:01 AM
Yeah, right. Try running an x86 executable on a Mac. Then come back and try a more appropriate analogy for the point you wish to make. Sheesh.


Running an x86 exe on a Mac? What the hell are you talking about? I never said that a program written in some HL language like C++ would run on _ANY_ CPU. Show me where I said that. I'd love to know.

I don't think you understood a word I said. You are saying that I said things I did not.



You don't compile C++ to x86 ASM, and then try and do a second compile step to produce a Mac binary.


Well duh! I never said you could do that! But if you are making a program to run on an x86 CPU, then when you compile, it does compile to x86 assembly, and from there that converts to the x86 machine code to be executed. The whole CPU thing was just kind of a base; I never meant to say that CPUs and GPUs are programmed exactly the same way in all respects, or that a program written in a HL language will magically work on every CPU in the world from just one compile.



There is no industry standard assembly representation that'd do justice to all targets. Everyone who tells you otherwise must have been smoking something hallucinogenic.


Nor did I say there were. Please stop saying I said things I did not and trying to put words in my mouth.

You know...I probably just assumed this but I guess I should have stated when I was talking about CPUs, I was mainly thinking about one single architecture. Like I said, that was just a base on what I was talking about, I never meant to cover everything about all CPUs all at once. I just wanted to show how a program goes from a HLL to an executable form on a CPU to show why I like Cg's way of a HLSL on a GPU and why I think that way is better in my opinion.

And again, yes, Korval is correct. ;) He said my idea is nothing more than a modern instruction set. BINGO! This is what I was getting at. I'm glad someone understood all of that. Korval wins the gold star! ;)

Yeah, the instruction sets we have now, ARB_vp and ARB_fp, are not up to par yet for a standard modern instruction set, but they're a good start, and obviously as GPUs get better, so will their instruction sets.

Now I would like to mention that the idea of an HLSL built into OpenGL is not stupid; it does have its good points and I understand these, it's just not what I prefer. :D


-SirKnight

SirKnight
07-27-2003, 09:08 AM
Originally posted by dorbie:
SirKnight, this shader extension is NOT in the core, it's an optional ARB extension; it never got enough votes in the second vote to put it in the core.

This probably puts things in a different light for you.


Hm...you know, I could have SWORN I read it was put into the core. I see now that I read over it again that it wasn't. Ok then nix that whole thing I said about it being in the core. Sorry about that, you're right dorbie. Doh! :)


-SirKnight

zeckensack
07-27-2003, 09:29 AM
Oh my. What have I done? :D
Code compatibility on x86 has historical reasons, I understand that. I wish to make the point that these reasons manifest in the form of 'legacy' code, code which has already been compiled down to the target.

I really like x86 a lot. It's a wonderful, powerful and expressive ISA, though today it's mainly some form of transparent code compression. But ... you can't deny that the required chip complexity to support this sort of legacy translation is overwhelming. Spot the execution units (http://www.chip-architect.com/news/Northwood_130nm_die_text_1600x1200.jpg) , if you can.

"Legacy code" and, along with it, the reasons for all this complexity can only be produced if the ISA is exposed and code is shipped precompiled. Because I prefer more execution units and flexibility over scheduling logic and rename buffers, I think the ISA should be tucked away somewhere.

This issue is all the more important, because in contrast to x86 implementations that have just two instruction paths (int and fp), graphics chips have already hit eight parallel execution pipes. Independent branch control logic for eight parallel, first class citizen OOO schedulers would surely be a major pita. By hiding the ISA, and shipping only high level code, this complexity can neatly be moved to software.

Just like I can create decent x86 and PowerPC code from a single C++ source. I don't want to and I don't need to know the ISA, if you follow me.

The x86 evolution is not necessarily a role model for programmable graphics hardware. I believe it shouldn't be. That's all.

Korval
07-27-2003, 10:54 AM
This issue is all the more important, because in contrast to x86 implementations that have just two instruction paths (int and fp), graphics chips have already hit eight parallel execution pipes. Independent branch control logic for eight parallel, first class citizen OOO schedulers would surely be a major pita. By hiding the ISA, and shipping only high level code, this complexity can neatly be moved to software.

Remember, the equivalent of the microcode opcode translator on x86 chips is the driver's compiler for the assembly language. So, the complexity for scheduling and so forth is in the software, not hardware. Also, while you may think that it is complex to write an optimizing assembler for the assembly language, it is more complex to write an optimizing C-compiler for the assembler.

Oh, and, outside of discard actually stopping a pipe (which I seriously doubt will ever happen), why do the eight parallel pipes need to have independent branch control logic? They certainly don't today, and they probably aren't going to in the near future.


The x86 evolution is not necessarily a role model for programmable graphics hardware.

There are alternatives (glslang), but this kind of model is quite viable on its own. And, it gets the C-compiler out of the drivers.

titan
07-27-2003, 11:40 AM
Originally posted by zeckensack:
There is no industry standard assembly representation that'd do justice to all targets. Everyone who tells you otherwise must have been smoking something hallucinogenic.

GCC uses two passes. The front end compiles language X into "gcc assembly" and the back end compiles that into assembly for platform Y and does the low level optimizations. This industry standard assembly language works quite well.

When you develop your Java application you use a HL language which gets compiled into byte code "assembly" which the platform can either run, compile, or interpret.

MS's C# also works this way. Visual Basic too?

Anyway GCC does a great job doing justice to all targets with its internal industry standard assembly.

This style allows you to have Cg, GLslang, RenderMan, the Stanford shading language, and even plain C (I believe the Codeplay guys took a look at Cg and its vertex programs and what they were doing for the PS2's vertex shaders, and asked why invent a whole new language; you can use C just fine for your vertex programs), all of which output an intermediate generic assembly which the driver can then optimize. Why tie ourselves to GLslang or Cg? What if Scheme is the ideal shader language?

I see no reason somebody can't write a CG and GLslang front end for gcc and a vertex/fragment_program backend and get rid of the need to have the compiler in the driver.

zeckensack
07-27-2003, 11:53 AM
Originally posted by Korval:
Remember, the equivalent of the microcode opcode translator on x86 chips is the driver's compiler for the assembly language.

The keyword here is "the assembly language". There is no single agreed-upon internal instruction format in x86 land. Exposing the internal ROPs, µOPs, whatever of current processors would only create new backwards-compatibility nightmares. Opened assembly backends to high level compilers are comparable. As soon as you allow people to program to a low level ISA, you're obliged to keep compatibility.


So, the complexity for scheduling and so forth is in the software, not hardware. Also, while you may think that it is complex to write an optimizing assembler for the assembly language, it is more complex to write an optimizing C-compiler for the assembler.

Both are non-trivial tasks. Full blown high level compilers are more complex than cross-assemblers, I can agree with that. At the same time, they maintain more opportunities for hardware evolution. We'll get to that in a second.


Oh, and, outside of discard actually stopping a pipe (which I seriously doubt will ever happen), why do the eight parallel pipes need to have independent branch control logic? They certainly don't today, and they probably aren't going to in the near future.

If dynamic branching ever becomes important, this will become interesting. Consider a fragment shader with a dynamic branch. Parallel pipes can go different ways through this branch, different loop iteration counts etc, so you either need to synch it all somewhere, or you need multiple control units (if you want to be efficient).

I strongly favor predication for graphics stuff, but there are different solutions to the issue (eg sorta like split f-buffers, suspending execution at the branch, spilling temporaries to two parts of the buffer; emptying both buffer regions applying their respective taken/not taken branch code). The truth is not out there yet.

However, if any one of these mechanisms is chosen, it affects the ISA definition, gets exposed, and creates the compatibility issue. If 'the industry' goes predication, it'll make sense to expose predicate registers and predicated execution (similar to x86 flags and CMOV but more sophisticated) in the assembly interface. Otherwise a layered compiler couldn't optimize for the hardware, or even couldn't support branches at all.

If we go with 'real' branches, there needs to be a JMP instruction, a conditional jump and condition flags (btw, how many of them?).

Either way, hardware implementations must somehow support whatever the standard middle layer is. If there's a new idea (or simply more resources) in any one IHV's hardware, it cannot go into the middle layer for compatibility reasons. Just like you can't fully use a Geforce 3's fragment processing w PS1.3. This, I think, is a major drawback that should be avoided. It's still possible to avoid it.

Extending the middle layer will only work if all IHVs agree upon the improvement. This is DX Graphics turf, and it simply can't do justice to everyone's hardware simultaneously.

zeckensack
07-27-2003, 12:06 PM
Originally posted by titan:
Anyway GCC does a great job doing justice to all targets with its internal industry standard assembly.

I love GCC, mostly because it occasionally beats the crap out of MSVC6, but this is simply not true.

If you want optimum performance on an Intel processor, you get ICC, period. GCC can't compete, and I even think I've read on the mailing list that this internal "everyone's equal here" is the root cause.

What we're seeing with GCC vs ICC is an example of an IHV taking the responsibility to show off their own product. They know it best. They can optimize best for it. And most importantly: they're the only ones with a real motivation.

But one further question: do you know the internal GCC representation? Can you code directly in this representation? Will it affect your code if that internal representation gets changed? Three times no, probably. GCC's internals are free to evolve as needed because they are not exposed to users.

Korval
07-27-2003, 01:38 PM
I believe the Codeplay guys took a look at Cg and its vertex programs and what they were doing for the PS2's vertex shaders, and asked why invent a whole new language; you can use C just fine for your vertex programs

That's because the PS2's vector units are mini-CPU's. They have memory. They have branching. They have all the facilities that C expects to be present.

Vertex programs may never have the facilities that C expects. Remember, the PS2's VU's also function as command processors (deciding, not just what to do with the given vertex data, but actually walking the vertex data lists); they need these facilities to even function. Vertex programs don't have to perform these operations, and, as far as I'm concerned, never should.

In any case, C is a reasonable solution to VU's. It's not for programmable graphics hardware.


The keyword here is "the assembly language". There is no single agreed upon internal instruction format in x86 land.

You don't seem to understand. The assembly extensions we are debating would be akin to the x86 instruction set itself. When the driver is given this assembly, it compiles it into native opcodes. As such, there is a "single agreed upon internal instruction format in x86 land," it's called x86 assembly.


Parallel pipes can go different ways through this branch, different loop iteration counts etc, so you either need to synch it all somewhere, or you need multiple control units (if you want to be efficient).

Ew. Given these fundamental hardware problems (which I had not realized until now), maybe we won't be getting branches in fragment programs for a while. I had been expecting this generation, but I now I won't be upset to have this pushed back for a generation or 2.


However, if any one of these mechanisms is chosen, it affects the ISA definition, gets exposed, and creates the compatibility issue.

There's the fundamental question: why?

What is it about C that allows for these optimizations transparently that an assembly language would not allow for? Also, why is it that these facilities that allow for the transparent optimizations cannot be given to the assembly as well as a C-like system? Remember, the assembly doesn't have to closely resemble the final hardware data; it can have facilities that don't look much like common assembly.


If you want optimum performance on an Intel processor, you get ICC, period. GCC can't compete, and I even think I've read on the mailing list that this internal "everyone's equal here" is the root cause.

First of all, Intel is in the best possible position to optimize code for their processors; for all we know, they may be sitting on some documents that would help GCC and VC++ compile better for Intel chips.

Secondly, it is highly unlikely that GCC's notion of a middle-layer is what is slowing GCC down, compared to Intel. More likely, it is a fundamental lack of detailed knowledge of the architecture of the Pentium processor required to produce extremely optimized code.

Thirdly, GCC does a pretty good job.


But one further question: do you know the internal GCC representation? Can you code directly in this representation? Will it affect your code if that internal representation gets changed? Three times no, probably. GCC's internals are free to evolve as needed because they are not exposed to users.

I'm sure somebody knows GCC's internal representation. Everybody doing ports of GCC has to know, so it must be documented somewhere.

The format we are proposing would not be modified in a destructive way. That is, it would never remove functionality. Nor would it create alternatives to existing opcodes. When the format needs to be changed, it will be modified by adding new opcodes that do something completely different. Otherwise, it is up to the assembler to decide what to do with a given bit of code.

That is why it is important to pick a good assembly representation. Your concern is that you think that we can't pick a good one. I believe that we can, if we consider the possibilities carefully. If they had been working on this rather than glslang for the time it has been around, they would have worked all of the bugs out of the system, and there would be no need to be concerned.

Let's look at the benefits of an assembly-based approach:

1) Freedom of high-level language. We aren't bound to glslang. If we, for whatever reason, don't like it, we can use alternatives.

2) Ability to write in the assembly itself.

The only benefit that glslang has is a potential one. It guarantees that you get optimal compiling from the high-level language. However, the assembly approach does not preclude this either. So really, as long as the assembly approach produces optimal hardware instructions, it is fundamentally superior to the glslang approach.

t0y
07-27-2003, 04:26 PM
Originally posted by Korval:

Let's look at the benefits of an assembly-based approach:

1) Freedom of high-level language. We aren't bound to glslang. If we, for whatever reason, don't like it, we can use alternatives.

2) Ability to write in the assembly itself.

The only benefit that glslang has is a potential one. It guarantees that you get optimal compiling from the high-level language. However, the assembly approach does not preclude this either. So really, as long as the assembly approach produces optimal hardware instructions, it is fundamentally superior to the glslang approach.

If all of this was true we'd all be programming in assembly instead of high-level languages.

A low-level vendor/platform-specific assembly interface is more than enough for control and optimization freaks, and you could still use Cg or whatever.

If you're thinking about a general ISA for GPU's and assuming it'll be future-proof then you must realize that for that to become a reality you wouldn't call it an ISA but a high-level language with complicated syntax.
I believe this is the case with gcc/msvc intermediate code, and java bytecodes.

The main problem here is to create a common interface that will last. And the better way to achieve this is using high-level languages and letting the drivers do whatever they want with it.

Korval
07-27-2003, 06:43 PM
If all of this was true we'd all be programming in assembly instead of high-level languages.

You, clearly, do not understand the purpose of this discussion.

I'm not suggesting that everyone be forced to program assembly. What we are suggesting is that a single ISA-equivalent exist that off-line (ie, not in drivers) compilers can compile to as a target. That way, if you don't like the glslang language, for whatever reason, you may freely use Cg, or something you create yourself.


If you're thinking about a general ISA for GPU's and assuming it'll be future-proof then you must realize that for that to become a reality you wouldn't call it an ISA but a high-level language with complicated syntax.

Any evidence of this? It's easy enough to make a claim like this; do you have any actual facts to back it up?


The main problem here is to create a common interface that will last.

Which is precisely what the ARB could have been doing instead of debating features for glslang.


And the better way to achieve this is using high-level languages and letting the drivers do whatever they want with it.

Once again, you make these claims without any actual facts backing them up. I could just as easily retort, "No, it isn't. The best way is to have off-line compiler compile to a common assembly-esque language." But, of course, that isn't a real argument; it's a shouting match.

t0y
07-27-2003, 07:22 PM
You, clearly, do not understand the purpose of this discussion.


Yes I do... I just happen to have a different opinion. I may not have experience with shaders, but I do know what they are and how they work. Especially given the number of extensions and languages that have cropped up these last few years.




I'm not suggesting that everyone be forced to program assembly. What we are suggesting is that a single ISA-equivalent exist that off-line (ie, not in drivers) compilers can compile to as a target. That way, if you don't like the glslang language, for whatever reason, you may freely use Cg, or something you create yourself.


But you said this about it:



So really, as long as the assembly approach produces optimal hardware instructions, it is fundamentally superior to the glslang approach.


Which may be true for the current generation. Do you know exactly what the future will bring us? Isn't it better to leave optimal hardware instructions in their own specific extension? As long as we have a general-purpose language I can't see the problem in that.





If you're thinking about a general ISA for GPU's and assuming it'll be future-proof then you must realize that for that to become a reality you wouldn't call it an ISA but a high-level language with complicated syntax.

Any evidence of this? It's easy enough to make a claim like this; do you have any actual facts to back it up?


We are discussing the future... Do you have facts from the future?
Just look at how many "assembly languages" and extensions have been created since DX8. They had many things in common, but a general assembly language was too difficult to achieve. Wasn't Cg created, among other things, to overcome these problems? And this is just starting to evolve!



Once again, you make these claims without any actual facts backing them up. I could just as easily retort, "No, it isn't. The best way is to have off-line compiler compile to a common assembly-esque language." But, of course, that isn't a real argument; it's a shouting match.


I don't agree with the x86 as the standard ISA for PCs that someone posted before. Things have changed a lot since the 8086 and, as you know, most code from those days won't run properly on today's systems, and vice-versa. Only the paradigm survived. And high-level code! And we're talking about processors, not GPUs... A GPU generation lasts one year or two!

If you want to code shaders that run dx10 hardware only then your solution will be fine for the time being. Next year some vendor comes up with a way to speed up some special case and another general-purpose-one-generation-only assembly extension will come up...

al_bob
07-27-2003, 08:08 PM
Things have changed a lot since the 8086 and, as you know, most code from those days won't run properly on today's systems, and vice-versa.
Not quite right - 8086 code will run fine on a Pentium 4; as well, if not better than on the 8086. It might not be the *fastest* code for the Pentium 4, but it doesn't need to - as long as it's *faster*.
Not only that, but converting that code to a near-optimal format for the Pentium 4 requires far *far* less work than writing an optimizing C compiler. Most of the changes are simple look-ups!


Next year some vendor comes up with a way to speed up some special case and another general-purpose-one-generation-only assembly extension will come up...
That's exactly what the drivers do! Do you truly believe that NV30 and R300's native assembly language is ARB_fp? Surely not! They could run Java bytecode for all you know.

My point is that it's up to the driver to perform the conversion, and there isn't necessarily a one-to-one mapping between ARB_fp (or whatever other fp assembly) and the hardware's native language.

Edit: typos

[This message has been edited by al_bob (edited 07-27-2003).]

zeckensack
07-27-2003, 08:50 PM
Originally posted by al_bob:
Not quite right - 8086 code will run fine on a Pentium 4; as well, if not better than on the 8086. It might not be the *fastest* code for the Pentium 4, but it doesn't need to be - as long as it's *faster*.

Kidding? You've just given the archetypical example, where you can more than quadruple (float) throughput by not using assembly. In fact, I just wanted to construct a similar example as a rebuttal for Korval.
If you hand x87 assembly nicely scheduled for a 486DX to a P4, you'll lose. If you use the same high level code you should have used ten years ago to begin with, on an up to date compiler, you win.
You don't care about that?

Not only that, but converting that code to a near-optimal format for the Pentium 4 requires far *far* less work than writing an optimizing C compiler. Most of the changes are simple look-ups!

Uhm. You extract back parallelism from non-SIMD assembly to make SIMD assembly, that's what you're suggesting? That's an optimizing compiler. A second one. Well, maybe you could call it a second compiler pass, but no, you wish to expose this layer to users, don't you?

This would be all fine and dandy if it were one monolithic thingy. Saves you a parsing step and redundant error checking at a minimum. I've already portrayed more serious issues; it would sure be nice if somebody would answer my concerns. Why does a middle layer need to be defined and exposed?

That's exactly what the drivers do! Do you truly believe that NV30 and R300's native assembly language is ARB_fp? Surely not! They could run Java bytecode for all you know.

You know, they could even be MIMD or VLIW ... *cough* NV_rc *cough*.

My point is that it's up to the driver to perform the conversion, and there isn't necessarily a one-to-one mapping between ARB_fp (or whatever other fp assembly) and the hardware's native language.

Exactly! Convert the code to the hardware's native language. The conversion to any intermediate "this isn't the real thing anyway"-language is completely devoid of any merit.

Java may benefit from this approach because the size of distributed code is a concern. Java also pays a very real performance penalty for it. Just like 486 assembly code incurs a penalty on P4s (despite the P4 spending a whole lot of transistors for legacy support, mind you).

"Traditional" software is distributed precompiled because of several issues I don't even want to enumerate here, because none of them apply to shader code.

In case anyone overlooked it:
Why do we need to define and expose any sort of middle interface and layer an external compiler on top of that? Where are the benefits vs a monolithic compiler straight from high level to the metal?

al_bob
07-27-2003, 09:43 PM
You've just given the archetypical example, where you can more than quadruple (float) throughput by not using assembly. In fact, [...] If you use the same high level code you should have used ten years ago to begin with, on an up to date compiler, you win.

Let's ignore for a second the change in the example you used. We're comparing x87 to SSE 1/2 now.

Ignoring precision issues, yes, you *can* compile C floating-point code into SSE. But you can do just as well on assembly (x87) code! This (unfortunately) isn't a problem that can only be solved by a high-level language - the methods that work with C work equally well with x86 assembly.

As Korval pointed out:

What is it about C that allows for these optimizations transparently that an assembly language would not allow for? Also, why is it that these facilities that allow for the transparent optimizations cannot be given to the assembly as well as a C-like system? Remember, the assembly doesn't have to closely resemble the final hardware data; it can have facilities that don't look much like common assembly.


That's an optimizing compiler. A second one
Yes it is. I see only one on-line optimizing compiler though (the one that matters). The other (if present) is off-line. I don't see what the problem is. This is done every day by modern CPUs, in hardware. It could also be done in hardware, on GPUs, but typically isn't.


Why does a middle layer need to be defined and exposed?
Your concerns have already been addressed; I shall repeat the answers here:
- You get your pick of HLSL. If Cg is your thing, then by all means use it. If you don't like Cg, then use whatever else you like.
- If none of those suit you, you still have access to the low-level assembly, so you can write your own code, or write your own HLSL. Call it zeckensackG or something, which may or may not be the same as t0yG.


The conversion to any intermediate "this isn't the real thing anyway"-language is completely devoid of any merit.
Perhaps you should explain that to the nice people who write GCC. After all, they're facing similar problems to what the ARB is: their high-level compiler needs to work on all these different platforms. So as not to duplicate most of the optimizer, the platform-independent optimizations are performed on the C code, which is then converted to an intermediate-level assembly language. That intermediate assembly is then converted (and optimized) to the platform-specific code.
And btw, GCC, on integer code, does not suck.


"Traditional" software is distributed precompiled because of several issues I don't even want to enumerate here, because none of them apply to shader code.
On the contrary - most of the issues deal with IP and/or user-interfacing. There are no real technical issues why "traditional" software isn't distributed in source form. In fact, some Linux distributions (Gentoo) install themselves by downloading source code from the internet and compiling it specifically for your platform.

Korval
07-27-2003, 10:50 PM
We are discussing the future... Do you have facts from the future?

No, but I'm not the one making factual claims either. I'm providing evidence that the ISA approach can work as well as, if not better than, the glslang approach.


Things changed a lot since the 8086 and, as you know, most code from those days won't run properly on today's systems and vice versa.

They won't run on today's OS's, or maybe motherboards, or other hardware. However, the fundamental machine language itself can be executed on a P4 just as well as a 286 (assuming that 32-bit extensions or other instruction-set extensions aren't in use).


If you hand x87 assembly nicely scheduled for a 486DX to a P4, you'll lose.

Define "lose". To me, a loss would be, "It runs slower than it did before." A win would be, "It runs faster."

Now, for the Intel x86 architecture case, this may be correct, because the processor is not allowed to do things like re-order large sequences of opcodes. It can do some out-of-order processing, but not to the level a compiler can.

In the case of this proposal for GPU's, driver writers get the entire program to compile. Where the P4 can't produce optimal instructions simply because it can't help but work with what it's got, the driver can compile it and do whatever re-ordering is required.

And, even so, let's say that hardware 2 years from now running assembly compiled from a high-level language written today doesn't perform as fast as it would if the high-level language were compiled directly. So? As long as it is faster than it was before (and it should still be, on brute force of the new hardware alone), then everything should be fine.


The conversion to any intermediate "this isn't the real thing anyway"-language is completely devoid of any merit.

Unless you don't want to be a slave to glslang, that is. If you, say, want to have options as to which high-level language to use, OpenGL is clearly not the place to be. No, for that, you should use Direct3D.

Maybe, for whatever reason, I like Cg more. Maybe, for whatever reason, I don't like any of the high-level languages and I want to write my own compiler. Or, maybe my 2-line shader doesn't need a high-level language, and I want to just write it in assembler.

The fact that you are happy with glslang does not preclude anyone else from not liking it, or wanting an alternative.


Java may benefit from this approach because the size of distributed code is a concern. Java also pays a very real performance penalty for it.

That's not entirely true. These days, JIT compilers can get native Java (anything that's not windowed) pretty close to optimized C: 80-95% or so. And that's for large programs, far more complicated than any shader will ever be.

And Java doesn't use bytecode to shrink the size of the distribution. It uses bytecode because:

1) They believe, as many do, that the idea of having people compile a program they downloaded is asinine and a waste of time.

2) They want to be able to hide their source code.

3) Bytecode is just an assembly language that the Java interpreter understands. They needed a cross-platform, post-compiled form of code. The solution is some form of bytecode.


Why do we need to define and expose any sort of middle interface and layer an external compiler on top of that? Where are the benefits vs a monolithic compiler straight from high level to the metal?

You must mean, of course, besides the reasons I have given twice and 'al_bob' gave once?

And let's not forget the notion that writing an optimizing C compiler is a non-trivial task. Neither is writing an optimizing assembler of the kind we are referring to, but it is easier than a full-fledged C compiler. Easier means easier to debug, less buggy implementations, etc. And, because there will then only need to be one glslang compiler, all implementations can share that code.

Also, one more thing. nVidia is widely known as the company that set the standard on OpenGL implementations. They were the ones who first really started using extensions to make GL more powerful (VAR, RC, vertex programs, etc). Granted, Id Software didn't really give them a choice, but they didn't make nVidia expose those powerful extensions. I doubt there are any games that even use VAR, and even register combiners aren't in frequent use, though 2 generations of hardware support them. Yet, nVidia still goes on to advance the cause of OpenGL.

nVidia has made no bones about not being happy with the current state of glslang. Now, they can't really go against OpenGL overtly (by dropping support), because too many games out there use it (Quake-engine based games, mostly). But, they don't have to be as nice about exposing functionality anymore. Or about having a relatively bug-free implementation. As long as those bugs don't show up on actual games (just using features that real game developers use), it doesn't hurt nVidia.

Also, they can choose not to provide support for glslang at all, even if it goes into the core. They can't call it a GL 1.6 implementation, but they can lie and call it nearly 1.6. Even Id Software can't afford to ignore all nVidia hardware; they'd be forced to code to nVidia-specific paths. And by doing so, they would be legitimizing those paths, thus guaranteeing their acceptance.

Rather than risk this kind of split in the core (where a good portion of the market supports some core functionality and a good portion doesn't, which isn't good for OpenGL), the optimal solution would have been the compromise we're suggesting here. There would be a glslang, but it wouldn't live in drivers. It would compile to an open extension defining an assembly-esque language that would be compiled into native instructions.

That way, you can have a glslang that the ARB can control, but you don't force all OpenGL users to use it.

Granted, the reason the ARB didn't go that way was not some notion of "putting glslang into drivers is the 'right thing'." No, it's there because it hurts Cg, and therefore nVidia. ATi and 3DLabs have a stake in hurting things that are in nVidia's interests. Killing the ability for Cg to be used on OpenGL in a cross-platform fashion is just the kind of thing that they would like to do to nVidia. And, certainly, using the glslang syntax over the Cg one (even though neither offers additional features over the other) was yet another thing ATi and 3DLabs wanted to do to hurt Cg; it makes it more difficult for Cg to be "compiled" into glslang.

[This message has been edited by Korval (edited 07-28-2003).]

zeckensack
07-27-2003, 11:21 PM
Originally posted by al_bob:
Ignoring precision issues, yes, you *can* compile C floating-point code into SSE. But you can do just as well on assembly (x87) code! This (unfortunately) isn't a problem that can only be solved by a high-level language - the methods that work with C work equally well with x86 assembly.

As Korval pointed out:
"What is it about C that allows for these optimizations transparently that an assembly language would not allow for? Also, why is it that these facilities that allow for the transparent optimizations cannot be given to the assembly as well as a C-like system? Remember, the assembly doesn't have to closely resemble the final hardware data; it can have facilities that don't look much like common assembly."Again.
The issue is that it's not monolithic. You need to define and expose the middle interface because otherwise you couldn't layer a compiler on top of it.
As soon as you expose it, you need to keep supporting the defined rules because users can go there. You need to duplicate the parsing and error checking already done in the front end. You risk destroying semantics. You risk underexposing resources for the sake of cross-vendor compatibility.


Originally posted by al_bob:
Yes it is. I see only one on-line optimizing compiler though (the one that matters). The other (if present) is off-line. I don't see what the problem is. This is done every day by modern CPUs, in hardware.

No, it's not done by modern CPUs. A CPU can't transform x87 code to SSE code because it is obliged to follow the defined operation of the ISA. x87 has different rounding modes, exception handling, flags and register space. You may reuse the same execution units (Athlon XP), or you may not (P4), but the code will never be executed like SSE code would be executed. The public ISA nails 'em down. (High level) source code doesn't.

It could also be done in hardware, on GPUs, but typically isn't.

Okay.

Your concerns have already been addressed; I shall repeat the answers here:
- You get your pick of HLSL. If Cg is your thing, then by all means use it. If you don't like Cg, then use whatever else you like.
As you believe code reformatting is such a fun thing to do, and you also seem to appreciate layered compiler models, maybe you could just as easily write a Cg to GLslang converter. May I suggest an offline preprocessing step?

- If none of those suit you, you still have access to the low-level assembly, so you can write your own code, or write your own HLSL. Call it zeckensackG or something, which may or may not be the same as t0yG.

No, I won't.
"Extension mess"
"Multiple codepaths"
"deprecated"
"waste of energy"
Any of these terms ring familiar?


Perhaps you should explain that to the nice people who write GCC. After all, they're facing similar problems to what the ARB is: their high-level compiler needs to work on all these different platforms. So as not to duplicate most of the optimizer, the platform-independent optimizations are performed on the C code, which is then converted to an intermediate-level assembly language. That intermediate assembly is then converted (and optimized) to the platform-specific code.

Yeah. I think I've already covered that by saying that GCC's internal code representation is not exposed to users of GCC and is therefore free to evolve.

And btw, GCC, on integer code, does not suck.

It certainly doesn't. I occasionally (http://home.t-online.de/home/zsack/mandelbrot.html) use GCC myself and appreciate every last bit of work that went into it. You see, it's a monolithic application that can turn a portable high-level programming language into reasonably efficient machine code, quite a feat.

GCC supposedly doesn't fare too well in SpecCPU though.


On the contrary - most of the issues deal with IP and/or user-interfacing.

And these apply to shader code? Well, go ahead, encrypt your shaders, but don't forget encrypting your textures and sound files, too. I find this idea rather irritating, but I won't stop you.

There are no real technical issues why "traditional" software isn't distributed in source form. In fact, some Linux distributions (Gentoo) install themselves by downloading source code from the internet and compiling it specifically for your platform.

Absolutely. If only shaders could work this way, too ... (we could leave the internet downloading part out).

harsman
07-28-2003, 12:50 AM
You all seem to be arguing about something other than how high-level the shader language should be. The real issue is exactly how high-level the language should be, not whether it will be assembly-like or look like C.

Let me explain: calling what you (eventually) will send to OpenGL an assembly language is largely a misnomer, simply because it won't just be "assembled" if current hardware is anything to go by; it'll be compiled. This means you won't have the simplicity of a direct mapping to hardware instructions. So for this compilation step to work as well as possible (after all, we don't have to support old legacy code, so there's no need to be backwards compatible with anything) we need to provide as many hints as possible to the driver. ARB_fp obviously doesn't provide enough hints as it is. Nvidia would like more information about precision than a monolithic hint, and they also like register use to be *very* low, while ATI doesn't care much about register use but has difficulty getting its analysis of falsely dependent lookups to work correctly. There are even more differences, like native sin vs. polynomial approximation vs. texture lookup, that will only be solved well if the driver knows that you want the sine and not something else.
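To make the sine point concrete, here is a hedged sketch in ARB_fp-style pseudo-assembly (TEMP/constant declarations omitted, register names invented). An intermediate form that keeps the sine as a single opcode leaves the implementation choice to the driver; one that ships a pre-expanded polynomial has already thrown that information away:

SIN r0.x, fragment.texcoord[0].x;                          # intent preserved: the driver may use native sin, a table lookup, or its own approximation

# versus a pre-expanded approximation, where the "this is a sine" intent is gone:
MUL r1.x, fragment.texcoord[0].x, fragment.texcoord[0].x;  # x*x
MAD r0.x, r1.x, c0.x, c0.y;                                # ... further polynomial terms would follow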

All this considered I think it's pretty obvious that the language needs to be high level (in the same sense C is high level). ARB_fp isn't high level enough as it is, so any new language needs to work out those kinks first. I think glslang is low level enough for most graphics-related work; all the languages I would want are much higher level, for example a general material system or an image-processing language.

There's nothing wrong with having a low level language interface as long as it retains code semantics and is high level enough for diverse hardware to execute it efficiently. GLslang isn't very high level by this standard, and that's a good thing.

The question then becomes how much compiling we want to do in the driver. Even with ARB_fp the IHVs seem
to do lots of optimization in the driver so we already have this to an extent.

Of course all the string parsing required to compile something as complex as glslang might be overkill. Defining an intermediate bytecode that retains the semantics of the original code might be a good idea if it reduces work for the driver and enables faster linking. I have a feeling that we'll be doing lots of cutting and pasting and/or linking of small shader fragments to get a runnable shader in the future. This is the only way to get light effects shaders to work with a general material system and custom shaders, for example (more like RenderMan shaders). If the current glslang approach leads to too much overhead then we should of course move to some sort of bytecode that can be linked faster. I doubt that it will need to be very much lower level than the current glslang, however.

t0y
07-28-2003, 03:18 AM
Originally posted by al_bob:
That's exactly what the drivers do! Do you truly believe that NV30 and R300's native assembly language is ARB_fp? Surely not! They could run Java bytecode for all you know.

Really? ;) If they were running Java then shaders could be a lot more complex and we're underusing the hardware! You are just saying that arb_fp limits a GPU running bytecode, and that's exactly what I want to show you.

The fact is that today's hardware interface is very, very similar to arb_fp. But you know this won't last. I don't like the idea of having different code for arb_fp1 arb_fp1.1 arb_fp2k5. It's enough to please us now but in the long run it will be almost the same as having hardware-specific asm.



Originally posted by Korval:
quote:We are discussing the future... Do you have facts from the future?

No, but I'm not the one making factual claims either. I'm providing evidence that the ISA approach can work as well as, if not better than, the glslang approach.

So, your "evidence" is not factual. Seems like a new trend nowadays...




quote:Things changed a lot since the 8086 and, as you know, most code from those days won't run properly on today's systems and vice versa.

They won't run on today's OS's, or maybe motherboards, or other hardware. However, the fundamental machine language itself can be executed on a P4 just as well as a 286 (assuming that 32-bit extensions or other instruction-set extensions aren't in use).



But of course! A new extension for each generation! But isn't that what's supposed to change? How future proof is that?

Alternatively, if our C/C++ code does 32-bit math, you know that it'll work on all platforms with varying instruction sets. You could code for the 8087 even if you didn't have one, remember? It's a question of the flexibility of the processor.



And, even so, let's say that hardware 2 years from now running assembly compiled from a high-level language written today doesn't perform as fast as it would if the high-level language were compiled directly. So? As long as it is faster than it was before (and it
should still be, on brute force of the new hardware alone), then everything should be fine.


If speed is what matters most, just use the hardware-specific extensions. If we want shaders that work optimally across generations (compiler-dependent of course), a high-level language is a good solution. Both forward and backward compatibility are important. What I don't get here is that Cg was supposed to work this way and you are not agreeing with me.




Also, they can choose not to provide support for glslang at all, even if it goes into the core. They can't call it a GL 1.6 implementation, but they can lie and call it nearly 1.6. Even Id Software can't afford to ignore all nVidia hardware; they'd be forced to code to nVidia-specific paths. And by doing so, they would be legitimizing those paths, thus guaranteeing their acceptance.


You are forgetting the huge user base ATi has been gaining recently. This was true in the early days when nVidia ruled and other vendors were "ignored". This is easily turning into an ATi vs. nVidia flame war when we should be talking about OpenGL.




Granted, the reason the ARB didn't go that way was not some notion of "putting glslang into drivers is the 'right thing'." No, it's there because it hurts Cg, and therefore nVidia. ATi and 3DLabs have a stake in hurting things that are in nVidia's interests. Killing the ability for Cg to be used on OpenGL in a cross-platform fashion is just the kind of thing that they would like to do to nVidia. And, certainly, using the glslang syntax over the Cg one (even though neither offers additional features over the other) was yet another thing ATi and 3DLabs wanted to do to hurt Cg; it makes it more difficult for Cg to be "compiled" into glslang.

Now you're getting paranoid. glslang was a work in progress in the ARB for a while, and Cg is an independent project. It was nVidia who chose this path, not the ARB, us, ATi, 3DLabs or whatever other players in this game.


If they lose it, it's their own fault.

Korval
07-28-2003, 09:02 AM
All this considered I think it's pretty obvious that the language needs to be high level (in the same sense C is high level).

Do you have any actual basis for this claim? While we're both in agreement that ARB_fp doesn't cut it, that doesn't preclude something that looks similar to ARB_fp doing the job.


Defining an intermediate bytecode that retains the semantics of the original code might be a good idea if it reduces work for the driver and enables faster linking.

The question, one that the members of the ARB are probably best suited to answer, is how much of the semantics are absolutely required to get good code. Really, does specifying an expression like this:




D = a*b + dot(r + p, z * q)


really do anything for optimization that simply specifying the sequence of "opcodes" doesn't?
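For what it's worth, the same expression written as an ARB_fp-style opcode sequence (a sketch only: TEMP declarations omitted, register names invented, and dot() assumed to be a three-component DP3) carries exactly the same data-flow:

ADD r0, r, p;        # r + p
MUL r1, z, q;        # z * q
DP3 r2, r0, r1;      # dot(r + p, z * q)
MAD D,  a, b, r2;    # a*b + dot(r + p, z * q)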


I have a feeling that we'll be doing lots of cutting and pasting and/or linking of small shader fragments to get a runnable shader in the future. This is the only way to get light effects shaders to work with a general material system and custom shaders, for example (more like RenderMan shaders).

Most definitely (I'm defining a library to manage just such a system). The glslang shader linking mechanism (though I've never heard a really good explanation of the details of how it works) is, to me, the one saving grace of the language. It is also the feature that should definitely go into any shading language paradigm.


The fact is that today's hardware interface is very, very similar to arb_fp.

Oh really? I wouldn't be too surprised if the 9500+ series was just running souped-up versions of ATI_fragment_shader-based hardware. The dependency chain is what makes me think of this. If you made an ATI_fragment_shader implementation with 4 passes and lots of opcodes/texture ops per pass, you could implement ARB_fp on top of that kind of hardware, as long as you could build a dependency chain.

And even if they're not, how can you be so sure that the hardware looks anything like ARB_fp?


I don't like the idea of having different code for arb_fp1 arb_fp1.1 arb_fp2k5.


A new extension for each generation! But isn't that what's supposed to change? How future proof is that?

I hope you don't expect glslang 1.0 to be the final version. If you do, you'll be sorely disappointed.

Whatever shading language the driver uses will change over time (probably about once per generation). Whether it happens to be glslang or an ISA of some form doesn't matter.


If speed is what matters most, just use the hardware-specific extensions. If we want shaders that work optimally across generations (compiler-dependent of course), a high-level language is a good solution. Both forward and backward compatibility are important.

Why is "forward compatibility" important? As long as old shaders work on new hardware, and work faster than they did on older hardware, then you're getting all the functionality you need.

Also, as I pointed out before, you have failed to give a reason as to why a lower-level solution could not be compiled into just as optimized a solution as the higher-level one. All you've done is simply state that this is the case; that doesn't make it true.


You are forgetting the huge user base ATi has been gaining recently.

Larger, maybe. But do you think Id can afford to only sell their games to 40% of the potential market?


Now you're getting paranoid.

Not really. Allow me to explain.

If, instead of burning a glslang into the core, they went with our approach, Cg would likely become the de facto standard high-level shading language. Oh, sure, you might have things like the Stanford shader running around, but they wouldn't be frequently used in production of actual graphics products.

This doesn't do ATi or 3DLabs any good. It helps nVidia's position, which weakens theirs.

So, explain how it is that nVidia ends up arguing for functionality that helps Cg while ATi and 3DLabs argue against it. It is obviously self-interest on nVidia's part, but why is it so hard to believe that ATi and 3DLabs are engaging in self-interest of their own?


glslang was a work in progress in the ARB for a while, and Cg is an independent project.

Granted. nVidia took Cg to the ARB once it was apparent that the ARB had decided to use a C-like solution for shaders (something nVidia fought against). However, the ARB (i.e., nVidia's competitors) refused to use Cg's syntax even though it is functionally equivalent to what glslang provides (or, at least, provides in hardware). Granted, there's something to be said for the ARB wanting to keep control of the language, but they could have at least used the basic Cg syntax. That way, we wouldn't have 2 different hardware-based shading language syntaxes running around (like we do now).


It was nVidia who chose this path

In nVidia's defense, they started developing Cg before glslang was publicly being tossed about. Indeed, I believe (but am not sure) that Cg was publicly released before 3DLabs unveiled their glslang proposal to the ARB. nVidia saw a need just as much as 3DLabs did. And they went to fulfill that need.

[This message has been edited by Korval (edited 07-28-2003).]

V-man
07-28-2003, 09:22 AM
Why all the talk about compilers, x86, SSE, 3dnow! and the rest?

I think we had settled this question.

quote (more or less):
---------------------------
The answer is that an HLSL's advantage over the ASM style is that it is easier and makes you a more efficient coder.

The person writing the compiler will have a harder time coding the HLSL compiler (compared to an ASM compiler), but it might save developers some time.

---------------------------

Are there any other advantages that HLSL has over ASM and vice-versa?

harsman
07-28-2003, 10:36 AM
All this considered I think it's pretty obvious that the language needs to be high level (in the same sense C is high level).
-------------------------------------------------------------------------------

Do you have any actual basis for this claim? While we're both in agreement that ARB_fp doesn't cut it, that doesn't preclude something that looks similar to ARB_fp doing the job.


I don't preclude something that looks like ARB_fp doing the job, but I'm betting it will be about as high level as C. C was, after all, designed to be a portable assembler. As long as the language supports the demands I outlined above, I don't particularly care how it looks, as long as there is something C/Cg/HLSL/glslang-like or higher level for *me* to program in. Looking at OpenGL's history, features that mapped badly to hardware but were there for programmer convenience weren't exactly success stories, so pushing programmer convenience above everything else is probably not a good idea. It should definitely be considered strongly, though.

CatAtWork
07-28-2003, 10:43 AM
"Also, as I pointed out before, you have failed to give a reason as to why a lower-level solution could not be compiled into just as optimized a solution as the higher-level one. All you've done is simply state that this is the case; that doesn't make it true."

If your low level instructions are being compiled and optimized into something that's not just a mapping, why not use the higher level language? It's more readable, and hopefully less error-prone.

CatAtWork
07-28-2003, 10:48 AM
After re-reading that post, it sounds like you want to write to the metal NOW, for what you think will net you the most performance. Then, in the future, you want your code to be backwards compatible with whatever comes out then.

I'm confused.

al_bob
07-28-2003, 11:51 AM
why not use the higher level language? It's more readable, and hopefully less error-prone.
Because it's far easier to build a fast cross-assembler than a fast C compiler. That is, you usually don't want your driver to spend ~1 second compiling your shader.

CatAtWork
07-28-2003, 11:56 AM
What's a typical cross-assemble time we'd expect? I'm assuming here that you want to upload a different shader, or set of shaders, every frame. Otherwise, does the compile time really matter?

[This message has been edited by CatAtWork (edited 07-28-2003).]

Lurking
07-28-2003, 10:09 PM
Now it seems to be a countdown until the new drivers come out supporting 1.5! I just hope that my 5900 Ultra can handle the OpenGL Shading Language.

- Lurking

evanGLizr
07-29-2003, 02:57 AM
Originally posted by Korval:

To nVidia's defense, they started developing Cg before glslang was publically being tossed about. Indeed, I believe (but am not sure) that Cg was publically released before 3DLabs unveiled their glslang proposal to the ARB. nVidia saw a need just as much as 3DLabs did. And they went to fulfill that need.

That is not correct. glslang was first presented at the OpenGL BOF at SIGGRAPH 2001. At that time Bill Mark (http://www.cs.utexas.edu/users/billmark/) (Cg's lead designer) was still working at Stanford as a researcher on the Stanford Real-Time Programmable Shading Project (http://graphics.stanford.edu/projects/shading/) ; it wasn't until October 2001 that he joined NVIDIA (his page notes: "From Oct 2001 - Oct 2002, I worked at NVIDIA as the lead designer of the Cg language").

The original "GL2" whitepapers were presented to the ARB meeting on September (http://www.opengl.org/developers/about/arb/notes/meeting_note_2001-09-11.html) the same year and made public on December 2001 (http://www.3dlabs.com/support/developer/ogl2/whitepapers/index.htm) .

Cg wasn't offered to the ARB until a year later or so:


"Cg" discussion
NVIDIA wanted to discuss their goals with Cg (although they are not offering Cg to the ARB).

June 2002 ARB meeting (http://www.opengl.org/developers/about/arb/notes/meeting_note_2002-06-18.html#cg) .

Those are the facts; draw your own conclusions.

Zengar
07-29-2003, 06:51 AM
You know, I don't like the idea of a high-level language that is built into OpenGL. I can't say why. It... somehow limits my free space. I would prefer every single card to have an assembly processor (even if it's a different assembly every time) and to have a compiler that translates my high-level code into that assembly, with all possible optimisations. Really, this is the way glslang and all HLSLs work (at the driver level). But with glslang this assembly is hidden inside the driver; developers would have no access. What I mean is that access to the assembly should be granted openly, so that anyone could write his own HLSL. I think it's a big mistake to put a HL language into the core.

davepermen
07-29-2003, 07:12 AM
Originally posted by Korval:



D = a*b + dot(r + p, z * q)


really do anything for optimization that simply specifying the sequence of "opcodes" doesn't?

Yes, if there is an opcode which does exactly that in an extension of the standard asm support.

Thinking of ATI here, which could, if a and b were scalars, do it in the parallel unit alongside the dot product (if that is a DP3).

It makes it much easier to compile directly down to the code of the actual hardware, instead of having to do two steps. It's like losing precision if you rip a CD to MP3 and then convert to Ogg: no matter how high the MP3 settings are, the Ogg will never sound as good as if you rip it directly to Ogg.

Here, the "compression artefacts" are losses in optimisation, meaning that compiling twice results in less overall performance in the end.

There is no gain in using asm. None.

Korval
07-29-2003, 10:32 AM
Yes, if there is an opcode which does exactly that in an extension of the standard asm support.

How is interpreting that equation any easier than interpreting the sequence of opcodes that it would generate? They both evaluate to expression trees; what is it about the C-method that makes optimizing it into a single hardware opcode more likely than the assembly one? There's nothing that says that each assembly opcode must be equivalent to one or more hardware opcodes. If the assembly compiler sees something it recognizes or knows to look for, then it can optimize it just as well (and probably faster, since recognizing it is easier) than the C case.


Thinking of ATI here, which could, if a and b were scalars, do it in the parallel unit alongside the dot product (if that is a DP3).

Once again, why is it that the assembly compiler can't recognize these opcodes and do optimizations from them? Nothing is lost semantically in compiling the language into the ISA assembly.


It makes it much easier to compile directly down to the code of the actual hardware, instead of having to do two steps. It's like losing precision if you rip a CD to MP3 and then convert to Ogg: no matter how high the MP3 settings are, the Ogg will never sound as good as if you rip it directly to Ogg.

This is a false analogy, and you should have known better than to propose this one.

Nothing is lost during the compilation to assembly. The ISA was designed such that nothing of actual value to the compiler is lost. The only difference between an expression written as in my example and an assembly-like expression is that the latter is easier to parse. They both contain the exact same information.


There is no gain in using asm. None.


Once again, saying it doesn't make it true. Making assertions doesn't win arguments. Arguments win arguments.

Besides, you're not even thinking like a programmer in this. You're thinking from a preconceived notion of "Assembly bad, C good." Consider, for a moment, being told that the ISA approach is the way it's going to go. And now, you have to write an optimizing compiler for it. You don't have the luxury of saying, "Assembly bad, C good." You've got a job to do. You have to make it work. And, by looking into the problem from that direction, you will come to the realization that it can work, and just as well as the C case.

davepermen
07-29-2003, 07:16 PM
I use a lot of assembly on Intel platforms myself, so I would never say assembly is bad.

But tell me one reason it's good here.

If you don't have a perfect mapping from high level to assembler, a mere 1:1 mapping, you will lose performance because you lose info you can use for optimisation. And that IS true for all sorts of assemblies.

Otherwise every P4 could determine loops and all that by itself, and rewrite everything for SSE SIMD at runtime.

The P4 is a great example of a platform that does not fit the one-for-all x86/x87 well, and gains a lot from direct high-level-to-machine-code compilation without going through a very old asm first.

I get up to a 5x speed increase.

As OpenGL 2.0 and its shader language are meant to last for the next, say, 10 years, just like the old one, you have to take care of massive structural changes in hardware that cannot be expressed really nicely with asm anymore.

Every compilation is a lossy "compression". My analogy still holds true. For optimisation, it's lossy.

al_bob
07-29-2003, 08:21 PM
you will lose performance because you lose info you can use for optimisation
You and others keep repeating this. Yet no one can produce a single example where this is irrevocably true.


the p4 is a great example on a platform that does not fit well to the oneforall x86/x87, and gains a lot by using direct highlevel to machine code compilation without going to a very old asm before..
You're mistaken about why compilers compile directly to assembly: it's for the *compilation* speed gain, not the run-time speed gain.
This same move happened a few years back when C++ compilers would output intermediate assembler (or native assembler) instead of C code for a C compiler.


As OpenGL 2.0 and its shader language are meant to last for the next, say, 10 years, just like the old one, you have to take care of massive structural changes in hardware that cannot be expressed really nicely with asm anymore.
There doesn't have to be one single static version of the assembly language.

sqrt[-1]
07-29-2003, 08:33 PM
Just to take a recent example on old hardware:
If you had F = (A*B) + (D*E)

On standard assembly this would go to a MUL and then a MAD instruction (and perhaps a temporary register). However, in register combiners you could have done the whole line in one instruction. So unless you want to add a ton of specific "weird" instructions to the assembly spec, you could never optimize for all these cases without allowing the driver access to the high-level language.
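For reference, the two-instruction form described above would look roughly like this (an ARB_fp-style sketch; temporaries not declared, names taken from the expression):

MUL r0, A, B;        # A * B
MAD F,  D, E, r0;    # D * E + (A * B); the A*B + D*E pattern is still visible in the data-flow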

The only real argument I can see against a high-level language would be a driver not optimizing as much as a person could (or performing unnecessary instructions). However, since there is no specific hardware to target, this low-level optimization cannot be done by developers and is best left to the driver.

al_bob
07-29-2003, 08:38 PM
On standard assembly this would go to a MUL and then a MAD instruction (and perhaps a temporary register). However, in register combiners you could have done the whole line in one instruction.
You underestimate the people writing drivers. After all, MUL / MAD can be used to build a syntax tree (or whatever it's called - I'm no compiler expert), which can then be optimized to whatever else the hardware supports.

Pop N Fresh
07-29-2003, 09:32 PM
You underestimate the people writing drivers. After all, MUL / MAD can be used to build a syntax tree (or whatever it's called - I'm no compiler expert), which can then be optimized to whatever else the hardware supports.

So the "assembly language" is actually going to be parsed, a syntax tree created, and then code generated from the tree. I.e., either way you're putting a full compiler into the driver.

That being the case, what does the "assembly language" get you other than the ability to write MUL TEMP, a, b; MAD frag.out, c, d, TEMP; instead of frag.out = (a * b) + (c * d)?

davepermen
07-29-2003, 09:46 PM
Originally posted by al_bob:
You and others keep repeating this. Yet no one can produce a single example where this is irrevocably true.

Use Cg and you'll already see quite some losses.

Use a to-x86 compiler and run it on a P4 and you see quite some losses.

If the assembly language in the end is just another representation of the C-style language, it's useless, as you can just as well compile the C-style language if it has to be compiled in the drivers anyway.

If it's not, then it means performance losses.


There are these two ways:

Compile to a general asm (like x86) and run that directly. That means all hardware will be limited to x86-only implementations. The P4 shows how that can limit performance.

Convert to a general assembler-style, or even binary, representation of the C code and send that to the driver. Compile there to get the best performance.


The first one is stupid, I guess everyone agrees.
The second one is debatable, but I would prefer a binary intermediate format then. The gain is that you can add other language-to-glbin compilers and such.
Not that I like being able to have tons of languages to do the same task. It means people will use tons of languages, depending on their likes, and that means you have to be able to read them all, too. Much more work for everyone.


The current idea is to use the C-style language as the representation of the shader, and optimize from this directly to the best code.

And I think everyone agrees that we should have only one optimisation/compilation step: from shader code to shader.

It's not actually important which language the shader is stored in, then: C, asm, or binary.

I'd go for binary then. Like Java.

al_bob
07-29-2003, 10:04 PM
The second one is debatable, but I would prefer a binary intermediate format then. The gain is that you can add other language-to-glbin compilers and such.
Not that I like being able to have tons of languages to do the same task. It means people will use tons of languages, depending on their likes, and that means you have to be able to read them all, too. Much more work for everyone.
How is that "glbin" different from the assembly language? It has all the same limitations and problems as assembly.

The only real differences are:
- "glbin" is already parsed. No need to do that in the driver.
- "glbin" isn't directly writable to.

Whether or not the second one is an advantage is debatable (although I like the idea of exposing this in text form).

The first one, though, is what some of the people here are worried about. Ever written an assembly-language parser? It's much *much* easier and faster than a C parser, simply because the syntax is completely dumbed down.


Use Cg and you'll already see quite some losses.
I'm confused now. How would you know how to optimize for the given card, given that the information isn't readily available? How is the Cg compiler mature? Is there an HLSL compiler that you can name that will compile to native code so that you can compare to Cg?


And I think everyone agrees that we should have only one optimisation/compilation step: from shader code to shader.
C compilers have multiple optimization stages (at the front-end and the back-end, at least; some have more). Why should it be different for shaders?

kansler
07-29-2003, 11:36 PM
Originally posted by Zengar:
What I mean is that access to the assembly should be granted openly, so that anyone could write his own HLSL. I think it's a big mistake to put a HL language into the core.

MicrosØft Visual Fragment++ anyone?

Korval
07-29-2003, 11:54 PM
Use a to-x86 compiler and run it on a P4 and you see quite some losses.

Compared to what? An equivalently-clocked P3? The P4 will eat it up because it has a larger front-side bus. The losses due to the CPU will be hidden because the app is more memory bound than anything else.

Compare it to an equivalently-clocked AthlonXP? No contest, the XP will win. Of course, Athlons have always won against Pentium-class processors, clock-for-clock that is. You can compile the code specifically for a P4, and the XP will still smoke it.

The second point, however, is irrelevant (though still true). The first point is the important one. New hardware doesn't just change how the internals of the shader processor works. The entire thing gets faster. Whatever speed losses, compared to optimized code, exist can be covered up by the speed gains from mere brute force.

And, of course, that assumes that the compiler for this new GPU lacks the information necessary to compile the ISA code optimally, which you have still provided no proof of. Give me one example, just one example, of some fundamental structure in C that must be retained in order for optimization to work properly.


On standard assembly this would go to a MUL then a MAD instruction (and perhaps a temporary register) . However in register combiners you could have done the whole line in one instruction. So unless you want to add a ton of specific "weird" instructions to the assembly spec, you could never optimize for all these cases without allowing the driver access to the high level language.

The solution would be a compiler that was smart enough to recognize this expression in the assembly, and optimize it appropriately. If the compiler isn't smart enough, then nVidia clearly hasn't done their job correctly.

You'd have to do the same in a C compiler. Only, it's a lot harder to parse.


So the "assembly language" is actually going to parsed, a syntax tree created and then code generated from the tree. ie: either way you're putting a full compiler into driver.

We already have that. ATi's ARB_fp drivers have to build a dependency graph for texture accesses just to compile shaders. And 3DLabs's hardware is scalar-based; it doesn't look anything like those opcodes.

What we want is a language that is low level enough that compilers for different high-level languages can be written. We don't mind optimizations being done in the drivers.


Convert to a general assembler-style, or even binary, representation of the C code and send that to the driver. Compile there to get the best performance.

Which is precisely what we are discussing. Welcome to the conversation. Glad you could make it ;)


It means people will use tons of languages, depending on their likes, and that means you have to be able to read them all, too. Much more work for everyone.

I tend to prefer the freedom of choosing a shading language.

Maybe Pixar would like to take all those Renderman shaders they have and put them into some kind of hardware form. Of course, in order to do it with glslang, they'd have to, by hand, walk through all their shader code and transcribe it.

However, with the paradigm we're proposing, all they need to do is make a compiler for their shaders.

To be fair, they could write a compiler to go to glslang. But, making a compiler from something like C to something like C is a pain. An assembly-like language is much nicer in terms of transcription.

Also, what if I see certain deficiencies in glslang? For example, for the kinds of things I want to do (piecing together bits of shaders to build up a full shader), I'd like to have header files. Of course, glslang doesn't allow that (at least, not without lots of string copying and splicing). I'd like a shading language that does. Or, I'd like to augment the (public) glslang compiler with a #include directive.

I can't do any of that with glslang as the last stop until hardware. The best way to do this is with a simpler intermediate language that is easy to compile to.


And I think everyone agrees that we should have only one optimisation/compilation step: from shader code to shader.

Why? I've got no problems with multi-step processes, especially when one of them is a pre-process step that makes a runtime step that much faster.

A non-trivial amount of optimization can be done at the glslang-to-ISA compiler level. Dead code can be eliminated. Basic, fundamental optimizations can be made here that would not need to carry over into the driver's compiler. That way, all the driver's compiler needs to worry about is converting the ISA assembly into hardware opcodes and doing hardware-specific optimizations on them. An assembly parser is much easier than a C parser. Easier means fewer bugs. And fewer bugs is good.
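As a toy illustration of that offline step (ARB_fp-style sketch, hypothetical registers and operands), dead code is easy to strip before the driver ever sees the program:

# before: r1 is written but never read again
MUL r1, a, b;
MAD result.color, c, d, e;

# after dead-code elimination in the offline glslang-to-ISA compiler
MAD result.color, c, d, e;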


I'd go for binary then. Like Java.

A binary format makes it harder to write compilers (especially for machines with different endianness). Also, it makes it a bit more difficult to extend the language.


Is there an HLSL compiler that you can name that will compile to native code so that you can compare to Cg?

To be fair, there is.

All Radeon 9500+ drivers come with a preliminary implementation of glslang (the extension strings aren't exposed, but you can get the entrypoints). I can't tell you how well it works, but it is there. I imagine that ATi intends to expose these extensions directly once 1.5 is fully approved.

Mazy
07-30-2003, 12:13 AM
Well, they expose the old GL2 proposed extension in the current (3.6) Catalyst drivers, so it's not up to date with the new ARB one, but it's still working. I haven't tested the speed, just played with it to make sure that it fits my engine.

I guess we have to wait and see how this turns out. I think I saw small hints about a revision of the ARB_*_program extensions in the latest meeting notes, so you might still use another compiler, and AFAIK it was announced that Cg should be able to 'compile' to GLslang sooner or later (I might be wrong there).

davepermen
07-30-2003, 12:33 AM
Originally posted by Korval:
Compared to what? An equivalently-clocked P3? The P4 will eat it up because it has a larger front-side bus. The losses due to the CPU will be hidden because the app is more memory bound than anything else.



Compared to compiling with a compiler that optimizes directly for the P4, which can use more than only x86 asm and can actually know how the hardware behind it works. For example, the Intel C++ compiler, which gains up to a 5x speed increase over VC6 in software rendering apps here.


You have one serious issue: an assembler does not OPTIMIZE. What you want in the driver is a compiler. That means a translator, interpreter, and optimizer.

We generally take "write in assembly" to mean "write it and feed it directly into hardware that way WITHOUT FURTHER CHANGES".

The assembly you want is simply a high-level shading language unrelated to hardware, which looks like good old assembly, more or less. For most people, a C-style language unrelated to hardware is much more convenient; that's why Cg is here, that's why HLSL is here, that's why glslang will be here.

I haven't ever seen a compiler for real assembly before, only for pseudo-assembly, like ARB_fp and ARB_vp for example. They have compilers ("shader optimizers", cheaters, whatever :D) in the background.


I have no problem with having asm as a high-level shading language in OpenGL. It's just not hip.


And if you say you want assembler, everyone takes it to mean you want a one-to-one mapping of your code to hardware.

secnuop
07-30-2003, 06:44 AM
Originally posted by Korval:
What we want is a language that is low level enough that compilers for different high-level languages can be written. We don't mind optimizations being done in the drivers.

Let me turn some of the questions you've been asking around. Is there anything you can do in an assembler-style intermediate language that you cannot do in a higher level language like glslang? What precludes you from compiling your favorite high-level language into a trivial glslang main function with a simple list of expressions?

As an example, I'd bet it wouldn't be very difficult to write a glslang backend for Cg. I can't say for certain since I've never written a Cg backend, but I don't see any obvious technical obstacles.

In fact, I recall from programming language courses (way back when) that this is how some other non-shading languages were originally developed. Their compilers emitted C source code as an "intermediate language", which was then compiled to the native CPU format by an established C compiler.

Tom Nuydens
07-30-2003, 07:58 AM
We should postpone this discussion until all video cards have the exact same feature set. People here have drawn comparisons with how CPUs are programmed, but in doing so they overlook one crucial fact: a Pentium4 can perform every computation an Athlon can do and vice versa. The optimal way to get to the result may be different on both CPUs, but you know that both of them can get there.

This is far from true for Radeons and GeForces. The biggest concern right now should not be finding the optimal path for each card, it should be finding a path (preferably one that doesn't end up with software emulation). Until all cards have a "complete" feature set, I believe IHVs will continue to expose custom programming interfaces, be it assembly or HLSL, and the only way to get properly optimized shaders will be to use these proprietary interfaces instead of the standardized ones.

-- Tom

[This message has been edited by Tom Nuydens (edited 07-30-2003).]

Korval
07-30-2003, 09:07 AM
You have one serious issue: an assembler does not OPTIMIZE.

We're not discussing an "assembler". As you point out later, modern assembly shader languages are compiled, not assembled.


Is there anything you can do in an assembler-style intermediate language that you cannot do in a higher level language like glslang?

You mean, besides the reasons I've already given?


From Korval:
1) Freedom of high-level language. We aren't bound to glslang. If we, for whatever reason, don't like it, we can use alternatives.

2) Ability to write in the assembly itself.

And let's not forget the notion that writing an optimizing C compiler is a non-trivial task. Neither is writing an optimizing assembler of the kind we are referring to, but it is easier than a full-fledged C compiler. Easier means easier to debug, less buggy implementations, etc. And, because there will then only need to be one glslang compiler, all implementations can share that code.


I can't say for certain since I've never written a Cg backend, but I don't see any obvious technical obstacles.

You mean, besides the whole, "Outputting to a C-like language rather than something simpler," problem?


The biggest concern right now should not be finding the optimal path for each card, it should be finding a path (preferably one that doesn't end up with software emulation).

Um, no. I refuse to accept a non-optimal solution. Whatever solution is picked should allow for the production of an optimal shader for the hardware in question. Performance is still the #1 priority; without it, you can't afford to write longer shaders. Maybe you don't work in a performance-based realm, but that doesn't mean that performance isn't vital. It's just not vital for you.


Until all cards have a "complete" feature set, I believe IHVs will continue to expose custom programming interfaces, be it assembly or HLSL, and the only way to get properly optimized shaders will be to use these proprietary interfaces instead of the standardized ones.

That's ridiculous.

I've never doubted that the glslang in-driver compiler would produce optimal results. It is in every card manufacturer's best interest (if they support the language) to produce optimal results; that's why Intel's compiler works so well.

Glslang will get there. However, all I'm saying is that glslang isn't the only way to do it; there are other approaches that can still get the performance, but are lower-level, so that high-level compilers can easily be written to compile to them. That way, we aren't bound to glslang.

Currently, most cards that could even consider supporting glslang can't support certain features in hardware (texture access in vertex shaders, etc). So, it's only a matter of finding which subset of functionality makes cards run reasonably fast.

You're not going to see an ARB_vertex_program_2 or ARB_fragment_program_2, ever. ATi's throwing everything behind glslang; that's how they plan to expose functionality.

[This message has been edited by Korval (edited 07-30-2003).]

roffe
07-30-2003, 09:17 AM
FYI,

I talked to a 3DLabs glslang compiler writer/developer at SIGGRAPH yesterday. He more or less said this (not an exact quote): "It seems that some people like to have an assembler-like language to fool around with, to do some tweaking. How often do you look at the code your compiler generates for your CPU? We're putting a lot of effort into making the compiler as good as possible, so you can concentrate on the high-level parts of your algorithm/shader".

EDIT: Duh, my question was something like:
Will 3DLabs be releasing any performance docs/tips on shader coding? As an example I brought up NVIDIA's register-usage issues.


[This message has been edited by roffe (edited 07-30-2003).]

t0y
07-30-2003, 09:50 AM
Korval:



quote:The fact is that today's hardware interface is very, very similar to arb_fp.

Oh really? I wouldn't be too surprised if the 9500+ series was just running souped-up versions of ATI_fragment_shader-based hardware. The dependency chain is what makes me think of this. If you made an ATI_fragment_shader implementation with 4 passes and lots of opcodes/texture ops per pass, you could implement ARB_fp on top of that kind of hardware, as long as you could build a dependency chain.

And even if they're not, how can you be so sure that the hardware looks anything like ARB_fp?


That's not exactly what I meant. I meant that ARB_fp is not exactly independent of the underlying hardware of both nVidia's and ATi's latest GPUs. It looks to me like a hack to make them "compatible". But that's just me. As I said, I have no experience with programmable hardware (I have an R100 :( )
I'm sure the 9500+'s internal ISA is very similar to the extensions it provides, but that just backs up my claim that you can still use Cg or your own high-level language in any case. The optimal interface is there!




quote:I don't like the idea of having different code for arb_fp1 arb_fp1.1 arb_fp2k5.

quote:A new extension for each generation! But isn't that what's supposed to change? How future proof is that?

I hope you don't expect glslang 1.0 to be the final version. If you do, you'll be sorely disappointed.

Whatever shading language the driver uses will change over time (probably about once per generation). Whether it happens to be glslang or an ISA of some form doesn't matter.


It will change a lot more if you use ASM. As long as it's the hardware-specific extensions changing, we can (as drivers permit) rely on glslang to make the necessary optimizations/changes. Just like Cg (should)!



quote:If speed is what matters most, just use the hardware-specific extensions. If we want shaders that work optimally across generations (compiler-dependent of course), a high-level language is a good solution. Both forward and backward compatibility are important.

Why is "forward compatibility" important? As long as old shaders work on new hardware, and work faster than they did on older hardware, then you're getting all the functionality you need.


I don't get it... Do you want the best performance out of your shaders or not? If not, if you just want them to "work", then you should be OK... Otherwise, well, you should change your opinion.




Also, as I pointed out before, you have failed to give a reason as to why a lower-level solution could not be compiled into just as optimized a solution as the higher-level one. All you've done is simply state that this is the case; that doesn't make it true.


You don't realize you're talking about a high-level assembly language, something that has to be "translated" to the native assembly. Anyway, I prefer writing "var_a=var_b*var_c;" rather than using a bunch of movs and temp regs in x86 and letting the compiler figure out the optimal code.

[stupid example]
Next thing you'll be proposing a new instruction for "e=sqrt( sqr(a*b+c)+(c*d)+c+d);" like "do_a_sqrt_of_the_sqr_of... r1,r2,r3,r4,r5,r6...". I know you could decompose into several instructions, but isn't it obvious that you should preferably use the high-level syntax?
[/stupid example]

al_bob
07-30-2003, 10:51 AM
Next thing you'll be proposing a new instruction for "e=sqrt( sqr(a*b+c)+(c*d)+c+d);" like "do_a_sqrt_of_the_sqr_of... r1,r2,r3,r4,r5,r6...". I know you could decompose into several instructions, but isn't it obvious that you should preferably use the high-level syntax?
If the hardware exposes such an instruction, it's up to the compiler in the driver to convert the equivalent series of opcodes to that instruction. Please read the posts above for details.


I don't get it... You want the best performance out of your shaders or not? if not, if you just want them to "work", then you should be ok... otherwise.. well, you should change your opinion.
You want your shaders to run optimally on current hardware. But you also want them to run *faster* on future hardware. It's up to NV/ATI/whoever to make them run optimally (or not) on future hardware. After all, who knows what capabilities new hardware will have (apart from the people here who work for those companies, of course)? Besides, it's in those companies' interest to make things run optimally on their hardware.


compared to compiling with a directly P4-optimizing compiler that can use more than only x86 asm, and can actually know how the hardware behind it works. For example, the Intel C++ compiler, which gains up to a 5x speed increase over VC6 in software rendering apps here.
Of course, you're not comparing Intel C to assembly, or VC6 to assembly, or Intel C to VectorC (or any another vectorizing compiler). You're comparing Intel C from 2003 to VC6 from 1998. This "5x" figure is completely meaningless, and, as far as I'm concerned, completely bogus. Vectorizing compilers are good, but not *that* good.


Of course, Athlons have always won against Pentium-class processors, clock-for-clock that is.
Yes and no. I can trivially write a loop that will run 2x faster (clock for clock) on a P4 than on an Athlon XP. I can also trivially write a loop that will run faster on an Athlon 800 MHz than on a Pentium 4 3.2 GHz. It's all about knowing what the CPUs are good at (or not).
That said, the information necessary to convert a code sequence that is optimal for one chip into the optimal code for another is not lost in the assembly.


If the compiler isn't smart enough, then nVidia clearly hasn't done their job correctly.
nVidia also doesn't have unlimited resources to throw at the problem.

Tom Nuydens
07-30-2003, 11:20 AM
Originally posted by Korval:
Um, no. I refuse to accept a non-optimal solution. Whatever solution is picked should allow for the production of an optimal shader for the hardware in question. Performance is still the #1 priority; without it, you can't afford to write longer shaders.

I was thinking of NVidia's floating-point precision woes. The only way to get an NV30 to perform at its best is to exploit the fact that you can lower shader precision where possible. Neither ARB_fp nor GLslang allows you to do this: all they can give you is a shader that makes optimal use of a subset of the hardware. If you want to make optimal use of the full HW capabilities, your only choice is a proprietary interface.

Of course I don't expect this particular problem to persist beyond the current HW generation, but lord knows what's in store for us when the next generation comes around http://www.opengl.org/discussion_boards/ubb/smile.gif

-- Tom

secnuop
07-30-2003, 12:48 PM
"Is there anything you can do in an assembler-style intermediate language that you cannot do in a higher level language like glslang?"

You mean, besides the reasons I've already given?

1) Freedom of high-level language. We aren't bound to glslang. If we, for whatever reason, don't like it, we can use alternatives.
2) Ability to write in the assembly itself.


You've given me some reasons why you prefer writing in an assembler-like intermediate language, but I'm still unconvinced that there's anything you can do in an assembler-style intermediate language that you can't do in a high level language like glslang.

As one obvious example, the assembler-like statement:
"ADD r0, r1, c0"
can be trivially converted to the glslang statement:
"r0 = r1 + c0"

Or:
"DP4 r0, c0, v0"
becomes:
"r0 = dot(c0, v0)"

In the end, both the assembler-style program and the trivial glslang conversion should produce the same hardware instruction sequence. So what do you gain if your alternative language frontend produces ARB_vp/fp source code as its intermediate language that you don't get if your alternative language frontend produces glslang source code as its intermediate language? What advantage do you get by having the "ability to write in the assembly itself"?
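
To underline how mechanical that direction is, here is a throwaway C++ sketch (the Instr struct and the opcode table are hypothetical, invented for illustration and not taken from any real driver or spec) that turns a flat three-operand instruction list into glslang-style statements:

#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical three-operand instruction: "OP dst, src0, src1".
struct Instr { std::string op, dst, src0, src1; };

// Emit one glslang-style statement per assembler-style instruction.
std::string toGlslang(const Instr& i) {
    static const std::map<std::string, std::string> infix = {
        {"ADD", "+"}, {"SUB", "-"}, {"MUL", "*"}
    };
    static const std::map<std::string, std::string> funcs = {
        {"DP3", "dot"}, {"DP4", "dot"}, {"MIN", "min"}, {"MAX", "max"}
    };
    std::ostringstream out;
    if (infix.count(i.op))
        out << i.dst << " = " << i.src0 << " " << infix.at(i.op) << " " << i.src1 << ";";
    else if (funcs.count(i.op))
        out << i.dst << " = " << funcs.at(i.op) << "(" << i.src0 << ", " << i.src1 << ");";
    else
        out << "// unhandled opcode: " << i.op;
    return out.str();
}

int main() {
    std::vector<Instr> program = {
        {"ADD", "r0", "r1", "c0"},   // -> r0 = r1 + c0;
        {"DP4", "r0", "c0", "v0"},   // -> r0 = dot(c0, v0);
    };
    for (const auto& i : program)
        std::cout << toGlslang(i) << "\n";
}

Either way, the back end ends up seeing the same information.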

zeckensack
07-30-2003, 02:07 PM
Originally posted by secnuop:
What advantage do you get by having the "ability to write in the assembly itself"?
If I may answer, it's compile time. The first conversion could be done off line and thus save work on the user's machine.

Korval,
believe it or not, I think now we're in agreement about what an intermediate language should be (if we need one). My apologies if you meant all that from the outset; we could have spared ourselves a lot of trouble.

So, to recap: an intermediate code representation must
a)encode all constructs of the high level language (eg we need a 'sine' encoding vs a Taylor series)
b)preserve all type information
c)preserve all instruction flow information (functions, loops, branches)
d)preserve lifetime information (variable scopes) or put all scoped temporaries into a flat temp var space. Can't say for sure, I think I'd prefer the latter.

These are the requirements for not destroying any information. The first processing step can also do
a)syntax error checking
b)semantics error checking (outputs aren't written to, that kind of stuff)
c)flow analysis and dead code removal
d)constant folding

There are also a few things this first step should not do, most prominently register allocation and scheduling.

The speed gains by doing that sort of stuff off line aren't too promising. After all, you're representing everything in some sort of byte code instead of strings but string parsing is hardly slow enough to make this matter.

The processing steps required for this sort of thing are thin and light http://www.opengl.org/discussion_boards/ubb/wink.gif
You're IMO not saving much work, with the exception of flow analysis, and that will have to be repeated in the 'second pass' if that's still the one that does the optimizing work.
If you leave it out of the first pass (and forfeit dead code analysis), you're doing very close to nothing.

I don't think this will produce any appreciable execution time savings. I hope I'm clear now.


But of course you have a second point, allowing other layered front ends to access an intermediate code interface. And if I got your idea of intermediate code right this time, I won't argue about that.
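
To pin down what I mean by "not destroying any information", here is a purely hypothetical C++ sketch of what one record of such an intermediate representation could carry. The names are invented for illustration; this is not any proposed ARB format.

#include <cstdint>
#include <vector>

// High-level constructs keep their own opcodes instead of being lowered to
// Taylor series or dot/rsqrt sequences (requirement a); flow is explicit (c).
enum class Op : std::uint8_t {
    Add, Mul, Dot, Sin, Normalize,
    FuncBegin, FuncEnd, LoopBegin, LoopEnd, Branch
};

// Type information is preserved (requirement b).
enum class Type : std::uint8_t { Float, Vec2, Vec3, Vec4, Mat4, Bool };

struct Operand {
    std::uint32_t id;        // flat temp/var space, per requirement (d)
    Type          type;
    std::uint32_t firstUse;  // lifetime information an optimizer can use
    std::uint32_t lastUse;
};

struct IrInstr {
    Op op;
    Operand dst;
    std::vector<Operand> src;
    std::uint32_t branchTarget = 0;  // only meaningful for flow opcodes
};

// A shader is just the flat instruction list; register allocation and
// scheduling are deliberately left to the driver's back end.
using IrProgram = std::vector<IrInstr>;

None of this dictates register allocation or scheduling; that stays with the driver, as listed under the "should nots".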

Korval
07-30-2003, 02:10 PM
We're putting a lot of effort into making the compiler as good as possible, so you can concentrate on high-level parts of your algorithm/shader

Since I don't plan on buying 3DLabs hardware anytime in the near future, I don't care what their compiler can do. I'm interested in what ATi's compiler can do, and what nVidia's compiler (if they even deign to make one) can do.


Will 3DLabs be releasing any performance docs/tips on shader coding? As an example I brought up NVIDIA's register-usage issues.

The whole point of a high-level language is that you let someone else deal with the hardware-based performance issues. Clearly, 3D Labs has no knowledge of how to optimize for GeForceFX cards; only nVidia knows that. And in either case, you shouldn't have to do anything special for them.


It will change a lot more if you use ASM.

Based on what do you say this? Do you have some knowledge of a feature you expect to see in the near future?


You want the best performance out of your shaders or not? if not, if you just want them to "work", then you should be ok... otherwise.. well, you should change your opinion.

Shaders written, and compiled (to the ISA), on older hardware should still work on the new hardware, and faster than they did on the old. If they don't work as fast as they could if they were re-compiled (with new opcodes), then fine. As long as the brute-force method is faster than it was on the old hardware, everything should be fine.


Anyway, I prefer writing "var_a=var_b*var_c;" rather than using a bunch of movs and temp regs in x86 and letting the compiler figure out the optimal code.

You don't have to write in the ISA; there would be (off-line) compilers to turn glslang, or Cg, or whatever, into the ISA. You never have to touch the assembly.


The only way to get an NV30 to perform at its best is to exploit the fact that you can lower shader precision where possible. Neither ARB_fp nor GLslang allows you to do this: all they can give you is a shader that makes optimal use of a subset of the hardware. If you want to make optimal use of the full HW capabilities, your only choice is a proprietary interface.

True, but that's a "failing" of glslang and ARB_fp. The languages aren't rich enough to specify precision information. And the default precision they require is too great for 16-bit floats.


So what do you gain if your alternative language frontend produces ARB_vp/fp source code as its intermediate language that you don't get if your alternative language frontend produces glslang source code as its intermediate language?

Well, for one, you don't have to go through the pain of writing out glslang rather than assembly. Writing assembly is, comparatively, easy. Writing correctly-formed glslang (with conditionals, code blocks, etc.) is much harder.

Also, C/C++ compilers these days expect code to be written in a certain way. They compile code expecting that a human wrote it, so they don't expect to see things like:




int iTemp1 = A * B;
int iTemp2 = C * D;
E = iTemp1 + iTemp2;


So a glslang compiler may not optimize that kind of code well if your frontend spits it out; that style is what you'd expect as input to an assembly-based language. Certainly, driver writers wouldn't prioritize this case for optimization over other, more likely, cases.


What advantage do you get by having the "ability to write in the assembly itself"?

Really short shaders (Like Result = tex(TexCoord0)) can just as easily, and much faster, be written in the assembly rather than in glslang, which involves various visual overhead (if nothing else, putting it in a function).

dbugger
07-30-2003, 10:42 PM
I don't believe glslang will be the final solution. To have a standardized high-level language is a Good Thing; it will save us all lots and lots of work. An assembler language that can be executed on all gfx hardware is vital, however, and I believe new ARB extensions will provide that. My point is, glslang will co-exist with lots of other high-level languages (like Cg) but be the standard for high-level gfx programs.

[Edit] - Perhaps it will even be possible to write asm blocks in glslang just like in C.

[This message has been edited by dbugger (edited 07-31-2003).]

t0y
07-31-2003, 01:24 AM
al_bob:

quote:Next thing you'll be proposing a new instruction for "e=sqrt( sqr(a*b+c)+(c*d)+c+d);" like "do_a_sqrt_of_the_sqr_of... r1,r2,r3,r4,r5,r6...". I know you could decompose into several instructions, but isn't it obvious that you should preferably use the high-level syntax?


If the hardware exposes such an instruction, it's up to the compiler in the driver to convert the equivalent series of opcodes to that instruction. Please read the posts above for details.


I meant an all-new opcode in the intermediate asm. In this case, all backward compatibility breaks down.

If you still prefer to write the sequence of instructions needed to maintain compatibility (hoping for the compiler to optimize) instead of writing at a higher level, then our opinions are different.

My idea of perfection ( http://www.opengl.org/discussion_boards/ubb/biggrin.gif) is to have a variable number of direct hardware interfaces, and an independent language to use them transparently. Both Cg (offline) and glslang (online) are able to accomplish this. I just don't see the need for another limited general ASM between them. The main disadvantage of Cg is of course the compile time needed, but we're not talking about pages and pages of shader code, are we?


Korval:


quote:It will change a lot more if you use ASM.

Based on what do you say this? Do you have some knowledge of a feature you expect to see in the near future?


The more abstract (higher-level) a language is, the easier it is to make it fit the hardware restrictions and features. It's exactly because I don't have any knowledge of future features that I prefer glslang over "your" intermediate ASM. How many changes have you seen made to C/C++ to make it work on a particular architecture? Are you able to say that it has lost any kind of optimization abilities? ASM imposes restrictions on how you do things!



quote:You want the best performance out of your shaders or not? if not, if you just want them to "work", then you should be ok... otherwise.. well, you should change your opinion.

Shaders written, and compiled (to the ISA), on older hardware should still work on the new hardware, and faster than they did on the old. If they don't work as fast as they could if they were re-compiled (with new opcodes), then fine. As long as the brute-force method is faster than it was on the old hardware, everything should be fine.

What ISA? The intermediate or the low-level one? Anyway, if you leave that re-compile up to the driver, this problem goes away.


quote:So what do you gain if your alternative language frontend produces ARB_vp/fp source code as its intermediate language that you don't get if your alternative language frontend produces glslang source code as its intermediate language?

Well, for one, you don't have to go through the pain of writing out glslang rather than assembly. Writing assembly is, comparatively, easy. Writing correctly-formed glslang (with conditionals, code blocks, etc.) is much harder.


??? Are you really sure about this? I thought high-level languages were created to ease the life of the programmer. Does CG have any reason to exist after all?



quote:What advantage do you get by having the "ability to write in the assembly itself"?

Really short shaders (Like Result = tex(TexCoord0)) can just as easily, and much faster, be written in the assembly rather than in glslang, which involves various visual overhead (if nothing else, putting it in a function).


What's wrong with glBindTexture(tex)?

Your argument fails in the case of complex shaders, where the plethora of asm instructions needed will result in severe visual and algorithmic overhead (lack of simplicity and clarity).

Korval
07-31-2003, 11:14 AM
I meant an all-new opcode in the intermediate asm. In this case, all backward compatibility breaks down.

Why would the ISA have this opcode in it? If the hardware has an opcode for this natively, then the driver will have to look at a sequence of ISA opcode commands and determine that this is what is being requested.

It'd have to do the exact same thing in the case of the C-expression. Except, it would have to parse a C-expression, which is much harder than parsing a sequence of simple operations.


My idea of perfection ( ) is to have a variable number of direct hardware interfaces, and an independent language to use them transparently.

That has to be the worst proposal ever.

You only want 1 interface for compiling and using shaders. Having more than 1 does nothing for you, unless that 1 is somehow sub-optimal. If you can produce an optimal 1, you don't need the others.

Therefore, the optimal solution is to produce the (singular) optimal interface.


Both Cg (offline) and glslang (online) are able to accomplish this.

Maybe you aren't paying attention to what glslang ultimately means, but Cg will be relegated to only nVidia hardware. There aren't going to be more assembly extensions; if you want advanced shading in GL, you'll be expected to use some version of glslang. That's the way ATi and 3D Labs want it, and that's the way it's going to be. Cg will either not be usable on GL at all, or will be purely an nVidia-hardware (and extension) only thing.


The more abstract (higher-level) a language is, the easier it is to make it fit the hardware restrictions and features.

That is a generalization, given without factual basis.

I would agree that there is some minimum "level" below which you are losing out on optimization possibilities. The question is what does this bare minimum look like. Being able to write expressions in C-style buys you nothing in terms of optimization, compared to a simple list of operations.


ASM imposes restrictions on how you do things!

In what way? Be specific; just saying that "ASM imposes restrictions" isn't going to cut it. What restriction does it impose?


What ISA?

Did you read the thread? The ISA would be the assembly that would be compiled from a high-level language.


Anyway, if you leave that re-compile up to the driver, this problem goes away.

What do you mean by that? The ISA must be compiled by the driver. I don't expect hardware vendors to do what Intel does and have microcode ops that automatically convert the assembly into hardware instructions. No, I expect the ISA to be compiled.


Are you really sure about this?

Yes. Had you been paying attention to the context of my message (which was a compiler writing to C, vs. a compiler writing to asm), you would not need to ask this question. It's much easier to compile from a complex language to a simpler one. It's harder to compile to an equally complex language; there's a lot of stuff you have to deal with (porting control structures, etc).


What's wrong with glBindTexture(tex)?

Do you use OpenGL? glBindTexture only makes the texture available for use; it does nothing in terms of the per-fragment processing.

Perhaps you were suggesting glTexEnv-stuff? Even so, if 75% of what I render requires shaders, I'm going to write my engine such that 100% of what is rendered requires shaders. That way, I don't have to special-case code or have some virtual function somewhere. They can all follow the same code path.


Your argument fails in the case of complex shaders, where the plethora of asm instructions needed will result in severe visual and algorithmic overhead (lack of simplicity and clarity).

Says who? Based on what do you draw this conclusion? What "severe visual and algorithmic overhead"?

secnuop
07-31-2003, 02:28 PM
Why would the ISA have this opcode in it? If the hardware has an opcode for this natively, then the driver will have to look at a sequence of ISA opcode commands and determine that this is what is being requested.

It'd have to do the exact same thing in the case of the C-expression. Except, it would have to parse a C-expression, which is much harder than parsing a sequence of simple operations.

Actually, I'd argue the opposite. I wouldn't call it trivial, but it's pretty straightforward to write a parser for a C-like language. I'd venture a guess that just about every CS major has a course on compilers that does just this (mine did).

It seems much more difficult to examine a sequence of short assembler-like expressions to divine what they're really trying to do and to see if they can be substituted with a more efficient opcode. Imagine "reversing" sine or cosine Taylor series expansion in a vertex program, for example. Isn't it easier to decompose relatively complex operations into simpler operations than it is to determine which simple operations can be combined into one complex operation?

For this reason I'd suggest that an intermediate language should have as much contextual information as possible. As another example, if a user wants to normalize a vector, the intermediate language should state that the user wants to normalize a vector, not that the user wants to compute the length of a vector and then to divide by the computed length. That makes it easy for devices with a "normalize" opcode to use it directly, and devices that don't can translate this into a length calculation and a divide.
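
As a rough illustration of that idea (a hypothetical helper using ARB-style opcode names; this is not actual driver code), the lowering direction is the easy one: a back end with a native normalize instruction uses it directly, and one without expands the very same IR operation into the usual dot/rsqrt/mul sequence.

#include <string>
#include <vector>

// Hypothetical back-end lowering of a single IR "NORMALIZE dst, src" op.
// normalize(v) = v * (1 / sqrt(dot(v, v))), hence DP3 + RSQ + MUL.
std::vector<std::string> lowerNormalize(const std::string& dst,
                                        const std::string& src,
                                        bool hasNativeNrm) {
    if (hasNativeNrm)
        return { "NRM " + dst + ", " + src };
    return {
        "DP3 tmp.x, " + src + ", " + src,    // tmp.x = dot(src, src)
        "RSQ tmp.x, tmp.x",                  // tmp.x = 1 / sqrt(tmp.x)
        "MUL " + dst + ", " + src + ", tmp.x"
    };
}

Going the other way, recognizing that some already-expanded DP3/RSQ/MUL triple was meant to be a normalize, is the hard direction, which is exactly why the contextual information should be kept.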


[This message has been edited by secnuop (edited 07-31-2003).]

Korval
07-31-2003, 04:46 PM
Actually, I'd argue the opposite. I wouldn't call it trivial, but it's pretty straightforward to write a parser for a C-like language. I'd venture a guess that just about every CS major has a course on compilers that does just this (mine did).

How does the fact that graduates have taken a course in compilers make writing C parsers easier than writing assembly parsers (which would be the opposite of my statement)? You can't expect me to believe that programming the expression priorities and rules of C is ever easier than just reading a simple opcode and sticking it in an expression tree. The former is a complex operation requiring a lot of rules. The latter... is a call to ExpressionTree::ApplyExpression. The priorities are implicit in the assembly.

And that's just expressions. Parsing a long series of nested if-else clauses is painful compared to parsing a sequence of conditional jump instructions.


As another example, if a user wants to normalize a vector, the intermediate language should state that the user wants to normalize a vector, not that the user wants to compute the length of a vector and then to divide by the computed length. That makes it easy for devices with a "normalize" opcode to use it directly, and devices that don't can translate this into a length calculation and a divide.

Which, of course, is no different than glslang's library of available functions (like sin, normalize, etc.). You wouldn't be able to determine what a sin operation is any more easily from a C representation of a Taylor series than from the assembly.

So yes, the assembly would have to be suitably rich to handle cases, just as glslang does. Just as glslang will have to be updated with unanticipated features for the library (for example, the memory-buffer binding stuff I mentioned), so too would the assembly.

Pop N Fresh
07-31-2003, 07:45 PM
You can go to 3DLabs website and download the (open source) code for a glslang compiler. The parser is already written and is available. No one is going to have a problem parsing the language.

Coriolis
07-31-2003, 08:16 PM
Parsing is easy. A programming staff that cannot even get a language parser right is never going to manage getting an OpenGL driver to work. Parsing is also fast. The bottleneck in compilers is never the parsing; it is usually the symbol lookups. In this respect, I'd expect a language that is closer to assembly language to be slower, since assembly language programs tend to have far more symbols than equivalent high-level language programs.

I don't want multiple programming languages for OpenGL shaders. I want there to be one language that is expressive enough to do anything without being awkward to use, and then for everybody to use that one language. Having more than one language is bad for the very same reasons that having more than one low-level language is bad, or that having VAR and VAO is bad.

I don't think that having an assembly language syntax necessarily loses any information, if you make that syntax sufficiently high level. But what advantage is there to this syntax then? You don't get the advantage of writing directly to the hardware that you normally get with assembly language syntax. You don't make the job of the compiler implementers all that much easier; either way they have to parse the language and come up with an intermediate format that is most useful to them, and this format is going to be different from vendor to vendor. Parsing C expressions is easy, and parsing nested if statements is easy. In fact, to make an assembly language useful, you have to add a lot of the parsing complexities that high-level languages have; otherwise the programmer has to hand-calculate all the numeric constants in the code, which is tedious and error-prone.

It seems to me that people opposed to having glslang in OpenGL itself don't really want an assembly language syntax to program in so much as they want a universal intermediate language that nobody manually edits but merely allows multiple high level language front ends. If they wanted to be able to manually edit it, then the language would need all the parsing complexities of a high-level language, and one of the arguments often espoused is that the assembly language syntax is easier to parse, so this can't be the case. My question is, what is the advantage to having multiple high level languages instead of a single high level language that is expressive enough to solve all problems? The only one I can think of is programmer efficiency for certain classes of problems, yet I think that is far outweighed by the advantages of having a single knowledge base and the innate ability to universally share shaders.

Korval
07-31-2003, 10:13 PM
Parsing is also fast. The bottleneck in compilers is never the parsing; it is usually the symbol lookups.

Parsing C is not fast compared to parsing assembly. How could it be? One of them is very complicated, the other is very simple (indeed, designed for ease of parsing).


I don't want multiple programming languages for OpenGL shaders.

That's good for you. That's not good for everyone else. It is better to have options than to be forced into a single solution that may not be appropriate for your needs.

Let's say that you already have developed a shading language. Let's say you're Pixar, and you therefore practically invented the term "shader." Wouldn't it be nice if you could write a compiler to take RenderMan shaders and run them on OpenGL?


Having more than one language is bad for the very same reasons that having more than one low-level language is bad, or that having VAR and VAO is bad.

How so? The problem with VAR vs. VAO is that they are two different API's. We're not talking about having two different API's (or, at least, I don't want to). We're suggesting that the shading language should be on a lower level, one that is easier to compile to from higher level languages.


Parsing C expressions is easy, and parsing nested if statements is easy.

Have you ever written a C expression parser? It is a falsehood to say that the set of statements:




MUL TEMP1, R1, R2
MUL TEMP2, R3, R4
ADD RESULT, TEMP1, TEMP2


is harder to parse than:




RESULT = R1 * R2 + R3 * R4;


I can write code in 10 minutes to parse the assembly. It'd take some time to write code to parse the general case of the C-version, if for no other reason than operator precedence issues. I can't just read it from left to right; I have to find the sub-expressions of greatest precedence and parse those first, and iterate over this process.
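
To make the comparison concrete, here's a rough C++ sketch (toy code with made-up names, not from any driver): the assembly statement is a single tokenize, while even a two-operator infix expression needs precedence-aware recursion.

#include <cctype>
#include <iostream>
#include <string>
#include <vector>

// Assembly case: "ADD RESULT, TEMP1, TEMP2" -- split on spaces/commas and
// you are done; operand roles and priorities are implicit in the format.
std::vector<std::string> parseAsmLine(const std::string& line) {
    std::vector<std::string> tok;
    std::string cur;
    for (char c : line) {
        if (std::isspace(static_cast<unsigned char>(c)) || c == ',') {
            if (!cur.empty()) { tok.push_back(cur); cur.clear(); }
        } else {
            cur += c;
        }
    }
    if (!cur.empty()) tok.push_back(cur);
    return tok;  // e.g. {"ADD", "RESULT", "TEMP1", "TEMP2"}
}

// Infix case: even with only '+' and '*', the parser has to know that '*'
// binds tighter, so it recurses per precedence level (classic recursive
// descent) and emits temporaries in evaluation order.
struct ExprParser {
    std::vector<std::string> tok;  // pre-tokenized: identifiers, "+", "*"
    std::size_t pos;
    int nextTemp;

    std::string parseSum() {                       // lowest precedence
        std::string lhs = parseProduct();
        while (pos < tok.size() && tok[pos] == "+") {
            ++pos;
            std::string rhs = parseProduct();
            std::string t = "TEMP" + std::to_string(nextTemp++);
            std::cout << "ADD " << t << ", " << lhs << ", " << rhs << "\n";
            lhs = t;
        }
        return lhs;
    }
    std::string parseProduct() {                   // higher precedence
        std::string lhs = tok[pos++];
        while (pos < tok.size() && tok[pos] == "*") {
            ++pos;
            std::string rhs = tok[pos++];
            std::string t = "TEMP" + std::to_string(nextTemp++);
            std::cout << "MUL " << t << ", " << lhs << ", " << rhs << "\n";
            lhs = t;
        }
        return lhs;
    }
};

int main() {
    std::vector<std::string> asmToks = parseAsmLine("ADD RESULT, TEMP1, TEMP2");
    std::cout << "asm statement parsed into " << asmToks.size() << " tokens\n";

    ExprParser p{{"R1", "*", "R2", "+", "R3", "*", "R4"}, 0, 0};
    std::cout << "MOV RESULT, " << p.parseSum() << "\n";
}

Neither is rocket science, but the second one is clearly more code and more rules, and that was the only point.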


In fact, to make an assembly language useful, you have to add a lot of the parsing complexities that high-level languages have; otherwise the programmer has to hand-calculate all the numeric constants in the code, which is tedious and error-prone.

Maybe you haven't quite understood the idea, but, for the general case, programmers don't write the ISA by hand. It isn't designed to be easy to write, nor is that its goal. The goal is to give high-level languages a suitable target language to compile to. The ISA should be easy for compilers to produce code in, not for people to write.


the innate ability to universally share shaders

Share algorithms, not shaders. They're inherently more useful. As long as everyone shares a language, they can understand the algorithm, and they can code it up in whatever way they see fit.

Also, languages, especially those based on C, aren't that fundamentally different. Cg and glslang may use different words for different concepts, but they aren't like the difference between C and Lisp (which are very fundamentally different). Sure, someone out there could write some kind of language that is fundamentally different from C and use it, but so what? That should be available as an option, just so that people can use whatever they feel is best for themselves. Choices are good.

t0y
08-01-2003, 09:28 AM
Korval:
This is getting to be a flame war, so I'll try to avoid any kind of irony in this post. I believe (from your posts on the board) that your OpenGL and graphics knowledge is far better than mine, but I'm still entitled to have my own opinions.


quote:I meant an all-new opcode in the intermediate asm. In this case, all backward compatibility breaks down.

Why would the ISA have this opcode in it? If the hardware has an opcode for this natively, then the driver will have to look at a sequence of ISA opcode commands and determine that this is what is being requested.

It'd have to do the exact same thing in the case of the C-expression. Except, it would have to parse a C-expression, which is much harder than parsing a sequence of simple operations.


Why? Imagine for a second that it was a standard formula in maths (or shaders) that was starting to be affordable in gpu's... Just like you use the dot product instructions instead of sums and muls. It could be introduced to simplify shader coding.



quote:My idea of perfection ( ) is to have a variable number of direct hardware interfaces, and an independent language to use them transparently.

That has to be the worst proposal ever.

You only want 1 interface for compiling and using shaders. Having more than 1 does nothing for you, unless that 1 is somehow sub-optimal. If you can produce an optimal 1, you don't need the others.

Therefore, the optimal solution is to produce the (singular) optimal interface.

The optimal interface is the one that matches exactly the underlying hardware. It might not be the easiest to grasp, but it would be optimal and easy for the compiler to parse. That's why I prefer the availability of hardware-specific ASM extensions over a generalized one.
Higher-level languages should target this specific interface depending on what hardware you're running on (at run-time). And that's why I prefer glslang in the drivers. You have the power to choose between compatibility and full control.

These lower-level interfaces can perform dead-code elimination, instruction reordering, etc., much more easily than a higher-level ASM, just because they don't have to deal with hardware virtualization and algorithmic dependencies. They should be as low-level as possible, keeping all control in the hands of whatever or whoever feeds code to them.



quote:Both Cg (offline) and glslang (online) are able to accomplish this.

Maybe you aren't paying attention to what glslang ultimately means, but Cg will be relegated to only nVidia hardware. There aren't going to be more assembly extensions; if you want advanced shading in GL, you'll be expected to use some version of glslang. That's the way ATi and 3D Labs want it, and that's the way it's going to be. Cg will either not be usable on GL at all, or will be purely an nVidia-hardware (and extension) only thing.


Again, I must admit that I have absolutely no experience in writing shaders. But I read somewhere, and always had the idea, that CG was able to adapt to any platform as long as you define the profiles for it. I still don't know why or how glslang limits CG's existence.
Ultimately it could mean the end of CG, but that's because we would have a better alternative. This is what makes nVidia angry.

If you really like an ASM-like language you can always create one yourself. In my perfect solution you would have all the interfaces you could ever need, as long as you keep updating the language. In your own words, that's a lot easier to accomplish than writing a C-like compiler.


quote:The more abstract (higher-level) a language is, the easier it is to make it fit the hardware restrictions and features.

That is a generalization, given without factual basis.

I would agree that there is some minimum "level" below which you are losing out on optimization possibilities. The question is what does this bare minimum look like. Being able to write expressions in C-style buys you nothing in terms of optimization, compared to a simple list of operations.

A true ASM will never be that minimum level. The factual basis I can give you is the longevity of C/C++ (and others) over x86 asm and whatever intermediate asm compilers target when parsing high-level languages. We're talking about a future-proof solution, not one that changes every generation.



quote:ASM imposes restrictions on how you do things!

In what way? Be specific; just saying that "ASM imposes restrictions" isn't going to cut it. What restriction does it impose?

Have you ever programmed in x86 asm? Simple algorithms may take ages to adapt to the limitations of the hardware. Only a really general ISA or language can help you code while focusing on the algorithm. This isn't of course noticeable with simple shaders, but things are changing fast. That's the reason high-level languages were created (including CG): to help you focus on the algorithm.



quote:What ISA?

Did you read the thread? The ISA would be the assembly that would be compiled from a high-level language.

We have 2 ISAs here: the native one and the higher-level one. Today, the native one is closely related to the hardware-specific extensions; the higher-level one is ARB_fp.


quote:What's wrong with glBindTexture(tex)?

Do you use OpenGL? glBindTexture only makes the texture available for use; it does nothing in terms of the per-fragment processing.

Perhaps you were suggesting glTexEnv-stuff? Even so, if 75% of what I render requires shaders, I'm going to write my engine such that 100% of what is rendered requires shaders. That way, I don't have to special-case code or have some virtual function somewhere. They can all follow the same code path.


That was me being ironic http://www.opengl.org/discussion_boards/ubb/wink.gif.
Anyway, when I started in win32 programming I was overwhelmed with the size and complexity of a simple hello world in a window (not a messagebox). But that didn't make me go back to the "wonderful" world of DOS. And I don't regret it.



quote:Your argument fails in the case of complex shaders, where the plethora of asm instructions needed will result in severe visual and algorithmic overhead (lack of simplicity and clarity).

Says who? Based on what do you draw this conclusion? What "severe visual and algorithmic overhead"?

Do you make all your programs in pure ASM? I'm sure you don't. Why?

al_bob
08-01-2003, 10:21 AM
Why? Imagine for a second that it was a standard formula in maths (or shaders) that was starting to be affordable in gpu's... Just like you use the dot product instructions instead of sums and muls. It could be introduced to simplify shader coding.
What makes you think that the drivers don't already do that for dp3/4? Why is your "new standard formula that becomes affordable" any different?
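
That sort of recognition is an ordinary peephole pass. A hypothetical sketch (invented struct and opcode names, not any actual driver code) that folds a MUL feeding an ADD into a single MAD looks like this; matching a MUL/MUL/ADD chain into a DP3 is the same idea over a slightly longer window.

#include <string>
#include <vector>

// Hypothetical flat instruction: "op dst, a, b[, c]".
struct Ins {
    std::string op, dst, a, b, c;
};

// Peephole pass: MUL t, a, b followed by ADD d, t, c becomes MAD d, a, b, c,
// provided the temporary t is not read again afterwards.
std::vector<Ins> fuseMulAdd(const std::vector<Ins>& in) {
    auto usedLater = [&](std::size_t from, const std::string& reg) {
        for (std::size_t j = from; j < in.size(); ++j)
            if (in[j].a == reg || in[j].b == reg || in[j].c == reg) return true;
        return false;
    };
    std::vector<Ins> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i + 1 < in.size() && in[i].op == "MUL" && in[i + 1].op == "ADD" &&
            in[i + 1].a == in[i].dst && !usedLater(i + 2, in[i].dst)) {
            out.push_back({"MAD", in[i + 1].dst, in[i].a, in[i].b, in[i + 1].b});
            ++i;  // skip the ADD we just absorbed
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

The point being: a driver back end has to do this kind of matching no matter what the source language looks like.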


The optimal interface is the one that matches exactly the underlying hardware.
It wouldn't, because it would only be "optimal" for a very very short time (less than 6 months). That's not usually enough time to build good HL compilers that target it.


If you really like an ASM-like language you can always create one yourself.
That compiles to glslang? Wouldn't that be backwards? I don't think you've thought this through very well...


We're talking about a future-proof solution, not one that changes every generation.
There is no future-proof solution. x86, C and C++ are all evolving, and so are the shading languages. The language(s) will adapt to the hardware, and vice-versa.


Have you ever programmed in x86 asm? Simple algorithms may take ages to adapt to the limitations of the hardware
If you don't like coding in assembly, then don't. But do not prevent *me* from doing so!


Do you make all your programs in pure ASM? I'm sure you don't. Why?
Shaders, even long ones, are trivial compared with the applications that drive them (games). You do not write a half-million-line C program in asm. You can write a small portion of it (100-1000 lines) in asm though, if ever it's needed. It's the same for shaders.

[This message has been edited by al_bob (edited 08-01-2003).]

t0y
08-01-2003, 10:58 AM
quote:Why? Imagine for a second that it was a standard formula in maths (or shaders) that was starting to be affordable in gpu's... Just like you use the dot product instructions instead of sums and muls. It could be introduced to simplify shader coding.


What makes you think that the drivers don't already do that for dp3/4? Why is your "new standard formula that becomes affordable" any different?


It's different because a new instruction is introduced, thus changing the language and breaking compatibility. If you prefer to write different code-paths or shaders for every different generation of hardware you want to support, you might as well code for hardware-specific interfaces.



quote:The optimal interface is the one that matches exactly the underlying hardware.


It wouldn't, because it would only be "optimal" for a very very short time (less than 6 months). That's not usually enough time to build good HL compilers that target it.


Yes it would. The R300 won't change at all in the next 100 or so years. If you read my post you'll notice I refer to hardware-specific interfaces or ASM languages that don't need "upgrading". 6 months is more than enough time to mature a generation of hardware specific compilers. Time is running tight, but you still have to wait 1 year or so before you start seeing actual software using the new features. We're just starting to see full dx8 games appearing. If you want performance "now" just use the direct-hardware extensions.



quote:If you really like an ASM-like language you can always create one yourself.


That compiles to glslang? Wouldn't that be backwards? I don't think you've thought this through very well...

No... That compiles to hardware specific ASM.



quote:We're talking about a future-proof solution, not one that changes every generation.


There is no future-proof solution. x86, C and C++ are all evolving, and so are the shading languages. The language(s) will adapt to the hardware, and vice-versa.

But some solutions are more future-proof than others. And coding in C is much more future proof than coding in x86 ASM.



quote:Have you ever programmed in x86 asm? Simple algorithms may take ages to adapt to the limitations of the hardware


If you don't like coding in assembly, then don't. But do not prevent *me* from doing so!


But I do! Very much. But I also know that I only do it for fun and for the challenge, not for productivity or for performance reasons. At least I never really had to do it.



quote:Do you make all your programs in pure ASM? I'm sure you don't. Why?


Shaders, even long ones, are trivial compared with the applications that drive them (games). You do not write a half-million-line C program in asm. You can write a small portion of it (100-1000 lines) in asm though, if ever it's needed. It's the same for shaders.


That's fair. I exaggerated on purpose. But that was only to show the obvious advantages of C over ASM. And you use the right words to agree with me: "if ever it's needed". You should only write in ASM if you really have to, or if the code is time-critical. Otherwise you're just wasting your time. It's no different for shaders. Since shaders are a lot simpler than full-blown programs, a compiler is more than able to do that for you.

Korval
08-01-2003, 01:32 PM
It's different because a new instruction is introduced, thus changing the language and breaking compatibility. If you prefer to write different code-paths or shaders for every different generation of hardware you want to support, you might as well code for hardware-specific interfaces.

This conversation is getting circular. You keep using the same arguments, and I keep using the same facts to counter them.

As I've stated several times, glslang will change with each generation! It will have to do so in order to keep up with the hardware.

Therefore, you're already going to have to have generational codepaths. However, the code you used on older generations should still work on the newer ones, though maybe not as optimally.


If you want performance "now" just use the direct-hardware extensions.

Once again you use an argument that clearly goes against established facts.

There will be no more "direct-hardware extensions." At least, not out of ATi or 3D Labs. ATi has never been a big proponent of making lots of extensions, and 3D Labs is pushing glslang too hard to be considering a "backdoor" shading language that can work without glslang.

In short, the fundamental idea with glslang is that it replaces all other shading languages in OpenGL. One of the fundamental reasons for putting a platform-neutral shading language in the core at all is so that everyone can use the same interface to the hardware.


And coding in C is much more future proof than coding in x86 ASM.

Of course, we aren't talking about x86 assembly. We're talking about a proposed assembly-like language. Its only relationship to x86 is that they share the ISA model of design: a single, fixed (with the ability to add extensions) assembly language that is translated into hardware opcodes.


You should only write in ASM if you really have to, or if the code is time-critical.

That's funny. Shaders for any performance application where rendering plays a significant part in that performance are always time-critical.

Coriolis
08-01-2003, 03:50 PM
I'm too lazy to do the interleaved quoting thing, so this is mostly in response to Korval's reply to my other post.

I have written numerous parsers. I've designed and implemented several full languages. I've written entire microprocessor emulators. I do know what I'm talking about with parsers. Note that I equate simplicity with the difficulty of an implementation, not with the time it actually takes to implement it. It is pretty easy to ride in an airplane from Boston to Paris, but it takes a long time.

Tokenizing C and tokenizing assembly are comparable performance-wise, so the tokenizers run just about as fast for either one, within a few percent. Parsing an assembly that is always computer generated is going to be easier than parsing C, and probably faster. However, standard optimizing logic says this doesn't matter, since parsing is not a bottleneck. If you ever want to be able to hand edit the assembly, then you really need the ability to use numeric expressions for constants, and the assembly language becomes comparable in complexity to parsing a C-like syntax. I do realize Korval doesn't seem to care about this, but al_bob appears to.

As for time it takes to implement; who cares? I don't care if the driver writers have to spend a day or two writing the parser instead of an hour. If their spending an extra couple days now saves every shader writer a few minutes per shader for the lifetime of glslang, that is a good tradeoff. I also expect it will be faster and easier overall for the driver writers to implement glslang if it is integrated directly into OpenGL, since they don't have to waste time writing any code for reading and writing an intermediate assembly language.

The key assumption I was making behind my preference for a single language is that the language is expressive enough to handle all problems without being too cumbersome to use. If this is the case, then having more than one language is redundant. I think the main difference between Korval's stance and my own is that he believes no such language exists. I don't know if glslang is this language (I haven't investigated it too much), but I do think that such a language does exist.

Korval
08-01-2003, 04:46 PM
If you ever want to be able to hand edit the assembly, then you really need the ability to use numeric expressions for constants, and the assembly language becomes comparable in complexity to parsing a C-like syntax.

Neither ARB_vertex_program nor ARB_fragment_program allows for this facility. x86 assembly doesn't either. And yet, people write in it just fine.


As for time it takes to implement; who cares? I don't care if the driver writers have to spend a day or two writing the parser instead of an hour.

First, I kinda like driver stability. Less code means less chance of bugs.

Secondly, I'd rather they spent that day or two on optimizing VBO, getting superbuffers hammered out and implemented, or any number of other important things they could be doing. After all, each day spent parsing a C-like language is one day subtracted from doing other things.


I also expect it will be faster and easier overall for the driver writers to implement glslang if it is integrated directly into OpenGL, since they don't have to waste time writing any code for reading and writing an intermediate assembly language.

If we were using an ISA structure, driver writers wouldn't be compiling glslang at all. They wouldn't know or care about the existence of the high level glslang. Instead, they'd be compiling the ISA.

It would be up to a stand-alone project (managed by the ARB, I would suppose) to do the actual glslang compiling.
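
In other words, the division of labour would look roughly like this (hypothetical function and type names, stubbed out, purely to illustrate who compiles what):

#include <string>
#include <vector>

// Hypothetical two-stage split for the ISA idea discussed here.
// Stage 1 runs offline or in the application and could be fed by any front
// end (glslang, Cg, a RenderMan-style language, ...). Stage 2 is the only
// compiler a hardware vendor would have to ship.

struct IsaProgram    { std::vector<unsigned char> words;   };  // portable ISA
struct NativeProgram { std::vector<unsigned char> gpuCode; };  // vendor-specific

// Stage 1: high-level source in, portable ISA out (stubbed here).
IsaProgram compileGlslangToIsa(const std::string& glslangSource) {
    IsaProgram p;
    (void)glslangSource;  // a real front end would parse and lower the source
    return p;
}

// Stage 2: portable ISA in, optimized native code out (stubbed here).
NativeProgram driverCompileIsa(const IsaProgram& program) {
    NativeProgram n;
    (void)program;        // a real driver back end would schedule and allocate
    return n;
}

int main() {
    NativeProgram native = driverCompileIsa(compileGlslangToIsa("void main() {}"));
    (void)native;
}

The driver only ever sees stage 2; whether stage 1 was fed glslang, Cg or RenderMan is invisible to it.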


I think the main difference between Korval's stance and my own is that he believes no such language exists.

That, and that there are plenty of shaders out there written in plenty of shading languages. Don't think that D3D 8.0 was the beginning of "shaders."

Wouldn't it be nice if they didn't have to go and rewrite all of their shaders in order for them to run on hardware? All they'd need to do is write a compiler for them.

aboeing
08-01-2003, 07:34 PM
Hi, I joined this topic a bit late, but I want to put in my 2 cents.

First of all, I want to say I think the idea of a HLSL in OpenGL is a BAD idea.
Reasons being:

1) It's called OPEN gl, not CLOSED gl; that is, I should be able to use OpenGL with C or Java or Delphi or whatever.

Similarly, I should not be limited to one specific shading language. In the last 10 years we have seen popularity shift between BASIC, Pascal, C, C++, and Java, so what makes anyone think that we should be stuck with one shading language for the next 10 years?



I think the main difference between Korval's stance and my own is that he believes no such language exists

I don't think that such a language exists either. Programming languages have existed for a long time now; if someone could have come up with the perfect, easy-to-use, absolutely future-proof one, wouldn't they have done it already?



As I've stated several times, glslang will change with each generation! It will have to do so in order to keep up with the hardware.

So why have it at all? And why should it be part of OpenGL? If the language is always changing anyway, why not let it compile to a different assembly for each generation? At least this way people could implement different languages.

2) A HLSL is NOT going to be the fastest (performance) option. I think we all already agree on this. [a compiler is never going to be the fastest option - insert discussion about vectorizing compilers here]



That's funny. Shaders for any performance application where rendering plays a significant part in that performance are always time-critical.


Couldn't agree more. Otherwise we would all be using software rendering because it's a lot more flexible.

3) A HLSL compiler is going to be complicated to write. No offence to the people at ATI or nVidia, but their drivers have a number of problems in them, and I don't think they will be able to write a decent compiler, and have it fully optimizing too. [The Cg compiler has a number of bugs in it... MSVC has a number of bugs too, and it has been in development for a very, very long time now - so the odds of them ever getting it perfect are fairly slim]



First, I kinda like driver stability. Less code means less chance of bugs.

Couldn't agree more.



As for time it takes to implement; who cares? I don't care if the driver writers have to spend a day or two writing the parser instead of an hour. If their spending an extra couple days now saves every shader writer a few minutes per shader for the lifetime of glslang, that is a good trade-off. I also expect it will be faster and easier overall for the driver writers to implement glslang if it is integrated directly into OpenGL, since they don't have to waste time writing any code for reading and writing an intermediate assembly language.


But if you had an intermediate assembly language, then you wouldn't have to write a compiler. You say you've done this, so you know it takes a long time (a lot longer than writing an assembler, which I assume you have done too), especially if it is supposed to optimize well. So it would be easier for the people writing the drivers just to support the intermediate assembly language.

Vendors could still write their own compilers. No one is going to stop them. They can optimize them all they want. They can spend just as much time and money as Intel did writing their compiler.

(and I DO care btw, I don't want to have to wait for an extra year before the new gfx card becomes usable because they haven't finished writing the compiler yet)

4) I don't see the problem with some form of intermediary assembly language, and I don't see the problem with this having the scheduling information in it. Generally, executing instructions in such an order that they are not dependent on the previous instruction will increase the amount of parallelism possible (thus increasing the potential performance), regardless of the actual design of the GPU.

It's not like we can only have one or the other. If we have an intermediary assembly language, we can still have a higher-level language (not just limited to one). If the people who make GCC can do it, I'm sure the people working for 3dLabs, ATI, nVidia, SGI, etc. can put their heads together and come up with one too. (If they can't, do you really want them writing your compilers?)



Why? Imagine for a second that it was a standard formula in maths (or shaders) that was starting to be affordable in gpu's... Just like you use the dot product instructions instead of sums and muls. It could be introduced to simplify shader coding.


If you're trusting someone enough to write an optimizing compiler, then I'm sure that you could trust them enough to write a driver that recognizes a few mul's followed by an add as a dot product. (Similar for any other new function they invent.)

5) What happens when we have very complex shaders and the compile time becomes very large? Wouldn't it be better just to have a precompiled assembly for this?

In order to seriously optimize code the compilers need to do a lot of thinking. (Some compilers run GAs to optimize the ordering of instructions!) Assembling is a lot faster than this.

6) It is going to be a lot more difficult for someone to enter the graphics card industry and compete with the existing products if they not only have to design the gfx card, but also write a top-notch compiler.

Using 3dLabs' open source implementation is bound to produce sub-optimal results.

And just a final note:
How much longer are we going to be rendering the way we are? What happens when some other form (raytracing/radiosity/what have you) becomes possible in real time, as I imagine it (raytracing) might soon? We will need a more generalized language than some shading language. Why not just make C or C++ the official OpenGL-supported shading language?

zeckensack
08-01-2003, 10:09 PM
Originally posted by Korval:
Have you ever written a C expression parser? It is a falsehood to say that the set of statements:


MUL TEMP1, R1, R2
MUL TEMP2, R3, R4
ADD RESULT, TEMP1, TEMP2


is harder to parse than:




RESULT = R1 * R2 + R3 * R4;


I can write code in 10 minutes to parse the assembly. It'd take some time to write code to parse the general case of the C-version, if for no other reason than operator precedence issues. I can't just read it from left to right; I have to find the sub-expressions of greatest precedence and parse those first, and iterate over this process.

The high level stuff is harder to parse, agreed. Operator precedence needn't exist in an in-between representation; it's simply a little help to ease (high level) programming. Could be eliminated. Agreed.

Parsing is still only a minor part of what you need for an optimizing backend compiler. Many of the complexities of the high level language need to be carried over to your assembly. Btw, your inflationary use of the term "assembly" complicates this discussion.
I know what you mean now, but that took me almost two pages of this thread and others are still struggling.

Your "assembly" can't be compiled left-to-right either. It's almost as complex as GLslang itself. I've already thrown up a set of requirements for the structure of such an intermediate code representation in my last post, I'd appreciate any comments.


Originally posted by zeckensack
a)encode all constructs of the high level language (eg we need a 'sine' encoding vs a Taylor series)
b)preserve all type information
c)preserve all instruction flow information (functions, loops, branches)
d)preserve lifetime information (variable scopes) or put all scoped temporaries into a flat temp var space. Can't say for sure, I think I'd prefer the latter.
These are the requirements for not destroying any information.

The first processing step can also do
a)syntax error checking
b)semantics error checking (outputs aren't written to, that kind of stuff)
c)flow analysis and dead code removal
d)constant folding

There are also a few things this first step should not do, most prominently register allocation and scheduling.

Can we agree on this set of musts, mays and should nots?

t0y
08-02-2003, 03:55 AM
Korval:

quote:It's different because a new instruction is introduced, thus changing the language and breaking compatibility. If you prefer to write different code-paths or shaders for every different generation of hardware you want to support, you might as well code for hardware-specific interfaces.

This conversation is getting circular. You keep using the same arguments, and I keep using the same facts to counter them.

As I've stated several times, glslang will change with each generation! It will have to do so in order to keep up with the hardware.

Therefore, you're already going to have to have generational codepaths. However, the code you used on older generations should still work on the newer ones, though maybe not as optimally.

Maybe I got this wrong, but I always thought glSlang was created to eliminate the need for different codepaths across generations and HW vendors. If it does change, then it's because it's flawed from the start. It should be as flexible as possible. Maybe we don't have flexible enough GPUs to allow this, but even if current hardware doesn't support all features of the language we shouldn't really care. Just like OpenGL did for 3D in its beginning, glSlang should establish the standard for the next years of shading.



quote:If you want performance "now" just use the direct-hardware extensions.

Once again you use an argument that clearly goes against established facts.

There will be no more "direct-hardware extensions." At least, not out of ATi or 3D Labs. ATi has never been big proponents of making lots of extensions, and 3D Labs is pushing glslang too hard to be considering a "backdoor" shading language that can work without glslang.

In short, the fundamental idea with glslang is that it replaces all other shading languages in OpenGL. One of the fundamental reasons for putting a platform-neutral shading language in the core at all is so that everyone can use the same interface to the hardware.


There will always be the GPU's native code. If vendors don't expose this interface, it's their own choice. If they feel there's no need for such low-level access to the hardware, they won't do it. I would prefer otherwise, though.



quote:And coding in C is much more future proof than coding in x86 ASM.

Of course, we aren't talking about x86 assembly. We're talking about a proposed assembly-like language. Its only relationship to x86 is that they share the ISA model of design: a single, fixed (with the ability to add extensions) assembly language that is translated into hardware opcodes.

I still don't get your opinion. Your asm-like language is not translated to hardware opcodes; it's compiled. I just don't see the advantages of using a single, fixed (but extensible?) high-level ASM (because it has to be high-level) over a single, fixed, extension-free C-like language.

You initially preferred ASM over glSlang because it would kill Cg (and other offline languages), not exactly because it was better, remember? This all fell into an nVidia vs ARB discussion when it shouldn't have.



quote:You should only write in ASM if you really have to, or if the code is time-critical.

That's funny. Shaders for any performance application where rendering plays a significant part in that performance are always time-critical.

Yes, very funny indeed. But the ASM I was talking about was native, not intermediate. And for those time-critical shaders you should have the option to code down-to-the-metal ASM (if it really fits your needs), just like my "worst proposal ever" allows. But glSlang might just eliminate the need for this, and the fact that it's built into the driver just helps it do its job.

john
08-02-2003, 04:25 AM
I think what Korval + whoever are advocating is something equivalent to Java bytecode (and gcc's intermediate assembler).

the way I understand their arguments (and, to be honest, I've only skimmed---I am busy these days) is that you have a very primitive assembly which acts as a wrapper around the actual hardware implementation, but can still be used as a building block for higher level languages.

I see the language they are proposing as something akin to Java byte code: you write Java in a high level language, but it is compiled into a virtual assembly that you run on a machine that translates (or emulates, or whatever term you want to use) that assembly into the actual native machine code for that particular architecture.

I think, to be honest, that it is an intriguing idea. I was initially against the use of assembly as a description language, because a lot of what people are saying against it is true (at least, for CPUs), but I DO like the idea of a language that is abstract enough to hide the real assembly, yet doesn't dictate what high level language someone writes their shaders in.

Korval
08-02-2003, 01:59 PM
Can we agree on this set of musts, mays and should nots?

For the most part. There may be some minor details, but those would only crop up with a real spec in-hand. The principle is sound.


Maybe I got this wrong, but I always thought glSlang was created to eliminate the need for different codepaths across generations and HW vendors. If it does change, then it's because it's flawed from the start. It should be as flexible as possible. Maybe we don't have GPUs flexible enough to allow this, but even if current hardware doesn't support all features of the language we shouldn't really care. Just like OpenGL did for 3D in its beginning, glSlang should establish the standard for shading for the next several years.

You seem to forget your history. OpenGL did a pretty crappy job at establishing a standard for the future. Its vertex array API was, at best, ill-conceived for performance purposes. GL 1.0 didn't even have texture objects or vertex arrays. Oh, sure, GL got some things right, but it got a lot wrong, too. It's only been a few months since a decent, non-platform-specific vertex transfer system was released; there's no excuse for VBO not having been conceived 3 or 4 years ago. You can't expect to get it all right the first time.

Also, you don't want to get it all right the first time. One of the fundamental problems with OpenGL, one that still exists, is not having a real idea of what is going to be fast or slow in GL. The spec only guarantees that it will be done, not how fast. As such, there is a subset of the language that just isn't used, because everybody knows that using it is a bad idea. Unfortunately, the information on what functionality will drop performance is not readily available. Look at how many people ask, "I'm using feature X and now I'm getting 5fps."

If the shading language exposes features that aren't readily available on hardware, then, in order to use it in any kind of performance application, people have to learn which functionality will cause a drop to a software renderer, and then write around that functionality. Since the API can't tell us this, we basically have to experiment, or hope that vendors release information telling us what features we can and can't use.

It doesn't do anyone any good to have features that can't really be used. All it does is confuse people, making them believe that feature X exists, when they would never accept the performance penalty.


And for those time-critical shaders you should have the option to code down-to-the-metal ASM (if it really fits your needs), just like my "worst proposal ever" allows.

You seem to keep missing something: all shaders are time-critical (that's why we're not using software-renderers). Therefore, they all must run as optimally as possible.

Another point that you have missed is that the fundamental idea of either approach (glslang or the ISA) is that they will get optimal performance. The goal is to give programmers ease-of-use while still achieving optimal performance.

The reason Intel's compiler is so good, compared to others, is that they have detailed knowledge of their CPUs that other people don't have. They know precisely how to schedule opcodes so that they go through with a minimum of stalls.

In order for you to optimize a shader programmed in the native instructions of the chip, you would have to know as much about the hardware as the designers do. And this changes for each vendor and for each hardware revision. Hence, the need to put this burden on the driver writers; they know enough to deal with this issue far better than any of us.


the way I understand their arguments (and, to be honest, I've only skimmed---I am busy these days) is that you have a very primitive assembly which acts as a wrapper around the actual hardware implementation, but can still be used as a building block for higher level languages

I wouldn't necessarily call it "primitive", as it would have to keep a lot of the high-level constructs around, but that's the idea.

john
08-02-2003, 07:54 PM
When I said primitive, I had meant to say "building block", not "simple" or "limited".

stupid english. :-)

t0y
08-03-2003, 01:16 AM
Sorry for the numbering, but it's easier than copy/pasting full quotes.

Korval:
1. OpenGL had some mistakes, yes, but it was far from "crappy".

All the flaws you mention (tex objects, VBOs) refer to the ability to keep data on the server (the card). Their need is directly related to the boom in consumer cards' availability, which brought us a real generational leap from previous solutions. You'll also notice that both extensions are completely transparent and easily implemented on previous hardware. You won't get the same performance, but they work without further limitations.

The same thing will happen once we get superbuffers, another future-proof generalization if made right.

VBO's late arrival is due to the fact that nVidia dominated the market for so many years. As long as we had VAR, nVidia didn't really care about it, since developers were actively using their proprietary extensions and helping their business without really having a choice. VBO (and other recent ARB extensions) only appeared because of (mainly) ATi trying to stop nVidia from having the performance monopoly in OpenGL, and this is fundamentally a good thing. Isn't it?


2. I don't think I'm missing anything.

"Another point that you have missed is that the fundamental idea of either approach (glslang or the ISA) is that they will get optimal performance."

They should get optimal performance. While the compiler isn't mature enough for a specific GPU (as you said, it may take months) you should have the ability to take the path of native ASM, whose compiler is much simpler because of its similarity to the hardware.
This native ASM compiler should take care of instruction scheduling, dead code elimination and trivial things like that, so that you can still achieve your goal of optimal performance.
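
For what it's worth, a toy sketch (plain C, made-up instruction format) of the dead-code-elimination part - a single backwards pass over a straight-line instruction list; real drivers would of course run this over their own IR:

#include <stdbool.h>
#include <string.h>

#define NREGS 16

/* Toy instruction: writes 'dst' from two sources; 'is_output' marks result writes. */
typedef struct { int dst; int src[2]; bool is_output; } Instr;

/* Mark which instructions must be kept; the rest are dead. */
void eliminate_dead_code(const Instr *code, int n, bool *live)
{
    bool needed[NREGS];
    memset(needed, 0, sizeof needed);

    for (int i = n - 1; i >= 0; --i) {       /* walk backwards */
        live[i] = code[i].is_output || needed[code[i].dst];
        if (live[i]) {
            needed[code[i].dst]    = false;  /* this definition satisfies the later use */
            needed[code[i].src[0]] = true;   /* its sources are now needed */
            needed[code[i].src[1]] = true;
        }
    }
}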


John:
3.
"assembly which acts as a wrapper around the actual hardware implementation, but can still be used as a building block for higher level languages."

A stable assembler wrapper for the hardware is nice, but it could easily become an overly complicated high-level asm to compensate for hardware differences. I don't know about Java's bytecode, but I'm sure it is very complex, and that's why nobody uses it directly. Anyway, its objective is to hide source code and ease the interpreter's job (among other things, I'm sure), not to be written directly by programmers.

C/C++ have been used as building blocks for higher level languages, so I believe the same applies to glSlang. The main arguments seen here against it are: run-time compilation (probably time-consuming) and making Cg redundant (which I think is the main reason we are still arguing here). IMHO these aren't important considering what it can (should) achieve.



[This message has been edited by t0y (edited 08-03-2003).]

john
08-03-2003, 02:50 AM
A stable assembler wrapper for the hardware is nice, but it could easily become an overly complicated high-level asm to compensate for hardware differences

well... I don't know exactly what you're trying to say. The intermediate assembler (which I am neither advocating nor opposing) is high level insofar as it is not directly implemented on h/w. But this is not unusual; DLX is an idealised assembly which is classically low-level (since it IS an assembler) but isn't implemented on h/w... so, is it h/l or l/l? <shrugs> The actual point of the proposed intermediate ASM was that it was an abstract, conceptual assembler which is translated by the driver into h/w-specific shader instructions when the programs are bound. So, from that point of view, the proposed ASM is _always_ high-level, irrespective of added complexities.


I don't know about Java's bytecode, but I'm sure it is very complex and that's why nobody uses it directly.

no one codes directly in x86 machine code, either. I haven't heard of a Java assembler, and I suspect Sun would be against it because it defeats the intention of Java. Sun wants people to program JVMs with Java, not byte code.


C/C++ have been used as building blocks for higher level languages,

that is true. Well, I wouldn't say C++ is used as an intermediate language, but C certainly is. But, then, C is NOT high level; it isn't much more than assembly anyway.


so I believe the same applies for glSlang

Certainly; I agree. A compiler only translates one language into another.

I am still neutral on the assembly proposal. I just like arguing. :-) or debating, or whatever you want to call it. I DO think it is an interesting idea, though.

cheers,
John

aboeing
08-03-2003, 03:07 AM
Also, you don't want to get it all right the first time. One of the fundamental problems with OpenGL, one that still exists, is not having a real idea of what is going to be fast or slow in GL. The spec only guarantees that it will be done, not how fast. As such, there is a subset of the language that just isn't used, because everybody knows that using it is a bad idea. Unfortunately, the information on what functionality will drop performance is not readily available. Look at how many people ask, "I'm using feature X and now I'm getting 5fps."


The alternative to this is having a language that gets updated every time a new GPU is released with added functionality, and I imagine this would get even more annoying (rewriting your program to include/exclude a particular feature for certain GPUs). If you had a nice, complete and simple (RISC) assembly language, then there wouldn't be any limiting of functionality and there wouldn't be any need to rewrite the language all the time. (I don't think it would need to be complex at all - basically look at how all the rendering is done in movies and which instructions/features their shaders use, and you're done. You can build a complete and simple assembly language.)

Vendors could still release compilers for their platforms, and just avoid using instructions which carry penalties on their platforms - (most of) the end users would never have to know what to avoid.

Apparently Codeplay's VectorC gives better performance than Intel's compiler, so the vendors aren't always the best people. We should still allow someone else to write a compiler for someone else's GPU. If you want to have glslang I think vendors should be forced to provide an interface to their internal assembly language - it would solve all these problems. (If you didn't like the driver's compiler, you could just drop into their assembly language, plus people could write their own compilers and languages.)

Anyway, if you think about it, MS is likely to be using a different shading language in DirectX, which means vendors are going to have to write two compilers anyway, so they probably store their information in some intermediary format which they then assemble down to their GPU's instructions (to save on development time). (Also, they would probably write one compiler for all their different GPUs - more reason for an intermediary step.) If they are going to need it anyway, then they might as well expose it.

zeckensack
08-03-2003, 03:32 AM
Originally posted by aboeing:
Anyway, if you think about it, MS is likely to be using a different shading language in DirectX, which means vendors are going to have to write two compilers anyway, so they probably store their information in some intermediary format which they then assemble down to their GPU's instructions (to save on development time). (Also, they would probably write one compiler for all their different GPUs - more reason for an intermediary step.) If they are going to need it anyway, then they might as well expose it.

The DXG HLSL compiler is written by MS and sits in the runtime layer (!= driver). All drivers ever see is PSx.y assembly. This is a major shortcoming, as documented by the recently introduced PS2_a profile (which exists because the PS2 profile sucks for NV cards, to put it bluntly).

Nico_dup1
08-03-2003, 03:32 AM
Korval et al. want a "High Level Assembler", others a "Low Level C"... both have to be compiled by the driver to native code, and both have to be parsed (the "assembler" has to look for constructs like Taylor series expansions for sin to replace them with a native sin() if present - not an easy job either). So this is all about the syntax? And other HLSLs can be compiled to a high level assembler just as well as to a low level C.

I found only two points where opinions differ:
1. glslang tries to be "forward-looking" and puts in a whole lot of stuff that is as yet unsupported. Some people don't like that and want new functionality to be released as new extensions when available. This has nothing to do with the language; it's a separate decision.
2. Some people would like to save parse time by using an intermediate format. A generic intermediate format has to be parsed for optimal performance, too. So what you're asking for (?) may be the ability to get the compiled native code back, save it to disk, and load it the next time the app starts. This was discussed by the ARB but was not considered important.

aboeing
08-03-2003, 06:30 AM
Sorry, I didn't make myself clear.

If MS keeps the DirectX shaders the way they are (HLSL/Cg compiling down to PS/VS assembly), then intermediary assembly is going to be around anyway.

If MS decides to try to do something similar (HLSL/Whatever compiling in the drivers) then we have the situation I described above, and the vendors will probably have an intermediary system.

Either way, you have a very good chance of there being an intermediary format around regardless, so why not expose it?

In fact, in the first case (I don't want to start a holy war), because DirectX is very common in games I would imagine vendors put a lot more effort into ensuring optimal performance with DirectX (evidence: ATI only recently got decent OpenGL drivers).

Since DX9 exposed PS3 and VS3, we can expect this system to be around for at least the next 2 years...

Korval
08-03-2003, 02:06 PM
OpenGL had some mistakes, yes, but it was far from "crappy".

Maybe I'm alone in this, but playing games of, "Guess the optimal API to use" is not particularly enjoyable.


All the flaws you mention (tex objects, VBOs) refer to the ability to keep data on the server (the card). Their need is directly related to the boom in consumer cards' availability, which brought us a real generational leap from previous solutions.

I.e., the future. GL 1.0 wasn't too future-proof, and neither was 1.1. If the ARB couldn't predict something like that, even when 3DFX was getting the Voodoo 1 together, then how can they be trusted to come up with a future-proof high-level language?


The alternative to this is having a language that gets updated every time a new GPU is released with added functionality, and I imagine this would get even more annoying (rewriting your program to include/exclude a particular feature for certain GPUs).

That's not how it works. The ISA gets updated and extended. However, much like OpenGL itself, old language features are still supported. Therefore, shaders compiled to the ISA before the update would still work on compilers after the update.

The job of translating from one shader version to another is on the compiler. You compile for the new version, using as much of your old high-level language as possible. Granted, your HLL will likely have to be updated too, in order to expose new features, but that's expected.


the end users would never have to know what to avoid.

If a vendor doesn't support, say, texture accesses in the vertex shader, that API in glslang still exists. A compiler can't just ignore the command; the rest of the shader depends on it working. You can't just ignore instructions; you have to compile the program as written, or it will not work (and inexplicably so).


the "assembler" has to look for constructs like taylor series expansions for sini to replace them with a native sin() if present, not an easy job to do, too

If the user/compiler didn't use the 'sin' opcode, but instead wrote a Taylor expansion, they deserve what they get.

Having functions like 'sin', et al, is a necessary first step in any hardware abstraction shading language.


glslang tries to be "forward-looking" and puts in a whole lot of stuff that is as yet unsupported.

Have you looked at glslang? It's certainly not as "forward-looking" as it could be.


A generic intermediate format has to be parsed for optimal performance, too.

As we have stated, parsing an assembly-like language is easier than parsing actual C.


If MS keeps the DirectX shaders the way it is, (HLSL/Cg compiling down to PS/VS assembly) then intermediary assembly is going to be around anyway.

But their "intermediary assembly" is too low-level. It loses important constructs for optimization, and introduces actual register numbers (forcing the HLSL compiler to do register allocation, which is always a bad idea). The glslang approach is superior to the DX approach, because the assembly doesn't carry enough of the high-level constructs from HLSL to make optimization the job of the compiler alone.

aboeing
08-03-2003, 08:52 PM
The job of translating from one shader version to another is on the compiler. You compile for the new version, using as much of your old high-level language as possible. Granted, your HLL will likely have to be updated too, in order to expose new features, but that's expected.

Well, I don't expect this; I think it's poor planning. When these new features in the HLL are released, you may decide to write a program for that card, and all of a sudden it won't work with the older cards. Now people are going to need to know which instructions they can and can't use. This will achieve absolutely nothing. We are going to have the same problem we have now, where we need to write specific versions for each GPU.



If a vendor doesn't support, say, texture accesses in the vertex shader, that API in glslang still exists. A compiler can't just ignore the command; the rest of the shader depends on it working. You can't just ignore instructions; you have to compile the program as written, or it will not work (and inexplicably so).

Then that's plain bad luck. Maybe the next set of GPUs will support all the features of the language. We should have a complete language, and vendors can choose to implement whichever features they want. Eventually they will implement them all, and everyone will be happy. If they don't, we can work around it, and complain a lot until they, or a competitor, does.

If you force people to use an incomplete language, this doesn't push the vendors forward. If we have a complete language then the vendors have something to work towards. It forces progress. (Wasn't this the whole point of PS3?)



If the user/compiler didn't use the 'sin' opcode, but instead wrote a Taylor expansion, they deserve what they get.

Given that most modern CPUs implement sin/cos internally as Taylor series or Newton-Raphson approximations, why bother having a sin keyword at all? Likewise for all other complex instructions. Keep it simple. At least that way a user/compiler can control the precision of the approximation through the number of terms used in the expansion.
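
For illustration, a quick host-side C sketch of the trade-off I mean - the caller picks how many terms of the series to evaluate, so precision stays under user/compiler control (this is only a toy, not shader code):

#include <stdio.h>

/* Odd-term Taylor series for sin(x); 'terms' controls the precision. */
static float taylor_sin(float x, int terms)
{
    float term = x, sum = x;
    for (int n = 1; n < terms; ++n) {
        /* next term = previous term * -x^2 / ((2n)(2n+1)) */
        term *= -x * x / ((2.0f * n) * (2.0f * n + 1.0f));
        sum  += term;
    }
    return sum;
}

int main(void)
{
    printf("2 terms: %f, 5 terms: %f\n", taylor_sin(1.0f, 2), taylor_sin(1.0f, 5));
    return 0;
}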



But their "intermediary assembly" is too low-level.

Agreed, I'm not sure how one would do this, but some system where everything except the registers is allocated would be good..
Just some way of expressing instructions in as order-independent a way as possible, and some way for the drivers to rearrange things so that the registers get allocated optimally... a whole bunch of 'hint' flags for each "instruction"?

Korval
08-03-2003, 10:43 PM
We should have a complete language, and vendors can choose to implement whichever features they want.

The language, to some extent, has to mirror the hardware.

For example, let's take the case of vertex shader texture accesses vs. bindable memory arrays. These two APIs look very different. They have different limitations. However, only one of them will ultimately be used, because they provide the same basic functionality. If glslang forces the texture-access API, but the hardware finds that the bindable-memory version is much easier to implement (let alone the host of other reasons to use it), then drivers will have to do a lot of behind-the-scenes work to implement the glslang version.

Instead of trying to predict the future, simply be ready to pounce when the future comes. To the extent that OpenGL predicted the future correctly, it has done a good job. The problems come when something unexpected happened; the ARB hasn't acted fast enough to provide appropriate extensions to access functionality in a platform-neutral fashion. This must change if OpenGL is to continue to be used.


If we have a complete language then the vendors have something to work towards. It forces progress. (Wasn't this the whole point of PS3?)

Vendors absolutely should not work towards fulfilling an API. Their hardware featureset should be based, not on moving towards an API, but on building functionality that software vendors want. After all, it is software that changes things. APIs should follow hardware vendors' lead, not the other way around (which is why it is a good idea for GL to be overseen by a group of hardware vendors).

If nVidia hadn't been moving towards programmability back in the GeForce 1 days (register combiners), where would we be now? After all, the API wasn't there until they defined it. That's a particular example of not being able to predict the future; there are plenty more.


Given that most modern CPUs implement sin/cos internally as Taylor series or Newton-Raphson approximations, why bother having a sin keyword at all? Likewise for all other complex instructions. Keep it simple. At least that way a user/compiler can control the precision of the approximation through the number of terms used in the expansion.

If they want to use a Taylor expansion, they are free to do so. But they should always have the opportunity to get the (potentially) hardware-accelerated version.


Agreed, I'm not sure how one would do this, but some system where everything except the registers is allocated would be good..

The appropriate level is not far from the ARB extensions. You declare variables (and, in the ISA, structs, arrays, etc), and you operate on them. Variables carry qualifiers declaring what kind they are (type information, plus qualifiers like "varying" or "uniform"). Temporaries would either have to be explicitly created, or built on the fly.

This is different from the explicit register allocation in the various Direct3D languages. It allows implementations to virtualize resources much more easily (which is one reason why ARB_* and glslang take this approach).
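
As a rough point of reference, this is what that level already looks like with ARB_vertex_program (an untested sketch; it assumes the extension entry points have been loaded). The temporaries and parameters are declared by name and the driver decides where they actually live, unlike the fixed register numbers of the D3D shader assembly:

#include <GL/gl.h>
#include <string.h>
/* Assumes glext.h-style declarations for the ARB_vertex_program entry points. */

static const char vp_src[] =
    "!!ARBvp1.0\n"
    "ATTRIB pos = vertex.position;\n"
    "PARAM  mvp[4] = { state.matrix.mvp };\n"  /* named parameter array */
    "TEMP   r;\n"                              /* named temporary, no fixed register number */
    "DP4 r.x, mvp[0], pos;\n"
    "DP4 r.y, mvp[1], pos;\n"
    "DP4 r.z, mvp[2], pos;\n"
    "DP4 r.w, mvp[3], pos;\n"
    "MOV result.position, r;\n"
    "MOV result.color, vertex.color;\n"
    "END\n";

void load_vertex_program(GLuint id)
{
    glBindProgramARB(GL_VERTEX_PROGRAM_ARB, id);
    glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(vp_src), vp_src);
}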

[This message has been edited by Korval (edited 08-04-2003).]

mattc
08-04-2003, 01:08 AM
hi folks, just wanna jump in quickly - gotta say, kinda rare on this forum to see a fairly well developed discussion like this, cheers korval for getting the ball rolling http://www.opengl.org/discussion_boards/ubb/smile.gif

anyhow, my points...

- an abstract asm-like language would be a nice bonus for people who like such stuff and would easily be the best possible compromise for all affected (coders, h/w designers, driver teams)... i'd consider it an excellent replacement for proprietary extensions http://www.opengl.org/discussion_boards/ubb/smile.gif

- having a c like language (i.e. more human oriented) will be a necessity sooner or later, doing this now means more probs later (more revisions as things inevitably change) but an earlier start (horses for courses)...

- not having sin/cos/tan in whatever language we eventually end up with is a ****ing sin of stupidity, no two ways about it. h/w accelerated, accurate floating point is how things will have to be... and i for one can't wait, i've had enough of the ****ty shortsighted "minimum spec increment" approach the mainstream pc industry seems to like so much.

consider this: we've had 32 bit fp simd instructions with sse & 3dnow for quite a while now... on the other hand, graphics h/w with its zillions of transistors and incredible memory bandwidth is only just moving towards a greater-than-8-bit datapath - considering how obvious *that* particular wall was, you'd think all the "we're great for gfx" vendors would've been a little less shortsighted.

if you wanna do taylor series-style optimisation in order to milk the current h/w that's fine but there's absolutely no reason for that to preclude having dedicated, high quality trig opcodes (and yes, even x87 fpus have them - so "trig-less" gfx cards are really just cheapo nasty data shifters) which *will* be fast enough for anything one day - remember, it's essentially a divisor, much like texture perspective correction... and will give you the sort of quality that only high precision can yield.

my 2p - sorry if this sounds a bit fundamentalist but i'm sick of speed being used to justify dropping quality when there's no real reason for that; both can coexist in one h/w card... after all, it's ultimately quality, not speed, which drives any field forward http://www.opengl.org/discussion_boards/ubb/smile.gif

Korval
08-04-2003, 09:20 AM
consider this: we've had 32 bit fp simd instructions with sse & 3dnow for quite a while now... on the other hand, graphics h/w with its zillions of transistors and incredible memory bandwidth is only just moving towards a greater-than-8-bit datapath - considering how obvious *that* particular wall was, you'd think all the "we're great for gfx" vendors would've been a little less shortsighted.

This is a little off-topic, but you are aware that there are hundreds, if not thousands, of differences between a graphics chip and a CPU, yes? Internal memory bandwidth, let alone computational expense, is a precious commodity, not to be squandered lightly (having register combiners at all is better than just having internal floating-point with no programmability or configurability). And, because of how hardware acceleration works, in order to get floating-point processing you would have to replicate something like SSE 4-8 times (cards have 4-8 pixel pipes, each of which would have to have its own SSE-style unit). It'd be even more than that, because the functionality that SSE exposes is inferior to what modern per-fragment operations allow (Dot3, for example, is not an SSE op).

In short, you have to crawl before you can walk. They didn't give us floating-point until now because it just wasn't cost-effective. These are consumer graphics cards, after all.


but i'm sick of speed being used to justify dropping quality when there's no real reason for that; both can coexist in one h/w card... after all, it's ultimately quality, not speed, which drives any field forward

Also OT, but this is not true.

Every feature of modern hardware can be perfectly emulated on a CPU. Indeed, a CPU can do much better than this, or next, generation's hardware. It'll always be able to do better.

So... why don't we use our CPUs? Performance. We've given up full control over the renderer, and therefore quality, in order to gain performance. And consumer graphics card makers trade off quality for performance; after all, consumers buy graphics cards for the speed of the product they're interested in, not the look. That's why the Radeon8500 ultimately failed against the GeForce4, even though it was functionally superior to it in most ways.

mattc
08-05-2003, 08:30 AM
korval, i had a proper reply going but lost it so i'll keep it brief...

considering how much you had to "say" regarding sse (as if i said that's exactly what gpu's should use), it seems to me you enjoy writing lengthy replies too much to see the gist of what i was saying. point taken about these cards being consumer level, though.

regarding quality and the bizarre radeon8500/gf4 rant, if no one wanted better quality we'd still be using 80x40 mono "graphics", not waiting for doom3 or whatever.

Korval
08-05-2003, 10:47 AM
as if i said that's exactly what gpu's should use

You made the comparison. You basically equated SSE and 3DNow with GPUs, without any comment on the fact that these are two very different pieces of hardware. As such, my argument was valid.


it seems to me you enjoy writing lengthy replies too much to see the gist of what i was saying.

The gist of the comments was some notion about how CPUs have had SSE and 3DNow for a while, so why haven't GPUs had floating-point processing? The comment makes no sense; you're equating two very different things. My post was intended to show the ridiculousness of the claim.


regarding quality and the bizarre radeon8500/gf4 rant, if no one wanted better quality we'd still be using 80x40 mono "graphics", not waiting for doom3 or whatever.

The fundamental fact is that performance and quality are inextricably linked. A "quality" feature that effectively can't be used because it hampers performance too much isn't a useful feature, and is therefore left out of the card. Only features which can be implemented at reasonable speeds are considered. Which is why nVidia and ATi aren't dueling over ray-tracing-based cards.

Nakoruru
08-05-2003, 10:59 AM
Wow, long thread...

Anyway. I am not sure if a single standard assembly language across all GPUs is a good idea at all. Assembly language is good for getting straight at the hardware, but if the underlying ISA is not the same as the standard, then we've turned everything on its head.

This does not mean I am for putting glslang in the drivers. The reason is that it forces us to use glslang as our language to communicate with the driver, and some of us may want to do things differently.

I think that drivers should expose their NATIVE assembly languages, with no pretensions of backwards compatibility at all. If glslang is to be standard, then it should be as a standard compiler interface which submits code to the assembler in the driver. That way, you can write glslang and submit it on any platform that follows the glslang compiler standard.

I believe that it is FUD for people to suggest that languages like Cg have to compile to a standard, inefficient assembly language that no real hardware actually implements. nVidia's approach, with native profiles, is actually quite flexible. You can have a standard assembly language, but you are not required to use it.

The advantage of separating the compiler from the assembler (like in a normal C toolchain) is that now ambitious people can write compilers for languages other than glslang. The driver only understands assembly.

glslang could still be built into the driver, but it would be nice if vendors would expose the native assembly interface. I think that a translator from myslang to glslang would be less than optimal, just like translating assembly language.

I used to believe that translating assembly language was trivial, but one of the guys from 3Dlabs IIRC corrected me by telling me that the vertex program 'reassembler' for their card is more than 10k lines of code.

The ideal in my opinion would be to have a standard interface for exposing the assembly language of a card, but to not define an actual language. This interface could be the very same as the one which exposes glslang if it is built into the driver.

glslang would be a separate program, with a standard interface, provided by the IHV, which produces assembly code. All OpenGL implementations would be required to provide it.

Cg profiles, or some other language not dreamed of yet, could be built to also spit out assembly for specific chips.

It would also be possible for anyone to create a portable assembly language reassembler which would spit out the translated code, but I find the idea of a portable assembly language to be pretty absurd and pointless.

My main point is that even while it is good that OpenGL have a standard high level shading language (perhaps even built into the driver), vendors should always provide a way to get at the true low level so that people who want to develop their own tools on top of the hardware have an efficient path.

There is no reason to cut off people from using the low level assembly if you make it clear that the assembly may totally change next generation ^_^

At some point, the assembly will settle down, simply because eventually each vendor will hit upon something that works well and they will not want to totally reimplement the glslang or Cg compiler every 12 months. Once this happens, it may even make sense to write directly in assembly language. You wouldn't be able to if it is hidden.

Obviously, directly exposing a vendor-specific assembly would be done as an extension, and admittedly it is kinda weird to require that all vendors implement an extension, but that makes much more sense to me than forcing them to implement a goofy standard assembly language.

I think that this position is basically the same as nVidia's, except that they probably do not want to compromise and let glslang be a part of the driver. Also, I believe that their stand on a standard assembly language is due to marketing, i.e., they want to make sure that Cg can run anywhere. The only way that can happen without cooperation is to force a less than optimal standard assembly language on them. I am sure that nVidia would rather everyone create an optimized profile for Cg and then everyone can forget about ARB_vp and ARB_fp (which I believe exist mostly so that Cg can run on more than just nVidia hardware).

Korval
08-05-2003, 01:28 PM
I used to believe that translating assembly language was trivial, but one of the guys from 3Dlabs IIRC corrected me by telling me that the vertex program 'reassembler' for their card is more than 10k lines of code.

No one is pretending that the compiler for the ISA would be simple or trivial. The language would have to retain enough high-level constructs to make compilation quite complex, though less so than for a full-fledged glslang compiler.


The ideal in my opinion would be to have a standard interface for exposing the assembly language of a card, but to not define an actual language. This interface could be the very same as the one which exposes glslang if it is built into the driver.

Besides the hell that this would cause, in terms of supporting different hardware, understand that the knowledge required to optimize code for hardware begins with the native language; it doesn't end there.

After you get a native language, then you run into scheduling. Doing scheduling correctly requires, basically, hardware schematics of the processor in question. In general, it also requires access to the hardware developers themselves. This is one of the reasons why Intel's compiler is better than other x86 compilers; they know their chips better than anyone.

nVidia, ATi, 3DLabs; they can do optimizations, because they have the intimate knowledge of their hardware. We cannot, because we do not have that knowledge. We, therefore, must have an intermediary between us. The way the ARB has chosen is to use glslang as that intermediary. We're simply proposing that a lower-level language be developed to do the same thing.

tfpsly
08-05-2003, 11:02 PM
Originally posted by Korval:
nVidia, ATi, 3DLabs; they can do optimizations, because they have the intimate knowledge of their hardware. We cannot[...]

Which is why I believe it'd be far easier for driver coders to optimize something as simple as:
out = texture1*dot(texture2,color)^n

than the lines of asm that would do the same thing. Recoding some asm into another asm language is quite hard, and you'll easily miss instructions that could help a lot on a given platform but are so specific that a reassembler would never "think" of using them.
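
To put the two forms side by side (a rough sketch - the asm is approximate ARB_fp-style syntax, and the uniform/texture names are made up):

/* The high-level form the driver would see under glslang: */
static const char *glsl_frag =
    "uniform sampler2D tex1, tex2;\n"
    "uniform float n;\n"
    "void main() {\n"
    "    vec4 t1 = texture2D(tex1, gl_TexCoord[0].xy);\n"
    "    vec4 t2 = texture2D(tex2, gl_TexCoord[1].xy);\n"
    "    gl_FragColor = t1 * pow(dot(t2.rgb, gl_Color.rgb), n);\n"
    "}\n";

/* Roughly the same thing spelled out as low-level asm: */
static const char *asm_frag =
    "!!ARBfp1.0\n"
    "TEMP t1, t2, d;\n"
    "TEX t1, fragment.texcoord[0], texture[0], 2D;\n"
    "TEX t2, fragment.texcoord[1], texture[1], 2D;\n"
    "DP3 d, t2, fragment.color;\n"
    "POW d, d.x, program.local[0].x;\n"
    "MUL result.color, t1, d;\n"
    "END\n";

The high-level version still says "a dot product raised to a power"; the asm version has already committed to one particular instruction sequence.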

mattc
08-06-2003, 12:12 AM
korval, it's not really possible to argue with you seeing as you generally answer to your own assumptions, not what's really being said. plus you like throwing in unrelated stuff to add "weight" to your arguments (like gf4/radeon8500 stuff and now ray tracing - very relevant and on-topic). i'll leave it here, seeing as you're always right; enjoy your thread.

Nakoruru
08-06-2003, 04:31 AM
Korval, you seem to think that the only reason assembly is more efficient is because it is low level, and therefore that any assembly language, even one that is abstract and not implemented by any hardware, is better than a high-level language for efficiency.

The big problem is that assembly language simply does not have enough information in it to optimize well. A compiler for glslang will almost certainly produce more efficient code than the ultimate result of a compiler which produces an abstract assembly which is then reassembled.

Even if outsiders will not be able to create the most efficient code because they do not have intimate knowledge, it would still be better than an abstract asm.

Perhaps what you imagine is not asm, but a text-format representation of the interface between a compiler front and back end, like gcc's register transfer language, which looks more like Lisp than assembly. Such a language would make fewer assumptions than glslang, but not have most of the information about what you are trying to do stripped out of it like assembly.

I.e., it would be easier for compilers to target while still being high level enough to allow for better hardware-specific optimizations, and it would also be easier (on average) to write a translator for.

aboeing
08-06-2003, 04:35 AM
Hey, lets try to keep it civilized.



Anyway. I am not sure if a single standard assembly language across all GPUs is a good idea at all. Assembly language is good for getting straight at the hardware, but if the underlying ISA is not the same as the standard, then we've turned everything on its head.


This does seem to be what people generally oppose, but a strict assembly language isn't what we're proposing. And even so, I am not so sure it really is such a big problem, since we have DirectX PS/VS and ARB vp & fp.



I used to believe that translating assembly language was trivial, but one of the guys from 3Dlabs IIRC corrected me by telling me that the vertex program 'reassembler' for their card is more than 10k lines of code.


Fair enough; the glslang implementation of theirs, which is not complete, comes to 19654 lines of code.



Recoding some asm into another asm language is quite hard,

The real question here is: which is harder, writing an asm re-orderer/interpreter or a full-fledged compiler?

Which option will be more beneficial and flexible for us? A standard higher-level language, or a standard lower-level language?

People keep making it sound like they are going to be the ones writing in assembly if that approach is adopted. You won't. It will be just like what we have now: you will code in Cg, or whatever language you want, and behind the scenes that will get compiled into an intermediary representation. Just like what happens when you use GCC.

Anyway, from 3dLabs glslang:
"4. Reduction of the tree to a linear byte-code style low-level intermediate representation is likely a good way to generate fully optimized code.
There is some possibility of standardizing such a byte code."

Correct me if I am wrong, but what they are talking about here is the intermediary assembly language we are suggesting. Notice the "good way to generate fully optimized code".

EDIT:
(Nakoruru posted while I posted)
Yes, this GNU RTL-like language is what I had in mind, and I assume that's what Korval is talking about too. (And I assume it is similar to what 3DLabs is referring to.)

[This message has been edited by aboeing (edited 08-06-2003).]

Nakoruru
08-06-2003, 09:14 AM
aboeing,

We seem to be on the same page. I have not read the latest 3Dlabs papers on glslang, so maybe they aren't far away from what I am thinking either.

The reason that 3DLabs' ARB_vp translator was so complicated was that the paradigm of their assembly language was completely different from nVidia's (which ARB_vp is based on). The main difference is that ARB_vp is vector based, while 3DLabs' underlying hardware is not, IIRC.

I agree that the normal programmer is not the one who will be writing assembly or register transfer language, or whatever, most of the time.

Pop N Fresh
08-06-2003, 09:33 AM
One of the nice things about OpenGL is the ease with which you can get something up and running. Having the driver take shaders in a format that's not easily human readable would be a mistake, as you would then require an additional tool step to get something working. Also, file I/O or some sort of embedded resource would be needed to store the pre-parsed shader bytes. This just doesn't fit in with the style of OpenGL.

Now, assuming we need a human-readable format, we need to decide what that format looks like. We want high performance but need to make as few assumptions about the underlying hardware as possible. We need a language that works as a sort of 'portable assembler'. This is exactly what C was designed to be, and it seems like a well-tested model to base a shader language on.

I'll note that parsing overhead is trivial. Many high-performance games use scripting languages like Lua or Python for game code.

Korval
08-06-2003, 10:01 AM
Which is why I believe it'd be far easier for driver coders to optimize something as simple as:
out = texture1*dot(texture2,color)^n

than the lines of asm that would do the same thing.

Why? They are both, ultimately, expression trees, which is what the optimizer gets when it goes to optimize. The assembly version is significantly easier to parse by comparison.


Recoding some asm into another asm language is quite hard, and you'll easily miss instructions that could help a lot on a given platform but are so specific that a reassembler would never "think" of using them.

Considering that the "reassembler" is fully in the control of the people who most know about the hardware (ie, IHV's), and these same people have a vested interest in making the "reassembler" compile to the most optimal internal code possible... why would they miss something? They have a really good reason to make their compiler as optimal as possible. That's like saying, "VBO is a bad extension because IHV's could implement it just like they do regular vertex arrays." We both know they aren't going to, so there's no real reason to complain about an eventuality that will not be realized.

BTW, I would not consider it a "reassembler". Because it is responsible for the low-level optimizations, it is much more of a compiler.


korval, it's not really possible to argue with you seeing as you generally answer to your own assumptions, not what's really being said. plus you like throwing in unrelated stuff to add "weight" to your arguments (like gf4/radeon8500 stuff and now ray tracing - very relevant and on-topic). i'll leave it here, seeing as you're always right; enjoy your thread.

Well, considering that we've been able to convert Zeckensack, to a degree, I'd say that the argument has gone pretty well for my side. So, clearly, there must be some substance to my arguments.

As for the OT parts, if you will read the thread again, I clearly noted them as such.


it would be easier for compilers to target while still being high level enough to allow for better hardware specific optimizations and it would also be easier (on average) to write a translator for.

...Didn't I say something like that?

Oh well. Another convert http://www.opengl.org/discussion_boards/ubb/wink.gif


One of the nice things about OpenGL is the ease with which you can get something up and running. Having the driver take shaders in a format that's not easily human readable would be a mistake, as you would then require an additional tool step to get something working.

2 things:

1: The ISA isn't necessarily not human readable. ARB_vp is human readable; the ISA wouldn't be terribly far removed from it.

2: The "ease you can get something up and runnning" with OpenGL days have never applied to shaders. Shader programming has always required time and effort, simply in understanding how they all interrelate together. That "ease" comes in the form of the fixed-functionality of OpenGL.


Also, file I/O or some sort of embedded resource would be needed to store the pre-parsed shader bytes.

It's not in binary. The ISA would be stored as text.


I'll note that parsing overhead is trivial. Many high-performance games use scripting languages like Lua or Python for game code.

Do you really think that they re-parse each line as it is used? Lua can be pre-compiled into a nice, neat binary form on disk. I can't say about Python, but I wouldn't be surprised. And, even if there is no off-line Python compiler, the moment it gets loaded, it is converted into a memory representation of that code. Compiled, as it were.

Also, you will note that they don't use these scripts in time-critical areas. And, as mentioned before, shaders are always time-critical.

[This message has been edited by Korval (edited 08-06-2003).]

al_bob
08-06-2003, 10:30 AM
2: The "ease you can get something up and runnning" with OpenGL days have never applied to shaders. Shader programming has always required time and effort, simply in understanding how they all interrelate together. That "ease" comes in the form of the fixed-functionality of OpenGL.
That's not to say that writing simple test code that uses shaders isn't also done with ease.
Of course, having access to AllegroGL's automatic extension loading mechanism helps http://www.opengl.org/discussion_boards/ubb/smile.gif
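
For what it's worth, a minimal sketch of what that simple test code looks like through the ARB_shader_objects entry points (error checking and extension loading omitted; the shader string is just a stand-in):

/* Assumes the ARB_shader_objects / ARB_fragment_shader entry points have
   already been loaded (e.g. by AllegroGL or a hand-rolled loader). */
static const char *frag_src =
    "void main() { gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); }";

GLhandleARB make_red_shader(void)
{
    GLhandleARB sh   = glCreateShaderObjectARB(GL_FRAGMENT_SHADER_ARB);
    GLhandleARB prog = glCreateProgramObjectARB();

    glShaderSourceARB(sh, 1, &frag_src, NULL);  /* the GLSL text goes straight to the driver */
    glCompileShaderARB(sh);
    glAttachObjectARB(prog, sh);
    glLinkProgramARB(prog);
    glUseProgramObjectARB(prog);
    return prog;
}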

Pop N Fresh
08-06-2003, 10:40 AM
1: The ISA isn't necessarily not human readable. ARB_vp is human readable; the ISA wouldn't be terribly far removed from it.

People have been talking about byte-code representations and parse trees. "isn't necessarily not human readable" also implies "could perhaps be non-human readable".

2: The "ease with which you can get something up and running" days of OpenGL have never applied to shaders. Shader programming has always required time and effort, simply in understanding how all the pieces interrelate. That "ease" comes in the form of the fixed functionality of OpenGL.

If you look in the OpenGL programming guide they use simple arrays to create data for use by DrawArrays or DrawElements in their examples. You can make an array of chars for use by ARB_vertex_program in a similar way, although you will no doubt later move to using external model and shader files for flexibility. If using the shading language requires an offline compile step, this is no longer possible.

It's not in binary. The ISA would be stored as text.

This has nothing to do with whether it's binary or text. If it is produced by an offline compiler it needs to be loaded into the program somehow.

Do you really think that they re-parse each line as it is used? Lua can be pre-compiled into a nice, neat binary form on disk. I can't say about Python, but I wouldn't be surprised. And, even if there is no off-line Python compiler, the moment it gets loaded, it is converted into a memory representation of that code. Compiled, as it were.

You mean just like a shader language will be parsed and compiled when the shader is loaded? You mean my analogy was exact? "I'd say that the argument has gone pretty well for my side." Only because people get exasperated with debating tactics like the above and stop bothering. Just like I am after this last post.

Korval
08-06-2003, 12:39 PM
People have been talking about byte-code representations and parse trees. "isn't necessarily not human readable" also implies "could perhaps be non-human readable".

While I, personally, think it might not be unreasonable to have a byte-code form in addition to the regular text format, I would suggest that text be the accepted way of feeding in ISA shaders. Otherwise, it becomes too difficult to write trivial shaders. Parsing a text assembly-esque shader isn't really so difficult; it's not that much more expensive than parsing the binary byte-code (which still must retain various high-level syntax).


You can make an array of chars for use by ARB_vertex_program in a similar way.

...

I don't follow. How is building a vertex array at all similar to building a vertex shader? Besides the fact that they are both stored in memory?


If it is produced by an offline compiler it needs to be loaded into the program somehow.

So do textures, meshes, etc. Loading such things has nothing to do with OpenGL.


You mean just like a shader language will be parsed and compiled when the shader is loaded? You mean my analogy was exact?

Yes, and no.

As I pointed out, Lua can, and frequently is, compiled offline, and loaded as a post-compiled substrate that is then used by the Lua interpreter. Which is precisely how the ISA scheme works.


Only because people get exasperated with debating tactics like the above and stop bothering. Just like I am after this last post.

If you want to go on believing that, it is your prerogative. However, a number of people have seen the arguments for what they are, weighed the evidence, and found that a lower-level approach to a glslang would be beneficial on a number of fronts, while still providing enough high-level constructs to optimize on a variety of architectures.

Even 3DLabs agrees with me, as aboeing pointed out (except for the part about using a byte-code rather than text).