
Pixel Shaders with Cg



Ninja
07-02-2002, 10:39 PM
Is it true that pixel shaders don't work for OpenGL in Cg right now?
When will they?

knackered
07-02-2002, 10:46 PM
NVidia will release a profile soon.
Doubtful whether any other vendor will release a profile at all - so you'll only be able to use Cg on nvidia cards.

davepermen
07-02-2002, 10:53 PM
GL_ARB_vertex_program is out soon, and www.cgshaders.org states that it works on all GL 1.4 hardware, so vertex functions can be coded soon. Pixel functions, well... GL 1.5 :)

Ninja
07-02-2002, 11:27 PM
1. Isn't there a good chance NVidia will compile Cg down to GL2 once it becomes more standard?

2. When will opengl 1.5 be released?

3. What to do if I want to write opengl pixel shaders right now?

-Ninja

Korval
07-03-2002, 09:52 AM
What to do if I want to write opengl pixel shaders right now?

You'll have to use the vendor-specific extensions (ATI_fragment_shader, or NV_texture_shader/2/3 plus NV_register_combiners/2).

knackered
07-03-2002, 10:05 AM
I'm not sure if it's worth bothering with Cg in OpenGL, to be honest.
1) it only works on nvidia hardware in opengl. Other vendors probably won't support it...everyone's got their own agendas...just look at the proliferation of vendor specific extensions to see what I mean.
2) it's very limited even on the most up-to-date nvidia hardware
3) because of these limitations, you'll have to learn about vertex programs and register combiners *anyway* to understand why you can't do what you want to do in Cg - which defeats the point, somewhat.
4) There's far more documentation and help available for using register combiners, vertex programs, and fragment shaders etc. at the moment, and probably will be for a good while yet. (lots of gotchas etc.)
5) It looks like it's far more limiting than the shader language proposed in the opengl 2.0 specs...so why bother?

You get the point?

It looks like it will be quite useful in D3D though.

Nutty
07-03-2002, 10:22 AM
1) That won't be true when ARB_vertex_program comes out. And if ARB_fragment_program comes out as well, then pixel "shaders" will work on all GL hardware that supports it.

2) Only limited by the quality of the code produced by the compiler, which will get better.

3) Nothing new there. Try to write and debug a full game project without knowing anything of assembler on the target hardware. A bit hard, but you don't need to be a god at it; that's where the compiler does the work for you. But knowledge of the hardware helps immensely with problems/debugging.

4) Well yeah, but how many docs/examples were there when VPs first came out? I remember doing stuff on them with Detonator version 7, and there wasn't much about.
http://opengl.nutty.org/cgtest1.zip My first example :) (a sketch of the sort of thing such an example does follows at the end of this list)

5) Not really, there's stuff unsupported as yet. But it's basically C, and you can do anything in C. There's no reason why further language support can't come out later to make it easier to code for future hardware.
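
For anyone who hasn't seen one yet, a first Cg vertex program is roughly this sort of thing: per-vertex diffuse lighting. This is just a sketch, not the contents of the zip linked in point 4, and all the uniform names here are made up.

    struct appin {
        float4 position : POSITION;
        float3 normal   : NORMAL;
    };

    struct vertout {
        float4 position : POSITION;
        float4 color    : COLOR0;
    };

    vertout main(appin IN,
                 uniform float4x4 modelViewProj,   // composed transform, supplied by the app
                 uniform float3   lightDir,        // object-space light direction
                 uniform float4   diffuseColor)
    {
        vertout OUT;
        OUT.position = mul(modelViewProj, IN.position);               // clip-space position
        float NdotL  = max(dot(IN.normal, normalize(lightDir)), 0.0);
        OUT.color    = diffuseColor * NdotL;                          // simple per-vertex diffuse
        return OUT;
    }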

Nutty

knackered
07-03-2002, 11:23 AM
Why would someone choose Cg over the gl2 shading language?
3dlabs Wildcat VP does support that shading language now (to a large extent). The new matrox chipset also looks as if it could. The gl2 shading language has really cool things in it like being able to take the frame buffer pixel value as input into your shader, thus unifying combiners and blending.
On the current hardware (i.e. non-gl2-compliant) pixel/vertex shaders are not at all difficult to write using the basic assembly-like opcodes that have been introduced, because they can never go over a certain size/complexity due to hardware limits.
As soon as those hardware limits are overcome, then the hardware will probably be gl2 compliant, and therefore able to use the gl2 shading language...where does that leave Cg? Looking pretty dated, I would imagine.

This is all irrelevant anyway, as gamers hardware is always eons behind the current generation.

zed
07-03-2002, 09:00 PM
>>where does that leave Cg? Looking pretty dated, I would imagine.<<

nah mate, they'll release an updated Cg spec.
Personally this is related to what sucks the most about Cg.

Q/ Why is there no support for looping etc. capabilities?
A/ Because nvidia cards don't support it yet (no other reason).

What a very fair standard that is.

(adapted from Blackadder, where the Witchsmeller explains how they tell if a person's a witch)

Nutty
07-03-2002, 11:27 PM
Why would someone choose Cg over the gl2 shading language?

erm.. because Cg is here and working right this moment, and gl2 isn't. As far as consumer hardware goes anyway.

I don't think you understand precisely what Cg is. It's not an API. It's a language, and languages don't go out of date just because some new hardware comes out.

That's like saying we're gonna have to start coding Unix in Java cos C is out of date.

Why are there no loops/branching? Well, probably because no hardware at all supports it, except the P10-based cards. But these features will probably appear in the compiler pretty soon anyway, with the profile determining whether they can be used or not.

On current hardware vertex/fragment programs are not difficult, true, but Cg creates a common interface to all of them, _including_ gl2 as well, when it eventually arrives and is implemented on consumer-class hardware. There's nothing stopping you using Cg and OpenGL 2.0 _together_.

Nutty

zed
07-03-2002, 11:46 PM
>>Why is there no loops/branching, well probably because no-hardware at all supports it, except the P10 based cards,<<

What a very fair standard that is :)

---------------------------------
Edmund: Witchsmeller, my dear, if you do happen to come across someone who's a bit -- you know, um -- witchy, how do you prove him guilty?

Witchsmeller: By trial or by ordeal.

Edmund: Ah, the ordeal by water...

Witchsmeller: No, by axe.

Edmund: Oh!

Witchsmeller: The suspected witch has his head placed upon a block, and an axe aimed at his neck. If the man is guilty, the axe will bounce off his neck, so we burn him; if he is not guilty, the axe will simply slice his head off.

Edmund: What a very fair test that is.

knackered
07-03-2002, 11:52 PM
Cg is nvidia specific, Nutty. So no generic solution exists, until gl2.


I don't think you understand precisely what Cg is. It's not an API. It's a language, and languages don't go out of date just because some new hardware comes out.

I understand it's supposed to be a language. But it's a language with some major restrictions (no looping or branching). I'm sorry, but even COBOL had branching, so I'm loath to call Cg a language in its current state.


but Cg creates a common interface to all of them, and _including_ gl2 as well.

No it doesn't. It creates an interface that will work on NVidia hardware in OpenGL, until the ARB introduces a fragment_shader extension - and people will always complain that that will be too limiting (NVidia and ATI have vastly different capabilities in hardware) - "use register combiners, they're more powerful!"... just the same arguments that people came up with when d3d introduced pixel shaders - "they're not as powerful as register combiners!".

But it all comes down to choice. Do you want NVidia to control the capabilities of your shading language, or do you want a committee of vendors to agree on a standard?

davepermen
07-04-2002, 12:00 AM
First: hi Nutty ;)


Originally posted by Nutty:
erm.. because Cg is here and working right this moment, and gl2 isn't. As far as consumer hardware goes anyway.
Yeah, okay, the current compilers are crap, but that will (I hope) soon not be an issue. Currently it's the best because it's the only thing to use. BUT coding the stuff in assembler is not _THAT_ hard, because most programs are rather short; they can't get _THAT_ long anyway.
Nonetheless, it's here, and for now it's the best thing to have.

I don't think you understand precisely what Cg is. It's not an API. It's a language, and languages don't go out of date just because some new hardware comes out.
Yes, a language, but a language which is not yet working to its own standard. Cg will change for every new piece of hardware with an additional feature. Old Cg code still works, but full compatibility is not provided by Cg - it can't be. That's no fault of Cg (except that it's too early/useless, because gl2 is what Cg wants to be the moment the hardware supports it, etc. ;)), it's a fault of the hardware. Today's hardware is too restricted (mainly the "pixel shaders", which you can't yet call pixel programs, not even pixel functions... more like extended texture stages).

And I don't like to use a language which is not a standard.

That's like saying we're gonna have to start coding Unix in Java cos C is out of date.
No it's not. We would choose C# :)

Why are there no loops/branching? Well, probably because no hardware at all supports it, except the P10-based cards. But these features will probably appear in the compiler pretty soon anyway, with the profile determining whether they can be used or not.
And the moment loops come in, the first incompatible Cg code will appear, and version conflicts are here again... like HTML on web pages, like DLLs in Windows, like GL 1.0, 1.1, 1.2, 1.3, 1.4, etc., like DX. It's nothing better.

On current hardware vertex/fragment programs are not difficult, true, but Cg creates a common interface to all of them, _including_ gl2 as well, when it eventually arrives and is implemented on consumer-class hardware. There's nothing stopping you using Cg and OpenGL 2.0 _together_.
Two parts: fragment programs on today's hardware will not fit into GL_ARB_fragment_program, which is why nvidia is saving it for GL 1.5. We will have to code a fallback for GF3/4 (and below GF3 as well) manually, I guess. So the common interface is ****ed up. Cg's common interface is as common as GL is, so, well, it doesn't help us much there (for the community I mean... (hm?! ;))

Second: why use Cg on OpenGL 2 hardware? Cg for gl2 will feature a lot of stuff that will not compile for non-gl2 hardware anyway, so it's no different from using the gl2 language directly.
There's nothing stopping me from doing this, except that it's useless.

Cg is cool. Right now. But I don't see more than a (very cool) nvparse in it. Not now, not for the future. Sorry...
www.davepermen.net << soon

davepermen
07-04-2002, 12:02 AM
Originally posted by Ninja:
2. When will opengl 1.5 be released?

After GL 1.4? :)
When will GL 1.4 be released? Soon? I don't know - ask Matt or Cass, they probably know.

knackered
07-04-2002, 12:16 AM
...and still you people refuse to consider direct3d - all I can say is you must know an awful lot of people running linux/irix! :) I know game developers don't....

Nutty
07-04-2002, 03:07 AM
There is no prerequisite that states all programming languages must have loops and branches. Nvidia said they will appear, but at this moment in time it's pointless putting them in. They'd rather get it out for people to use now.


No it doesn't. It creates an interface that will work on NVidia hardware in OpenGL,

I'm sorry, but it _does_ create a common interface. If you look at the DX side of things, it will work on any DX8 compatible hardware.

Secondly, there is nothing stopping ATI from releasing a profile for ATI_VertexProgram, or whatever it's called, before ARB_vertex_program.

And no, I'm not refusing to consider D3D. Again, using Cg has benefits there: provided suitable fragment program profiles come out, you can use the exact same Cg shader for D3D code and OpenGL.

I really don't understand why so many people bash it just because it was created by a single company. I still think it's going to be a fair while before gl2 appears on nvidia/ati hardware. Once some decent common interfaces for vertex and fragment programs arrive, then Cg will work across D3D and OpenGL on all hardware that supports the common interfaces. _AND_ even if they don't support the common interfaces, vendors can still make a profile for their vendor-specific extensions.

Nutty

knackered
07-04-2002, 03:29 AM
Nutty, nvidia will not enable looping/branching until their hardware can support it, and then they will enable it, even if other vendors aren't able to support it.
It just isn't right that NVidia decides this.

As for whether it's a language or not, I'd say it's a very limited script, not a programming language. Without branching and looping, you're basically feeding values into a script, which can then call a small subset of mathematical functions on those values and output results. That's a programming language? In the same way HTML is a programming language? We'll have to agree to disagree on the usage of the phrase 'programming language'.
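
To make that concrete: with no looping in the current profiles, anything iterative has to be written out by hand as straight-line math. A sketch (ignoring for a moment that there is no OpenGL fragment profile yet; the function and parameter names here are made up):

    float4 box4(uniform sampler2D tex, float2 uv, uniform float2 offset[4])
    {
        // Four texture taps, written out manually because there is no 'for' to roll them up.
        float4 sum = tex2D(tex, uv + offset[0]);
        sum += tex2D(tex, uv + offset[1]);
        sum += tex2D(tex, uv + offset[2]);
        sum += tex2D(tex, uv + offset[3]);
        return sum * 0.25;
    }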

Nutty
07-04-2002, 04:13 AM
nvidia will not enable looping/branching until their hardware can support it, then they will enable it, even if other vendors aren't able to support it.

and?!? Well, you could say it's not fair that GL2 will support features that other vendors can't support in hardware. It's the same situation. Will you call the ARB unfair because gl2BlahBlahBlah is only implemented in hardware on certain cards?

Whether it is a programming language or not is neither here nor there. It's based on C, with a few things removed. Does that suddenly make it not a language? I personally don't think so.


This just isn't right that NVidia decide this.

Why isn't it right? Nvidia are in a free country, are they not? They have the right to do what they want. If it's a bad mistake they'll pay for it by losing support, and perhaps a couple of million dollars of R&D down the drain.

Is it fair that M$ dictates what goes in DX? Is it fair that SGI dictated the original OpenGL API all those years ago?

BTW, please don't think I'm having a go. It's just that lots of people seem to bash Cg, so I thought I'd play devil's advocate, as not many people seem to be supporting it! :)

Personally I think it's good. Provided that we get fragment profiles for gl, and other vendors participate.

Nutty

GeLeTo
07-04-2002, 05:09 AM
I really don't understand why so many people bash it just because it was created by a single company. I still think it's going to be a fair while before gl2 appears on nvidia/ati hardware.
Maybe because of the possibility that gl2 will not appear on nvidia hardware, and nvidia will use Cg as an excuse not to support gl2. I would have been much more enthusiastic about Cg if nvidia had shown some commitment to gl2. This could be much worse than Glide vs. OpenGL, because 3dfx at least did support OpenGL. The difference between pure gl2 and gl1.x + nvidia extensions is very big, so we'll have to either use two different APIs or choose D3D. And I am not talking just about the shader stuff.


richardve
07-04-2002, 06:17 AM
Originally posted by knackered:
...and still you people refuse to consider direct3d - all I can say is you must know an awful lot of people running linux/irix! :) I know game developers don't....

I've read some of your previous posts and I'm really wondering what you're doing here on an OpenGL forum..

Are you perhaps working for Microsoft?
Maybe you're the monkeyboy himself?
Or the king of nerds: Bill Gates?

Now please go away or I will send you a penguin with a red hat.

ash
07-04-2002, 06:26 AM
Originally posted by someone:
nvidia will not enable looping/branching until their hardware can support it, then they will enable it, even if other vendors aren't able to support it.



Originally posted by Nutty:
and?!? Well you could say it's not fair that GL2 will support features that other vendors can't support in hardware. It's the same situation. Will you call the ARB unfair, because gl2BlahBlahBlah is only implemented in hardware on certain cards?


The difference, obviously, is that OpenGL2 is ultimately controlled by the ARB, and the ARB doesn't (in principle) benefit from favouring any particular company, whereas nvidia does.

However, there aren't any bonus points for fairness. Ultimately we just use whatever works best. The point is that evidence like this, that nvidia is acting in its own vested interests rather than for the graphics community as a whole, is perhaps an indication of how well Cg will work for us in the future.

Ash



zed
07-04-2002, 09:53 AM
Originally posted by knackered:
...and still you people refuse to consider direct3d - all I can say is you must know an awful lot of people running linux/irix! :) I know game developers don't....

You would be very surprised: the number 1 game in the US at the moment is OpenGL-only (Neverwinter Nights).
Also, a lot of top games this year have been OpenGL-only; in fact I'll give 10-1 odds (you can't resist :) ) that OpenGL-only games have achieved the top position in the US more often than D3D-only ones.

knackered, do you give in, or should I say nee again?

Korval
07-04-2002, 11:50 AM
First of all, the OpenGL vs D3D has no place in a discussion of Cg.

As far as Cg is concerned... in many ways it is the right way to go, if nVidia pulls it off correctly. OpenGL 2.0 has one fatal flaw: it is utterly useless. Not only does it not exist yet, but when it does, no hardware will support it. It will be 2 to 3 years before we see GL 2.0 in hardware.

I have a card in my computer that fully supports Cg at this very moment. When nVidia releases the GeForce 5 (once again, if nVidia does it right), all of my compiled Cg code will work just fine on it. I'll download a new Cg compiler with new, expanded functionality, but all the old Cg shaders will still compile. When the hardware is available for it, Cg will provide GL 2.0 capabilities.

As long as the Cg compiler is fully backwards compatible (and provides the ability to compile somewhat complex shaders for older hardware to the extent that it is possible), Cg is a better solution than GL 2.0. Ultimately, the problem with GL 2.0 is that, while it is a nice future, it is a useless present and will not be very usable for the near future. Cg is here and mildly useful now; it will be here when the GL 2.0 shader is around, and it will still be useful.

As to the argument that nVidia is using Cg as a power-grab to reclaim the market... of course they are. They are caught between 2 organizations beyond their control: Microsoft and the ARB.

They can't control what GL 2.0 is simply because everyone on the ARB competes with them. They want to bring the market leader down, so they will do everything in their power to make GL 2.0 as difficult as possible for nVidia to use.

At the same time, Microsoft benefits by having multiple graphics card vendors in a good position, which is why the 1.4 Pixel Shader was, basically, written by ATi (to offset the fact that the 1.1 PS was written by nVidia). D3D 9's shaders don't provide any side in the market with an advantage.

Trapped between GL 2.0 being out of their control and D3D 9 not providing them the advantage D3D 8 did, they have one option: make their own language. In a way, Cg is a lot like D3D's shaders, only with nVidia in charge of the language. Also, it gives us, as users of the language, growing room, which is not what OpenGL 2.0's shaders are designed for.

Once you provide conditional branching and looping constructs at both the vertex and pixel levels (and they could use the exact same interface), really, there's nothing Cg couldn't do that isn't already part of the language.
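
For the sake of argument, here is a purely speculative sketch of what loop and branch syntax could look like in a C-like shader once profiles allow it. Nothing like this compiles under today's profiles, and every name in it is invented:

    float4 accumulateLights(float3 N, float3 P,
                            uniform float3 lightPos[4],
                            uniform float3 lightCol[4],
                            uniform int    numLights)
    {
        float3 result = float3(0, 0, 0);
        for (int i = 0; i < numLights; i++) {       // loop: not in current profiles
            float3 L = normalize(lightPos[i] - P);
            float  d = dot(N, L);
            if (d > 0)                              // branch: not in current profiles
                result += d * lightCol[i];
        }
        return float4(result, 1);
    }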

Is this a blatantly monopolistic move? Sure, since it is highly likely that none of their competitors will be writing a Cg version. At the same time, there are worse companies who could monopolize the graphics card market. That's one of the reasons I don't mind that Microsoft has its monopolies: as long as they keep producing products I like to use and that are productive, I will continue to use them. And as long as nVidia continues to produce products that are of a high quality, I am willing to overlook their blatant power-grab.

As long as Cg is backwards compatible, and the language itself doesn't change too much (as I said, the only additions are adding looping and conditional branching syntax), it should be a better alternative than the vaporware that is GL 2.0.

knackered
07-04-2002, 01:01 PM
Originally posted by Korval:
First of all, the OpenGL vs D3D has no place in a discussion of Cg.
Yes it does. Cg is for d3d also, and it works very well on that API because it only has to communicate with a single interface, rather than lots. Cg for opengl is bad because it doesn't... basically... work. Until an ARB pixel shader extension becomes available, it's useless for any non-nvidia cards. In fact, until nvidia release an nvidia pixel shader profile, it's useless even on their hardware! What are you using Cg for, Korval? Transforming your light vectors into tangent space, and then kicking in with nvparse? ;)


It will be 2 to 3 years before we see GL 2.0 in hardware.
The hardware is available now - or didn't you know? 3dLabs Wildcat VP.


I have a card in my computer that fully supports Cg at this very moment.
You have an nvidia card and you program in d3d then...


When nVidia releases the GeForce 5, (once again, if nVidia does it right), all of my compiled Cg code will work just fine on it.
That's good for you. Meanwhile we'll all be using the cut-down consumer 3dlabs cards, writing in the gl2 shading language. 3dlabs have been bought by Creative, don't you know?


Once you provide conditional branching and looping constructs at both the vertex and pixel levels (and they could use the exact same interface), really, there's nothing Cg couldn't do that isn't already part of the language.
Ditto for OpenGL2.0 - difference being gl2 already has these crucial features.


Is this a blatently monopolistic move? Sure, since it is highly likely that none of their compeditors will be writing a Cg version. At the same time, there are worse companies who could monopolize the graphics card market. That's one of the reasons I don't mind that Microsoft has its monopolies: as long as they keep producing products I like to use and that are productive, I will continue to use them. And as long as nVidia continues to produce products that are of a high quality, I am willing to overlook their blatent power-grab.
Then that is very sad. Use d3d then.

The only thing that will stop me moving my attention fully towards d3d is opengl 2.0.
I've just finished reading the specs (and I've read the Cg specs too), and it looks like a developer's dream.

Summary of my opinion on Cg: it is nice, but very limited, and controlled by a single vendor.

Sorry for sounding terse with you - but I'm still outraged at nvidia basically plagiarising a lot of 3Dlabs' work. Cg is a cut-down version of the gl2 shader language.



folker
07-04-2002, 01:06 PM
Originally posted by Korval:
First of all, the OpenGL vs D3D has no place in a discussion of Cg.

As far as Cg is concerned... in many ways it is the right way to go, if nVidia pulls it off correctly. OpenGL 2.0 has one fatal flaw: it is utterly useless. Not only does it not exist yet, but when it does, no hardware will support it. It will be 2 to 3 years before we see GL 2.0 in hardware.

[...]

You claim that OpenGL 2.0 "does not exist yet" and is "not useful". Well, our software supports OpenGL 2.0 shaders on real, existing hardware (the P10).

OpenGL 2.0 has the aim of setting a standard and a vision for the future instead of only reflecting existing hardware. Because of this, the OpenGL 2.0 functionality is a superset of all other existing shader languages. But of course, at any time, existing hardware can easily support only parts of OpenGL 2.0, so that OpenGL 2.0 is useful both today and tomorrow.

OpenGL 2.0 also has the aim to be an open standard across all hardware vendors including NVidia, ATI, 3dlabs, Matrox, etc. etc. etc.

I think it is worth pushing these aims and so pushing OpenGL 2.0.

barthold
07-04-2002, 01:11 PM
Korval,

1) OpenGL2 is not vaporware. It has support from multiple hardware vendors, and tons of ISVs.

2) A lot of the functionality written in the OpenGL2 white papers has been implemented for the Wildcat VP as extensions to OpenGL 1.3. Real ISVs are using that today, creating some amazing visual effects.

3) The proposed direction for OpenGL2 encompasses much more than just a shading language. Again, see the white papers. Some other majorly important aspects are better memory management, more efficient data movement, and more control over synchronization for the application.

In contrast, Cg is only a shading language with less functionality than our GL2 extensions offer today (for example, a full fragment shader with looping and conditionals).

Barthold
3Dlabs

Nutty
07-04-2002, 01:25 PM
How many gamers have P10-based cards in their PCs?

Probably less than a handful.

How many have DX8-compatible cards? Quite a lot.

Which HLSL runs on DX8 compatible hardware now? Cg. Not GL2.

In the future GL2 will probably make more sense. But for some reason people have got it into their heads that just because NV want developers to use Cg now, it is going to destroy all the plans for GL2. What utter nonsense.

What about the fact that NV claim Cg is compatible with the DX9 shading language? If this is true, then developers supporting GL and DX9 need only one set of shaders, not two as they would if they used the DX9 HLSL and GL2's HLSL.

I simply don't understand what the big deal is. GL2 is an API that happens to have a C-based shading language, and for consumer-class hardware it is nowhere near ready. Cg is for developers to use now, to make writing vertex/fragment programs easier. Yes, we're all aware that fragment programs aren't yet available for GL, but this hopefully will be remedied soon. If other companies such as ATI, Matrox etc. don't get involved, then they're only going to hurt themselves in the long run - unless they suddenly release GL2 or their own shading language which eclipses Cg.

IMHO.

Nutty

folker
07-04-2002, 01:29 PM
Some corrections of wrong statements:


Originally posted by Korval:
They can't control what GL 2.0 is simply because everyone on the ARB competes with them. They want to bring the market leader down, so they will do everything in their power to make GL 2.0 as difficult as possible for nVidia to use.


Obviously nonsense. Why should it be difficult for NVidia to use OpenGL 2.0? (Do you believe in aliens preventing NVidia from using the OpenGL 2.0 shader language? ;-)

On the contrary, NVidia always says that they will support OpenGL 2.0 as soon as it is approved as a standard by the ARB.


Trapped between GL 2.0 being out of their control and D3D 9 not providing them the advantage D3D 8 did, they have one option: make their own language.


Obviously wrong facts. The D3D9 shader language is the same as Cg. So NVidia didn't make their own language. Only their own implementation.

In fact, NVidia and Microsoft worked together for Cg == DX9 SL. This is the opposite of being trapped.

harsman
07-04-2002, 11:44 PM
First of all, I don't think Cg is so bad. It is a great move to make writing shaders easier for developers - nvparse cubed, or something along those lines. However, the OpenGL 2 shading language is more forward-looking and aims higher, which I think is a good thing. The API playing catch-up with the hardware is what's been plaguing OpenGL for a few years now, and it should stop.

However, while the shading language is an important feature of OpenGL 2.0, there's other stuff that I see no reason why vendors wouldn't support. The superior memory/data management and better synchronisation are great, and really needed. The idea of a "pure" OpenGL with more functions moved to glu also appeals to me; I've never understood what stuff like edge flags and selection are doing in the API anyway. I honestly don't care what shading language I use as long as it is compatible across multiple vendors and I get to use the *other* good stuff from OpenGL 2.0. And please, no assembler-level interfaces to shading. I've seen this advocated by several people at nvidia, but I've yet to see any benefits listed other than "the API should expose lower-level interfaces, Cg is the high-level interface".

Dave Baldwin
07-04-2002, 11:52 PM
Originally posted by Nutty:


And 2ndly I'm not refusing to consider D3D. Again using Cg has benefits there, as provided suitable fragment program profiles come out, you can use the exact same Cg shader for D3D code and OpenGL.
Nutty


This is a suitable time to comment on another Cg 'unique selling point' - you can run the same shader on OGL and DX.

How could anyone not buy into this story - paradise - write once, run anywhere. It worked for Sun, so why shouldn't it work for nvidia with Cg?

How much effort goes into the shader part of a game/app and how much goes into the rest of the 3D part? Even if the shader part were portable between OGL and DX, the remainder of the 3D segment of the program is highly non-portable and conveniently overlooked.

Take Doom 3 as a topical case in point. Fantastic graphics, but all achieved with one 20-or-so-line fragment OGL2 shader. Needless to say, most of the effort in getting the superb lighting effects, etc. is in the 99% of the graphics code, not in the fragment shader. How much porting effort to DX would have been saved by writing in Cg? 5 minutes, and that is being generous.

This is based on the current snapshot of Doom 3, and other games will no doubt use more shaders of higher complexity (I hope, otherwise I might as well hang up my architect's hat), but I still assert that shader portability between two otherwise very different APIs has very little real value or benefit. Great marketing story though.
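
For a sense of that scale, a 20-or-so-line per-pixel diffuse + specular bump shader looks something like the following. This is written in generic C-like (Cg-style) shader syntax purely for illustration; it is not the actual Doom 3 shader, and every input name in it is an assumption:

    float4 main(float2 uv         : TEXCOORD0,
                float3 lightVecTS : TEXCOORD1,   // tangent-space light vector from the vertex stage
                float3 halfVecTS  : TEXCOORD2,   // tangent-space half vector
                uniform sampler2D diffuseMap,
                uniform sampler2D normalMap,
                uniform float4    lightColor) : COLOR
    {
        float3 N = normalize(tex2D(normalMap, uv).xyz * 2.0 - 1.0);  // unpack the normal map
        float3 L = normalize(lightVecTS);
        float3 H = normalize(halfVecTS);

        float diff = max(dot(N, L), 0.0);
        float spec = pow(max(dot(N, H), 0.0), 32.0);

        float4 base = tex2D(diffuseMap, uv);
        return lightColor * (base * diff + spec);
    }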

Dave.
3Dlabs

Dave Baldwin
07-05-2002, 12:07 AM
Originally posted by folker:
Some corrections of wrong statements:

Obviously wrong facts. The D3D9 shader language is the same as Cg. So NVidia didn't make their own language. Only their own implementation.

In fact, NVidia and Microsoft worked together for Cg == DX9 SL. This is the opposite of being trapped.

In my book, Cg == DX9 means that a program written for Cg compiles unchanged on the DX9 HLL.
Comparing both specs, this is clearly not the case. Cg has types not present in the DX9 HLL, and the DX9 HLL has loops, to name just two blatant differences! Unfortunately the DX9 HLL spec is under NDA so I cannot go into any more details.

I asked Microsoft about this as well, and their comment was that nvidia are portraying more collaboration than there has been; they don't consider the languages the same and have no intention of supporting nvidia in this endeavor.

Dave.
3Dlabs

Robbo
07-05-2002, 01:19 AM
I am sure the next-gen NVIDIA hardware will be fully DX9-compliant.

folker
07-05-2002, 01:35 AM
Currently, we have the following situation regarding Cg:

If you use DX8: For NVidia hardware, you can use Cg. To use features of other hardware (e.g. ATI) you must use assembly vertex/pixel programs (because Cg only supports features of NVidia cards).

If you use DX9: You can use the DX9 shader language for all hardware. But to use different features of different hardware, you have to write different shaders.

If you use OpenGL: For NVidia hardware, you can use Cg. To use features of different hardware (e.g. ATI) you must use ATI extensions etc. etc.

So all in all, Cg is indeed very useful: Cg is a powerful new "nvparse" for NVidia hardware, making it much easier in many situations to write shaders instead of using an ugly assembly language. No question. And since NVidia has big market power, it is automatically somehow a standard (in the same way that many game companies support NV extensions, or in the same way that many game companies used Glide some years ago). And this is all fine, and so Cg is a good thing making life easier, no question.

OpenGL 2.0 has different aims which are very much worth pushing: A future-oriented, open shader language standard for all hardware vendors.

It is very likely that basically all hardware vendors will use the OpenGL 2.0 shader language, because the current situation of having a jungle of proprietary OpenGL extensions has clearly reached its limits, and the demand for a standard shader language for OpenGL is growing every day. And OpenGL 2.0 shaders are the (only) open shader standard solving this problem.

And indeed, nearly all hardware vendors are strongly supporting OpenGL 2.0. And so are a lot of software vendors.

So we will see a lot of OpenGL 2.0 support from many hardware vendors in the near future. The Wildcat VP from 3dlabs is only the beginning...

folker
07-05-2002, 01:48 AM
Originally posted by Dave Baldwin:
[...]

Take Doom 3 as a topical case in point. Fantastic graphics, but all achieved with one 20-or-so-line fragment OGL2 shader. Needless to say, most of the effort in getting the superb lighting effects, etc. is in the 99% of the graphics code, not in the fragment shader. How much porting effort to DX would have been saved by writing in Cg? 5 minutes, and that is being generous.

[...]



Doom 3 wouldn't be possible with Cg, because (in contrast to nvparse) Cg does not expose the same power as the assembly and combiner shaders required by Doom 3.

Not to mention that you cannot use Cg to implement the ATI code path, for example.

Doom 3 is very easily possible with OpenGL 2.0 as Carmack demonstrated. One reason is that the P10 fragment shaders are much more flexible, and that OpenGL 2.0 easily exposes all the features required for Doom 3.

And as soon as another hardware vendor provides OpenGL 2.0 support (e.g. ATI?), Doom 3 will automatically run on this hardware using OpenGL 2.0 without having to write a new code path.

knackered
07-05-2002, 02:15 AM
Originally posted by Nutty:
How many gamers have P10-based cards in their PCs?

Probably less than a handful.

How many have DX8-compatible cards? Quite a lot.

Well then write shaders using nvparse/ATI fragment shaders/vertex programs for those cards - DX8-compatible cards aren't capable of supporting complex shaders anyway, so writing asm shaders for them is easy... you of all people should know that, Nutty. Cg is not going to help you here... as I say, until the ARB releases a general pixel shader extension which is compatible with DX8 cards.
For the gl2-compatible cards, use the gl2 path... and write wholly better-looking shaders.

I'm not really worried about what gamers have - I write simulations, and I decide what hardware to use! ;)

Eric
07-05-2002, 03:02 AM
I don't mind reading posts from 3DLabs commenting on Cg but then I'd also like to hear people from NVIDIA (and ATI/Matrox) giving their opinion...

Anyway, I cannot see what the big deal is. Nutty is completely right: GL2.0 is useless right now, at least for the kind of work he's doing (Game Dev). No gamer has a P10 and I don't think this is about to change.

Now, for research work, I'd love to get my hands on GL2.0-enabled hardware but I can tell you I WON'T use GL2.0 in my apps right now (and I am not in the gaming industry just in case you were wondering).

Cg can be useful right now for anyone who targets NV products in general. It is up to other vendors to provide a profile for their own card. As Nutty said, if they don't then NVIDIA will lose some money (mind you they're loaded...) but if they do then it might be good for us while patiently waiting for GL2.0-CONSUMER-cards.

I have nothing against 3DLabs at all but the fact that they have a product that supports some features of GL2.0 is not enough for developers to use GL2.0 ! Typically managers will look at the user base before deciding on that and I am afraid the P10 is not exactly what I call "widely used" these days...

Anyway, I hope GL2.0 becomes a reality soon but meanwhile I'll have a go at Cg and why not even try DX9...

Regards.

Eric

davepermen
07-05-2002, 03:10 AM
now:
use cg
tomorrow:
use gl2

why?
one is nvidia only
the other is arb-gl

nvidia just tries to get its own hands in this.

Thaellin
07-05-2002, 04:24 AM
Some of you have been espousing Cg as a 'now' thing, and GL2 (eric, at least) as 'useless'. I'd have a different label for Cg: niche.

In order for Cg to be truly useful, multiple vendors would need to provide 'profiles' for their hardware. nVidia has tried to make this sound like the most reasonable thing in the world. What would the effect be on a competitor, though?

Say 'SpiffyGFX' is working on their next-gen product. nVidia spews forth Cg as the savior of graphics programming. Now, at sgfx, to support this language, I need to pull developers and resources off my existing (mainstream) projects to write a profile for this thing. The net effect is a slow-down in my development cycle in order to support an unproven technology with uncertain impact and an unknown future. This /alone/ is enough to make me choose not to bother developing a Cg profile.

Now, say I have extra developers twiddling their thumbs and I put them on the project. What advantage do I gain? By endorsing a competitor's development efforts, I not only lend credibility to that competitor, but I put myself in a catch-up position. As the sole proprietor of the language specification, nVidia is the only company with a native ability to fully support the 'latest' specification. Given the build-up prior to Cg's release (none), one can assume future specification updates will be similarly launched out of the blue. Note that nVidia really has little option on this point - if they publish the specification early, their competitors will know what nVidia's hardware plans are ahead of time.

So, we'll go one step further, and say my hardware development company has signed all the necessary paperwork to get information about this thing in time to implement it before it hits market. We'll even assume that nVidia plays nice and adheres to the advance specification even if their QA team turns up a hardware bug which makes it hard to support a new feature. Now I have a direct competitor specifying what I can support. If nVidia chooses to add some wacky event-driven scheme with callbacks and volume rendering, then I either get it in hardware or publish a card which gets reviews like "Supports CG, but the implementation sucks compared to the GeForceWallaWalla card."

This works the other way, too. Say my new card design has holy-grail-quality support for higher-order surfaces, but nVidia's doesn't. Guess what feature will NOT be in the Cg language?

So, unless nVidia develops the profiles FOR them, there will be no profiles for non-nVidia cards. So I (back in software developer land here) can support Cg on nVidia cards, but have to write special support in "Other API" for other cards. Or I could choose to just write the whole thing in "Other API" and save development dollars while still producing a kick-ass title.


In regard to OpenGL2.0 being 'useless', I find it amazing to think how quickly people have forgotten the OpenGL1.x development path. Until 2000 (or so), you could NOT count on a consumer-level card implementing the full OpenGL path in hardware. It would have been a foolish assumption, and your game would have run like crap. Even today, you can't write general-case OpenGL and assume it will run well - there is a fast path for each card, and straying from it brings penalties.

OpenGL2.0 will be no different in this respect. Some things will be fast, some things will be slow. Some features will force a partial software fallback, others a total software fallback. Some features will already be in hardware, and others will be implemented as soon as IHVs see ISV demand for them. In short, your application optimization process will not change much, though the design process will. A well-designed application will also scale nicely across newer hardware while still performing well on the 'old' stuff we have available today.

Thanks for actually reading all that,
-- Jeff Threadkiller

folker
07-05-2002, 04:29 AM
Originally posted by Eric:
I don't mind reading posts from 3DLabs commenting on Cg but then I'd also like to hear people from NVIDIA (and ATI/Matrox) giving their opinion...


Both NVidia and 3dlabs are commenting heavily on both Cg and OpenGL 2.0 on this discussion board (see for example the Carmack thread). They post immediately if they think something is worth saying... ;-)


Anyway, I cannot see what the big deal is. Nutty is completely right: GL2.0 is useless right now, at least for the kind of work he's doing (Game Dev). No gamer has a P10 and I don't think this is about to change.

OpenGL 2.0 is not only for the P10 board.
OpenGL 2.0 is an open standard, and there will soon be consumer cards supporting OpenGL 2.0.



Now, for research work, I'd love to get my hands on GL2.0-enabled hardware but I can tell you I WON'T use GL2.0 in my apps right now (and I am not in the gaming industry just in case you were wondering).


From the perspective of software vendors (especially middleware), you have to support everything anyway: OpenGL 2.0 because it will be a standard, Cg because NVidia hardware is important, the DX9 HLSL because it will be the standard for DX9, etc.

(Our philosophy is to support everything which is important. Both OpenGL 2.0 shaders and Cg definitely are important.)



Cg can be useful right now for anyone who targets NV products in general.


Cg is not ready to be used in practice: no fragment shader support in OpenGL Cg, calling the Cg compiler exe at application runtime, some other minor problems, etc. We had some ugly experiences with the current Cg implementation in practice... But of course, NVidia is working on that, and this will also change in the future, no question.



It is up to other vendors to provide a profile for their own card.


To be realistic, no competing hardware vendor (e.g. ATI) will write Cg profiles, no question. They will use OpenGL 2.0 shaders.



As Nutty said, if they don't then NVIDIA will lose some money (mind you they're loaded...) but if they do then it might be good for us while patiently waiting for GL2.0-CONSUMER-cards.


NVidia won't lose money. Cg is very useful even if it is only supported by NVidia and only supports NVidia hardware - in the same way as nvparse.



I have nothing against 3DLabs at all but the fact that they have a product that supports some features of GL2.0 is not enough for developers to use GL2.0 ! Typically managers will look at the user base before deciding on that and I am afraid the P10 is not exactly what I call "widely used" these days...


The strong argument for OpenGL 2.0 is that it will be a standard accross hardware vendors.

And for software developers it is important to look into the future. If you don't start developing for the future early enough, you soon find yourself behind.

Eric
07-05-2002, 05:25 AM
Thaellin, folker,

I thought I made myself clear when saying that I was waiting for GL2.0 and that in my opinion Cg would probably only be used during a transition period.

Now, you both seem to think that GL2.0 is there already.

Can you then answer this simple question: how do I start using GL2.0 on my GF4? OK, bad question, this is an NVIDIA card!!!! So, apart from the Wildcat VP, which graphics card can I buy this afternoon which will provide me with GL2.0 drivers?

...
...
...

That's what I thought...

Now, if as a developer I cannot get my hands on such a card at the local PC World (kidding), how the hell will my clients manage???

I still believe that it is too soon for GL2.0 to be used in real apps, that's all. And because of that, I think Cg is an acceptable alternative for (NV) cards when you are planning to create shaders (let's not forget that it is actually stupid to compare GL2.0 to Cg, like it is to compare DirectX to GL!)...

One thing I forgot to mention is that what I am really waiting for at the moment is GL1.4. Of course GL2.0 sounds nice and cool but I honestly think it is still too far away.

About "OpenGL 2.0 will be a standard...", I agree. But then again, can you name a lot of people who would decide today to go for GL2.0 only?

Anyway, perhaps I'll be proven totally wrong (I don't mind) and we'll see GL2.0-enabled cards & software by Xmas... but I doubt it!

Regards.

Eric


GeLeTo
07-05-2002, 06:01 AM
Can you then answer this simple question: how do I start using GL2.0 on my GF4?
The memory management, synchronisation, most OpenGL objects, etc. can be supported by the GF4. The only substantial thing that can't be supported is the shaders. That's not too much of a problem because:
1. NVidia can expose their combiners/asm shaders/Cg shaders through an interface similar to GL_GL2_shader_objects
2. NVidia can use a cut down version of the GL2 shader language similar in functionality to Cg
3. As some people mentioned, the shaders are a tiny part of the whole 3D implementation. I personally don't mind rewriting a few shaders. Translating from Cg to the gl2 shading language and vice versa is quite straightforward (see the sketch at the end of this post). I would prefer #2 though.

So if NVidia decides to implement a gl2 subset for the GF4, the only difference between the NVidia and gl2 codepaths will be just a few shaders. If they don't, I'll have to use lots and lots of extensions on top of GL1.4 to get the same functionality.
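
As a rough illustration of point 3, here is a trivial Cg vertex shader with comments describing the kind of mechanical edits a gl2 version would need. The gl2 details are assumptions based on the 3Dlabs white papers, not the final spec, and the uniform names are made up:

    // Cg version. A gl2 translation would mostly be mechanical:
    //   float4 / float4x4       ->  vec4 / mat4 (assumed gl2 type names)
    //   mul(matrix, vector)     ->  matrix * vector
    //   parameters + semantics  ->  built-in/uniform variables
    //                               (e.g. the POSITION output becomes gl_Position)
    struct vertout {
        float4 position : POSITION;
        float4 color    : COLOR0;
    };

    vertout main(float4 position : POSITION,
                 float4 color    : COLOR0,
                 uniform float4x4 modelViewProj,
                 uniform float    brightness)   // made-up uniform, for illustration
    {
        vertout OUT;
        OUT.position = mul(modelViewProj, position);
        OUT.color    = color * brightness;
        return OUT;
    }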

Thaellin
07-05-2002, 06:12 AM
Originally posted by Eric:
Thaellin, folker,

I thought I made myself clear when saying that I was waiting for GL2.0 and that in my opinion Cg would probably only be used during a transition period.


Hi Eric,
No, I hadn't really picked up on that. This begs a further question, however: if the Cg language is merely a transitional step, why should I invest time in learning it instead of a language with a longer life expectancy?



Can you then answer this simple question: how do I start using GL2.0 on my GF4? OK, bad question, this is an NVIDIA card!!!! So, apart from the Wildcat VP, which graphics card can I buy this afternoon which will provide me with GL2.0 drivers?


There are a few things to bring up here:

How do I start using Cg on my ATI Radeon8500, or Matrox Parhelia (sp?)? I can't - there are no profiles. When can I expect profiles? I can't. When can I expect OpenGL 2.0 support? Eventually.

Now, given that I value my time and cannot spend it getting used to a language that I will not be able to use extensively, Cg seems a bad decision.

I am not in nVidia's circle of friends, where Cg was discussed and implemented by ISVs prior to release on the market (nor have I a need to be *shrug*). As a developer who was 'suddenly' exposed to this shader language, I cannot instantly produce an application that benefits from Cg.

Given that I cannot produce a useful application utilizing Cg shaders right now, what is the point of learning it at all? Products which are already well into development will not adapt to Cg because it creates additional development risk (and only benefits a subsection of the market). Projects that are early enough in the cycle to be designed to account for Cg will (in all likelihood) be in development for 10-20 more months. With that release window, one can afford to look at OpenGL2.0 as a better solution, since it will work for 'all' cards.

I just don't see it happenning.



One thing I forgot to mention is that what I am really waiting for at the moment is GL1.4. Of course GL2.0 sounds nice and cool but I honestly think it is still too far away.


Depends on the scale at which you're working. I'm definitely looking forward to 1.4 and standardized shaders to play with, but realistically, GL2 is where I need to focus.



Anyway, perhaps I'll be proven totally wrong (I don't mind) and we'll see GL2.0-enabled cards & software by XMas.... but I doubt it !


The spec will have to be formally adopted first... Or at least the extensions. It's been an essentially stable concept for quite a while, though. I'd be rather surprised if hardware in development right now did not take this direction into account.

However, as I said, the new GL may not be widely accelerated for a while. At worst, however, it should be a slight performance hit (this assuming you only use the SL). And it will get better in newer cards. Just like pure OpenGL applications 'magically' improved when T&L went into silicon, they will improve when more of the GL2 specification moves into hardware.

-- Jeff

Nutty
07-05-2002, 06:15 AM
Wow... a lot of comments.


For the gl2 compatible cards, use the gl2 path...and write wholly better looking shaders.

Personally I don't think a better language == better-looking shaders. The source code may be more readable, but that's all. Secondly, there are no gl2-compatible consumer cards now, and there won't be for probably at least 6 months, maybe even a year, in the consumer graphics card market. From rumours I've seen, NV is not going to support it until NV35. That's only a rumour though.


I'm not really worried about what gamers have - I write simulations, and I decide what hardware to use!

Well, we're discussing the merits of Cg, created by Nvidia. And Nvidia is primarily a consumer 3D accelerated graphics card company, and the majority of their cards are used for playing games, so I don't think it's fair to comment on Cg's use in other industries.


OpenGL 2.0 is not only for the P10 board.
OpenGL 2.0 is an open standard, and there will be soon consumer cards supporting OpenGL 2.0.

How soon is soon? If you're talking this year, I'd wager money on you being wrong.

At the end of the day you use the tools that give you good results that you like. If a new compiler that produced much better-quality code, and a GL fragment profile, came out tomorrow, I'd use them from now on. I wouldn't even bother using the low-level extensions.

But anyway, it's clear a lot of people think it's a bad move on nvidia's part. Fine, don't use it then. But if you're suddenly expecting GL2 to appear any day on nvidia/ati/matrox hardware, then you're in for a long wait.

IMHO :)

I'm sure Matt said it would be a while until gl2 is supported on nv cards too.

Nutty

folker
07-05-2002, 06:22 AM
Originally posted by Eric:
I thought I made myself clear when saying that I was waiting for GL2.0 and that in my opinion Cg would probably only be used during a transition period.

[...]

I agree with you completely: Cg is and will be useful for NVidia cards. But also note that the current Cg implementation is not ready for real-world applications yet - we worked with Cg and there are some issues which have to be fixed (see my last posting).
I expect that they will be fixed much more quickly than OpenGL 2.0 support will appear on consumer cards. But I disagree with the picture that Cg is ready to be used while OpenGL 2.0 shaders are in the far future.

In fact, support for OpenGL 2.0 shaders is already there. At the moment only for P10 cards, right, but obviously this will change in the near future. Note how overwhelming the commitment of hardware vendors to OpenGL 2.0 is. Also note that, for example, ATI - a company which produces consumer hardware - heads the OpenGL 2.0 working group. So what does this tell us?

And as mentioned in my last posting: as a software developer you must look into the (near) future. If you only start supporting and using OpenGL 2.0 shaders once every consumer has an OpenGL 2.0 capable 3d card, you are too late. So I am sure that our decision to start working with OpenGL 2.0 was definitely the right one (not to mention that I like the OpenGL 2.0 concepts ;-)

BTW, as an example: Doom3 wouldn't be possible with Cg at all, but Doom3 already runs with OpenGL 2.0 shaders.

Some comment: I don't like the "Cg against OpenGL 2.0" discussion at all. Cg and OpenGL 2.0 have different aims. You don't have to decide between Cg or OpenGL 2.0; that would be silly. And of course Cg is very useful (everyone who has had to write assembly shaders will immediately agree with that). But OpenGL 2.0 is also important, and it will become more and more important in the future. I am somewhat surprised that some people seem to want to talk down OpenGL 2.0 by claiming, for example, "OpenGL 2.0 doesn't exist and is therefore not relevant" or "we don't need OpenGL 2.0 because there is Cg".

folker
07-05-2002, 06:34 AM
Originally posted by Nutty:
How soon is soon? IF you're talking this year, I'd wage money on you being wrong.
Nutty

ATI is producing consumer cards. And ATI (not 3dlabs) is heading the ARB OpenGL 2.0 working group. And ATI has no high-level shader language for OpenGL yet, but urgently needs one.

Because of this, I assume that ATI will support the OpenGL 2.0 shader language within this year.

How much money do you bet? ;-)

Eric
07-05-2002, 06:38 AM
Thaellin,

My reason for investing some time in Cg (to be honest I haven't found this time yet http://www.opengl.org/discussion_boards/ubb/wink.gif) is that NVIDIA may release a profile for GL2.0 when it becomes available, so it shouldn't be a waste of time (and if it is, that can't be worse than when I decided to use Quickdraw3D one month before Apple decided to drop it for OpenGL http://www.opengl.org/discussion_boards/ubb/wink.gif)...

Meanwhile, it lets me experiment with things while waiting for GL1.4 and then GL2.0.
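
For context, that kind of experimenting goes through the Cg runtime roughly as sketched below. This is only an illustration: it uses the Cg 1.x entry points that settled later that year (the beta available when this thread was written differed in places), the file name "bumpy.cg" is made up, and the arbvp1 profile assumes ARB_vertex_program, which was only just appearing at the time (the NV-only vp20 profile was the alternative).

#include <Cg/cg.h>
#include <Cg/cgGL.h>

static CGcontext ctx;
static CGprogram prog;

void initCg(void)
{
    ctx  = cgCreateContext();
    prog = cgCreateProgramFromFile(ctx, CG_SOURCE, "bumpy.cg",       /* hypothetical file   */
                                   CG_PROFILE_ARBVP1, "main", NULL); /* or CG_PROFILE_VP20  */
    cgGLLoadProgram(prog);                /* compile for the profile and hand it to GL */
}

void drawWithCg(void)
{
    cgGLEnableProfile(CG_PROFILE_ARBVP1); /* route vertices through the Cg program */
    cgGLBindProgram(prog);
    /* ... set parameters, issue geometry ... */
    cgGLDisableProfile(CG_PROFILE_ARBVP1);
}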

Now, I must say that I fully understand the issue you have regarding the fact that other vendors are unlikely to produce profiles for Cg, but I have to admit that I mainly develop for NV cards (well, the code runs on all targets but it only takes advantage of NV cards).

Anyway, the thing that bothers me most is that people keep bashing Cg while nobody forces them to use it: if they don't want to use it, they can wait for GL1.4 and GL2.0....

Regards.

Eric

Eric
07-05-2002, 06:50 AM
Originally posted by folker:
I am somehow surprised that some people seem to want to make down OpenGL 2.0 by claiming for example "OpenGL 2.0 is not existing and therefore not relevant" or "we don't need OpenGL 2.0 because there is Cg".

Fully agreed.

Actually my point of view is more like "OpenGL 2.0 is not available yet while Cg is".

As Nutty said, if something much better than Cg were to be released tomorrow, I would probably use it. If GL1.4/2.0 were released tomorrow, I'd use it straight away. My problem really is that I do not have anything to start coding GL2.0 with. So I'll give Cg a go.

One last thing: I am not really waiting for everyone to have a card that supports GL2.0, but I would like to see several cards from several vendors supporting it before committing to write GL2.0-only code (mind you, I guess we'll always have a GL1.x fallback anyway).

Regards.

Eric

folker
07-05-2002, 07:10 AM
Originally posted by Eric:
Fully agreed.

Some remark: I didn't address that to you. And most people are discussing this topic seriously.

folker
07-05-2002, 07:20 AM
Originally posted by Eric:
One last thing: I am not really waiting for everyone to have a card that supports GL2.0 but I would like to see several cards from several vendors supporting it before comitting to write only GL2.0 code (mind you I guess we'll always have a GL1.x fallback anyway).

Even if, as a first step, only ATI hardware also supports OpenGL 2.0 shaders, it is still very important to support it: then OpenGL 2.0 is the high-level shader language you can use to write shaders for ATI hardware, in the same way as Cg is the high-level shader language you use to write shaders for NVidia hardware.

And note - as mentioned already - that ATI (not 3dlabs) is heading the ARB OpenGL 2.0 working group.

When I read the first OpenGL 2.0 specs, I thought "really cool, but it will take years before we get the first hardware for it". Then, a few months later, 3dlabs announced the P10 hardware. Now, many say "OpenGL 2.0 is cool, but it will take a long time until consumer cards support OpenGL 2.0". Then, a few months later... ;-)

It seems that OpenGL 2.0 will be very relevant sooner than many people can imagine... ;-)

andreiga
07-05-2002, 07:46 AM
I really hate people who act as if the world will stop functioning unless they can program in GL2 right now, like Eric. BTW, if you're not programming games your thoughts will not matter much in the eyes of the marketing guys (the most important guys in the company).
In general this is the normal creation process for new products (if the process at NVidia is very different, then they have big problems, like 3DFX): some ideas are generated (from different sources), then they are filtered (to see if they correspond to the firm's objectives and strategies), concept products are generated and one is selected, an economic analysis is made, ONLY NOW DO THE TECH GUYS MAKE A PRODUCT (with some input earlier, in the first phase), a name and package are given, it is tested on the market, and the final phase is mass production (these are the basic phases).
So, you see, Matt and the other tech guys (from 3DLabs and ATI as well) receive a list of specs (in marketing terms, based on clients' wishes and company resources) of what to implement in the next product.
Matt, the world is not spinning around you, so please don't "threaten" us with your resignation.
I also hate people who blame Cg for existing and blame NVidia for trying to do something different. Remember, ISVs want a standard so they can program less and thus have lower costs, but in every domain other than IT, innovation is based on competition and NOT ON STANDARDS.
Microsoft is not loved by many (including me), but they are VERY GOOD at giving people what they want (and I mean most people). Me, you and the other guys belonging to a "specific club" of programmers don't matter in their eyes. If the marketing guys from Microsoft and the tech guys from NVidia formed a company, then we would see very nice products.
One last thing: Carmack is, in marketing terms, an "opinion leader". So he is very, very important in the managers' eyes, even if Matt doesn't agree with him (JC is more important than Matt).

PS I'm sorry if I've upset some people, but I thought some things had to be said.

Andrei

knackered
07-05-2002, 12:43 PM
I agree that Cg is a high-level abstraction of nvparse. Maybe one day I'll translate my nvparse scripts to Cg, but it seems a little pointless, seeing as I know exactly how to use the ASM now.
I'm trying to persuade my employers to buy a Wildcat VP to experiment on, and maybe if 3dlabs hurry up with GL2 I'll have a better argument. My employers seem to have an unhealthy interest in the new Matrox card, simply because it supports 3 displays....whereas the VP supports just 2. I personally don't see the importance of 3 displays - it's hardly a bloody CAVE, is it?

BTW, Nutty has stated in previous posts that he uses OpenGL as a hobby, and for prototyping. I believe that in his professional capacity he doesn't use OpenGL.

Nutty
07-05-2002, 12:53 PM
That is indeed true, at this current moment I'm a console coder. Nowt wrong with a healthy interest in gl tho. http://www.opengl.org/discussion_boards/ubb/smile.gif

Korval
07-05-2002, 01:34 PM
BTW, if you're not programming games your thoughts will not matter much in the eyes of marketing guys (the most important guys in company).

I don't care about the eyes of the marketing department. The fact that marketing drives products more than building the right product is one of the failings of capitalism. Incidentally, it is also completely unimportant to the discussion at hand.


Matt, the world is not spinning around you so please don't "threat" us with your resignation.

The point of Matt's comment was not that he believed that the world is "spinning" around him. The point is that he believes strongly that the driver layer is not where multipass belongs, nor is it even reasonable to implement it there. He believes this strongly enough that he is unwilling to work on the product if that is the final decision from the ARB (which, btw, has nothing to do with a marketing department). That was all he was trying to say.


One last thing, Carmack is in marketing terms an "opinion leader". So he is very, very important in the managers eyes, even if Matt doesn't agree with him (JC is more important than Matt).

To me, Carmack is a jerk on a power-trip. He's a fine programmer, but I personally know of at least two other programmers who are probably more skilled (and one who is certainly more experienced). BTW, before you ask, neither of them are me.

His mere .plan files are gospel to thousands of programmers who look to him as some kind of god. A sentence or two can spark debates in hundreds of forums (like this one), as though his opinions have any legitimate reason to carry more weight than anyone else's. I've seen some of his code (Quake) and I wasn't impressed.

And, worse than anything else, he cultivates this fame. He knows that he is looked upon as a great programmer, regardless of his actual talent. If he didn't want to be well known and seen as an idol, he could simply be more private. After all, nobody makes him publish a .plan file.

The only thing Carmack has over most other programmers is a company&publisher willing to let him do anything he wants.

I don't begrudge the preferential treatment that he gets from graphics companies; graphics cards sell (or don't sell) based on how well they play his games (which is very sad for the graphics card industry, if you think about it).

Sorry, but I had to get that off my chest.

knackered
07-05-2002, 01:45 PM
Carmack's reverse was a fantastic bit of lateral thinking, don't you think, Korval?

mcraighead
07-05-2002, 02:11 PM
Whoah. My comments have now been turned into "threats".

All I'm saying is that I generally disagree with most aspects of most of the OpenGL 2.0 proposals [not just the SL proposal], and so if that turns out to be the direction OpenGL goes, I think I'd rather not work on OpenGL.

I think the shading language should sit outside the API, not inside it.

I think transparent multipass should be provided by scene graphs and shader compilers, not the API.

I think the API needs to expose a low-level assembly language for both vertices and fragments, so that people can use [or even develop] the shading language of their choice. It may also make sense to standardize a high-level language, but not to the exclusion of low-level languages.

I think the proposed GL2 APIs are clumsy. First I have to create a bunch of objects, then "compile" my shaders, then attach them to another object, then "link" them, and only then can I use them? I don't see what this gains over GenProgramsARB, ProgramStringARB, BindProgramARB except confusion and more API calls. The analogy to how a C compiler works seems rather stilted.
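
For concreteness, the two call sequences being compared look roughly like this. It is only a sketch: the ARB_vertex_program calls are the shipped extension API (entry points assumed to be resolved already via the usual GetProcAddress dance), while the object/compile/link side uses the names that later shipped with GLSL in OpenGL 2.0 - the 2002 3Dlabs proposal spelled them differently. The source strings are assumed to be supplied by the caller.

#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* ARB_vertex_program style: generate, load the string, bind. */
GLuint loadArbVertexProgram(const char *vpSrc)
{
    GLuint vp;
    glGenProgramsARB(1, &vp);
    glBindProgramARB(GL_VERTEX_PROGRAM_ARB, vp);
    glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(vpSrc), vpSrc);
    return vp;
}

/* Object style: create objects, compile, attach, link, then use. */
GLuint buildShaderProgram(const GLchar *vsSrc, const GLchar *fsSrc)
{
    GLuint vs   = glCreateShader(GL_VERTEX_SHADER);
    GLuint fs   = glCreateShader(GL_FRAGMENT_SHADER);
    GLuint prog = glCreateProgram();
    glShaderSource(vs, 1, &vsSrc, NULL);
    glCompileShader(vs);
    glShaderSource(fs, 1, &fsSrc, NULL);
    glCompileShader(fs);
    glAttachShader(prog, vs);
    glAttachShader(prog, fs);
    glLinkProgram(prog);
    glUseProgram(prog);
    return prog;
}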

I think the "async" stuff probably doesn't even belong in OpenGL in the first place, except for a few very tiny bits (and even then, why reinvent the NV_fence API?).

I think the object management proposals make things worse, not better. Some parts of it would probably be a little nicer, but still a bad idea because now you end up with two incompatible mechanisms in one API. Some parts of it I disagree with entirely (the object "policies", all of sections 4 and 5).

In other areas, I think the proposals do not interface in very compatible ways with the rest of OpenGL. For example, they reinvent the wheel on generic attributes for vertex programs, when the ARB_vertex_program interface for generic attributes is already quite sufficient, and when the ARB already discussed that topic for an extended period of time in the ARB_vertex_program working group.

For reasons such as these, the proposals by and large do not reflect my personal vision [as an engineer] of where OpenGL should go. So if they were adopted, I would probably not want to be an engineer working on OpenGL, since I would consider the API to have been (for the lack of a better word) "ruined".

Yes, there are things that _are_ good about the proposals. But I find that they are few and far between.

If I haven't been sufficiently clear, this is purely an engineering assessment of things, ***not*** an "I-want-my-NVIDIA-stock-to-go-up" assessment.

So (please) stop calling this some sort of "threat".

- Matt

folker
07-05-2002, 03:48 PM
Originally posted by mcraighead:
Whoah. My comments have now been turned into "threats".

No, it is only a (sensible) technical discussion. http://www.opengl.org/discussion_boards/ubb/wink.gif



All I'm saying is that I generally disagree with most aspects of most of the OpenGL 2.0 proposals [not just the SL proposal], and so if that turns out to be the direction OpenGL goes, I think I'd rather not work on OpenGL.

I think the shading language should sit outside the API, not inside it.


The main job of a driver is to hide hardware implementation details.
In the same way as different C / C++ compilers hide CPU hardware details, I think it is very natural that a C-like shader language hides the GPU shader hardware details.



I think transparent multipass should be provided by scene graphs and shader compilers, not the API.


The main job of a driver is to hide hardware implementation details.
As a consequence, I think that if the same functionality can be executed in two passes on card A and in one pass on card B, then this should be hidden by the driver.



I think the API needs to expose a low-level assembly language for both vertices and fragments, so that people can use [or even develop] the shading language of their choice.


The main job of a driver is to hide hardware implementation details.
So a hardware-specific assembly shader language shouldn't be the primary interface of a driver API; it should only be an option for a developer who wants to optimize for particular hardware.

If you mean a hardware-independent assembly shader language, that is in the end equivalent to a C-like shader language - it only has a different syntax. But a C-like language is simply easier to read and write.

Your argument does not hold, because in both cases people can use and develop shader languages of their choice if they really want to. But most developers don't want to implement their own languages; they want a standard language.



It may also make sense to standardize a high-level language, but not to the exclusion of low-level languages.


OpenGL 2.0 does not exclude low-level languages. OpenGL 2.0 works smoothly together with low-level languages.

Who started this rumor? It keeps coming up in public, but it is definitely wrong.



I think the proposed GL2 APIs are clumsy. First I have to create a bunch of objects, then "compile" my shaders, then attach them to another object, then "link" them, and only then I can use them? I don't see what this gains over GenProgramsARB, ProgramStringARB, BindProgramARB except confusion and more API calls. The analogy to how a C compiler works seems rather stilted.


The reason for linking multiple program objects together is that many developers have expressed the wish to be able to link several shader sub-programs together into one program.

But I think such API details don't touch the functionality of the API, and so they are not a fundamental argument for or against OpenGL 2.0.

Especially if you complain about two or three additional function calls: your suggestion of providing a low-level assembly interface for shaders requires much more work from the developer. What do you want: an easy-to-use interface, or a complex one? ;-)



I think the the "async" stuff probably doesn't even belong in OpenGL in the first place, except for a few very tiny bits (and even then why reinvent the NV_fence API?).


Nvidia's NV_fence API shows that the "async" stuff is very useful. And here too, many developers want better control over rendering timing.

Whether you prefer the NV_fence API or the OpenGL 2.0 async API is a detail, I think.



I think the object management proposals make things worse, not better. Some parts of it would probably be a little nicer, but still a bad idea because now you end up with two incompatible mechanisms in one API. Some parts of it I disagree with entirely (the object "policies", all of sections 4 and 5).


The need for better object management is obvious. Today, you have the choice between display lists (which are hard for the driver to optimize in the same way as vertex buffers) and passing the data again every time. Direct3D's vertex buffers demonstrate that it can be done better.

Policies address a problem mentioned by many developers. But it is a valid question whether policies shouldn't simply be hints, which is what I would prefer.

Aux buffers are very useful for many effects.



In other areas, I think the proposals do not interface in very compatible ways with the rest of OpenGL. For example, they reinvent the wheel on generic attributes for vertex programs, when the ARB_vertex_program interface for generic attributes is already quite sufficient, and when the ARB already discussed that topic for an extended period of time in the ARB_vertex_program working group.


I agree in principle that it makes sense to use the results from the ARB_vertex_program attributes. But currently I don't see a problem here, and these are uncritical API details anyway.



For reasons such as these, the proposals by and large do not reflect my personal vision [as an engineer] of where OpenGL should go. So if they were adopted, I would probably not want to be an engineer working on OpenGL, since I would consider the API to have been (for the lack of a better word) "ruined".


What especially surprises me is that you have a vision of OpenGL which seems to contradict the idea that OpenGL should hide hardware implementation details.

For example, regarding the multipass approach: you don't say that there are better approaches to hiding hardware details; instead - if I understand you correctly - you say that it is not the job of OpenGL to hide such hardware implementation details. This puzzles me.



Yes, there are things that _are_ good about the proposals. But I find that they are few and far between.

If I haven't been sufficiently clear, this is purely an engineering assessment of things, ***not*** an "I-want-my-NVIDIA-stock-to-go-up" assessment.

So (please) stop calling this some sort of "threat".

- Matt



[This message has been edited by folker (edited 07-05-2002).]

andreiga
07-05-2002, 03:50 PM
Matt, I'm sorry if I upset you, but some things need to be understood clearly. For example: even though this is an OpenGL programming forum, people should wake up and see that the Cg vs GL2 war will be decided by economics (maybe not entirely, but mostly).

I really think that the creation of a general shading language (where you can program without worrying about current and future APIs and graphics cards) is not in the best interests of the graphics card companies.

Why? Because they would no longer be able to differentiate future products from current ones and to create genuinely different attributes compared to the competition - I mean in the eyes of the average customer (not the specialist).

Chip speed, memory bandwidth and capacity will be factors in marketing promotion, but there have to be others, so you will also see special features (promoted on the package) that will not work with every card. These special features will therefore break the common shading language.

Anyway, this post was useful if only to learn that NVidia will not support GL2 in the near or medium term (at least the drivers will not support it because, if I'm not mistaken, Matt is the chief over this division).

mcraighead
07-05-2002, 03:52 PM
Originally posted by andreiga:
Matt is the chief over this division

Okay, this is just downright silly... I'm an engineer, not a manager.

- Matt

evanGLizr
07-05-2002, 04:17 PM
Originally posted by mcraighead:
I think the shading language should sit outside the API, not inside it.

I think the API needs to expose a low-level assembly language for both vertices and fragments, so that people can use [or even develop] the shading language of their choice.

Frankly speaking, the only advantage I see in exposing umpteen low-level assemblers in the API is so NVIDIA and MS can claim (groundless) IP issues on them and delay any 'advance' in OpenGL even more (if you can call it an advance to suggest that app writers should code for an individual brand and generation of graphics chip).

Exposing the HLL in the API is The Right Thing To Do (TM):
- Exposing the assemblers would be like exposing the DMA command data format exclusive to each graphics card, with the excuse that the application could then optimise the command transfers.
- Suggesting that app writers code for a specific brand and generation of graphics chip is very short-sighted. It's much better to have them program in a generic language, so their programs will work across all boards and will benefit from improvements in the hardware and in the compiler itself. So there's little point in exposing the specific ASM.
- And even you yourself recognise that an HLL standard is desirable, so why force all the IHV companies to write an HLL-to-LLL compiler and then an LLL-to-hardware compiler? Especially since the LLL you expose to the programmer will not match the real language the machine understands, either because you don't expose everything, or because the driver always optimises better than the app writer (as it has first-hand knowledge of the internals), or maybe even because 'features' in the hardware make the driver modify the app-provided program. With GL2, you just have to expand the sample compiler to output machine-dependent opcodes and load them directly into the graphics card.


Originally posted by mcraighead:
I think the proposed GL2 APIs are clumsy. First I have to create a bunch of objects, then "compile" my shaders, then attach them to another object, then "link" them, and only then I can use them? I don't see what this gains over GenProgramsARB, ProgramStringARB, BindProgramARB except confusion and more API calls. The analogy to how a C compiler works seems rather stilted.

If you let the compiler do the register allocation of vertex input and vertex output/fragment input registers, you need a linkage phase, as you may want to use a vertex shader with different fragment shaders or vice versa.
Not letting the compiler do the register allocation and forcing the app to use arbitrary fixed registers (again implementation-dependent) would be another short-sighted decision, as you would preclude important optimisations such as argument packing, or - even worse - move the responsibility to the application writer.

I haven't read the specs on the sync & object management in depth, so I cannot comment on them, but in my opinion the current object management in OpenGL (the dual path of GenLists/GenTextures and/or app-generated ids) is braindead broken: why on earth would you want to let the app select the id for a display list, when it's much more efficient to let the driver generate a handle?

mcraighead
07-05-2002, 05:07 PM
folker,

You are repeating this whole mantra about "hiding" vs. "exposing" implementation details over and over and over, but this really misses the point. All features expose some details and hide others. The real question is which are the _important_ ones to hide, and which are the _important_ ones to expose.

Your approach seems to be: if it _can_ be hidden, it _should_ be hidden.

I've already discussed transparent multipass, but I think it is a really counterproductive idea. It dodges the real question, which is: "Are limits defined or undefined?" For current hardware, it is impractical for either capability or invariance reasons. For future hardware, it remains unclear how practical it is, but on the other hand it may even be unnecessary.

As for low-level languages, you should be aware that 3Dlabs has made it very clear that they disagree with the idea that a standardized vertex and fragment program assembly language should be made part of OpenGL. Although unrelated IP issues made the discussion moot, NVIDIA's position was that ARB_vertex_program should be a required 1.4 feature, and 3Dlabs's position was that it should be optional. So, putting things together, 3Dlabs wants to standardize a HLL as a core part of OGL, but keep a standard assembly language out of core OGL. So it would not at all be erroneous to suggest that "3Dlabs's OpenGL 2.0 proposals exclude low-level languages". Certainly, they don't at all interoperate with the ARB_vertex_program standard API for creating and binding programs.

I think your discussion of "linking" shaders misses the point. There is now a standard API for how vertex and fragment programs get bound to a context: you call ProgramStringARB to load a program, and you call BindProgramARB to select a current program. This API works and is now the standard. The 3Dlabs proposed API is more complex for even very simple usage models. For example, I believe if you want to mix and match different vertex and fragment programs, you have to build a new program object, attach the objects, link it, etc. It reminds me of D3D's clumsy mechanism where you had to attach your surfaces to one another to get a mipmap chain or a Z-buffer. If all you want is code reuse, there are other ways to accomplish the same thing.

The async stuff is ill-designed and overblown as proposed, I think. It's not just a matter of little details.

For object handles, sure, it might be nice if OpenGL had done opaque handles rather than app IDs from day one. But OpenGL didn't do that. Changing things leaves you with two incompatible APIs, and that really doesn't benefit anyone.

But I was actually referring here to the memory management paper. I think "Direct Access" is a flawed idea, for one. The policies, as provided, are not powerful enough to do anything useful, but making them powerful enough to do anything useful would require undoing a very wise design decision OpenGL made a long time ago, which was to have memory management driver-controlled, not app-controlled. And the vertex array objects are likewise poorly designed, I believe. (I'm especially annoyed by the addition of the separate index arrays for each attribute, which is something OpenGL has wisely avoided in the past.)

This forum is not really a good place to have these kinds of discussions, unfortunately.


Also, please, let's avoid the conspiracy theory angle on things. The idea that exposing assembly languages is a scheme to *prevent* progress in OpenGL is, well, ludicrous. Likewise, the idea that NVIDIA doesn't want OpenGL to advance [which has been suggested by numerous people] is likewise erroneous. I, for one, wrote up the original NVIDIA OpenGL 1.4 proposal.

Of course, it doesn't help to have 3Dlabs people on these boards feeding those conspiracy theories with such comments as that NVIDIA has supposedly "road-blocked at every point" -- which is patently absurd. There are a lot of comments about the NVIDIA "marketing machine", but if anyone has been doing the marketing here, it's 3Dlabs...

It's likewise amazing to see the conspiracies that people cook up about what Cg is. Is it that hard to believe that Cg is just a tool?

- Matt

folker
07-05-2002, 07:19 PM
Originally posted by mcraighead:
folker,

You are repeating this whole mantra about "hiding" vs. "exposing" implementation details over and over and over, but this really misses the point. All features expose some details and hide others. The real question is which are the _important_ ones to hide, and which are the _important_ ones to expose.

Your approach seems to be: if it _can_ be hidden, it _should_ be hidden.


Yes, indeed, of course. I think that is the whole idea of a driver interface: if a hardware implementation detail can be hidden, it should be hidden, so that the application does not have to implement different code paths for each piece of hardware. What else is the job of a driver?



I've already discussed transparent multipass, but I think it is a really counterproductive idea. It dodges the real question, which is: "Are limits defined or undefined?" For current hardware, it is impractical for either capability or invariance reasons. For future hardware, it remains unclear how practical it is, but on the other hand it may even be unnecessary.


It definitely addresses future hardware, no question. And I agree: since it is not implemented yet, it is unclear how practical it is. Therefore I also think it is not yet time to define transparent multipass as a standard.

Maybe it will indeed turn out not to be practicable. Currently I don't see a problem with it, and it seems to be an interesting idea. But of course, maybe there are better ideas.

But the important point is the following: should we aim to hide hardware implementation details where possible? There is a difference between saying "the time has not yet come", "it is not possible" or "there is a better approach" on the one hand, and saying "we don't want to hide hardware implementation details" - which is how I understand your point of view - on the other.

Of course, for a hardware and driver developer the easiest way is simply to expose the hardware limits directly. ;-)
But isn't it worth investing a little time and hardware to turn a "strict hardware limit" into a "non-linear performance decrease"?



As for low-level languages, you should be aware that 3Dlabs has made it very clear that they disagree with the idea that a standardized vertex and fragment program assembly language should be made part of OpenGL. Although unrelated IP issues made the discussion moot, NVIDIA's position was that ARB_vertex_program should be a required 1.4 feature, and 3Dlabs's position was that it should be optional. So, putting things together, 3Dlabs wants to standardize a HLL as a core part of OGL, but keep a standard assembly language out of core OGL. So it would not at all be erroneous to suggest that "3Dlabs's OpenGL 2.0 proposals exclude low-level languages". Certainly, they don't at all interoperate with the ARB_vertex_program standard API for creating and binding programs.


Maybe I simply misunderstood what "exclusion" means (I thought you meant that it is not possible to use an assembly language). Furthermore, I didn't realize you were talking about a standard assembly language (in the ARB meetings, the discussions regarding OpenGL 2.0 were about hardware-specific assembly programs).

Trivially, there is no problem with every vendor defining hardware-specific assembly languages as extensions. And the OpenGL 2.0 proposal does not include a standard assembly language, but where is the problem? (We have already had the C-like vs. assembly language discussion on this board.) I cannot see that this is an argument against OpenGL 2.0.

If a standard assembly language is really wanted, it can be done as a separate extension without any problems. Moving separate things into separate extensions is the usual strategy of the ARB. And if a standard assembly language becomes part of 1.4, it will mean a little extra work for 3dlabs and others, but it won't kill anyone.



I think your discussion of "linking" shaders misses the point. There is now a standard API for how vertex and fragment programs get bound to a context: you call ProgramStringARB to load a program, and you call BindProgramARB to select a current program. This API works and is now the standard. The 3Dlabs proposed API is more complex for even very simple usage models. For example, I believe if you want to mix and match different vertex and fragment programs, you have to build a new program object, attach the objects, link it, etc. It reminds me of D3D's clumsy mechanism where you had to attach your surfaces to one another to get a mipmap chain or a Z-buffer. If all you want is code reuse, there are other ways to accomplish the same thing.


Being able to bind vertex programs and fragment programs independently may be useful in some special situations, but in most situations it contradicts the philosophy of shaders.

In basically all situations, vertex and fragment programs work together closely. The Stanford shader language even uses only one shader program, and the per-vertex and per-fragment calculations are defined by declarations and type-casts. Thus the OpenGL 2.0 approach is natural - more natural than handling vertex and fragment shaders as independent states.

This means: one shader is the combination of a vertex and a fragment program. If you want to combine program parts, splitting only between vertex and fragment programs is not consistent and is only a special case. The OpenGL 2.0 proposal is more consistent and allows linking of vertex and fragment program parts.



The async stuff is ill-designed and overblown as proposed, I think. It's not just a matter of little details.


I think the GL_NV_fence and GL_NV_occlusion_query extensions, for example, show that there is a need for a uniform async mechanism.

What do you not like about the OpenGL 2.0 proposal?



For object handles, sure, it might be nice if OpenGL had done opaque handles rather than app IDs from day one. But OpenGL didn't do that. Changing things leaves you with two incompatible APIs, and that really doesn't benefit anyone.


Agreed, it is not nice. On the other hand, the old mechanism is also not nice, so it makes sense to change it.

But in the end, is this difference really relevant? It's only a matter of taste, without any real consequences.



But I was actually referring here to the memory management paper. I think "Direct Access" is a flawed idea, for one.


It is the same idea as buffer locking in Direct3D. Because of some video texture issues, I had a discussion with an NVidia driver developer a while ago. As far as I understood it (and it sounded plausible), locking avoids a copy operation (as long as the texture is not swizzled), because the buffer can be locked directly in AGP / video memory. From this I conclude that "direct access" can be useful.



The policies, as provided, are not powerful enough to do anything useful, but making them powerful enough to do anything useful would require undoing a very wise design decision OpenGL made a long time ago, which was to have memory management driver-controlled, not app-controlled.


Agreed, I am also not happy with the policy stuff. Probably the best idea is to remove it completely and keep only the usage hint flags and probably the priority stuff.



And the vertex array objects are likewise poorly designed, I believe. (I'm especially annoyed by the addition of the separate index arrays for each attribute, which is something OpenGL has wisely avoided in the past.)


Probably true; it depends on driver-internal details. If driver developers said "this is no problem, it comes for free", then it would be useful to expose it to applications. Your statement indicates that this is not the case, so it is indeed not a good idea. And it is not a big loss, I think (high-poly meshes share most vertices anyway). So I agree with you regarding the multiple index arrays.



This forum is not really a good place to have these kinds of discussions, unfortunately.


Hm, why not?

This is a discussion about the future of OpenGL.



Also, please, let's avoid the conspiracy theory angle on things. The idea that exposing assembly languages is a scheme to *prevent* progress in OpenGL is, well, ludicrous. Likewise, the idea that NVIDIA doesn't want OpenGL to advance [which has been suggested by numerous people] is likewise erroneous. I, for one, wrote up the original NVIDIA OpenGL 1.4 proposal.

Of course, it doesn't help to have 3Dlabs people on these boards feeding those conspiracy theories with such comments as that NVIDIA has supposedly "road-blocked at every point" -- which is patently absurd. There are a lot of comments about the NVIDIA "marketing machine", but if anyone has been doing the marketing here, it's 3Dlabs...

It's likewise amazing to see the conspiracies that people cook up about what Cg is. Is it that hard to believe that Cg is just a tool?


No comment on the non-technical comments.

But definitely, Cg is a tool, and it is a very useful tool worth using.

Cg simply does not satisfy the desire for an open standard shader language across hardware vendors. OpenGL 2.0 addresses this desire.

But as already mentioned several times, I don't think it is Cg against ogl2, because they have different aims.

mcraighead
07-05-2002, 08:15 PM
The problem with having discussions of this sort here is that the messages get very long, and that it's often impossible to put all the necessary technical details in (especially in a public forum).


Sometimes it's a bad idea to hide certain details. Hiding details can sometimes have a negative performance impact, or reduce the amount of control that developers have. For example, the OpenGL texture environment interface "hides" the details of our register combiners. However, we decided that it was worth exposing those details, because people wanted the extra control.

The behavior of "non-linear performance decrease" would actually be nice compared to what _would_ happen if you loaded an unsupported shader in many cases. As I've pointed out, you'll often end up with a complete SW fallback, and in that case, you run into some pretty big invariance problems that make the emulation essentially useless in a large class of situations. Indeed, on GF3/GF4-class hardware, these SW fallbacks would be necessary for the vast majority of shaders that don't fit in HW limits.


You're greatly underestimating the utility of mixing different shader classes. One simple example: suppose all your objects in your app are bumpmapped, but some have static geometry (think "ammo box") and some use matrix blending (think "player" or "monster"). The former type can use a very simple vertex program, and the latter uses a very fancy one; but both can use exactly the same fragment program.

Since -- from a hardware point of view -- these two shader classes really *can* be mixed and matched painlessly, and because it is easy to expose an API where they can mix and match painlessly, you might as well do the API that way.

Sure, in some shading languages, you unify vertex and fragment computations into one shader. But that's not how either Cg or the 3Dlabs language work. So what makes the most sense is to let people plug and play as they desire. If you want to use a language like the Stanford language, you can compile it down to a pair of vertex and fragment programs, and bind them both. All that means is that you have to store two GLuints in your Shader class rather than one, and call BindProgram twice. Nothing difficult.
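
A sketch of the mix-and-match described above, written against the ARB program targets (ARB_fragment_program arrived shortly after this thread; the NV-specific targets work the same way). The program IDs are assumed to have been generated and loaded elsewhere with glGenProgramsARB / glProgramStringARB.

GLuint staticVP;   /* simple vertex program for static geometry  */
GLuint skinnedVP;  /* matrix-blending vertex program             */
GLuint bumpFP;     /* the one bump-mapping fragment program      */

void drawAmmoBox(void)
{
    glBindProgramARB(GL_VERTEX_PROGRAM_ARB,   staticVP);
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, bumpFP);    /* same fragment program...      */
    /* ... submit static geometry ... */
}

void drawMonster(void)
{
    glBindProgramARB(GL_VERTEX_PROGRAM_ARB,   skinnedVP); /* ...different vertex program   */
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, bumpFP);
    /* ... submit matrix-blended geometry ... */
}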


As for the async proposal, well, let's start with FlushStream -- which, so far as I can tell, solves a problem that does not exist; Flush works perfectly fine.

wglConvertSyncToEvent may not necessarily be implementable; emulating it when the appropriate hardware support does not exist would require some kind of horrible OpenGL polling thread that calls SetEvent when the particular hardware event had occurred. This is highly undesirable driver behavior.

The fence API here misses the point of the ALL_COMPLETED_NV flag, which is that there are multiple levels of "completion" that you can imagine, corresponding to different pipeline stages. Its use of 64-bit ints is necessary to prevent ugly wraparound behavior in entirely realistic timeframes (though it doesn't actually solve the problem -- the problem can only truly be fixed by using fence objects), but 64-bit ints are not very standard in C, and this also forces implementations to use 64-bit fences internally even when 32-bit ones might be more efficient and the full 64 bits are never _really_ needed.
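
For reference, the NV_fence usage pattern under discussion looks roughly like this (a sketch only; GL_ALL_COMPLETED_NV is the condition flag referred to above, and a valid GL context is assumed):

void waitForBatch(void)
{
    GLuint fence;
    glGenFencesNV(1, &fence);

    /* ... issue a batch of rendering commands ... */

    glSetFenceNV(fence, GL_ALL_COMPLETED_NV); /* marker enters the command stream here */
    glFlush();                                /* make sure the commands are submitted  */

    /* ... do other CPU work while the GPU chews on the batch ... */

    if (!glTestFenceNV(fence))   /* non-blocking poll */
        glFinishFenceNV(fence);  /* or block until everything before the fence is done */

    glDeleteFencesNV(1, &fence);
}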

I could continue, but most of the rest of the proposal seems to be overkill. A unified sync interface, maybe -- if done right. Background command streams? Sounds perfect for a vendor or EXT extension.


Direct access makes the same *mistake* that D3D did. The biggest problem with the proposal, though, is that it appears to be incomplete. Okay, so you can "lock" (by a different name) a texture. You get a pointer back. But to what does the pointer point? It's gotta be some sort of _internal_ data format, and so either you need to specify a data format or the driver needs to tell you that format. It's not good enough to see that the texture's internal format was RGB5_A1, and so obviously the data format is, well, RGB5_A1 -- because that doesn't tell you bit order or endianness, and even then, the driver might not actually have support for 5551 textures. Those internal formats were always just format *hints*.

If the driver tells you what format it really picked, the app's task is pretty nasty, because it has to handle so many possibilities.

If the app tells the driver what format it wants, then if they don't match, you have to do all sorts of reformatting. (But the whole point was supposed to be that you avoid reformatting???) This seems to be the intent of the proposal, based on the figure on page 21. Yet this seems to miss the whole distinction between -- in, say, the TexImage API -- the internalformat parameter and the format and type parameters. Format and type are parameters that are supposed to be thrown away after the TexImage call finishes...
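
The distinction between the two sets of parameters shows up directly in the TexImage call itself. A standard-GL illustration (width, height and pixels stand in for the application's data):

/* internalformat (3rd argument) is only a hint about how the driver should
   store the texture; format and type (7th/8th) describe the client data
   being handed over and can be forgotten once the call returns. The driver
   may well store an RGB5_A1-hinted texture as RGBA8 internally. */
glTexImage2D(GL_TEXTURE_2D,
             0,                 /* mip level                      */
             GL_RGB5_A1,        /* internalformat: a storage hint */
             width, height,
             0,                 /* border                         */
             GL_RGBA,           /* format of the data in 'pixels' */
             GL_UNSIGNED_BYTE,  /* type of the data in 'pixels'   */
             pixels);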

The proposal also seems to contradict itself in several places on this matter:

"In other words, this direct access buffer does not expose the internal format, storage layout and type, which is implementation dependent." [If not, then what does the pointer point to?]

"Direct Access is mandatory ..., even if it means that format conversion needs to be done by the driver." [What if you run out of virtual address space to map the memory? Don't laugh, it really happens these days with 32-bit addresses, systems with 2GB of memory, and 128MB video cards.]

"The OpenGL implementation is not allowed to reformat or convert the data when executing glAcquireDirectPointer() or glReleaseDirectPointer*(), if this takes any significant amount of time." [Then what are you supposed to do instead???]

It actually gets worse. Often, reformatting _will_ be necessary (for unforseen circumstances). But this lock doesn't even have read-only or write-only flags, which force the driver to do even more reformatting even when it's unnecessary! Sometimes, reformatting is also lossy. Consider the RGBA8 texture whose data had originally been specified as 4444. (just a hypothetical case) The driver must convert to 4444 and then back -- but if the user doesn't write to every pixel inside the lock rectangle, you just lost the low 4 bits of your texture.

Also, lock mechanisms have lots of internal synchronization, and there is usually a better way of doing things.

Locks were a huge screwup in D3D. I don't want OpenGL to go down that road...


I've discussed multiple index arrays previously. Most people who _think_ they want this feature don't need it, from what I can tell. The driver can't do much to optimize this sort of situation.

- Matt

folker
07-05-2002, 10:38 PM
Originally posted by mcraighead:
The problem with having discussions of this sort here is that the messages get very long, and that it's often impossible to put all the necessary technical details in (especially in a public forum).


Sometimes it's a bad idea to hide certain details. Hiding details can sometimes have a negative performance impact, or reduce the amount of control that developers have. For example, the OpenGL texture environment interface "hides" the details of our register combiners. However, we decided that it was worth exposing those details, because people wanted the extra control.


I meant implementation(!) details. If you want to run the same functionality on different hardware, I call that an implementation detail, and you should not need to code different implementations for different hardware.

Doom3 is a famous example of that: several code paths implementing the same functionality.



The behavior of "non-linear performance decrease" would actually be nice compared to what _would_ happen if you loaded an unsupported shader in many cases. As I've pointed out, you'll often end up with a complete SW fallback, and in that case, you run into some pretty big invariance problems that make the emulation essentially useless in a large class of situations. Indeed, on GF3/GF4-class hardware, these SW fallbacks would be necessary for the vast majority of shaders that don't fit in HW limits.


Agreed, for gf3/gf4 cards emulation is not possible. Hardware features differ too much to be hidden under one interface.

However, the P10 is close to a universal shader model where no feature is missing any more, and hardware will only differ in performance. Thus, the next step should be to hide implementation details.

The idea is to design future(!) hardware in such a way that this is possible in a sensible manner.

Currently I don't see any reason why this should not be possible in hardware (your blending shader argument does not hold when doing per-primitive multipass in such cases).



You're greatly underestimating the utility of mixing different shader classes. One simple example: suppose all your objects in your app are bumpmapped, but some have static geometry (think "ammo box") and some use matrix blending (think "player" or "monster"). The former type can use a very simple vertex program, and the latter uses a very fancy one; but both can use exactly the same fragment program.

Since -- from a hardware point of view -- these two shader classes really *can* be mixed and matched painlessly, and because it is easy to expose an API where they can mix and match painlessly, you might as well do the API that way.

Sure, in some shading languages, you unify vertex and fragment computations into one shader. But that's not how either Cg or the 3Dlabs language work. So what makes the most sense is to let people plug and play as they desire. If you want to use a language like the Stanford language, you can compile it down to a pair of vertex and fragment programs, and bind them both. All that means is that you have to store two GLuints in your Shader class rather than one, and call BindProgram twice. Nothing difficult.


The problem is that, in your example for instance, the current solution is to combine
a) a vertex program containing both the matrix blend functionality and the bump functionality, and
b) a bump fragment shader.
You still have to write all combinations of matrix blend and bump vertex program variants. So there is no real advantage in combining such vertex and fragment shaders,
because you still have to implement the cross product of all functionality combinations.

A better solution would be to combine separate functionalities:
a) a matrix blend vertex sub-program with
b) a bump vertex+fragment sub-program.
Then you can really combine different functionalities in any way by combining sub-shaders.

So splitting only between vertex and fragment shaders goes only half way and does not solve the problem.
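
The sub-program linking folker argues for maps fairly directly onto a compile-several-units-then-link model. A sketch using the GLSL entry points that eventually shipped with OpenGL 2.0 (the 2002 3Dlabs proposal used different names); the sub-shader sources are hypothetical, and one of the vertex-stage units is assumed to supply main() while the other supplies the matrix-blend helper functions it calls:

GLuint compileUnit(GLenum stage, const GLchar *src)
{
    GLuint s = glCreateShader(stage);
    glShaderSource(s, 1, &src, NULL);
    glCompileShader(s);
    return s;
}

GLuint buildSkinnedBumpProgram(const GLchar *blendVS,  /* matrix-blend helper functions */
                               const GLchar *bumpVS,   /* bump setup + vertex main()    */
                               const GLchar *bumpFS)   /* bump fragment shader          */
{
    GLuint prog = glCreateProgram();
    glAttachShader(prog, compileUnit(GL_VERTEX_SHADER,   blendVS));
    glAttachShader(prog, compileUnit(GL_VERTEX_SHADER,   bumpVS));
    glAttachShader(prog, compileUnit(GL_FRAGMENT_SHADER, bumpFS));
    glLinkProgram(prog);   /* resolves calls between the vertex-stage units */
    return prog;
}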



As for the async proposal, well, let's start with FlushStream -- which, so far as I can tell, solves a problem that does not exist; Flush works perfectly fine.


Agreed.



wglConvertSyncToEvent may not necessarily be implementable; emulating it when the appropriate hardware support does not exist would require some kind of horrible OpenGL polling thread that calls SetEvent when the particular hardware event had occurred. This is highly undesirable driver behavior.


If that is the case, convert-sync functions are indeed not a good idea.



The fence API here misses the point of the ALL_COMPLETED_NV flag, which is that there are multiple levels of "completion" that you can imagine, corresponding to different pipeline stages. Its use of 64-bit ints is necessary to prevent ugly wraparound behavior in entirely realistic timeframes (though it doesn't actually solve the problem -- the problem can only truly be fixed by using fence objects), but 64-bit ints are not very standard in C, and this also forces implementations to use 64-bit fences internally even when 32-bit ones might be more efficient and the full 64 bits are never _really_ needed.


Agreed.



I could continue, but most of the rest of the proposal seems to be overkill. A unified sync interface, maybe -- if done right. Background command streams? Sounds perfect for a vendor or EXT extension.


Agreed.



Direct access makes the same *mistake* that D3D did. The biggest problem with the proposal, though, is that it appears to be incomplete. Okay, so you can "lock" (by a different name) a texture. You get a pointer back. But to what does the pointer point? It's gotta be some sort of _internal_ data format, and so either you need to specify a data format or the driver needs to tell you that format. It's not good enough to see that the texture's internal format was RGB5_A1, and so obviously the data format is, well, RGB5_A1 -- because that doesn't tell you bit order or endianness, and even then, the driver might not actually have support for 5551 textures. Those internal formats were always just format *hints*.

If the driver tells you what format it really picked, the app's task is pretty nasty, because it has to handle so many possibilities.

If the app tells the driver what format it wants, then if they don't match, you have to do all sorts of reformatting. (But the whole point was supposed to be that you avoid reformatting???) This seems to be the intent of the proposal, based on the figure on page 21. Yet this seems to miss the whole distinction between -- in, say, the TexImage API -- the internalformat parameter and the format and type parameters. Format and type are parameters that are supposed to be thrown away after the TexImage call finishes...

The proposal also seems to contradict itself in several places on this matter:

"In other words, this direct access buffer does not expose the internal format, storage layout and type, which is implementation dependent." [If not, then what does the pointer point to?]

"Direct Access is mandatory ..., even if it means that format conversion needs to be done by the driver." [What if you run out of virtual address space to map the memory? Don't laugh, it really happens these days with 32-bit addresses, systems with 2GB of memory, and 128MB video cards.]

"The OpenGL implementation is not allowed to reformat or convert the data when executing glAcquireDirectPointer() or glReleaseDirectPointer*(), if this takes any significant amount of time." [Then what are you supposed to do instead???]

It actually gets worse. Often, reformatting _will_ be necessary (for unforseen circumstances). But this lock doesn't even have read-only or write-only flags, which force the driver to do even more reformatting even when it's unnecessary! Sometimes, reformatting is also lossy. Consider the RGBA8 texture whose data had originally been specified as 4444. (just a hypothetical case) The driver must convert to 4444 and then back -- but if the user doesn't write to every pixel inside the lock rectangle, you just lost the low 4 bits of your texture.

Also, lock mechanisms have lots of internal synchronization, and there is usually a better way of doing things.

Locks were a huge screwup in D3D. I don't want OpenGL to go down that road...


Convinced.

Probably the right approach to the video problem is to implement special extensions, for example a DirectShow video-texture WGL extension.



I've discussed multiple index arrays previously. Most people who _think_ they want this feature don't need it, from what I can tell. The driver can't do much to optimize this sort of situation.


Agreed completely.

All in all, I think many of your points are very valid.

But I have quite a different opinion about the job of the OpenGL driver being to hide implementation details. And I think this is (besides the shader language) the most interesting aspect of OpenGL 2.0: getting back to the main purpose of a driver, which is to hide hardware implementation details.

knackered
07-06-2002, 03:43 AM
It's really touching and fascinating to watch two companies have a reasoned discussion like this.
Please carry on doing so in this forum, as I believe it's important that we developers understand your reasoning too.

Thaellin
07-06-2002, 04:03 AM
While I strongly feel a standard HLL controlled by a committee (such as the ARB) is desirable for GL, I'll agree with Matt that the core API may not be the place for it.

Traditionally, higher-level concepts like this have been part of GLU. I could imagine glu functionality that would compile the hll into a byte-code or core api assembler representation as meeting my needs for a SL.

With regard to the proposed sync API, I believe NV's fence would work quite nicely, but I think it was avoided due to the big "IP" stamps nVidia puts on everything. I understand that nVidia has been very cooperative with the ARB on IP licensing as of late, so it may still be possible to modify this section of the API.

The new 'flush' is not something I've looked at much. I don't have the papers in front of me, but I assume it's to help with real-time constraints. Is it more of an async 'finish'? The name was changed to avoid confusion with the old functionality, IIRC.

Matt - you really do have some good points, and an apparently well-defined vision of where you, personally, would like to see the API move. Have you been able to participate in shaping the 2.0 API at all, or have you given it up as completely unworkable?

-- Jeff

knackered
07-06-2002, 04:57 AM
BTW, Nutty, just to clear something up - I wasn't trying to imply that your opinions don't matter when I asserted that you don't use OpenGL professionally - it was merely a reply to Eric saying:-

Nutty is completely right: GL2.0 is useless right now, at least for the kind of work he's doing (Game Dev).

I've seen the experiments you've done in GL on your website, and have the greatest respect for you as a GL programmer.

folker
07-06-2002, 04:59 AM
Originally posted by Thaellin:
While I strongly feel a standard HLL controlled by a comittee (such as the ARB) is desireable for GL, I'll agree with Matt that the core API may not be the place for it.

Traditionally, higher-level concepts like this have been part of GLU. I could imagine glu functionality that would compile the hll into a byte-code or core api assembler representation as meeting my needs for a SL.


The core API should definitely be hardware-independent. That is the job of a driver.

Note that the current NV_vertex_program and ARB_vertex_program are very close to the hardware and therefore not really hardware-independent. Both vertex program extensions assume that each instruction corresponds to a hardware instruction without optimization. This is not an abstraction of the hardware, and it makes it difficult to change the hardware architecture. They only reflect the current vertex program hardware design; they are not future-oriented. For example, suppose a future processor works with scalar pipelines: that does not fit with counting vector instructions. So it is not a good idea to adopt such an assembly language as the standard for the long run; it would be a mistake.

An assembly language should be truly abstract, not a reflection of current hardware. But then I don't see the argument for an assembly language - what is the advantage compared to a C-like language? In both cases you have a parse and optimize step. So assembly simply has no advantage over a C-like language, and a C-like language is simply easier to read. For good reason, C is the standard in the software industry, not some standard virtual assembly language.

CPU hardware is also not abstracted by an abstract hardware-independent assembly language; instead C/C++ is used as the standard hardware-independent language - I think for good reason. evanGLizr presented some strong arguments for this in this thread.

Using a bytecode language for the core API is an interesting discussion. There are arguments for a bytecode, since it moves parsing to a GLU-like layer. Here too, the bytecode should be truly abstract, not a reflection of current hardware. On the other hand, developers want a standard high-level shader language, not a standard bytecode shader language. I don't see much desire among developers to implement different syntax front-ends for the same shader language.



In regards to the proposed sync api, I believe NV's fence would work quite nicely, but I think it was avoided due to the big "IP" stamps nVidia puts on everything. I understand that nVidia has been very cooperative with the ARB in regards to IP licensing as of late, so it may still be possible to modify this section of the API.


To be honest, I consider such questions to be details. The ARB members should simply sit down together, discuss them and settle them.

In my opinion, the most important issue for ogl2 is an open standard shading language and the philosophy of really hiding hardware implementation details. A second very important issue is attribute and index array objects.

All of this could also be implemented via extensions. But regarding the shader language and hiding hardware details, I think it is important not to simply standardize existing hardware, as 1.4 is currently doing, but to make a real future-oriented step forward. Calling it gl2 reflects this, and the complaint by many people that ogl2 is too far in the future indicates that they have realized it wants to be more than a 1.x evolution.



The new 'flush' is not something I've looked at much. I don't have the papers in front of me, but I assume it's to help with real-time constraints. Is it more of an async 'finish'? The name was changed to avoid confusion with the old functionality, IIRC.


Recently I had a discussion about d3d pipeline flushing. I learned that d3d may not flush until the end of a scene. This means that, with current opengl, it is not possible to force the execution of pending commands without waiting for their completion.
This is addressed by the ogl2 finish.

However, I don't see a need for this.



Matt - you really do have some good points, and an apparently well-defined vision of where you, personally, would like to see the API move. Have you been able to participate in shaping the 2.0 API at all or had you given it up as completely unworkable?


[This message has been edited by folker (edited 07-06-2002).]


Quaternion
07-06-2002, 07:02 AM
I believe that in next-generation graphics hardware, the graphics processor should process programs just as the CPU does, i.e. there should be a standard set of operations available on the GPU for both fragment and vertex programs. Maybe the basic architecture of "one operation per clock cycle" should be changed: complex operations should take more clock cycles than simple ones. That way there would be no need for vendor-specific operations.

Once an operation set becomes a standard, anyone can implement their own language that will run on all standard hardware. This should be exactly like the CPU market: imagine if you had to write different programs for Intel processors and for AMD processors.

Future hardware should differ in its memory speed and architecture, its processor speed and its pipelining methods, not in its special operations. I believe that every special operation can be implemented using basic operations (for example, looping can be achieved with JMP and basic conditionals). In my opinion, the GPU should be a vector-based, CPU-like processor with the ability to access per-vertex and per-fragment attributes.

The only big problem I can see is the limited number of texture units. This alone can blow up the one-program-fits-all solution. A hardware solution to this problem must be found, but I wonder whether the IHVs want it as much as we do.

In the future, assuming a standard set of GPU operations, we will be able to use the Shading Language of our choice, and not an API or vendor specific language.

Anyway, I am NOT a hardware engineer, and my perception of the situation might be biased. I am not sure whether such a CPU-like approach is possible taking into account all the hardware aspects.

Shlomi

Thaellin
07-06-2002, 07:11 AM
The core API definitely should be hardware independent. This is the job of a driver.


This is not what I'm trying to argue. I'm not certain the assembler should be specified outside vendor-specific extensions. This would be accessible through an 'asm'-style keyword. You would want to be able to query the gpu support at runtime (or, rather, shader-bind time) to choose the instructions to be compiled in.
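For example, the choice could be made at run time with nothing more than the extension string (the extension names below are just examples of possible targets):

#include <string.h>
#include <GL/gl.h>

/* Crude substring check against the extension string - good enough for a sketch;
   a robust version would match whole, space-delimited names. */
static int has_extension(const char *name)
{
    const char *ext = (const char *) glGetString(GL_EXTENSIONS);
    return ext != NULL && strstr(ext, name) != NULL;
}

/* Decide once, at startup or at shader-bind time, which instruction set to target. */
static const char *pick_shader_target(void)
{
    if (has_extension("GL_ARB_vertex_program")) return "arb";
    if (has_extension("GL_NV_vertex_program"))  return "nv";
    return "fixed-function";
}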

I'd think a byte-code standardization /would/ be desirable over compiling raw strings in order to speed compile-time and possibly put shader IP under a layer of abstraction. The assembler standard would be second choice, and should be possible due to the ARB VP extension (in combination with some other extensions to provide flow control, etc).

You could then provide the high-level language as part of the utility layer. You can then write a run-anywhere shader and still have the option to fine-tune for the specific architecture, much like you can in "C" today.

I definitely agree with Matt about the complexity being added to the 2.0 proposal. When I first set out to learn a 3D API, I'd chosen D3D, and I was really bothered by all the hoops I had to jump through to get a simple double-buffered surface with an attached depth buffer. After learning enough to put together my test-bed, I tried to do the same thing in OpenGL and found the process dead-easy. Now we're introducing some of the same complexity which steered me away from D3D (as a starter language) into GL. Admittedly, it's very flexible, but I'm not certain about the barrier to entry that may be created for language newbies.

(note - I understand D3D is very different now than it was 'X' years ago, but I really don't care anymore).

-- Jeff

V-man
07-06-2002, 07:32 AM
>>>The only big problem arises (as far as I can see) is the limited number of texture units. This alone can blow up the one-program-fits-all solution. A hardware solution to this problem must be found, but I wonder whether the IHVs wants it as we do.
<<<

This one has been mentioned enough times (the texture unit limitation). I find it kind of silly not to have at least a software fallback in the driver for this. The only question is: how many units are enough?

Anyways, how many varieties are there? Some have 1 tex unit, others have 2, others have 4, others have 6. You should be able to cover all of these cases.
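At least the count itself is easy to get at - something like this (assuming ARB_multitexture, whose GL_MAX_TEXTURE_UNITS_ARB token lives in glext.h):

#include <GL/gl.h>
#include <GL/glext.h>

static GLint texture_unit_count(void)
{
    GLint units = 1;   /* stays 1 if the query is not recognized (no multitexture) */
    glGetIntegerv(GL_MAX_TEXTURE_UNITS_ARB, &units);
    return units;      /* 1, 2, 4 or 6 on the cards mentioned above */
}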

V-man

Quaternion
07-06-2002, 08:22 AM
You should be able to cover all of these cases.

But that's not what we want. It's not the topic of this thread anyway, but I think a hardware solution to the problem must be found; otherwise nothing will keep us from having to write hardware-specific shaders (and then what is the use of a standard HLSL?).

When I say a hardware solution, I mean one that enables the hardware to process an arbitrary number of textures per pass, with linear performance cost. I really don't think multipassing will be needed in the future (why resend the data?).

edit: By multipassing I meant using multiple passes because there are not enough texture units.

Shlomi.

[This message has been edited by Quaternion (edited 07-06-2002).]

Carmacksutra
07-06-2002, 09:46 AM
Originally posted by mcraighead:
You're greatly underestimating the utility of mixing different shader classes. (...)
Since -- from a hardware point of view -- these two shader classes really *can* be mixed and matched painlessly, and because it is easy to expose an API where they can mix and match painlessly, you might as well do the API that way.

I agree that the Program/Shader pair will indeed be much more complex to maintain in an engine.
But I understand this is necessary to implement efficient binding of shaders that are higher level than DX8-style VS/PS. I believe compiling shaders can only be simple preprocessing; the real work will have to be performed in the "linking" stage, when both the vertex and the fragment code are known. Consider a vertex shader that is reused in several paths (shading languages are evolving toward such usage - that's why they have loops, branches, and user functions). Such a shader could compute varying data that is used by fragment shader A but not by fragment shader B. Only at the linking stage can hardware resources be assigned efficiently (much like inlining a function in C).



I think "Direct Access" is a flawed idea, for one. The policies, as provided, are not powerful enough to do anything useful, but making them powerful enough to do anything useful would require undoing a very wise design decision OpenGL made a long time ago, which was to have memory management driver-controlled, not app-controlled

How does your statement relate to Nvidia's own VAR?
Whatever VAR+fence or VAO+MOB can do, ArrayObject+DirectAccess+GLSync can do better.
VAR doesn't allow interleaving rendering from AGP memory and video memory without flushing.
VAO doesn't allow efficiently updating the contents of an array (I haven't seen MOB).
The two interfaces are drastically incompatible.
GL 2.0 solves all of these problems nicely; it is a shame we don't have it yet.

As for Direct Access to texture memory, your arguments might justify modifying the OpenGL 2.0 proposal, not abandoning it.



Also, please, let's avoid the conspiracy theory angle on things. The idea that exposing assembly languages is a scheme to *prevent* progress in OpenGL is, well, ludicrous. Likewise, the idea that NVIDIA doesn't want OpenGL to advance [which has been suggested by numerous people] is likewise erroneous. I, for one, wrote up the original NVIDIA OpenGL 1.4 proposal.

Tolerating a Glide-of-Nvidia plus a Glide-of-Ati growing on the old OpenGL 1.x foundations can't be seen as progress.
Even if we agreed that OpenGL 2.0 has some flaws, it is still better than the current state.
And ARB_vertex_program solves just one problem; a dozen others remain.

folker
07-06-2002, 11:17 AM
Here is another proposal for how arbitrarily complex shaders can be automatically split into efficient multiple passes by the driver (an efficient solution to the problem mentioned by Matt).

A group of primitives is rasterized into fragments. These fragments are then executed in multiple passes: first pass over all fragments, second pass over all fragments, and so on. But instead of writing temporary results into aux buffers (which has the problem mentioned by Matt), the temporary results are streamed into some aux memory, so that each fragment's temporaries are stored in this stream independently of its position on screen. In the next pass, this data is read back in the same order. Since the passes are executed per group of primitives rather than per primitive, expensive per-pass fragment program state changes are minimized.

Overflow of this aux memory can be avoided by splitting large numbers of primitives into smaller groups. The first pass renders until the aux memory is exhausted, then the second pass (and so on) is run for those primitives, before the whole procedure starts again with the remaining primitives. So the driver can reserve a fixed amount of aux memory for this technique. If the aux memory is small, the only disadvantage is that fewer primitives and fragments can be rendered within one pass, increasing the number of fragment sub-program state changes and so decreasing performance gradually.

But there is never a sudden performance cliff: less capable hardware or less memory degrades performance gradually.

With this technique, the driver can always split arbitrarily complex fragment programs into multiple passes, so they can always be executed completely in hardware.
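A very rough CPU-side sketch of the batching idea (all names and sizes here are made up purely for illustration - a real implementation would of course live inside the driver and the hardware):

#define AUX_CAPACITY 4096                              /* arbitrary fixed aux memory, in entries */

typedef struct { int x, y; } Fragment;                 /* placeholder fragment record */
typedef struct { float r, g, b, a; } Temp;             /* whatever pass 1 hands to pass 2 */

static Temp aux[AUX_CAPACITY];                         /* the aux stream, in generation order */

static Temp shader_pass1(const Fragment *f) { Temp t = { 0, 0, 0, 0 }; (void) f; return t; }
static void shader_pass2(const Fragment *f, const Temp *t) { (void) f; (void) t; }

static void run_two_pass(const Fragment *frags, int count)
{
    int start = 0, i, n;
    while (start < count) {
        n = count - start;
        if (n > AUX_CAPACITY) n = AUX_CAPACITY;        /* split so the aux stream never overflows */

        for (i = 0; i < n; ++i)                        /* pass 1: stream temporaries out */
            aux[i] = shader_pass1(&frags[start + i]);

        for (i = 0; i < n; ++i)                        /* pass 2: read them back in the same order */
            shader_pass2(&frags[start + i], &aux[i]);

        start += n;                                    /* continue with the remaining fragments */
    }
}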

So I think Matt is wrong that complex shaders can only be emulated in software - or am I missing something?

folker
07-06-2002, 11:22 AM
Originally posted by Quaternion:
I believe that in next-generation graphics hardware, the graphics processor should process programs just as the CPU does, i.e. there should be a standard set of operations available on the GPU for both fragment and vertex programs. (...)

Once an operation set becomes a standard, anyone can implement their own language that will run on all standard hardware. This should be exactly like the CPU market: imagine if you had to write different programs for Intel processors and for AMD processors. (...)

I don't think that would be a good idea, because the hardware differs too much. Example: a vector unit versus a scalar unit. Running SIMD programs on scalar hardware is ugly, and conversely the hardware cannot turn scalar programs into SIMD instructions. As CPUs clearly demonstrate, a compiler must do that work.

I agree with you that most performance differences between CPUs are internal, executing the same binary code. But major differences also arise from the machine-code architecture - see for example the IA-64 architecture compared to IA-32, or the Alpha architecture, etc. Compilers play an important role.

And CPUs demonstrate that the real standard is C/C++. We should learn this lesson for GPUs.

folker
07-06-2002, 11:29 AM
Originally posted by Thaellin:
I'd think a byte-code standardization /would/ be desirable over compiling raw strings in order to speed compile-time and possibly put shader IP under a layer of abstraction. (...)

I definitely agree with Matt about the complexity being added to the 2.0 proposal. (...)

I agree that a byte-code interface, with the HLL compiler moved into some gl2u layer, has design advantages.

But here, too, the important point is that the bytecode be hardware independent and not merely reflect current hardware. The nv and gl 1.4 vertex shader assembly languages reflect current hardware; they are not future-oriented. This is the main point that should be addressed by gl2.

And compared to the issue of providing a truly hardware-independent shader language, the question of using a hardware-independent byte-code versus a hardware-independent C-like language is less important.

I also agree with you that we should avoid adding too much complexity to ogl2. Much of the proposal is probably not necessary. In my opinion, the important things are a hardware-independent shader language that runs on all gl2 hardware, and attribute array objects.

Jurjen Katsman
07-06-2002, 11:35 AM
Folker: Your multipass solution is sort of cool, but it's unlikely current hardware is going to be able to do this. New hardware could be built to support it, but that's not the issue.

This is the core problem with OpenGL2.0: it's NOT going to work right on old hardware. I think it basically puts us back in the old situation when the GL vs. D3D wars were raging. And we're probably going to see hardware that only implements parts of the GL2.0 spec, and other hardware that does all of 2.0 but drops back to software all the time.

Both are huge problems for the API. With hardware not implementing everything (and instead providing its own interfaces for, say, fragment shading), it's basically still just a whole bunch of extensions.

If hardware drops back to software all the time or, more generally, has extreme performance drops when only slight changes are made to the application (say, using a few extra variables in a shader), it's going to be practically useless for game development as well.

Some sort of standard way of exposing capabilities (not really hardware specifics) will be required, and this doesn't have to mean a list of things a card does in hardware, just a list of things it does at a reasonable speed, relative to the overall card speed.

Also, I personally feel no GL implementation should be required to implement things not in this list. If they can't do a particularly good (accelerated) job at it, I'll just do it myself. This is what has happened in games so far anyway: until hardware started accelerating transform and lighting, pretty much all games still did their own. The same has happened with operations that can't be done in the fixed-function pipe, etc.

[This message has been edited by Jurjen Katsman (edited 07-06-2002).]

mcraighead
07-06-2002, 12:06 PM
Originally posted by Thaellin:
The new 'flush' is not something I've looked at much. I don't have the papers in front of me, but I assume it's to help with real-time constraints. Is it more of an async 'finish'? The name was changed to avoid confusion with the old functionality, IIRC.

As for real-time things: there have been rumblings from certain directions at various times in the past about things that GL should support in order to be good for real-time apps. Unfortunately, Windows is not a real-time OS, and it's nearly impossible to solve many of these (admittedly interesting) problems on Windows, or even on Linux. The use of Windows event handles in the GL2.0 async API is a good example: Windows events have pretty poor responsiveness, actually, for many uses.

Moving on...

Okay, so from what I can tell, the idea of the new FlushStream is "Flush -- and I mean it!"

The only reason I can think of for why FlushStream might possibly be necessary is that some drivers might not really implement a glFlush as, well, a flush. Now, Flush is still specified strictly enough that the only realistic implementation I can think of would be one where you set up a timer in the driver, and then Flush would be implemented as "flush on the next timer tick, but not right now."

It's really hard to write spec language that rules out such an implementation even for FlushStream. It says something about "making progress", but -- sure, you're making progress, towards the next timer tick!

In any case, I see this as an implementation quality issue. NVIDIA drivers treat glFlush as a real, honest-to-goodness flush right now, and we've never had problems with this. Maybe 3Dlabs drivers don't work this way. But if anything, adding the new API will encourage misuse of the original glFlush, pressuring vendors such as NVIDIA to implement such hacks even if we would rather not.


Originally posted by Thaellin:
Matt - you really do have some good points, and an apparently well-defined vision of where you, personally, would like to see the API move. Have you been able to participate in shaping the 2.0 API at all or had you given it up as completely unworkable?

Until recently, there hasn't even been a 2.0 working group, so there hasn't even been any forum in which I could express my views. I don't know if I'll have time to participate in that working group; I have other work to do, after all. I suspect that the working group will focus on the shading language more than the other things right now, and in that case, I'm not really a big expert on shading languages...

Of course, I have discussed my views on shading with other people at NVIDIA. It's safe to say that NVIDIA won't simply be sitting on the side here.

It's important to distinguish between "road-block[ing] at every point" [as one 3Dlabs employee described our supposed stance] and "honest disagreement". It all is very reminiscent of that poll on the front of opengl.org, where you can either adopt the position that you support the OpenGL 2.0 proposals completely, or you want to learn more about them...

Now, I can understand that 3Dlabs marketing wants to portray NVIDIA as standing in the way of all that is good; but the facts just don't match up.

It's kind of like in politics, where if you disagree with bill X, people will smear you as being opposed to the _aims_ of bill X. ("You voted against tougher penalties for murder? Clearly you support murder!")

[Politics at the ARB? Why, that would never happen...]

- Matt

mcraighead
07-06-2002, 12:12 PM
Originally posted by Thaellin:
I'd think a byte-code standardization /would/ be desirable over compiling raw strings in order to speed compile-time and possibly put shader IP under a layer of abstraction.

We specifically did not use a bytecode for NV_vertex_program because there are a lot of disadvantages to it.

It's harder to read and write and debug.

It's not human-understandable.

It is much, much harder to extend to add new features.


Many of the same arguments apply to a procedural interface, though one additional one crops up: it doesn't provide a standard file format for interchange of programs between different apps.


A textual interface solves all of these problems, and parsing isn't _that_ expensive.

Of course, the ARB_vertex_program API would allow you to define a bytecode pretty easily; it's just that no bytecode languages have [as of right now] been created.

- Matt

mcraighead
07-06-2002, 12:23 PM
folker,

If I'm not mistaken, you're describing an F-buffer. Yes, an F-buffer can work. However, in practical scenarios, it often will achieve less performance than app-specified multipass, because the app can take advantage of its own special knowledge that, say, no pair of fragments will have the same [X,Y,Z]. (There are also other disadvantages to building an F-buffer.)

Unfortunately, discussion of how shaders will work on next-generation hardware is an area where I can't speak in too great of detail, seeing as doing so would require revealing assorted nonpublic information.

However, based on the information that I do know, I consider transparent multipass an unproductive API approach for both previous and future generations of hardware.

And I still think transparent multipass misses the real point when it comes to API design. Again, that real issue is undefined vs. defined shader size/complexity limits.

- Matt

Quaternion
07-06-2002, 12:54 PM
Originally posted by folker:
Running SIMD programs on scalar hardware is ugly, and vice versa hardware cannot transform scalar programs into SIMD instructions. As CPUs definitely demonstrates, a compiler must do that work.

What do you mean? What connection does that have with using a low-level shading language as a standard? If you use an HLSL as a standard, every IHV will have to compile your C-like code into its own operation set (e.g. you can't send a "for" loop directly to the card; it must be compiled into low-level operations). Compilation is not the job of the driver. If you use an LLSL as a standard, the IHVs only need to map each standard operation to their own opcodes.

Am I missing your point?

Shlomi.

folker
07-06-2002, 02:28 PM
Originally posted by mcraighead:
folker,

If I'm not mistaken, you're describing an F-buffer. Yes, an F-buffer can work. However, in practical scenarios, it often will achieve less performance than app-specified multipass, because the app can take advantage of its own special knowledge that, say, no pair of fragments will have the same [X,Y,Z]. (There are also other disadvantages to building an F-buffer.)

Unfortunately, discussion of how shaders will work on next-generation hardware is an area where I can't speak in too great of detail, seeing as doing so would require revealing assorted nonpublic information.

However, based on the information that I do know, I consider transparent multipass an unproductive API approach for both previous and future generations of hardware.

And I still think transparent multipass misses the real point when it comes to API design. Again, that real issue is undefined vs. defined shader size/complexity limits.

- Matt

I didn't know about the F-buffer, sorry.
(Does someone know of papers / information about it?)

Well, there is a difference between your earlier statement that "some complex shaders can only be emulated in software" and your new statement that transparent multipass is merely "often slower than manually coded multipass". The latter would be perfectly fine and would mean that transparent multipass is possible in hardware.

And if you now say "there are problems with F-buffer but I cannot talk about it", then the discussion is over...

The question is whether the problems you have in mind only arise because the current design plans for future hardware do not yet take F-buffers into account, or whether there are truly fundamental problems that cannot be solved in a reasonable way. Since I don't see any problems in principle, I suppose it is the former.

You are right, the central question is "undefined vs. defined shader complexity limits", which is a special case of the fundamental question of whether the API hides hardware implementation details.

But if there is a way to transparently split shaders of arbitrary complexity into multiple passes (for example, using an F-buffer), this means that "undefined shader complexity" can be implemented in hardware.

And that means the API can hide hardware implementation details, so that developers do not have to write different code paths for each piece of hardware - and then, in my opinion, it would be silly not to do it that way.

folker
07-06-2002, 02:28 PM
Originally posted by Quaternion:
What do you mean? What connection that has with using a low level shading language as a standard? If you use a HLSL as a standard every IHV will have to compile your c-like code into his own operation set (e.g. you can't send a "for" loop directly to the card, it must be compiled into low-level operations). Compilation is not the job of the driver. If you use LLSL as a standard, the IHVs will only need to match their own opcode to every standard operation.

Am I missing your point?

Shlomi.

You assume that there is a direct mapping between IHV opcodes and LLSL opcodes. The experience with different CPU architectures shows that this is very unlikely and would be very restrictive for the hardware.

For CPUs, it turned out over the last decades that C/C++ is the right abstraction level for hiding CPU hardware details. Why don't we learn from that and use a C-like language for GPUs?

folker
07-06-2002, 02:29 PM
Originally posted by Jurjen Katsman:
Folker: Your multipass solution is sort of cool, but it's unlikely current hardware is going to be able to do this. New hardware could be built to support it, but that's not the issue.

This is the core problem with OpenGL2.0: it's NOT going to work right on old hardware. I think it basically puts us back in the old situation when the GL vs. D3D wars were raging. And we're probably going to see hardware that only implements parts of the GL2.0 spec, and other hardware that does all of 2.0 but drops back to software all the time. (...)

Are you saying that if a problem can only be solved by future hardware, excluding today's hardware, then we should not solve it at all?

Of course it IS an issue whether future hardware can support it. Because from that point on, every shader written in the ogl2 shading language (which has basically no restrictions any more) will run on all hardware; only the performance will differ.

Note that today we have the worst possible situation: for basically every new piece of hardware, new extensions are defined. If we can solve this problem at least for future hardware, we should do it.

ogl 1.4 only reflects the current hardware state, whereas ogl 2 addresses the future.


[This message has been edited by folker (edited 07-06-2002).]

jwatte
07-06-2002, 04:08 PM
OpenGL is pretty explicit about falling back to slower paths (software) when the faster path can't be fully done in hardware. That should speak for falling back to "whatever" when shaders are too complex, to stay in line with the original OpenGL approach.

Sometimes this is really annoying, as you can't get guaranteed performance, and with modern hardware the difference in performance can be three orders of magnitude (decimal, that is)! Thus, it would make sense to REQUIRE a fallback path but provide QUERIES to figure out what will and will not be accelerated - including, in the worst case, a query of the form "would a triangle list drawn with the current state be accelerated?", since there may be limits on low-level ops which aren't easily accounted for at the high level. Like "your shader can't compile to more than 1 kB in our implementation, but we won't tell you beforehand exactly what the rules are for how big it will compile to".

If you want to change the OpenGL approach, then by all means do so. The question becomes: at what point does it stop being "OpenGL" and start being "GenericGraphicsAPI" ?

mcraighead
07-06-2002, 04:44 PM
The "Am I falling back?" query is a common feature request for OpenGL. It sounds a little useful, but it turns out to be a disastrous idea, for quite a number of reasons. That's a topic for another thread, although I've discussed it previously.

jwatte, I'm surprised to hear you, of all people, asking for that feature. Let me know (maybe via private email) if I need to brainwash you on that topic. http://www.opengl.org/discussion_boards/ubb/smile.gif

Yes, when a shader is "too complex" you should get a fallback. I wouldn't even think of disagreeing with that -- please don't misinterpret my distaste for automatic multipass to suggest that the HW limits should be brick walls where you hit them and then your program stops working for unexplainable reasons. Indeed, automatic multipass just happens to be one particular way to implement a driver! But I would also propose that there be explicit limits on loadable shader complexity, where once you pass that limit the shader is *guaranteed* not to load.

This is precisely what I mean by "defined" implementation limits. The trouble is that it is very difficult to define internal limits (e.g. # of registers consumed) in a HLL! It's very easy to put together defined limits when you have an assembly language.
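ARB_vertex_program already shows what this looks like in practice - the limits are queryable up front, and you can even ask whether a program that loaded will actually run within the native limits (a sketch; assumes the extension's entry points are available):

GLint max_inst = 0, max_temps = 0, native = 0;

glGetProgramivARB(GL_VERTEX_PROGRAM_ARB, GL_MAX_PROGRAM_INSTRUCTIONS_ARB, &max_inst);
glGetProgramivARB(GL_VERTEX_PROGRAM_ARB, GL_MAX_PROGRAM_TEMPORARIES_ARB,  &max_temps);

/* ... glProgramStringARB(...) to load a program, then: */

glGetProgramivARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB, &native);
if (!native) {
    /* the program loaded, but exceeds some native resource - fall back to a simpler shader */
}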

The other real issue here is invariance. Invariance is a messy issue when it comes to fragment programmability; much more so than for vertex programmability.

- Matt

barthold
07-06-2002, 08:09 PM
Originally posted by mcraighead:

As for low-level languages, you should be aware that 3Dlabs has made it very clear that they disagree with the idea that a standardized vertex and fragment program assembly language should be made part of OpenGL. Although unrelated IP issues made the discussion moot, NVIDIA's position was that ARB_vertex_program should be a required 1.4 feature, and 3Dlabs's position was that it should be optional. So, putting things together, 3Dlabs wants to standardize a HLL as a core part of OGL, but keep a standard assembly language out of core OGL. So it would not at all be erroneous to suggest that "3Dlabs's OpenGL 2.0 proposals exclude low-level languages". Certainly, they don't at all interoperate with the ARB_vertex_program standard API for creating and binding programs.


I'll clarify this. ARB_vertex_program is a very limited language, poorly suited to supporting any high-level shading language. As we all know, it is missing core functionality like loops, etc. Two to three years ago it would have been state of the art, and great to have, but hardware has evolved since then. It is a vec4-based language, which is not necessarily easy to implement on everyone's hardware, because it does not map to the hardware's own opcodes. This means that we'll have to write a compiler for it anyway! Defining a standard assembly language sounds like a good goal, but is not practical until every vendor's hardware more or less looks the same. Intel, HP, Motorola, MIPS, ARM, etc. have different assembly languages for the same reason. Our OpenGL 2.0 proposal leaves the door wide open for support of low-level languages; Matt's claim in that respect is wrong. Every IHV is free to expose its (real) assembly language through some extension and offer support for it to its ISVs.

Thus, given these issues, why should the ARB vote to make ARB_vertex_program part of the core OpenGL spec, to be supported by any and all hardware vendors until time stops ticking?

However, we do recognize that ARB_vertex_program is useful to ISVs, and therefore we fully support it being an ARB-approved extension (hence its name). But making it part of core OpenGL is too much. We've also offered the suggestion of making it an optional subset of OpenGL 1.4, just like the imaging subset in 1.3.



Also, please, let's avoid the conspiracy theory angle on things. The idea that exposing assembly languages is a scheme to *prevent* progress in OpenGL is, well, ludicrous. Likewise, the idea that NVIDIA doesn't want OpenGL to advance [which has been suggested by numerous people] is likewise erroneous. I, for one, wrote up the original NVIDIA OpenGL 1.4 proposal.


Matt, answer these two questions please.

3Dlabs has been open about our OpenGL 2.0 proposals from the beginning, since about August of last year. Back then we recognized that, at the snail's pace at which OpenGL was evolving, the other 3D API would simply take over and OpenGL would slowly die. In fact, back then that other API was already ahead in many areas (and still is!). So we started writing white papers and putting them out for the public to comment on. We've presented our ideas to the ARB at every meeting since then, and solicited, and gotten, excellent feedback from IHVs and a lot of ISVs - but not from nVidia. That feedback we've incorporated into the white papers and the three specs now out there. We are not married to these; we are open to well-argued changes.

3Dlabs has created excitement and momentum behind OpenGL, something that was urgently needed.

Now my questions:

1) Why has nvidia not put their (considerable) resources behind helping develop and grow OpenGL 2.0?

2) Why instead did nVidia choose, in secret, to develop a competitor to the OpenGL 2.0 shading language in the form of Cg?

Consider where OpenGL 2.0 could have been by now if nVidia had been pro-active in the ARB and actually put resources behind helping us write better white papers and ARB specifications. In my eyes nVidia has lost a lot of credibility in terms of being a good ARB and OpenGL citizen.

The first real technical feedback from nVidia on our OpenGL 2.0 proposals has been posted here in the last few days. That is, unfortunately, extremely late.

Barthold




[This message has been edited by barthold (edited 07-06-2002).]

folker
07-07-2002, 01:47 AM
Originally posted by mcraighead:
Indeed, automatic multipass just happens to be one particular way to implement a driver! But I would also propose that there be explicit limits on loadable shader complexity, where once you pass that limit the shader is *guaranteed* not to load.

This is precisely what I mean by "defined" implementation limits.


I don't completely understand your point of view: how do you define these implementation limits?

Are they chosen such that they can be executed on current hardware? Then for future hardware they will have to be extended, and we stay on the current ugly path of defining new extensions and specifications for each new generation, with developers writing different code paths for different hardware.

Or are they chosen such that they cover all future hardware? If so, then we need some transparent alternative technique for hardware that cannot do it in a single pass. And since I agree completely that software emulation is useless, it must be possible in hardware. And once we have a mechanism by which, for example, the driver can automatically split a shader into multiple passes (perhaps there are other ways too), there is no problem with pushing these limits to infinity, so that we effectively have no limits.


The trouble is that it is very difficult to define internal limits (e.g. # of registers consumed) in a HLL! It's very easy to put together defined limits when you have an assembly language.


Since I think limits are a bad idea anyway, I don't see a problem there ;-)



The other real issue here is invariance. Invariance is a messy issue when it comes to fragment programmability; much more so than for vertex programmability.

- Matt

What kind of problems do you see regarding invariance?

folker
07-07-2002, 01:58 AM
Hi Barthold / 3dlabs, a suggestion for the OpenGL 2.0 specs:

When mentioning multi-pass in the specs, it may be a good idea to also mention f-buffers. This can prevent people from thinking "multipass is not possible for blending modes".

One note: with f-buffers there remains one issue: what happens if you read back old framebuffer values? In that case an f-buffer alone does not work, since the next primitives may read old values instead of the values written by the previous primitive.

Worst case: a group of primitives all rendered at the same place, with the fragment shader reading the old color value, performing some very complex operation that requires multipass, and writing the result back. This result is then read by the next primitive.

Possible solution: the framebuffer has an in-work flag which is set during the first pass and cleared in the last pass. If, during rendering, the framebuffer is read back and a fragment with the in-work bit set is reached, rendering of the primitives is flushed, meaning that the remaining passes of those primitives are executed before the primitive in question is processed. This may be somewhat tricky because it can happen in the middle of rendering a primitive, but it should be possible. In the worst case, a separate special first pass - which only writes and reads this in-work bit - determines how many primitives can be rendered in parallel.

jwatte
07-07-2002, 05:51 AM
mcraighead,

No brainwashing needed. I'm not sure I called for anything. The meat of my post was in the last question: The name "OpenGL" has borne with it some specific assumptions for a long time. How many of these assumptions can you throw away before it's not proper to call the end result "OpenGL" anymore? I think people have different opinions on this.

Me, I don't care. Just give me a capable non-scenegraph API that runs on more than one platform with wide consumer hardware vendor support, and I don't care what it's called.

Tom Nuydens
07-07-2002, 09:54 AM
Whoa, this turned into a big thread.

First of all, unlike what Dave Baldwin suggested early in this thread, the DX9 HLSL and Cg are not competitors. I was at NVIDIA's Gathering event last month, and Microsoft's Dave Bartolomeo was there to do a presentation on the DX9 HLSL. He assured everyone that MS is indeed working closely with NVIDIA to ensure that the two languages become completely compatible. Both languages' specs are still in flux, but both companies seemed dedicated to making them converge.

Second, even if there were three competing high-level shading languages on the market (Cg, DX9, GL2): so what? Some people write applications in C/C++, some use Java, some use Delphi -- heck, some people even use VB! Why shouldn't we be able to pick an HLSL that suits our tastes?

When OpenGL 2.0 becomes truly widespread (and I'm with Nutty and others who believe that this may take a while), I will almost certainly prefer its HLSL over other alternatives like Cg. But that doesn't mean I want those other alternatives to disappear. Most people here prefer GL to D3D, but do you really want D3D to disappear?

Finally, the thread recently turned into a debate about GL2. I have to say that I'm slightly worried by Barthold's comments about NVIDIA's contributions to GL2 (or lack thereof). It would be nice to hear someone at NVIDIA assure the development community that they at least plan to fully support OpenGL 2.0 driver-wise, even if they haven't been active contributors to the development of the 2.0 specification.

-- Tom

mcraighead
07-07-2002, 12:53 PM
Originally posted by folker:
When mentioning multi-pass in the specs, it may be a good idea to also mention f-buffers. This can prevent people from thinking "multipass is not possible for blending modes".

As I've suggested, F-buffers are not themselves necessarily a good idea.

To be perfectly honest, an OpenGL specification should say nothing _at all_ about "multipass" being done by the driver. Multipass is a possible _implementation_ [one that I happen to think is undesirable and/or impractical]. It has nothing at all to do with the API question of how big shaders are allowed to get. And if your answer to that question is, "as big as the user wants", I think you're evading the question.

- Matt

folker
07-07-2002, 01:41 PM
Originally posted by mcraighead:
As I've suggested, F-buffers are not themselves necessarily a good idea.

To be perfectly honest, an OpenGL specification should say nothing _at all_ about "multipass" being done by the driver. Multipass is a possible _implementation_ [one that I happen to think is undesirable and/or impractical]. It has nothing at all to do with the API question of how big shaders are allowed to get.


Agreed completely.

It is an implementation detail of supporting unlimited shader sizes.

What I meant was that the spec could mention - for example in a footnote - that using an f-buffer is one possible way to implement unlimited shader sizes in hardware, but not as part of the spec itself.

The idea is that such a note about the f-buffer could help keep people from thinking that there is no reasonable way at all to implement unlimited shader sizes in hardware.



And if your answer to that question is, "as big as the user wants", I think you're evading the question.
- Matt

I don't understand what you mean. Isn't "no size limit" a clear answer to the question "what is the size limit of a shader"?

BTW, unlimited shader size fits perfectly into the ogl philosophy: ogl has unlimited display list size, an unlimited number of commands between glBegin and glEnd, unlimited vertex array sizes, etc.

And since
a) unlimited shader sizes are possible in hardware in a reasonable way, and
b) unlimited shader sizes would make it possible for every shader to run on all gl2 hardware (instead of today's ugly situation where, to implement the same functionality, developers have to write multiple code paths for different hardware),
isn't "unlimited shader size" the best answer for a future-oriented API like ogl2?

mcraighead
07-07-2002, 01:49 PM
Barthold,

As I've pointed out, I'm an engineer, not a manager. I don't make company policy decisions with respect to OpenGL, and I am not the person to express NVIDIA's official position on these issues. I am merely stating my own views, which in some cases coincide with NVIDIA's and in some cases may not.

As for my own personal views, I've disliked the proposals since day one, but I've had no forum in which to effectively express said views.

Indeed, I consider much of the talk about "OpenGL 2.0" to be *counterproductive* to advancing OpenGL today. I'd rather see the ARB spend its time on vertex and fragment program specs, not on "let's revamp the way object IDs work 2 years from now". The ARB should focus on solving *real developer needs*.

I think when you look at things this way, it makes a whole lot more sense. For example, what in the world was the ARB doing talking about some far-off "2.0" when it didn't even have any standard vertex programmability in the API? One could hardly come up with a more stereotypical example of putting the cart before the horse.


As for your questions, I think their wording is very tricky -- more sophisticated forms of "When did you stop beating your wife?" I'm rephrasing them a little bit, with explanation.

1. Why has nvidia not put their (considerable) resources behind helping develop and grow OpenGL?

[I've changed "OpenGL 2.0" to "OpenGL", since there is as of now no such thing as OpenGL 2.0; the term merely reflects the future of OpenGL, and that is already implied by the words "develop and grow".]

My answer would be: we _have_. We just don't necessarily agree on the direction. But surely it is beyond question that, without NVIDIA's efforts for OpenGL in the last 5 years, OpenGL would have died in the PC space long ago.

2. Why instead did nVidia choose, in secret, to develop a shading language?

[A bigger rewrite, but again, there is no "OpenGL 2.0 shading language", just a "3Dlabs shading language"; and the question imputed particular motives where those motives may or may not have existed.]

As I've suggested before, Cg is a _tool_, and it is a tool that we decided was needed _now_, not some time off in the future.

A related point is that standards bodies are not always the first place it makes sense to go when developing new features. Standards bodies do things by committee and generally don't design entirely _new_ things very well. OpenGL had IrisGL. Subsequent OpenGL versions mostly stole things from vendor and EXT extensions. ANSI C had K&R and AT&T C. Even the incremental C++ was based on years and years of experimentation and could not have been done by a committee sitting down and deciding how objects should be added to C.

If every time NVIDIA wanted to put out an OpenGL extension, we first went to the ARB to discuss our idea, nothing would ever happen. [To follow up on my previous cart-horse analogy, I'll put forth another oversimplified analogy: action beats talk 10 times out of 10.]

In addition, the whole shading language situation has a lot to do with Microsoft, so it gets even more complicated.

Note that Cg is intended to work for both OpenGL and D3D. I consider this one of its biggest and most important features. One important question the ARB will face is: is the ARB willing to drop the OGL vs. D3D politics for just one second when talking about shading languages? There is no reason that a shading language must be tied to OGL. Indeed, tying it in this way would do developers a disservice, sending the message that the ARB would rather continue to play these politics than solve a real developer problem, i.e., letting developers write a shader that can work on either API.
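The GL side of that is already pretty small - roughly the following, using the Cg runtime's GL entry points (names as in the released toolkit headers, so treat the exact calls as approximate):

#include <Cg/cg.h>
#include <Cg/cgGL.h>

/* source_text: the Cg source string (not shown) */
CGcontext ctx  = cgCreateContext();
CGprogram prog = cgCreateProgram(ctx, CG_SOURCE, source_text,
                                 CG_PROFILE_ARBVP1,   /* compile for the ARB vertex program profile */
                                 "main", NULL);
cgGLLoadProgram(prog);

/* at draw time: */
cgGLEnableProfile(CG_PROFILE_ARBVP1);
cgGLBindProgram(prog);

/* the same source_text, compiled against a D3D profile, feeds the D3D runtime instead */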

Dare I say it -- the ARB might not even be the right standards body to approve a shading language, for the simple reason that the ARB _is_ OpenGL-centric. Who knows, maybe what we need eventually is an "ANSI Cg", where Cg and the MS language are analogous to early AT&T C dialects.

Well, that's enough subversive thoughts for one message,

- Matt

mcraighead
07-07-2002, 02:33 PM
Originally posted by folker:
I don't understand what you mean. Isn't "no size limit" a clear answer to the question "what is the size limit of a shader"?

It's an incomplete answer that illustrates an incomplete understanding of the question.

Computers are finite. Therefore, some limit exists, even if it's virtual memory.

You give the analogy of display lists and primitive sizes. The latter doesn't need to consume memory in the first place, so it is not a good analogy. Display lists are a better analogy.

The usual display list model (unstated) is: "when you run out of system memory, you get a GL_OUT_OF_MEMORY" -- at which point, as far as the OpenGL spec is concerned, the universe is about to explode.

But in practice, it's actually undefined when GL_OUT_OF_MEMORY will occur. If an implementation decided that as soon as it hit 1M vertices in a display list, it would always give GL_OUT_OF_MEMORY, you couldn't really cite that as nonconformant behavior, especially if, say, display lists always sat in some special memory and the memory could only store 1M vertices.

Now reduce that 1M number. What stops an implementation from blowing up at 100K? 10K? 1K? 100? 10? 1? Well, at some point, the conform test itself will fail, and the driver then ceases to be an "OpenGL (tm)" driver. But the real point is that having an undefined limit lets people pick an arbitrarily *low* limit -- quite the opposite of what you suggest, where you say that there is no limit.

Now, a memory that only stores 1K vertices is a really stupid idea. But what about hardware that has [again, picking an arbitrary number] 1KB of space for temporary variables while running a shader? In fact, that would actually be a pretty _big_ memory for temporaries. Let's suppose the implementation gave a GL_OUT_OF_MEMORY error when your shader required 1.1KB of memory to run. That's perfectly admissible behavior under the undefined-limits paradigm.

The difference with display lists is that it is so easy to write a display list compiler that just puts all the commands in system memory, and then you walk down the list. Indeed, this is the standard way to implement unoptimized display lists. So these kinds of limitations _don't_ tend to exist for display lists.
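In code, the "unlimited" display list model amounts to: record as much as you like, then check whether the implementation ran out of memory after the fact -

static GLuint record_triangles(const GLfloat (*verts)[3], int count)
{
    GLuint list = glGenLists(1);
    int i;

    glNewList(list, GL_COMPILE);
    glBegin(GL_TRIANGLES);
    for (i = 0; i < count; ++i)        /* no stated upper bound on 'count' */
        glVertex3fv(verts[i]);
    glEnd();
    glEndList();

    if (glGetError() == GL_OUT_OF_MEMORY) {
        /* allowed to happen at any point; the app had no way to know the limit up front */
        glDeleteLists(list, 1);
        return 0;
    }
    return list;
}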

But for shaders, the natural implementation is to compile down to microcode (or something similar to microcode, such as NV_vertex_program assembly), and then either run the microcode in HW or interpret it in SW. It's quite a pain to have yet another code path that compiles the shader directly into, say, x86 assembly. In practice, if limits are undefined, and HW limits are "high enough" (how high is that???), IHVs may just skip this effort and use their GL_OUT_OF_MEMORY wild-card.

Now you've merely ended up with an uglier version of defined limits.
- The limits are still there; but now you have no idea what they are.
- For all practical purposes, you can get no assurances on whether any given shader will load or fail to load.
- Since many compiler problems are NP-complete and require heuristics, it is not in general possible to guarantee that a newer driver will load a superset of the programs an older driver loaded.
- Since drivers are not expected to have full GL_OUT_OF_MEMORY recovery paths, the GL may be in a wacky or otherwise undefined state whenever this happens.

The key difference between shaders and display lists is that display lists are a highly unconstrained problem domain. Shaders have a few key metrics that define their size/complexity, but display lists are just a random jumble of GL commands.

I would suggest that having these "really high" limits is not a problem that needs solving, or certainly not immediate solving. What good does it do you to be able to run a shader that will run at 100 pixels/second, entirely in software? The people who want those ultra-huge shaders may even be better off without OpenGL. Instead, defined limits gives you good assurances about shaders loading or failing to load, and if the limits _do_ get "high enough", then IHVs can focus on the hardware/software engineering problem (making real shaders of realistic sizes go fast) rather than the CS problem (creating Turing-complete shader languages with arbitrary complexity).

Also:


Originally posted by folker:
a) unlimited shader sizes are possible in hardware in a reasonable way, and

I disagree, I think even if it's theoretically possible there are much better uses of transistors and HW design time (see previous comment about engineering vs. CS problems); and


Originally posted by folker:
b) unlimited shader sizes would make it possible that each shader runs on every gl2 hardware (instead of the todays ugly situation that for implementing the same functionality developers have to write multiple code pathes for different hardware),

I again disagree, I think you'll have the same problems with legacy hardware you have today.

- Matt

Dave Baldwin
07-08-2002, 12:36 AM
Originally posted by mcraighead (all the quotes below are his):
I think the shading language should sit outside the API, not inside it.


But it is OK for another language, albeit at a lower level, to sit inside?



I think transparent multipass should be provided by scene graphs and shader compilers, not the API.


All we are saying is that the user provides a shader, and the driver/hardware runs it and takes responsibility if some fixed resources are exceeded. There are many ways of implementing this: multi-pass, punting back to software on the CPU (and yes, this can be done in an invariant way), F-buffers, etc.



I think the API needs to expose a low-level assembly language for both vertices and fragments, so that people can use [or even develop] the shading language of their choice. It may also make sense to standardize a high-level language, but not to the exclusion of low-level languages.


No one is trying to exclude assembler-level languages, but trying to standardise on something that is a snapshot of a design point, and making it a to-be-supported-forever API feature when it is obviously missing common programming features, has to be questionable.
As for allowing other shading languages to be developed on top of whatever is in OGL, the high-level language can be used for this as well, and it has the advantage of providing a firmer foundation to build on.



I think the proposed GL2 APIs are clumsy. First I have to create a bunch of objects, then "compile" my shaders, then attach them to another object, then "link" them, and only then I can use them? I don't see what this gains over GenProgramsARB, ProgramStringARB, BindProgramARB except confusion and more API calls. The analogy to how a C compiler works seems rather stilted.


I think the analogy of the C compiler works very well. If you look back at the first version of the papers you will see that we had a model much more in line with what is being proposed for vertex programs. We changed to the compile-link model after hearing dozens of ISVs say this is what they wanted. We were encouraged by Kurt Akeley to think about programming-in-the-large so as to make it future-proof.
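Spelled out, the flow we ended up with looks like this (the entry-point names below are placeholders invented for this post, not the proposal's actual API - the point is the shape of the process, which mirrors compiling .c files separately and resolving symbols at link time):

typedef unsigned int ShaderHandle;                       /* placeholder handle type */

extern ShaderHandle createShader(int kind);              /* placeholder declarations only */
extern void         shaderSource(ShaderHandle s, const char *src);
extern void         compileShader(ShaderHandle s);
extern ShaderHandle createProgram(void);
extern void         attachShader(ShaderHandle prog, ShaderHandle s);
extern void         linkProgram(ShaderHandle prog);      /* vertex->fragment interface resolved here */
extern void         useProgram(ShaderHandle prog);

static void build(const char *vs_src, const char *fs_src)
{
    ShaderHandle vs, fs, prog;

    vs = createShader(0 /* vertex */);
    fs = createShader(1 /* fragment */);
    shaderSource(vs, vs_src);  compileShader(vs);        /* "compile the .c files" */
    shaderSource(fs, fs_src);  compileShader(fs);

    prog = createProgram();
    attachShader(prog, vs);
    attachShader(prog, fs);
    linkProgram(prog);                                   /* "link the .o files" */
    useProgram(prog);
}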



In other areas, I think the proposals do not interface in very compatible ways with the rest of OpenGL. For example, they reinvent the wheel on generic attributes for vertex programs, when the ARB_vertex_program interface for generic attributes is already quite sufficient, and when the ARB already discussed that topic for an extended period of time in the ARB_vertex_program working group.


When we started looking at OGL2 (or what it later became known as, with the ARB's blessing) there was no ARB vertex programming interface - nvidia had played its IP card and refused the ARB any opportunity of changing its spec - we could have copied it and left ourselves open to IP claims, but did the wheel fit?

There are four interfaces to consider:
app -> vertex shader, low frequency state data
app -> vertex shader, high frequency data (i.e. attributes)
app -> fragment shader, low frequency state
vertex shader -> fragment shader high frequency data

With an assembler everything has fixed types and names, so all these interfaces become trivial. A high-level language allows the user to choose their own names and types as appropriate for the job in hand. Forcing the user to use a fixed set of types/names for these interface variables is a backwards step, and adopting the ARB vertex programming interface would force this.
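
As a rough illustration of the second interface (app -> vertex shader attributes), assuming tx/ty/tz hold a per-vertex tangent and pgm is a linked program object:

    /* Assembly-style interface: generic attributes are numbered slots, and
       "slot 7 holds the tangent" is only a convention shared by the app and
       the program text (glVertexAttrib3fARB is from ARB_vertex_program). */
    glVertexAttrib3fARB(7, tx, ty, tz);

    /* High-level interface: the shader declares its own named, typed input
       (e.g. "attribute vec3 tangent;") and the app looks the slot up by name.
       The call below is a placeholder for whatever the GL2 API ends up with. */
    GLint tangentSlot = gl2GetAttribLocation(pgm, "tangent");
    glVertexAttrib3fARB((GLuint)tangentSlot, tx, ty, tz);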

Dave.
3Dlabs


[This message has been edited by Dave Baldwin (edited 07-08-2002).]

folker
07-08-2002, 12:52 AM
Originally posted by mcraighead:
It's an incomplete answer that illustrates an incomplete understanding of the question.

Computers are finite. Therefore, some limit exists, even if it's virtual memory.

You give the analogy of display lists and primitive sizes. The latter doesn't need to consume memory in the first place, so it is not a good analogy. Display lists are a better analogy.

The usual display list model (unstated) is: "when you run out of system memory, you get a GL_OUT_OF_MEMORY" -- at which point, as far as the OpenGL spec is concerned, the universe is about to explode.

But in practice, it's actually undefined when GL_OUT_OF_MEMORY will occur. If an implementation decided that as soon as it hit 1M vertices in a display list, it would always give GL_OUT_OF_MEMORY, you couldn't really cite that as nonconformant behavior, especially if, say, display lists always sat in some special memory and the memory could only store 1M vertices.

Now reduce that 1M number. What stops an implementation from blowing up at 100K? 10K? 1K? 100? 10? 1? Well, at some point, the conform test itself will fail, and the driver then ceases to be an "OpenGL (tm)" driver. But the real point is that having an undefined limit lets people pick an arbitrarily *low* limit -- quite the opposite of what you suggest, where you say that there is no limit.

Now, a memory that only stores 1K vertices is a really stupid idea. But what about hardware that has [again, picking an arbitrary number] 1KB of space for temporary variables while running a shader? In fact, that would actually be a pretty _big_ memory for temporaries. Let's suppose the implementation gave a GL_OUT_OF_MEMORY error when your shader required 1.1KB of memory to run. That's perfectly admissible behavior under the undefined-limits paradigm.

The difference with display lists is that it is so easy to write a display list compiler that just puts all the commands in system memory, and then you walk down the list. Indeed, this is the standard way to implement unoptimized display lists. So these kinds of limitations _don't_ tend to exist for display lists.

But for shaders, the natural implementation is to compile down to microcode (or something similar to microcode, such as NV_vertex_program assembly), and then either run the microcode in HW or interpret it in SW. It's quite a pain to have yet another code path that compiles the shader directly into, say, x86 assembly. In practice, if limits are undefined, and HW limits are "high enough" (how high is that???), IHVs may just skip this effort and use their GL_OUT_OF_MEMORY wild-card.

Now you've merely ended up with an uglier version of defined limits.
- The limits are still there; but now you have no idea what they are.
- For all practical purposes, you can get no assurances on whether any given shader will load or fail to load.
- Since many compiler problems are NP-complete and require heuristics, it is not in general possible to guarantee that a newer driver will load a superset of the programs an older driver loaded.
- Since drivers are not expected to have full GL_OUT_OF_MEMORY recovery paths, the GL may be in a wacky or otherwise undefined state whenever this happens.

The key difference between shaders and display lists is that display lists are a highly unconstrained problem domain. Shaders have a few key metrics that define their size/complexity, but display lists are just a random jumble of GL commands.

I would suggest that having these "really high" limits is not a problem that needs solving, or certainly not one that needs solving immediately. What good does it do you to be able to run a shader that will run at 100 pixels/second, entirely in software? The people who want those ultra-huge shaders may even be better off without OpenGL. Instead, defined limits give you good assurances about shaders loading or failing to load, and if the limits _do_ get "high enough", then IHVs can focus on the hardware/software engineering problem (making real shaders of realistic sizes go fast) rather than the CS problem (creating Turing-complete shader languages with arbitrary complexity).


Of course, the aim of infinite shader limits is NOT to run 100,000-line fragment shader programs requiring one hour per frame. The aim is that, for example, a typical real-world shader can run in four passes on hardware A and in two passes on newer hardware B (roughly a factor of two faster), without forcing the developer to write different code paths for hardware A and B just to implement the same functionality.

Obviously there is a BIG difference between
a) limits defined by current(!!!) hardware (e.g. 6 TMUs, 8 fragment vec4 commands, etc.) and
b) theoretical(!!!) limits indirectly defined by main memory etc. (e.g. theoretically thousands of TMUs, theoretically hundreds of thousands of fragment commands, etc., which I call "infinite limits")

In the first case (a), the limits have to be extended for every new generation of hardware, and the developer has to write different code paths for each one. In the second case (b), the same shader runs on all hardware, only at different speeds.

Note that "infinite limits" (limited only theoretically by main memory etc.) is standard everywhere in software and hardware industry. For example, the number of local variables of C++ programs is limited in practise by main memory. Conclusion: ANSI-C++ should limit the maximum number of local variables per function to 16 variables. This can be guaranteed to be executed on every hardware. Do you agree with this? http://www.opengl.org/discussion_boards/ubb/wink.gif

It seems that the true reason for arguing against "infinite shader size limits" is that you don't want to implement transparent multipass (or alternative solutions), because you don't agree with the aim that every shader should run on every hardware.


I again disagree, I think you'll have the same problems with legacy hardware you have today.

In other words, you are saying: because we cannot solve an ugly problem (needing different code paths for every piece of hardware) for today's hardware, we should give up and not solve this problem for future hardware either?

In my opinion, it is a main job of a driver API like OpenGL to hide hardware implementation details, so that each shader runs on all hardware (instead of today's ugly situation where, to implement the same functionality, developers have to write multiple code paths for different hardware).

Some real-world samples:
The idea of a standard shader language is that you can exchange shaders within the developer community, for example to build up shader exchange sites on the internet, etc. Your vision of hardware-dependent shader limits means that every shader carries those limits around with it:
"I tried your new cool bump-xyz-shader, but it does not run on my hardware, because I only have 6 instead of 8 TMUs. I try to re-write your code into multi-pass, give me some weeks..."
"All our artists don't care about realtime speed, but they still cannot preview most of our shaders within 3d max because they only have a xyz card..."
"Our game has 10 effects running at four detail variants ("low quality" ... "high quality"), so we have to write 40 shader variants which is unavoidable. But it also has to run on 4 main-stream hardware, so we have to manually write 40 * 4 = 160 shader implementations."

Do you really want to keep the current situation where developers have to write separate code paths for every piece of hardware? Is this your vision of the future of OpenGL? (Welcome back to Glide and co... http://www.opengl.org/discussion_boards/ubb/wink.gif

Of course, it is your freedom to say "I don't want to spend any development time or any transistors" on the aim that every shader runs on every piece of hardware. But in my opinion this destroys the spirit of OpenGL.

(Sorry for the clear statements, but I prefer to clearly make points if I think they are important. I never mean it personally, I consider it only as open and fruitful technical discussion! http://www.opengl.org/discussion_boards/ubb/smile.gif

EG
07-08-2002, 01:13 AM
I would like to bring some 'OpenGL user' point of view in this debate http://www.opengl.org/discussion_boards/ubb/wink.gif

On the issue of incremental improvements vs. rework: we need both. Drivers already have to support DX8-style programs, so why not just put misplaced pride aside, work around whatever IP issues there are, and get things into OpenGL ASAP? This IMHO would help reduce some of the code-path mess, keep the momentum and keep the ship afloat.

*But* today's programmability features are very hardware-centric, many considerations will have gone the way of the Dodo in a few years.

The OpenGL rework is necessary, and IMO shouldn't be made with today's constraints in mind, but rather with what could be useful, what could be abstracted. In that light, 3DLabs proposals make perfect sense. Of course, as during the early OpenGL days, you shouldn't expect it to run smoothly on consumer hardware (if it runs in hardware at all) before years.

Ad interim, OpenGL should still be updated, and intermediate or external-level languages/APIs à la Cg or early "OpenGL2 preview extensions" could bridge the gap, getting a fraction of tomorrow's standard onto today's hardware.

I don't see much success for Cg (in the OpenGL world) unless ARB vertex/pixel programs emerge and are widely supported quickly (i.e. they must be kept reasonably simple).
Also, in its current state, the Cg runtime is not a driver thing, meaning that even if others were to support Cg wholeheartedly, we (application developers) would have to take care of the hardware-specific Cg distributions... and redistributions whenever fixes are released or the Cg-runtime/driver relationship changes... Not good.

The 3Dlabs proposals are rich, but open new cans of worms, while the nVidia proposals bring stuff to the masses... the two aren't contradictory, so couldn't you guys find some common ground?

ash
07-08-2002, 01:49 AM
Originally posted by mcraighead:
As for low-level languages, you should be aware that 3Dlabs has made it very clear that they disagree with the idea that a standardized vertex and fragment program assembly language should be made part of OpenGL. Although unrelated IP issues made the discussion moot, NVIDIA's position was that ARB_vertex_program should be a required 1.4 feature, and 3Dlabs's position was that it should be optional. So, putting things together, 3Dlabs wants to standardize a HLL as a core part of OGL, but keep a standard assembly language out of core OGL. So it would not at all be erroneous to suggest that "3Dlabs's OpenGL 2.0 proposals exclude low-level languages". Certainly, they don't at all interoperate with the ARB_vertex_program standard API for creating and binding programs.


I expect 3Dlabs's position could best be stated as tolerating the optional support of hardware-specific assemblers by hardware vendors for their specific cards by means of non-core extensions on a per-card basis. There is good reason for keeping assembly languages out of the core: they are hardware specific by their nature and attempts to standardize them are misguided at best, or inspired by vested commercial interests at worst.

Because hardware differs, "standardizing" an assembly language really means forcing all IHVs (except possibly one happy IHV whose assembler happens to match the API assembler) to still include a full-blown low-level compiler in their driver simply to translate, and optimize, from the API "assembler" (which is no longer an assembler in any useful sense but simply a bad way of expressing a high-level language) to the native one. So you have two compilers in series, one in the shader compiler and the other in the driver, with both required and neither able to share its work with the other.

I can't really believe that you don't see these problems, Matt?


[This message has been edited by ash (edited 07-08-2002).]

Eric
07-08-2002, 02:10 AM
Originally posted by EG:
I don't see much success for Cg (in the OpenGL world) unless some ARB vertex/pixel-programs emerges and are widely supported quickly (ie. they must be kept reasonnably simple).

It is my understanding that OpenGL 1.4 should feature a standardized vertex program interface. AFAIK, it won't have a standard pixel shader interface (that's why we have threads like this http://www.opengl.org/discussion_boards/ubb/wink.gif).

Anyway, I haven't seen much information about GL1.4 lately...

Regards.

Eric

EG
07-08-2002, 02:40 AM
> Anyway, I haven't seen much information about GL1.4 lately...

Yep, the ARB meeting was scheduled for mid-June. Since then, no news, and I just heard some rumours about an IP issue raised by M$ about ARB_vertex_program (binding logic?)... Just rumours, I hope.

folker
07-08-2002, 02:52 AM
Originally posted by EG:
> Anyway, I haven't seen much information about GL1.4 lately...

Yep, the ARB meeting was scheduled for mid-June. Since then, no news, and I just heard some rumours about an IP issue raised by M$ about ARB_vertex_program (binding logic?)... Just rumours, I hope.
http://www.extremetech.com/article2/0,3973,183940,00.asp mentions this IP issue.

Julien Cayzac
07-08-2002, 03:33 AM
Originally posted by Eric:

Anyway, I haven't seen much information about GL1.4 lately...


Yes, it's been 3 weeks since last ARB meeting, and still no meeting notes available :p

<troll>Hopefully MicroSoft will make D3D9 crossplatform, so we won't care anymore about all those nasty IHV wars at ARB</troll>

Julien.

cass
07-08-2002, 06:01 AM
You know, ISVs are welcome to join the ARB mailing list once they sign the "participants undertaking". You get access to all this information as it's happening.

I encourage you to be an active participant.

Cass

Eric
07-08-2002, 06:23 AM
Originally posted by cass:

You know, ISVs are welcome to join the ARB mailing list once they sign the "participants undertaking". You get access to all this information as it's happening.

I encourage you to be an active participant.

Cass

Cass,

That sounds interesting but I couldn't find any information about the ARB mailing list on this site. I only found the Game Dev OpenGL mailing list...

Do you know where we can find this information? Also, are you sure the ARB is happy for minor ISVs to be included in the mailing list? (to be honest, I don't really offer anything interesting for them... the main benefit would be for me to have more information !).

Regards.

Eric

cass
07-08-2002, 06:37 AM
Eric,

Yes, I believe the ARB *wants* and *needs* more ISV participation - even if it only comes in the form of contributing to the participants mailing list discussions.

You can find the participants undertaking form at: http://www.opengl.org/developers/about/arb.html

The actual PDF is: http://www.opengl.org/developers/about/arb/legal/participant_v3.pdf

And you should contact arb-secretary@sgi.com to find out where to fax or mail it.

Thanks -
Cass

ash
07-08-2002, 06:58 AM
Originally posted by folker:
I don't understand what you mean. Isn't "no size limit" a clear answer to the question "what is the size limit of a shader"?



originally posted by mcraighead:
It's an incomplete answer that illustrates an incomplete understanding of the question.

(snip)

Now, a memory that only stores 1K vertices is a really stupid idea. But what about hardware that has [again, picking an arbitrary number] 1KB of space for temporary variables while running a shader? In fact, that would actually be a pretty _big_ memory for temporaries. Let's suppose the implementation gave a GL_OUT_OF_MEMORY error when your shader required 1.1KB of memory to run. That's perfectly admissible behavior under the undefined-limits paradigm.


The shader is too big for the card's on-chip scratch memory? Then the driver multipasses it (or whatever), to make it run. End of story, unless you know a reason why this is impossible, in an interesting case?

Or do you mean that the implementation eventually runs out of host resources? That's not an interesting case, to me, since practical shaders will be smaller.

Or is your point simply that having no fixed limits allows a hypothetical implementation to only support small programs before bailing out? Firstly, this is at odds with the "any shader runs" ethos, and secondly, while not explicitly forbidden by the "no fixed limits" idea, it would in practice be discouraged by the usual market pressures, which have sufficed in other similar situations. So no real issue.

Not sure I understand your post completely though..

Ash


[This message has been edited by ash (edited 07-08-2002).]

[This message has been edited by ash (edited 07-08-2002).]

secnuop
07-08-2002, 07:06 AM
Because hardware differs, "standardizing" an assembly language really means forcing all IHVs (except possibly one happy IHV whose assembler happens to match the API assembler) to include a full-blown low-level compiler in their driver simply to translate, and optimize, from the API "assembler" (which is no longer an assembler in any useful sense but simply a bad way of expressing a high-level language) to the native one. So you have two compilers in series, one in the shader compiler and the other in the driver, with both required and neither able to share its work with the other.

This is a point that I think has been mostly missed (or at least glossed over) in all of the high-level language vs. low-level language discussions I've read. The current assembler-like methods of specifying vertex programs are not low-level. True, an assembler-like vertex / fragment program looks low-level and has many of the disadvantages of programming in a low-level language. Yet as ash points out it does not give an ISV low-level control since most drivers will have to write a low-level compiler (it's no longer an assembler since there is no direct mapping between assembler instructions and hardware opcodes) to translate an assembler-like vertex / fragment program into the native hardware language.

I guess the bottom line is that I definitely agree that a widely adopted (regardless of whether it is part of the core OpenGL standard or not) assembler-like language is a good thing to have right now, but I haven't been convinced that an assembler-like language is a better long-term solution than a high-level language.

ash
07-08-2002, 08:18 AM
Originally posted by secnuop:
This is a point that I think has been mostly missed (or at least glossed over) in all of the high-level language vs. low-level language discussions I've read. The current assembler-like methods of specifying vertex programs are not low-level. True, an assembler-like vertex / fragment program looks low-level and has many of the disadvantages of programming in a low-level language. Yet as ash points out it does not give an ISV low-level control since most drivers will have to write a low-level compiler (it's no longer an assembler since there is no direct mapping between assembler instructions and hardware opcodes) to translate an assembler-like vertex / fragment program into the native hardware language.


Yes. I guess maybe we've become used to thinking of PC graphics as a one-card game (GeForce variants). Having an assembly language in the API makes perfect sense if you only have one hardware architecture to support (and perhaps this is still nvidia's mindset) but not much sense at all if there are three or four (or more) different competing architectures.

Ash

barthold
07-08-2002, 10:27 AM
Originally posted by cass:
Eric,

Yes, I believe the ARB *wants* and *needs* more ISV participation - even if it only comes in the form of contributing to the participants mailing list discussions.

Thanks -
Cass

I fully second that. ISVs, sign the ARB participants agreement, and get involved in the ARB. Signing the agreement gives you the right to attend ARB meetings in person, and access to the ARB participants mailing list, where pretty much all interesting discussions happen. The meeting notes get posted there as well.

Barthold

barthold
07-08-2002, 10:53 AM
Originally posted by mcraighead:

As for my own personal views, I've disliked the proposals since day one, but I've had no forum in which to effectively express said views.


There is the ARB participants list, which is a perfect place for discussion, and ogl2@3dlabs.com, as well as private email.



Indeed, I consider much of the talk about "OpenGL 2.0" to be *counterproductive* to advancing OpenGL today. I'd rather see the ARB spend its time on vertex and fragment program specs, not on "let's revamp the way object IDs work 2 years from now". The ARB should focus on solving *real developer needs*.


The OGL2 proposals solve real developers' needs. We didn't come up with all these ideas just because 'they are cool'.

Object IDs are completely beside the point. That is not what OpenGL2 is about, but they keep coming up (even Kurt mentioned them a few times). If we can't agree on the important issues, I do not want to start a discussion about object IDs.

The ARB does progress OpenGL today. ARB_vertex_program, and now the new ATI-proposed ARB_fragment_program, are two good examples. HOWEVER, in parallel we need to work towards OpenGL 2.0, to keep OpenGL competitive. Don't you agree?



I think when you look at things this way, it makes a whole lot more sense. For example, what in the world was the ARB doing talking about some far-off "2.0" when it didn't even have any standard vertex programmability in the API? One could hardly come up with a more stereotypical example of putting the cart before the horse.


ARB_vertex_program could have been a standard a lot sooner if nVidia hadn't tried to play the 'we have IP issues on this cool new API called NV_vertex_program' card. When was that? Back in September 2000, I believe. That effectively kept the ARB from looking at vertex programming for almost a year.



Note that Cg is intended to work for both OpenGL and D3D. I consider this one of its biggest and most important features. One important question the ARB will face is: is the ARB willing to drop the OGL vs. D3D politics for just one second when talking about shading languages? There is no reason that a shading language must be tied to OGL. Indeed, tying it in this way would do developers a disservice, sending the message that the ARB would rather continue to play these politics than solve a real developer problem, i.e., letting developers write a shader that can work on either API.

Dare I say it -- the ARB might not even be the right standards body to approve a shading language, for the simple reason that the ARB _is_ OpenGL-centric. Who knows, maybe what we need eventually is an "ANSI Cg", where Cg and the MS language are analogous to early AT&T C dialects.


By not offering Cg to the ARB, nVidia makes support for Cg from other hardware vendors tough to get. I have nothing against changing the OGL2 shading language (or whatever other language) to make it match Microsoft's HLSL. But shader compatibility between DX and OGL does not mean that applications can write and maintain one code base. Writing a shader is only a tiny, tiny portion of a whole application. That these shaders are one and the same for a DX or OGL platform is not important. Until both the DX and OGL APIs are truly one and the same, an application's code will be different for both targets. Making a 100-line shader match up between the two targets is irrelevant.

However, it sounds really good to say that one and the same shader will compile and run on both DX and OGL platforms. It is a good marketing message.

Barthold

Pedro
07-08-2002, 11:00 AM
Originally posted by secnuop:
This is a point that I think has been mostly missed (or at least glossed over) in all of the high-level language vs. low-level language discussions I've read. The current assembler-like methods of specifying vertex programs are not low-level. True, an assembler-like vertex / fragment program looks low-level and has many of the disadvantages of programming in a low-level language. Yet as ash points out it does not give an ISV low-level control since most drivers will have to write a low-level compiler (it's no longer an assembler since there is no direct mapping between assembler instructions and hardware opcodes) to translate an assembler-like vertex / fragment program into the native hardware language.


Originally posted by ash:
Yes. I guess maybe we've become used to thinking of PC graphics as a one-card game (GeForce variants). Having an assembly language in the API makes perfect sense if you only have one hardware architecture to support (and perhaps this is still nvidia's mindset) but not much sense at all if there are three or four (or more) different competing architectures.

Exactly! And having a standard high level shading language built directly into OpenGL, instead of sitting on top of a low level compiled one, will benefit the industry as a whole:

IHVs will be able to *innovate* and come up with virtually any kind of HW architecture, that can immediately be put to use on both existing and new software.

ISVs will spend less time writing specific code paths and will begin to see some real competition among IHVs (i.e., competition based on better performance, and not on proprietary gimmicks).

End users will simply have a broader range of HW solutions to choose from, to run all their software titles.

IMHO, even if there were only one IHV, having a standard low-level shading language is a *VERY* BAD IDEA. What happens one or two years down the road, when new generations of HW no longer look anything like the assembly shading language? Well, that's simple: we come up with ARB_vertex_program_2 and ARB_pixel_shader_2. But now we will have to do this every year or so, and we will still have to support the older versions (IHVs and ISVs alike)! This works in DX because Microsoft comes up with a *new* (and different) API every year. OpenGL was not designed to expose completely different interfaces the way DX does, the ARB cannot move that fast, and even if it could, this would not be feasible for *all* the markets that OpenGL addresses.

Again, IMHO, a high level shading language built directly into OpenGL (as opposed to an assembly like shading language) is the only sane direction for OpenGL.

Of course, there is much more to it than OpenGL's shading language, ... but, I won't go into that http://www.opengl.org/discussion_boards/ubb/smile.gif.

Pedro
07-08-2002, 11:32 AM
Originally posted by barthold:
I have nothing against changing the OGL2 shading language (or whatever else language) to make it match Microsoft's HLSL. But compatibility between DX and OGL does not mean that applications have to write and maintain one code base. Writing a shader is only a tiny tiny portion of a whole application. That these shaders are one and the same for a DX or OGL platform is not important. Until both the DX and OGL APIs are truly one and the same, an application's code will be different for both targets. Making a 100 line shader match up between the two targets is irrelevant.

Even so, this is a feature I would like to see in the future OpenGL's *standard* shading language. But, unfortunately it requires Microsoft's cooperation, so I know it'll never happen http://www.opengl.org/discussion_boards/ubb/frown.gif.

In some situations this would be desirable. Imagine a tool, written as two plugins for Maya and 3DSMAX, where I could draw a mesh, select in a window which per-vertex components to use, and write a shader for that mesh in another window. Since 3DS only supports DX and Maya only supports OGL, it would be nice to have a common shading language. Of course, it would be simpler to write a compiler for OpenGL's shading language that outputs DX HLSL or low-level vertex and pixel shaders than to wait around for all IHVs to support Cg http://www.opengl.org/discussion_boards/ubb/smile.gif.

[OK, I know that DCC apps are used by designers that don't write shaders. So, all that is needed is to parse the shader and to generate a nice dialog box with lots of sliders. In this case, one shading language would also mean one parser. Anyone care to start an open source project http://www.opengl.org/discussion_boards/ubb/smile.gif ]


[This message has been edited by Pedro (edited 07-08-2002).]

Korval
07-08-2002, 11:39 AM
Once again, we seem to be dancing around the issue at hand.

If there is to be a low-level assembly-esque language adopted as a standard, then it is much easier to specify hardware-imposed limits. It is easier to express limits like the number of constants used, register counts, etc when you're looking at something assembly-esque.

However, if you don't like the idea of seeing the limits, then you will likely prefer the higher-level shading language. Naturally, it hides all of the supposedly unimportant details of any particular implementation, and makes them all look the same (regardless of performance costs).

Basically, there are just two points of view: If you believe that OpenGL should expose the limits of the hardware in a standard way, an assembly-esque language is the best type of shading language. If you believe that OpenGL should impose the significant overhead of allowing for higher-than-hardware limits to shader resources, then you're going to support a C-style interface language.
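
For reference, the kind of limit exposure being argued for here is what ARB_vertex_program queries look like (token names as I recall them from the ARB_vertex_program spec, which was still being finalized at the time of this thread; the "native" query reports whether a loaded program fits the hardware's real limits rather than the spec minimums):

    GLint maxInstr, maxTemps, maxParams, isNative;
    glGetProgramivARB(GL_VERTEX_PROGRAM_ARB, GL_MAX_PROGRAM_INSTRUCTIONS_ARB, &maxInstr);
    glGetProgramivARB(GL_VERTEX_PROGRAM_ARB, GL_MAX_PROGRAM_TEMPORARIES_ARB,  &maxTemps);
    glGetProgramivARB(GL_VERTEX_PROGRAM_ARB, GL_MAX_PROGRAM_PARAMETERS_ARB,   &maxParams);
    /* after glProgramStringARB(...): does the loaded program fit the native limits? */
    glGetProgramivARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB, &isNative);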


Here's something that nobody's even come close to discussing that is also important to this topic: compilation time. An assembly-esque compiler is very simple. For each standardized opcode, it will have a table entry for compiling that opcode into some amount of microcode or state setting. This is a fairly trivial operation. For a C-style language, you need a highly complex compiler that will take a non-insignificant amount of time to perform this task.

In a high-performance situation (ie, a game), one may want to compile shaders on the fly. I understand that the main reason to do shader compiling nowadays is the lack of looping, but even in the future, one may want to dynamically add/delete various effects out of a shader. One could do this by having constants in the shader that add/delete the various effects, but that impacts shader processing time. Instead, to create high-performance shaders, one will want to compile shaders dynamically.

C-style shaders will take much longer to compile than assembly-style shaders. Therefore, they are an impediment to high-performance rendering.
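
A hypothetical sketch of the on-the-fly pattern being described (all of the snippet names, flags and the compile call below are made up for illustration): the shader source is rebuilt whenever the set of active effects changes, and the cost of the compile step is exactly what is being argued about here.

    /* With an assembly-style language the compile is close to a per-opcode
       table lookup; with a C-style language it is a full parse/optimize
       pass each time the effect combination changes. */
    char src[4096] = "";
    strcat(src, baseLightingSrc);            /* always present           */
    if (useFog)    strcat(src, fogSrc);      /* optional effect snippets */
    if (useDetail) strcat(src, detailSrc);
    compileAndBindShader(src);               /* placeholder for the real API call */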

Another detriment to C-style shaders (with their undefined limits) is that it allows the driver to impose arbitrary overhead on the user. The user should have an idea about exactly what is going on, because that information, for high-performance applications, is vitally important. It is important to know when the driver decides to multipass, and by how much (so that you can decide to do something else).

Having a standardized shader is different from having a standard shader with undefined limits. Having a standardized shader will prevent the platform-specific (by platform-specific, I mean GeForce 3/4 vs Radeon 8500) problems. Having limits will let developers of high-performance applications know about the hardware-specific issues (the difference between a GeForce 3 and a Radeon 7500).

This is important because developers of high-performance applications need to know about the hardware they are using. Remember, they are going to be making hardware-specific shaders anyway for lower-end hardware. They don't want to fall to the 5-pass F-Buffer method that runs in software; they'd rather use a different shader (and no mere compiler can gracefully degrade a shader). By not allowing them to know whether or not a shader will fit in one pass (because they no longer have control over such things), they cannot properly optimize their applications for the low-end solution. This kills OpenGL 2.0 as a high-performance API.

As to the F-buffer method for multipass, it may function, but it is hardly fast. Not only that, it requires additional resources, and in high-performance applications, those resources (extra memory) may make things run significantly slower.

You can hide many things in the driver layer. You can even justify it by saying that they are "implementation-specific details that are unimportant to the end-user". But, that justification is flawed. The high-performance user very much wants to know about these "implementation-specific details".

So far, OpenGL 2.0 looks like something that is fine for hobbyists, researchers, or even high-end graphics artists (movie production). It lets them do all the demos, effects research, and other work they want. However, it is not useful for high-performance graphics work. Not being able to know about things that directly affect performance is unforgivable and unacceptable. We need to know exactly how a shader is going to be implemented.

Oh, and for that analogy of people not using assembly in favor of C/C++, understand this: we game developers still, every now and then, go in and hand-code assembly. And we do it because we have to, in order to get the performance we want.


BTW, as for the notion that CPU's and GPU's should follow the same path (ie, towards C-style programming)... no. They should follow the same path only if it makes sense to. Graphics chips shouldn't follow that path if it is unreasonable for them to.

Remember, a GPU isn't, nor will it ever become, a second CPU. It is a special-purpose chip with special-purpose hardware and fixed functionality (the rasterizer, among others). The specific purpose of the GPU is understood: It gets certain input data from the CPU (called vertices), does some processing to create triangles and other interpolants, which are then rasterized and interpolated into "fragments", which are then passed to the fragment processor (along with other interpolant/processor-based memory accesses, commonly known as texture mapping), which outputs values to various destination memory locations (typically sequentially, which will make up the shape of a triangle).

As such, one has to wonder just how much sense it makes to use a C-style paradigm for GPU's. Since there are fundamental differences between them (and, even for shaders as advanced as Renderman, there are still fundamental differences), one wonders whether the assembly-esque (or perhaps slightly more advanced) variant will be more practical. Certainly, no real argument, besides ease-of-use and the fallacy of CPU==GPU, has been made for the use of C-style shader programming.

folker
07-08-2002, 11:49 AM
Originally posted by mcraighead:
Note that Cg is intended to work for both OpenGL and D3D. I consider this one of its biggest and most important features. One important question the ARB will face is: is the ARB willing to drop the OGL vs. D3D politics for just one second when talking about shading languages? There is no reason that a shading language must be tied to OGL. Indeed, tying it in this way would do developers a disservice, sending the message that the ARB would rather continue to play these politics than solve a real developer problem, i.e., letting developers write a shader that can work on either API.

Dare I say it -- the ARB might not even be the right standards body to approve a shading language, for the simple reason that the ARB _is_ OpenGL-centric. Who knows, maybe what we need eventually is an "ANSI Cg", where Cg and the MS language are analogous to early AT&T C dialects.
- Matt

Agreed, using the same language both for ogl and d3d would be nice. However, there is a problem:

DX9 HLSL / Cg is not powerful enough for the future OpenGL. DX9 HLSL / Cg does not have important features like ifs and loops, etc. So if the ARB were to make the gl2 shader language compatible with DX9 HLSL / Cg, the gl2 shader language would have to be a superset of DX9 HLSL / Cg. And that means future DX HLSL / Cg will only stay compatible with this ARB-approved gl2 shader language if Microsoft decides in future to define the missing features of DX HLSL compatibly with the ARB-approved gl2 shader language. But Microsoft can easily decide on an incompatible DX10 HLSL, maybe simply because they prefer some other syntax. In other words, since DX HLSL / Cg is not finished yet (loops, ifs, etc.), the ARB cannot simply decide to make the gl2 shader language compatible with it.

This means DX HLSL / Cg will only stay compatible with the ARB-approved gl2 shader language if Microsoft wants that. If Microsoft does not support this compatibility, the (future) DX HLSL won't be compatible with the ARB-approved gl2 shader language anyway.

So before the ARB decides to make the gl shader language compatible with DX HLSL, there should be clear support from Microsoft that it is willing to keep DX HLSL compatible with the gl2 shader language, too. Without that, compatibility between DX HLSL and the gl2 shader language is only a (short) sweet dream.

folker
07-08-2002, 12:24 PM
Originally posted by Korval:
Once again, we seem to be dancing around the issue at hand.

If there is to be a low-level assembly-esque language adopted as a standard, then it is much easier to specify hardware-imposed limits. It is easier to express limits like the number of constants used, register counts, etc when you're looking at something assembly-esque.


As already pointed out by several people: This is wrong.

Except for hardware lucky enough to match this assembly shader language, the driver must compile the standard assembly language into the hardware's own assembly language anyway, so there is no one-to-one mapping between instruction counts, register counts, etc.

And therefore defining hardware limits for an assembly language is as difficult as for a high-level language.



However, if you don't like the idea of seeing the limits, then you will likely prefer the higher-level shading language. Naturally, it hides all of the supposedly unimportant details of any particular implementation, and makes them all look the same (regardless of performance costs).

Basically, there are just two points of view: If you believe that OpenGL should expose the limits of the hardware in a standard way, an assembly-esque language is the best type of shading language. If you believe that OpenGL should impose the significant overhead of allowing for higher-than-hardware limits to shader resources, then you're going to support a C-style interface language.


You assume "significant overhead" as it would be a fact.



Here's something that nobody's even come close to discussing that is also important to this topic: compilation time. An assembly-esque compiler is very simple. For each standardized opcode, it will have a table entry for compiling that opcode into some amount of microcode or state setting. This is a fairly trivial operation. For a C-style language, you need a highly complex compiler that will take a non-insignificant amount of time to perform this task.

In a high-performance situation (ie, a game), one may want to compile shaders on the fly. I understand that the main reason to do shader compiling nowadays is the lack of looping, but even in the future, one may want to dynamically add/delete various effects out of a shader. One could do this by having constants in the shader that add/delete the various effects, but that impacts shader processing time. Instead, to create high-performance shaders, one will want to compile shaders dynamically.

C-style shaders will take much longer to compile than assembly-style shaders. Therefore, they are an impediment to high-performance rendering.


Exactly this aspect is addressed by the shader link mechanism of the OpenGL 2.0 proposal: you can link different shader code parts together on the fly very quickly.

See the OpenGL 2.0 specs for details.

And an OpenGL 2.0 driver implementation may do this linking much more quickly than your suggestion of recompiling the complete shader on the fly, which is not a good idea.
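
A hypothetical sketch of the compile-link split being described (the gl2* names are placeholders for the object model in the 3Dlabs papers, not real entry points, and compileShaderPart is assumed to wrap the create/source/compile calls): the expensive compile happens once per shader part, and only the cheaper link runs when effects are combined differently.

    GL2handle base   = compileShaderPart(baseLightingSrc);
    GL2handle fog    = compileShaderPart(fogSrc);
    GL2handle detail = compileShaderPart(detailSrc);

    /* per effect combination: attach the parts you need and relink */
    GL2handle pgm = gl2CreateProgramObject();
    gl2AttachShaderObject(pgm, base);
    if (useFog)    gl2AttachShaderObject(pgm, fog);
    if (useDetail) gl2AttachShaderObject(pgm, detail);
    gl2LinkProgram(pgm);
    /* the proposal also lets you query the linked shader's complexity,
       e.g. how many passes it will take on this implementation */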



Another detriment to C-style shaders (with their undefined limits) is that it allows the driver to impose arbitrary overhead on the user. The user should have an idea about exactly what is going on, because that information, for high-performance applications, is vitally important. It is important to know when the driver decides to multipass, and by how much (so that you can decide to do something else).


The OpenGL 2.0 proposal also addresses exactly this point: you can query the complexity (number of passes) of a shader.

See the OpenGL 2.0 specs for details.



Having a standardized shader is different from having a standard shader with undefined limits. Having a standardized shader will prevent the platform-specific (by platform-specific, I mean GeForce 3/4 vs Radeon 8500) problems. Having limits will let developers of high-performance applications know about the hardware-specific issues (the difference between a GeForce 3 and a Radeon 7500).

This is important because developers of high-performance applications need to know about the hardware they are using. Remember, they are going to be making hardware-specific shaders anyway for lower-end hardware. They don't want to fall to the 5-pass F-Buffer method that runs in software; they'd rather use a different shader (and no mere compiler can gracefully degrade a shader). By not allowing them to know whether or not a shader will fit in one pass (because they no longer have control over such things), they cannot properly optimize their applications for the low-end solution. This kills OpenGL 2.0 as a high-performance API.


Every hardware vendor can expose hardware-specific assembly shaders, as also pointed out several times.

But of course such hardware-specific interfaces should not be part of a future-oriented and hardware-independent core API.

It's the same as for CPU programming: It is very important to have a platform independent language C / C++. And if a developer wants to optimize for a particular hardware, he can program assembly.



As to the F-buffer method for multipass, it may function, but it is hardly fast. Not only that, it requires additional resources, and in high-performance applications, those resources (extra memory) may make things run significantly slower.


First, why do you assume that it is significantly slower? For example, if you split a shader using 10 TMUs into two passes only using 5 TMUs, then we have five accesses to texture memory and one access to F-buffer (instead of a frame buffer access in a manual multipass solution). So I would say that in most situations there should be no relevant performance difference.

But agreed, automatic multipass may often be "somewhat slower" than hand-optimized code for a particular piece of hardware, in the same way that assembly code for a particular CPU is often faster than C code.

But is this a reason for not using a high-level language? Do you also conclude from this that we should develop all software for CPUs in assembly instead of C?



You can hide many things in the driver layer. You can even justify it by saying that they are "implementation-specific details that are unimportant to the end-user". But, that justification is flawed. The high-performance user very much wants to know about these "implementation-specific details".


Right. And if you want to access these hardware specific implementation details, you can.

It reminds me very much of the C-versus-assembly discussion for games. Of course, assembly is the fastest. But C is also very fast if you have a reasonably good compiler, and if the developer understands the compiler. So all in all, I think everyone agrees that C is a very performant programming language for CPUs.

It will be the same for GPUs.



So far, OpenGL 2.0 looks like something that is fine for hobbyists, researchers, or even high-end graphics artists (movie production). It lets them do all the demos, effects research, and other work they want. However, it is not useful for high-performance graphics work. Not being able to know about things that directly affect performance is unforgivable and unacceptable. We need to know exactly how a shader is going to be implemented.


As mentioned above:
If your arguments were right, everyone would write all performance-critical software (e.g. games) completely in assembly and not in C...

In the same way that C is well suited for performance-critical software (while still allowing every developer to write assembly optimizations), a C-like language is well suited for performance-critical graphics software (while still allowing every developer to write hardware-specific, non-standardized assembly shaders).



Oh, and for that analogy of people not using assembly in favor of C/C++, understand this: we game developers still, every now and then, go in and hand-code assembly. And we do it because we have to, in order to get the performance we want.


Do you write your complete application in assembly? You don't use C / C++ at all?



BTW, as for the notion that CPU's and GPU's should follow the same path (ie, towards C-style programming)... no. They should follow the same path only if it makes sense to. Graphics chips shouldn't follow that path if it is unreasonable for them to.

Remember, a GPU isn't, nor will it ever become, a second CPU. It is a special-purpose chip with special-purpose hardware and fixed functionality (the rasterizer, among others). The specific purpose of the GPU is understood: It gets certain input data from the CPU (called vertices), does some processing to create triangles and other interpolants, which are then rasterized and interpolated into "fragments", which are then passed to the fragment processor (along with other interpolant/processor-based memory accesses, commonly known as texture mapping), which outputs values to various destination memory locations (typically sequentially, which will make up the shape of a triangle).

As such, one has to wonder just how much sense it makes to use a C-style paradigm for GPU's. Since there are fundamental differences between them (and, even for shaders as advanced as Renderman, there are still fundamental differences), one wonders whether the assembly-esque (or perhaps slightly more advanced) variant will be more practical. Certainly, no real argument, besides ease-of-use and the fallacy of CPU==GPU, has been made for the use of C-style shader programming.


Right, there are fundamental differences between CPU and GPU programming. Especially that GPUs cannot be programmed generically; for example, each fragment program can only access the data of one fragment.

But regarding the compiler / assembly / high-level language issues for shaders, the analogy is completely valid.

Why don't we learn from the long experiences regarding C / assembly programming languages?

But anyway, you avoid the central question: should every shader run on all hardware (differing only in performance, and of course also giving a developer hardware-specific low-level access if he wants it)? Or do you want every developer to ALWAYS HAVE TO write multiple hardware-specific shaders to implement the same functionality?


[This message has been edited by folker (edited 07-08-2002).]

cass
07-08-2002, 12:30 PM
Originally posted by folker:
Agreed, using the same lanuage both for ogl and d3d would be nice. However, there is a problem:

DX9 HLSL / Cg are not powerful enough for the future OpenGL. DX9 HLSL / Cg does not have important features like if's and loops etc.


The Cg language does support ifs and loops.

Currently available profiles may limit or even prohibit some functionality - based on what the underlying hardware can actually support. Profiles for future hardware will relax these restrictions, however.

Thanks -
Cass

edited to fix up the bold /bold tag in the quote...


[This message has been edited by cass (edited 07-08-2002).]

knackered
07-08-2002, 12:37 PM
BTW, whose bright idea was it to use enums to reference texture units in the multitexture extension? And for that matter, why was the secondary colour extension implemented as if nobody would ever have need of a third or fourth colour? What's wrong with passing integers in!?

There...I've been wanting to say that for years. http://www.opengl.org/discussion_boards/ubb/smile.gif

folker
07-08-2002, 12:44 PM
Originally posted by cass:
The Cg language does support ifs and loops.

Currently available profiles may limit or even prohibit some functionality - based on what the underlying hardware can actually support. Profiles for future hardware will relax these restrictions, however.

Thanks -
Cass

edited to fix up the bold /bold tag in the quote...


[This message has been edited by cass (edited 07-08-2002).]

Right.

But anyway, apart from such detail questions, the main point is:
Realistically, DX9 HLSL / Cg will only stay compatible with a gl2 shader language if Microsoft wants that. The ARB cannot simply decide to make the languages compatible.


[This message has been edited by folker (edited 07-08-2002).]

folker
07-08-2002, 12:45 PM
Originally posted by knackered:
BTW, who's bright idea was it to use enums to reference texture units in the multitexture extension? And for that matter, why was the secondary colour extension implemented as if nobody would ever have need of a third or fourth colour? What's wrong with passing integers in!?

There...I've been wanting to say that for years. http://www.opengl.org/discussion_boards/ubb/smile.gif

Do you feel better now? It was good to talk about it. http://www.opengl.org/discussion_boards/ubb/wink.gif

knackered
07-08-2002, 01:00 PM
Indeed I do, folker. http://www.opengl.org/discussion_boards/ubb/smile.gif
If you're gonna design GL2.0, you should start again - forget legacy support. Make it a C++ interface, have texture objects, vertex objects, shader objects - but within these objects, have really, really general-purpose methods that accept constant enums to dictate exactly what they do (such as shader.Command(GL2_ENABLE_SHADER), just to test everyone's memories, and, and....****, that's Direct3D, isn't it? http://www.opengl.org/discussion_boards/ubb/smile.gif http://www.opengl.org/discussion_boards/ubb/smile.gif

secnuop
07-08-2002, 01:35 PM
Korval, I think your arguments all work if you assume that an assembly-like language is basically a direct mapping between assembly instructions and hardware opcodes. This is where I disagree. If the assembly-like language must be compiled into hardware opcodes I don't think things work out so nicely.


Originally posted by Korval (all of these quotes are):
If there is to be a low-level assembly-esque language adopted as a standard, then it is much easier to specify hardware-imposed limits. It is easier to express limits like the number of constants used, register counts, etc when you're looking at something assembly-esque.

What if an assembly instruction is not supported natively in hardware? A cross-product instruction is a good example of a case where this might occur. If a cross-product instruction is not supported natively the driver compiler will have to expand this into two instructions - a MUL and a MAD. How does this get counted then? Do you specify all of your limits low enough so that if somebody specified a program with nothing but cross products (which would be absolutely silly, but still legal) it will still run in hardware (or "fast", whatever that means)?
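
To make the counting problem concrete, here is the usual two-instruction expansion of a cross product, written out in plain C (the swizzle patterns are the ones a driver would typically use; on shader hardware the swizzles and the source negation are free, so this really is one MUL plus one MAD):

    /* What a driver might emit for a cross product on hardware with no
       native XPD instruction. */
    void cross3(const float a[3], const float b[3], float out[3])
    {
        float t[3];
        /* MUL t, a.zxy, b.yzx */
        t[0] = a[2]*b[1];  t[1] = a[0]*b[2];  t[2] = a[1]*b[0];
        /* MAD out, a.yzx, b.zxy, -t */
        out[0] = a[1]*b[2] - t[0];
        out[1] = a[2]*b[0] - t[1];
        out[2] = a[0]*b[1] - t[2];
    }

So a program written entirely with cross products would roughly double its instruction count on such hardware, which is exactly why the "count against the limits" question is awkward.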

What if you have an instruction that needs some temporary space to be emulated? How do you determine register usage of this program?

I don't think it's possible to know the resource usage of your program until the driver has had a chance to look it over, since native hardware limits by definition can change depending on the hardware the program will be run on!



Basically, there are just two points of view: If you believe that OpenGL should expose the limits of the hardware in a standard way, an assembly-esque language is the best type of shading language. If you believe that OpenGL should impose the significant overhead of allowing for higher-than-hardware limits to shader resources, then you're going to support a C-style interface language.

Let me turn your statements into questions. How does an assembly-like language expose the limits of the hardware if each assembly language instruction doesn't directly map to a hardware opcode? I don't understand the statement about "significant overhead" and higher level languages, but don't the same limits exist regardless of whether you specify the program in a high-level or low-level language?

(Aside: I see two very interesting and related discussions happening in this thread, but I don't think they're the same issue. The problem of what to do with program limits exists whether the program is specified in a high level language or a low level language.)



Here's something that nobody's even come close to discussing that is also important to this topic: compilation time. An assembly-esque compiler is very simple. For each standardized opcode, it will have a table entry for compiling that opcode into some amount of microcode or state setting. This is a fairly trivial operation. For a C-style language, you need a highly complex compiler that will take a non-insignificant amount of time to perform this task.

Again, I agree completely, if the assumption is that the shader compiler simply has to translate between assembly language instruction and hardware opcode. Unfortunately, I expect that this won't be true in the majority of the cases, since not all hardware is created equally.

Vertex program compilation time WILL be an issue even with the current assembler-like languages, since the driver will have to compile the assembly language instructions.



Another detriment to C-style shaders (with their undefined limits) is that it allows the driver to impose arbitrary overhead on the user. The user should have an idea about exactly what is going on, because that information, for high-performance applications, is vitally important. It is important to know when the driver decides to multipass, and by how much (so that you can decide to do something else).

This is precisely the illusion that assembler-like languages provide. I strongly believe that you won't know what's going on in the driver with any of the current proposals, be they assembler-like or c-like. If this is incorrect, it means that everybody is designing their hardware to have extremely similar architectures and hardware instruction sets, which I don't think is very likely.

Maybe what we really need is a different scheme altogether. What if there were two methods of specifying vertex / fragment programs to the driver.

One method would be through a standard high-level language. Perhaps the generated low-level assembler-like results could be returned to the user in a non-standard (read: vendor specific) form so that an ISV could see the code the driver compiler produces. Think of this like the compile-to-assembler option of your favorite high-level compiler.

The other method would be to input low-level vendor specific assembler directly. Perhaps the mechanism to specify this would be standard but the actual code wouldn't be. This would give direct control over the shader code and a mechanism to try to produce better code than the driver compiler.

I'm sure this proposal could be improved even more, but I think this would provide the best of both worlds. Food for thought, at least. http://www.opengl.org/discussion_boards/ubb/smile.gif

Korval
07-08-2002, 01:54 PM
On my comment about assembly-esque languages being easy to show limits in:


As already pointed out by several people: This is wrong.

Except for hardware which has the luck that it matches this assembly shader language, the driver anyway must compile this standard assembly language into a hardware assembly language, and so there is no one-to-one mapping anyway between number of instructions, register counts etc.

That's a logical fallacy. By that logic, a C-like language can't be used unless the hardware matches the C-style of the language. Both of them will be compiled, so the compiler can convert them however they want. If a compiler can compile a C-style language into the hardware-specific format for the GPU, a compiler can compile any assembly-esque language into it as well.

Even current vertex shaders likely aren't simple processors that have a "log" opcode; more likely, the assembly is compiled into some form of really-low-level microcode that gets uploaded and executed.

However, the hardware will, almost certainly, still have registers, constants, and so on, much like it does today. Therefore, an assembly-like language is still a reasonable low-level model. Also, the assembly model does make it easier to specify limits. That is something you cannot deny.


You assume "significant overhead" as if it were a fact.

No more than you assume "insignificant overhead" to be a fact. But we'll get to disproving your "fact" in a moment.


You can link together on the fly different shader code parts very quickly.

No, it can piece together different shader parts. How quickly it can do so is implementation dependent. Don't assume it's going to be fast just because it seems simple on the surface.


Every hardware vendor can expose hardware-specific assembly shaders, as also pointed out several times.

What good is that? Since I've taken out your best (and only) argument against a standardized assembly-esque shader, why not have one? Having platform-specific shader languages simply gives the problem we have now.


It's the same as for CPU programming: It is very important to have a platform independent language C / C++. And if a developer wants to optimize for a particular hardware, he can program assembly.

Saying that it is the same for CPU programming doesn't make it right or reasonable. A CPU is not a GPU. A GPU has to do much more time-critical tasks than the CPU, and it is doing them repeatedly. Losing one or two cycles in code that is run 50 times per frame on the CPU isn't an issue. Losing one or two cycles in code that is run on 500,000 vertices (or 500,000,000 fragments) per frame is a significant performance issue. And the near future isn't going to change that too significantly.


First, why do you assume that it is significantly slower? For example, if you split a shader using 10 TMUs into two passes only using 5 TMUs, then we have five accesses to texture memory and one access to F-buffer (instead of a frame buffer access in a manual multipass solution). So I would say that in most situations there should be no relevant performance difference.

Well, you would be wrong. First, you're making the assumption (that I disproved some time ago) that a 10-texture shader can be run in 2 passes on 5-texture hardware. This is not necessarily the case; it depends significantly on the particular shader.

Secondly, assuming a 2-pass solution is possible, that means using twice the fillrate. It means running the vertex shader twice (and I don't want to hear anything about how you can store the intermediates, because I've already shown that you can't). Not only that, depending on the shader, you may be doing the same operations twice on the vertex shader because one of the texture generation algorithms needs results from one of the first five texture gens.

Next, you're taking up valuable video RAM space for the F-buffer, thus leaving less space for textures. You're also hitting the memory 2 extra times per pixel (once into the F-buffer for the first pass, again for the F-buffer in the second).

Also, you have to do a copy operation to get the F-buffer values into the framebuffer. This has the disadvantage of causing a full pipeline stall (since subsequent triangles may need access to the pixels you're writing) while you update the framebuffer.

Using an F-buffer approach makes blending that much more difficult as well (especially if the framebuffer value is part of the fragment shader program). That, alone, might make it impossible to guarantee that all shaders will compile.

In short, it is going to be at least twice as slow as the single pass technique, if not more. It may not even make things possible.

Oh, sure, you're not pushing the memory bandwidth too much more than a single-pass method. But you're still guaranteed to take no less than twice as long as the single-pass method (all other things being equal, of course).
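
For what it's worth, here is the back-of-the-envelope traffic count behind that claim, for the 10-texture shader split across two 5-texture passes. The byte sizes are assumptions picked for illustration, not measurements of any particular card.

#include <stdio.h>

int main(void)
{
    const int tex_bytes  = 4;   /* assumed bytes read per texture fetch */
    const int fbuf_bytes = 16;  /* assumed bytes per F-buffer record    */
    const int cbuf_bytes = 4;   /* final color buffer write             */

    int single_pass = 10 * tex_bytes + cbuf_bytes;
    int two_pass    = 5 * tex_bytes + fbuf_bytes   /* pass 1: write F-buffer */
                    + 5 * tex_bytes + fbuf_bytes   /* pass 2: read it back   */
                    + cbuf_bytes;

    printf("single pass: %d bytes per fragment\n", single_pass);
    printf("two passes : %d bytes per fragment, plus every fragment and\n"
           "             vertex being processed twice\n", two_pass);
    return 0;
}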


But is this a reason for not using a high-level language? Do you also conclude from this that we should develop all software for CPUs in assembly instead of C?

If code is time-critical (like all GPU code is) and it can be optimized further than a compiler, then yes, it should be rewritten in assembly. That's the way us game developers handle things.


It reminds me very much of the discussion of C versus assembly for games. Of course, assembly is the fastest. But C is also very fast if you have a reasonably good compiler, and if the developer understands the compiler. So all in all, I think everyone agrees that C is a very performant programming language for CPUs.

It reminds me more of arguments I've heard for making processors that natively understand C. I think we can see how relatively silly and useless that argument is.


It will be the same for GPUs.

A pretty bold statement considering the vast differences between CPUs and GPUs.


If your arguments were right, everyone would write all performance-critical software (e.g. games) completely in assembly and not in C...

What transparent multipass and C have in common, I don't know. But my argument is against transparent multipass. Because of that, I support exposed hardware limits and the assembly-like language that makes exposing hardware limits possible.

If you could find a way to make a C-like language where I could see the hardware-exposed limits, I would be fine with that. Unfortunately, that's very difficult, if not impossible.


In the same way as C is well suited for performance-critical software (and allowing every developer to write assembly optimizations), a C-like language is also well suited for performance-critical graphics software (and allowing every developer to write hardware-specific non-standardized assembly shaders).

See above: CPU != GPU.


Do you write your complete application in assembly? You don't use C / C++ at all?

Maybe you should read what I said. I said, "Oh, and for that analogy of people not using assembly in favor of C/C++, understand this: us game developers still, every now and then, go in and hand-code assembly. And we do it because we have to, in order to get the performance we want." I said nothing about entire applications.



Should every shader run on every piece of hardware (only differing in performance, and of course also giving a developer hardware-specific low-level access if he wants it)?

No. A shader should run if the hardware is within the given limits that the shader requires. Performance is a more important consideration than general ease of use for the developer.


Or do you want every developer to ALWAYS HAVE TO write multiple hardware-specific shaders to implement the same functionality?

It's nice to see you putting words in my mouth. I said nothing about hardware-specific shaders. I don't want hardware-specific shaders. As I said, I want a single, assembly-esque shader language that all hardware developers will use as the standard, with an interface to see the hardware-specific limitations on a particular shader.

And what's wrong with that?

folker
07-08-2002, 02:21 PM
BTW, a minor remark:
Currently DX9 HLSL / Cg cannot be compatible with a gl2 shader language, because the tex1D/tex2D etc. functions of Cg don't take separate arguments for the texture stage and texture coordinates.

This not only excludes future hardware, but does not even include the functionality provided by d3d8 1.4 pixel programs, where you can generate texture coordinates for texture lookup.

In other words, without modifications of D3D9 HLSL / Cg, a gl2 shader language (which must not have such restrictions) cannot be compatible with it.

cass
07-08-2002, 02:45 PM
Originally posted by folker:
BTW, a minor remark:
Currently DX9 HLSL / Cg cannot be compatible with a gl2 shader language, because the tex1D/tex2D etc. functions of Cg don't take separate arguments for the texture stage and texture coordinates.
. . .
In other words, without modifications of D3D9 HLSL / Cg, a gl2 shader language (which must not have such restrictions) cannot be compatible with it.



Folker,

The texture fetching calls are defined in the standard library on a per-profile basis. It did not make sense to have "generic" texture fetch calls for DX8 ps 1.1 pixel shaders.

Future profiles will certainly relax that restriction and provide more generic access.

This is not a language thing - it's a profile thing.

Thanks -
Cass

folker
07-08-2002, 03:18 PM
Originally posted by Korval:
That's a logical fallacy. By that logic, a C-like language can't be used unless the hardware matches the C-style of the language. Both of them will be compiled, so the compiler can convert them however they want. If a compiler can compile a C-style language into the hardware-specific format for the GPU, a compiler can compile any assembly-esque language into it as well.

Even current vertex shaders likely aren't simple processors that have a "log" opcode; more likely, the assembly is compiled into some form of really-low-level microcode that gets uploaded and executed.

However, the hardware will, almost certainly, still have registers, constants, and so on, much like it does today. Therefore, an assembly-like language is still a reasonable low-level model. Also, the assembly model does make it easier to specify limits. That is something you cannot deny.



What good is that? Since I've taken out your best (and only) argument against a standardized assembly-esque shader, why not have one? Having platform-specific shader languages simply gives the problem we have now.

Saying that it is the same for CPU programming doesn't make it right or reasonable. A CPU is not a GPU. A GPU has to do much more time-critical tasks than the CPU, and it is doing them repeatedly. Losing one or two cycles in code that is run 50 times per frame on the CPU isn't an issue. Losing one or two cycles in code that is run on 500,000 vertices (or 500,000,000 fragments) per frame is a significant performance issue. And the near future isn't going to change that too significantly.


If you really want to optimize your code (or want to define hardware limits), you need the hardware-specific asm language. A hardware-independent language (asm or C-like) won't help you, simply because hardware architectures are too different.

Real-world example: the gf4 vertex hardware is a vec4 vector processor, whereas the P10 vertex hardware is a scalar processor (see for example http://www.anandtech.com/video/showdoc.html?i=1614&p=5 )

Thus, what do you suggest? Should your hardware-independent asm shader language be a vec4 language or not? There is no asm language directly representing both the gf4 and P10 hardware.

So if you really want to optimize perfectly for a gf4 and for a P10, you would need to use the gf4 vec4 asm language to optimize for the gf4, but the P10 scalar asm language to optimize for the P10.

How do you want to define hardware limits? Do you want to define the limits in terms of number of vec4 commands or number of scalar commands? Obviously it does not work out.
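
To illustrate the counting problem with a toy example: the same five-instruction vec4 vertex program, counted once as vec4 instructions and once as scalar instructions. The expansion factors here are assumptions for illustration, not the real gf4 or P10 numbers.

#include <stdio.h>

typedef struct { const char *vec4_op; int scalar_cost; } Instr;

int main(void)
{
    /* A toy transform-plus-lighting snippet written for a vec4 machine;
       the per-instruction scalar costs are assumed (e.g. DP4 = 4 MUL + 3 ADD). */
    Instr prog[] = {
        { "DP4 oPos.x, v0, c0", 7 },
        { "DP4 oPos.y, v0, c1", 7 },
        { "DP4 oPos.z, v0, c2", 7 },
        { "DP4 oPos.w, v0, c3", 7 },
        { "MAD r0, v1, c4, c5", 4 },   /* one MAD per component */
    };
    int n = sizeof prog / sizeof prog[0];
    int scalar = 0;
    for (int i = 0; i < n; ++i) {
        printf("%-22s -> %d scalar instructions\n",
               prog[i].vec4_op, prog[i].scalar_cost);
        scalar += prog[i].scalar_cost;
    }
    printf("vec4 count: %d, scalar count: %d\n", n, scalar);
    /* A limit of, say, "128 instructions" means something different on each. */
    return 0;
}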

If this all does not work even for today's hardware, what about future hardware?

Why don't we learn from CPUs? Intel, Motorola, etc. all have different assembly languages; the standard is not an asm language, but C / C++. There is a reason for this...



No, it can piece together different shader parts. How quickly it can do so is implementation dependent. Don't assume its going to be fast just because it seems simple on the surface.


How quickly you can compile a complete shader is also implementation dependent. So what is the point?

And linking supports a more efficient implementation than recompiling the complete shader. Linking is specifically designed to solve this problem you mentioned in an optimal way.



Well, you would be wrong. First, you're making the assumption (that I disproved some time ago) that a 10-texture shader can be run in 2 passes on 5-texture hardware. This is not necessarily the case; it depends significantly on the particular shader.

Secondly, assuming a 2-pass solution is possible, that means using twice the fillrate. It means running the vertex shader twice (and I don't want to hear anything about how you can store the intermediates, because I've already shown that you can't). Not only that, depending on the shader, you may be doing the same operations twice on the vertex shader because one of the texture generation algorithms needs results from one of the first five texture gens.

Next, you're taking up valuable video RAM space for the F-buffer, thus leaving less space for textures. You're also hitting the memory 2 extra times per pixel (once into the F-buffer for the first pass, again for the F-buffer in the second).

Also, you have to do a copy operation to get the F-buffer values into the framebuffer. This has the disadvantage of causing a full pipeline stall (since subsequent triangles may need access to the pixels you're writing) while you update the framebuffer.

Using an F-buffer approach makes blending that much more difficult as well (especially if the framebuffer value is part of the fragment shader program). That, alone, might make it impossible to guarantee that all shaders will compile.

In short, it is going to be at least twice as slow as the single pass technique, if not more. It may not even make things possible.

Oh, sure, you're not pushing the memory bandwidth too much more than a single-pass method. But you're still guaranteed to take no less than twice as long as the single-pass method (all other things being equal, of course).


You miss the point: you argue that a single-pass version of a shader is faster than a multipass version of the same shader functionality.

But the point is: if a shader's functionality cannot be executed in a single pass due to hardware limitations, why should a manually coded two-pass implementation be (significantly) faster than an automatically generated two-pass version of the same shader functionality?



Using an F-buffer approach makes blending that much more difficult as well (especially if the framebuffer value is part of the fragment shader program). That, alone, might make it impossible to guarantee that all shaders will compile.


No, see my post from 07-07-2002 03:58 AM.



If code is time-critical (like all GPU code is) and it can be optimized further than a compiler, then yes, it should be rewritten in assembly. That's the way us game developers handle things.

It reminds me more of arguments I've heard for making processors that natively understand C. I think we can see how relatively silly and useless that argument is.

A pretty bold statement considering the vast differences between CPUs and GPUs.

What transparent multipass and C have in common, I don't know. But my argument is against transparent multipass. Because of that, I support exposed hardware limits and the assembly-like language that makes exposing hardware limits possible.

If you could find a way to make a C-like language where I could see the hardware-exposed limits, I would be fine with that. Unfortunately, that's very difficult, if not impossible.

See above: CPU != GPU.

Maybe you should read what I said. I said, "Oh, and for that analogy of people not using assembly in favor of C/C++, understand this: us game developers still, every now and then, go in and hand-code assembly. And we do it because we have to, in order to get the performance we want." I said nothing about entire applications.

No. A shader should run if the hardware is within the given limits that the shader requires. Performance is a more important consideration than general ease of use for the developer.

It's nice to see you putting words in my mouth. I said nothing about hardware-specific shaders. I don't want hardware-specific shaders. As I said, I want a single, assembly-esque shader language that all hardware developers will use as the standard, with an interface to see the hardware-specific limitations on a particular shader.

And what's wrong with that?


You seem to be really afraid of a high-level shader language. Did you already tell Microsoft and Nvidia that DX9 HLSL / Cg are stupid ideas? http://www.opengl.org/discussion_boards/ubb/wink.gif


[This message has been edited by folker (edited 07-08-2002).]

folker
07-08-2002, 03:32 PM
Originally posted by cass:
Folker,

The texture fetching calls are defined in the standard library on a per-profile basis. It did not make sense to have "generic" texture fetch calls for DX8 ps 1.1 pixel shaders.

Future profiles will certainly relax that restriction and provide more generic access.

This is not a language thing - it's a profile thing.

Thanks -
Cass



Hm...

In other words, the Cg profile exposes hardware-specific things to the Cg language, right? But this means that one particular Cg shader may not compile with every profile, right?

And this also means that a gl2 shader will not compile with (most or all) Cg profiles anyway? So, in principle, there cannot be a gl2 shader language such that (all) gl2 shaders run with all Cg profiles and vice versa, right?

Then why should we try to make a gl2 SL compatible (whatever this means) with DX9 HLSL / Cg, if you cannot compile gl2 shaders with DX9 HLSL / Cg or vice versa anyway?

(In my understanding, "compatibility" between languages includes elementary built-in functions so that every source always compiles. Note that ANSI C/C++ also includes a specification of the standard C/C++ library for a good reason.)


[This message has been edited by folker (edited 07-08-2002).]

Thaellin
07-08-2002, 04:24 PM
In an attempt to clarify a concept I'm seeing on this thread:

Some people are arguing that hand-coded assembler allows you to write the fastest possible shader. No one is really trying to contradict this.

I think one post tried to validate the possibility of a multi-vendor shader assembly language by pointing out that AMD and Intel x86 processors shared the same assembler language. This is closer to the point, but misses out an important aspect of the problem.

If I were to define an assembler language for use on all PowerPC and x86 processors, then I would have to either favor one hardware representation over the other, or make one language which does not truly represent either architecture's capabilities. The argument towards a single assembler language for GPUs has the same flaw. Like the CPUs, GPUs have the same 'basic' purpose, but they are free to go about accomplishing that purpose in any number of ways. Placing one assembler language over all GPUs either makes one GPU manufacturer happy at the expense of others, or abstracts the functionality of all hardware equally.

Since the assembler language cannot avoid abstracting one or more significant vendors' hardware designs, what benefit does an assembler language hold at all? You cannot assume that you are writing to the metal. With this (the primary advantage of an assembler-level language) erased, why not standardize on a higher-level language that is not only easier to write, comprehend, and maintain, but is very straightforward about the fact that it /is/ an abstraction?

You WILL have an abstraction, no matter the language. That's what we're faced with when coding for different hardware architectures. Would you rather simply pretend that the assembler-language shader you spent an hour writing is on a hardware level where the HLL version I wrote in five minutes is not?

Vendors can still expose a /true/ hardware-level shader language through extensions, and this will allow developers to code fast paths for those cards. Which would you prefer: a truly hardware-representing vendor-specific asm SL, or a pseudo-assembler language for all cards?

I don't even see how there /can/ be an argument on this point.

-- Jeff

Korval
07-08-2002, 05:07 PM
See secnuop's post from 07-08-2002 09:06 AM. He explains why assembly language does not make it easier to define hardware limits.

It's funny. His post doesn't talk about how an assembly-esque language doesn't make it easier to define hardware limits. In fact, he isn't talking about hardware limits in his post at all. Instead, he is discussing how assembly-esque languages would only be assembled into opcodes on one piece of hardware, and that the others would have to compile it.

I pointed out that they are, almost certainly, already compiling vertex shaders into platform-specific microcode. It is a process that already likely does some optimization. It isn't a C-compiler by any means, but it's not a basic assembler either.


Linking is specifically designed to solve the problem you mentioned in the optimal way.

At the linking level, perhaps. But, certainly not at the compiler level. It is entirely possible that linked shaders are less optimized than fully compiled ones. And, if they're not, then you're still doing a lot of work searching through the two linked shaders for points to optimize/inline.


Well if you really want low-level control, you need to access the hardware-specific language, not a hardware-independent language (whether it is asm or C-like).

Apparently, you haven't noticed that I've avoided the use of this "low-level" term. The reason I want an assembly-esque shader language is exactly as I have stated before: I don't believe transparent multipass is something OpenGL should handle. Since a C-style language would make defining hardware limits very difficult (and defining hardware limits is vital if you don't have transparent multipass), the language should be an assembly-style one.

I'm not looking for low-level control here. I'm looking for having hardware-defined limits for shaders.


But the point is: if a shader's functionality cannot be executed in a single pass due to hardware limitations, why should a manually coded two-pass implementation be (significantly) faster than an automatically generated two-pass version of the same shader functionality?

That may be your point, but it isn't my point. My point is that a performance-programmer isn't going to use the 2-pass system at all, either self-coded or driver-implemented. Why? Because it's slow. Therefore transparent multipass is utterly useless to us.

My argument against transparent multipass is this. As it stands now, the only feedback I get is a bool: yes, your shader can work (in one pass, for I refuse to multipass), or no, it can't. I have no idea why my shader was rejected by the driver. And, therefore, I have no idea how to design a shader that runs in one pass. As a high-performance programmer, this is unacceptable.


[Edit:] I hope that Thaellin will see my point that the C-style language will not be able to define hardware limits, and will therefore require transparent multipass. Otherwise, I'd be fine with it.

[This message has been edited by Korval (edited 07-08-2002).]

cass
07-08-2002, 10:31 PM
Originally posted by folker:
Hm...

In other words, the Cg profile exposes hardware-specific things to the Cg language, right? But this means that one particular Cg shader may not compile with every profile, right?


Folker,

That's right. Cg profiles - in addition to specifying the part of the programmable pipeline that they target - expose the capabilities (and limitations) of their low-level targets, which are based on the API features and supported extensions.

This makes Cg useful today on millions of GPUs, and allows Cg developers to write in a high-level language while targeting a well-defined set of resources. And they can even see / use the resulting low-level assembly program.

The Cg profile model allows compiler vendors (which are not necessarily IHVs) to gracefully relax old restrictions, expose new hardware capabilities, and even expose new programmable parts in the graphics pipeline. This is a very important degree of freedom to maintain because we've really only seen the tip of the GPU programmability iceberg.

Thanks -
Cass

Ugh. I always wind up having to fix the UBB markup.


[This message has been edited by cass (edited 07-09-2002).]

Nakoruru
07-09-2002, 12:04 AM
Translating from one assembly language to another is an incredibly straightforward process, especially when the source assembly language is more restrictive than the target. I would hardly call it 'compiling', as many people imply. In fact, the process is so similar to a 'normal' assembler that it should still be called 'assembling'.

What is assembling anyway? It's translating op codes into binary machine code one op code at a time, with each op code producing a completely predictable machine code output, which is then put in a completely predictable place. The only tricky thing about an assembler is labels, which allow you to use a symbol instead of an absolute address.

This is the same linear, predictable process whether 'CROSS' gets translated directly into a single machine instruction that does a cross product or into a 'MUL' followed by a 'MAD'.

Also, there is nothing to keep me from saying that 'CROSS' is a mnemonic in the latter assembly language, even if the hardware does not support a single CROSS instruction.

So, you still have completely predictable hardware limits. The example someone wanted an explanation for was: what if the standard assembly supports a CROSS instruction, but the actual target only supports MUL and MAD? Well, if this is the worst case (the worst that can happen is one instruction becomes two), then the actual target will simply have to have twice as much room for its instructions as the standard requires.

I.E., if the standard requires that you be able to load 128 op codes, then it will have to have 256 places. Then you can make a program with 128 CROSS instructions, and it will match the standard assembly language.
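
For what it's worth, the CROSS-as-MUL-plus-MAD expansion is easy to write out; here it is in C, with the swizzles spelled out by hand (one valid decomposition among several, assuming the MAD can negate a source):

#include <stdio.h>

typedef struct { float x, y, z; } Vec3;

/* MUL:  t = a.zxy * b.yzx        (component-wise, swizzled)
   MAD:  r = a.yzx * b.zxy - t    (the subtract is a negated MAD source) */
static Vec3 cross_mul_mad(Vec3 a, Vec3 b)
{
    Vec3 t = { a.z * b.y, a.x * b.z, a.y * b.x };                   /* MUL */
    Vec3 r = { a.y * b.z - t.x, a.z * b.x - t.y, a.x * b.y - t.z }; /* MAD */
    return r;
}

int main(void)
{
    Vec3 a = { 1, 0, 0 }, b = { 0, 1, 0 };
    Vec3 c = cross_mul_mad(a, b);
    printf("%g %g %g\n", c.x, c.y, c.z);   /* expect 0 0 1 */
    return 0;
}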

Notice that whether CROSS gets turned into 1, 2 or 3 opcodes, it is still the very same simple assembly process, and nowhere near approaches the complexity of compilation. Assemblers normally have pseudo-ops which get assembled into multiple actual machine instructions.

Just look at how Quake 3 compiles its byte code quickly into Intel assembly and then optimizes it for an example.

I agree with the argument that there is nothing wrong with a virtual assembly language. It assembles quickly into whatever the native machine language is, and can be defined in such a way that the limits are still predictable and within the requirements of the standard. The main advantage is well defined programmability across all implementations. It takes NOTHING away from a higher level shading language.

And even if we have to carry the virtual assembly language around in the standard, translating from virtual assembly to native can't be more than 500 to 1000 lines of code.

Unless we see huge departures from your standard scalar and vector RISC-like instruction sets, the differences between the capabilities of the GPUs are not going to be that great; an assembly language could capture the very basic functionality in a strict way and would be very useful.

I also agree that if there is a good mechanism for controlling a high-level shading language compiler, one which allows you to understand the resources required by a shader so you can make it perform better and know what to do and not do to keep it from running out of resources, then this would have the same advantages as the virtual assembly language (but it would not negate the virtual assembly language's usefulness).

Such controls are the bread and butter of programmers seeking high performance out of C applications, so we should have them for any high-level shading language. I think that Cg's profiles are one way to do this, along with _asm, #pragma, and gcc's __attribute__(())

The main problem with this discussion is that people are not stating why they must have their view, while excluding others.

I do not see how a virtual assembly language takes away from a high-level shading language. Neither removes the possibility of native assembly languages as extensions (although I would like to see a standard interface for sending such programs, if it's possible).

And none of that has anything to do with whether the resources of a graphics card should be hard, fast and defined or whether they should be completely virtualized (in other words, code different shaders that take advantage of different levels of resources, or code one shader which degrades in performance as it exceeds a card's resources).

But, even those are not mutually exclusive. It should be possible to run the most expensive shader on the hardware with the least resources, but it should also be possible to figure out that it's going to be foolhardily slow and use the cheaper shader.

I think that running a shader that is too expensive should be handled almost exclusively by a full software fallback because, as anyone who has studied the math behind computer science should know, figuring out how to break a shading program into multiple passes in a general way is simply impossible (yes, impossible as in going faster than the speed of light impossible).

folker
07-09-2002, 01:48 AM
Originally posted by Korval:
It's funny. His post doesn't talk about how an assembly-esque language doesn't make it easier to define hardware limits. In fact, he isn't talking about hardware limits in his post at all. Instead, he is discussing how assembly-esque languages would only be assembled into opcodes on one piece of hardware, and that the others would have to compile it.

I pointed out that they are, almost certainly, already compiling vertex shaders into platform-specific microcode. It is a process that already likely does some optimization. It isn't a C-compiler by any means, but it's not a basic assembler either.

At the linking level, perhaps. But, certainly not at the compiler level. It is entirely possible that linked shaders are less optimized than fully compiled ones. And, if they're not, then you're still doing a lot of work searching through the two linked shaders for points to optimize/inline.

Apparently, you haven't noticed that I've avoided the use of this "low-level" term. The reason I want an assembly-esque shader language is exactly as I have stated before: I don't believe transparent multipass is something OpenGL should handle. Since a C-style language would make defining hardware limits very difficult (and defining hardware limits is vital if you don't have transparent multipass), the language should be an assembly-style one.

I'm not looking for low-level control here. I'm looking for having hardware-defined limits for shaders.

That may be your point, but it isn't my point. My point is that a performance-programmer isn't going to use the 2-pass system at all, either self-coded or driver-implemented. Why? Because it's slow. Therefore transparent multipass is utterly useless to us.


Defining hardware limits for an assembly language is, in the end, as difficult as for a high-level language.

Take a look at the gf4 / p10 example in my 07-08-2002 05:18 PM post: how do you count hardware opcode instructions and define hardware limits?



My argument against transparent multipass is this. As it stands now, the only feedback I guess is a bool: yes, your shader can work (in one pass, for I refuse to multipass), or no it can't. I have no idea why my shader was rejected by the driver. And, therefore, I have no idea how to design a shader that runs in one pass. As a high-performance programmer, this is unacceptable.


In the above case of a gf4 and p10: How do you know that your particular asm-shader can run both on a gf4 and p10 in single-pass? How do you count instructions?

folker
07-09-2002, 01:48 AM
Originally posted by Nakoruru:
Translating from one assembly language to another is an incredibly straightforward process, especially when the source assembly language is more restrictive than the target. I would hardly call it 'compiling', as many people imply. In fact, the process is so similar to a 'normal' assembler that it should still be called 'assembling'.

What is assembling anyway? It's translating op codes into binary machine code one op code at a time, with each op code producing a completely predictable machine code output, which is then put in a completely predictable place. The only tricky thing about an assembler is labels, which allow you to use a symbol instead of an absolute address.

This is the same linear, predictable process whether 'CROSS' gets translated directly into a single machine instruction that does a cross product or into a 'MUL' followed by a 'MAD'.

Also, there is nothing to keep me from saying that 'CROSS' is a mnemonic in the latter assembly language, even if the hardware does not support a single CROSS instruction.

So, you still have completely predictable hardware limits. The example someone wanted an explanation for was: what if the standard assembly supports a CROSS instruction, but the actual target only supports MUL and MAD? Well, if this is the worst case (the worst that can happen is one instruction becomes two), then the actual target will simply have to have twice as much room for its instructions as the standard requires.

I.E., if the standard requires that you be able to load 128 op codes, then it will have to have 256 places. Then you can make a program with 128 CROSS instructions, and it will match the standard assembly language.

Notice that whether CROSS gets turned into 1, 2 or 3 opcodes, it is still the very same simple assembly process, and nowhere near approaches the complexity of compilation. Assemblers normally have pseudo-ops which get assembled into multiple actual machine instructions.

Just look at how Quake 3 compiles its byte code quickly into Intel assembly and then optimizes it for an example.

I agree with the argument that there is nothing wrong with a virtual assembly language. It assembles quickly into whatever the native machine language is, and can be defined in such a way that the limits are still predictable and within the requirements of the standard. The main advantage is well defined programmability across all implementations. It takes NOTHING away from a higher level shading language.

And even if we have to carry the virtual assembly language around in the standard, translating from virtual assembly to native can't be more than 500 to 1000 lines of code.

Unless we see huge departures from your standard scalar and vector RISC-like instruction sets, the differences between the capabilities of the GPUs are not going to be that great; an assembly language could capture the very basic functionality in a strict way and would be very useful.

I also agree that if there is a good mechanism for controlling a high-level shading language compiler, one which allows you to understand the resources required by a shader so you can make it perform better and know what to do and not do to keep it from running out of resources, then this would have the same advantages as the virtual assembly language (but it would not negate the virtual assembly language's usefulness).

Such controls are the bread and butter of programmers seeking high performance out of C applications, so we should have them for any high-level shading language. I think that Cg's profiles are one way to do this, along with _asm, #pragma, and gcc's __attribute__(())

The main problem with this discussion is that people are not stating why they must have their view, while excluding others.

I do not see how a virtual assembly language takes away from a high-level shading language. Neither removes the possibility of native assembly languages as extensions (although I would like to see a standard interface for sending such programs, if it's possible).

And none of that has anything to do with whether the resources of a graphics card should be hard, fast and defined or whether they should be completely virtualized (in other words, code different shaders that take advantage of different levels of resources, or code one shader which degrades in performance as it exceeds a card's resources).

But, even those are not mutually exclusive. It should be possible to run the most expensive shader on the hardware with the least resources, but it should also be possible to figure out that it's going to be foolhardily slow and use the cheaper shader.

How do you, for example, count / define hardware limits in the gf4 / p10 example from my 07-08-2002 05:18 PM post?



I think that running a shader that is too expensive should be handled almost exclusively by a full software fallback because, as anyone who has studied the math behind computer science should know, figuring out how to break a shading program into multiple passes in a general way is simply impossible (yes, impossible as in going faster than the speed of light impossible).


I don't see any convincing argument that it is not possible at a reasonable speed. Why not use f-buffers?

Maybe I am wrong, but currently I don't see any argument why, in typical situations, a manual multipass solution should be significantly faster than automatic multipass using, for example, f-buffers.

Addendum: I forgot to mention: I agree, also having access to the hardware-specific asm language can be very useful. This may be via a pragma statement or via separate OpenGL extensions.


[This message has been edited by folker (edited 07-09-2002).]

folker
07-09-2002, 02:04 AM
Originally posted by Korval:
At the linking level, perhaps. But, certainly not at the compiler level. It is entirely possible that linked shaders are less optimized than fully compiled ones. And, if they're not, then you're still doing a lot of work searching through the two linked shaders for points to optimize/inline.


Linking can always be implemented to be as performant as recompiling the complete shader asm code (because the driver could always generate asm code internally and link it by re-compiling it).

On the other hand, in most situations linking allows much more performant optimization than always recompiling the asm code.

What do you conclude from that?

folker
07-09-2002, 02:39 AM
To summarize some points which in my opinion are important:

First, as already mentioned by secnuop, we have two discussion topics:

a) Standard hardware-independent assembly language versus standard high-level language.

As Thaellin and others pointed out correctly, there is no unified standard assembly language representing every piece of hardware. It's the same problem as for CPUs: for example, PowerPC, x86 and IA-64 cannot share one common assembly language. For GPUs, vec4 hardware like the gf4 has a fundamentally different assembly language than the scalar-operating P10. So whatever assembly language you select as the standard, it can never reflect all hardware architectures; it will favor some (current) hardware architecture to the disadvantage of others, and you need advanced compilers anyway.

So a high-level shader language is a natural and powerful choice for solving these problems.

Of course, also having low-level access to a hardware-specific assembly language makes much sense, allowing developers to write optimized code paths for particular hardware.

b) Shader limits and automatic multipass (or alternative techniques).

The first question: Is automatic multipass (or alternative solutions) possible on (future) hardware with a reasonable performance (e.g. no software fallback)? Yes, for example using f-buffers.

Then, second question: Defining limits:
If each piece of hardware exposes hardware-specific limits, developers are forced to write different code paths not only for different detail levels, but also for every piece of hardware, even for the same functionality.

Having no limits (for example, implemented by transparent multipass) fulfills the desire that "every shader runs on every hardware, only differing in performance".

That's the main job of a driver API. And it is the spirit of OpenGL.

Of course, if wanted, each developer can still write different code paths for every piece of hardware (avoiding, for example, automatic multipass) to get the last cycle of performance on every hardware platform. But obviously, developers shouldn't be forced(!) to implement different code paths for every piece of hardware. This would be a big step backwards and would destroy the spirit of OpenGL.

knackered
07-09-2002, 03:09 AM
Can't there be an automatic multipass 'hint'? By default the hint is set to ALWAYS_DO_AUTOMATIC_MULTIPASS, but can be set to FAIL_ON_MULTIPASS, where the shader fails to bind if it necessitates multiple passes on the target hardware. OR a hintable proxy-texture-type mechanism?

[This message has been edited by knackered (edited 07-09-2002).]

ScottManDeath
07-09-2002, 03:27 AM
Originally posted by knackered:
Can't there be an automatic multipass 'hint'? By default the hint is set to ALWAYS_DO_AUTOMATIC_MULTIPASS, but can be set to FAIL_ON_MULTIPASS, where the shader fails to bind if it necessitates multiple passes on the target hardware. OR a hintable proxy-texture-type mechanism?

[This message has been edited by knackered (edited 07-09-2002).]


Yes, that sounds interesting. Make it an optional feature like glEnable(GL_AUTO_MULTI_PASS) (which is disabled by default); then, on loading a shader, if the driver can multipass it, it is accepted, otherwise it is rejected. When multipass is disabled it will be accepted only if it can be done in one pass. Then the app can decide what to do when the shader is not able to run (auto-multipassed or not), for example try a simpler shader or skip it.
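
Something like the following, say, where GL_AUTO_MULTI_PASS and a BindShader call that reports success are hypothetical names for the sake of the sketch (neither exists in any shipped OpenGL specification):

glDisable(GL_AUTO_MULTI_PASS);                   /* default: single pass only       */
if (!BindShader(fancyShader, SOME_TARGET)) {     /* rejected: needs multiple passes */
    glEnable(GL_AUTO_MULTI_PASS);
    if (!BindShader(fancyShader, SOME_TARGET))   /* even multipass can't run it     */
        BindShader(simpleShader, SOME_TARGET);   /* fall back to a cheaper shader   */
}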

I think when you develop an app (game or ...) you always have your target hardware in mind, so you would only write a shader that is hw accelerated anyway. So when new hw is available, it will run the shader you wrote for less powerful hw. But you could, for example, provide a patch for your app to use a shader that would in theory run on your target hw, but slowly. The new hw would accept the shader and would run it.

Bye
ScottManDeath

folker
07-09-2002, 03:57 AM
Originally posted by knackered:
Can't there be an automatic multipass 'hint'? By default the hint is set to ALWAYS_DO_AUTOMATIC_MULTIPASS, but can be set to FAIL_ON_MULTIPASS, where the shader fails to bind if it necessitates multiple passes on the target hardware. OR a hintable proxy-texture-type mechanism?

[This message has been edited by knackered (edited 07-09-2002).]

This would be an option, agreed.

In the end, the gl2 SHADER_RELATIVE_SIZE shader attribute (see the gl2 objects white paper) provides this functionality. But a GL_AUTO_MULTI_PASS state would be more explicit.

Thaellin
07-09-2002, 04:05 AM
Korval:
So, if we could resolve the issue of transparent multipass being obscured from the developer, you would be happy with a high-level language representation?

How about this (pseudocode with an OpenGL1x flavor to it):



sometype existingObject;
othertype shader = SOME_SHADER_PROGRAM;
othertype simpleShader = SOME_FALLBACK_SHADER_PROGRAM;
BindShader( existingObject, SOME_TARGET );

int complexity = CompileShader( &shader );
if (complexity <= 0) {
    // Handle compilation error condition
}
else if (complexity == 1) {
    // guaranteed single-pass in hardware
    UploadShader( existingObject, SOME_TARGET );
}
else {
    // probable multi-pass or fallback; try the simpler shader instead
    complexity = CompileShader( &simpleShader );
    if (complexity != 1) {
        // this card sucks, I quit.
    }
    else UploadShader( existingObject, SOME_TARGET );
}


Cass:
If you can write a shader in Cg, but not guarantee that it will compile on all cards supporting Cg, isn't that a significant problem with the language? If I get an ANSI standard "C" compiler and it doesn't compile my ANSI "C" program, I tend to think of this as a bug, not a feature of the language. Isn't this going to generate support headaches?

Knackered:
I like the hint idea, but I think 'hints' were being dropped from 2.0? Can't remember... As mentioned by folker, this sounds like a compatible concept for GL2 shader objects, though.

-- Jeff

[This message has been edited by Thaellin (edited 07-09-2002).]

[This message has been edited by Thaellin (edited 07-09-2002).]

ash
07-09-2002, 04:06 AM
Originally posted by Nakoruru:
Translating from one assembly language to another is an incredibly straightforward process, especially when the source assembly language is more restrictive than the target. I would hardly call it 'compiling', as many people imply. In fact, the process is so similar to a 'normal' assembler that it should still be called 'assembling'.


As the guy who wrote the code that translates GF3 assembler to P10 vertex shader assembler in the P10 driver (for nv_vertex_program and VS1.x), I can tell you just how untrue this statement is. I can't go into details but rest assured it's tens of thousands of lines of code, and involves pruning, data propagation, simplification of expressions, instruction reordering for stall avoidance, common subexpression elimination, etc. In other words, it's a compiler.

Ash

Nakoruru
07-09-2002, 04:52 AM
Folker, if you mean how you would translate from a vec4 assembly into a float-based assembly, then it's not that hard. It may require that you expand each operation into 4 instructions or more. But if you are using a vec4 assembly to its fullest, then this is what you want anyway. But it's still straightforward assembling of code. I bet a simple optimizer could figure out a lot of waste and eliminate it (you would be stuck with it on an actual vec4 processor). It won't be optimal all the time, but that's not the goal. The goal is the fact that you can do it in the first place.

cass
07-09-2002, 04:54 AM
Originally posted by Thaellin:
Cass:
If you can write a shader in Cg, but not guarantee that it will compile on all cards supporting Cg, isn't that a significant problem with the language? If I get an ANSI standard "C" compiler and it doesn't compile my ANSI "C" program, I tend to think of this as a bug, not a feature of the language. Isn't this going to generate support headaches?


Thaellin,

Exposing a high-level hardware shading language does not imply that you must employ a one-size-fits-all "virtual hardware description". Graphics programmers are quite used to the idea that different hardware has drastically different native capabilities, and that straying beyond those capabilities sends you over a performance cliff. Even though the OpenGL API supports accumulation buffers, nobody uses them on consumer hardware because they are not hardware accelerated.

Supporting different compile targets for different hardware makes Cg practical. Hiding the underlying hardware limits is what would cause lots of support headaches.

In the not-too-distant future, it will be reasonable to have profiles that represent the intersection of functionality of lots of different vendors' hardware. People that want portability can use a profile like that, while people that want access (and direct control) to every last hardware-specific resource will use hardware-specific profiles.

Thanks -
Cass

Nakoruru
07-09-2002, 05:14 AM
When you start talking about F-Buffers, I begin to think that we are not really talking about automatic multipassing anymore, because I thought that F-Buffers still had to be used explicitly by the programmer (not by a compiler).

The general case is: I have a program P which I want to run, but it uses too much of resources A, B, and C, so it cannot fit into the memories of the hardware. So, to multipass, I need to rewrite P into multiple programs P1, P2, ... PN which run one after another so as to produce the same result.

The problem is that such rewrites are usually impossible to do in general! You cannot write programs P1 - PN because that would require that you know what P does, and it is impossible to know what P does without -running- it, and sometimes even then you will not know (this is a simple fact of computer science, the simplest statement of which is the halting problem).

If rewriting a program requires knowledge I cannot always have, then I cannot always rewrite. Humans are good at rewriting because they already know what they want something to do, but a computer is not so lucky.

So, any translate-to-multipass method must not require rewriting the program. But any method that simply involves saving the state of a program, then reloading the next segment, and then running that segment on the saved context is not really multipass; it's just a single pass with a lot of overhead. When you consider jumps you may end up reloading previous sections. Since you can never know whether a program is actually going to halt (the very basic premise of the halting problem), you cannot predict which parts of the program will have to be reloaded. Also, it may very well be different parts for different vertices/fragments. It becomes a nightmare.

This is not even mentioning the fact that some parts of the program may require more texture units than you have, and that you cannot even predict how many in a general way.

Any solution needs to be general or you will not achieve the goals you say you want. So you cannot say 'typical situations', because 'typical' is not 'general.' Heck, typical situations are probably intractable as well.

Since a general solution is not possible, what you want is impossible, and you should get over it. Wait until hardware is so powerful that the limits do not matter. That is why C/C++ are so useful today, not because they tried to shoehorn themselves into small boxes.

Shoehorned C is 16-bit Borland C for DOS; it has severe limits by today's standards (near/far pointers, 64k size limits on arrays). You can preach the lessons we have learned about C all you want, but you forget that our GPUs today are like C64s and 286s in their evolution, and C simply did not always work back then.

Wait for the 386 of GPUs! ^_^
But even then if your program is too big or hungry it will not work in hardware.

Nakoruru
07-09-2002, 05:30 AM
ash,

Okay, wow. I now consider myself clued in. I guess there is something I am not considering. Would it be completely impossible to do in a straightforward manner, or would the straightforward manner just result in awful performance? Is the P10 something exotic like VLIW? I was assuming RISC-like to RISC-like.

This tips my feelings towards a virtual assembly language into the negative (where I had been neutral, despite my defense of it). Imagine that, someone on this forum changing their mind!

Thaellin
07-09-2002, 06:04 AM
Cass:
Thanks for the response. I think I get it, now. So, if I was doing a run-time compiled Cg shader, I could enumerate the compiler profiles available and choose the best profile for my purposes (or alternatively, fall back to another shader that would work on the 'common' OpenGL profile)?

Sounds nice. Makes me think I'll take a second look at Cg. This isn't very OpenGL, but can you post a link to info on creating a Cg compiler profile? I'm not seeing that information on your site.

Thanks,
-- Jeff

ash
07-09-2002, 06:16 AM
Originally posted by Nakoruru:
ash,

Okay, wow. I now consider myself clued in. I guess there is something I am not considering. Would it be completely impossible to do in a straightforward manner, or would the straightforward manner just result in awful performance? Is the P10 something exotic like VLIW? I was assuming RISC-like to RISC-like.

This tips my feelings towards a virtual assembly language into the negative (where I had been neutral, despite my defense of it). Imagine that, someone on this forum changing their mind!

http://www.opengl.org/discussion_boards/ubb/smile.gif

As you suggest, it could be done much more simply if you didn't care about performance, or about making sure that large programs fit. Say 90% of the work is optimization and catering for pesky corner cases.

It's taking an algorithm expressed optimally in terms of one hardware design (with all of its specializations, redundancies and non-orthogonalities) and translating it into one for a radically different design. A direct translation will run, but slowly.

Ash

cass
07-09-2002, 06:18 AM
Originally posted by Thaellin:
Cass:
Thanks for the response. I think I get it, now. So, if I was doing a run-time compiled Cg shader, I could enumerate the compiler profiles available and choose the best profile for my purposes (or alternatively, fall back to another shader that would work on the 'common' OpenGL profile)?

Thaellin,

That's right.



Sounds nice. Makes me think I'll take a second look at Cg. This isn't very OpenGL, but can you post a link to info on creating a Cg compiler profile? I'm not seeing that information on your site.

Thanks,
-- Jeff

More information about how to develop a Cg profile will be made available soon. Essentially, the compiler front-end will be open sourced, and writing a new profile corresponds to writing a back-end and any associated runtime support.

Thanks -
Cass

Nakoruru
07-09-2002, 06:25 AM
I realize that when one goes about claiming that something is impossible, that person should probably hedge quite a bit in case they are mistaken.

The most obvious thing a person would point to in order to say I was wrong is the Stanford Shading language. But, that would be a mistake.

The Stanford Shading language uses automatic multipassing to implement a RenderMan like shading language. There are three reasons why this does not apply to current discussion.

First, the problem, as I believe it is being discussed, is breaking program P into P1, P2, ... PN. But the Stanford language does exactly the opposite: it has predefined programs P1, P2, ... PN and it compiles a shader into P using those predefined parts, like a VLIW assembly. Unless you want one pass per instruction, the SSL does not solve the problem we are discussing.

Secondly, SSL uses render to texture for temporary values. Seeing as how you cannot predict how many temporaries you may need (in a general way) without running the program, overflowing your memory is very possible and would result in a runtime error. Hardly a 'runs on all hardware with no modifications' solution.

Thirdly, a minor point, the SSL hides parts of OpenGL, most notably the stencil buffer, for its own use. SSL doesn't just hide hardware complexity, it hides OpenGL ^_^

Korval
07-09-2002, 10:59 AM
So, if we could resolve the issue of transparent multipass being obscured from the developer, you would be happy with a high-level language representation?

If by "resolve the issue of transparent multipass being obscured from the developer," you mean "not have transparent multipass and expose hardware limits directly", and it were possible to define hardware limits in a C-style shader, then yes, I would be willing to accept a C-style shader.

Granted, I don't believe it's possible to express hardware limits to a C-style shader (which is the reason I am suggesting an assembly-style shader).


How about this (pseudocode with an OpenGL1x flavor to it):

Absolutely not. As I mentioned before, a simple pass/fail query doesn't provide enough information. What I need to know is exactly what was wrong with the shader that made it fail. Were there too many vertex shader instructions? Did I use too many constants? I need to know this so I can develop (most likely before shipping the title) a "simpleShader" that will still take relative advantage of the hardware I am faced with.

secnuop
07-09-2002, 12:15 PM
Granted, I don't believe it's possible to express hardware limits to a C-style shader (which is the reason I am suggesting an assembly-style shader).

Maybe I'm missing something - how is it any more possible to express hardware limits in an assembly-style shader? Remember that assembly instructions do not necessarily map 1:1 to hardware instructions, so you cannot simply count assembly instructions and assume that this is the native hardware instruction count.

You have a similar problem with register usage. If some instructions require temporary space to do their calculations, you cannot simply count the number of registers used in a program and assume that this is the native hardware register usage.

(This works the other way too - a "smart" compiler might be able to eliminate parts of the code that do nothing or to collapse two simple instructions down to a more complex one to save an instruction or a temp. For example, a compiler could recognize a MUL and an ADD and collapse them into a MAD, in which case you might have FEWER native hardware instructions than instructions in the text assembly.)

My point is that the only way you can get the native hardware resources is to give the program to the driver and see what it can do with it. This will vary depending on the hardware being used and even depending on how aggressive the driver compiler is at optimizing the program. If you have to give the program to the driver anyway, why not give it to the driver in a more human-readable form?

If you're arguing that an ISV will need more feedback than a simple boolean indicating that the vertex / fragment program compiled or didn't, I agree completely. ARB_vertex_program (and, presumably, any future "standard") has some additional queries that an ISV can use to determine exactly what went wrong.

folker
07-09-2002, 01:21 PM
Originally posted by Nakoruru:
... The general case is, I have a program P, which I want to run, but its uses too much of resources A, B, and C so it cannot fit into the memories of the hardware. So, to multipass, I need to rewrite P into multiple programs so that it is now programs P1, P2, ... PN which run each in a row so as to produce the same result.

The problem is that such problems are usually impossible to do in general! ...

I claim that it is possible to split every(!) fragment program automatically into multiple passes of (hardware-)limited size so that it can be completely executed in hardware. Maybe I have overlooked something, but currently I don't see any argument against it. (And as far as I understood Matt's latest posts, he agrees with that.)

Also addressing your issues, I will try to describe below one path by which this can be done (using f-buffers as an implementation detail, not visible to the user).

Some notes: First, to avoid misunderstandings: I am always talking about future hardware, not including today's hardware. Also note that the below mechanism is not the fastest, it is only designed to show a generic way how it is possible for every shader - there is much room for optimizations and "clever" splitting into multiple passes of course.

First, assume we don't have control flow instructions. This means you "only" have to implement arbitrarily complex fragment color expressions by multipass.

Obviously you can split the fragment program (an expression tree) into sub-programs (sub-expression trees, i.e. passes) so that no pass exceeds hardware limits such as the number of instructions or texture units it requires.

Temporary variables which have to be passed between passes are stored in an f-buffer. Since the number of temporary variables is unlimited, the memory required per fragment in the f-buffer is unbounded. This is no problem for one fragment alone, but you can get into trouble with many fragments (e.g. running out of video memory for the f-buffer). Solution: you execute fragments in packages sized so that the f-buffer does not exceed the hardware limit. In other words, if you have less f-buffer memory available, you cannot execute all polygons at once per multipass, but have to split them into smaller groups. The less memory you have, the slower this gets, but every shader will still run. (See also my post from 07-06-2002 01:17 PM.)
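
For illustration, a sketch in the pseudocode style used above (the split point and all names are invented; a real driver would choose its own split):

// original shader, too big for one pass on some hypothetical hardware:
//   temp  = f(tex0, tex1);         // a large sub-expression
//   color = g(temp, tex2, tex3);   // the rest of the expression tree

// pass 1:
temp = f(tex0, tex1);
fbuffer[fragment] = temp;           // spill the temporary, per fragment

// pass 2:
temp = fbuffer[fragment];           // reload it
color = g(temp, tex2, tex3);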

Ok, now let's talk about control instructions. We have if's and while loops (including for).

Every if body not fitting into one pass can be split down as needed. For example, " if(b) { A; B; } " can be split into the two passes " temp = b; if(temp) A; " and " if(temp) B; ". Nested if's can be split as well; for example, " if(b1) { A; if(b2) B; } " can be split into " temp1 = b1; temp2 = b1 && b2; if(temp1) A; " and " if(temp2) B; ", and so on.

About while loops (including for loops as a special case): you have to repeat the execution of the body passes. This can be done by executing the body until every fragment has finished the loop. In other words, " while(b) A; " is executed as " temp = true; while(for any fragment temp is true) { if(b) A; else temp = false; } ", or something like that. If the body A has to be split, all of these passes have to be executed multiple times.
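
Written out as pseudocode (again only a sketch; the per-fragment flag and the split of the body A into A1/A2 are invented for illustration):

// while (b) A;   where the body A itself needs two passes A1 and A2

alive = true;                                  // per-fragment flag, kept in the f-buffer
while (any fragment still has alive == true)   // checked by the driver/hardware
{
    // one pass over all fragments in the batch:
    if (alive) { if (b) A1; else alive = false; }

    // a second pass over the same fragments:
    if (alive) A2;
}

// fragments that leave the loop early simply idle through the remaining passes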

For problems regarding frame-buffer access and manual blending code, see my post from 07-07-2002 03:58 AM. In the end you cannot avoid that a fragment execution producing a result used by some other fragment execution has to be completed first. This is unavoidable for both automatic and manual multipass.

Some comments about optimization: with no control flow, the number of instructions stays the same. This means automatic multipass is only slower than single-pass because of the additional memory accesses to the f-buffer (corresponding to the additional frame-buffer accesses in the case of manual multipass). Usually this should not be too expensive, since the number of temporary variables passed between two passes should normally be controllable. Of course, there may be situations where the f-buffer access dominates. In the end it depends on how cleverly the compiler can split the fragment program so that temporaries passed between passes are avoided as much as possible. The problem is quite similar to the one ordinary C compilers face: they have a finite number of registers and must avoid the memory accesses caused by spilling intermediate results that no longer fit into them.

Perhaps if's and loops are more critical regarding performance. Especially for loops which are split across passes, every fragment repeats the loop body until the last fragment has finished. On the other hand, since today's GPUs usually have fragment units working in parallel, this is often the case in a single pass anyway, so the situation may be no worse for automatic multipass.

Also note that if the hardware can already execute quite complex shaders, this has two advantages: first, most shaders execute in a single pass anyway (the ogl2 white papers assume that); second, if you split a complex shader into two or three passes, it is very likely that the additional overhead due to splitting is small compared to the real work of the shader code itself.

Korval
07-09-2002, 02:00 PM
Remember that assembly instructions do not necessarily map 1:1 to hardware instructions, so you cannot simply count assembly instructions and assume that this is the native hardware instruction count.

Take a look at the 1.1 pixel shaders, and nVidia's implementation in Register Combiners. You only get 8 opcodes in 1.1 pixel shaders. However, we all know that each RC can do 2 muls or 2 dots. So, why doesn't 1.1 expose this functionality? Because 1.1 includes the Muxsum opcode, which, in RC's, requires an entire RC. A 1.1 pixel shader consisting only of dot products or muls could in theory have 16 instructions, but because there is the possibility that someone will use Muxsum (and try to use it 16 times), you are limited to 8 opcodes.

The same would apply to the underlying hardware: whatever its limitations are, they need to be converted into the standard assembly-esque language's limitations. It may be that one implementation's own opcodes (or microcode) could support 4096 instructions. However, if the worst-case translation from a standard opcode to hardware opcodes is 8, then it only advertises support for 512 standard opcodes. Yes, in theory this hardware could be doing more, and quite likely, for any particular shader, it could be doing more. Those are the trade-offs that need to be made in order to get a reasonable standardized assembly-esque shader, and these trade-offs are quite acceptable.
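
That worst-case bound is just a division (trivial C sketch; the function name is invented, the numbers come from the example above):

/* e.g. exposed_opcode_limit(4096, 8) == 512 */
unsigned exposed_opcode_limit(unsigned native_instruction_limit,
                              unsigned worst_case_expansion)
{
    return native_instruction_limit / worst_case_expansion;
}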



First, to avoid misunderstandings: I am always talking about future hardware, not including today's hardware.

Future hardware is unimportant in this discussion. My GeForce 8 won't need transparent multipass for its shaders, since the hardware limits will likely be quite large. My GeForce 4 will need them to run the shaders that run on a GeForce 8.


Also note that the below mechanism is not the fastest, it is only designed to show a generic way how it is possible for every shader - there is much room for optimizations and "clever" splitting into multiple passes of course.

Proving that a way is possible doesn't show that it will be efficient enough to use in a high-performance application. If it is inefficient, it may as well not be there.

cass
07-09-2002, 02:06 PM
Originally posted by secnuop:
Maybe I'm missing something - how is it any more possible to express hardware limits in an assembly-style shader? Remember that assembly instructions do not necessarily map 1:1 to hardware instructions, so you cannot simply count assembly instructions and assume that this is the native hardware instruction count.


The ARB_vertex_program extension has queryable resource maximums, as well as a specified "minimum maximum". For example, MAX_PROGRAM_TEMPORARIES_ARB can be queried, but must be at least 12.
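
A rough sketch of those queries in C (entry points come through the usual extension mechanism; the names below are as I recall them from the spec, so treat them as illustrative):

GLint maxTemps = 0;
glGetProgramivARB(GL_VERTEX_PROGRAM_ARB,
                  GL_MAX_PROGRAM_TEMPORARIES_ARB, &maxTemps);

/* and after a failed glProgramStringARB() load: */
GLint errPos = -1;
glGetIntegerv(GL_PROGRAM_ERROR_POSITION_ARB, &errPos);
const GLubyte *errMsg = glGetString(GL_PROGRAM_ERROR_STRING_ARB);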

I'm not sure when this extension spec will be available on the registry - of course it's available already if you've signed the ARB participant undertaking. http://www.opengl.org/discussion_boards/ubb/smile.gif

( http://www.opengl.org/developers/about/arb/legal/participant_v3.pdf )

Thanks -
Cass

folker
07-09-2002, 02:20 PM
Originally posted by Korval:
Take a look at the 1.1 pixel shaders, and nVidia's implementation in Register Combiners. You only get 8 opcodes in 1.1 pixel shaders. However, we all know that each RC can do 2 muls or 2 dots. So, why doesn't 1.1 expose this functionality? Because 1.1 includes the Muxsum opcode, which, in RC's, requires an entire RC. A 1.1 pixel shader consisting only of dot products or muls could in theory have 16 instructions, but because there is the possibility that someone will use Muxsum (and try to use it 16 times), you are limited to 8 opcodes.

The same would apply to the underlying hardware: whatever its limitations are, they need to be converted into the standard assembly-esque language's limitations. It may be that one implementation's own opcodes (or microcode) could support 4096 instructions. However, if the worst-case translation from a standard opcode to hardware opcodes is 8, then it only advertises support for 512 standard opcodes. Yes, in theory this hardware could be doing more, and quite likely, for any particular shader, it could be doing more. Those are the trade-offs that need to be made in order to get a reasonable standardized assembly-esque shader, and these trade-offs are quite acceptable.


Your technical points are right.

But whether you accept these trade-offs as acceptable depends on your point of view.

I would prefer that (for today's hardware) the language does not define limits, but that compiling a shader can fail on particular hardware (and the application can fall back manually to less complex alternatives). This way you have access to all hardware features.

Furthermore, I definitely would prefer a high-level shader language.



Future hardware is unimportant in this discussion. My GeForce 8 won't need transparent multipass for its shaders, since the hardware limits will likely be quite large. My GeForce 4 will need them to run the shaders that run on a GeForce 8.


Agreed, if future hardware has "no limits" in practice anyway, we don't need all this transparent multipass discussion. That would really be the best future, avoiding all this transparent multipass complexity.

But currently the limits are evolving quite slowly: 2 TMUs, 4 TMUs, 6 TMUs, 8 TMUs...
The same for fragment shader complexity. So I am pessimistic, but who knows?

But maybe we have to be pragmatic: future hardware has, for example, 32 TMUs and allows fragment programs of 256 instructions. Maybe this is near enough to "infinity" that transparent multipass is not necessary...

But your assumption definitely would make things much, much easier!!! I hope you are right!!! http://www.opengl.org/discussion_boards/ubb/wink.gif



Proving that a way is possible doesn't show that it will be efficient enough to use in a high-performance application. If it is inefficient, it may as well not be there.


Right. But as mentioned at the end of my corresponding post, I think there are strong arguments that it can also be done very efficiently.

Jurjen Katsman
07-09-2002, 03:21 PM
As far as the vertex shader discussion is concerned (the vec4 vs. scalar question, and assembly vs. high-level), I would simply like to offer another point of view.

Aren't we really just looking at this from the wrong angle? Aren't we being a little too idealistic? Shouldn't we really just look at what is there, what is wrong with it, what is right about it, and how we could improve on that?

Currently we have NV_vertex_program and the (probably pretty similar) ARB_vertex_program: assembler-level languages. This certainly has both problems and benefits:

Problems:

- Relatively hard to write. It's assembler, a C style syntax would be much easier.
- Lacks important features like ifs and loops.
- It's very much based on particular hardware designs. It should probably be more general, avoid very specific hardware constructs.
- Not available on all hardware, maybe partially for the above reason, but mostly just because it isn't yet. Just a matter of time, so not really a 'problem'.

Benefits:

- Clearly defined limits. A shader can have a certain number of instructions, and use a certain number of variables and temporaries.
- It's a low level language, which is usually slightly easier to optimise.
- It's a low level language, which could even become a bytecode, which can probably be compiled a lot faster, and with a lot less parsing and potential syntax errors.
- High level language isn't defined. Any sort of high level language could be placed on top of it, as shown by Cg.

Ok, so now that we have this situation, what can be done about it? We obviously would like to keep all the benefits.

As mentioned, the 'assembler' problem has already been tackled. Cg is here. Cg might not be perfect, but it does show that an assembler language itself is not something that stops us from having a higher level language as well.

Second, it lacks important instructions. Well, so how about we add them, and simply make it a limitation of certain hardware that it can't do them? It can't; so let's just forget about it on those parts. The only other option is a software fallback, and that's hardly what we're after. So until all 'major' features are added (which could be pretty soon), we will see hardware that does or doesn't support such features, and if we want to use them we'll have to consider the old hardware. To me this sounds like a good way to bridge the gap. Yes, this means we'll get DX-style VS 1.0, 1.1, 1.2, 1.3 (all being backwards compatible).

And now the last problem, hardware specifics. Many of those can be solved in the same way as the above. Simply a new version that removes restrictions. Old hardware has the restriction, new hardware doesn't. Specific things which are allowed but shouldn't be should be looked at now and removed forever. I have a feeling it won't be all that many things.

So now you might be saying, what about the vec4 vs scalar thing? Well, I think that's pretty obvious. A vec4 language is what we have, a vec4 is what already works. A vec4 language also makes a lot of sense for what we're doing, vertex operations.

A scalar processor has no problem compiling and running code written in a vec4 language, so that's probably what it should do. Was making something NOT a vec4 processor a smart idea? Probably not. And if such an assembly-like language were to become standard, an IHV with a non-vec4 design would probably have to change their hardware to keep up. But I don't really consider that a bad thing.

It's an illusion that having a low-level assembly-like language would not allow different CPU designs; I think AMD and Intel have been proving the opposite for quite a while.

So let's sum all this up: by making just a few small changes and adding a few versions/caps bits to the NV_vertex_program extension, and by slightly forcing a certain hardware design direction on the IHVs that chose a different one, we can solve all the problems and keep all the benefits.

- We'll have a straightforward and clean vec4 based assembly language, even relatively easy to write for an assembly language, which is very suited to the types of operations we are doing -> manipulating vectors.
- Constraints are clearly defined, ISVs will know on what hardware their shaders will work, how fast, and can provide for fallbacks.
- We'll have all features needed at that level for at least a while, and IHVs can start competing on simply providing the fastest implementations with the least strict resource constraints.
- It could be implemented nicely by all hardware vendors, and in the future they can model their hardware towards it.
- All sorts of higher level languages can be built on top of it.
- Many ISVs are already familiar with it, and most IHVs will already have compilers going in that direction because of ARB_vertex_program.
- It might actually be ready for use pretty soon, (even with pretty strict resource and feature constraints on some older hardware).

--

Ok, I'm not the best writer in the world, so my points might not come across very well, but I have a hard time seeing why we need this high-level in-the-core OpenGL 2.0 shading language, apart from the fact that 3dlabs currently really doesn't like vec4.

If we're into making API decisions based on the desires of particular hardware, we might just as well go back and redo the whole API to play nice with, say, PowerVR tile-based hardware.

Nakoruru
07-09-2002, 03:26 PM
Folker's solution to auto-multi-pass looks a lot more like the Stanford Shading Language and less like the ideas I have been railing against.

I still see problems with different vertexes requiring different pass programs and texture resources at the same time (the same should apply to fragments as well).

See my 'Unpossible' post for details.

Do you not see how loops and if statements could eventually make it impossible for the fragments/vertexes that you have started working on with one program chunk to continue using the same program chunk?

You would have to store these fragments in different F-Buffers and then run the appropriate program on each split result. Now it seems that you need more than one F-Buffer.

I may downgrade my 'impossible as in 2+2=5' to 'impossible as in fitting a 300lb woman into a size 6 spandex dress' You may be able to do it, but its not going to be pretty ^_^

Nakoruru
07-09-2002, 03:58 PM
Simple pathological shader which would demonstrate the problem I see:




pathos()
{
    float n = noise1d(Xpos);    // evenly distributed 1D noise over position

    if (n < .25) {
        my_func_which_uses_all_resources1();
    }
    else if (n < .5) {
        my_func_which_uses_all_resources2();
    }
    else if (n < .75) {
        my_func_which_uses_all_resources3();
    }
    else {
        my_func_which_uses_all_resources4();
    }
}


Because of the even distribution of the noise function, each fragment has an equal chance of needing a different chunk of the original program, each of which requires all available resources in such a way that the chunks cannot be executed in parallel.

Oops! Actually, I think I see the solution (damn), and because of that I am beginning to hate F-buffers, because they make the incredibly ugly possible http://www.opengl.org/discussion_boards/ubb/smile.gif

The 'if' ladder could be done in 1 or 4 passes, depending on how smart the compiler is. If each greedy function can be done in 1 pass, that means the whole thing could be done in 5 to 8 passes.

Actually, I am thinking that this shader is not too unlikely, it could create some really interesting patterns.

Oh, another thing. How do you predict how many fragments that geometry will produce so that you do not overflow your f-buffers?

Also, if its essentially impossible to predict how many passes a shader will take in a loop (and it is, think halting problem), then it is further impossible to keep the F-Buffer from overflowing.

The answer, of course, is to handle F-buffer overflow by some mechanism. Such a mechanism can only hope to be horribly slow (for example, interrupting execution and dumping the f-buffer to main memory, and eventually the HD). It would be impossible to predict when this would happen and why it did.

Of course, there are probably answers to these questions that involve further hackery. We do not seem to be converging towards an elegant solution, just deeper into more kludge.

folker
07-10-2002, 03:32 AM
Originally posted by Nakoruru:
Folker's solution to auto-multi-pass looks a lot more like the Stanford Shading Language and less like the ideas I have been railing against.

I still see problems with different vertexes requiring different pass programs and texture resources at the same time (the same should apply to fragments as well).

See my 'Unpossible' post for details.

Do you not see how loops and if statements could eventually make it impossible for the fragments/vertexes that you have started working on with one program chunk to continue using the same program chunk?

You would have to store these fragments in different F-Buffers and then run the appropriate program on each split result. Now it seems that you need more than one F-Buffer.

I may downgrade my 'impossible as in 2+2=5' to 'impossible as in fitting a 300lb woman into a size 6 spandex dress' You may be able to do it, but its not going to be pretty ^_^

The solution for if's and loops does not require separate f-buffers; it is much simpler. For if's you only have to split long if-bodies across multiple passes. That's all. For loops, you repeat the body passes again and again until all fragments have finished their loop. If one fragment terminates earlier, it is in an idle state in the meantime (see the "if(b) ..." part of the loop body in my previous post).



The 'if' ladder could be done in 1 or 4 passes, depending on how smart the compiler is. If each greedy function can be done in 1 pass, that means the whole thing could be done in 5 to 8 passes.

Actually, I am thinking that this shader is not too unlikely, it could create some really interesting patterns.


I think you need 4 passes (not more). But the question is: can you do it better by implementing multipass manually? If not, it is not a performance problem of transparent multipass, it is a performance problem of your shader.

Agreed, there can be ugly situations which make the number of passes explode. You have to take care. Even if the compiler does the multipass job for you, you should still keep thinking about your multipass complexity instead of simply writing arbitrary shader code.



Oh, another thing. How do you predict how many fragments that geometry will produce so that you do not overflow your f-buffers?

Also, if its essentially impossible to predict how many passes a shader will take in a loop (and it is, think halting problem), then it is further impossible to keep the F-Buffer from overflowing.

The answer, of course, is to handle F-buffer overflow by some mechanism. Such a mechanism can only hope to be horribly slow (for example, interrupting execution and dumping the f-buffer to main memory, and eventually the HD). It would be impossible to predict when this would happen and why it did.

Of course, there are probably answers to these questions that involve further hackery. We do not seem to be converging towards an elegant solution, just deeper into more kludge.


Since the f-buffer only has to store the temporary variables passed between passes, its size is the same regardless of the number of loop iterations. So you know the per-fragment f-buffer size for each shader from the beginning.

So you can determine the number of fragments you can handle by dividing the available total f-buffer size by the per-fragment f-buffer size.
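
In code form this is just a division (a minimal C sketch; the names and the assumption that each spilled temporary is a vec4 of floats are mine, purely for illustration):

/* fragments that fit into one multipass batch */
unsigned fragments_per_batch(unsigned total_fbuffer_bytes,
                             unsigned spilled_vec4_temps)
{
    unsigned per_fragment_bytes = spilled_vec4_temps * 4 * sizeof(float);
    return total_fbuffer_bytes / per_fragment_bytes;
}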

About predicting the number of fragments the geometry will produce: one possibility may be to split execution partway through the fragments. For example, rasterization stops if the f-buffer is exhausted, stores the current fragment pixel position, and continues at this position later. Of course this also requires additional hardware, but it should be possible in a reasonable way.

I think swapping out the f-buffer would be ugly. You should implement it in such a way that the f-buffer never overflows. But I think this is possible in a reasonable way, as described above.

It seems to me that future hardware must support the following additional features to support transparent multipass for every shader:
a) Support every basic language feature (e.g. built-in functions, dependent texture lookups etc.). Should come anyway.
b) An f-buffer.
c) Abort and continue fragment execution (or alternative techniques to avoid f-buffer overflow).
The rest is job of the driver.

Some comment:
I think that you can create hardware which supports transparent multipass for every shader. However, I agree completely that it costs transistors and driver-developer manpower. So I also understand Matt's position of saying "I don't want to spend resources on that, it is not worth doing". I come to a different conclusion because it seems that I give more priority to "every shader runs on every hardware" than, for example, Matt does. On the other hand, if hardware limits will be big enough on future hardware anyway, it may indeed not be worth supporting transparent multipass. But currently I think that a) transparent multipass is easier to implement than it may look at first, and b) "every shader runs on every hardware" is really important.

BTW, according to Orwell's "1984", 2+2=5 is true. http://www.opengl.org/discussion_boards/ubb/wink.gif


[This message has been edited by folker (edited 07-10-2002).]
