View Full Version : Response to "Cg won't work" op-ed in The Register

06-19-2002, 08:34 AM

06-19-2002, 09:28 AM
Yep, I've read the article too; the guy has some serious counter-arguments as well. :)

06-19-2002, 11:50 AM
I'm glad NVIDIA has produced Cg. It's a bold step. I just hope to god that ATI, Matrox and 3Dlabs produce 'profiles', otherwise it will be utterly pointless.

06-19-2002, 12:12 PM
Developer support will force them to create profiles.

If NVIDIA supplies a default profile for vanilla OpenGL and developers use Cg, ATI et al. will then be pressured to improve their performance by providing profiles that exploit their own hardware & extensions.

That at least is one scenario NVIDIA is probably hoping for. You want profiles? Support Cg.

06-19-2002, 12:20 PM
Or maybe they'll create their own versions of Cg (yawn) and then support each other's profiles. Much like loading an MS Word 6.0 file into WordPerfect ;)

06-19-2002, 12:40 PM
Pardon my ignorance but...
What are profiles?

06-19-2002, 12:47 PM
Don't you also feel that OpenGL is a large leap behind? I am a bit worried about the fragment shader functionality. How long will it take until we get a fragment shader for OpenGL? I want to use Cg or an equivalent, but as long as the output varies so much between profiles, it is difficult to write anything generic...

06-19-2002, 12:58 PM
The NV20 fragment profile is expected "soon", probably when NVIDIA finishes its NV_fragment_program extension, which will likely combine register combiners and texture shaders into one easier API, like ATI's.


06-19-2002, 01:55 PM
I sure hope it is soon. The Cg fragment shader I wrote for OpenGL is just dying to be used. :) At first I didn't know that the OpenGL fragment part of Cg wasn't working yet, and I just about pulled my hair out trying to figure out why in the heck the shader-loading function was crashing horribly. Then I read the post on cgshaders.org about fragment Cg programs under OpenGL not being finished. Doh! :) Maybe I'll give myself a crash course in D3D so I can test it out. It's been a long time since I last used D3D 8. Well, since it first came out, actually.


06-19-2002, 09:48 PM
Originally posted by SirKnight:
Maybe i'll give myself a crash course on D3D so I can test it out.

No! :D You don't have to do that to use fragment programs...

Read the post from Cass in the following topic: http://www.cgshaders.org/forums/viewtopic.php?t=24&start=20

Just use the dx8ps profile for fragment programs, use nvparse (the version that supports ps1.1, which should be on NVIDIA's site 'soon') to set the register combiner + texture shader states, and switch to the fp20 profile when it's done/available (with the next release at SIGGRAPH?).

06-20-2002, 07:37 PM
Given how truly little we know about NV30, this guy is making some pretty bold claims about NVIDIA having the best hardware. They just released the GeForce 4, months after the Radeon 8500, and the 8500 is still, fundamentally, more versatile and more powerful than the GeForce 4. Given that, I don't think it's wise to be throwing around comments like this:

...and although Nvidia is the 800 lb gorilla of graphics, they also have the most interesting and innovative hardware currently on the way.

ATI beat them once before, and they've got a pretty good head start (even a fragment_shader API that is very extensible and flexible). To be D3D 9-compliant in the pixel shader department, ATI simply has to add a few more passes to the hardware; NVIDIA has to build an entirely new pixel pipeline. Granted, because of this, NVIDIA could build a very good one, but it is just as likely that ATI can build off their current one and make something better than NVIDIA's.

And that says nothing about what 3DLabs is bringing to the table.

06-20-2002, 08:10 PM
You assume that the writer knows no more than you.

Anyway, I'm glad someone corrected the original article, it was terrible. You don't have to pick favorites in the graphics card war to see that.

Zak McKrakem
06-21-2002, 05:04 AM
Very interesting to read: http://www.extremetech.com/article2/0,3973,183940,00.asp

06-21-2002, 05:23 AM
Very interesting,

the battle lines are getting drawn :-)

06-21-2002, 05:50 AM
Hmmm, that sounds pretty interesting richardve, I might have to see if I can get that working, that's if the new nvparse is online. :)


06-21-2002, 06:12 AM
Am I being paranoid, or has nvidia knocked up this Cg language because it's unsure that it can produce competitive hardware that will support the shading language proposed in gl2.0 ?

06-21-2002, 06:41 AM
Originally posted by knackered:
Am I being paranoid, or has nvidia knocked up this Cg language because it's unsure that it can produce competitive hardware that will support the shading language proposed in gl2.0 ?

knackered, I think you're just being paranoid.

I don't know if NVIDIA can produce a competitive chip that supports the gl2.0 shading language, but then I don't know if anybody can...

As far as I am concerned, Cg is a good thing and I don't understand why people keep trying to put it down even before trying it. Bottom line is, if you don't like it, stick to ASM-style shaders.... BTW, why did you move to C/C++ instead of staying in the fantastic world of x86???

As to whether NV30 will be a truly fantastic chip or not (cf. Korval post), I'd say it will (which does not mean that ATI or someone else cannot produce something better).

Have I missed something important in the past few weeks? It looks like everyone has something against NVIDIA these days...



06-21-2002, 07:04 AM
knackered, there's absolutely nothing to back up that suspicion, and something like Cg is definitely not "knocked up". It looks like an intentionally minimalist low level approach to shader compilation.

06-21-2002, 07:35 AM
Yes, 'knocked up' was a bad choice of words.
You're right, I've no facts to back up my suspicion.
All I say is this: whenever D3D is mentioned on this board, you people have been quick to point out that OpenGL is an open API, governed by a body with no single commercial interest, whereas with D3D, Microsoft plays the tune that everyone must dance to. It seems to me that (albeit to a smaller extent) NVIDIA is attempting to do the same with OpenGL. They will govern what can be added to the language... but this is OK by you guys? It's somehow morally different, is it?
Eric, get in the real world - nobody is anti-NVIDIA; it's just healthy suspicion of a commercially driven organisation.
If you think NVIDIA believes there's room for more than one consumer hardware vendor, and that this is "healthy competition", then I suggest you read up on how to run a business successfully: high on the priority list is eliminating the competition.

[This message has been edited by knackered (edited 06-21-2002).]

06-21-2002, 07:40 AM
Originally posted by knackered:
Eric, get in the real world - nobody is anti-NVIDIA; it's just healthy suspicion of a commercially driven organisation.

I am sorry but I am in the real world, you aren't: I am very conscious of the commercial issues behind what NVIDIA is doing (although I think the guys who are developing Cg are not the ones who are commercially interested in it...).

You complain about the commercial side of things, yet this is something you should expect these days. Who's in DreamLand then?



06-21-2002, 07:47 AM
Disclaimer: Note that I haven't worked at all on Cg. Also keep in mind that the below are all _personal_ opinions, not necessarily reflecting the views of my employer. Indeed, I'm actually on vacation from work right now, after my recent college graduation...

Eric, your post reminds me of an interesting point -- the "ASM" vs. "HLL" issue, as it relates to the API. The way I see it, what makes the most sense from an API standpoint is to expose an assembly-level language from the graphics API (recall that OpenGL is supposed to be a "low-level" graphics API), and to layer a HLL compiler on top rather than integrating it into OpenGL. I think it makes very little sense to make a shading language part of the API itself.

This is the approach that we have taken with NV_vertex_program and Cg, and I think it's the right design decision. You can precompile shaders; you can examine what the assembly looks like; you can have an API-independent runtime layer that works with more than just OpenGL; and, of course, you can still write assembly programs when you need to!

This is a personal beef of mine with the OGL2.0 proposals -- I really don't think it makes any sense to put a HLL inside the OpenGL API.

OGL1.4 is taking the right approach by exposing an assembly language from the API. The assembly language can be upgraded later, but it will certainly be functional in its initial form and a viable compiler target. So a sufficiently inclined individual or company could write a compiler from the proposed 3Dlabs shading language to ARB_vertex_program, right now, today. The only reason to wait and make it part of "OGL2.0" is marketing.

If you wanted to be really picky, you could have a shading language as part of a "GLU 2.0". This layer would simply call glLoadProgramNV (or the ARB_vertex_program equivalent thereof). This analogy isn't entirely accurate because GLU generally doesn't have a driver layer that allows different vendors to plug in their own implementation. However, this analogy _does_ make it clear how a separate layer can work, and also illustrates how "standardization" is a straw man for putting the shading language in the base API. It's perfectly possible to standardize a shading language and even a shading runtime that sits at any layer.

A final argument that has been made is that it is valuable somehow to _not_ support an assembly language, because it eliminates some sort of backwards compatibility burden. But since ARB_vertex_program isn't going away (much less NV_vertex_program or DX8), this is a burden that will already exist by developer demand. In the very worst case, you could "compile" an ARB_vertex_program into a high-level program. This assumes that the HLL has the same set of program inputs (i.e. vertex attribs) and outputs as the ARB language; but I think that's a reasonable assumption. There's no reason to change the semantics of input/output behavior just because you are putting in a high-level language.

Oh, what I'd give to be able to have some honest discussion of the 3Dlabs OGL 2.0 proposals... just look at the poll on this site about whether you've reviewed the proposals. If you've reviewed them, you have the choice of either "fully support[ing]" them or "want[ing] to learn more". There is no option that lets you say that you disagree with many of the design decisions, as I do.

But I've already probably spoken too much about this sensitive topic...

[Then again, isn't that precisely the problem? Those of us who've worked on drivers for years, who live and breathe OpenGL, who may have many criticisms and disagreements have our tongues tied for political reasons, while developers just look at the proposals and see that there's all this stuff in them, and wouldn't it be nice to just have every feature in the world... I see it as a set of tradeoffs and design decisions, and see what I think are the wrong ones being made, and I can't even _tell_ anyone what I'd like to see changed, even if they might agree with me.]

Okay, now I should really shut up.

- Matt

06-21-2002, 07:56 AM
I hadn't thought of this problem (i.e. the HLL being part of the API) but reading your post I begin to think that it may not actually be such a good idea...

I may start a war here but the main problem I have with OGL 2.0 is that it is just that: a proposal.

Seeing how long it takes for the ARB to promote one single extension, I cannot see how OGL 2.0 could be made available for several years (I mean, with proper drivers, not just a sample implementation).

As far as I am concerned, OpenGL 1.4 seems to be what I have been waiting for, and I must say that Cg looks like it could be quite helpful in this case....

That being said, that is just my opinion. After all, I am just a poor lonesome coder...



P.S.: another thing I've been dreaming about is OpenML... I think I saw the first announcement about it 2 years ago. Where are they now?

06-21-2002, 09:34 AM
Originally posted by Eric:
P.S.: another thing I've been dreaming about is OpenML... I think I saw the first announcement about it 2 years ago. Where are they now?

Well, the 1.0 specification was released a while ago, if that's what you mean.

My 2 pence on Cg... the concept is great, the politics suck. A language controlled by one vendor is always going to be developed in such a way as to favour that vendor. So either it flops, or (more likely) it's pulled into DX9/DX10 as an "official" high-level shading language, cements NV's market lead and discourages experimentation by other vendors. (Man, I'm getting old... I can remember the days when you could still use the word "innovation" and have it mean something.. ;-) )

Given that Cg occupies about the same niche as the GL2 shading language, it's a bit disappointing that NV couldn't lend their efforts to developing a real open standard instead of producing an encumbered de facto one.

As I've said before, I'm a big fan of NV. This isn't an anti-NV rant, it's just pointing out (as others have done) that what's good for NV-the-company isn't necessarily good for the users. Or for that matter, for NV tech guys, for whom a healthy competitive marketplace means great salaries and cool work to do.

06-21-2002, 09:53 AM
Originally posted by Eric:
I am sorry but I am in the real world, you aren't: I am very conscious of the commercial issues behind what NVIDIA is doing (although I think the guys who are developing Cg are not the ones who are commercially interested in it...).

You complain about the commercial side of things, yet this is something you should expect these days. Who's in DreamLand then?



Eric, I didn't mean any offense to you.
Why the aggressive reply? Have you got shares in NVIDIA plc?
The more I think about this whole Cg thing, the more depressed I get (in terms of the future of OpenGL, not life itself! :) ).
OpenGL is just becoming a mongrel - a messy experiment.
Now go on, insult me!

(I've just discovered the bold tags, in case you haven't noticed!) :)

P.S. I'm gutted about the England v Brazil match - so excuse my bad mood.

06-21-2002, 11:12 AM
I don't think Cg's design really "favors" NVIDIA in any way. The _profiles_ may be "dumbed down" to present hardware, but the language itself is just a programming language. It's not as though the C programming language "favors" x86, Sparc, Mips, or Alpha... and it would likely be safe to say that future profiles will support larger subsets.

I suppose there is one other difference between a layered and a built-in language -- handling of multipass. Unfortunately, transparent handling of multipass is very hard to do in an OpenGL driver; you'd likely need hardware support for an F-buffer, or it would only work in certain special cases. It makes more sense in a scene graph API or game engine to talk about transparent multipass. Even there, though, it can be difficult if all you have is destination alpha to carry forward intermediate results. A pbuffer or aux buffer could help.

In practice, the driver might just have to fall back to software for any shader more complicated than it can handle. Transparent multipass would be cool -- sure. Feasible? Unclear.

Similar problems are faced in designing a multivendor vertex/fragment programmability API, because not every vendor has the same underlying instruction set, and because in many cases underlying restrictions can't be revealed to end users. I am quite happy with the approach the ARB settled on for this particular issue (i.e. limitations on how "big" programs can get, and associated queries) with ARB_vertex_program.

- Matt

06-21-2002, 01:53 PM
My 2 cents:
OpenGL 2.0 is something more than just shading languages.
For me, shaders are not even the primary priority. More important is fixing well-known OGL flaws:

- render to texture (please recall recent threads *full* of bitching on wglShareLists, wglMakeCurrentRead)
- texture objects (texture-targets are obsolete and annoying)
- vertex array objects (no more CVA, VAR, VAO, MOB mess)
- shader objects (no more loose bunch of states (like Nv RC or tex_env) or "exotic" Ati-style interfaces)
- synchronisation (NV_fence & NV_occlusion_query: seems like the beginning of another mess)
- unified object interface (one set of glGen+glDelete+glIs+glPrioritize is enough)

It is not that important to me which shading language(s) will be available with the above interface.
Provided you stick to a unified interface (for things like attribute binding and loading constants), you might
implement any shading language, high or low level:

GL2 shaders, Cg shaders, DirectX 8, 8.1, 9 shaders,
Nv VP, Arb VP, Ext VS, Nv Parse, Ati FS, Matrox FS, Nv30 FP,
or even texture_env_combine/crossbar/route

You may put some of the above into core, others into extensions, others into glu, whatever.
But the changes proposed in OGL2 are necessary. Retarding them for any reason (while DX lives free of the above problems) is harmful to OpenGL. So I'm very disappointed to see NVIDIA officially declaring "cold war" on OGL2.

As for the article at ExtremeTech, it is funny.
At the beginning it tries to suggest that the main obstacle to seeing the GF3 & R200 utilised to their full extent is the complexity of asm-style programming.
So it is not because most people have DX7 hardware, nor because DX8 hardware is priced over $200.
It is all because programmers are not smart enough to write 12-line asm programs. :rolleyes:

[This message has been edited by Carmacksutra (edited 06-21-2002).]

06-21-2002, 02:38 PM
There are things in OpenGL that could use fixing, but in many cases I disagree with the approach taken in the proposals, and in others it's not clear that it's _really_ worth fixing them.

Obvious example that you alluded to: it's lame how texture enables work in OpenGL, with the precedence and all. But it would be difficult to make any change that both improves the behavior _and_ preserves compatibility.

Some of the things you are talking about will get fixed up simply by standardizing a low-level assembly language for vertices and fragments. (For example, with a fragment program, all the texenv state can just get ignored.)

Others are solely a function of how WGL works, and really can't be fixed by the ARB at all (wglShareLists).

I do think OpenGL needs a unified object interface, but have issues with the specific proposal.

What I personally (again, not speaking as an NVIDIA employee, but as someone who cares about OpenGL and its future) would like to see would be for everyone to take a step back from what are currently fairly concrete proposals and instead look at design decisions. There has been little dialogue on the really tough, but really important questions: "Does this feature _really_ belong in core OpenGL? How should this feature be exposed in the long run in OpenGL?" Once you have a concrete proposal written, it's hard to give feedback beyond the level of "I found a typo here" or "this doesn't make sense" without seeming really crass, i.e., "you should delete this entire section from the proposal." It's especially hard when everyone works at different companies and (to put it mildly) not everyone trusts everyone else.

In the short term, I think the best course for OpenGL and the ARB is clear: get OpenGL 1.4 out the door, and then (in perhaps another 6 months) start thinking about fragment programmability. It's important that standardization processes not be rushed. Rushing things guarantees lots of unhappiness with the outcome. For example, I think it would be downright silly for the ARB to start working on an ARB_fragment_program extension today.

That's just what you have to live with when it's a standards body and not Microsoft. Microsoft can talk to each vendor in private and come up with some sort of compromise in advance. The ARB doesn't work that way. Each way has advantages and disadvantages.

There are interesting meta-discussions to be had here about "how to design a good standards body".

- Matt

06-21-2002, 03:30 PM
Hey Matt -

You mentioned that it is currently very difficult, if not impossible, to support a transparent multi-pass compiler.

This is probably a naive question, but why can't we do away with the need for (most) multi-pass rendering by putting some simple looping capability into the hardware?

Now, let me explain why I think this is reasonable before everyone jumps on me :)

Don't the GeForce3 & 4 use a loop-back mechanism to apply 4 textures or 8 register combiner stages? What stops you from generalizing this? Why not allow n loop-backs for 2*n texture applications?

Wouldn't this also work for vertex programs? For example, if I had a 256-instruction vertex program, couldn't the driver just split it up into two 128-instruction programs, switch programs when a vertex reaches the end, and run the vertex through again?

There are probably good reasons why this isn't possible, but I'd be curious to hear them :)

-- Zeno

06-21-2002, 04:08 PM
Originally posted by Zeno:
This is probably a naive question, but why can't we do away with the need for (most) multi-pass rendering by putting some simple looping capability into the hardware?

I just kind of answered this same question in another thread recently. Let me repost my response:
Pretty much. The difficulty with using a feedback into the pipeline for a second/third/etc. pair of textures is that all of these textures need to be accessible. Even texturing from video memory is painfully slow. Graphics hardware uses a texture cache to buffer small parts of a texture for extremely fast access. When you do a feedback loop, you either need to:
A) Every pixel, load the first texture set, draw, flush the texture cache & load the next texture, draw, repeat.
B) Have a mechanism for sharing the cache among multiple textures.
C) Have separate caches for each texture.

The problem with A is that it's a performance waster. You waste tons of bandwidth loading and unloading textures from video memory to cache. On top of that, you will have excessive idle time while waiting for the textures to load, unless you create some type of batching system in the hardware (i.e., process 100 pixels partway, then switch textures and continue). Even then it's still a memory-access hog.

The problem with B is that you then only have half the cache available for each texture stage (or if you want to loop back more than once, you only get 1/3, 1/4, etc. of the amount of cache available).

C is about the best option, but I think by the time you get that far in the hardware, you're probably a large portion of the way to just making those fully separate texture units.

Perhaps Matt, working a little bit closer to the hardware than I do, can confirm my above thoughts on the matter.

06-21-2002, 06:43 PM
SGI once designed hardware with the kind of 'recirculation' described, though they didn't fully implement it in the end. I wouldn't be surprised if someone else does it. It would be easy to handle an arbitrarily large multitexture scheme with vanilla daisy chaining, but it would be more difficult to make that work with the newer crossbar-combiner style of texenv (or your equivalent), which is essential to make it useful. Given the ways stuff can be combined together, you'd probably long for more registers if you were using massively multitexture shaders, but max textures seems to be the problem for now. Yes, at some point you'd totally thrash your cache; hopefully you'd have a bigger cache and many more registers on that hardware, and worst-case performance still drops significantly at some point.

[This message has been edited by dorbie (edited 06-21-2002).]

06-21-2002, 06:52 PM
The short answer to your question is: higher resource limits cost $$$, and no matter what you still have a finite resource limit.

There's all the difference in the world between being able to perform a "large" number of operations per pass and an "unlimited" number per pass.

Obviously a truly "unlimited" number is impossible because computers are finite. Even a software implementation will always have _some_ limit. So the API *needs* to have a way to say "no!" at some point on the grounds that "this program is too big for resource X". (The ARB_v_p working group discussed this problem at great length...)

Okay, so you might say, "yeah, eventually that might happen, but surely you could make the limits high enough that no one will ever really hit them?" I think this evades the question (think the infamous 640KB), but, for the sake of argument, I'll pretend to concede this point.

Let's make the problem easier by assuming that there is no branching at all. Branching/looping only makes it harder.

Let's also assume that each vertex and each fragment is fully independent, i.e., they don't write to any state that affects the others. One way you could break this assumption would be to let programs write to a "constant" register, sort of like how vertex state programs do in NV_vertex_program. Another would be to let fragment programs do programmable blending or the like inside the program by allowing you to get the current pixel's color or Z or stencil as a fragment input. Again, this stuff makes things harder, so let's drop it for now.

You can draw a dataflow graph of any shader. Many nodes are simple math operations. There are also certain "special" nodes. For example, an "interpolator" node converts a per-vertex quantity into a per-fragment quantity. And then there are the input and output nodes. Input nodes correspond to vertex attributes, and output nodes correspond to the final shaded color (and possibly Z) of a pixel. Other nodes contain constants used by the program. You would probably use a special node to do relative addressing. Another would probably be a 'position' node that indicates the need for rasterization.

One very special node is a texturing node. It would take a texture coordinate as input, refer to a specific texture, and do a lookup.

With the right set of nodes like this, we can now draw every shader as a DAG.

Output nodes do not constitute a resource; there are a fixed number of them possible.

Input nodes are definitely a resource that is tough, if not impossible, to virtualize. Adding more vertex attribs puts more burden on the driver and really boosts the size of RAMs in the hardware.

Interpolators are not a problem math-wise, because you can reuse a single interpolator as many times as you want, but it is still necessary to _store_ the computed vertex somewhere for rasterization and interpolation. This is likely a fixed-size RAM.

Math operations can easily be looped, but you need to have enough temporary registers. Temporaries can get _very_ costly -- big multiported RAMs.

Textures all need to fit in RAM -- bind too many textures and you're in trouble. You also only have so many API slots for _binding_ textures, and adding more slots may lead to other assorted HW costs.

It depends to some extent, but you probably can build something that can handle "a lot" of instructions (total DAG nodes in this framework) and constants without too much trouble.

There are two big approaches to splitting things up: spilling and breaking up the DAG into workable pieces.

There's plenty of space for spills -- video memory, for example. But hardware that can spill extra (e.g.) temporaries to video memory could get rather complicated and slow. Spills also only deal with restrictions on temporaries, in general.

Breaking up the DAG can work also, though you can construct some rather degenerate DAGs where this falls apart. Again, you need temporary storage off-chip to store results from each "pass". F-buffers are problematic because they have unbounded size. Anything that relies on "1 fragment per pixel", like using a pbuffer, breaks when the wrong depth test/stencil/etc. modes are used. In practice you could probably use an F-buffer and flush it whenever it fills up, but this would be worse in performance than letting the app do its own multipass, and I think you can construct degenerate cases where you will get exponential (well, at least greatly superlinear, not sure if it is really exponential) blowup of runtime.

So is it impossible to do all this stuff? No, not impossible. Is it a good use of fixed hardware resources? Definitely not. These hardware resources are better spent putting more math units in the chip. And even then, say you _did_ build enough stuff to support "absurdly big" shaders. Then old hardware would still have little choice but to fall back to software, which really isn't what you want, and you still will hit that limit _somewhere_ and need the API to reject certain sets of programs.

Once you recognize that the API needs to be _able_ to reject programs, the nature of the problem domain changes. Now you can ask the real questions here, like "how many temporary registers are _really_ useful? Does anyone _really_ want 256 4f registers?"

If you truly want _unlimited_ flexibility, you might as well be using a CPU. Graphics is all about making the common case fast.

- Matt

06-21-2002, 07:04 PM
Nobody is dumb enough to really suggest unlimited textures. When they say unlimited (nobody actually wrote that), they mean considerably more than, oh... 4, for example. Straw-man implementations of unbridled complexity don't mean that some number like 32, 64 or 128 wouldn't be interesting. It's actually simple to imagine a situation where every few textures (or even just a few interpolated parameters; same textures, more combiners for most params) buys you a full bump-mapped, shadowed fragment light source in a single pass, just as a for-instance. Recirculation just becomes one interesting way to implement this.

[This message has been edited by dorbie (edited 06-21-2002).]

06-21-2002, 10:40 PM

Yep. I agree. 4 is definitely not enough for the long run, and obviously there's going to be a limit somewhere no matter what. But the developer may want to write shaders without regard to hardware limits (and still get "good" performance even over those limits), and that's certainly a valid desire.

- Matt

06-22-2002, 12:24 AM
Dorbie & LordKronos:

I see the problem with cache thrashing...I should have thought of that.


As Dorbie mentioned, you made a bit of a straw man out of my suggestion by assuming that I meant 'n' had to be able to go to infinity. I realize that there are limits that are going to come into play somewhere, whether it's not enough RAM or the shader just being too slow for its intended purpose.

Also, I meant to suggest loop-back only in the parts of the pipeline that are currently programmable, so I didn't mean to suggest that an interpolator needs to loop back as well (this is currently in a "hidden" part of the pipeline). That wouldn't really be useful anyway, since its input is per-vertex and its output per-pixel.

Anyway, tell me if I understood the rest correctly:

Spills. I sorta see what you're saying here. Some features, like stenciling, currently require multi-pass rendering simply because the stencil buffer must be complete before it can be used to affect further rendering.

Similarly, if you want to be able to stream vertices through the program and not do one vertex at a time, you have to have somewhere to store a batch of vertices between application of the first and second programs. I can definitely see this being a pain in the butt, especially since you don't know how many vertices are coming.

Thanks for the detailed answers http://www.opengl.org/discussion_boards/ubb/smile.gif

-- Zeno

06-22-2002, 01:04 AM
On the subject of HLL vs ASM - I think making a low-level language part of the core API is wrong.
1. It may not map well to all hardware. Imagine a TTA architecture, where there's only one instruction - MOV. The output of one unit is fed directly as input into another unit. The unit may perform multiplication or a texture read, just work as a temporary register, or may change the instruction pointer if the passed parameter is 0. The current register-based low-level shaders will not work very well with that. This is just an example.
2. It may not scale well with future hardware. PS1.0 shaders are not as optimal as PS1.4 shaders on ATI hardware.
3. Transparent handling of multipass. I am sure some vendors will have it implemented in their drivers and this is a killer feature.
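To make point 1 concrete, here is a toy sketch (everything invented for illustration) of a MOV-only transport-triggered machine: the program is nothing but data moves, and writing to a functional unit's "trigger" port is what fires the operation.

```python
# Toy sketch of a transport-triggered architecture (TTA): the only
# instruction is MOV src -> dst. Unit and port names are invented.

class MulUnit:
    """Multiplier: write the operand port, then the trigger port."""
    def __init__(self):
        self.operand = 0.0
        self.result = 0.0

    def write(self, port, value):
        if port == "operand":
            self.operand = value
        elif port == "trigger":        # writing here fires the multiply
            self.result = self.operand * value

    def read(self, port):
        return self.result

def run(program, regs, units):
    """Execute a list of (src, dst) transports; 'unit.port' names a port."""
    for src, dst in program:
        if "." in src:
            u, p = src.split(".")
            value = units[u].read(p)
        else:
            value = regs[src]
        if "." in dst:
            u, p = dst.split(".")
            units[u].write(p, value)   # may trigger an operation
        else:
            regs[dst] = value

# r2 = r0 * r1, expressed purely as data transports
regs = {"r0": 3.0, "r1": 4.0, "r2": 0.0}
units = {"mul": MulUnit()}
program = [("r0", "mul.operand"),
           ("r1", "mul.trigger"),
           ("mul.result", "r2")]
run(program, regs, units)
print(regs["r2"])   # 12.0
```

A register-based shader ISA maps awkwardly onto this because there are no opcodes to translate, only routing decisions, which is the poster's point.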

And the shaders are not the most important thing. The current state of OpenGL is a mess; OpenGL 2 is the only way out of the issues with synchronisation, render-to-texture, vertex arrays... If somebody knows of any other plans to handle these problems in a clean, platform-independent way, please tell me. The fact that some people may not like how textures work in OpenGL 2.0 (what is wrong with that anyway?) is a lame excuse not to put all that goodness in the hands of the programmers. If there is a platform-independent way to do something that most ISVs and IHVs agree on (I think this is the case with OpenGL 2.0), it MUST be implemented, even if you don't agree with it.

06-22-2002, 04:08 AM
Originally posted by mcraighead:
That's just what you have to live with when it's a standards body and not Microsoft. Microsoft can talk to each vendor in private and come up with some sort of compromise in advance. The ARB doesn't work that way.

I think I'm right - NVIDIA aims to be the Microsoft of OpenGL. It may be a layer on top of OpenGL, but it's a layer that does all the difficult stuff. Stuff outside this layer (in OpenGL) would simply point the state machine at array bases and call the compiled shaders. If it becomes popular with developers (and why shouldn't it?), then NVIDIA will have effective control over the core features of OpenGL. Maybe this will be a good thing - ATI, Matrox and most definitely 3Dlabs won't agree, though.
Come back SG, all is forgiven!

06-22-2002, 07:30 AM
Originally posted by mcraighead:
Obvious example that you alluded to: it's lame how texture enables work in OpenGL, with the precedence and all. But it would be difficult to make any change that both improves the behavior _and_ preserves compatibility.

Maybe this is too maximalistic an approach?
I don't think it would be useful in practice to mix old and new commands when both control the same part of the GL machine (e.g. texture binding).

I think texture binding is not a hopeless case for designing reasonable interoperability between 1.x and 2.0 objects (in short: introduce a new target that has highest priority for 1.x commands and is invisible to 2.0 commands).

In case of any *really tough* conflict, the simplest (and the best, IMHO) way would be to make the particular group of states completely disjoint between 1.x and 2.0. So 1.x commands work with 1.x states, and 2.0 commands work with 2.0 states.

Of course this would not allow using any 1.x tex_env or NvRC with 2.0 texture objects. But this is just one more reason to include textual versions of tex_env, NvRC and AtiFS in 2.0 shaders. In general, I think legacy functionality should be upgraded to fit into 2.0, not the reverse (crippling OGL2 to fit legacy).

Others are solely a function of how WGL works, and really can't be fixed by the ARB at all (wglShareLists).

I meant that with 2.0 buffer objects, the problems with the mentioned wgl*** functions (when used with RTT) would not exist.

06-22-2002, 08:54 AM
I was thinking about how Cg is supposed to be supported, and about the benefits of vendor-specific extensions.

Is Cg supposed to take over this whole shading-language mess? The Cg website says that if something is not supported, it gets bypassed or something. Cg will have to be kept up to date all the time.

The nice thing about extensions was that each vendor showed one way of doing things. Maybe we, the developers, should vote for which one is better (for extensions that show similarities, of course).


06-22-2002, 12:08 PM
Matt, yes, clearly to free implementations from the shackles of a fixed pipeline you ultimately need more stores to pass data between passes, which hopefully translates to fewer restrictions (or actually greater performance) on compiled code of arbitrary complexity. The ultimate difference between this and a recirculation scheme is that the multipass scheme might remain fragment-cache coherent regardless of complexity, but to be equivalent it will require much larger framebuffers to store more data over the full frame. Recirculation might thrash the cache but requires less framebuffer memory, although it would need significantly more registers. That seems to be the real tradeoff (ignoring geometry for now).

[This message has been edited by dorbie (edited 06-22-2002).]
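A back-of-the-envelope sketch of the storage side of that tradeoff (all numbers here are made up for illustration, not figures for any real hardware):

```python
# Rough comparison of the two schemes' extra storage (hypothetical numbers):
# multipass spills intermediates to full-frame buffers; recirculation
# holds them in per-fragment registers for fragments in flight.

width, height   = 1024, 768   # framebuffer resolution
bytes_per_value = 8           # e.g. one RGBA fp16 intermediate per pixel
spill_buffers   = 4           # intermediates the multipass scheme stores

# Multipass: each intermediate lives for the whole frame
multipass_bytes = width * height * bytes_per_value * spill_buffers

# Recirculation: intermediates live only while a fragment is in flight
fragments_in_flight = 256
registers_per_frag  = 16
recirc_bytes = fragments_in_flight * registers_per_frag * bytes_per_value

print(multipass_bytes // (1024 * 1024), "MB of extra framebuffer")  # 24 MB
print(recirc_bytes // 1024, "KB of register storage")               # 32 KB
```

The orders of magnitude are the point: multipass pays in megabytes of framebuffer, recirculation pays in kilobytes of (expensive, on-chip) register file plus potential cache thrashing.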

06-22-2002, 01:59 PM
Is there some reason why we can't have a framebuffer dotproduct blending mode? And MAX/MIN logic? You know, with colour compressed vectors, etc.?

06-22-2002, 02:19 PM
I think it would be possible in theory. You might want to simply make destination color available as a combiner register instead of replicating the whole gamut of texture operations orthogonally in the blend equation.

For efficiency maybe you could just have some post depth test combiner with the extra register available. OTOH that's less flexible than a general destination color register available to any combiner.

Taking things to their logical conclusion the depth test just becomes a texture operation with source and destination registers (the operations already exist although they don't cull fragments), but it would take more thought to preserve optimizations like coarse Z rejection.
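For what it's worth, the per-pixel arithmetic such a dot-product blend mode would perform might look like this (a sketch of the math, not a real GL API; the byte encoding follows the existing DOT3 texture-combine convention of packing [-1,1] components into [0,255]):

```python
# Sketch of the per-pixel math a hypothetical framebuffer dot-product
# blend mode would do, using colour-compressed vectors as in DOT3.

def expand(c):
    """Map a colour byte in [0,255] back to a vector component in [-1,1]."""
    return c / 255.0 * 2.0 - 1.0

def dot3_blend(src_rgb, dst_rgb):
    """Dot the incoming fragment vector with the destination vector,
    clamp to [0,1], and replicate the scalar across RGB."""
    d = sum(expand(s) * expand(t) for s, t in zip(src_rgb, dst_rgb))
    d = min(max(d, 0.0), 1.0)
    return (d, d, d)

# Two aligned vectors pointing "up" (0,0,1), encoded as bytes:
print(dot3_blend((128, 128, 255), (128, 128, 255)))  # (1.0, 1.0, 1.0)
# Opposite vectors clamp to zero:
print(dot3_blend((128, 128, 255), (128, 128, 0)))    # (0.0, 0.0, 0.0)
```

A MAX/MIN blend logic would replace the dot product with per-channel max/min, which GL already offers via the blend equation; the dot product across channels is what's missing.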

06-22-2002, 07:12 PM
I believe you can already do DOT3 with the frame buffer.

Assuming you have render-to-texture, you can treat that texture as your frame buffer for all operations except the last. Or maybe even for the last operation, if you have a fast full-screen blit to tack on at the end :-)

ARB_backbuffer_as_texture anyone?

06-22-2002, 08:20 PM
Bah it's not the same :-)

06-23-2002, 11:42 PM
Originally posted by knackered:
Eric, I didn't mean any offense to you.
Why the aggressive reply? Have you got shares in NVIDIA plc?

I probably misinterpreted the tone of your post then. Sorry for the aggressive reply http://www.opengl.org/discussion_boards/ubb/frown.gif.

And no, I haven't got any shares in NVIDIA but if they want to give me some, I am open to discussion http://www.opengl.org/discussion_boards/ubb/wink.gif.



06-26-2002, 03:58 PM
Originally posted by mcraighead:
I don't think Cg's design really "favors" NVIDIA in any way. The _profiles_ may be "dumbed down" to present hardware, but the language itself is just a programming language. It's not as though the C programming language "favors" x86, Sparc, Mips, or Alpha... and it would likely be safe to say that future profiles will support larger subsets.
- Matt

Cg is not "just" a programming language. It is a "cool" Beta toolkit for NVIDIA to gain/keep market share.

The Cg language is focussed on supporting the hardware capabilities of the NV2X and the upcoming NV30: declaring that && and || may "not (necessarily) short circuit"; many reserved keywords which I assume NVIDIA will see fit to define as and when the capability becomes available in NVIDIA hardware; etc, etc. It doesn't even define proper control flow!

I like the idea of Cg - but not the execution. I hope this changes. For starters, control flow should already be defined as per the OGL2 shading language (which has a much cleaner language spec than the current Cg spec). The same goes for expressions which create side effects.

Basically, any feature of a program that is not supported by the target you are compiling for should be disabled via the Profile - not via the language specification!

This combination of a GOOD language definition plus Profiles would be much more attractive.
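The suggested split can be sketched in a few lines (everything here is invented for illustration: the profile names, the feature names, the compiler): the language defines the full feature set, and a per-target profile table is what rejects features the hardware can't run.

```python
# Toy sketch (all names invented) of "limit via profile, not via the
# language spec": the compiler checks a shader's features against a
# per-target profile table instead of the language forbidding them.

PROFILES = {
    "basic_vs":  {"mul", "dot", "loops"},               # hypothetical targets
    "fancy_hw":  {"mul", "dot", "loops", "branches"},
}

def compile_shader(features_used, profile):
    """Accept the shader if the profile supports every feature it uses."""
    unsupported = features_used - PROFILES[profile]
    if unsupported:
        raise ValueError(f"profile {profile!r} lacks: {sorted(unsupported)}")
    return "ok"

print(compile_shader({"dot", "loops"}, "basic_vs"))      # ok
print(compile_shader({"branches"}, "fancy_hw"))          # ok
# compile_shader({"branches"}, "basic_vs") raises ValueError
```

The point is that the same source stays legal everywhere; only the profile decides what a given target can actually compile.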

For "just" a programming language, I give it a C+. I know you could do much better if you had the will.