View Full Version : Future of ARB_fragment_program ?



LarsMiddendorf
02-01-2005, 04:52 AM
It is possible to use all the new PS3.0 features of the GeForce 6 with OPTION NV_fragment_program2 and ARB_fp. Is a vendor-independent version of this extension (or something similar) planned, or is GLSL the only solution?
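
For context, a minimal ARB_fp program opting into the NV extension looks something like this (a sketch only; the instructions themselves are arbitrary, not a real effect):

!!ARBfp1.0
OPTION NV_fragment_program2;
# Once the OPTION line is accepted, the NV-only features (branching,
# extra instructions, etc.) become usable in an otherwise ordinary
# ARB_fragment_program.
TEMP color;
TEX color, fragment.texcoord[0], texture[0], 2D;
MUL result.color, color, fragment.color;
END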

idr
02-01-2005, 06:19 AM
Other than Nvidia, the vendors that I have talked to aren't very enthusiastic about keeping the "assembly language" interfaces alive.

simongreen
02-01-2005, 10:21 AM
This came up at the last ARB meeting, but none of the vendors other than NVIDIA seemed interested in new assembly language programmability extensions (i.e. ARB_fragment_program2).

I personally think this is a shame. It's true that the assembly doesn't bear much resemblance to what is actually executed by the hardware these days, but I still think it has value as an intermediate representation.

Zengar
02-01-2005, 11:48 AM
You seem to like writing optimising compilers, Simon :D Oh ok, I forgot, you already have it for NV_fragment_program2 :-)

Why do you need two different things to do the same work? Agreed, an intermediate language would be very handy (for example, if one wants to write his own shading language), but it shouldn't be an assembly language; rather some sort of compiler-friendly language with a fairly universal grammar. I can understand your concern, as this brings a lot of problems for Cg. But I don't see how you can convince other vendors to extend GPU assembly.

cass
02-01-2005, 12:15 PM
As long as MS continues to define extended ASM for vertex and fragment, it will be valuable to support the equivalent in OpenGL, IMO.

Humus
02-01-2005, 03:27 PM
Frankly, I don't see the point of an assembly language. Today's drivers have built-in optimisers that typically do as good a job as hand-written assembly would.

Korval
02-01-2005, 04:01 PM
Originally posted by Humus:
Frankly, I don't see the point of an assembly language. Today's drivers have built-in optimisers that typically do as good a job as hand-written assembly would.

Have you tried to use ATi's drivers recently via glslang? They've got compiler bugs in there that are still, to this day, limiting what developers can do. The hardware isn't getting in the way; the driver is.

From this crash bug (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=11;t=000565) to oddball bugs (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=11;t=000461) to miscomputing the dependency chain (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=11;t=000592), coupled with driver bugs like these (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=11;t=000492), how can you expect us to believe that your driver produces optimal code? Even today, many months after the first glslang implementations started showing up, I bet we can beat the ATi glslang compiler with ARB_fp most of the time. And with fewer bugs too.

I'm not a big fan of having 10 ways to do the same thing, but I prefer that to having one sub-optimal path and no way to bypass it.

zed
02-01-2005, 04:25 PM
Originally posted by Korval:
I bet we can beat the ATi glslang compiler with ARB_fp most of the time. And with fewer bugs too.

Ah, but then it's hardcoded in and, as such, isn't as likely to benefit from future graphics driver upgrades.
A good example is Quake 3 (details are rough): there was a function in it, fast_sqrt() or something like that, written in asm; unfortunately a few months later it was actually slower than sqrt(), because as asm it 'had to be obeyed'.

found this
http://www.icarusindie.com/DoItYourSelf/rtsr/ffi/ffi.sqrt.php
Actually, now I think about it, another analogy would be Doom 3 and the specular (or something, I forget), where a texture was used as a 'hack' to look up the specular; ATI changed the shaders to use their interpretation of the actual function and it was quicker (and probably more accurate).

Stephen_H
02-01-2005, 06:16 PM
I use fragment programs about equally for 3D-renderer-type applications and for GPGPU applications like image processing and filtering.

I have to say that some high-level languages really do a crappy job of optimizing GPGPU code. In some cases, involving branching or longer code using a lot of variables (e.g. certain color conversions), I've produced hand-optimized assembly that was 3 to 4 times shorter than the assembly being produced by the compiler.

For typical 3D applications, like games, where you're mostly doing lighting and transform stuff, GLSL/Cg/HLSL is cool and definitely worth it, provided you don't need frequent on-the-fly compilation (some high-level shaders compile 100-1000 times slower than the corresponding ASM shaders!).

Also, getting the GLSL/Cg/HLSL compiler to produce fast code usually involves a few iterations of:

1) look at the assembly for the GLSL/Cg/HLSL program

2) add some hints using swizzles and masks, reorganize the shader and/or variables, pack variables differently, then go back to (1)

The high level compilers need to get a lot better before we're ready to do away with ARB_fp.
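
As a purely illustrative example of the step-2 repacking above (names are made up, nothing vendor-specific assumed): two half-used vec2 interpolants can be packed into a single varying and unpacked with swizzles, which tends to nudge the compiler toward using one register instead of two.

// Hypothetical GLSL fragment shader, packing two UV sets into one varying.
varying vec4 uvPacked;          // .xy = diffuse UVs, .zw = detail UVs
uniform sampler2D diffuseMap;
uniform sampler2D detailMap;
void main()
{
    vec4 base   = texture2D(diffuseMap, uvPacked.xy);
    vec4 detail = texture2D(detailMap,  uvPacked.zw);
    gl_FragColor = base * detail;
}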

EDIT - still waiting for rectangular texture sampling support in GLSL from ATI here... has this been implemented yet?

SirKnight
02-01-2005, 06:25 PM
Doom 3 used that specular function texture so that cards that support fragment programs would look the same as older cards. One of the big deals with Doom 3 was that id wanted it to look the same on all cards.

-SirKnight

Korval
02-01-2005, 07:01 PM
Originally posted by zed:
Ah, but then it's hardcoded in and, as such, isn't as likely to benefit from future graphics driver upgrades.
A good example is Quake 3 (details are rough): there was a function in it, fast_sqrt() or something like that, written in asm; unfortunately a few months later it was actually slower than sqrt(), because as asm it 'had to be obeyed'.

If your game doesn't sell because it's too slow, what happens in the future is irrelevant.

Plus, with better hardware, even the "less optimal" path will still be faster than it used to be. It may not be as fast as it could be, but, as I said, there's no reason to have any faith in ATi's driver development team.


Originally posted by Stephen_H:
EDIT - still waiting for rectangular texture sampling support in GLSL from ATI here... has this been implemented yet?

You want them to implement something? We should hope they get their bugs fixed before bothering to implement things... ;)

V-man
02-01-2005, 08:21 PM
Originally posted by LarsMiddendorf:
It is possible to use all the new PS3.0 features of the GeForce 6 with OPTION NV_fragment_program2

That's the idea, plus NV_fp2 has more instructions:
Pack, unpack.
Extended swizzle.
LIT.
Trig functions.
Reflect.
Various set functions.

The same should be exposed in GLSL a la Cg language.

KRONOS
02-02-2005, 03:05 AM
Originally posted by V-man:

The same should be exposed in GLSL a la Cg language.

It is exposed. NVIDIA has a paper concerning it (I couldn't find it). All of my fragment shaders use:


// Map NVIDIA's Cg-style half types to plain GLSL types when the compiler
// doesn't define __GLSL_CG_DATA_TYPES (i.e. on non-NVIDIA implementations).
#ifndef __GLSL_CG_DATA_TYPES
# define half float
# define half2 vec2
# define half3 vec3
# define half4 vec4
#endif

Zengar
02-02-2005, 07:59 AM
Doom 3 uses a special, complex specular function, hence the lookup texture. Replacing it with a common specular function (which was Humus's idea :-) - yes, in case someone doesn't know it, Humus works for ATI :-D ) is not the correct way - as stated by Carmack himself.

martinho_
02-02-2005, 10:10 AM
It's just a matter of time before assembly languages disappear. Compilers can create portable code that is faster than what humans write, optimized for present and future hardware. If they don't do so right now, they will in the near future (near enough for projects that start today).

valoh
02-02-2005, 10:55 AM
Originally posted by martinho_:
It's just a matter of time before assembly languages disappear. Compilers can create portable code that is faster than what humans write, optimized for present and future hardware. If they don't do so right now, they will in the near future (near enough for projects that start today).

Well, the question is how long it will take to get there. How many years? As Korval already mentioned: even now, after more than a year of driver support, the IHVs are unfortunately not even able to provide bug-free support.

Another funny thing is that, IMO, the high-levelness of GLSL is very limited. Several things which would be needed for good high-level use are not supported: attribute arrays and invariant uniform calculation optimization. Plus some annoying spec points, like the missing literal conversion and varying attributes.

So far, GLSL has just given me lots of reasons to try an alternative 3D API next time, or, if that doesn't fix the problems, to create my own high-level language and compile it to GLSL (or assembler, or whatever).

btw: Does any driver already support invariant uniform calculation optimizations? This would be an important optimization, but last time I checked it wasn't supported by ATI/nvidia.

zed
02-02-2005, 11:22 AM
Doom 3 to look the same on all cards? Why didn't they limit Quake 3 to 16-bit color then (as Voodoo couldn't do 32-bit color)?

Originally posted by Zengar:
Doom 3 uses a special, complex specular function, hence the lookup texture.

That's the official word, but given the relations between ATI and id I'm not 100% sure.

Personally, I have an aversion to asm, having used it in the early 80s when there was zero documentation (no internet) and you learned by trial and error. It was fun... not! Actually, at the start I didn't even have an assembler for a year, so code was just numbers. Arrrgh. Whoops, I typed BA instead of B9...
But speed is not the main issue; things like ease of use are. How much faster is it to write a shader with GLSL vs asm? At least 10x, I'm guessing. In the end, time is what it all comes down to. My engine could be a lot better if I had an extra hour to spend on the material system, an extra hour on the particle system, etc., but that's not gonna happen if I'm farting around with asm.

cass
02-02-2005, 01:54 PM
There's a distinction between whether you should use a high level language for shader development and whether there's utility in having a low level language.

From a stability and robustness perspective, low-level languages are easier to get implemented quickly and correctly.

As people have pointed out, language choice isn't an either/or proposition. With a low level language, you can troubleshoot and analyze what kind of code the high level compiler is generating.

At the end of the day, the compiler is a software tool, not a hardware abstraction. I think MS got this right in the D3D design. And for better or worse (depending on your inclination) there will be a common shader ISA for some time to come.

Time will tell which model will be most successful, of course. But how many software developers do you know that like supporting multiple C++ compilers, and all revisions of those compilers, over multiple years? And C++ has been "done" for a long time now. The shading languages will continue to evolve for the foreseeable future.

This is my personal view of the situation. I'm not "advocating" anything. I just believe that existing market forces will keep ARB_fp around for some time to come. And I believe those same forces will result in multi-vendor extensions to that ISA over time.

Time will tell for sure though.

[edit: fix crappy formatting]

Humus
02-02-2005, 03:21 PM
Originally posted by Korval:
Have you tried to use ATi's drivers recently via glslang?

Every day.


Originally posted by Korval:
They've got compiler bugs in there that are still, to this day, limiting what developers can do. The hardware isn't getting in the way; the driver is.

From this crash bug (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=11;t=000565) to oddball bugs (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=11;t=000461) to miscomputing the dependency chain (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=11;t=000592), coupled with driver bugs like these (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=11;t=000492), how can you expect us to believe that your driver produces optimal code?

I don't see what these bugs have to do with optimal code. GCC refuses to compile some valid C++ files occasionally. That doesn't mean it's not producing fast code.

Humus
02-02-2005, 03:24 PM
Originally posted by Stephen_H:
I have to say that some high-level languages really do a crappy job of optimizing GPGPU code. In some cases, involving branching or longer code using a lot of variables (e.g. certain color conversions), I've produced hand-optimized assembly that was 3 to 4 times shorter than the assembly being produced by the compiler.

I hope you don't take Cg as being representative of compiler quality.

Humus
02-02-2005, 03:30 PM
Originally posted by Zengar:
Doom 3 uses a special, complex specular function, hence the lookup texture. Replacing it with a common specular function (which was Humus's idea :-) - yes, in case someone doesn't know it, Humus works for ATI :-D ) is not the correct way - as stated by Carmack himself.

Actually, my first implementation was incorrect, as it wasn't a pow() function as I initially thought. Later revisions were fully equivalent though (and a good deal faster). He used saturate(4 * x - 3)^2, which could be implemented on Radeon/GF2-level hardware as well.
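
In GLSL terms that replacement amounts to a one-liner; a hedged sketch (the variable names are mine, and saturate() maps to clamp()):

// specDot is assumed to be the usual specular dot product in [0,1].
float doom3Specular(float specDot)
{
    float s = clamp(4.0 * specDot - 3.0, 0.0, 1.0);
    return s * s;   // saturate(4*x - 3)^2, no lookup texture required
}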

cass
02-02-2005, 03:32 PM
Originally posted by martinho_:
It's just a matter of time before assembly languages disappear. Compilers can create portable code that is faster than what humans write, optimized for present and future hardware. If they don't do so right now, they will in the near future (near enough for projects that start today).

Out of curiosity, when you say that it's just a matter of time before assembly languages disappear, do you mean from OpenGL or from the CPU world as well?

Humus
02-02-2005, 03:36 PM
Originally posted by valoh:
btw: Does any driver already support invariant uniform calculation optimizations? This would be an important optimization, but last time I checked it wasn't supported by ATI/nvidia.

Are you asking for something like the preshader in DX?

Humus
02-02-2005, 03:45 PM
Originally posted by cass:
Out of curiosity, when you say that it's just a matter of time before assembly languages disappear, do you mean from OpenGL or from the CPU world as well?

It's not gone already? ;)
Even on the CPU, you're probably better off using C++ code most of the time. Exceptions are 3DNow/SSE and the like. But even then, using intrinsics will probably work better, be quicker to write and debug, and allow the compiler to optimize better across function calls and blocks etc.

V-man
02-02-2005, 03:48 PM
Kronos, you just use the Cg syntax in GLSL.
GL_EXT_Cg_shader spec, if it exists, should mention this.


Originally posted by cass:
From a stability and robustness perspective, low-level languages are easier to get implemented quickly and correctly.

Aren't most of the problems related to optimization?
The original expression is too complex, so you try to reorder it to make it fit the GPU's resources and you end up ****ing up everything?

valoh
02-02-2005, 03:51 PM
Originally posted by Humus:
Are you asking for something like the preshader in DX?

Exactly.

I haven't used DirectX, but I think this behaviour is called preshaders and is handled by the FX framework. For GLSL, this should have been specified with a query to check whether it is supported and a flag to enable/disable it. IMO it's a very important feature for a high-level application/shader interface.
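
For readers unfamiliar with the idea, here is a made-up example of the kind of uniform-only expression a preshader-style optimization would hoist out of the per-fragment code (all names are assumptions):

// Hypothetical fog shader.
uniform float fogStart;
uniform float fogEnd;
varying float eyeDist;
void main()
{
    // invRange depends only on uniforms, so it is invariant for the whole
    // draw call; a preshader would evaluate it once on the CPU instead of
    // once per fragment.
    float invRange = 1.0 / (fogEnd - fogStart);
    float fog = clamp((fogEnd - eyeDist) * invRange, 0.0, 1.0);
    gl_FragColor = vec4(fog, fog, fog, 1.0);
}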

btw: as you are working with ATI/GLSL every day, when can we expect stable GLSL/pbuffer support? These (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=11;t=000545) two bugs still haven't been fixed...

martinho_
02-02-2005, 04:45 PM
Originally posted by Humus:

Originally posted by cass:
Out of curiosity, when you say that it's just a matter of time before assembly languages disappear, do you mean from OpenGL or from the CPU world as well?

It's not gone already? ;)
Even on the CPU, you're probably better off using C++ code most of the time. Exceptions are 3DNow/SSE and the like. But even then, using intrinsics will probably work better, be quicker to write and debug, and allow the compiler to optimize better across function calls and blocks etc.

I assume that no significant number of programmers use CPU assembly anymore...

cass
02-02-2005, 05:16 PM
Again, it's not a question of what you should write your code in, it's a question of whether an assembly-level interface is useful.

Tools like compilers generate assembly. Having a common ISA target allows for different languages to link together into a single executable. It allows programs to generate fast machine code on-the-fly.

Could we really get by just expressing everything in high-level languages? I don't see how. As shaders get more complex (and begin to look more like CPU ISAs) it seems natural that the tool chain will want to go the way that it has on
the CPU side.

Will the GPGPU folks want to have their shader code look like GLSL, or will they want to tailor
it more toward their computational goals?

Will we have two separate HLSLs?

In a lot of ways "less is more". With low-level access, you can provide an abstraction suitable
to your goals. Software developers are free to extend the language constructs however they choose.

Of course HLSLs are the future of shading. Whether they're based on an underlying assumption of an ISA is the question. ISA as a foundation for compiler tools and programmability is definitely the "tried and true" path.

Korval
02-02-2005, 07:03 PM
I don't see what these bugs have to do with optimal code. GCC refuses to compile some valid C++ files occasionally. That doesn't mean it's not producing fast code.

I guess ISVs are funny about that kind of thing. They've got this crazy notion that, before you start optimizing something, it should probably work first. I guess IHVs like ATi have different notions about whether something is useful.

I, for one, don't agree that making something fast and broken is good.


Tools like compilers generate assembly. Having a common ISA target allows for different languages to link together into a single executable. It allows programs to generate fast machine code on-the-fly.

Admittedly true, but then again, you're not 3DLabs with their weird vertex and fragment architecture.

We don't share ISAs across PowerPCs and x86 chips. We don't ask a DEC Alpha to natively run x86 code. But shader hardware can be just as varied; as such, trying to create an ISA that provides for easy optimizations (which none of our current ones do) is not easy. Just look at the compiled code that comes out of your Cg compiler. It's fine for an nVidia card, but it does things that an ATi fragment program doesn't need to. The high-level constructs that a (theoretically functional) ATi compiler could have used to generate more optimal code have been lost. As such, passing the results of a compilation around is of no great value.

That's not to say that I don't ultimately agree with you. The problem is that even ARB_vp/fp are simply too low level to create good optimizations from. Perhaps there's a happy medium between ARB_vp/fp and glslang, but nobody's currently working on it, so it won't get developed.


Will the GPGPU folks want to have their shader code look like GLSL, or will they want to tailor
it more toward their computational goals?
On a personal note, OpenGL and the ARB should spend absolutely no time finding solutions for the GPGPU people. If they want to hack their graphics cards into CPU's, fine, but they shouldn't expect a graphics library to help them at all.

ffish
02-02-2005, 08:24 PM
Originally posted by Korval:
On a personal note, OpenGL and the ARB should spend absolutely no time finding solutions for the GPGPU people. If they want to hack their graphics cards into CPU's, fine, but they shouldn't expect a graphics library to help them at all.

:mad: Thanks a lot :p Personally, I don't see a need for anything special for GPGPU. It's slowly coming together anyway. GLSL is fine for it - I don't use GPU asm personally. FBO will be fine when it arrives. Higher floating-point precision would be nice, but I won't hold my breath for that - maybe 5-10 years I'd expect it. The main thing that'll allow GPGPU to really come into its own is scattering. Will that arrive with DX10/OpenGL x.x? Dunno, but I wouldn't be surprised either way. If scattering is seen to be useful for graphics, then maybe. It'd certainly be a much appreciated OpenGL extension (hint, hint Cass and Simon ;) ). Other useful things - tools for OpenGL. Maybe a free shader debugger like Visual Studio's HLSL debugger (but better - it's a bit cumbersome for my liking).

sqrt[-1]
02-02-2005, 11:04 PM
ffish: How would a shader debugger work? Would you run in some software mode (Mesa?) with a custom extension to access variables as a shader is running?

The closest I can come with GLIntercept is to allow you to edit and re-compile the shaders back into the program at runtime (i.e. so you can move a variable into the output color to "see" the value).

ffish
02-02-2005, 11:24 PM
Yeah, I guess. The D3D version you run with the Microsoft reference (software) rasterizer. I've only used it with C# and there are a number of pitfalls, but it still works mostly. Maybe the unmanaged C++ shader debugger works better. Unfortunately I can't see how anyone would want to create a tool like the HLSL version for free, but it'd be nice if they would.

Since I'm doing GPGPU stuff, it's kind of hard to see what changing values does at runtime. My textures are just a bunch of numbers and sometimes it's hard to see what's happening. So your GLIntercept solution wouldn't work for me.

spasi
02-03-2005, 02:27 AM
Correct me if I'm wrong, but I think what Cass is talking about is a Java-like bytecode representation. That is, an almost 1-to-1 mapping of GLSL to "bytecodes", with no optimizations whatsoever (like D3D does) and probably higher level than ARB_vp/fp. The result could then be (greatly optimized and) executed on any architecture. Additionally, more languages (non-GLSL) could be created, as long as they could be compiled to these "bytecodes".

cass
02-03-2005, 04:41 AM
Hi spasi,

Yes, that's more or less what I mean. From a tool chain perspective, there's a target for compiling and linking. Then the driver performs a JIT phase of mapping the generic ISA onto the native architecture.

The generic ISA provides a reasonable hardware abstraction (but does not dictate actual underlying microarchitecture), and software developers can develop any sort of language tools they like on top of it.

This seems to be the way MS has gone, so any IHV hoping to sell D3D parts will already have to support this generic ISA.

Thanks -
Cass

Zengar
02-03-2005, 05:19 AM
I fully agree with Mr. Everitt at this point.
We need a low-level shading interface to make it possible to provide different HL shading languages and toolkits.
But I still believe that an assembly-like language is not sufficient for such a goal, as you still need too many compilation steps to compile the code to a native representation. As a matter of fact, I am working on a virtual machine (just for fun), something like Java or .NET but with an abstract intermediate language. This language merely describes the state transformation chains. I found it to be much more effective with respect to further optimisation than the usual assembly representation. Another advantage of such a system is the ability to extend it easily. Something like this could be used for shading too.
Just my 2 cents :D I realise that no company will suddenly develop a new method because I said so ;)

spasi
02-03-2005, 06:45 AM
Cass,

I can see the advantages of such an architecture, but have a few questions:

1. Has this been discussed with other ARB members? Not in a "let's extend the low-level language" manner, but given the mentioned advantages.

2. Are ARB_vp/fp (and whatever D3D uses) sufficiently abstract, or would something new be necessary? Also, I would expect this to work completely internally, something developers have no access to (no temptation to write low-level code, like a Java/.NET developer never writes bytecode).

3. Is NVIDIA willing, or does it have plans, to go after such an architecture for OpenGL? Not necessarily to implement it independently, but at least to try to prove its usefulness.

V-man
02-03-2005, 07:20 AM
Originally posted by sqrt[-1]:
ffish: How would a shader debugger work? Would you run in some software mode (Mesa?) with a custom extension to access variables as a shader is running?

Yes, you have to run it in software, and no, you don't need a custom extension. D3D does it with its reference rasterizer.

Unfortunately, this is only good for debugging your shader.

Dumping the assembly to a file with the hw driver would be useful.

With ATI (3DLabs too), it's a black box.
NV simply has the right solution.

cass
02-03-2005, 08:02 AM
Originally posted by spasi:
Cass,

I can see the advantages of such an architecture, but have a few questions:

1. Has this been discussed with other ARB members? Not in a "let's extend the low-level language" manner, but given the mentioned advantages.
There have been discussions, they just don't seem to go anywhere. I don't think enough ARB members are interested to have critical mass (yet).



2. Are ARB_vp/fp (and whatever D3D uses) sufficiently abstract, or would something new be necessary? Also, I would expect this to work completely internally, something developers have no access to (no temptation to write low-level code, like a Java/.NET developer never writes bytecode).
No. Had we gone down the path of extending the ASM, it would have eventually been general enough, I think. NVIDIA will continue to generalize this path with vendor extensions. For external tools to take root, there needs to be multi-vendor support for these general paths though. There's no knowing when (or even if) that will happen.



3. Is NVIDIA willing, or does it have plans, to go after such an architecture for OpenGL? Not necessarily to implement it independently, but at least to try to prove its usefulness.

If other IHVs and ISVs were interested, I'm sure NVIDIA would be involved. But there's no groundswell of interest/support for this direction today.

The opinions I have expressed on this thread are about the way I think things are likely to go. I'm not trying to advocate action. After all, I could be wrong. :-)

Thanks -
Cass

Humus
02-03-2005, 09:52 AM
Originally posted by valoh:
exactly.

I haven't used DirectX, but I think this behaviour is called preshaders and is handled by the FX framework. For GLSL, this should have been specified with a query to check whether it is supported and a flag to enable/disable it. IMO it's a very important feature for a high-level application/shader interface.

Yeah, it's probably a useful thing going forward. I think what we need is not an assembly language (they break down occasionally as well), but rather more control for the application in terms of optimization flags, enabling/disabling preshaders and so on. Simply disabling optimizations should hopefully let many broken shaders run (although slowly) until the bug is fixed.
If we're talking about an intermediate language, I might be in, iff it doesn't destroy the semantics of the original shader. But then again, we're not really saving a whole lot more than the parsing.


Originally posted by valoh:
btw: as you are working with ATI/GLSL every day, when can we expect stable GLSL/pbuffer support? These (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=11;t=000545) two bugs still haven't been fixed...

I'm working with, but not on (I'm no driver writer), ATI GLSL, so I don't know when certain bugs will get fixed. But the bug you mention was just recently fixed. Don't know when it goes into a public driver though.

Humus
02-03-2005, 09:56 AM
Originally posted by Korval:
I guess ISVs are funny about that kind of thing. They've got this crazy notion that, before you start optimizing something, it should probably work first. I guess IHVs like ATi have different notions about whether something is useful.

On the other hand, the ISV people don't risk losing millions in sales because their bar was a few pixels shorter than the competition's on a chart on Tom's Hardware Guide.

valoh
02-03-2005, 10:24 AM
Originally posted by Humus:
On the other hand, the ISV people don't risk losing millions in sales because their bar was a few pixels shorter than the competition's on a chart on Tom's Hardware Guide.

Well, and I guess that is the biggest argument for making the hardware/platform-dependent driver side as low-level as possible and then layering the high-level stuff on top, which seems to be the DirectX approach. IMO the ARB and IHVs should use a similar approach to make good high-level features available faster and more widely. GLSL has been a very disappointing development for the OpenGL API: late introduction of high-level, modern GPU features already widely available in Direct3D, followed by a very long and buggy implementation phase. Not a good combination.


Originally posted by Korval:

On a personal note, OpenGL and the ARB should spend absolutely no time finding solutions for the GPGPU people. If they want to hack their graphics cards into CPU's, fine, but they shouldn't expect a graphics library to help them at all.

I disagree with that. After all, graphics is also just solving an integral equation, the global illumination equation. The methods used in graphics can also be used in many other domains, and vice versa. So IMO, for good future development of the API, it should follow the path of the hardware towards providing primitives (good hardware-accelerated streaming computation) rather than solutions. In the end the graphics domain will benefit more from that than from a bunch of solutions called a graphics API.

Korval
02-03-2005, 12:44 PM
After all, graphics is also just solving an integral equation, the global illumination equation. The methods used in graphics can also be used in many other domains, and vice versa. So IMO, for good future development of the API, it should follow the path of the hardware towards providing primitives (good hardware-accelerated streaming computation) rather than solutions.

OpenGL is a Graphics Library, not a GPGPU library. If something is found to be useful for graphics work, that happens to be of great interest to the GPGPU people as well, it can be brought in. However, if it is something that exists solely for the GPGPU crowd, it should not even be considered.

You don't see people asking DirectSound or OpenAL to do graphics work, even though, with some finessing of the internals, they could function as a weak GPU.

-NiCo-
02-03-2005, 04:19 PM
Originally posted by Korval:
OpenGL is a Graphics Library, not a GPGPU library. If something is found to be useful for graphics work, that happens to be of great interest to the GPGPU people as well, it can be brought in. However, if it is something that exists solely for the GPGPU crowd, it should not even be considered.

Although I can think of some nice features that would serve both the graphics and the GPGPU people, I have to agree with Korval.

GPGPU people, myself included, should be grateful that there is a relatively cheap and fast commercial platform to program on, compared to very expensive custom-made ASICs.

Nico

SirKnight
02-03-2005, 05:52 PM
You don't see people asking DirectSound or OpenAL to do graphics work, even though, with some finessing of the internals, they could function as a weak GPU.

Once I tried asking my floppy drive to compute me some SH coefficients, but all it did was format my Commander Keen install disk. :(

Just for the record. I think an asm interface rox and is a good idea to always have one imo.

-SirKnight

harsman
02-04-2005, 08:35 AM
Originally posted by Humus:
If we're talking about an intermediate language I might be in iff it doesn't destroy the semantics of the original shader. But then again, then we're not really saving a whole lot more than the parsing.
This is a very important point. Cass, in what way do you feel glsl is too high level at the moment? Obviously, the D3D shader models were too low level since new ones keep popping up (2.0b is a good example). For a low level language it seems reasonable to put the abstraction at about C's level (it was designed to be a portable assembler after all), but that's roughly where glsl is today.

cass
02-04-2005, 09:06 AM
Originally posted by harsman:

Originally posted by Humus:
If we're talking about an intermediate language I might be in iff it doesn't destroy the semantics of the original shader. But then again, then we're not really saving a whole lot more than the parsing.
This is a very important point. Cass, in what way do you feel glsl is too high level at the moment? Obviously, the D3D shader models were too low level since new ones keep popping up (2.0b is a good example). For a low level language it seems reasonable to put the abstraction at about C's level (it was designed to be a portable assembler after all), but that's roughly where glsl is today.

I don't think GLSL is too high level as a high-level shading language; I think GLSL is too high level as a compile target.

The low level abstractions aren't too low-level, they actually represent what the hardware is capable of. If you try to compile an HLSL shader to a PS2.0b shader and it fails, you'll know it's because the compiler couldn't make it work given the rules of that model. If you try to do the same thing in GLSL, it'll just fail and maybe give you some text info, but at that point it's too late.

One is a hardware abstraction, the other is just an abstraction.

I disagree that the low-level language just saves the parsing. That's a gross oversimplification. Compilers are more complex than that. Today's shader models are simple enough perhaps to fool yourself into thinking that shader compilers will always be simple. But you are fooling yourself if you think that.

Korval
02-04-2005, 10:40 AM
The low level abstractions aren't too low-level, they actually represent what the hardware is capable of.

I'm sure 3DLabs doesn't think so. D3D shaders aren't anywhere near a 1:1 mapping to their hardware; their hardware is fairly unorthodox.

The problem with ISAs, one that we're faced with in the x86 world, is when the ISA just doesn't match the hardware anymore. Look at the P4; a pretty significant portion of the chip is dedicated to transforming x86 into the low-level machine opcodes; effectively, it has a compiler. C/C++ optimizing compilers are designed to work around the weaknesses of the x86 ISA.

The primary thing the x86 architecture buys you is backwards compatibility; the ability to install and run DOS on a P4 (other hardware issues aside).

Certainly, the glslang approach can function. In a perfect world, it works. However, the ISA approach has problems too, as 3DLabs's hardware doesn't match the ISA at all. The high-level information that was lost in compilation is very useful to them. The fact that the user asked for a vec2 rather than a vec4 has meaning for their optimizer; losing this information is bad for them and their implementation. Now, they have to write code to figure out whether each 4-vector float param is used as a 2-vector or not.

You need an ISA with the richness of a higher-level language to be able to allow compilers for disparate hardware to optimize reasonably. ARB_vp and fp can't do that; they're too bound to the traditional ideas of a vector-based system. As such, it's a bad general shader ISA.

cass
02-04-2005, 10:56 AM
Originally posted by Korval:

You need an ISA with the richness of a higher-level language to be able to allow compilers for disparate hardware to optimize reasonably. ARB_vp and fp can't do that; they're too bound to the traditional ideas of a vector-based system. As such, it's a bad general shader ISA.

ARB_vp and ARB_fp are fine targets for scalar designs as long as swizzling and masking are supported (which they are).

Existing hybrid microarchitectures that have 3+1, 2+2, and 4+1 operating modes take great advantage of being able to specify 1-, 2-, 3-, and 4-component operations clearly and succinctly in the ISA.
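
A tiny illustration (not taken from any real shader) of how ARB_fp masks express that split; a 3-component operation and a scalar operation written side by side, which 3+1 co-issue hardware can pair up:

!!ARBfp1.0
TEMP r0;
DP3 r0.xyz, fragment.texcoord[0], fragment.texcoord[1];  # 3-component work
RCP r0.w, fragment.texcoord[0].x;                        # scalar work
MOV result.color, r0;
END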

I'm not arguing for a specific ISA here, by the way; just that there is goodness to having a low-level interface. If a vendor can't support the D3D ISA well, they're going to have problems selling hardware, I think.

Humus
02-04-2005, 06:55 PM
Originally posted by cass:
If you try to compile an HLSL shader to a PS2.0b shader and it fails, you'll know it's because the compiler couldn't make it work given the rules of that model. If you try to do the same thing in GLSL, it'll just fail and maybe give you some text info, but at that point it's too late.

Or it may just work, because your compiler knew the hardware, rather than an artificial model that's a common denominator of a class of hardware from different vendors.


Originally posted by cass:
I disagree that the low-level language just saves the parsing. That's a gross oversimplification. Compilers are more complex than that. Today's shader models are simple enough perhaps to fool yourself into thinking that shader compilers will always be simple. But you are fooling yourself if you think that.

Exactly what step beyond parsing would using a semantics-preserving IL save? If you have a shader with all the semantics, you still have the compilation step ahead.

V-man
02-04-2005, 08:37 PM
The low level abstractions aren't too low-level, they actually represent what the hardware is capable of. If you try to compile an HLSL shader to a PS2.0b shader and it fails, you'll know it's because the compiler couldn't make it work given the rules of that model. If you try to do the same thing in GLSL, it'll just fail and maybe give you some text info, but at that point it's too late.

Whooooooooaaaaaaaa! Fail?
It's supposed to run in software mode.

cass
02-05-2005, 12:19 AM
Originally posted by Humus:
Or it may just work because your compiler knew the hardware, rather than an artificial model that's a common denominator of a class of hardware from different vendors.
"It MAY just work" is not a particularly fun place for developers to target. Common denominators are critical for writing code that you know will work.


Exactly what step beyond parsing would using a semantics-preserving IL save? If you have a shader with all the semantics, you still have the compilation step ahead.

Maybe I don't know what you mean by "semantics-preserving IL" then. The low level that I'm talking about couldn't simply be disassembled back into the original high-level shader. Old cfront-style C++ compilers used C as their compilation target, but the good ones still did tons of optimization in the C++-to-C compilation.

In CPU-land the hardware is only aware of the ISA. Any correlation of a high-level language to the hardware is a software problem. You could argue that the ISA can be clunky and inefficient, but the market seems to have decided that maintaining x86 compatibility is worth adding extra hardware support. And not having x86 support jeopardizes the viability of the product.

I don't see how the GPU world is immune to those same forces.

cass
02-05-2005, 12:59 AM
Originally posted by V-man:
Whooooooooaaaaaaaa! Fail?
It's supposed to run in software mode.

If your OBJECT_COMPILE_STATUS_ARB or OBJECT_LINK_STATUS_ARB are FALSE, then all bets are off.

The docs state that these kinds of failures must be limited. But you have to deal with it, even if you think it's the driver's fault.

In any case, if you're on the hairy edge of being able to run in hw on your min-spec system and a compiler change (driver update) causes you to fall off the fast path, you probably want to know it. If your game/app suddenly started hitting sw fallback all the time, your customers would not be happy.

V-man
02-05-2005, 06:59 AM
OK, a failure can happen.
What is the solution?
1. ARB_vp/fp provided query functions. Do the same for GLSL?
2. The developer educates himself on what all the GPUs can do and writes various shaders?
3. Provide feature to turn off optimization to get a static well defined behavior for all drivers of the same GPU?
4. Caps for all the intrinsic functions in GLSL?
(Should I avoid the noise function?)

I think #2 is the best. #1 isn't bad because it helps #2. I have a feeling #1 will never be introduced.
Everyone hates #4.

marco_dup1
02-05-2005, 07:06 AM
Originally posted by cass:
In CPU-land the hardware is only aware of the ISA. Any correlation of a high-level language to the hardware is a software problem. You could argue that the ISA can be clunky and inefficient, but the market seems to have decided that maintaining x86 compatibility is worth adding extra hardware support. And not having x86 support jeopardizes the viability of the product.
I'm not so sure that the market has decided. I see much more an OS vendor lock-in to Windows, which has a natural monopoly and actively supports only one platform. If the OS vendor changes the ISA, the software developers do too; look at Apple. I think this is bad. Assembly support is good for the incumbents and bad for the innovators, because the transaction costs of an ISA change are too high to get the critical mass needed to show the advantage of a new platform. A higher-level interface, in this sense, is much better for customers and developers in the long run. Look at Unix/Linux/BSD etc.: it's a high-level interface and there is competition. There are other problems there, but those are about it maybe being too high-level. So the question is how high-level the interface should be. I find GLSL is at the right level, but the error/warning/debug interface should be much more standardized (the info log; XML? who cares). But maybe we should see GLSL more as a target interface and not so much as a programmer interface, so I'm not so sure about the C syntax. I mean, give the hardware developer the information, but don't add sugar for the developer to the interface (for example automatic casts). That is IMHO the job of other tools.

cass
02-05-2005, 08:42 AM
Originally posted by V-man:
OK, a failure can happen.
What is the solution?
1. ARB_vp/fp provided query functions. Do the same for GLSL?
2. The developer educates himself on what all the GPUs can do and writes various shaders?
3. Provide feature to turn off optimization to get a static well defined behavior for all drivers of the same GPU?
4. Caps for all the intrinsic functions in GLSL?
(Should I avoid the noise function?)

I think #2 is the best. #1 isn't bad because it helps #2. I have a feeling #1 will never be introduced.
Everyone hates #4.

I don't think there's really any substitute for #2 today, and #3 only makes sense if you're using GLSL as a compile target. Otherwise you lose all the advantages of writing in a high-level language.

cass
02-05-2005, 09:45 AM
Originally posted by marco:
I'm not so sure that the market has decided. I see much more an OS vendor lock-in to Windows, which has a natural monopoly and actively supports only one platform. If the OS vendor changes the ISA, the software developers do too; look at Apple. I think this is bad. Assembly support is good for the incumbents and bad for the innovators, because the transaction costs of an ISA change are too high to get the critical mass needed to show the advantage of a new platform. A higher-level interface, in this sense, is much better for customers and developers in the long run. Look at Unix/Linux/BSD etc.: it's a high-level interface and there is competition. There are other problems there, but those are about it maybe being too high-level. So the question is how high-level the interface should be. I find GLSL is at the right level, but the error/warning/debug interface should be much more standardized (the info log; XML? who cares). But maybe we should see GLSL more as a target interface and not so much as a programmer interface, so I'm not so sure about the C syntax. I mean, give the hardware developer the information, but don't add sugar for the developer to the interface (for example automatic casts). That is IMHO the job of other tools.

You make good points, Marco. I think the point I am making about market forces corresponds quite well to what you say. Stability, extensibility, and backward compatibility are all critical aspects of this.

The core of these ideas has a software basis (Win32 APIs for example), but they also have a hardware basis (a stable ISA and ABI).

OpenGL has traditionally taken this route. Code written for OpenGL 1.0 will still compile and run well more than 10 years later. The basic hardware abstraction is stable, extensible, and backward compatible. But it is still a low-level hardware abstraction.

My assertion is that there will be natural pressure to have a low-level programmable interface for OpenGL. I'm not saying it'll necessarily look more like ASM than C, but that a tool chain external to the driver will treat it as a compile target.

This is all speculation, of course, but it's why I think ARB_fp and ARB_vp will be around for quite a while.

[edit: fix formatting]

Humus
02-05-2005, 04:11 PM
Originally posted by cass:
"It MAY just work" is not a particularly fun place for developers to target. Common denominators are critical for writing code that you know will work.

Everything can break. GLSL is no different than VBOs, blending, or other features. And things can most definitely run in software; this is no different either. There's no guarantee that Quake 3 will run in hardware on future GPUs either. Somehow developers are able to deal with this, and I don't see how GLSL changes that.


Maybe I don't know what you mean by "semantics-preserving IL" then.

I mean an IL that doesn't hide the intent of the programmer. It should be possible to reconstruct a shader that's not too different from the original. That means no if-statements replaced with SLT/MUL, no loops unrolled, no register renaming, etc.
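
As an invented illustration of that first point, this is the kind of lowering he means: the high-level conditional y = (x < 0.5) ? a : b disappears into arithmetic select instructions, where a semantics-preserving IL would keep the conditional instead.

!!ARBfp1.0
# Illustration only; the bindings and the constant are made up.
TEMP t, y;
PARAM c = { 0.5, 0.5, 0.5, 0.5 };
ATTRIB x = fragment.texcoord[0];
ATTRIB a = fragment.color.primary;
ATTRIB b = fragment.color.secondary;
SLT t, x, c;        # t = 1.0 where x < 0.5, else 0.0
SUB y, a, b;
MAD y, t, y, b;     # y = t*(a-b) + b, a per-component select
MOV result.color, y;
END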


I don't see how the GPU world is immune to those same forces.

We don't need to go there in the first place. The CPU world ended up there for historical reasons, and now we're stuck with x86 because it's darn near impossible to change due to the abundance of old software that needs to work. GPUs, on the other hand, are in the unique position of not being interfaced directly but through an API, which doesn't lock us in as much. Putting an assembly language interface there, on the other hand, easily ties our hands for future generations, and I think that's a bad idea. Just as what was good for the 8086 isn't all too great for the Pentium 4, I don't think what's a good representation for R420/NV40 will be a good representation for R800/NV80.

cass
02-05-2005, 05:43 PM
Originally posted by Humus:
Everything can break. GLSL is no different...

Everything is not equally susceptible to breakage. More complex things are more difficult and time-consuming to implement correctly, and more likely to break.

GLSL is more fragile than ARB_vp/fp in this regard. Quality implementations of the ASM interfaces were available as soon as products that supported them shipped. GLSL implementations have taken much longer to surface and their quality generally doesn't match the ASM.


Originally posted by Humus:
We don't need to go there in the first place. The CPU world ended up there because of historical reasons and now we're stuck with x86 because it's darn near impossible to change due to the abundance of old software that needs to work. GPUs on the other hand have the unique position of not being interfaced directly but through an API, which doesn't lock us in as much.
You don't think we need to go there, but MS does. Their ISAs have required regular revamping in the first few generations of programmable shaders, but when they stabilize (as they are doing rapidly), GPUs will have a lot of legacy code that will need to be maintained indefinitely.

OpenGL may choose to never embrace the ISA approach, of course, but there are several good reasons why it might.

I feel like I'm beginning to repeat myself on this thread, so I'll probably refrain from further posts.

I have enjoyed following this thread - it's good to hear the thoughts and opinions of ISVs. For the most part I already know the opinions of IHVs. :)

Thanks -
Cass

sqrt[-1]
02-05-2005, 06:43 PM
I would have to second the notion that the GLSL interface probably needs an "is under native limits" type of query, like the one in ARB_fragment_program.

That and possibly a few others from ARB_fragment_program on errors:
- Error line number
- "Reason for failure" enum (i.e. enums for too many instructions, too many texture lookups, too many dependent texture lookups, etc.)
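
For reference, a hedged sketch of the ARB_fragment_program-era queries being referred to, in C (it assumes the glGetProgramivARB entry point has already been obtained through your usual extension-loading mechanism):

#include <GL/gl.h>
#include <GL/glext.h>
#include <stdio.h>

void report_arbfp_status(PFNGLGETPROGRAMIVARBPROC pglGetProgramivARB)
{
    GLint errorPos, isNative;

    /* Position of the first error in the program string, or -1 if none. */
    glGetIntegerv(GL_PROGRAM_ERROR_POSITION_ARB, &errorPos);
    if (errorPos != -1)
        printf("ARBfp error at %d: %s\n", errorPos,
               (const char *)glGetString(GL_PROGRAM_ERROR_STRING_ARB));

    /* The "is under native limits" query the post asks GLSL to match. */
    pglGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                       GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB, &isNative);
    if (!isNative)
        printf("Program exceeds native limits; expect a software fallback.\n");
}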

valoh
02-05-2005, 07:25 PM
Originally posted by sqrt[-1]:

That and possibly a few others from ARB_fragment_program on errors:
- Error line number
- "Reason for failure" enum (i.e. enums for too many instructions, too many texture lookups, too many dependent texture lookups, etc.)

I second that. I'm doing some development in the direction of high-level, component-based render algorithm descriptions. Therefore I need to make as many configuration decisions as possible automatically, for which I need machine-manageable (i.e. specified) information about shader components.

An additional piece of error information which would be very helpful in this context is the reason for link errors, like an enum for an undefined function and a mechanism for querying information about that function (name and parameter names/types).

I think that in the medium term complete shaders most likely won't be written by hand, but will be generated automatically out of a huge collection of shader components based on some constraints (hardware, speed, quality). Unfortunately, so far GLSL (or rather the GLSL spec) is pretty unusable for this :(

zeckensack
02-05-2005, 08:04 PM
Cass,
I must say I'm a bit befuddled by your use of "ISA" and "ABI", and the parallels you have been drawing between C++-to-x86 compilers and shader APIs.

ARB_fragment_program (and PS2.x for that matter) may be dubbed "assembly" interfaces, but they clearly are not native machine code interfaces, and as such cannot be ABIs. Yes, they define instruction sets, not expressions, so ISA may be an appropriate term, even though I wouldn't call it that, based on the "usual" meaning of ISA as it is known to me.

Whatever, my point is that ARB_fragment_program does not expose the machine. You can make arbitrary changes to instruction encodings and it will still be possible to transparently support ARB_fragment_program. This is not the case with x86 assembly, where all details are open for (ab)use and are relied upon.

One very important example is that ARB_fp abstracts away the register count.
(I know that PS2.x pretends that it doesn't, but then I don't quite believe that current drivers care much about whether a temporary is called r12 or funky_thing.)
IMO this alone is proof that ARB_fp is not an assembly language but rather a high(er)-level language ... without scopes and expressions.

Korval
02-05-2005, 08:43 PM
Somehow developers are able to deal with this, and I don't see how GLSL changes that.

It used to be pretty obvious and transparent what was and wasn't available. You just checked the extension string; if the driver didn't expose ARB_crossbar, you didn't use it. The unspoken and unwritten pact between the driver developers and ISVs was that the things that were exposed (as well as a well-known subset of standard OpenGL functionality) would run in hardware.

With glslang, we just have no idea. A shader that ran today just fine may break because some moron on the ATi driver staff happened to change the compiler and it now blows the instruction limit where it did not before. Is there a solution for this?

By contrast, if a driver suddenly stops supporting ARB_crossbar, the software can detect this and turn off features appropriately. So, while it still is a driver bug, the game doesn't suddenly run in software.

Basically, drivers can already screw ISVs post-release. This just gives them a really, really easy way to do it. And it will happen. There's no guarantee that an implementation of glslang will ever compile and run any given shader at reasonable speed, and if it can't, there's no way to tell why. These are not acceptable risks for software development.

And I guarantee you that it will be one of the reasons why OpenGL will be used less and less frequently in games.


ARB_fragment_program (and PS2.x for that matter) may be dubbed "assembly" interfaces, but they clearly are not native machine code interfaces, and as such cannot be an ABIs.Cass's use of the term "ABI" is more for the idea that it is a standardized interface that is easily supported across multiple "compilers". You can compile a C library on GCC (for Win32) and link it to a C program compiled with VC++.


One very important example is that ARB_fp abstracts away the register count.

If I recall correctly, the P4 has far more actual hardware registers than x86 calls for. What they do is map registers from the native hardware to what the x86 opcode is looking for.

x86 is not (any longer) "native assembly"; modern x86 chips have microcoded "compilers" that translate x86 commands into native internal opcodes that are then executed. x86 now exists solely as an interface: a single target for compilers, which multiple different chips can then execute.

However, I fundamentally dislike the idea of making hardware specifically for an ISA. I don't like the fact that the ISA compilers for modern x86 chips are a part of the chip and not the external compiler. I understand why it is, but I still don't like it. Most important of all, this must never happen with graphics shaders. The minute someone starts adding transistors for the purpose of making some outdated ISA work is a bad day for everyone; it's the first step on the way to what x86 is now.

The way around that is to make the ISA good enough to handle a plethora of hardware, such that hardware vendors do not consider trying to rebuild their hardware to look more like the ISA. The compilation from the ISA to the hardware needs to be a real process, and the ISA should not in any way influence how hardware is made.

martinho_
02-06-2005, 05:49 AM
It used to be pretty obvious and transparent as to what was and wasn't available. You just check the extension string; if the driver didn't expose ARB_crossbar, you didn't use it. The unspoken and unwritten pact between the driver developers and ISVs was that the things that were exposed (as well as a well-known subset of standard OpenGL functionality) would run in hardware.
Are you really sure about this? Because as far as I know, this is true in just one direction; that is, if the extension is exposed it runs in hardware, but the opposite is not true (if it's not exposed, that doesn't mean it would run in software).

My GFFX doesn't expose ARB_texture_env_crossbar but it supports it through GL1.4, and it works fine.

If I had looked at the extensions string, I would never have used a feature on a wide range of hardware that supports it.

cass
02-06-2005, 07:53 AM
Originally posted by martinho_:
Are you really sure about this? Because as far as I know, this is true in just one direction; that is, if the extension is exposed it runs in hardware, but the opposite is not true (if it's not exposed, that doesn't mean it would run in software).

My GFFX doesn't expose ARB_texture_env_crossbar but it supports it through GL1.4, and it works fine.

If I had looked at the extensions string, I would never have used a feature on a wide range of hardware that supports it.

That's a good question. The unwritten rule that NVIDIA has followed has always been to try to support the latest version of OpenGL, but not expose the corresponding extensions for functionality when there's no direct hardware support.

The texture_env_crossbar example is an exception.
It's not about hardware functionality, it's about interoperation with other supported extensions.
If I remember correctly, we don't support the extension because it conflicts with texture_env_combine4. This issue was corrected when the functionality was added to the 1.4 core.

The extension says:

ARB_texture_env_crossbar:
If a texture environment for a given texture unit references a texture unit that is disabled or does not have a valid texture object bound to it, then it is as if texture blending is disabled for the given texture unit. Every texture unit implicitly references the texture object that is bound to it, regardless of the texture function specified by COMBINE_RGB_ARB or COMBINE_ALPHA_ARB.
The core says:

OpenGL core spec:
If a texture unit is disabled or has an invalid or incomplete texture (as defined in section 3.8.10) bound to it, then blending is disabled for that texture unit. If the texture environment for a given enabled texture unit references a disabled texture unit, or an invalid or incomplete texture that is bound to another unit, then the results of texture blending are undefined.
The core doesn't define the behavior if you reference a disabled or incomplete texture in a stage where texture blending is enabled, but the extension spec requires that blending be treated as disabled for that unit.

The ARB spec here wouldn't have extended very well to ARBfp or GLSL anyway. If you think of this extension as being implemented as an ARBfp, we couldn't just generate the program based on the combine state. We'd also have to re-generate and reload the program every time a texture is bound or enabled/disabled, nulling out groups of instructions that correspond to that "blend unit".

Not all that attractive, huh?

Anyway, just wanted to clear up that point, since texture_env_crossbar is a bad example of how we try to convey hardware capabilities. Clearly your GeForceFX supports the 1.4 functionality, but we don't support the extension because of all the extra hassle of implementing it correctly.

CrazyButcher
02-06-2005, 11:47 AM
Originally posted by martinho_:
if the extension is exposed it runs in hardware, but the opposite is not true Isn't that wrong, taking ARB_vertex_program on older NVIDIA hardware as an example? I think even the TNT exposes the extension, but it surely doesn't run it on the GPU.

Nevertheless, I am quite a noob to all this shader stuff, but I prefer the low-level approach the ASM-like extensions offer. It's more of a "you see exactly what you do" approach; it helps you do only the minimum you need for an effect. Of course some clever compiler could take over that job, but I think the HLSL stuff is still way too new to have great compilers that both work reliably and optimize well.

It would be cool if, feature-wise, the ARB programs kept up with the HLSLs, so that personal preference/backup/comparison stays possible, just as one might favor Cg syntax over GLSL...

But then again, when architectures differ so much and people have to write big compilers for both HLSL and ASM, it's understandable that they would minimize the work and do just one.

V-man
02-06-2005, 06:29 PM
Originally posted by Korval:
With glslang, we just have no idea. A shader that ran today just fine may break because some moron on the ATi driver staff happened to change the compiler and it now blows the instruction limit where it did not before. Is there a solution for this?
If you offend them, they won't come.




By contrast, if a driver suddenly stops supporting ARB_crossbar, the software can detect this and turn off features appropriately. So, while it still is a driver bug, the game doesn't suddenly run in software.
Look at my suggestion #3
A static, well-defined behavior is what is needed. In my suggestion I said "disable optimization", but we could also have options for certain optimizations, such as conserving temp registers, conserving ALU instructions, not unrolling loops, and so on.

Now, I know the ARB is really squeamish about solutions with a short-term scope, so I suggest that a proprietary (or multivendor) extension be written; in 10 years it could be dropped.

Having a way to query the instruction count is good too, because it gives us an idea of how heavy the shader actually is.
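For illustration, a sketch of the limit and usage queries that ARB_fragment_program already provides for the currently bound program (assumes the ARB entry points are loaded; error checking omitted):

GLint maxNative = 0, usedNative = 0, underLimits = 0;
glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                  GL_MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB, &maxNative);
glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                  GL_PROGRAM_NATIVE_INSTRUCTIONS_ARB, &usedNative);
glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                  GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB, &underLimits);
/* underLimits == GL_FALSE warns that the program exceeds a native resource
   limit and may fall off the hardware path even though it compiled. */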

If the rumors are correct, MS will drop shader targets.
You can bet they won't let dumb issues like these slip.

tfpsly
02-07-2005, 12:39 AM
Originally posted by cass:
This is all speculation, of course, but it's why I think ARB_fp and ARB_vp will be around for quite a while. I second that. From the Steam database:

NVidia GeForce FX 5600-5200/GeForce 3/4 23.83 %
NVidia GeForce4 MX/2/1 17.51 %
ATI Radeon X600/9700/9500/9600/9650/9550 15.87 %
ATI Radeon X800/9800 13.72 %
ATI Radeon X300/9200/9100/9000/8500/7000 8.94 %
NVidia GeForce FX 5950-5700 6.65 %
NVidia GeForce 6800 2.70 %
NVidia TNT2/TNT 1.54 %
Intel 8xx 2.57 %
SiS 7xx/6xx/3xx 1.03 %
ATI Radeon/Rage 128/Rage 0.84 %
S3 Graphics ProSavageDDR 0.59 %
3Dfx Voodoo 3 0.13 %
PowerVR KYRO/KYRO II 0.11 %
Trident Video Accelerator Blade 3D/ProMedia 0.08 %
Others 3.82 %
We see that a huge part of today's market has no support for high-level shaders (except through Cg).

EDIT: cards are grouped not only according to their features, but also according to whether they are fast enough to make those features usable.

Obli
02-07-2005, 05:00 AM
I see this topic has evolved quite a bit, but I still wanted to say I'm against dropping the ASM-like interfaces. Besides the fact that I like ARB_vp and ARB_fp quite a bit for their syntax, I also heard that some compilers actually translate to ASM before going to "native" code; this is, after all, what some compilers do, if I'm not wrong.
If this is true, I would hate to see ASM dropped, because it would in fact still be available internally.
I also don't think the time has come for this. There are still people who use ASM on CPUs right now (need I mention RDTSC or CPUID?). Sure, GPUs have evolved a lot, but I think we can all live with it; whoever doesn't want ASM still has a choice, after all.
I also agree about the handwritten-shader problem. I still have to carefully consider how much I can automate in a shader-generation system, but I think a large majority of shaders will be built "at runtime".
Notice that RenderMan itself could prove me wrong; this was just to put in my two cents.

knackered
02-07-2005, 07:33 AM
There will always be people who've got plenty of time to bugger about with asm trying to outdo compilers. But there will always be a majority who have more imagination and just want a way of expressing it quickly, so they can move on to the next exciting thing. On the whole, I want performance to be the driver's responsibility whenever possible, so long as I follow basic guidelines - this is why I'm not pushing for an OpenGL extension to allow me to push hardware bytecodes directly to the card... that's not my job; that's why OpenGL exists, as a hardware abstraction.

Korval
02-07-2005, 11:03 AM
If you offend them, they won't come. Who won't come where?


A static, well-defined behavior is what is needed. In my suggestion I said "disable optimization", but we could also have options for certain optimizations, such as conserving temp registers, conserving ALU instructions, not unrolling loops, and so on. That's silly. If the hardware can compile a shader in some way such that it fits and does what it is asked to, then it should do that. I shouldn't have to tell it, "Try to make this shader fit within your ill-defined resource limits." It should do that as a matter of course. Failure to compile should be a last resort, not a standard fallback.


EDIT: cards are grouped not only according to their features, but also according to whether they are fast enough to make those features usable. Poor grouping. "Cooking" the data to benefit your argument is of no value to anyone.

A glslang shader can easily be a small thing or a big thing. While 5200 cards are going to be slow with any kind of cross-platform shader code, 5600s are rather serviceable for smaller shaders.

More importantly, the chunk of cards that can't use shaders is getting smaller, not bigger.


There will always be people who've got plenty of time to bugger about with asm trying to outdo compilers. It's not just a question of "outdoing compilers"; it's a question of getting shaders to work. How likely is it for a driver update to push a shader that was once barely within the limits outside of those limits? For glslang, it is very likely. For ARB_vp/fp, it is highly unlikely.

To have a driver release break your game would suck. Considering that boneheaded driver development can already screw your game over, let's not give them more opportunities to break your software.

tfpsly
02-07-2005, 12:53 PM
Originally posted by Korval:
EDIT: cards are grouped not only according to their features, but also according to whether they are fast enough to make those features usable. Poor grouping. "Cooking" the data to benefit your argument is of no value to anyone.

A glslang shader can easily be a small thing or a big thing. While 5200 cards are going to be slow with any kind of cross-platform shader code, 5600s are rather serviceable for smaller shaders. Feel free to read the source directly:
http://www.steampowered.com/status/survey.html

And the FX 5600 cannot run HL2's DX9 path smoothly. This card is definitely not a GLSL target (that is, if you use GLSL to do something more interesting than the fixed path). Everyone knows the FX is slow with floating-point fragment programs.


More importantly, the chunk of cards that can't use shaders is getting smaller, not bigger.That's 1) a good thing 2) obvious.
Still, I would not like to prevent a big part of the market from buying my games just because I was lazy or because I chose too advanced a technology. Today I use ARB vp/fp/Cg; tomorrow I might use GLSL.

Korval
02-07-2005, 02:45 PM
Feel free to read the source directly: I get approximately 41.99%. Not half, but then again, this is the population that is growing.

And yes, this includes FX 5600s or better. Equally importantly, ARB_fp exhibits the same problem on FX hardware that glslang does, so a shader is not guaranteed to be slower just because it uses glslang.


Today I use ARB vp/fp/Cg; tomorrow I might use GLSL. But every card that supports ARB_fp supports glslang. So it isn't a question of lack of support so much as lack of good, trusted support.

V-man
02-07-2005, 03:00 PM
Originally posted by Korval:
Who won't come where? Who: ATI's staff. Their driver developers.
Where: these boards.


That's silly. If the hardware can compile a shader in some way such that it fits and does what it is asked to, then it should do that. I shouldn't have to tell it, "Try to make this shader fit within your ill-defined resource limits." It should do that as a matter of course. Failure to compile should be a last resort, not a standard fallback.
OK, let me reword it. Let's say today you have driver version 30.1 installed.
Your shader runs fine and you release your product.
Two months later, a new driver is released (version 30.2) and your shader hits a limit, whatever that may be.
You know that 30.1 was good enough.
Why not have 30.2 contain 30.1's GLSL compiler and allow us to tell it to use that version?

It doesn't need to be about hitting instruction limits or temp register limits.
It could be about performance. What if 30.1 was better for your shader? I have encountered this case, but not the former.

It's better to stay away from the hardware's limits. Developers should just educate themselves.
For the performance, we can't do squat.

Korval
02-07-2005, 09:35 PM
Who: ATI's staff. Their driver developers.
Where: these boards.
Good. They need to be working on their drivers anyway.


What if 30.1 was better for your shader? Yes, but how do we know that 30.1 was "best" for our shader? All we knew was that 30.1 worked.


It's better to stay away from the hardware's limits. Developers should just educate themselves. Educate themselves on what? Do we now have to write a glslang compiler to know how close to instruction limits we are? There is no way a priori to know how close a given shader is to the limits. We don't even know what the limits are (unless we have an assembly language spec that tells us what the limits are, that is). How can you code to limitations you don't know about, under an environment that tries its best to hide them?

This "guess and check" refrain from glslang proponents is getting tiresome. The idea that developing for OpenGL means that every shader should be compiled and tested on every card made in the last 2 years with every driver made in the last 2 years is just nonsense, especially when there is specific knowledge out there as to whether or not the shader should fit with particular hardware. It'd be one thing if it were a software issue that had a signficant reason to vary from driver release to driver release. But these are hardware limitations being dicated through a highly transparent software interface.

tfpsly
02-07-2005, 11:33 PM
But every card that supports ARB_fp supports glslang. So it isn't a question of lack of support so much as lack of good, trusted support. I should have said "using Cg" or "using the FX integer fragment path as much as possible"; sorry for the misunderstanding.

Aeluned
02-08-2005, 10:27 AM
There will always be people who've got plenty of time to bugger about with asm trying to outdo compilers. But there will always be a majority who have more imagination and just want a way of expressing it quickly, so they can move on to the next exciting thing. On the whole, I want performance to be the driver's responsibility whenever possible, so long as I follow basic guidelines - this is why I'm not pushing for an OpenGL extension to allow me to push hardware bytecodes directly to the card... that's not my job; that's why OpenGL exists, as a hardware abstraction.
I couldn't agree more with this.

Although I admit it's been painful at times to work with GLSL, it's important to remember that it is in its infancy. As the hardware becomes capable of more, a high-level language will be quite handy for creating shaders that today are only imaginable. Although we still have ASM languages for CPU targets, would you develop an application in them? This is an inevitable step in the right direction. I'm pretty sure ARB_vp/fp will stick around for quite some time (if they ever go away at all, which I'm not sure they should).


Who: ATI's staff. Their driver developers.
Where: these boards.
Good. They need to be working on their drivers anyway.
Hah, nice...
Korval, unless you're writing your own drivers, unfortunately you work with what they give you.
It's nice to know that when we voice our concerns, ideas, etc. in these forums, those in a position to take these things into consideration are listening.

V-man
02-09-2005, 05:39 AM
Yes, but how do we know that 30.1 was "best" for our shader? All we knew was that 30.1 worked. Think of a way.


There is no way a priori to know how close a given shader is to the limits. We don't even know what the limits are (unless we have an assembly language spec that tells us what the limits are, that is).For getting limits at run time, see suggestion #1 in my posts way way above.

I'm aware of what the X800 and the Gf6800 can do and I'm sure everyone here does as well. That's what I meant by developers educating themselves.

I think my 4 points (way way above) covered the essentials. And doing EXT_vp2/EXT_fp2 is not a bad idea either.

idr
02-09-2005, 08:58 AM
I'm aware of what the X800 and the Gf6800 can do and I'm sure everyone here does as well. That's what I meant by developers educating themselves. Great. That's two cards. What about i915, Wildcat VP, Volari and other future cards that may support GLSL? Having to know how each card works defeats the purpose of having a device-independent API. :(

cass
02-09-2005, 12:38 PM
Originally posted by idr:

I'm aware of what the X800 and the Gf6800 can do and I'm sure everyone here does as well. That's what I meant by developers educating themselves. Great. That's two cards. What about i915, Wildcat VP, Volari and other future cards that may support GLSL? Having to know how each card works defeats the purpose of having a device-independent API. :( That's a fair criticism, but it's not acceptable for most software vendors to fall off the hardware-accelerated path. Until falling off the fast path is a rare exception and not the norm, developers will need to be keenly aware of the limitations of particular classes of hardware.

Making that determination today is easier with the ASM APIs since they expose these limitations more explicitly.

The ASM APIs will probably continue to expose these features (with ever-larger minimum requirements) even when it is unlikely that developers will bump into them.

Korval
02-09-2005, 02:02 PM
Having to know how each card works defeats the purpose of having a device-independent API. And yet, we know plenty about each card. We can ask how many textures we can bind; we aren't forced to keep binding more textures until the card runs out. We know how big textures can be; we don't simply call glTexImage2D and check the error to see if it worked.

And yet, this is exactly what we're asked to do with shaders. The only difference is that it is not entirely simple to specify these limits in a meaningful way.
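For illustration, the kind of up-front queries being described here (standard GL calls, sketch only; no trial-and-error needed):

GLint maxUnits = 0, maxSize = 0, proxyWidth = 0;
glGetIntegerv(GL_MAX_TEXTURE_UNITS_ARB, &maxUnits);  /* how many textures we can bind */
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxSize);        /* upper bound on texture dimensions */

/* For a specific format and size, a proxy texture answers "would this fit?"
   without allocating anything: */
glTexImage2D(GL_PROXY_TEXTURE_2D, 0, GL_RGBA8, 2048, 2048, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glGetTexLevelParameteriv(GL_PROXY_TEXTURE_2D, 0, GL_TEXTURE_WIDTH, &proxyWidth);
/* proxyWidth == 0 means the implementation cannot accept that texture. */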

martinho_
02-10-2005, 03:31 PM
Having to know how each card works defeats the purpose of having a device-independent API. This problem will fade over time, and the situation will become like it is in the CPU world. Do you even care whether your CPU has the instructions needed to run your C program?

And to help me in this discussion, let me recall some old words from John Carmack regarding resource queries:


I do need to get up on a soapbox for a long discourse about why the upcoming high level languages MUST NOT have fixed, queried resource limits if they are going to reach their full potential. I will go into a lot of detail when I get a chance, but drivers must have the right and responsibility to multipass arbitrarily complex inputs to hardware with smaller limits. Get over it. And regarding low-level interfaces:


I have not done a detailed comparison with Cg. There are a half dozen C-like graphics languages floating around, and honestly, I don't think there is a hell of a lot of usability difference between them at the syntax level. They are all a whole lot better than the current interfaces we are using, so I hope syntax quibbles don't get too religious. It won't be too long before all real work is done in one of these, and developers that stick with the lower level interfaces will be regarded like people that write all-assembly PC applications today. (I get some amusement from the all-assembly crowd, and it can be impressive, but it is certainly not effective)

Korval
02-10-2005, 04:38 PM
This problem will fade over time, and the situation will become like it is in the CPU world. Do you even care whether your CPU has the instructions needed to run your C program? And precisely how long will that be? 3 years? 5? 10?

If we don't have basic usability now, it doesn't matter if we will be in the right place 10 years from now. Because everyone writing performance apps will be using D3D, and OpenGL will be that thing that CAD programs/non-Windows programs use.


And to help me in this discussion, let me recall some old words from John Carmack regarding resource queries: First, I do not subscribe to the notion that what Carmack says is the divine word of God, or even more significant than what any other graphics programming professional would say. Plus, logic is with me; the source of an argument or position is irrelevant to the veracity of that argument or position.

In that vein, do note that the ARB promptly shot his nonsense about non-queryable limits down flat. The mere thought of a low-level graphics API being required to do multipass or whatever it takes to make shader X work is sheer lunacy. It'd guarantee that no glslang implementations would even exist (let alone be at all trustworthy) until hardware was actually capable of virtualizing its limitations. Certainly, you couldn't write performance code based on it; you have no guarantees, or even educated guesses, about what the compiler, multipasser, etc. are going to do with it.

While the ARB agrees with him about not trying to support any further form of assembly (to GL's detriment), they do not accept that hardware limits need to be virtualized. As they pointed out, if an implementation wishes to do so, they can expose an extension to virtualize these and require the glslang implementation to run any shader it is given.

Equally importantly, Carmack just wants someone else to write the dull, boring, boilerplate code for him. He wants driver developers to write code that takes a shader and breaks it down into multiple passes for inferior hardware so that he doesn't have to do it. He's looking ahead to his next engine knowing full well that he's going to have to write some boring code into that engine to be able to run on R300 hardware just as much as R600 with the same shaders.


And regarding low-level interfaces: Considering that Carmack made a game that can't be fully run on current hardware, I don't think he's the kind of authority you want on performance apps (note: I realize this is irrational, but Carmack bashing is a hobby of mine, so I felt the need to indulge ;) ).

More on-point, nobody's arguing that higher level interfaces aren't the future. The question is much more a matter of when that future gets here.

To my mind, high-level languages are a luxury until the day that hardware limits are actually virtualized (via reasonable means, i.e. not because the driver decides to multipass). Once that happens, any usefulness of low-level languages is gone, and we therefore ought to use something else. In that vein, I agree with Carmack that glslang shouldn't allow for queryable limits, but I add to that that it shouldn't exist yet.

I recall this quote from Sid Meier's Alpha Centauri:

"Technological advance is an inherently iterative process. One does not simply take sand from the beach and produce a Dataprobe. We use crude tools to fashion better tools, and then our better
tools to fashion more precise tools, and so on. Each minor refinement is a step in the process, and all of the steps must be taken."

Effectively, this means that attempting to leap over a step, simply because we know where the progression will eventually lead, is not a good idea. This isn't an API refinement like VBO, which theoretically we could have had at GL 1.0. This is a significant feature; a technological advance.

You can skip a rung when you're climbing a ladder, but if the rung you're reaching for is too high, you'll pull a muscle or something on the way. CPU development didn't skip the "code in assembly" phase; imagine what would have happened if it had. Imagine shipping your source code to consumers to be compiled for their particular CPU and hardware setup. Maybe it'll work, but maybe it won't. There's no way for you to tell. And your consumers don't want to hear "Oh, set this build parameter" or whatever other nonsense; if it doesn't work, it isn't worth their time.

It isn't that glslang isn't a good idea to have even today. The point is that, for quite some time, CPU code was written in both C and assembly. There were good reasons for this; it was a transition period and to not have one of them would have made the transition more painful. It doesn't make sense to try to just skip the assembly phase just because it's going to end eventually.

I think the ARB is living in an ivory tower where they can just go straight for the "eventually right" answer, without realizing that the "right now" answer is not only not a bad idea, but quite useful and crucial for many applications.

knackered
02-10-2005, 11:31 PM
That's the first time I've ever heard anyone quote from a game to support an argument. Oh brave new world.

zed
02-11-2005, 12:29 AM
ah my two favs knackered and korval (or are they one and the same, sybil-like?)

It doesn't make sense to try to just skip the assembly phase just because it's going to end eventually. ild argue, yes it does make sense, stick that broken record back on,
fact - we know that asm shaders will become irrelevant in the future.
the question is when, well i believe since 6 months.

personally ive never even written an arb_vp or arb_fp, why not, well after spending time learning gl1.0 then/multitexture/combine/register_combiners etc. and realising they became outdated quickly, i didnt want to make the same mistake again, spending all that effort to learn something that is practically useless today.
i want those 3 months back that i spent using register combiners! its worth nowt today, i would have been better off doing something else eg collision detection, at least 90% of what i learnt would still be relevant.

ive long since given up caring about an extra 10% performance, im focusing on the extra 300%

Korval
02-11-2005, 01:25 AM
the question is when, well i believe since 6 months. What happens in 6 months?

Neither ATi nor nVidia has any significant card releases lined up for then. There may be a few performance-tweaked refreshes, but that's it. Certainly, these cards won't be able to virtualize resources internally, so they will still have resource limits.

The glslang compilers may have improved in that time (though I'm skeptical about ATi's ability to write an optimizing compiler at all), but will they be trustworthy? Can I write a shader that happens to be near the limits (limits I am not aware of, because nobody tells me I'm close to them) without a later driver revision breaking it? Can I trust the compiler not to break on a shader that worked on that hardware 3 driver revisions ago?

These are some of the principal reasons for using simpler interfaces (not trusting the optimizer would be another one). Until these problems are solved, glslang has little real use in the performance graphics arena. You just can't trust it, and if you can't rely on something, you can't use it.

Personally, I'd even like to see a return to more hardware-specific things like ATI_fragment_shader. In this language, you have to do a lot of the "compiler's" work yourself, splitting your shader into distinct phases and doing all the 3-vector/scalar opcode stuff yourself. What it lets you do is work around compiler bugs that the driver developer hasn't fixed, as well as write a shader that is guaranteed to work on the hardware.
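For illustration only, roughly what that looks like through ATI_fragment_shader; this is a from-memory sketch, so treat the parameter choices as illustrative rather than exact. It computes a DOT3 between a sampled normal map and a second texture holding the light vector:

GLuint fs = glGenFragmentShadersATI(1);
glBindFragmentShaderATI(fs);
glBeginFragmentShaderATI();
glSampleMapATI(GL_REG_0_ATI, GL_TEXTURE0_ARB, GL_SWIZZLE_STR_ATI);  /* normal map */
glSampleMapATI(GL_REG_1_ATI, GL_TEXTURE1_ARB, GL_SWIZZLE_STR_ATI);  /* light vectors */
glColorFragmentOp2ATI(GL_DOT3_ATI, GL_REG_0_ATI, GL_NONE, GL_NONE,
                      GL_REG_0_ATI, GL_NONE, GL_2X_BIT_ATI | GL_BIAS_BIT_ATI,
                      GL_REG_1_ATI, GL_NONE, GL_2X_BIT_ATI | GL_BIAS_BIT_ATI);
glEndFragmentShaderATI();
/* Every register choice and expand/bias modifier is spelled out by hand;
   the driver's "compiler" has almost nothing left to decide. */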

Note that none of these need to be ARB extensions, nor do they ever need to be considered to go into the core. I don't need to see ARB_vp/fp go into the core. As long as they're there, with up-to-date features, everything is fine.


personally ive never even written an arb_vp or arb_fp, why not, well after spending time learning gl1.0 then/multitexture/combine/register_combiners etc. and realising they became outdated quickly, i didnt want to make the same mistake again, spending all that effort to learn something that is practically useless today. There's no way you could mistake NV_register_combiners for anything other than direct access to specific nVidia hardware of the time. If you were under the impression that this extension was anything more than that, then that was a poor assumption on your part. There's a reason it was "NV", not "ARB" or core.

Plus, NV_RC was not "outdated quickly". It was quite viable for 2-3 years. That's the normal life-cycle of game development. Plenty of time to put it to use.

For my part, the time I spent learning NV_register_combiners was invaluable. Not only did I get a pretty good glimpse of what the hardware was actually doing, it was the first extension that created the concept of a "fragment program". Up until then, the "fragment process" was bound to texture environments and the multitexture sequence. The novelty of NV_register_combiners was that it decoupled texture accessing from fragment usage. In doing so, it helped define fragment processing in much less strict terms.

The mere idea that a texture could be a normal rather than a color image was very profound for me at the time. It seems obvious and silly nowadays, but this was pretty heady stuff back then. If the ARB had just popped out glslang for everyone to use, it'd have taken far longer to simply wrap your head around what it was supposed to do, let alone start thinking of stuff to do with it. This lends some weight to the theory that all the steps in a progression need to be taken. The idea is that, by slowly building up to the concept, it gives people time to analyse it, figure out what works (NV_RC not being object-based didn't work), figure out what they want to do with it, and figure out which limitations on the overall concept need to be expanded.
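For illustration, the same "texture as a normal, not a colour" idea written in later ARB_fp terms (a sketch, not from the thread; the texture and texcoord bindings are made up, and the usual entry-point loading is assumed):

static const char *normalmap_fp =
    "!!ARBfp1.0\n"
    "TEMP n;\n"
    "TEX n, fragment.texcoord[0], texture[0], 2D;\n"  /* fetch a normal-map texel    */
    "MAD n, n, 2.0, -1.0;\n"                          /* expand [0,1] to [-1,1]      */
    "DP3 result.color, n, fragment.texcoord[1];\n"    /* N.L, light dir in texcoord1 */
    "END\n";

static void load_normalmap_program(void)
{
    GLuint prog;
    glGenProgramsARB(1, &prog);
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
    glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(normalmap_fp), normalmap_fp);
}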

The time spent learning ARB_vp/fp has prepared me for the complexities of writing larger shaders that consume more resources. When/if the time comes for glslang to be useful, this experience will serve me well.

Programmers pick up all kinds of languages. The language is almost irrelevant (to the degree that it is easy to use, of course); what matters is the experience you get from using it.

Aeluned
02-11-2005, 06:21 AM
Personally, I'd even like to see a return to more hardware-specific things like ATI_fragment_shader. In this language, you have to do a lot of the "compiler's" work yourself, splitting your shader into distinct phases and doing all the 3-vector/scalar opcode stuff yourself. What it lets you do is work around compiler bugs that the driver developer hasn't fixed, as well as write a shader that is guaranteed to work on the hardware.
NOOOOO! make it stop!
You do realize there are many people who don't feel this way. For those people with release dates and deadlines, this would make life hell. As I said before, I agree that the rug shouldn't be pulled out from under the ASM shader languages; right now we could definitely use both (I would even argue that we could use both indefinitely).

But Korval, you can't just say that glslang is useless; personally, I hate writing tons of ADD, SLT, whatever instruction code.

Plus, I think it's just plain wrong to factor things like "well, maybe I can't trust the driver" into the development of a language. If the language were responsible for all that crap, think of what a nightmare it would be to actually use it.

Have you actually had a new driver release break one of your shaders? I'm sure it could happen, but hell, I've had a new driver reboot my machine whenever I was rendering in selection mode, using VBOs in a pbuffer context.

These things happen; you point the finger at the driver developer and say, "Hey, you really screwed up here, can you fix this?" They're writing drivers because that's what their job is; it's not my job to hand everything to the board sugar-coated.

Just as the CPU knows what to do with the instructions I've sent it, we should learn to expect the same from the GPU.

idr
02-11-2005, 06:33 PM
Personally, I'd even like to see a return to more hardware-specific things like ATI_fragment_shader. In this language, you have to do a lot of the "compiler's" work yourself, splitting your shader into distinct phases and doing all the 3-vector/scalar opcode stuff yourself. What it lets you do is work around compiler bugs that the driver developer hasn't fixed, as well as write a shader that is guaranteed to work on the hardware. NOOOOO! make it stop!
You do realize there are many people who don't feel this way. For those people with release dates and deadlines, this would make life hell. As I said before, I agree that the rug shouldn't be pulled out from under the ASM shader languages; right now we could definitely use both (I would even argue that we could use both indefinitely).
I'm going to go way out on a limb and, at least partially, agree with Korval. I don't think the future of the assembly-level APIs will be in ARB extensions. I think, assuming there is a future, it will be in vendor-specific extensions. This is for the simple reason that multivendor extensions aren't going to match any vendor's hardware closely enough to actually be useful.

In fact, we're already seeing this in NV extensions. It would be nice if ATI and Intel would expose extensions for features of their hardware, but they don't seem interested. What can you do?

tfpsly
02-12-2005, 12:55 AM
Originally posted by idr:
I don't think the future of the assembly-level APIs will be in ARB extensions. I think, assuming there is a future, it will be in vendor-specific extensions. This is for the simple reason that multivendor extensions aren't going to match any vendor's hardware closely enough to actually be useful. That's exactly what I don't want to see happening as a 3D coder and as a game developer. I don't want to write the same rendering logic 10 times for X brands and Y generations of hardware.

I want the driver compiler to either create the best vp/fp the card can run, or give me back a "failed" message. And then if the program runs too slowly, the user will just choose a lower quality level.

And for the CPU comparison: do you write amd_x86 code and intel_x86 code, or do you write x86 code (not speaking about 3dnow/sse*/mmx here)?

Dez
02-12-2005, 03:10 AM
That's exactly what I don't want to see happening as a 3D coder and as a game developer. I don't want to write the same rendering logic 10 times for X brands and Y generations of hardware. I couldn't agree more, although I use ARB_vp/ARB_fp (+ NV extensions to ARB_fp) more often these days. The advantages of run-time compilation in terms of portability cannot be denied. And we shouldn't make any assumptions based on the somewhat broken implementations available today.
Of course it would be nice to have vendor-specific extensions for ARB_fp, but put yourself in the driver writer's position. I think the most important thing for them is to improve the core features of the GL (which includes GLSL). Moreover, writing specs for new extensions to ARB_fp/vp is far more time-consuming than applying some fixes to an existing GLSL implementation, allowing it to take advantage of new hardware features.
Personally I don't want to see the ARB programs dead, but it is inevitable, I'm afraid. Most of the current GPU vendors can't be compared, in terms of manpower and development potential, to the biggest CPU vendors, which are able to maintain ASM and high-level interfaces at the same time. And the history of languages used for GPU development seems somewhat "flipped" compared to the CPU world: there was a relatively short period of ARB_fp/vp usage, and now developers are rapidly switching to high-level languages. Maybe it has something to do with the fact that we don't have any reference architecture on the GPU side, something like x86 for CPUs. Maybe some kind of intermediate target that Cass spoke of could help in this matter, but then again I don't believe that anyone except NVIDIA has the manpower to develop for two APIs (GL and DX) and multiple languages for at least one of those APIs.

V-man
02-12-2005, 08:29 AM
I emailed the webmaster about making a poll.
The question is, what should the poll question be and what should the options be?

Perhaps...

Do you think ARB_vp/fp should be extended to support the new generation of GPUs?

- Yes, but I prefer GLSL.
- Yes, I prefer low level shaders.
- No
- Huh?

Jan
02-12-2005, 09:03 AM
Well, in general I like GLSL better than ASM programming. On the other hand, I can understand that it might be easier to translate ASM into GPU code in certain cases.

And I am willing to use whatever runs better. As a programmer, of course I like having better tools, but it is my JOB to do the dirty work if it runs faster that way.

BUT, I think ARB_fp is a really bad language. It is ASM, so it may be closer to the actual hardware capabilities, but it is still quite abstract. For example, I am not able to declare a float, a vec2, vec3 or vec4. No, I have to use a TEMP. How is that supposed to help the driver? We still need a good optimizer to find out what actually gets used.
Also, there is no normalize function and none of the other very general stuff, even though it was obvious that this would be implemented in hardware pretty soon!
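For comparison, a sketch of that point: what GLSL spells "vec3 n = normalize(v);" takes three instructions on a scratch 4-component TEMP in ARB_fp (register names made up):

static const char *arbfp_normalize_idiom =
    "TEMP v, tmp;\n"
    "DP3 tmp.w, v, v;\n"    /* v . v            */
    "RSQ tmp.w, tmp.w;\n"   /* 1 / sqrt(v . v)  */
    "MUL v, v, tmp.w;\n";   /* v * (1 / |v|)    */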

So, in my opinion, ARB_fp should not be EXTENDED, but rather REPLACED by a better-thought-through ASM language. I really wonder how one could design such a castrated "low-level" language; this is definitely not efficient.

Also, I think NVIDIA's approach of using halfs is not a bad idea. Certainly, some hardware doesn't support this, but where is the problem? Hardware that does support it can use the extra information for a speedup, and other hardware simply uses full precision.

A really good GLSL implementation would be most desirable, but I doubt that ATI and NVIDIA will get it done fast enough (~1 year).

Jan.

zed
02-12-2005, 11:17 AM
I think, assuming there is a future, it will be in vendor-specific extensions. This is for the simple reason that multivendor extensions aren't going to match any vendor's hardware closely enough to actually be useful. but is there gonna be anyone whos gonna use them? look at ALL the games (not little tech demos) released to date, not a single one AFAIKS stretches the hardware and couldnt run with glsl, based on this evidence its safe to say the future is gonna be similar. ppl might say they want this and that but if they dont use it then do they really need it, the effort would be better placed elsewhere.

performance isnt everything, personally i believe ease of use is more important, eg i was writing a small demo last night to test something, now i coded it up with immediate mode, quick easy to make changes (which u often do when youre experimenting) sure i could have used VA/VBOs whatever but then it would have taken more time to make/change the test program. im a great believer in results (does anything else matter)

Korval
02-12-2005, 12:14 PM
And for the CPU comparison: do you write amd_x86 code and intel_x86 code, or do you write x86 code (not speaking about 3dnow/sse*/mmx here)? GPUs aren't CPUs, despite what people would like you to think. You're trying to compare modern CPUs with GPUs that are nowhere near as mature.

My point is that, if ATI_fragment_shader were extended to cover the 9500+ hardware, we would be able to give hints/direction to the compiler that would let us work around compiler bugs in their ARB_fp compiler. Would we prefer that these bugs not exist? Sure. Has ATi fixed them over 2 years after getting ARB_fp implemented? No.

We need a way to bypass the harder parts of a compiler so that, if they are buggy, then we can work around those bugs.


look at ALL the games (not little tech demos) released to date, not a single one AFAIKS stretches the hardware and couldnt run with glsl, While you could arguably call Doom 3 a "tech demo", there's no question that it does push the hardware. HL2 does so as well, though it is a D3D game.

The principal reason games don't push hardware is that they can't afford to. They need to reach the largest number of people possible, so they develop for lower-end machines.

And I'm not sure what you mean by, "couldnt run with glsl." Following English grammar rules would help.


performance isnt everything Um, no, actually performance is everything. That people can whip up a little demo in 5 minutes is irrelevant to actual software development. It's this kind of thinking that makes people consider OpenGL a hobbyist API.

Performance is the #1 limitation for application developers. If you don't have the performance to do bump mapping, you don't do it. If you don't have the performance to do that Fresnel specular computation, you don't do it. It's that simple.

zed
02-12-2005, 03:51 PM
We need a way to bypass the harder parts of a compiler so that, if they are buggy, then we can work around those bugs. gee ild love to work with your code after youve finished with it :)
if ( card == ati && driver_version_is_between( 32.3, 33.5 ) )
do this
else if ( card == ati && driver_version_is_between( 35.3, 36.5 ) )
do this
else if ( card == nvidia && driver_version_is_between( 32.3, 33.5 ) )
do this

While you could arguably call Doom 3 a "tech demo", there's no question that it does push the hardware. HL2 does so as well, though it is a D3D game. both doom3/hl2 could run with glsl on my gffx

Following English grammar rules would help. a language that doesnt use phonetic(should be f) spelling + has illogical rules
actually now i think about it i can see your attraction
if ( i before e && after c && !not sounds like y)
{
write ie
}
if ( i before e && after c && not sounds like y)
{
write e
}
etc :)


Um, no, actually performance is everything. personally this is my order of preference
1/ get the thing up and running
2/ get it running with correct results
3/ get it bug free/stable
4/ finally get it fast

if i stuck 4 first on the list i wouldnt have released the [big font] worlds first unified lighting game [/big font]
a little while ago, he saiz with no ego ;) , ild still be farting around squeezing the extra 2.34% out of my collision detection stuff.
btw its actually quite fun to play now.

Korval
02-12-2005, 05:06 PM
if ( card == ati && driver_version_is_between( 32.3, 33.5 ) )
do this
else if ( card == ati && driver_version_is_between( 35.3, 36.5 ) )
do this
else if ( card == nvidia && driver_version_is_between( 32.3, 33.5 ) )
do this Nothing so weird. Simply attempt to compile the shader one way, and if it fails, try another. Standard fallback techniques. It's something real developers have to live with. At least, if they want to ship a functioning product, rather than one where the incompetence of drivers can break it.
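For illustration, a sketch of that fallback idea using ARB_shader_objects; try_glsl_fragment, glsl_src, arbfp_src and load_arbfp_version are hypothetical names, and the ARB entry points are assumed to be loaded already:

static GLboolean try_glsl_fragment(const char *src)
{
    GLint ok = GL_FALSE;
    const GLcharARB *source = (const GLcharARB *)src;
    GLhandleARB sh = glCreateShaderObjectARB(GL_FRAGMENT_SHADER_ARB);

    glShaderSourceARB(sh, 1, &source, NULL);
    glCompileShaderARB(sh);
    glGetObjectParameterivARB(sh, GL_OBJECT_COMPILE_STATUS_ARB, &ok);
    if (!ok) {
        glDeleteObjectARB(sh);   /* this path failed; let the caller fall back */
        return GL_FALSE;
    }
    /* attach to a program object and link here on success */
    return GL_TRUE;
}

/* caller: prefer glslang, drop to the ARB_fp version of the same effect */
if (!try_glsl_fragment(glsl_src))
    load_arbfp_version(arbfp_src);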


both doom3/hl2 could run with glsl on my gffx No, it would not. There's no question about it; the only reason it runs well at all on an FX card is that id Software used NVIDIA-specific APIs for programming it. Through glslang, the same shaders would have murdered your card. This is a proven fact. Carmack himself said that the ARB_fp version of his shaders ran so slowly on nVidia hardware that he had to use NV_fp.

Getting a bit off-topic, but...


a language that doesnt use phonetic(should be f) spelling + has illogical rules I asked for grammar, not spelling. The rules for periods, punctuation, capitalization, and possession are not only quite logical but very well specified. Asking that you use the 'shift' key in appropriate places and add a few punctuation marks here and there so that people can more easily read your text is hardly unreasonable. After all, if your grammar is getting in the way of getting a point across, or annoys those who read it, then you're not getting your point across and thus you wasted your time making the post to begin with.

zed
02-13-2005, 12:13 AM
Carmack himself said that ... didnt u just dis some other fella for quoting carmack a few messages ago


No, it would not. There's no question about it;how can u be so sure, granted youre prolly more knowledgable with the uptodate features of hardware/hardware direction etc than me, but ive got a sneaking suspicion you havent really developed any major apps yourslef and yet youre telling me (supply evidence otherwise)


Performance is the #1 limitation for application developersdoes the same rules apply to ati driver writers? if so u should be praising them instead of knocking them, cause its likely theyve put performance ahead of stabilty. your reply 'they should give both performance + stabilty features etc'.

you twist and turn like a twisty turny thing

idr
02-13-2005, 12:34 AM
And for the CPU comparison: do you write amd_x86 code and intel_x86 code, or do you write x86 code (not speaking about 3dnow/sse*/mmx here)? But from an instruction-set point of view, those are essentially the same. The differences in hardware architecture and instruction set between, say, a Radeon X800 and a Wildcat Realizm are larger than the difference between x86 and PowerPC. Would anyone expect for a second to use the same assembly code on both of those?

That's why high-level shading languages are the future. If there is going to be a continuation of the assembly-level shading languages, it will be, by necessity, hardware specific.

V-man
02-13-2005, 05:26 PM
say, a Radeon X800 and a Wildcat Realizm are larger than the difference between x86 and PowerPC. Would anyone expect for a second to use the same assembly code on both of those? I don't see why the assemblies can't be extended AND implemented on the particular GLSL-capable hardware.

Is it easier to compile assembly shaders as opposed to GLSL or not?

If there was a single GLSL compiler that could compile to a simpler form (ARB_vp/fp), it would make things more consistent.
I know this has been said before, but it's simply true and it should be considered.

Korval
02-13-2005, 07:49 PM
didnt u just dis some other fella for quoting carmack a few messages ago I figured someone would point that out. But, then again, I also figured that the person might realize that, in this case, the quote states facts derived from testing rather than his own personal opinions about the future.


how can u be so sure, granted youre prolly more knowledgeable about the up-to-date features of hardware/hardware direction etc than me, but ive got a sneaking suspicion you havent really developed any major apps yourself and yet youre telling me (supply evidence otherwise)
No experience is even needed to know this. This is a fact based on knowledge of FX level hardware and how the driver is forced to compile glslang.

Glslang's spec requires 24-bit floats or better for all floating-point types. This forces nVidia to use 32-bit floats, the slowest possible type. Not only that: because 32-bit floats take twice the room of 16-bit floats, they take up more room in the "temporary cache" that they have in their fragment hardware. Taking up more room means that you can have fewer quads in the fragment pipe. Fewer quads in the pipe means that more cycles are wasted. More cycles wasted means slower hardware.

tfpsly
02-13-2005, 11:53 PM
Originally posted by Korval:

how can u be so sure, granted youre prolly more knowledgeable about the up-to-date features of hardware/hardware direction etc than me, but ive got a sneaking suspicion you havent really developed any major apps yourself and yet youre telling me (supply evidence otherwise)
No experience is even needed to know this. This is a fact based on knowledge of FX level hardware and how the driver is forced to compile glslang.

Glslang's spec requires 24-bit floats or better for all floating-point types. This forces nVidia to use 32-bit floats, the slowest possible type. Not only that: because 32-bit floats take twice the room of 16-bit floats, they take up more room in the "temporary cache" that they have in their fragment hardware. Taking up more room means that you can have fewer quads in the fragment pipe. Fewer quads in the pipe means that more cycles are wasted. More cycles wasted means slower hardware. In fact it is even worse: the GeForce FX was designed long before it shipped, but it was delayed because of the 3dfx integration into NVIDIA. At that time, NVIDIA thought the ARB would standardize both integer and float based fragment programs. It seems like float fragment processing was thought not to be too important at the time. They were wrong, and they did not have time to remove their integer path and optimize their float path.

martinho_
02-14-2005, 08:46 AM
First, I do not subscribe to the notion that what Carmack says is the divine word of God, or even more significant than what any other graphics programming professional would say. Plus, logic is with me; the source of an argument or position is irrelevant to the veracity of that argument or position. I agree. Don't flame me; I didn't say that. When I quoted Carmack, it was just to add one more opinion that I thought was relevant to this thread.


Equally importantly, Carmack just wants someone else to write the dull, boring, boilerplate code for him. He wants driver developers to write code that takes a shader and breaks it down into multiple passes for inferior hardware so that he doesn't have to do it. He's looking ahead to his next engine knowing full well that he's going to have to write some boring code into that engine to be able to run on R300 hardware just as much as R600 with the same shaders. Wasn't that the purpose of drivers? To make boring hardware abstraction layers once, instead of forcing application programmers to do it in every app they code? Will this situation, where you have to code a shader for each generation of GPUs, last forever?

Do CPU programmers detect if their CPU is K6/K7/K8/P3/P4 and code a different program? NO, the compiler takes care of this; the only difference the user sees is that it runs at different speeds.


personally ive never even written an arb_vp or arb_fp, why not, well after spending time learning gl1.0 then/multitexture/combine/register_combiners etc. and realising they became outdated quickly, i didnt want to make the same mistake again, spending all that effort to learn something that is practically useless today. I totally agree. It's just a matter of "when" assembly shading languages will become useless, and when that happens all the effort spent learning them will have been a waste of time.
If you start coding your engine today, I bet that by the time you finish it, GLSL will have decent support.

Aeluned
02-14-2005, 09:52 AM
Um, no, actually performance is everything.
Personally, I disagree. Stability comes first in my opinion. If it doesn't run, it really doesn't matter how fast it could have run.

I just wanted to add that.

Concerning ARB_fp/vp being extended:
Though I never use it, I could see why some people obsess over that extra 2.643% performance gain - that's why it's nice to have that tool available (Same with ASM for CPUs).
To me, such a gain is nonsense and is easily overshadowed by the cut in development time from a high level language.

Naturally there are driver bugs, and I don't disagree that GPUs aren't CPUs.
But the topic was the future of shaders and shader hardware - GPUs should be like CPUs. I'm talking about how things should be here.



Do CPU programmers detect if their CPU is K6/K7/K8/P3/P4 and code a different program? NO, the compiler takes care of this; the only difference the user sees is that it runs at different speeds.
exactly my point: this sort of thing would suck.

All I'm arguing is that the development phase shouldn't be all mucked up with hacks to work around bugs.

Korval
02-14-2005, 11:16 AM
To make boring hardware abstraction layers once, instead of forcing application programmers to do it in every app they code? Yes, but multipassing in drivers is far from a "hardware abstraction layer". It looks much more like a "scene graph" or "engine", and thus has no place in OpenGL.

V-man
02-14-2005, 12:36 PM
Originally posted by Aeluned:
Personally, I disagree. Stability comes first in my opinion. If it doesn't run, it really doesn't matter how fast it could have run.

I just wanted to add that.

Concerning ARB_fp/vp being extended:
Though I never use it, I could see why some people obsess over that extra 2.643% performance gain - that's why it's nice to have that tool available (Same with ASM for CPUs).
To me, such a gain is nonsense and is easily overshadowed by the cut in development time from a high level language.
There are two ways to interpret the conversation.
I think it wasn't "Which is of prime importance: performance or stability".

GLSL shaders are buggier and have worse performance "on certain cards".
What's the point of using GLSL if it can cut your performance by more than 35%?

Both performance and stability are needed; otherwise I might as well write my 100-line shader in assembly or use a third-party high-level language that targets ARB_vp/fp.


Originally posted by Aeluned:
Naturally there are driver bugs, and I don't disagree that GPUs aren't CPUs.
But the topic was the future of shaders and shader hardware - GPUs should be like CPUs. I'm talking about how things should be here. I really don't know what this is supposed to mean.

Shaders CAN be a bottleneck.

What attribute of CPUs do you want ported to GPUs? What would make GPUs run shaders faster/better?

knackered
02-15-2005, 01:33 AM
Originally posted by Korval:

To make boring hardware abstraction layers once, instead of forcing application programmers to do it in every app they code? Yes, but multipassing in drivers is far from a "hardware abstraction layer". It looks much more like a "scene graph" or "engine", and thus has no place in OpenGL. Eh? What has multipassing a triangle got in common with scene graphs or 'engines'?
The driver is in the best position to decide how to split a job across its resources... we don't have to tell the driver how to best use its 16 vertex pipes or whatever, so why, if we have a suitable high-level shader language to describe a job, should we have to tell it how to use its pixel pipes in order to accomplish the shader?
It's the next logical step. Of course this asm/C hybrid step is needed too, but I wouldn't dismiss multipassing in the driver as never being appropriate just because it's a 'scene management' issue!

tfpsly
02-15-2005, 03:09 AM
Originally posted by knackered:

Originally posted by Korval:

To make boring hardware abstraction layers once, instead of forcing application programmers to do it in every app they code? Yes, but multipassing in drivers is far from a "hardware abstraction layer". It looks much more like a "scene graph" or "engine", and thus has no place in OpenGL. Eh? What has multipassing a triangle got in common with scene graphs or 'engines'?
The driver is in the best position to decide how to split a job across its resources... we don't have to tell the driver how to best use its 16 vertex pipes or whatever, so why, if we have a suitable high-level shader language to describe a job, should we have to tell it how to use its pixel pipes in order to accomplish the shader?
It's the next logical step. Of course this asm/C hybrid step is needed too, but I wouldn't dismiss multipassing in the driver as never being appropriate just because it's a 'scene management' issue! There might be several issues, like for example:
* the driver must not compromise the GL state in case the program also uses the fixed path.
* the program might just run far too slowly if we keep doing multipass/multitexturing
* some shader effects will not be possible using the fixed path unless we do some demoscene-like tricks that might compromise the program's behavior. Maybe the program doesn't want us to mess with the alpha buffer, or whatever.