Vertex and Shader Programs

Just because glslang is here, it doesn’t mean that VP/FP should be neglected.
I suggest that there would be 2 routes … a high level langauge implementation, i.e. glslang … and an assembly langauge route, thus extending ARBvp1.0 and upgrading it to ARBvp2.0 and then to ARBvp3.0, etc …
This would give developers who want to fine-tune their code a straightforward way to do that. And let’s face it, assembly programs for the GPU are easier to manage than for the CPU, since a GPU program is orders of magnitude shorter: these are small procedures that perform specific tasks … that is, they are not full-blown applications.
Also, it would be better to allow the output from glslang to be either in assembly format or native GPU code, depending on a compiler switch … this would also give developers the ability to fine-tune the code generated by the compiler.
It is dangerous to let glslang compilers do all the work, since fine-tuning a rendering operation is a work of art … and frankly, I don’t see any compiler being able to produce tight code … look at x86 assembly and c/c++ … a good assembly coder can easily produce code that is 2 times faster than that produced by the best c/c++ compiler out there.
And rendering is all about speed and getting the most out of the GPU … that is, these are not run-of-the-mill apps where assembly-quality code can be sacrificed for the convenience of easier-to-write high-level code. Anyway, I think DX offers assembly code for VP2.x and VP3.x alongside its HLSL … I have no idea why OpenGL doesn’t support this.

a good assembly coder can easily produce code that is 2 times faster than that produced by the best c/c++ compiler out there.

Think so? Try this.

I’ll make up an assembly language. However, I’m not going to tell you anything about scheduling or pipelining of the CPU that uses that assembly language. Optimize code written in it.

You can’t, because you lack crucial information for doing that optimization. So, tell me, how do scheduling and pipelining work on ARB_fp or ARB_vp? You can’t say, because it is implementation-dependent. And the implementers (ATi, nVidia) aren’t going to tell you enough about the hardware to really optimize your shaders around it.
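To make the point about missing scheduling information concrete, here is a toy sketch in Python. It models a single-issue, in-order pipeline and two made-up chips, CHIP_X and CHIP_Y. The latency tables, the instruction list and the function name are all assumptions for illustration; they do not describe any real ATi or nVidia part. The same four instructions are ordered two ways, and which order is faster depends entirely on latencies the programmer is never told:

```python
# Toy model: single-issue, in-order pipeline. Each instruction is
# (opcode, destination, [source temporaries]). All names and latencies
# here are hypothetical and chosen only for illustration.

def cycles(order, latency):
    """Return the cycle at which the last result becomes available."""
    t = 0          # earliest cycle at which the next instruction may issue
    ready = {}     # temporary -> cycle at which its value is available
    finish = 0
    for op, dst, srcs in order:
        # An instruction issues once the issue slot is free and all of its
        # source temporaries are ready; until then the pipeline stalls.
        start = max([t] + [ready[s] for s in srcs])
        ready[dst] = start + latency[op]
        finish = max(finish, ready[dst])
        t = start + 1
    return finish

# Two independent chains: (TEX a -> MUL b) and (MUL c -> ADD d).
TEX_FIRST = [("TEX", "a", []), ("MUL", "c", []),
             ("ADD", "d", ["c"]), ("MUL", "b", ["a"])]
MUL_FIRST = [("MUL", "c", []), ("TEX", "a", []),
             ("MUL", "b", ["a"]), ("ADD", "d", ["c"])]

CHIP_X = {"TEX": 4, "MUL": 1, "ADD": 1}  # slow texture fetch (assumed)
CHIP_Y = {"TEX": 1, "MUL": 4, "ADD": 1}  # slow multiplier (assumed)

for name, chip in (("CHIP_X", CHIP_X), ("CHIP_Y", CHIP_Y)):
    print(name, "TEX first:", cycles(TEX_FIRST, chip),
          "MUL first:", cycles(MUL_FIRST, chip))
# CHIP_X TEX first: 5 MUL first: 7
# CHIP_Y TEX first: 10 MUL first: 6
```

Under this model the ordering that wins on one chip loses on the other, so hand-scheduling without the vendor’s numbers is guesswork.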

Also, as to your assertion that a good assembly programmer can make better code than a good optimizing C/C++ compiler, bull. This is only the case when the compiler isn’t using features of the chip (like MMX, SSE, or 3DNow). The driver’s compiler will certainly be using said features of the hardware.

Intel’s compiler is a very good compiler; generally considered the best Pentium compiler. Why? Because Intel wrote it; they have intimate knowledge of their chips. ATi and nVidia have similarly intimate knowledge of their hardware. We don’t. And we aren’t going to in the future.

You’re asking for an ongoing maintenance nightmare and if you believe you need it to “optimize”, you need reality banged into your head, hard and repeatedly, until it sticks.

Firstly, I don’t want to start a flame war. I support Korval’s (other) points.

Also, as to your assertion that a good assembly programmer can make better code than a good optimizing C/C++ compiler, bull.

Unfortunately, that’s still not true even today. It’s become harder, but not impossible, to increase the speed of some routine by a significant amount (on a particular chip) by writing the code directly in assembly. The main reason is that the assembly programmer can make assumptions that are much looser, and better tuned to the problem at hand, than the conservative assumptions the compiler must make.

That said, a good C/C++ compiler will optimize all the time, and will do it orders of magnitude quicker than a good assembly programmer, at a small fraction of the cost.

[This message has been edited by al_bob (edited 02-18-2004).]

hehe… well, at least this:
!!ARBfp1.0
TEMP det, blend, base;
TEX base, fragment.texcoord[0], texture[0], 2D;
TEX blend, fragment.texcoord[0], texture[1], 2D;
TEX det, fragment.texcoord[1], texture[2], 2D;
DP3 det, det, blend;            # dot(det.rgb, blend.rgb), replicated to all channels
MUL result.color, base, det;
END
is about twice as fast as

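uniform sampler2D Base, Blend, Detail;   // declarations assumed, not in the original post
varying vec2 DetTexCoord, MapTexCoord;   // assumed to be written by the vertex shader
void main() {
    vec3 Blnd = texture2D(Blend, DetTexCoord).xyz;
    vec3 Dtl = texture2D(Detail, MapTexCoord).xyz;
    vec4 Bse = texture2D(Base, DetTexCoord);
    gl_FragColor = dot(Blnd, Dtl) * Bse;
}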

but that doesn’t mean handwritten assembly is so much better. it just means that ati still has a long way to go until their glsl support is something i’d want to touch (either that, or glsl is so completely different that you shouldn’t compare them).
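For what it’s worth, the two listings above do the same per-fragment arithmetic, so the speed gap would come from what the driver generates rather than from the math. A small Python sketch of that arithmetic, assuming both versions fetch the same three texels (the texel values and function names below are made up for illustration):

```python
def dot3(u, v):
    """Three-component dot product, as DP3 and dot(vec3, vec3) compute it."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def arb_fragment(base, blend, det):
    # DP3 det, det, blend;  -> scalar result written to every component
    d = dot3(det, blend)
    det = (d, d, d, d)
    # MUL result.color, base, det;  -> component-wise multiply
    return tuple(b * s for b, s in zip(base, det))

def glsl_fragment(base, blend, det):
    # gl_FragColor = dot(Blnd, Dtl) * Bse;  -> scalar times vec4
    s = dot3(blend[:3], det[:3])
    return tuple(s * b for b in base)

# Hypothetical RGBA texels for the base, blend and detail textures.
base = (0.5, 0.25, 1.0, 1.0)
blend = (1.0, 0.0, 0.0, 0.0)
det = (0.5, 0.75, 0.25, 1.0)

print(arb_fragment(base, blend, det))   # (0.25, 0.125, 0.5, 0.5)
print(glsl_fragment(base, blend, det))  # (0.25, 0.125, 0.5, 0.5)
```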

i’d like the programs to stay for another reason: in comparison using glsl feels horribly complicated by binding attributes to attributes (why aren’t those built-in already?). the “language” might be cleaner, the compiler can take care of optimization, but for just quickly trying something all the setup is a pain. just like it would be if they removed immediate mode and every lousy quad required setting up vertex arrays.

I suggest that there would be 2 routes… a high level langauge(sic) implementation, … and an assembly langauge(sic) route

I agree. Essentially, provide an instruction-oriented interface (assembly) as well as a procedure-oriented interface (glsl).

…assembly programs for the GPU are easier to manage than for the CPU…

In general this may be the case, but not with all rendering. Look at some of the vertex programs for rendering slimy surfaces. They can be huge, thousands or even tens of thousands of instructions.

… look at x86 assembly and c/c++ … a good assembly coder can easily produce code that is 2 times faster than that produced by the best c/c++ compiler out there.

x86 assembly language was designed with one overriding goal: a compact memory footprint. It also used a segmented addressing model. Over several generations (8086, Pc Jr., 286, 386a, 386b, 486, 586, 686a, 686b, P2, P3, P4a, P4b), it evolved several features while maintaining backwards compatibility. It was practically designed to be hard to compile. (Don’t argue or I’ll make you write 16-bit real-mode code to interface with a modern OS.) Look at some RISC or VLIW assembly languages; they’re like compiler candy.

I will also concede that C/C++ compilers aren’t as good as they should be. Most compilers were designed with speed of compilation (not speed of compiled code) in mind. There are many perfect algorithms for instruction scheduling, register allocation, etc., known since the 1960s and early 1970s; however, they have enormous run-times (these are NP-complete problems). Most compilers settled for a good-enough solution, one which was rarely the best. So naturally, an assembly programmer could do better, because the compiler wasn’t doing its job. (It was doing good-enough.)
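As a toy illustration of good-enough versus exact: register allocation is commonly modeled as coloring an interference graph, which is NP-complete in general. The Python sketch below compares a greedy heuristic with a brute-force search on a hypothetical four-temporary graph. The graph, the visiting order and the function names are all invented for the example and are not taken from any real compiler:

```python
from itertools import product

# Hypothetical interference graph: an edge means the two temporaries are
# live at the same time and therefore cannot share a register.
NODES = ["a", "b", "c", "d"]
EDGES = [("a", "b"), ("b", "c"), ("c", "d")]

def neighbours(v, edges):
    """Return the set of temporaries that interfere with v."""
    return {y if x == v else x for x, y in edges if v in (x, y)}

def greedy_registers(order, edges):
    """Give each temporary the lowest register not used by a neighbour."""
    reg = {}
    for v in order:
        taken = {reg[u] for u in neighbours(v, edges) if u in reg}
        r = 0
        while r in taken:
            r += 1
        reg[v] = r
    return max(reg.values()) + 1

def optimal_registers(nodes, edges):
    """Try every assignment with k registers, for k = 1, 2, ... (exponential)."""
    for k in range(1, len(nodes) + 1):
        for assignment in product(range(k), repeat=len(nodes)):
            reg = dict(zip(nodes, assignment))
            if all(reg[x] != reg[y] for x, y in edges):
                return k

print(greedy_registers(["a", "d", "b", "c"], EDGES))  # 3 with this visiting order
print(optimal_registers(NODES, EDGES))                # 2
```

The brute-force search may try up to k^n assignments for n temporaries, which is the kind of run-time that pushes production compilers toward heuristics.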

rendering is all about speed and getting the most out of the GPU

OpenGL isn’t just for games. It can be used to obtain the best image quality, or the most accurate image. It is used in CAD, movies, scientific visualization, and many other situations where rendering speed is a secondary concern.

i’d like the programs to stay for another reason: in comparison using glsl feels horribly complicated by binding attributes to attributes (why aren’t those built-in already?). the “language” might be cleaner, the compiler can take care of optimization, but for just quickly trying something all the setup is a pain.

If glslang is not clean enough, take it up with the makers of the language. This can, and should, be improved.

good point. what i actually meant was that glsl might be cleaner code than programs in pseudo-assembly. but now that you mention it, besides vertex attributes, why can’t i initialize samplers, instead of having to set them to a tex unit from within the program? all those things would be nice and handy as an option but are somewhat annoying if they are mandatory.

but so far it’s hard to make good suggestions, as i can never tell if the problem is with my code or with the implementation. considering that the compiler produced code that’s twice as slow as the frag program, i tend to blame the implementation though ,-)

OK. I have been following your responses and I do have a few notes.
The first is, there is no need for some of you to get aggressive … we are merely floating/exchanging ideas here to make OpenGL better.
If someone doesn’t agree with someone else’s idea then you may just say so without blowing his/her head off. Let’s try to be civilized.
Now, as for the debate about compiled code vs. handwritten assembly … I think that having both implementations would give the coder more options, regardless of the point we are debating (which code is faster), since frankly, I doubt that we will agree on which code is faster. But it is still nice to have assembly as an option.
Now, as for modifications to glslang: that may be a good idea … making the linking process faster is definitely a plus.
I understand the point that some of you have made that speed is not that important all the time, since OpenGL is not just about games and can be used in many other applications … well, having more speed and a faster, leaner OpenGL will definitely not create any problems with using OpenGL in other non-game apps … then why not have OpenGL be as fast as it can be? What have we got to lose? … nothing, of course.
A more powerful implementation of the API is good for coders, good for IHVs, good for all types of ISVs and good for the OpenGL community … so let’s work together instead of against each other …

regards.

OK, so you are saying it’s nice to have the option.

The problem is, it needs people to write a spec, then it needs to be approved, then implemented, …

What if you encounter a vendor that provides support for GLSL but not ASM?
And another that does it the other way around?

I don’t think that people will like coding for both.

Should both become part of the core forcing vendors to implement both?

IMO, GL needs to be FOCUSED on one goal, both spec-wise and in driver development.

It certainly took some time for ATI and NV to improve their ASM support. There were plenty of bugs in the beginning!