ARB_vertex_program2 when?

I hope all the IHVs get together and work on this. I understand that it’s much faster to ignore everyone else and work on your own stuff, but one interface sure would be neat. How many of you are planning on using the NV_vertex_program2 extension? I’m not.

Me neither. It sucks as a language, a really ugly assembler. Just made to force people to use Cg…

I don’t see any point to using it directly if we can use Cg and make the Cg compiler generate all that ugly code for us.

-SirKnight

A bit off topic: does anyone have an idea why SIN/COS were left out of the spec? They are present in ARB_fp.

I feel the asm approach is the correct one - other extensions / higher level languages will build on this.

I don’t have anything against asm shaders, since the programs are usually very short, for games at least. I used D3D9 HLSL and asm, and even though HLSL is easier to use I felt OK using asm (it’s more straightforward than HLSL). I won’t use Cg, but I might use glslang in the future. A shading language needs to be governed by the ARB and not by one company favoring its own way of thinking.

I don’t think there’s much interest in an ARB_vertex_program2 extension right now. Most of the focus is on finishing the GL2 glslang, and once that’s done no one will want an asm language anymore.

Originally posted by davepermen:
Me neither. It sucks as a language, a really ugly assembler. Just made to force people to use Cg…

What do you mean, it sucks and it’s there to force people into Cg?

NV_vertex_program2 is just an extension of 1.0 and 1.1.

And it’s not really different than using ARB_vertex_program, except for about three new features. It just adds looping, conditional execution and subroutine calling. Maybe a few other instructions too.

I have no real use for it at the moment but it’s a logical progression.

Originally posted by Humus:
I don’t think there’s much interest in an ARB_vertex_program2 extension right now. Most of the focus is on finishing the GL2 glslang, and once that’s done no one will want an asm language anymore.

There are a couple of nice things about having low-level interfaces:

  • reduced driver complexity
  • 3rd party shading language design

Driver quality has always been a dodgy issue for OpenGL implementors. Having a simple-to-implement, reliable interface is not a particularly bad idea.

Hardware shading languages are still a relatively new thing, and we should expect them to change (perhaps significantly) as hardware becomes more general and capable and we learn the most natural programming models. Allowing 3rd party development of shading languages is a good idea, and a flexible, powerful, low-level interface is usually the best target for a shading system.

I’m all for high-level shading. I just don’t think it’s time to do away with low-level interfaces.

Thanks -
Cass

vertex_program2 is needed for a few things which most current hardware can support (rough sketch below):

  • Subroutines
  • Conditionals
  • Jumps
  • Loops
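
As a purely hypothetical illustration of what those features could look like, here is a sketch of a loop with a conditional branch in a vertex_program2-style assembler, stored as a C string the way an application would hold the program text before loading it. No ARB_vertex_program2 syntax actually exists; the header and the instruction names below (SUBC, BRA, the label and condition-code test) are illustrative only.

    /* Hypothetical vertex_program2-style loop. The header, SUBC, BRA,
     * labels and condition codes are NOT from any real spec; they only
     * illustrate the kind of control flow being asked for. */
    const char *vp2_sketch =
        "!!ARBvp2.0                        # hypothetical header\n"
        "TEMP i, acc;\n"
        "MOV  i,   program.local[0];       # loop counter\n"
        "MOV  acc, {0, 0, 0, 0};\n"
        "loop:\n"
        "ADD  acc, acc, program.local[1];  # accumulate per iteration\n"
        "SUBC i.x, i.x, 1.0;               # decrement, set condition code\n"
        "BRA  loop (GT.x);                 # branch while counter > 0\n"
        "MOV  result.color, acc;\n"
        "END\n";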

fragment_program2 could conceivably add at least a little bit of predication without breaking on current hardware, but fragment_program feels more complete than vertex_program.

Hi,

I plan to use ARB_vertex_program2, since I prefer the assembler approach to the high-level language approach for programming the video card.

The current complexity of GPUs doesn’t really justify the use of a higher-level language, in my eyes at least.

regards,

That’s the same as asm vs. C/C++.
I think HLSL is really nice: less work, fewer errors… at least it can be used for fast prototyping of shaders. If execution speed isn’t fast enough and you have the time, go back to asm and hack around.

If an ARB HLSL becomes part of the driver, it doesn’t have to compile to ARB_vp first and then to the internal format… it can just skip the middle layer. That’s good; it makes the internal implementation more flexible. Like, for instance, vertex*matrix, which always has to be four operations nowadays… if the chip has a single instruction for that, it will go unused until the ARB ratifies a new VP spec. With HLSL the driver could use it immediately.
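
For reference, this is roughly what that looks like in ARB_vertex_program today: the position transform is written out as four DP4s against the rows of the tracked modelview-projection matrix (shown here as a C string, the way an application would hold the program text before loading it). A chip with a fused vector-matrix instruction only gets to use it if the driver recognizes this four-instruction pattern after the fact.

    /* Canonical ARB_vertex_program position transform: four DP4s
     * against the rows of the tracked MVP matrix. */
    const char *vp_transform =
        "!!ARBvp1.0\n"
        "PARAM mvp[4] = { state.matrix.mvp };\n"
        "DP4 result.position.x, mvp[0], vertex.position;\n"
        "DP4 result.position.y, mvp[1], vertex.position;\n"
        "DP4 result.position.z, mvp[2], vertex.position;\n"
        "DP4 result.position.w, mvp[3], vertex.position;\n"
        "MOV result.color, vertex.color;\n"
        "END\n";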

But until then I will use ARB_vp, and I would like to see a vp2 spec with branches and such.

Back to the original question. Does anybody know the status of ARB_vp2?
In the December meeting notes: “Pat will convene a new working group…”
But in the March notes: “The low-level instruction set is not complete (e.g. supporting looping/branching constructs). There’s still interest in reviving the vertex_program WG and doing this work.”
So it seems that nothing has been done.

Originally posted by cass:
There are a couple of nice things about having low-level interfaces:

  • reduced driver complexity
  • 3rd party shading language design

Driver quality has always been a dodgy issue for OpenGL implementors. Having a simple-to-implement, reliable interface is not a particularly bad idea.

Hardware shading languages are still a relatively new thing, and we should expect them to change (perhaps significantly) as hardware becomes more general and capable and we learn the most natural programming models. Allowing 3rd party development of shading languages is a good idea, and a flexible, powerful, low-level interface is usually the best target for a shading system.

I’m all for high-level shading. I just don’t think it’s time to do away with low-level interfaces.

Thanks -
Cass

Well, I think it’s time to do away with it. The sooner the better. The bad things about maintaining asm languages by far outweigh the good things. Not only will it add up to the same huge mess as all the fixed-function stuff did over the years as technology progresses, but it will also put artificial restraints on how hardware is designed, which sucks. We should have learned from x86 that asm middle layers just prevent innovation, cause trouble and limit performance. I don’t think we should repeat that for GPUs.

Originally posted by Humus:
Well, I think it’s time to do away with it. The sooner the better. The bad things about maintaining asm languages by far outweigh the good things. Not only will it add up to the same huge mess as all the fixed-function stuff did over the years as technology progresses, but it will also put artificial restraints on how hardware is designed, which sucks. We should have learned from x86 that asm middle layers just prevent innovation, cause trouble and limit performance. I don’t think we should repeat that for GPUs.

What are the bad things about a low-level interface?

Note that I’m not saying a “limited” interface. Just “low-level”.

  1. As I said, it adds up to a huge mess over time. Look at DirectX: we have ps1.1, ps1.2, ps1.3, ps1.4, ps2.0. You must support all these interfaces. Where are we in 3 or 4 years? ps3.0, ps3.1, ps4.0, ps4.1, ps5.0, ps6.0… ? All of which must be supported.

  2. It limits innovation. Hardware will have to be designed around a certain set of instructions. You’re kicking a huge level of optimization out of the driver’s sight. If you, for instance, have special hardware for executing sines and cosines, you should be able to use it, something that will not be feasible if you’re fed a low-level Taylor series, especially if the compiler has tried to optimize and reschedule instructions (see the sketch after this list).

  3. It limits performance. Look at DX9 HLSL today. If I compile a shader using fancy swizzling, say .xzzy, and use ps2.0 as a target, then your beloved GFFX, which should be able to do that in a single instruction, will have to run several instructions for that swizzle due to hardware limitations that exist on ATi boards, all because of this middle layer and that pesky ps2.0 target that is designed around the smallest universal set of functionality.
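
To make point 2 concrete, here is a hedged sketch of what a high-level compiler has to emit for sin(x) when targeting ARB_vertex_program, which has no SIN instruction: a truncated Taylor series, sin(x) ≈ x − x³/6 + x⁵/120, spelled out as MULs and MADs. The attribute slot, coefficient values and output register below are chosen purely for illustration; the point is that the driver only ever sees the expanded arithmetic.

    /* sin(x) expanded to a truncated Taylor series because the target
     * instruction set has no SIN. A driver with dedicated sine hardware
     * would have to reverse-engineer this pattern to use it. */
    const char *vp_sin_expansion =
        "!!ARBvp1.0\n"
        "PARAM coeff = { -0.166666667, 0.008333333, 0.0, 0.0 };\n"
        "TEMP x, x2, x3, x5, s;\n"
        "MOV x, vertex.attrib[6];          # angle in x\n"
        "MUL x2.x, x.x, x.x;               # x^2\n"
        "MUL x3.x, x2.x, x.x;              # x^3\n"
        "MUL x5.x, x3.x, x2.x;             # x^5\n"
        "MAD s.x, x3.x, coeff.x, x.x;      # x - x^3/6\n"
        "MAD s.x, x5.x, coeff.y, s.x;      # + x^5/120\n"
        "MOV result.texcoord[0].x, s.x;\n"
        "END\n";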

As I said, it adds up to a huge mess over time. Look at DirectX: we have ps1.1, ps1.2, ps1.3, ps1.4, ps2.0. You must support all these interfaces. Where are we in 3 or 4 years? ps3.0, ps3.1, ps4.0, ps4.1, ps5.0, ps6.0… ? All of which must be supported.

I don’t imagine that ARB_vertex_program2 is going to do much more than add instructions. It’s not an issue of supporting an interface then; it’s simply a matter of whether or not you compile opcodes.

As for the ps1.* mess, blame Microsoft. Per-fragment programs didn’t even really exist at that time; it was more like a more flexible fixed-function pipe. Microsoft wanted to turn it into some kind of assembly, rather than recognise it for what it was.

ps3.0 won’t look much different from ps2.0. Mostly new opcodes.

It limits innovation. Hardware will have to be designed around a certain set of instructions. You’re kicking a huge level of optimization out of the driver’s sight. If you, for instance, have special hardware for executing sines and cosines, you should be able to use it, something that will not be feasible if you’re fed a low-level Taylor series, especially if the compiler has tried to optimize and reschedule instructions.

Which is why you have new revisions of your low-level language. All it adds are new instructions.

You haven’t pointed out why these have to go into my drivers (and come out of driver development time). The Cg method is to compile to some assembly language that can then be further compiled into a program object. Since you can pick and choose which target you want, you don’t much need to worry about the underlying hardware issues.
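
For what it’s worth, the application side of that “compile offline, then load” path is small. A rough C sketch, assuming the ARB_vertex_program entry points and enums have already been resolved (on most platforms via glext.h plus wglGetProcAddress/glXGetProcAddress):

    #include <GL/gl.h>
    #include <GL/glext.h>
    #include <stdio.h>
    #include <string.h>

    /* Load ARB_vertex_program text produced by an offline compiler
     * (Cg or anything else) and report any compile error position. */
    void load_vertex_program(const char *asm_text)
    {
        GLuint prog;
        glGenProgramsARB(1, &prog);
        glBindProgramARB(GL_VERTEX_PROGRAM_ARB, prog);
        glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                           (GLsizei)strlen(asm_text), asm_text);

        GLint err_pos;
        glGetIntegerv(GL_PROGRAM_ERROR_POSITION_ARB, &err_pos);
        if (err_pos != -1)
            fprintf(stderr, "program error at offset %d: %s\n", err_pos,
                    (const char *)glGetString(GL_PROGRAM_ERROR_STRING_ARB));
    }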

If it can stand alone, it should stand alone. And I don’t want driver developers spending their valuable time writing a C compiler when they could be getting me better performance.

If I compile a shader using fancy swizzling, say .xzzy, and use ps2.0 as a target, then your beloved GFFX, which should be able to do that in a single instruction, will have to run several instructions for that swizzle due to hardware limitations that exist on ATi boards

I hope you’re not assuming that all hardware actually runs the “assembly” code just as it’s written? Think of the shader “assembly” code as the logical equivalent of Java Virtual Machine bytecode. The driver compiles to whatever the hardware is doing at the time of load. It can “easily” be made to detect a swizzle to a temporary preceding an operation on that temporary, and turn that into a single “native instruction” if that’s what the hardware provides.
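
A toy version of the peephole pass described above might look like the following C sketch. Everything here (the instruction struct, the opcode names, the swizzle encoding) is a made-up driver IR for illustration, not any real driver’s internals.

    /* Fold "MOV tmp, reg.swz; OP dst, tmp, ..." into "OP dst, reg.swz, ..."
     * when the consumer reads the temporary unswizzled. A real pass would
     * also compose non-identity swizzles, respect write masks and check
     * that the temporary has no later uses. */
    #define SWIZ_IDENTITY 0xE4   /* .xyzw, packed 2 bits per component */

    enum { OP_MOV, OP_MUL /* ... */ };

    typedef struct {
        int op;                  /* OP_MOV, OP_MUL, ... */
        int dst;                 /* destination register number */
        int src[2];              /* source register numbers */
        unsigned char swiz[2];   /* packed swizzle for each source */
    } Instr;

    static int fold_swizzle_moves(Instr *code, int n)
    {
        int out = 0;
        for (int i = 0; i < n; ++i) {
            if (code[i].op == OP_MOV && i + 1 < n &&
                code[i + 1].src[0] == code[i].dst &&
                code[i + 1].swiz[0] == SWIZ_IDENTITY) {
                Instr fused = code[i + 1];
                fused.src[0]  = code[i].src[0];   /* read the original reg */
                fused.swiz[0] = code[i].swiz[0];  /* with the MOV's swizzle */
                code[out++] = fused;
                ++i;                              /* consume both instructions */
            } else {
                code[out++] = code[i];
            }
        }
        return out;                               /* new instruction count */
    }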

Originally posted by Korval:
Which is why you have new revisions of your low-level language. All it adds are new instructions.

Which is what x86 did over the years too: add new instructions. And it sure is a fatass, inefficient mess today.

What instructions do we want? It may be clear today, but in 5 years it will most likely be different. Do we want vector instructions or many independent scalar processors? Do we want attributes and constants to be limited to four components? Things will change, and we’ll get a fatass mess that will haunt us for many years thereafter.

Originally posted by Korval:
You haven’t pointed out why these have to go into my drivers (and come out of driver development time). The Cg method is to compile to some assembly language that can then be further compiled into a program object. Since you can pick and choose which target you want, you don’t much need to worry about the underlying hardware issues.

I have pointed it out already. Compiling to a target and then loading that shader is not only inefficient, it will also cause a huge mess over the years. Not only will we need to create new assembler versions, we will also need to update the compiler, and we need to update the optimizers for each shader version we create. We also lose the ability to optimize at a high level, which will also limit IHV innovation. We won’t get the maximum out of our hardware, and driver writers will have to spend MORE time trying to reverse-engineer a low-level assembler back into its high-level meaning than they would if they had direct access to the shader itself.

Plus, why should I have to care about targets? Give me one valid reason. What I care about is the hardware, and I want to get the max out of it and out of future hardware. I don’t want to have to detect the ps shader version and try to select the best combination of what the compiler supports and what versions the driver supports.

To turn the question around, give me one valid reason why the driver should NOT have access to the high-level shader?

Originally posted by Korval:
If it can stand alone, it should stand alone. And I don’t want driver developers spending their valuable time writing a C compiler when they could be getting me better performance.

By taking the approach of compiling against a target pixel shader version instead of targeting the underlying hardware, you have already lost your performance. I don’t want driver writers to have to spend their valuable time trying to figure out high-level semantics by reverse-engineering low-level assembler.

Originally posted by jwatte:
I hope you’re not assuming that all hardware actually runs the “assembly” code just as it’s written?

Which is the actual point. It doesn’t represent the hardware. Today it still might, slightly, but in the future it won’t. Just as x86 instructions do not represent the executing hardware: the CISC instructions are decoded into micro-ops and executed on RISC-like cores. And the price we pay is in transistors, power consumption, heat and non-optimal execution of code. Why not learn from history? Why repeat the x86 debacle all over again?

Originally posted by jwatte:
Think of the shader “assembly” code as the logical equivalent of a Java Virtual Machine bytecode. The driver compiles to whatever the hardware is doing at the time of load.

Which is an inefficient model, and proven to be so by compilers that compile Java code directly into native platform code. Despite the loads of effort Sun has put into it, their virtual machine will never run as fast as Java code compiled directly to native code. The VM already runs lots of code natively and tries to match the bytecode to native code whenever possible, but since the bytecode does not represent the underlying hardware, we aren’t getting anywhere near the performance we could get. Not only that, but I can guarantee that the amount of time Sun has spent optimizing the bytecode compiler and the VM easily exceeds whatever effort the GCC team has put into optimizing their Java compiler.

Originally posted by jwatte:
It can “easily” be made to detect a swizzle to a temporary preceding an operation on that temporary, and turn that into a single “native instruction” if that’s what the hardware provides.

But why the double work? First the compiler splits it into many instructions, then the driver has to go through the shader, try to figure out where it can pack instructions into one, and eventually (if successful) end up back where we started. It’s a waste of time. What if the compiler tries to be smart and reschedules instructions in an order that it thinks will benefit some hardware? Will the driver still find the swizzles it can pack into one? Swizzles are also fairly simple. Will the driver be able to detect trigonometric functions expressed as series of adds and muls? Powers? Exponentials? It goes on.
