
NVidia: Where has GL_EXT_vertex_weighting gone?



Christian Schüler
10-21-2003, 10:46 PM
The slow death of an extension :-)
----------------------------------

In Detonator 41.xx the extension was exposed in the extension string.

In Detonator 45.xx, the extension was not exposed in the extension string, but the entry point was still there.

The new Detonator driver 52.14 doesn't support it at all; the entry point is 0.
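
For anyone hitting the same thing at runtime: a minimal sketch of the check, assuming Windows and an already-current GL context, is to test both the extension string and the entry point, since with the 45.xx drivers one was present without the other.

    #include <cstring>
    #include <windows.h>
    #include <GL/gl.h>

    // Sketch only: report the extension as usable only if the string
    // advertises it AND the driver still exports the entry point.
    bool HasVertexWeighting()
    {
        const char* ext = reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
        bool inString = ext && std::strstr(ext, "GL_EXT_vertex_weighting") != 0;

        // wglGetProcAddress returns 0 when the driver no longer exports the function.
        PROC entry = wglGetProcAddress("glVertexWeightfEXT");

        return inString && entry != 0;
    }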

flo
10-21-2003, 11:13 PM
From the specification:


Status

Discontinued.

NVIDIA no longer supports this extension in driver updates after November 2002. Instead, use either ARB_vertex_program & NV_vertex_program.

Christian Schüler
10-21-2003, 11:36 PM
You are right, I should have looked there first.

tfpsly
10-21-2003, 11:39 PM
http://www.flipcode.com/cgi-bin/msg.cgi?showThread=00009347&forum=3dtheory&id=-1

Crossposting is evil.

knackered
10-22-2003, 12:39 AM
Please explain why you think posting the same question to more than one newsgroup is evil.
Seems like the most logical thing to do.

dmoc
10-22-2003, 01:34 AM
But doesn't this restrict a developer to only new cards? I only have a GF256, a GF2 DDR and a GF4 MX... all crap compared to newer cards, I know, but still, I bet there are many people still using them.

Christian Schüler
10-22-2003, 02:05 AM
Many people may still be using them, but that doesn't help NV sell new parts. So I wouldn't be surprised if they'd rather not spend time re-implementing this extension in their drivers.

dmoc
10-22-2003, 03:21 AM
So am I likely to buy a new NV card with shallow marketing tricks like this? No, I don't think so.

flo
10-22-2003, 03:39 AM
Vertex programs are supported by GeForce 256 and better (in software, but they still run well). So I guess the transition to vertex programs should also be possible for those cards.

[This message has been edited by flo (edited 10-22-2003).]

Christian Schüler
10-22-2003, 04:34 AM
I have tested the emulated vertex shaders (at least with Detonator 45.xx) and they are not useful for me. I'd rather write my own 3DNow/SSE code.

On an Athlon 600 MHz with a GF2 MX and custom code, I can get 19 M point-lit triangles/s this way, something neither the hardware alone nor the emulated software shaders will give me.
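
To give an idea of the kind of code I mean (a rough sketch only, not my actual routine): the inner loop is essentially a 4x4 matrix times position done with SSE intrinsics, with the lighting dot products folded into the same pass.

    #include <xmmintrin.h>  // SSE intrinsics

    // Sketch: the matrix is kept as four column vectors (OpenGL column-major),
    // so the transformed position is col0*x + col1*y + col2*z + col3.
    struct Matrix4 { __m128 col[4]; };   // made-up layout for this sketch

    void TransformPositions(const Matrix4& m, const float* in, float* out, int count)
    {
        for (int i = 0; i < count; ++i)
        {
            __m128 x = _mm_set1_ps(in[i * 4 + 0]);  // broadcast vertex components
            __m128 y = _mm_set1_ps(in[i * 4 + 1]);
            __m128 z = _mm_set1_ps(in[i * 4 + 2]);  // w is assumed to be 1

            __m128 r = _mm_add_ps(
                _mm_add_ps(_mm_mul_ps(m.col[0], x), _mm_mul_ps(m.col[1], y)),
                _mm_add_ps(_mm_mul_ps(m.col[2], z), m.col[3]));

            _mm_storeu_ps(&out[i * 4], r);  // the point-light dot products live in the same loop
        }
    }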

Cyranose
10-22-2003, 06:43 AM
Originally posted by cschueler:

I have tested the emulated vertex shaders (at least with Detonator 45.xx) and they are not useful for me. I'd rather write my own 3DNow/SSE code.

On an Athlon 600 MHz with a GF2 MX and custom code, I can get 19 M point-lit triangles/s this way, something neither the hardware alone nor the emulated software shaders will give me.

That's the best route for general support, though an SSE path plus a plain-CPU fallback may cover most hardware at this point.

A client of mine had the same issue. They benchmarked the old weighting extension at 100 cycles per vertex (2 matrix). The equivalent SSE code was less than 50 (and I was still learning SSE at the time).
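
For reference, the math being replaced is just the two-matrix blend the extension exposes, which is easy to hand-roll. A plain scalar sketch (the SSE version does the same thing four floats at a time):

    // out = w * (M0 * v) + (1 - w) * (M1 * v), matrices in OpenGL column-major order.
    void BlendVertex(const float m0[16], const float m1[16],
                     const float v[4], float w, float out[4])
    {
        for (int row = 0; row < 4; ++row)
        {
            float a = m0[row] * v[0] + m0[row + 4] * v[1] +
                      m0[row + 8] * v[2] + m0[row + 12] * v[3];
            float b = m1[row] * v[0] + m1[row + 4] * v[1] +
                      m1[row + 8] * v[2] + m1[row + 12] * v[3];
            out[row] = w * a + (1.0f - w) * b;
        }
    }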

Interestingly, they decided to use the software skinning path for older ATIs as well.

Avi


[This message has been edited by Cyranose (edited 10-22-2003).]

Zengar
10-22-2003, 07:38 AM
I thought software vertex programs were also available on the TNT2 etc., and that it's a rather fast implementation.

Ostsol
10-22-2003, 07:53 AM
Does the TNT2 have hardware T&L? If not, then I guess software-emulated vertex programs certainly wouldn't be much slower than transforms usually are on that card.

deadalive
10-22-2003, 08:15 AM
This sucks. I had a demo that made EXTensive use of this extension, and now I see it's pretty much useless. I have no idea how to replace what I was doing with a shader (or I'd have used one to begin with).
Damn indian givers; that's it, I refuse to update my NVIDIA drivers or hardware from now on (joking).

Zeno
10-22-2003, 09:16 AM
There are definitely two schools of thought about deprecation of hardware features.

One school thinks that, no matter what, backwards compatibility should never be broken. Most people seem to be in this camp, based on the comments I've seen in this thread and the old one on paletted textures.

The other school thinks that old baggage should be discarded if the functionality is a subset of some new functionality. I know I'd rather have another programmable vertex shader in parallel than keep around the old fixed-function T&L. The drivers could build shaders on the fly to emulate old fixed-function.

I'm not sure yet which school of thought I subscribe to. I tend to lean towards the second because I want hardware to move forward as quickly as possible and not waste silicon on old functionality. On the other hand, as a programmer, I will probably be annoyed when some extension I have used in the past is no longer supported (I am dreading when register combiners go the way of the dodo).

I guess it seems like the best thing would be if the drivers continued to support old extensions by emulating them using new extensions (behind the scenes). Why not do this?

Anyway, just my 2 cents on the issue.


[This message has been edited by Zeno (edited 10-22-2003).]

Korval
10-22-2003, 09:58 AM
I'm definitely in the "out with the old, in with the new" camp. I don't mind an extension surviving a few generations, but once superior functionality is available, the old extension should be lost.

At the very least, functionality that was not very good to begin with, and was not widely used (EXT_vertex_weighting falls into this category) should be a prime candidate for removal. Sure, there are a few vertex weighting demos out there, but no actual product ever even considered using it. The extension didn't expose decent functionality, and better functionality exists.


On an Athlon 600 MHz with a GF2 MX and custom code, I can get 19 M point-lit triangles/s this way

With EXT_vertex_weighting? I highly doubt it. The size of your strips for any complicated model would be too small to effectively get around per-primitive and state-change overhead.


Interestingly, they decided to use the software skinning path for older ATIs as well.

Not surprising. Hardware skinning didn't really become reasonably available until the advent of vertex shaders. The vertex_blend extension did make a valiant attempt to provide decent skinning, but vertex programs are the preferred and superior method. Vertex_blend was never supported by nVidia, and ATi was much smaller than they are now, so nobody bothered to use it. And now we have vertex programs for our skinning needs.


I am dreading when register combiners go the way of the dodo

RC's are never going away; Doom3 supports them. Just like CVA's, you're never going to get rid of an extension that is (going to be) so widely used.


I guess it seems like the best thing would be if the drivers continued to support old extensions by emulating them using new extensions (behind the scenes). Why not do this?

To an extent, this is being done. However, for each old extension that has to be re-implemented on top of new functionality, driver development time is wasted. I'd rather nVidia spend their time improving their fragment-program compiler than on back-porting EXT_vertex_weighting.

For ATi, this might be reasonable, because they already have a framework in their driver for building shaders for old functionality; they no longer have hardware fixed-function support, so they had no choice. nVidia still have various bits of hardware lying around, so they never had to write shader compiling code to do this kind of thing. For them, it would be a significant undertaking if any actual fixed-function hardware is removed.

Ysaneya
10-22-2003, 10:06 AM
I'm also in favor of dropping older/deprecated extensions.

Let's be realistic: for a vendor it's a nightmare to support everything and to make sure everything is bug-free. I'd rather have NVidia or ATI work on new extensions than lose their time maintaining old ones.

In addition, the OpenGL extension mechanism has become a real mess. How many extensions are available at the moment? I'm sure we're not very far from the hundredth. It makes the whole API a nightmare to maintain, with difficult dependencies between new and older extensions. I'd even dare to say that proprietary extensions should be dropped in favor of ARB ones, when possible. And when I say "dropped", I mean completely removed from the driver and the extension string.

I hate to say it, as I'm an OpenGL programmer at heart, but... DX9 is much better in this area. Coding advanced effects in OpenGL is tricky.

Y.

jwatte
10-22-2003, 02:30 PM
Two comments:

1) This extension never performed very well, and wasn't useful for much, so I don't miss it.

2) Saying that software vertex programs "run well" on a GeForce 2 is only true if the CPU is otherwise idle. The product I'm working on pushes enough polys and does enough physics and other things that a Pentium IV at 2.4 GHz is NOT ENOUGH to match a GeForce 2 MX and a Pentium III/800.

Christian Schüler
10-23-2003, 06:54 AM
> With EXT_vertex_weighting? I highly doubt
> it. The size of your strips for any
> complicated model would be too small to
> effectively get around per-primitive
> and state-change overhead.

Of course I don't get 19 M verts/s with vertex weighting; I meant the 19 M figure as an example of how custom CPU code can actually be faster than shaders or hardware T&L (on older cards).

Korval
10-23-2003, 09:33 AM
Of course I don't get 19 M verts/s with vertex weighting; I meant the 19 M figure as an example of how custom CPU code can actually be faster than shaders or hardware T&L (on older cards).

True though it may be, is your CPU doing anything but T&L? A GeForce 256 can get around 4-8M lit tris, but it frees up the CPU significantly.

Cyranose
10-23-2003, 09:52 AM
Originally posted by Korval:
True though it may be, is your CPU doing anything but T&L? A GeForce 256 can get around 4-8M lit tris, but it frees up the CPU significantly.

With respect to two-matrix blending, we found that if properly interleaved, CPU/SSE-based blending could fit neatly in parallel with the rendering of the blended verts. So for a series of characters, blend A, draw A, blend B, draw B, etc..., the "draw A" and "blend B" phases could be done more or less in parallel. This required large batches of VAR'd verts to be rendered with a single glDrawElements call, which isn't hard considering the blending puts the verts in a single coordinate space and the textures were pre-combined.

Someone may point out that the CPU could be doing something else during that time too, so it's not truly "free." But in that single-threaded app, without a fine-grain list of schedulable tasks, blending N verts and drawing N verts in parallel was a nicely balanced pair of activities.
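
In loop form, the interleaving looked something like this (a simplified sketch; the helper names are made up and the real code wrote into VAR memory):

    struct Character { /* bone matrices, source verts, target vertex range, indices... */ };

    // Hypothetical helpers for this sketch:
    void BlendIntoVertexRange(Character&)    { /* CPU/SSE two-matrix blend into the character's vertex range */ }
    void IssueDrawElements(const Character&) { /* one glDrawElements over that range; returns immediately */ }

    // Blend A, draw A, blend B, draw B, ... : while the GPU draws character i,
    // the CPU is already blending character i+1.
    void DrawCharacters(Character* chars, int count)
    {
        if (count == 0)
            return;

        BlendIntoVertexRange(chars[0]);                  // blend A
        for (int i = 0; i < count; ++i)
        {
            IssueDrawElements(chars[i]);                 // draw A (GPU busy)
            if (i + 1 < count)
                BlendIntoVertexRange(chars[i + 1]);      // blend B in parallel
        }
    }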

Avi

Christian Schüler
10-24-2003, 02:51 AM
Originally posted by Cyranose:
With respect to two-matrix blending, we found that if properly interleaved, CPU/SSE-based blending could fit neatly in parallel with the rendering of the blended verts. So for a series of characters, blend A, draw A, blend B, draw B, etc..., the "draw A" and "blend B" phases could be done more or less in parallel. This required large batches of VAR'd verts to be rendered with a single glDrawElements call, which isn't hard considering the blending puts the verts in a single coordinate space and the textures were pre-combined.

Someone may point out that the CPU could be doing something else during that time too, so it's not truly "free." But in that single-threaded app, without a fine-grain list of schedulable tasks, blending N verts and drawing N verts in parallel was a nicely balanced pair of activities.

Avi

I second this. Interleaving CPU and GFX can give great performance. I have had occasions where this pays off even for something as simple as a texgen.

Korval
10-24-2003, 08:43 AM
Interleaving CPU and GFX can give great performance.

If you don't care about anything other than rendering a scene, of course. If, however, you are interested in, say, running a physics simulation or some kind of game, then CPU time is critical. It is important to these applications that as little CPU time as possible is spent on graphics tasks.


Someone may point out that the CPU could be doing something else during that time too, so it's not truly "free." But in that single-threaded app, without a fine-grain list of schedulable tasks, blending N verts and drawing N verts in parallel was a nicely balanced pair of activities.

Except that it takes CPU time that could, if well written, be spent on other tasks. Even other rendering-based tasks (creating matrices for the next character, generating state, even compiling dynamic shaders).

Cyranose
10-24-2003, 09:56 AM
Originally posted by Korval:
Except that it takes CPU time that could, if well written, be spent on other tasks. Even other rendering-based tasks (creating matrices for the next character, generating state, even compiling dynamic shaders).

I think you may need to define "well written" in this case. For example, in most systems I've seen or worked on, character matrices, IK, physics, state determination/sorting and dynamic shaders are all necessarily resolved well before rendering begins. I'm sure it's possible to design versions of those that work in parallel, but it could have other implications, such as adding a frame of latency...

The core of it is that scheduling arbitrary (or even dynamically picked) tasks to fit between draw calls is extremely difficult to make work consistently in your favor. If you miscalculate, you can wind up not gaining anything, or causing the GPU to wait, which is the worst case. Relying on the Windows scheduler (for those of us who don't use a real-time OS) doesn't add much when the time slices are very small. What seems to work best (IMO) is finding balanced, predictable CPU and GPU tasks and pipelining those together. These are very small time slices, so something like a 50-cycles-per-vertex times N vertices task, where the verts are computed, written to VBOs, and rendered immediately, is very predictable and pipelinable.

As for GPU/CPU tradeoffs, keep in mind that using a few more CPU cycles in parallel with the GPU may make the rendering faster (by using simpler, even NOP shaders on occasion) and therefore free up more big chunks of time at the beginning and ends of the frame for other less pipelinable tasks, such as physics.

Avi

[edit: read NOP in this case to mean minimal, not literally no instructions. A shader needs to at least set the output vertex fields and perhaps do lighting here if nowhere else.]


[This message has been edited by Cyranose (edited 10-24-2003).]

Christian Schüler
10-25-2003, 04:25 AM
Originally posted by Cyranose:
These are very small time slices, so something like a 50-cycles-per-vertex times N vertices task, where the verts are computed, written to VBOs, and rendered immediately, is very predictable and pipelinable.

I did exactly the same. 50 CPU cycles per vertex allows for some dot products for texgen, for lighting, or for soft-skinning. The results are then written into a STREAM_DRAW buffer and streamed together with the static rest (vertex positions, for instance). This approach has given me the fastest overall throughput I could ever get: not only peak performance for untextured, unlit triangles during the Z-only pass, but the same performance for any render pass.
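
The upload side is nothing special; roughly like this sketch (the buffer handles and attribute split are made up, and the ARB_vertex_buffer_object entry points are assumed to have been fetched already):

    #include <windows.h>
    #include <GL/gl.h>
    #include <GL/glext.h>

    // Assumed fetched elsewhere via wglGetProcAddress:
    extern PFNGLBINDBUFFERARBPROC glBindBufferARB;
    extern PFNGLBUFFERDATAARBPROC glBufferDataARB;

    // Positions stay in a STATIC_DRAW buffer uploaded once; the per-frame CPU
    // results (texgen coords here) are respecified into a STREAM_DRAW buffer
    // and sourced alongside the static data.
    void StreamDynamicAttributes(GLuint staticBuf, GLuint streamBuf,
                                 const float* cpuResults, int vertexCount)
    {
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, streamBuf);
        glBufferDataARB(GL_ARRAY_BUFFER_ARB, vertexCount * 2 * sizeof(float),
                        cpuResults, GL_STREAM_DRAW_ARB);      // respecify + upload this frame's results
        glTexCoordPointer(2, GL_FLOAT, 0, 0);                 // dynamic: CPU-generated texcoords

        glBindBufferARB(GL_ARRAY_BUFFER_ARB, staticBuf);
        glVertexPointer(3, GL_FLOAT, 0, 0);                   // static: positions, uploaded once
    }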


Regarding multiplexing CPU time, I have had this idea but not tried anything in this direction yet. What I had in mind is a kind of multiplexer that has a list of tasks to do (for instance, calculating physics), and the renderer can yield cycles to the multiplexer whenever the CPU has to wait for the GFX. Not coded as multithreading, just cooperative with timeouts.
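
Just as a sketch of how that could look with NV_fence (the multiplexer itself is made up, and I haven't tried this): instead of blocking in glFinishFenceNV, poll the fence and hand out small, bounded tasks until the GPU catches up.

    #include <windows.h>
    #include <GL/gl.h>
    #include <GL/glext.h>

    // NV_fence entry point, assumed fetched elsewhere via wglGetProcAddress:
    extern PFNGLTESTFENCENVPROC glTestFenceNV;

    // Hypothetical cooperative multiplexer: a list of small, bounded jobs
    // (for instance, one slice of the physics update at a time).
    struct TaskMultiplexer
    {
        bool HasWork() const     { return false; /* placeholder */ }
        void RunOneBoundedTask() { /* run a single job within a small time budget */ }
    };

    // Instead of stalling, yield cycles to pending tasks while the GPU is busy.
    void WaitForGpuCooperatively(GLuint fence, TaskMultiplexer& mux)
    {
        while (!glTestFenceNV(fence))
        {
            if (mux.HasWork())
                mux.RunOneBoundedTask();
            // else: fall back to glFinishFenceNV(fence) or Sleep(0) here
        }
    }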