More musings on NV_vertex_program and skinning

system · December 5, 2000, 8:26pm

Okay, so it’s six GPU instructions per
matrix you want to skin, plus the “standard”
20 or so for hardware T&L. That leads me to
believe you can squeeze 16 matrices into the
128 instructions allowed for a vertex
program – just barely – and retain
transform and lighting.
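
As a rough check of that budget, here is a minimal C sketch of the arithmetic. The function and constant names are hypothetical, and the T&L cost is simply the "20 or so" estimate above, not a measured figure.

```c
#include <assert.h>

enum {
    VP_MAX_INSTRUCTIONS      = 128, /* program length limit quoted above */
    INSTRUCTIONS_PER_BONE    = 6,   /* per-matrix cost quoted above */
    ASSUMED_TNL_INSTRUCTIONS = 20   /* "20 or so"; an estimate, not exact */
};

/* Instructions used by a program that skins against `bones` matrices
   and then runs the assumed fixed-cost T&L tail. */
static int skinning_program_length(int bones)
{
    assert(bones >= 0);
    return bones * INSTRUCTIONS_PER_BONE + ASSUMED_TNL_INSTRUCTIONS;
}

/* Nonzero when the program fits within the instruction limit. */
static int skinning_program_fits(int bones)
{
    return skinning_program_length(bones) <= VP_MAX_INSTRUCTIONS;
}
```

With these numbers, 16 bones come to 116 instructions, leaving 12 spare; the margin shrinks if the T&L portion runs longer than 20.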

Each vertex would have to be skinned with
all 16 bones, even though many weights will
be zero, because there are no tests or
branches. It’s four instructions for the
matrix multiply for the bone; one DP4
between vertex parameters and constants to
select the correct weight, and one MAD to
accumulate the result into temp (for output).
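
The six-instruction step described above can be sketched as a CPU-side emulation in C. This is only an illustration under assumptions: the `Vec4`/`Mat4` types, the row-major matrix layout, the packing of four weights per vertex parameter, and the function names are all hypothetical. In a real vertex program the loop would be fully unrolled, since there are no branches.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { Vec4 row[4]; } Mat4;   /* one bone matrix as four rows */

/* DP4: four-component dot product. */
static float dp4(Vec4 a, Vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* MAD with a scalar multiplier: v * s + acc, per component. */
static Vec4 mad(Vec4 v, float s, Vec4 acc)
{
    Vec4 r = { v.x * s + acc.x, v.y * s + acc.y,
               v.z * s + acc.z, v.w * s + acc.w };
    return r;
}

/* Skin `pos` against every bone, zero weights included.
   weights[] holds four bone weights per Vec4 (assumed packing). */
static Vec4 skin_all_bones(Vec4 pos, const Mat4 *bones,
                           const Vec4 *weights, int bone_count)
{
    /* Constants used to pick one component out of a weight vector. */
    static const Vec4 select[4] = {
        { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 1, 0 }, { 0, 0, 0, 1 }
    };
    Vec4 acc = { 0, 0, 0, 0 };
    int b;

    assert(bone_count >= 0);
    for (b = 0; b < bone_count; ++b) {
        Vec4 t;
        float w;
        /* Four instructions: the matrix multiply for this bone. */
        t.x = dp4(bones[b].row[0], pos);
        t.y = dp4(bones[b].row[1], pos);
        t.z = dp4(bones[b].row[2], pos);
        t.w = dp4(bones[b].row[3], pos);
        /* One DP4: select this bone's weight from the vertex parameters. */
        w = dp4(weights[b / 4], select[b % 4]);
        /* One MAD: accumulate the weighted result. */
        acc = mad(t, w, acc);
    }
    return acc;
}
```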

While reading the specification, I noticed
how almost straightforward it would be to do
a mapping from that instruction set to SSE
on Pentium III and Celeron II. If they
support this extension on TNT2, the CPU is
certainly where it'll run, but what if I
want to run this program on a GeForce2?
Will the run-time assembler be smart enough
to recognize the “standard” T&L “idiom” if
it’s at the end of a program and run-time
emit SSE code for what comes before it? It
would be a shame to lose the actual
capabilities of the hardware if you used this
extension.

Last, is there anything related to the naive
or straightforward implementation of this
extension that is under patent protection for
nVidia? If other vendors cannot implement
this same instruction set, I can see how it
will go nowhere in the end, which would be a
shame – graphics processing certainly needs
a lingua franca, like OpenGL has been for the
last 10 years or so.

mcraighead · December 6, 2000, 2:53am

You missed the ARL instruction and relative addressing, which allows you to pass in indices to pick which matrices to use.

There are IP issues associated with this extension, but I won’t say any more on that, since I have no desire to face the wrath of lawyers.

Matt

system · December 6, 2000, 7:57am

>You missed the ARL instruction and relative
>addressing, which allows you to pass in
>indices to pick which matrices to use.

I see. It’s more like my brain rejected the
idea of using a floating-point value as an
index. I have now properly spanked it into
submission. Of course, then the trade-off
is that each vertex can only have up to N
matrices affecting it, but I guess that’s OK.
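
For comparison, the indexed approach can be sketched as a CPU-side emulation in C. Everything here is an assumption for illustration: four influences per vertex, indices stored as floats that already hold the first constant register of each matrix, and the hypothetical names `arl`, `skin_indexed`, and `INFLUENCES`. The 96-entry constant bank matches the register count in the NV_vertex_program spec.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;

enum {
    NUM_CONSTANTS = 96, /* constant registers c[0]..c[95] */
    INFLUENCES    = 4   /* the "N" matrices per vertex; assumed to be 4 */
};

/* DP4: four-component dot product. */
static float dp4(Vec4 a, Vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* MAD with a scalar multiplier: v * s + acc, per component. */
static Vec4 mad(Vec4 v, float s, Vec4 acc)
{
    Vec4 r = { v.x * s + acc.x, v.y * s + acc.y,
               v.z * s + acc.z, v.w * s + acc.w };
    return r;
}

/* ARL emulation: the address register receives the floor of a float. */
static int arl(float f)
{
    int a = (int)f;          /* truncates toward zero */
    if ((float)a > f) --a;   /* correct negatives so the result is a floor */
    return a;
}

/* Skin `pos` with INFLUENCES matrices chosen per vertex.
   index[i] is the float-encoded register number of a matrix's first row. */
static Vec4 skin_indexed(Vec4 pos, const Vec4 *constants,
                         const float *index, const float *weight)
{
    Vec4 acc = { 0, 0, 0, 0 };
    int i;

    for (i = 0; i < INFLUENCES; ++i) {
        int a0 = arl(index[i]);
        Vec4 t;
        assert(a0 >= 0 && a0 + 3 < NUM_CONSTANTS);
        /* Relative addressing: c[A0.x + 0] .. c[A0.x + 3]. */
        t.x = dp4(constants[a0 + 0], pos);
        t.y = dp4(constants[a0 + 1], pos);
        t.z = dp4(constants[a0 + 2], pos);
        t.w = dp4(constants[a0 + 3], pos);
        acc = mad(t, weight[i], acc);
    }
    return acc;
}
```

The per-vertex cost now depends on N rather than on the total number of bones, which is the trade-off noted above.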

>There are IP issues associated with this
>extension, but I won’t say any more on
>that, since I have no desire to face the
>wrath of lawyers.

That’s a shame, as I’m sure most hardware
vendors would rather not have their future
controlled by one of their competitors. A
programmable GPU like this DOES seem like the
right thing, but you’d probably have enough
trouble getting it to “stick” in the market
even if the baseline spec was free