Software weighting as fast as Hardware



WhatEver
10-26-2002, 07:46 PM
Now that I understand matrices so well I thought I'd try the weighting extension out again. I figured it out, got it to work, and it's actually slightly slower than my software method.

Why? It's supposed to be faster. If not now, will it be in the future? I'm running a GeForce2, so maybe the newer cards will be?

I could rig my program up to switch between software and hardware if enough of you are interested.

Let me know.

Aaron Taylor

WhatEver
10-26-2002, 07:59 PM
New benchmark:

Software = 336 fps
Hardware = 319 fps

I'm disappointed.

V-man
10-26-2002, 08:42 PM
Let's see it, or at least a description of what you're doing.

V-man

Bob
10-27-2002, 01:58 AM
It looks to me like you aren't benchmarking a real-life situation, unless you target your application at about 300 fps. In a real-life situation you may have lots of other things going on, and you will benefit from not doing it in software, because the GPU will offload the CPU (assuming the vertex weighting is actually performed in hardware), leaving more processor time for other things.

[This message has been edited by Bob (edited 10-27-2002).]

WhatEver
10-27-2002, 05:07 AM
Bob, I think I understand what you mean. The only thing the CPU is doing in my program is the deformation. So what you're saying is that if I had a lot of physics going on, the hardware weighting would take that burden off the CPU, which would make the app run faster.

Even so, the fact remains that the CPU is doing it faster than the GPU. Go figure.

Here's my code V-man:



if(s3dVertexWeightPointerEXT && s3dCaps[S3D_WEIGHTING] == true)
{
    //
    // deform mesh using hardware
    //

    glEnable(GL_VERTEX_WEIGHTING_EXT);

    s3dVertexWeightPointerEXT(1, GL_FLOAT, 0, Objects[i].Weights);
    glEnableClientState(GL_VERTEX_WEIGHT_ARRAY_EXT);

    //push out to Hook
    glMultMatrixf(skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Hook);

    //reorientate
    glMultMatrixf(Objects[i].AxisInverse);

    glGetFloatv(GL_MODELVIEW_MATRIX, DeformMat);

    glMatrixMode(GL_MODELVIEW1_EXT);
    glLoadMatrixf(DeformMat);

    //push out to Hook
    glMultMatrixf(skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Hook);

    //custom transform
    glMultMatrixf(skeleton->Bones[Objects[i].BoneAssignment].Custom);

    //reorientate
    glMultMatrixf(Objects[i].AxisInverse);

    Objects[i].Draw(Objects[i].Meshes->Vertices, Objects[i].Meshes->Normals, s3dTextureHub.TextureObjects, s3dCaps);

    glMatrixMode(GL_MODELVIEW0_EXT);
}
else if(s3dCaps[S3D_WEIGHTING] == true)
{
    //
    // deform mesh using software
    //

    //Create an inverse of the Hook
    s3dMatCopy16f(HookInverse, skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Hook);
    s3dMatInvert16f(HookInverse);

    //bring the RelativeMat home relative to the Hook
    s3dMatCopy16f(RelativeMat, skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Hook);
    s3dMatInvert16f(RelativeMat);

    s3dMatMultiply16x16f(RelativeMat, skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Origin);

    //start the deformation with the bone frame matrix
    s3dMatCopy16f(DeformMat, skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Hook);

    //transform by the RelativeMat
    s3dMatMultiply16x16f(DeformMat, RelativeMat);

    //apply custom transformation
    s3dMatMultiply16x16f(DeformMat, skeleton->Bones[Objects[i].BoneAssignment].Custom);

    //bring the matrix back home to its new location
    s3dMatMultiply16x16f(DeformMat, HookInverse);

    //deform mesh
    s3dDeformMesh(VertexBuffer, NormalBuffer, &Objects[i], DeformMat);

    //
    // reorientate object into Hook 3 space
    //

    //push out to Hook
    glMultMatrixf(skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Hook);

    //reorientate
    glMultMatrixf(Objects[i].AxisInverse);

    Objects[i].Draw(VertexBuffer, NormalBuffer, s3dTextureHub.TextureObjects, s3dCaps);
}

[This message has been edited by WhatEver (edited 10-27-2002).]

WhatEver
10-27-2002, 05:08 AM
Heh, by the time it gets to that code there were a lot of tabs... I'll fix it...

WhatEver
10-27-2002, 06:19 AM
Ok, it makes sense to me now why the software is faster...'cause my CPU is faster than my GPU.

jwatte
10-27-2002, 06:45 AM
The vertex weighting extension is basically useless. On the GF2, it seems that your vertex processing throughput drops by half (which hints at some implementation detail of the hardware). Also, each triangle can only be touched by two matrices on a two-matrix card like that, which means you can't realistically do anything "soft" like a human. Elbows, armpits, neck, etc. will all look really poor.

If you want to do matrix palette skinning in hardware using OpenGL, you should look to GL_ARB_vertex_program, or possibly the vendor-specific extensions that preceded it (NV_vertex_program, EXT_vertex_shader, or whatnot).

For our product (which targets GF2) we coded up skinning in SSE and optimized it as much as we could (alignment, interleave, writing to AGP, etc) and it runs decently fast. About 40,000 tris per frame soft-skinned at 30 fps takes about 10% of a Pentium3/800, if I recall the numbers right. This is split in about 100 separate chunks (each with its own material).

zed
10-27-2002, 10:08 AM
I'm sure you all know this: vertex programs are done in software on a GF2 (in hardware on GF3+);
the weighting extension is like a specialised vertex program.
Also, I've had major problems with the vertex weighting extension in the past on my TNT2 (buggy drivers).
So personally I would ditch weighting, go with a software version, and only use vertex_program if supported.

mcraighead
10-28-2002, 08:16 AM
I agree with jwatte here -- this was a feature that didn't quite pan out. Furthermore, since we don't support ARB_vertex_blend, and ATI doesn't support EXT_vertex_weighting, it's hard to use it portably. (From our point of view, it's not worth the effort to implement ARB_vertex_blend -- we see it as a dead end obsoleted by vertex programs.)

We're thinking of phasing out support for this extension in a future driver -- preferably for all chips, not just the new ones. Does anyone here use the extension in any important application?

- Matt

dorbie
10-28-2002, 03:14 PM
Your problem is simple, your CPU is too fast. Purchase a slower CPU or underclock your existing CPU until you get the desired results.

Another way of looking at this is that your CPU is available for other tasks while the GPU performs the transform.

zed
10-28-2002, 05:51 PM
Actually a slower CPU won't make a difference, because I have the feeling that in his case (GF2) BOTH are done on the CPU.

AdrianD
10-29-2002, 06:18 AM
Originally posted by mcraighead:
[...]
We're thinking of phasing out support for this extension in a future driver -- preferably for all chips, not just the new ones. Does anyone here use the extension in any important application?

- Matt


Just do it. I never used this extension in my applications, and I think there is no serious application out there using this extension. And if there is one, I am sure it also has a CPU-only version for 3D cards which do not support this extension...

ToolTech
10-29-2002, 11:45 AM
YES. We use it in military applications for skinning character bone systems. From my point of view, and from my experience with high-end gfx HW, the vertex weight extension is vital!!!

That people don't like it is, in my opinion, down to a poor understanding and poor usage of the extension. People's talk about too few matrices and too little control in the EXT implementation is wrong. You can build very good character animation with only one weight matrix plus a good bone system.

People who say it goes just as fast on the CPU are wrong. I bet they don't calculate the normals. If we use the software version, we need to recalc the length of each transformed normal to be able to use it in the weight formula.

Of course it can be implemented in vertex programs, but then you break the "standard" path on systems without VP support.

THE BEST solution would be for ATI to implement the EXT version on both Mac and PC, and then you would have platform independence.

Matt, please don't take it away. Then we would have to change HW!!

ToolTech
10-29-2002, 12:03 PM
Another issue is of course the simplicity of OpenGL. To use the EXT version you need only a few lines of code. To use the VP you need a lot more code.

mcraighead
10-29-2002, 12:54 PM
Our experience is that a good skinning system will run faster in software than in hardware using this extension. Trying to use this extension will lead to an unacceptable number of matrix state changes, which will in turn cripple your T&L performance -- your batches will be too small.

I'm not saying that this extension *can't* be useful; simply that we don't find it interesting ourselves, and in fact we try to discourage developers from using it.

- Matt

WhatEver
10-29-2002, 01:01 PM
My knowledge of skinning is pretty limited, in that I don't know how some of it is done, but the basic technique of interpolating between the vertex at rest and the rotated vertex looks pretty realistic.

Here's the program I spoke of: http://www.spider3d.com/dl/weighting.zip

If it doesn't run, create a shortcut for the exe... dunno why, but some people have to do that. Run the exe, then pull the console down and scroll up using the Page Up key. If you see a line that says "GL_EXT_vertex_weighting", then the hardware method was found. It's the same basic technique that I do in software.

jwatte
10-29-2002, 04:56 PM
ToolTech,

The problem with the vertex weighting extension, with only two bones, is that you only get two bones for the entire TRIANGLE. If I had two bones per vert, I'd be reasonably happy. In fact, that's what our software transform system uses.

And, yes, the software transform system is faster than the hardware implementation on the GF2, while giving the artists SUBSTANTIALLY better control over vertex weighting, because they get per-vertex, not per-triangle, bone control. Oh, and we do the normals too, although we let the card normalize with GL_NORMALIZE.

We don't do scale/stretch in our animations, which means that we can transform vertices and normals with the same matrices (using a w of 1 for vertices and 0 for normals).

Last, we can overlap our skinning with OpenGL drawing thanks to the various asynchronous data submission extensions available.
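
Roughly, the per-vertex loop looks like this (a minimal, unoptimized C sketch only; our real code is SSE and interleaved, and the struct layout and names here are made up for illustration):

/* Per-vertex two-bone skinning: positions use w = 1, normals use w = 0, so the
   same 4x4 bone matrices serve both (no scale/stretch assumed). */
typedef struct {
    float pos[3], normal[3];
    float weight;            /* weight of bone0; bone1 gets 1 - weight */
    int   bone0, bone1;
} SkinVertex;

static void xform(const float m[16], const float v[3], float w, float out[3])
{
    /* column-major OpenGL layout; w = 1 transforms a point, w = 0 a direction */
    out[0] = m[0]*v[0] + m[4]*v[1] + m[8]*v[2]  + m[12]*w;
    out[1] = m[1]*v[0] + m[5]*v[1] + m[9]*v[2]  + m[13]*w;
    out[2] = m[2]*v[0] + m[6]*v[1] + m[10]*v[2] + m[14]*w;
}

void skin_mesh(const SkinVertex *in, int count,
               const float bones[][16],   /* object-space bone matrices */
               float *out_pos, float *out_nrm)
{
    int i, k;
    float p0[3], p1[3], n0[3], n1[3];

    for (i = 0; i < count; ++i) {
        float a = in[i].weight, b = 1.0f - a;

        xform(bones[in[i].bone0], in[i].pos, 1.0f, p0);
        xform(bones[in[i].bone1], in[i].pos, 1.0f, p1);
        xform(bones[in[i].bone0], in[i].normal, 0.0f, n0);
        xform(bones[in[i].bone1], in[i].normal, 0.0f, n1);

        for (k = 0; k < 3; ++k) {
            out_pos[i*3 + k] = a*p0[k] + b*p1[k];
            out_nrm[i*3 + k] = a*n0[k] + b*n1[k];  /* GL_NORMALIZE handles the final length */
        }
    }
}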

WhatEver
10-29-2002, 05:38 PM
If I had two bones per vert, I'd be reasonably happy.

Isn't this the same thing, jwatte? This is the GL_EXT_vertex_weighting extension Overview.


The intent of this extension is to provide a means for blending
geometry based on two slightly differing modelview matrices.
The blending is based on a vertex weighting that can change on a
per-vertex basis. This provides a primitive form of skinning.

A second modelview matrix transform is introduced. When vertex
weighting is enabled, the incoming vertex object coordinates are
transformed by both the primary and secondary modelview matrices;
likewise, the incoming normal coordinates are transformed by the
inverses of both the primary and secondary modelview matrices.
The resulting two position coordinates and two normal coordinates
are blended based on the per-vertex vertex weight and then combined
by addition. The transformed, weighted, and combined vertex position
and normal are then used by OpenGL as the eye-space position and
normal for lighting, texture coordinate generation, clipping,
and further vertex transformation.
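
For positions, that works out to roughly the following per vertex (a minimal C sketch of the blend described in the overview; the types and helper here are made up, and normals get the analogous treatment with the inverse matrices):

/* eye-space position = w * (MV0 * v) + (1 - w) * (MV1 * v)
   Matrices are column-major 4x4, OpenGL style. */
typedef struct { float x, y, z; } vec3;

static vec3 transform_point(const float m[16], vec3 v)
{
    vec3 r;
    r.x = m[0]*v.x + m[4]*v.y + m[8]*v.z  + m[12];
    r.y = m[1]*v.x + m[5]*v.y + m[9]*v.z  + m[13];
    r.z = m[2]*v.x + m[6]*v.y + m[10]*v.z + m[14];
    return r;
}

static vec3 blend_position(const float mv0[16], const float mv1[16],
                           vec3 v, float w)
{
    vec3 p0 = transform_point(mv0, v);   /* primary modelview   */
    vec3 p1 = transform_point(mv1, v);   /* secondary modelview */
    vec3 out;
    out.x = w*p0.x + (1.0f - w)*p1.x;
    out.y = w*p0.y + (1.0f - w)*p1.y;
    out.z = w*p0.z + (1.0f - w)*p1.z;
    return out;
}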

ToolTech
10-29-2002, 11:17 PM
Measure the EXT extension versus software.

In a typical situation I get 1000 FPS using the EXT version on a GeForce 4 and 750 using the SW version.

In my calcs I have a model matrix M and a weight matrix W, so I get

Vout = w * M * V0 + (1-w) * W * V0

where V0 is one of my vertices and w is the weight.

To use a single-transform system I need to create the transform matrix P = Inv(M) * W

so that I can transform the vertex by

Vsw = w * V0 + (1-w) * P * V0

where Vsw is the transformed software vertex.

This way Vout = M * Vsw equals the first equation.

This can be done fast with SIMD ops, but you still need to create an area to hold the transformed vertices. You also need to keep this area on a per-rendering-thread basis, and if you share geometry with different weight matrices you need to recalc the Vsw several times per frame.

Normals are a bit trickier.

Basically I need to exchange P with S = transpose(inv(P)). This matrix has det = 1, but the transformed value S * N0 is still not unit length, which means that

Nsw = w * N0 + (1-w) * unit(S * N0)

This cannot be accomplished with GL_NORMALIZE.
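
In rough C, the per-vertex part of that software path looks something like this (a sketch only; building P = inv(M)*W and S = transpose(inv(P)) needs 4x4 invert/transpose routines from your own math library, and all names here are illustrative):

#include <math.h>

/* Positions: Vsw = w*V0 + (1-w) * (P * V0)
   Normals:   Nsw = w*N0 + (1-w) * unit(S * N0)  -- the explicit unit() is why
   GL_NORMALIZE alone is not enough here.  Column-major 4x4 matrices. */
static void mul_point(const float m[16], const float v[3], float out[3])
{
    out[0] = m[0]*v[0] + m[4]*v[1] + m[8]*v[2]  + m[12];
    out[1] = m[1]*v[0] + m[5]*v[1] + m[9]*v[2]  + m[13];
    out[2] = m[2]*v[0] + m[6]*v[1] + m[10]*v[2] + m[14];
}

static void mul_dir(const float m[16], const float v[3], float out[3])
{
    out[0] = m[0]*v[0] + m[4]*v[1] + m[8]*v[2];
    out[1] = m[1]*v[0] + m[5]*v[1] + m[9]*v[2];
    out[2] = m[2]*v[0] + m[6]*v[1] + m[10]*v[2];
}

void blend_vertex_sw(const float P[16], const float S[16], float w,
                     const float V0[3], const float N0[3],
                     float Vsw[3], float Nsw[3])
{
    float pv[3], sn[3], len;
    int k;

    mul_point(P, V0, pv);
    mul_dir(S, N0, sn);

    len = sqrtf(sn[0]*sn[0] + sn[1]*sn[1] + sn[2]*sn[2]);
    if (len > 0.0f) {
        sn[0] /= len; sn[1] /= len; sn[2] /= len;
    }

    for (k = 0; k < 3; ++k) {
        Vsw[k] = w * V0[k] + (1.0f - w) * pv[k];
        Nsw[k] = w * N0[k] + (1.0f - w) * sn[k];
    }
}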

... so, Matt, you say that all extensions that can be replaced by vertex programs and fragment programs should become obsolete, or?

If you take the EXT weight version out, you should also take away all other extensions that can be replaced by fragment programs and vertex programs to keep OpenGL clean... and then you only have some data-transfer extensions + fragment + vertex progs left...

davepermen
10-29-2002, 11:38 PM
Originally posted by ToolTech:
... so, Matt, you say that all extensions that can be replaced by vertex programs and fragment programs should become obsolete, or?

If you take the EXT weight version out, you should also take away all other extensions that can be replaced by fragment programs and vertex programs to keep OpenGL clean... and then you only have some data-transfer extensions + fragment + vertex progs left...


Which will be the future, yes. If you take a look at the extension lists of the NVIDIA cards, you find tons of extensions which are simply obsolete, stupid, useless, or better combined into one extension.

But on the other hand, backward compatibility has to stay, so the old extensions should remain supported.

But how about moving your project to GL_ARB_vertex_program? First just write a wrapper for the original vertex weighting... then think about optimizing. It helps to drop the old stuff. I know... why change it while it's working? Dunno.

Julien Cayzac
10-30-2002, 02:21 AM
Originally posted by davepermen:
But how about moving your project to GL_ARB_vertex_program? First just write a wrapper for the original vertex weighting... then think about optimizing. It helps to drop the old stuff. I know... why change it while it's working? Dunno.

But ARB_vertex_program isn't supported on all platforms, even for "recent" cards. It's sad that the GeForce4 Ti I got 4 months ago is now obsolete because of the lack of up-to-date drivers.

BTW, why is there such a gap between NVIDIA's Windows drivers and the Linux ones when the changes are not platform dependent? (And when can we expect NV30 emulation to be part of the Linux driver?)

Julien

AdrianD
10-30-2002, 05:05 AM
Originally posted by ToolTech:
Measure the EXT extension versus software.

In a typical situation I get 1000 FPS using the EXT version on a GeForce 4 and 750 using the SW version.
[...]


1000 FPS isn't a real-life situation.
If you want to measure the real difference, you have to test it with some background geometry, or draw your skinned character more than once per frame...

And what about multipass rendering??? What if you have to draw your character more than once per frame (i.e. once per light source... sounds familiar?)

Another scenario: if you have a scene with very many characters, you can save processing time by not computing the skinning of distant characters every frame (i.e. only every 5th frame; see the sketch below). With HW you MUST skin your character every time it is rendered (even with VPs).

In both cases it turns out to be much more practical to do the skinning in SW than in HW...
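
The bookkeeping for the distant-character case is trivial; something like this (a made-up sketch, thresholds and names purely illustrative):

/* Re-skin near characters every frame, distant ones only every Nth frame,
   reusing the cached skinned vertex buffer in between. */
typedef struct {
    float distance;         /* distance to camera this frame        */
    int   last_skin_frame;  /* frame number when we last re-skinned */
} Character;

static int skin_interval(float distance)
{
    if (distance < 20.0f) return 1;   /* every frame */
    if (distance < 60.0f) return 3;   /* every 3rd   */
    return 5;                         /* every 5th   */
}

int needs_skinning(const Character *c, int frame)
{
    return (frame - c->last_skin_frame) >= skin_interval(c->distance);
}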

cass
10-30-2002, 05:23 AM
The reason for potentially dropping support for EXT_vertex_weighting is that there is a "not insignificant" burden in driver complexity associated with maintaining functionality that we see as 1) usually not a performance win, 2) almost completely unused, and 3) essentially an extension to the dead-end fixed-function world.

Had this extension gained widespread adoption, the story would be different.

If we were still living in a fixed-function world, the story might be different.

Thanks -
Cass

Mezz
10-30-2002, 10:22 AM
What about the support for the ARB vertex weighting and matrix palette skinning extensions?

I don't see them in my extensions string on my GF4.

-Mezz

Asgard
10-30-2002, 11:35 AM
Originally posted by Mezz:
What about the support for the ARB vertex weighting and matrix palette skinning extensions?

NVIDIA has never supported these in any driver, if I'm not mistaken.

cass
10-30-2002, 12:48 PM
Right - we never supported those extensions.

Mezz
10-30-2002, 12:52 PM
Yeah, I figured they weren't supported, I just wondered why. I suppose you can do everything with vertex programs, can't you?

-Mezz

Asgard
10-30-2002, 12:54 PM
Originally posted by Mezz:
I suppose you can do everything with vertex programs, can't you?

Yes, you can. Therefore these extensions are obsolete IMHO (just like many other fixed-function extensions). I'm all for cleaning up the extension mess.

Mezz
10-30-2002, 01:56 PM
I'm all for cleaning up the extension mess, but then there is the issue (as has been previously mentioned) that you need to keep legacy extensions used by applications.
But yeah, true - there isn't much life in the FF pipeline any more.

-Mezz

Gorg
10-30-2002, 02:07 PM
Originally posted by Mezz:
I'm all for cleaning up the extension mess, but then there is the issue (as has been previously mentioned) that you need to keep legacy extensions used by applications.
But yeah, true - there isn't much life in the FF pipeline any more.

-Mezz

I haven't looked deeply into that, but it would probably be very easy to implement most (all?) OGL 1.x extensions using OGL 2.0. So I don't think backward compatibility is a problem.

WhatEver
10-30-2002, 02:22 PM
How about the multitexture extension? I suppose it will be moved into the SDK...

I wouldn't mind seeing extensions go. It all seems sort of messy anyway. VPs are definitely the way to go.

Asgard
10-30-2002, 02:39 PM
Originally posted by WhatEver:
How about the multitexture extension? I suppose it will be moved into the SDK...

Which extension do you mean? ARB_multitexture has been part of core OpenGL since 1.2.1.

WhatEver
10-30-2002, 03:09 PM
I wasn't aware of that.

Looks like they're all one step ahead of me.

ToolTech
10-31-2002, 04:18 AM
Ok. Some questions for you VP guys...

1. I need to use clip planes with my mirrors. How do I use vertex weights in a VP combined with clip planes?

2. I want to use positional lights, fog, texgen, etc. (all that stuff in the FF pipeline) plus weights. What kind of VP do I need to produce the equivalent effects?

3. Is there a way to just replace the vertex.position but still use the FF color, fog, etc.?

cass
10-31-2002, 08:17 AM
It sounds like you should do sw skinning (in object space) and use the fixed function for T&L.

ToolTech
10-31-2002, 09:42 AM
Cass, based on your comment I can only draw the conclusion that removing the EXT weight extension is premature. You cannot solve the above requirements with VPs (yet).

At least I know 1 doesn't work. Perhaps someone has solved 2, but explain that VP to a beginner, and then explain why you should remove the EXT weight extension???

Korval
10-31-2002, 11:21 AM
1: I didn't think clip planes were a per-vertex thing. I thought that was something that happened at the pixel level.

2: Then write them in a vertex program. It turns off all per-vertex fixed-function processing, so you will have to write it yourself.

3: No.


but explain that VP to a beginner, and then explain why you should remove the EXT weight extension???

Because EXT_vertex_weighting is almost utterly useless, perhaps? It can't be used for any real skinning because you can only apply 2 matrices per triangle, not per vertex. Because it lacks the palette of the vertex_blend extension, it requires lots of matrix state changes, which, as was pointed out above, slows things down more than a software solution.

You're probably going to have to learn vertex programs sooner or later, so you may as well start now.

Coop
10-31-2002, 02:10 PM
You can quite easily emulate user clip planes with a sheared projection matrix. You need to shear it so that the near clip plane becomes your clip plane. Of course this works fine with vertex programs.
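
Something along these lines (a sketch of the usual oblique near-plane construction, not code from this thread; clipPlane is assumed to be given in eye space with the camera on its negative side, and sgn() is defined here):

#include <GL/gl.h>

static float sgn(float a)
{
    return (a > 0.0f) ? 1.0f : (a < 0.0f) ? -1.0f : 0.0f;
}

/* Rebuild the projection's third row so the near plane coincides with an
   arbitrary eye-space plane (a, b, c, d).  Column-major OpenGL matrix layout. */
void set_oblique_near_plane(const float clipPlane[4])
{
    float m[16], q[4], c[4], dot;
    int i;

    glGetFloatv(GL_PROJECTION_MATRIX, m);

    /* Clip-space corner point opposite the clip plane, taken back to eye space */
    q[0] = (sgn(clipPlane[0]) + m[8]) / m[0];
    q[1] = (sgn(clipPlane[1]) + m[9]) / m[5];
    q[2] = -1.0f;
    q[3] = (1.0f + m[10]) / m[14];

    dot = clipPlane[0]*q[0] + clipPlane[1]*q[1] + clipPlane[2]*q[2] + clipPlane[3]*q[3];

    for (i = 0; i < 4; ++i)
        c[i] = clipPlane[i] * (2.0f / dot);

    /* Replace the third row and reload the projection matrix */
    m[2]  = c[0];
    m[6]  = c[1];
    m[10] = c[2] + 1.0f;
    m[14] = c[3];

    glMatrixMode(GL_PROJECTION);
    glLoadMatrixf(m);
}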

Kuba

V-man
10-31-2002, 04:45 PM
Originally posted by cass:

The reason for potentially dropping support for EXT_vertex_weighting is that there is a "not insignificant" burden in driver
<...>
Cass

Isn't it better to stay backward compatible, even if it's rarely used? If you start removing extensions here and there, a whole lot of people will get upset.

At least until 2.0 drivers come up.

PS: we are approaching 2003. Any idea when 2.0 drivers will be released? It's been a hell of a long time since its announcement, you know.

V-man

Humus
10-31-2002, 05:11 PM
Well, I support the idea of dropping EXT_vertex_weighting. If it takes quite a bit of implementation burden to support it, it won't be worth it, as only a few applications use it, and they most likely have a fallback path anyway, unless we're talking about small tech demos, but those aren't particularly important to continue supporting. I wouldn't cry if ATi dropped the GL_ATI_envmap_bumpmap extension even though that would break some of my older demos. They could drop GL_ATI_vertex_streams too, and GL_ARB_vertex_blend while we're at it.
I don't know how much it matters, but a cleaner driver will both run faster and be less buggy. Better to drop unnecessary extensions sooner rather than later.

jwatte
11-01-2002, 06:37 PM
WhatEver,

No, that is NOT the same thing as two (arbitrary) bones per vertex, because the vertex weighting extension only allows me to establish two bones per PRIMITIVE, which means that no single triangle can be affected by more than two bones (although the weighting to each of those bones can change per vertex). At least on the nVIDIA implementation -- there may be other implementations with more bones per primitive, but it's still a dead end.

We do software skinning using optimized assembly code, and submit "fully posed" object space vertex arrays to GL. Overlap with actual rendering is good because we submit as soon as we're done skinning each mesh.

WhatEver
11-01-2002, 07:11 PM
jwatte, I was thinking about this a lot the other day and I'm starting to understand what you meant. I'm just amazed at what I can start understanding when I put my mind to it.

This whole forum is one big think tank.

FoZi
11-03-2002, 05:27 PM
I think AdrianD raised an interesting question:

And what about multipass rendering??? What if you have to draw your character more than once per frame (i.e. once per light source... sounds familiar?)

We do have to run the VP for each pass, right? If so, what options do we have to avoid this? It feels like we are really wasting a lot of GPU/VPU power here, and multipass is inevitable in our case.

jwatte
11-03-2002, 06:08 PM
I doubt that you will be vertex program throughput limited. If you really are, then it might make sense to do skinning on the CPU while the card is rendering, and then re-use the output of that for each pass.

I recently went to an ATI event where they claimed that, on the Radeon 9700, most games will be CPU limited. Dollars to donuts I can make it go fill limited without even trying :-)

Actually, to be more accurate, I'll go fragment program limited, which is subtly different from fill limited. That might be good news, too, because it's easier to raise the speed of a core than the speed of a memory interface, and we WILL see this level of technology built into a north bridge and sharing a single (or maybe double) DDR memory channel with the CPU.