PDA

View Full Version : Open source Tri Stripper (vs NvTriStrip)



GPSnoopy
11-10-2002, 07:18 AM
As I promised I would do in a previous post a few weeks ago, I just released the source of the triangle stripper I was working on.
http://users.pandora.be/tfautre/softdev/

You'll find on that website the source and a test program.
There is also some explanations plus a comparison against the famous NvTriStrip.

Well, tell me what you think about it. http://www.opengl.org/discussion_boards/ubb/smile.gif

zeckensack
11-10-2002, 06:12 PM
*ahem*

I like it. I wonder why nobody else does ...

Good work.

zed
11-10-2002, 06:47 PM
very nice good to see.
sidenote - i recently decided to just draw everything with plain triangles (instead of the combination of strips + tris that i was using previously)
-reason being simplicity + not *that* much of a difference between the two.

Mezz
11-11-2002, 03:41 AM
Well, it looks extremely good - I found it complex to read the first time because I just dived straight into it and my STL is a little rusty, but on the whole, it seems to be a damn fine effort.

-Mezz

Nicolas Lelong
11-12-2002, 06:47 AM
Great idea !

Your lib may save my life as I've just stumbled upon a model that makes NvTriStrips crash ! I'll try it with your lib http://www.opengl.org/discussion_boards/ubb/smile.gif

I was wondering if you support 3-manifold (or more) geometry ?

Thanks,
Nicolas

GPSnoopy
11-12-2002, 06:53 AM
To answer in the order:

zeckensack,
yep, I wonder why it doesn't get more replies, 'cause there are often requests for triangle strippers on these forums (which always lead to NvTriStrip).
Thanks for your support. http://www.opengl.org/discussion_boards/ubb/wink.gif


zed,
thanks for your reply. http://www.opengl.org/discussion_boards/ubb/smile.gif
As you can see it depends of the model; I mean, with Tri Stripper using the skull model I get a 3 times increase in performance, but using a starship model I get almost no improvement.
But even if you only use triangles, re-ordering these while give better performance because of the vertex cache.


Mezz,
I heavily use the STL now, but actually it's to make it more understandable. http://www.opengl.org/discussion_boards/ubb/wink.gif
You should have seen the very first version, using only pointers and arrays... brrrr... scary!
Thanks for your input. http://www.opengl.org/discussion_boards/ubb/smile.gif

GPSnoopy
11-12-2002, 07:05 AM
Nicolas,

glad to hear my lib is saving lifes. http://www.opengl.org/discussion_boards/ubb/wink.gif

I'm not sure what you mean by "3-manifold geometry"... (excuse my not-so-complete English vocabulary, especially in math)
But Tri Stripper is designed to be stable and robust (it can handle a lot of complex cases: for example, it will accept that a triangle can have more than 3 neighbour triangles), so I hope it'll work for you.

Anyway it's the first version, 1.0 Beta 1, so if you find a problem please report it. http://www.opengl.org/discussion_boards/ubb/smile.gif

Nicolas Lelong
11-12-2002, 08:49 AM
> it will accept that a triangle can have more than 3 neighbour triangles

That's what I meant by 3-manifold geometry http://www.opengl.org/discussion_boards/ubb/smile.gif great !

By the way, your lib does not crash as NvTriStrip does http://www.opengl.org/discussion_boards/ubb/smile.gif great job, really great http://www.opengl.org/discussion_boards/ubb/smile.gif

By the way, I was wondering if there was a way to only generate triangle lists with improved vertex locality (and not strips) - that's what I used from NvTriStrip ?

Thanks for the great work,
Nicolas

GPSnoopy
11-12-2002, 09:00 AM
By the way, I was wondering if there was a way to only generate triangle lists with improved vertex locality (and not strips) - that's what I used from NvTriStrip ?

Unfortunatly not, it currently outputs strips only.
Though a similar effect can be achieved by reconverting the strips back to triangles, which is fairly easy, while keeping the same order, but then I dunno how it compares in term of efficiency next to NvTriStrip in that case.

After all, it might just be what NvTriStrip is doing, but I haven't got the time to really look into NvTriStrip code (which is not really readable by the way http://www.opengl.org/discussion_boards/ubb/wink.gif).

Nice to hear it doesn't crash. http://www.opengl.org/discussion_boards/ubb/smile.gif

[This message has been edited by GPSnoopy (edited 11-12-2002).]

titan
11-26-2002, 05:08 PM
Looks like a nice lib.

We had a couple problems with the nvstrip lib (ask for all strips shorter than 2 triangles to be put in a seperate list and give it two triangles you get an assert).

As I understand it the nv2x cards have a vertex cache of 24, NV1x/Parhelia/Xabre400 have 24, ATI R200 has 10, and ATI R100 has a vertex cache of just 6. nvstrip won't optimize for the R100...

Looks great, we'll probably switch to it for our next project.

zed
11-26-2002, 06:41 PM
i thought the cache for gf1/2 is effectively 10 not 24 (whatever it is its less than a tnt http://www.opengl.org/discussion_boards/ubb/biggrin.gif)

titan
11-26-2002, 07:29 PM
That is a typo, I meant to say they had a cache size of 16. With the GF1/2 it is effectively only 10 though.

I'm not sure how big the effective cache of the Parhelia, etc are compared to their real cache size.

GPSnoopy
11-27-2002, 05:20 AM
AFAIK NvTriStrip can optimize for the R100, but it's a bit tricky.

If you look into its source code you can see that NvTriStrip computes as follow:

Efficient_Cache_Size = Given_Cache_Size - 6;

So when you give it the cache size of a GF2 it'll correctly get the efficient cache size (10 = 16 - 6).
But this formula only works with current NVIDIA GPUs and doesn't work for other GPUs such as the R100 or maybe even future NVIDIA GPUs.

If you want to use NvTriStrip for an R100 I think you should give it a cache size of 12 (12 - 6 = 6), otherwise if you give it 6 it won't work correctly. (6 - 6 = 0 *oops* http://www.opengl.org/discussion_boards/ubb/rolleyes.gif )

Tri Stripper directly asks for the efficient cache size instead.


[This message has been edited by GPSnoopy (edited 11-27-2002).]

ImpactDNI
01-12-2003, 03:15 AM
Hey, ive been looking for a good stripper... but i always have the same problem... Someone needs to explain things to me =P... For some reason, unless the code is Uber-simple, i cant look at it and see how to use it... Do you have any docs on how to use your library?

SirKnight
01-12-2003, 09:43 AM
Originally posted by ImpactDNI:
Hey, ive been looking for a good stripper... but i always have the same problem... Someone needs to explain things to me =P... For some reason, unless the code is Uber-simple, i cant look at it and see how to use it... Do you have any docs on how to use your library?


The stripper class isn't very complicated. It's very easy to see how to use it. Try here: http://users.pandora.be/tfautre/softdev/tristripper/tristripper_manual.htm

-SirKnight

Roderic (Ingenu)
01-12-2003, 09:58 AM
Indexed Triangles are really fast, I think that you should go to Strips ONLY if you have performances problems.

GPSnoopy
01-12-2003, 10:06 AM
Thanks SirKnight http://www.opengl.org/discussion_boards/ubb/wink.gif

However I should point out that the interface described there is not up-to-date with the latest version (1.00 Beta 5) but it's relatively the same.

There should be no problem if you know how to use std::vector (cf. C++ STL).

There is also a example program on the website that shows how to use it.




tri_stripper TriStripper(m_SubMeshes[0].Indices());
tri_stripper: http://www.opengl.org/discussion_boards/ubb/tongue.gifrimitives_vector PrimitivesVector;

TriStripper.SetMinStripSize(2);
TriStripper.SetCacheSize(16);
TriStripper.Strip(&PrimitivesVector);

/* ... */


Can it be any easier? http://www.opengl.org/discussion_boards/ubb/wink.gif


EDIT: LOL @ smileys in the code. I think that the CODE tags aren't bug free

[This message has been edited by GPSnoopy (edited 01-12-2003).]

V-man
01-12-2003, 02:08 PM
If vendors provided an extension for querying the optimal cache size to use, that would go great with libs like this.

I'm surprised it hasn't been done yet.

Won
01-13-2003, 11:20 AM
Looks very useful! In browsing the code...

The singular form for "indicies" is "index" http://www.opengl.org/discussion_boards/ubb/smile.gif

Did you know that STL already has heap algorithms? Also, have you ever seen Boost Graph Library? An STL-like implementation of graphs that is quite powerful.

Good job. I think I'm going to find a use for this.

-Won

GPSnoopy
01-13-2003, 12:45 PM
V-man, yep it would be useful. Knowing a little more about the cache mechanisms would be usefull too. (For the moment I assume that the cache is a simple FIFO structure, but it could be more complex than that in reality)


Won, thanks. http://www.opengl.org/discussion_boards/ubb/smile.gif

Sorry for that English mistake (and the others). My French confused me. (and it's still confusing me by the way http://www.opengl.org/discussion_boards/ubb/wink.gif)

Yes the STL has some heap algorithms but because of the particular heap structure I need I can't really use them efficiently.

I've originaly built the graph class for a university project, then I improved it for Tri Stripper.
It's only later that I discovered the Boost library. (quite a nice library by the way http://www.opengl.org/discussion_boards/ubb/smile.gif)

There are still things that could be improved in the code structure, such as the parts extending and building strips, and also parts relative to the "cache simulator".
However the current version seems to be working well, so I'm reluctant to modify it just to get a more aesthetic code.

HamsterofDeath
01-16-2003, 02:32 AM
i've written a bruteforce-stripper that needs about a minute for 1400 triangles...designed for pre-calculating static objects.

how to optimize it for a specific vertex cache ?

GPSnoopy
01-16-2003, 05:12 AM
Originally posted by HamsterofDeath:
i've written a bruteforce-stripper that needs about a minute for 1400 triangles...designed for pre-calculating static objects.

how to optimize it for a specific vertex cache ?

Well I dunno exactly how your stripper works so I can't say exactly what you have to do.

For the vertex cache part, Tri Stripper selects the triangles that were the neighbours to the last made strip.
Then it tries to build every possible strip from those triangles and keeps the one that has the most cache hits.

EG
01-16-2003, 05:16 AM
> how to optimize it for a specific vertex cache ?

By reordering triangles, so that vertices of triangle N are also vertices of triangle N-1, or N-2 etc. until N-x gets you out of the cache range.
Obviously, that won't be possible for all triangles, but if each new triangles introduces only 1 new vertice on average, your performance will be as good as a strip or fan.
Since you can introduce less than 1 new vertice per triangle in some instances, pure vertex cache optimization can theoretically end up faster than pure stripification.

If you're not afraid of Pascal, you can find a fast (greedy) algorithm to optimize for vertex caches in GLScene (http://glscene.org, function IncreaseCoherency of MeshUtils.pas, optimizes a 2500 tris mesh in less than 9 ms on an AXP 1800+).

HamsterofDeath
01-16-2003, 09:36 PM
i bet my algo gives a better result :-P, will take some minutes....

Assassin
01-19-2003, 12:53 PM
I am very interested in this triangle-stripper, since it uses a nicer algorithm than the NvTriStrip (the NV one searches the entire edge list each time it searches for a matching edge).

However, I think that you can achieve better cache coherency if you optimize only for lists, since you don't have to share an edge with the current triangle when selecting the next one. Also, strip restarts might end up "flushing" the cache, whereas a list can go back and visit older vertices before they drop out of cache, where a strip would have to restart to get near them - introducing a degenerate or a restart. You'd also not have to worry about the vertex ordering when constructing the lists, since the triangles would just be rearranged as a whole instead of inserted into a strip[ which is sensitive to clockwise/counterclockwise winding. If you run the direct3d optimized mesh sample, you can see that lists give a slight advantage over strips.

Also, instead of using a sorted array, could you not use a hash table to store and lookup matching edges, bringing that operation from O(n log n) to O(n)?

I'm very interested in integrating this code into my engine as a run-time step when loading large meshes, such as terrain or animated models. I hope you can take it that extra step and optimize for lists rather than just doing stripification and combining the strips in an end-to-end fashion to construct a list.

[This message has been edited by Assassin (edited 01-19-2003).]

GPSnoopy
01-19-2003, 02:50 PM
When optimizing the tri stripper for vertex cache I wondered if it wouldn't be better to make an optimized triangle list instead.

But tri lists don't really have that much advantages over triangle strips.
Actually from what I've heard tri strips are slightly faster in OpenGL, so... The thing is that with lists you still have to send all the indices, which isn't the case with strips.
Also it doesn't seem there is a vertex cache flush at a strip restart. If it was the case, NvTriStrip and Tri Stripper vertex cache optimizers wouldn't work so well.

I might extend the tri stripper in the future so that it can optimize lists too, but it's not a priority right now.

However I've been thinking how I could optimize the whole algorithm.
As you pointed out a hash table would be more suited for the first part of the algorithm.
I'm also looking how to replace the heap in the second part, to also bring that part from O(n.log(n)) to O(n).
There are a few details to be sorted out but I'm hoping to make the algorithm go from O(n(log(n) + c)) to O(n.c) in theory, and of course reflect this as a serious speed boost in reality. http://www.opengl.org/discussion_boards/ubb/smile.gif

Edit: There is a lot of things to do, and so little time to do it. I can't really tell when this new version will come out, but I'm working on it.

[This message has been edited by GPSnoopy (edited 01-19-2003).]

Assassin
01-20-2003, 10:51 AM
Well, the reason I say that the strip restarts will "flush" the cache is because it seems you're selecting a new strip start triangle by criteria unrelated to the cache entries. You seem to be selecting the "loneliest" triangle, without any checks to find a triangle that still has vertices in the cache when doing a restart/jump. I couldn't quite decipher what's happening in the code, but it seems that when you run out of triangles for the current strip, the selection of a new triangle to start a strip with isn't including checks on cache coherency.

Also, if you make effective use of vertex shaders on new hardware, you only need to upload most of the data once, mitigating the benefit of the strips' smaller index list, and making the lists' (potentially) higher performance more relevant.

And I just thought I'd mention that the "perfect" vertex cache miss rate is 0.5 - so for every 2 triangles you introduce 1 new vertex. This isn't possible with a finite cache, but I think it's possible to get below 0.7 with a good algorithm.

GPSnoopy
01-20-2003, 12:25 PM
You're right, there is a case where I don't check cache to select the start triangle.

Normally I use the neighbour triangles of the latest strip to find the start triangle for the next strip, this way I know I have good chances to make "cache happy" strips.
But it might happen that all the neighbours were already taken. When that happens I pick up a start triangle in the heap (aka the loneliest triangle) without checking the cache.

Hopefully this case doesn't happen often, and the triangle on the top of the heap has chances to be cache friendly.

Thanks for pointing out that problem.
The new structure I'm working on that could replace the current triangle heap will in theory offer better performance and be a little more cache friendly.


PS: An annoying thing: std::hash_multimap is exactly what I need to make the hash table. Unfortunatly it won't be standardized before the next C++ standard in 2004 or 2005, which means that currently each compiler has its own (unportable) version. Damn...

[This message has been edited by GPSnoopy (edited 01-20-2003).]