PDA

View Full Version : GL_TRIANGLES vs GL_TRIANGLE_FAN/STRIP



Wasabi
01-25-2011, 09:44 AM
Using Vertex Index Objects, is there an efficiency difference between glDrawArrayElements(GL_TRIANGLES,...) and glDrawArrayElements(GL_TRIANGLE_FAN/STRIP,...)?

I know that in the old OpenGL (glBegin()/glEnd()), there was a big efficiency issue since, using TRIANGLES, the user'd request that OpenGL repeat a series of triangles in a mesh. However, in version 3.3, since you throw all the indexes at once, does OpenGL do some optimizations for you, removing unnecessary repetitions? I'm guessing not, but hope dies last.

If OpenGL doesn't optimize and there is a significant difference, then here's the practical problem. My program needs to display a triangle mesh that's spit out by a third-party program. The output file is nice and simple, basically in the format:

VRTX (vrtx-id) (x-coord) (y-coord) (z-coord)
VRTX (vrtx-id) (x-coord) (y-coord) (z-coord)
VRTX (vrtx-id) (x-coord) (y-coord) (z-coord)
...
TRGL (vrtx-id) (vrtx-id) (vrtx-id)
TRGL (vrtx-id) (vrtx-id) (vrtx-id)
TRGL (vrtx-id) (vrtx-id) (vrtx-id)
...
Where everything in parentheses is a place-holder for values. So, basically, it declares all the vertexes and their coordinates and then lists the different triangles to be made. This makes it quite perfect for use with OpenGL, but there's just one problem: the triangles are given in a seemingly random order, so there's no obvious way to implement it using either TRIANGLE_FAN or _STRIP. Is there some common or clever algorithm to rearrange these so that either FAN or STRIP can be used instead of TRIANGLES?

Alfonse Reinheart
01-25-2011, 10:39 AM
Using Vertex Index Objects, is there an efficiency difference between glDrawArrayElements(GL_TRIANGLES,...) and glDrawArrayElements(GL_TRIANGLE_FAN/STRIP,...)?

Um, yes. The full answer is much more complicated.

Oh, and there's no "glDrawArrayElements".

On most meshes, you will have more index data when you use GL_TRIANGLES than with GL_TRIANGLE_STRIPs (fans are almost completely useless. There are a few corner cases where they're useful, but outside of the obvious, don't bother). Naturally, this takes up more memory and therefore more bandwidth to read.

This is the only concern you have if you aren't interested in doing any processing of your data. If all you want to do is spit the triangles out exactly as they were given to you, then this is your only concern.

However, if you're interested in maximum performance, then you need to process your data. You will need to implement an algorithm that rearranges the order of triangles so that it improves vertex processing time via more efficient usage of the post-T&L cache.

There are algorithms to do so for triangle lists and for triangle strips. The stripping algorithms are not as efficient at the task however, since they still need to form effective strips. They will still be smaller in memory though, which for exceedingly large meshes can be important.

mhagain
01-25-2011, 02:45 PM
Your hardware's vertex cache can only work if it's given indexes to work from. Indexes may consume more bandwidth, but indexed triangles can give much more efficient triangle reduction in your mesh. Modern hardware is more likely to be better optimised for indexed triangles; in particular because they're what modern games use. One feeds off the other.

Run Doom 3 under GLIntercept some time and have a look at the draw calls it uses; you won't find one single strip or fan in the entire scene. Everything goes through "glDrawElements (GL_TRIANGLES, ...".

Wasabi
01-25-2011, 03:04 PM
Your hardware's vertex cache can only work if it's given indexes to work from. Indexes may consume more bandwidth, but indexed triangles can give much more efficient triangle reduction in your mesh.

Oh, I'd certainly use indexes. My question is merely if there is a considerable difference between glDrawElements(GL_TRIANGLES,...) and glDrawElements(GL_TRIANGLE_STRIP,...), but I think that has been appropriately answered.

And I have no idea why I wrote it as glDrawArrayElements before...

mhagain
01-25-2011, 03:16 PM
You can still use glDrawElements (GL_TRIANGLES,...) but order your vertexes as if they were a strip. There should be no difference at all unless the hardware has some really exotic components especially optimised for strips.

ugluk
01-25-2011, 04:25 PM
mhagain, you mean GL_TRIANGLE_STRIP? Yeah, I agree.

mhagain
01-25-2011, 04:47 PM
mhagain, you mean GL_TRIANGLE_STRIP? Yeah, I agree.
No, I don't mean it. Wherever did you get that idea from?

OK, by all means use GL_TRIANGLE_STRIP if you want, and have fun fooling around with degenerate triangles and/or primitive restart. And get a best-case vertex reuse of 1 per tri (instead of a common case of 0.65 and best of 0.5) that limits the amount of geometry you can use. And have nothing more than a heavy investment in an overly complex and fragile renderer to show at the end of it. Have fun handling complex models that need to be represented by a combination of strips and fans too.

Otherwise order your vertexes and indexes in the most appropriate layout and just use GL_TRIANGLES with no messing.

Wasabi
01-25-2011, 05:15 PM
Yes, well the problem is exactly that I don't know how to order the vertexes. The triangles are given with out-of-order indexes (1-2-3 20-40-35 2-67-9), so part of my question is exactly if there is a known algorithm to sort these triangles into handy ordered lists.

mhagain
01-25-2011, 05:46 PM
Sure; have a look here: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=257252

A Big Dirty Secret is that if you're coding on Windows you can actually use D3D to do this too, even in an OpenGL program; the D3D optimization functions don't require a device to be created. ;)

It's worth just using the data as-is and seeing if performance is acceptable enough without optimization though. You might find that you run quite well without needing to commit to ordering the triangles at all.

Alfonse Reinheart
01-25-2011, 06:11 PM
OK, by all means use GL_TRIANGLE_STRIP if you want, and have fun fooling around with degenerate triangles and/or primitive restart. And get a best-case vertex reuse of 1 per tri (instead of a common case of 0.65 and best of 0.5) that limits the amount of geometry you can use. And have nothing more than a heavy investment in an overly complex and fragile renderer to show at the end of it.

... What?

All of the stuff we're discussing is on the pre-processing side. It has nothing to do with making a "renderer", so strips vs. list would introduce neither complexity nor fragilty into the renderer. Nor would either side limit "the amount of geometry you can use."

Vertex reuse is not the end-all-be-all of vertex processing performance.

ugluk
01-25-2011, 07:35 PM
I rarely agree with Alfonse, but I do now. At runtime you aren't going to do anything with triangle lists and triangle strips other than blasting them full speed at the GPU. Well, I don't know what one could do with the models at runtime, other than blast them to the GPU.

What I meant to say with GL_TRIANGLE_STRIP was, that it is possible to reorder triangles into an optimal (cache-reusing order) and then stripify them. With triangle lists or triangle strips the vertices are sent to the GPU for processing; it can cache tristrip vertices equally well as triangle list vertices. It does take more effort to create the strips though, I agree.

Seems too bothersome to do, but in theory you could get some minimal gains by doing it (less indices to store and less indices to read for the GPU).

aqnuep
01-26-2011, 01:57 AM
What I meant to say with GL_TRIANGLE_STRIP was, that it is possible to reorder triangles into an optimal (cache-reusing order) and then stripify them.
Can you tell me where triangle strips can reuse vertices? Yes, there are some rare situations but usually it won't reuse any vertices.

Also, if you think about putting all your triangle indices as if you would render triangle strips then you would actually get the same performance as vertex cache miss would be exactly 1 like in case of strips (as mhagain mentioned), except a little bit of overhead due to usually having more indices (however, this may not even be true as you need degenerate triangles or primitive restart indices, thus this advantage is maybe lost as well). But the key point is that triangle lists allow much better reordering than strips allow thus you can reduce the vertex cache miss ratio usually downto about 0.65 (again, as mhagain already mentioned).

Finally, you mentioned to stripify the mesh after reordering the triangles into an optimal (cache-reusing order). Well, stripifying will override the whole order as strips also need special triangle order so your argument is completely invalid.

I don't say never use triangle strips, but usually you'll get better performance with triangle lists and it is much less burden from asset preparation point of view. There are, of course, some situations when triangle strips are easy, convenient and more efficient (e.g. terrain rendering), but using strips as a general method for meshes is a definitely wrong direction.

Alfonse Reinheart
01-26-2011, 09:03 AM
Can you tell me where triangle strips can reuse vertices? Yes, there are some rare situations but usually it won't reuse any vertices.

If by "reuse vertices" you mean "hit the post-T&L cache," strips don't need to explicitly hit the post-T&L cache as often because they're strips. By definition, the second triangle will use two vertices that are already in the post-T&L cache. As will the third. As will the fourth. Etc. Therefore by definition, they will "reuse" plenty of vertices.


Finally, you mentioned to stripify the mesh after reordering the triangles into an optimal (cache-reusing order). Well, stripifying will override the whole order as strips also need special triangle order so your argument is completely invalid.

Striping will not "override the whole order." It will override it partially, but that's a far cry from saying that it will have no or little cache hits. It may not be rendering in the one most optimal order, but it is still getting plenty of post-T&L cache use.

And again, there's this whole ignoring the fact that strips use no less than 50% fewer indices than lists. That's really important if you've got a lot of indices.

ugluk
01-26-2011, 09:24 AM
aqnuep, you can stripify an arbitrary sequence of triangles, even if they don't share any vertices, true you need to use tricks to do it and it may not be worth the effort, but it is possible.

So what I had in mind was stripifying the triangle sequence you would usually draw as GL_TRIANGLES. The vertices would be referenced exactly the same with GL_TRIANGLE_STRIP as with GL_TRIANGLES so there would be exactly the same number of cache hits/misses, but the storage requirements/memory accesses would not be as high.

I am not championing the use of triangle strips, but they are not obsolete, just less convenient to use.

aqnuep
01-26-2011, 09:31 AM
If by "reuse vertices" you mean "hit the post-T&L cache," strips don't need to explicitly hit the post-T&L cache as often because they're strips. By definition, the second triangle will use two vertices that are already in the post-T&L cache. As will the third. As will the fourth. Etc. Therefore by definition, they will "reuse" plenty of vertices.
Yes, exactly you've explained. It has a constant cache miss ratio of 1 (in practice more, as there are the degenerate triangles or primitive restart indices).


Striping will not "override the whole order." It will override it partially, but that's a far cry from saying that it will have no or little cache hits. It may not be rendering in the one most optimal order, but it is still getting plenty of post-T&L cache use.
Thinking about how small is the post-T&L cache, it is very optimistic to say that it will get "plenty of post-T&L cache use".

Alfonse Reinheart
01-26-2011, 10:24 AM
Yes, exactly you've explained. It has a constant cache miss ratio of 1 (in practice more, as there are the degenerate triangles or primitive restart indices).

I don't understand what you're talking about. Neither degenerate triangles nor primitive restart indices have anything to do with cache missing. Neither of these flush the cache or clear it, so whether the next vertex hits or misses the cache is a function of whether that index is still in the cache or not. And only of that.

How do you compute this "miss ratio of 1" exactly? Is that number of indices vs. number of cache misses? If so, I'm not sure how that's relevant, since strips also have fewer indices than lists. Of course lists have better a better index-per-miss ratio: they start off with 2-3x as many indices.

mhagain
01-26-2011, 02:15 PM
And again, there's this whole ignoring the fact that strips use no less than 50% fewer indices than lists. That's really important if you've got a lot of indices.
But vertexes use more bandwidth than indexes. This is true even of VBOs, you;ve still got bandwidth considerations for moving data from GPU memory to the vertex processor. Even more true if you ever have to deal with dynamic geometry. But it is use-case depedent, OK.

Another thing I forgot to mention is batching, of course.


I am not championing the use of triangle strips, but they are not obsolete, just less convenient to use.
The last time triangle strips were not obselete was 1998. Unless you're coding to very specific hardware that prefers triangle strips there is nothing to gain and everything to lose by preferring them. Otherwise I begin to suspect that you have a heavy investment in code that uses strips and you're unwilling to let it go.

In 1999 Quake 3 Arena was released. John Carmack made a document available for the benefit of driver developers that described the rendering architecture, which is well worth a read; it's still available here if you're interested: http://www.gamers.org/dEngine/quake3/johnc_glopt.html

In summary, one of the key points is that 99% of draw calls went through a single function, and THIS is the path for driver developers to optimise. Guess what that path was? "glDrawElements (GL_TRIANGLES, ..."

Since then strips have been effectively dead.

Alfonse Reinheart
01-26-2011, 03:36 PM
But vertexes use more bandwidth than indexes.

That doesn't mean indicies are free either. Index reads are typically uncached memory reads.


This is true even of VBOs, you;ve still got bandwidth considerations for moving data from GPU memory to the vertex processor.

Which is why attribute inputs are cached memory reads. A cache miss on the post-T&L cache doesn't mean that you'll miss on the pre-T&L cache.

Also, you haven't explained how strips have lots more post-T&L cache misses than lists.


Another thing I forgot to mention is batching, of course.

And what does that have to do with strips vs. lists? They both batch equally, so what's the problem?


Since then strips have been effectively dead.

Of course. They're dead because John Carmack, the lord high father of all graphics, decreed it so, not for any actual substantive reason. The source code of the Quake 3 engine is the most perfect rendering engine of all time and everyone should write their engines exactly as he did.

Might I suggest that you try an argument that is something more than an appeal to authority. Doing what Carmack did over a decade ago may be fine for you, but for some of us, we want a coherent explanation for why triangle lists are a priori superior to triangle strips.

mhagain
01-26-2011, 03:48 PM
Might I suggest that you try an argument that is something more than an appeal to authority. Doing what Carmack did over a decade ago may be fine for you, but for some of us, we want a coherent explanation for why triangle lists are a priori superior to triangle strips.
I thought I gave one upthread with the example of a model that would otherwise need to be represented with a combination of strips and fans. Or did you miss that?

Alfonse Reinheart
01-26-2011, 03:59 PM
I thought I gave one upthread with the example of a model that would otherwise need to be represented with a combination of strips and fans. Or did you miss that?

I didn't see it. Unless it was in an off-thread link or something.

Either way, the possibility of a model that strips poorly does not mean that strips are "obsolete" or otherwise not a worthwhile tool to use.

mhagain
01-26-2011, 04:07 PM
I thought I gave one upthread with the example of a model that would otherwise need to be represented with a combination of strips and fans. Or did you miss that?

I didn't see it. Unless it was in an off-thread link or something.
In fairness it was a little bit buried.


Either way, the possibility of a model that strips poorly does not mean that strips are "obsolete" or otherwise not a worthwhile tool to use.
I've no objection to the use of strips in cases where the programmer has done the proper groundwork (including profiling of properly written strip code vs properly written list code) and determined that strips are the best solution for their requirements. Likewise when coding to hardware that prefers strips. My problem is when strips are still being recommended as the best-case general solution; what I've been saying here should really be read from that perspective.

Alfonse Reinheart
01-26-2011, 06:31 PM
My problem is when strips are still being recommended as the best-case general solution; what I've been saying here should really be read from that perspective.

I agree with your intent, but that's not what you said. Your initial response to Ugluk disparaged the use of strips as "fooling around." And you outright stated that using them would lead to "heavy investment in an overly complex and fragile renderer."

There's a difference between the position that "strips shouldn't be the default case" and "using strips will destroy your engine." And your words clearly indicated the latter position more than the former.

Dark Photon
01-26-2011, 07:08 PM
I agree with your intent, but that's not what you said. Your initial response to Ugluk disparaged the use of strips as "fooling around." And you outright stated that using them would lead to "heavy investment in an overly complex and fragile renderer."
We're splitting useless hairs again here, Alfonse.

Bottom line: optimize for the post-T&L vertex cache first. That's the big fish. Then, if you really bored, knock yourself out and stripify the indices.


And again, there's this whole ignoring the fact that strips use no less than 50% fewer indices than lists. That's really important if you've got a lot of indices.
"Really important"?

Have you ever proven that you're bottlenecked on the bandwidth for these tiny extra indices that hit the post T&L vertex cache? Really?

Have you ever heard of anybody that's bottlenecked on them?

What's the "really important" gain that we're talking about here? What were the frame times, before and after stripifying the indices? On what GPU/driver?

Wasabi
04-06-2011, 09:47 AM
I am aware this has become an old thread, but I've been away from my program for a long time and just need a clarification on the subject now that I'm finally back at it.

From the discussion that ensued from my question, I take it that GL_TRIANGLE_STRIP isn't worth it: there's the risk the hardware is accelerated to run on GL_TRIANGLES and _STRIP would thus be slower and, if not, the speed gain would be marginal since the only benefit would be less calls into the index buffer. Is this correct?

Dark Photon
04-06-2011, 06:22 PM
From the discussion that ensued from my question, I take it that GL_TRIANGLE_STRIP isn't worth it: there's the risk the hardware is accelerated to run on GL_TRIANGLES and _STRIP would thus be slower and, if not, the speed gain would be marginal since the only benefit would be less calls into the index buffer. Is this correct?
As far as non-indexed TRIANGLE_STRIPS, right. Unless you've got some really junky GPU you're rendering with, you can get more bang for the buck with indexed TRIANGLES.

With non-indexed TRIANGLE_STRIPs, your ATVR (avg transform to vtx ratio) on a 2D mesh can be up around 2. But with indexed TRIANGLEs, you can get it down to closer to 1.

On that subject, I love this blog post:

* Strippers (http://home.comcast.net/~tom_forsyth/blog.wiki.html#Strippers) (by Tom Forsyth)

As he says, optimize your indexed TRIANGLES for maximal vertex cache reuse first! Then if you're just bored, you can try and save a really tiny bit of index bandwidth by compacting them using indexed TRIANGLE_STRIPs (if your triangle order optimizer lends toward producing contiguous triangles), possibly using primitive restart or degenerate tris to join multiple strips in one batch. But whatever you do, don't change the vertex order! That saves real perf!

And on the indexed TRIANGLE_STRIPS thing, I don't think I've ever seen anyone produce a test case where they were index bound, such that stripifying the indexed TRIANGLES gets them anything perf wise.

So really, I'd just forget TRIANGLE_STRIPs for TIN meshes unless you're doing something really special -- such as deving on a mobile device where memory is at a super-premium, or you're doing something special like rendering a regular grid where you can walk the mesh back-and-forth in strips that fit in the vertex cache (where in that case you wouldn't even use a triangle order optimizer). In that case, maybe you want to consider indexed TRIANGLE_STRIPs.

But non-indexed TRIANGLE_STRIPs? Nuke 'em.