Which way to cover with TRIANGLE_STRIPs is faster?



GLRon
11-28-2010, 06:44 PM
I have a surface that a single non-overlapping triangle strip can't cover. Which approach is more efficient?

(a) Creating two triangle strips, the first with 10 vertices and the second with 4 vertices (10 triangles total)

(b) Creating one triangle strip with 14 vertices (12 triangles total: one tri overlaps, and one tri degenerates into a line since all its points have the same y- and z-values).

If you know why, that would also be nice to know. :)
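
For reference, the triangle counts in (a) and (b) follow from the rule that a strip of n vertices emits n - 2 triangles. A quick check in Python (the helper name is made up for illustration):

```python
def strip_triangle_count(vertex_count):
    """Triangles emitted by one triangle strip, degenerate ones included."""
    return max(vertex_count - 2, 0)

# (a) two separate strips of 10 and 4 vertices
option_a = strip_triangle_count(10) + strip_triangle_count(4)
# (b) one strip of 14 vertices
option_b = strip_triangle_count(14)

print(option_a, option_b)
```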

ugluk
11-28-2010, 07:43 PM
Strange question. The answer is:

b)

Why? Because calls into GL cost CPU time. If you consume too much of it, your app becomes CPU limited.

GLRon
11-28-2010, 07:47 PM
Thanks for explaining! :)

Why a strange question? I started learning OpenGL yesterday and it wasn't obvious whether to strive to optimize the # of triangles or the # of calls.

(Although I'm using immediate mode now, and I realize that in itself isn't ideal for performance.)

ZbuffeR
11-29-2010, 12:42 AM
b) - "one tri overlaps" this is likely to cause depth fighting and be visually problematic.

Try doing sensible things first, like avoid immediate mode, before doing more dangerous things that would bring limited improvements.

mhagain
11-29-2010, 01:37 AM
The first thing you should do is read this:

http://hacksoflife.blogspot.com/2010/01/to-strip-or-not-to-strip.html

Triangle strips are seriously old hat; they were great in 1998 but much much faster and more flexible ways of handling geometry are now available, ways which involve considerably less setup on your part, and ways which you should be using instead.

aqnuep
11-29-2010, 02:00 AM
If you want to work with triangle strips, you should rather use primitive restart instead of degenerate or overlapping triangles.
However, I agree with mhagain that strips don't provide that much of a performance benefit nowadays. You should rather go with indexed triangles and try to depart from immediate mode, as it will always result in a CPU bottleneck due to the enormous number of API calls needed.
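
To make the primitive restart idea concrete, here is a rough CPU-side sketch of how an index stream with a restart marker is read as several strips. The marker value 0xFFFF and the function name are assumptions for illustration only; in OpenGL 3.1+ the marker value is chosen with glPrimitiveRestartIndex, and the splitting is done by the hardware, not by code like this.

```python
RESTART = 0xFFFF  # assumed marker value, picked only for this sketch

def split_on_restart(indices, restart=RESTART):
    """Split one index stream into separate strips at each restart marker."""
    strips, current = [], []
    for i in indices:
        if i == restart:
            if current:
                strips.append(current)
            current = []
        else:
            current.append(i)
    if current:
        strips.append(current)
    return strips

# One draw call's worth of indices: a 10-vertex strip, a restart, a 4-vertex strip.
stream = list(range(10)) + [RESTART] + [10, 11, 12, 13]
strips = split_on_restart(stream)
triangle_count = sum(max(len(s) - 2, 0) for s in strips)
print(len(strips), triangle_count)
```

The point is that both strips travel in a single index buffer, and no degenerate or overlapping triangles are generated between them.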

GLRon
11-29-2010, 02:03 AM
> Try doing sensible things first, like avoid immediate mode,

The context is I'm studying a computer graphics textbook and it is asking me to use triangle strips (and fans, and quads, and quad strips, and lines, and points, etc.) in immediate mode.

Textbooks usually do not cover the cutting edge of technology, but (hopefully) lay a strong foundation.

> Triangle strips are seriously old hat;

> You should rather go with indexed triangles and try to depart from immediate mode

Thanks, good to know. I read your article on indexed triangles, and will read it again once I get further along in my studies. ;)

Dark Photon
11-29-2010, 06:40 AM
> Try doing sensible things first, like avoid immediate mode,
>
> The context is I'm studying a computer graphics textbook and it is asking me to use triangle strips (and fans, and quads, and quad strips, and lines, and points, etc.) in immediate mode.

That's great for getting started, but definitely not a few weeks down the road, when you have your "graphics legs" and want good performance. Look in the index for "vertex cache" (or "post T&L cache", though that naming is a bit antiquated) and if you don't find it, you might consider getting a newer graphics book to read alongside it.

A few more excellent blog posts to read regarding triangle strips and why they're a has-been when it comes to GPU performance, and have been for many years:

* http://home.comcast.net/~tom_forsyth/blog.wiki.html#Strippers
* http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Vertex%20Cache%20Optimisation]]
* http://developer.nvidia.com/object/devnews005.html (search for cache)

ugluk
11-29-2010, 08:11 AM
ZbuffeR, I got my recommendation from this document:

Rendering Huge Triangle Meshes with OpenGL: Louis Bavoil



> To connect 2 strips, use degenerate triangles, or the GL_NV_primitive_restart extension.
> For example, to connect 2 strips of array of indices A and B, you can use:
> ... A_n-2 A_n-1 A_n-1 B_0 B_0 B_1 B_2...


But I see the use of strips is mostly pointless, nowadays.
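
The stitching rule quoted above can be sketched in a few lines. This is only an illustration with made-up helper names, using the 10-vertex and 4-vertex strips from the original question as index lists:

```python
def stitch(a, b):
    """Join two index strips: repeat the last index of a and the first index of b."""
    return a + [a[-1], b[0]] + b

def strip_windows(strip):
    """Every consecutive index triple in a strip (winding order ignored here)."""
    return [tuple(strip[i:i + 3]) for i in range(len(strip) - 2)]

def is_degenerate(tri):
    """A triple with a repeated index has zero area."""
    return len(set(tri)) < 3

a = list(range(10))      # first strip, indices 0..9
b = [10, 11, 12, 13]     # second strip
joined = stitch(a, b)
tris = strip_windows(joined)
real = [t for t in tris if not is_degenerate(t)]
print(len(joined), len(tris), len(real))
```

The extra triples all contain a repeated index, so they have zero area. One caveat this sketch does not handle: if the first strip has an odd number of indices, the second strip starts on an odd position and its winding flips, so one more repeated index is usually needed.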

mhagain
11-29-2010, 08:22 AM
> Anyway, if one uses tristrips, can one be reasonably certain they aren't going to render slower than indexed triangles?
Strips will render slower than indexed triangles because strips are completely unable to make use of the GPU's vertex cache, meaning that duplicate vertices will need to be retransformed and/or vertex shaders will need to be run again for them.

The only way to make use of your vertex cache is to use indexes - because the cache stores indexes. You can, of course, order your indexes to replicate a strip layout if you want, but the point is that you need to use indexes.

Note that this only applies in cases where this is actually your application bottleneck. If your bottleneck is elsewhere then you won't notice a difference. But at the same time that doesn't mean that you should feel that it's OK to use strips, because doing so could make this become your bottleneck!

ugluk
11-29-2010, 08:44 AM
I'm so used to using indices that I completely forgot to mention that, yeah, I meant indexed tristrips. Now, triangles in a strip are adjacent, so maybe they make good use of the vertex cache?

But if I extrapolate your statements, it seems, that DrawArrays() would then also render slower than, say, DrawElements(), as there are no indices to cache.

About the bottleneck, yeah, I've read about it. You need to identify and eliminate them in sequence.

mhagain
11-29-2010, 08:57 AM
> Now, triangles in a strip are adjacent, so maybe they make good use of the vertex cache?
Not really. See http://home.comcast.net/~tom_forsyth/blog.wiki.html#Strippers

> The ultimate stripper will get you one vertex per triangle. But even a very quick and dirty indexer will get you that, and good indexer will get close to 0.65 vertices per triangle for most meshes with a 16-entry FIFO vertex cache. The theoretical limit for a regular triangulated plane with an infinitely large vertex cache is 0.5 verts/tri.

So indexes at worst will typically give you the same as strips, with a good setup giving you ~1.5 times the throughput of the best strip.


> But if I extrapolate your statements, it seems, that DrawArrays() would then also render slower than, say, DrawElements(), as there are no indices to cache.
Correct, yes.
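
The vertices-per-triangle figures above can be explored with a toy model of a FIFO vertex cache. This is a sketch under simplifying assumptions: a 16-entry FIFO keyed on the index value, a plain row-by-row grid as the test mesh, and an unindexed draw modelled by giving every corner its own unique index. Real hardware differs, and the function names are made up for illustration.

```python
from collections import deque

def vertices_per_triangle(indices, cache_size=16):
    """Average vertex transforms per triangle for an indexed triangle list."""
    cache = deque(maxlen=cache_size)  # oldest entry drops out when full
    misses = 0
    for i in indices:
        if i not in cache:            # miss: vertex must be transformed again
            misses += 1
            cache.append(i)
    return misses / (len(indices) // 3)

def grid_indices(cols, rows):
    """Indexed triangle list for a cols x rows grid of quads, row by row."""
    stride = cols + 1
    idx = []
    for r in range(rows):
        for c in range(cols):
            v0 = r * stride + c       # corner in the current row
            v1 = v0 + 1
            v2 = v0 + stride          # corner in the next row
            v3 = v2 + 1
            idx += [v0, v2, v1, v1, v2, v3]
    return idx

narrow = vertices_per_triangle(grid_indices(4, 100))   # rows fit in the cache
wide_idx = grid_indices(50, 50)
wide = vertices_per_triangle(wide_idx)                 # rows overflow the cache
unindexed = vertices_per_triangle(list(range(len(wide_idx))))
print(narrow, wide, unindexed)
```

In this model the narrow grid comes out near 0.63 vertices per triangle, the wide grid in naive row order near 1.0 (no better than a strip), and the unindexed list at exactly 3.0. That is why the order of the indices matters as much as having them.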

ugluk
11-29-2010, 09:18 AM
Well, there's still some hardware, that does not have vertex cache (like mobile phones and gadgets). For those the good ol' strips are still the way to go :)

GLRon
11-29-2010, 11:51 AM
> Look in the index for "vertex cache" (or "post T&L cache",
> though that naming is a bit antiquated) and if you don't find
> it, you might consider getting a newer graphics book to read
> along side it.

Thanks, I'll keep that in mind. The next chapter covers vertex arrays and retained mode so at least it's going in the right direction.

> Well, there's still some hardware, that does not have vertex
> cache (like mobile phones and gadgets).

I'm also a full-time software developer writing an iOS app, so that's a great clarification. :)

iOS programming also requires OpenGL ES, a subset of OpenGL.

mhagain
11-29-2010, 01:28 PM
Indexed triangles are advantageous for more than just the vertex cache. You can, for example, use them to group multiple strips (or fans, polygons, quads, etc) sharing the same state together without needing to fool around with primitive restart or degenerate triangles.

I would seriously recommend benchmarking them against strips before making any commitment to using strips, and definitely before making any investment in stripping your geometry.
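
As a rough illustration of the grouping idea, the sketch below turns strips, fans and quads into one indexed triangle list that could go out in a single indexed draw. The helper names and the (kind, indices) input format are made up for this example, and the strip conversion flips every second triangle so the winding stays consistent.

```python
def strip_to_triangles(strip):
    """Triangle strip to triangles; odd triangles are flipped to keep winding."""
    tris = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i:i + 3]
        tris.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return tris

def fan_to_triangles(fan):
    """Triangle fan to triangles: every triangle shares the first vertex."""
    return [(fan[0], fan[i], fan[i + 1]) for i in range(1, len(fan) - 1)]

def quad_to_triangles(quad):
    """One quad split along the a-c diagonal."""
    a, b, c, d = quad
    return [(a, b, c), (a, c, d)]

CONVERTERS = {
    "strip": strip_to_triangles,
    "fan": fan_to_triangles,
    "quad": quad_to_triangles,
}

def build_index_list(primitives):
    """Flatten (kind, indices) pairs into one triangle-list index array."""
    indices = []
    for kind, prim in primitives:
        for tri in CONVERTERS[kind](prim):
            indices.extend(tri)
    return indices

batch = build_index_list([
    ("strip", list(range(10))),
    ("strip", [10, 11, 12, 13]),
    ("fan", [14, 15, 16, 17]),
    ("quad", [18, 19, 20, 21]),
])
print(len(batch) // 3)
```

No restart markers or degenerate triangles are needed, because a triangle list has no notion of one primitive continuing into the next.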

ugluk
11-29-2010, 05:27 PM
Yeah, iPod/iPhone are Steve Jobs's fodder for the masses. No vertex cache there, I think.