PDA

View Full Version : Proper usage of triangle strips



Don't Disturb
06-28-2002, 10:23 AM
I've been reading the various optimisation documents at nvidia.com, and I was getting the idea that long triangle strips were a good thing, until I read "Do not create degenerate triangle strips" in http://developer.nvidia.com/docs/IO/1295/ATT/GDC01_Performance.pdf
My program currently draws lots of triangle fans (using indexed VAR) like so:

X---X---X
| \ | / |
X---X---X
| / | \ |
X---X---X

Should I stay with this method, or join the triangles into long strips, as recommended by http://developer.nvidia.com/docs/IO/1370/ATT/GDC2000_Vertex_Buffers.pdf and most of the other documentation there (although most of it seems to refer to Direct3D)?

[This message has been edited by Don't Disturb (edited 06-28-2002).]

knackered
06-28-2002, 12:51 PM
It's funny - I noticed that. On the one hand nVidia recommends degenerates, and on the other they advise against them. I don't see what the problem would be with degenerates, especially with indexed arrays - how difficult would it be for the driver to realise that two of the indices of a triangle are the same, and skip the setup of that triangle?

BTW, for what it's worth - I noticed that I get better frame rates USING degenerate triangles rather than making separate calls to glDrawElements.

[This message has been edited by knackered (edited 06-28-2002).]

zed
06-28-2002, 01:02 PM
I think they meant to say "do not create them unless you have to" - in other words, don't let every second triangle be one.

Degenerate tri strips are the fastest on nVidia hardware for both OpenGL and D3D.

Don't Disturb
06-28-2002, 02:17 PM
I decided to do a test, and found that fans are actually faster (in the case I described).

My results for 10000 * 200 fans (each fan = 8 non-degenerate triangles, scaled to sub-pixel size):
Strip: 1.27 seconds
Fans: 1 second

That's 16 Mtris/sec for the fan, 12.6 Mtris/sec for the strip.

I was quite unimpressed by this, so I decided to set all indices to 0, to see how much degenerate triangles cost.

The results were then:
Strip: 1.27 seconds
Fans: 0.86 seconds

So it seems that (on my GeForce2 (latest drivers) at least) degenerate triangles are not culled very efficiently at all.
Could someone from nVidia comment on this?

Korval
06-28-2002, 05:05 PM
Degenerate triangles shouldn't be culled by the driver. If things are done the correct way, the driver isn't wasting precious time looking at my index data; it should just be sending the indices straight up to the hardware.


In your particular case, were the strips cache-aware strips that would take advantage of the GeForce's post-T&L cache? Also, were you using VAR? If not, I'd suggest repeating the test with VAR, as otherwise, you're seeing a lot of random overhead from the driver.

Don't Disturb
06-29-2002, 04:17 AM
Yes, I am using VAR.
It's not very time-consuming to compare 3 integers.
For 31,960,000 indices (the same number as in the triangle strip case I described previously) it takes 0.18 seconds to do 3 comparisons on each index.
Most of this time (0.1 seconds by my calculations) is spent fetching the indices from memory, so it seems to me that in many cases (such as mine) the benefits could outweigh the cost.
If a culling scheme like this were implemented, you could make more tightly winding strips that are more cache-friendly.

As an example, take a 2d array of vertices (like a terrain) that are all drawn (no LOD stuff) using a triangle strip, going from side to side:
----------->\
/<-----------/
\----------->\
/<-----------/

Using this method, typically one vertex is processed for each triangle.

Instead, you could go up and down in short sections (5 vertices long, for a GPU with a 10-vertex cache, e.g. a GeForce2), so that vertices are cached in this order:

0 1 14 15 24 25
2 3 13 16 23 26
4 5 12 17 22 27
6 7 11 18 21 28
8 9 10 19 20 29

With the exception of the first two columns, one vertex is processed for every two triangles.
Therefore this is a good method to use.

If you consider the cost of degenerate triangles in this case, my results show that with the standard driver the cost is proportional to 1.27 per degenerate triangle, while if degenerate triangles are checked for by comparing indices, the cost is proportional to 0.08 per non-degenerate triangle and, on average, 0.04 per degenerate triangle.
There are 8 triangles and 2 degenerate triangles in each section of the strip.
Therefore the cost of not culling the degenerates is:
1.27 * 2 = 2.54
If degenerates were culled by comparing indices, the cost would be:
8 * 0.08 + 2 * 0.04 = 0.72

This means that it would be faster to cull degenerate triangles by checking the indices.
Thoughts?

[This message has been edited by Don't Disturb (edited 07-01-2002).]

Korval
06-29-2002, 12:34 PM
If you consider the cost of degenerate triangles in this case, my results have shown that with the standard driver the cost is proportional to 1.27 for each degenerate triangle, while if degenerate triangles are checked for, the cost is proportional to 0.08 for non-degenerate triangles, and on average 0.04 for degenerate triangles.

How are you culling degenerate triangles? Are you doing it manually, or do you have another way of removing degenerate triangles?


This means that it would be faster to cull degenerate triangles by checking the indices.

I believe that nVidia is quoted somewhere as saying something to the effect of, "Don't use degenerate triangles."

Unfortunately, nVidia has shown many demonstrations that do use degenerate triangles.

So, I guess the conclusion is that, for nVidia hardware, you should avoid degenerate triangles, regardless of what they said elsewhere.

knackered
06-29-2002, 12:53 PM
I'm sure the LEARNING_VAR demo uses degenerate quad strips.

Don't Disturb
06-29-2002, 02:37 PM
>How are you culling degenerate triangles?
I don't. All triangles are sent in a continuous strip (nVidia recommends strips at least 200 triangles long for optimum performance), and as I have shown, by using degenerate triangles, strips can be made more cache-friendly.

My point is that nVidia (and other companies) should consider culling degenerate triangles by checking for repeated indices, rather than culling them further down the pipeline.
Matt? Cass?

dorbie
06-30-2002, 05:50 AM
How about a triangle fan?

Don't Disturb
06-30-2002, 01:35 PM
Could you be more specific?

nobodii
06-30-2002, 01:55 PM
Triangle fan? So you mean there is a way to do "degenerate triangle fans"?