If you’re really looking for optimal performance out of strips with a post-T&L cache, you’re doing it wrong.
BTW, your numbers seem to be for an 8x8 array of vertices. If you have 8x8 quads, you actually have a 9x9 array of vertices. So my numbers will use a 9x9 array, rather than an 8x8 array like yours do. Even with this increase in vertex count, you will see that this needs fewer transforms.
Given a cache depth of 10 (GeForce 3s have something like 16-20), you do this: break the model into two columns, each 5 verts wide (sharing the middle column of verts).
Send the following in order:
1. A degenerate strip containing the first 5 verts in the top row (5 indices). This is called seeding.
2. Degenerately add (2 indices) a strip containing the top 5 verts and the next 5 (10 indices).
3. Repeat step 2 for each of the remaining 7 rows of quads, working down the column (12 indices * 8 strips = 96 indices, counting step 2).
4. Degenerately (2 indices) seed the next column (5 verts), but starting at the bottom, not the top.
5. Repeat steps 2 and 3 going up the second column (12 * 8 = 96 indices).
Now, to compute the number of transforms. Step 1 causes 5 transforms. Steps 2 and 3 cause only 5*8, or 40, transforms (degenerate tris don’t count towards or against). Step 4 causes 4 transforms (one vert is in the cache already). Step 5 causes another 5*8, or 40, transforms. Total: 5 + 40 + 4 + 40 = 89 transforms. The minimum number of transforms is 81 (9*9), so we’re running pretty close to optimally here. That’s a 10%+ improvement over your “curve” method, and it’s rendering more triangles.
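If you want to check that count rather than take my word for it, here is a rough sketch that builds the index list for the steps above and runs it through a simulated cache. Big assumptions: the cache is modelled as a plain 10-entry FIFO (real hardware may replace entries differently), winding order is ignored entirely, and all the names (`vid`, `band`, `join`, `count_transforms`) are made up for this example.

```python
from collections import deque

W = 9  # vertices per row in the 9x9 grid (8x8 quads)

def vid(row, col):
    """Index of the vertex at (row, col), row-major."""
    return row * W + col

def band(rows, cols):
    """One column of the mesh: a 5-index seed strip, then one
    10-index strip per row of quads, in the order given by rows."""
    rows, cols = list(rows), list(cols)
    strips = [[vid(rows[0], c) for c in cols]]            # seeding strip
    for a, b in zip(rows, rows[1:]):
        # alternate between the already-cached row and the new row
        strips.append([vid(r, c) for c in cols for r in (a, b)])
    return strips

def join(strips):
    """Stitch strips together with 2 degenerate indices per join."""
    out = []
    for s in strips:
        if out:
            out += [out[-1], s[0]]
        out += s
    return out

def count_transforms(indices, depth=10):
    """Count cache misses for an assumed FIFO post-T&L cache."""
    cache = deque(maxlen=depth)   # appending to a full deque drops the oldest
    misses = 0
    for i in indices:
        if i not in cache:
            cache.append(i)
            misses += 1
    return misses

# Left column top to bottom, then right column bottom to top.
strips = band(range(9), range(0, 5)) + band(range(8, -1, -1), range(4, 9))
indices = join(strips)
print(count_transforms(indices))  # 89 with this ordering and a 10-entry FIFO
```

Changing `depth` or the column width lets you see how quickly the count degrades when the strips no longer fit the cache.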
Since this is just one gigantic degenerate strip, it is one function call. However, if we wanted to lose the degenerate strips, we could use glMultiDrawElementsEXT (from EXT_multi_draw_arrays) instead, where we draw several arrays in one shot. This is a pretty new extension, however.
Assuming degeneracy, this costs 204 indices (5 + 96 + 2 + 5 + 96). Granted, that’s more than either method given here, but it’s probably faster to do it this way. Without degeneracies (still seeding), it costs 170 indices.
With nVidia’s primitive restart extension (NV_primitive_restart), the driver doesn’t even really have to be involved (like it probably does with the multi-draw call, copying to multiple buffers and all). You can even put the indices in VAR (vertex array range) memory with the new VAR extensions.
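To give a feel for what the restart version of the index buffer looks like, here is a small CPU-side sketch. It only builds the list; it makes no GL calls. The sentinel value `0xFFFF` and the helper names are made up for illustration (with the extension, the application chooses which index value means "restart").

```python
W = 9              # vertices per row in the 9x9 grid
RESTART = 0xFFFF   # hypothetical sentinel, must not collide with a real index

def band(rows, cols):
    """Seed strip plus one 10-index strip per row of quads."""
    rows, cols = list(rows), list(cols)
    strips = [[rows[0] * W + c for c in cols]]
    for a, b in zip(rows, rows[1:]):
        strips.append([r * W + c for c in cols for r in (a, b)])
    return strips

def with_restarts(strips, restart=RESTART):
    """Separate strips with one restart index instead of 2 degenerate ones."""
    out = []
    for s in strips:
        if out:
            out.append(restart)
        out.extend(s)
    return out

strips = band(range(9), range(0, 5)) + band(range(8, -1, -1), range(4, 9))
indices = with_restarts(strips)
print(len(indices))  # 187: the 170 real indices plus 17 restart markers
```

So under these assumptions the restart layout sits between the degenerate-strip version and the bare 170 indices, while still being a single index buffer.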
BTW: “We reverse the drawing order (left-right, right-left) in alternate rows”
That’s a state change (assuming you’re talking about changing the cull mode). Therefore, it’s likely that this will kill any performance this method would gain.