didn't know that varyings take up cache size. good info.
seems like 32 varyings = no cache
_NK47
I didn't test that case (a full load of varyings). But the cache size really does shrink as varying usage grows, and not uniformly: it drops in uneven steps. When I was testing for cache size (about a year and a half ago), I didn't know about the 32 attribute uniforms and was using only the standard COLOR0-1 and TEXCOORD0-7 notation.
I can only guess that using the full set of varyings would bring us down to something like 12-16 vertices.
Geometry shaders are another big factor. Using them heavily to generate vertices doesn't seem like a good scenario for benefiting from the post-T&L cache.
I have a really stupid question (at least I feel stupid for asking...).
I went over Tom's paper and I thought I understood what he is doing. However, looking at his "complete" source code, I find myself scratching my head. What am I supposed to do with that? It assigns a score to a vertex and then what? How do I reorder the face list? Doesn't it make more sense to assign the faces a score? Not only that, but he did not include the data structures. Should a pointer to a vcache_vertex_data structure be included in each vertex struct or is it the vertex struct? It only has two values, NumActiveTris and CacheTag?
This leaves me with more questions than answers. Please feel free to laugh at me while offering advice.
Thanks,
Mark
MarkS,
that link should help you more: http://www.opengl.org/discussion_boa...;Number=235320
Another good read is "Optimization of Mesh Locality for Transparent Vertex Caching" by Hugues Hoppe. Hefty bit on reordering strategies.
Compute tri scores from vertex scores
MAIN LOOP
- Pick off the highest-score triangle (or approx highest, if using high score cache)
- Add to "draw" list
- Adjust num tris not yet drawn for each of its vertices
- Add tri's verts to vertex cache
- Update changed vertex scores
- Update changed tri scores
You're gonna have to read it a few times to figure it out and get the O(n) perf, but you can do it.
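For what it's worth, here is a rough sketch of that loop as a standalone reordering function. It is not Tom's code: the names (`vertexScore`, `reorderTriangles`) are made up for illustration, the tuning constants are the values as I recall them from his article and should be treated as assumptions, and the best triangle is found with a plain linear scan, so this version is simple but not O(n).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tuning constants (assumed values; tune for your hardware).
constexpr int   kCacheSize         = 32;
constexpr float kCacheDecayPower   = 1.5f;
constexpr float kLastTriScore      = 0.75f;
constexpr float kValenceBoostScale = 2.0f;
constexpr float kValenceBoostPower = 0.5f;

// Score one vertex from its position in the simulated LRU cache
// (-1 = not cached) and its number of not-yet-emitted triangles.
float vertexScore(int cachePos, int activeTris) {
    if (activeTris == 0) return -1.0f;  // nothing left that needs this vertex
    float score = 0.0f;
    if (cachePos >= 0) {
        if (cachePos < 3) {
            score = kLastTriScore;      // part of the most recent triangle
        } else {
            float t = 1.0f - float(cachePos - 3) / float(kCacheSize - 3);
            score = std::pow(t, kCacheDecayPower);
        }
    }
    // Boost vertices with few remaining triangles so they don't get stranded.
    return score + kValenceBoostScale *
                   std::pow(float(activeTris), -kValenceBoostPower);
}

// Returns the same triangles in a (hopefully) more cache-friendly order.
std::vector<uint32_t> reorderTriangles(const std::vector<uint32_t>& indices,
                                       uint32_t vertexCount) {
    assert(indices.size() % 3 == 0);
    const size_t triCount = indices.size() / 3;
    std::vector<int> activeTris(vertexCount, 0);
    for (uint32_t v : indices) ++activeTris[v];

    std::vector<uint32_t> cache;             // front = most recently used
    std::vector<bool> emitted(triCount, false);
    std::vector<uint32_t> out;               // the "draw" list
    out.reserve(indices.size());

    auto cachePos = [&](uint32_t v) {
        auto it = std::find(cache.begin(), cache.end(), v);
        return it == cache.end() ? -1 : int(it - cache.begin());
    };

    for (size_t n = 0; n < triCount; ++n) {
        // Pick the highest-score triangle (tri score = sum of vertex scores).
        size_t best = 0;
        float bestScore = -1e30f;
        for (size_t t = 0; t < triCount; ++t) {
            if (emitted[t]) continue;
            float s = 0.0f;
            for (int k = 0; k < 3; ++k) {
                uint32_t v = indices[3 * t + k];
                s += vertexScore(cachePos(v), activeTris[v]);
            }
            if (s > bestScore) { bestScore = s; best = t; }
        }
        emitted[best] = true;
        // Add it to the draw list and move its vertices to the cache front.
        for (int k = 0; k < 3; ++k) {
            uint32_t v = indices[3 * best + k];
            out.push_back(v);
            --activeTris[v];
            cache.erase(std::remove(cache.begin(), cache.end(), v), cache.end());
            cache.insert(cache.begin(), v);
        }
        if (cache.size() > size_t(kCacheSize)) cache.resize(kCacheSize);
    }
    return out;
}
```

You'd call it once per batch, e.g. `auto optimized = reorderTriangles(indices, vertexCount);`, and write `optimized` back out as the new index list. Scores are recomputed from scratch on every pass here; caching them and only updating the ones that changed is what gets you to the linear-time version.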
Well, that explains my confusion. I wasn't looking at this as part of the rendering pipeline, but as a pre/post processing technique.
What I'm doing is exporting my models from Blender. However, Blender exports the faces in an odd manner. At first I thought the vertices and faces were being exported randomly, but upon further inspection, they are being exported in clumps of 4-6 polys. What I was trying to do was write a post-processing utility that would take the model and "correct" it.
Of course, I haven't the slightest idea what Blender is doing or why. It may, in fact, be exporting the faces in a cache-optimized fashion. I'm waiting for a reply in the Blender forums on this matter.
Regardless, I don't think this algorithm is what I need at this point, although I can see the need down the road.
Originally Posted by MarkS: "I wasn't looking at this as part of the rendering pipeline, but as a pre/post processing technique."
You can still use it as a post-processing technique. You simply store the triangle's vertices in a vertex list rather than drawing it. Simple.
Originally Posted by MarkS: "Of course, I haven't the slightest idea what Blender is doing or why. It may, in fact, be exporting the faces in a cache-optimized fashion."
It may not. It certainly isn't going to hurt to just fix it yourself.
No, you were on the right track before. It "is" a pre-processing technique you'd run post-modeling but pre-realtime to order the triangles in your batches for much better post-vertex-shader cache performance (i.e. fewer vertex shader runs on the graphics card).
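If you want to see whether a reordering actually buys you anything, one way is to simulate the cache and count misses. The sketch below assumes a simple FIFO cache of a size you pick; real hardware may behave differently, and both function names are hypothetical. Each miss stands for one vertex shader run, and misses per triangle is the usual ACMR (average cache miss ratio) figure.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Count simulated vertex shader runs for an index list, assuming a FIFO
// post-transform cache holding `cacheSize` vertices (an assumption, not a
// statement about any particular GPU).
size_t countVertexShaderRuns(const std::vector<uint32_t>& indices,
                             size_t cacheSize) {
    std::deque<uint32_t> fifo;
    size_t misses = 0;
    for (uint32_t v : indices) {
        if (std::find(fifo.begin(), fifo.end(), v) != fifo.end())
            continue;                    // cache hit: no shader run needed
        ++misses;                        // cache miss: vertex gets shaded
        fifo.push_back(v);
        if (fifo.size() > cacheSize) fifo.pop_front();
    }
    return misses;
}

// Average cache miss ratio: shader runs per triangle (lower is better).
double acmr(const std::vector<uint32_t>& indices, size_t cacheSize) {
    if (indices.size() < 3) return 0.0;
    return double(countVertexShaderRuns(indices, cacheSize)) /
           double(indices.size() / 3);
}
```

Run it on the index list before and after reordering with the same cache size; an unoptimized list tends toward 3 misses per triangle, and a well-ordered one should come in well below that.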
Of course you could just as well run this on a background thread if you're dynamically building batches at runtime.
When I said add to "draw" list, I was talking about the list of triangles/verts you're building for drawing later.
By the way, if you're comfortable using DirectX, there is an API call which optimizes the order of the vertices in a mesh. You could write a very simple app that reads your model into the DirectX data structure, calls the "Optimize" method (I don't remember its name) and writes the result back to a new file.
I don't know if it uses the same algorithm as described above, but it's already all done and fully debugged.