Hello, I would like to compare my timings for rendering a single large unstructured mesh (1M triangles or many more): colors and normals per vertex, no textures, a single light, backface culling enabled.
My approach is to spatially subdivide the mesh into small blocks (1k to 7k vertices), quantize and stripify each block, and send the blocks as vertex arrays over AGP.
For real data, my result on an NVIDIA GeForce4 Ti 4200 (latest Detonator driver), Athlon 1200 MHz, AGP 4x is just 17M triangles per second.
On the other hand, I made a test set of 1M triangles organized in small blocks, each containing a cylinder (vertices forming a helix, 12 vertices per turn), and reached 40M triangles/sec.
I noticed that:
- The size of the blocks matters (best is from 1k to 7k vertices).
- The length of the strips changes the speed only a little (it becomes important above 30M/sec).
- I could degrade performance to 30M/sec by defeating the vertex cache (100 vertices per turn).
- If I reorder the vertices within a block, performance drops considerably, depending on how much (and especially how locally) I permute them.
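To illustrate the vertex-cache observations above, here is a minimal sketch that counts cache misses for a helix strip under a simple FIFO model. The strip layout (indices 0, n, 1, n+1, ...), the function names, and the 24-entry cache size used in the note below are assumptions for illustration, not measured properties of the hardware or of the actual test set.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE 64  /* upper bound for the simulated cache (hypothetical) */

/* Count misses of a FIFO vertex cache with `cache_size` entries
   for the given index stream. A hit does not change the FIFO order. */
size_t count_cache_misses(const unsigned *indices, size_t count,
                          size_t cache_size)
{
    unsigned cache[MAX_CACHE];
    size_t used = 0, next = 0, misses = 0, i, j;

    assert(cache_size > 0 && cache_size <= MAX_CACHE);
    for (i = 0; i < count; ++i) {
        int hit = 0;
        for (j = 0; j < used; ++j) {
            if (cache[j] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            cache[next] = indices[i];         /* overwrite the oldest slot */
            next = (next + 1) % cache_size;
            if (used < cache_size) ++used;
            ++misses;
        }
    }
    return misses;
}

/* Fill `out` with 2*pairs indices of a helix strip:
   0, n, 1, n+1, 2, n+2, ... where n = verts_per_turn (assumed layout). */
void make_helix_strip(unsigned *out, size_t pairs, unsigned verts_per_turn)
{
    size_t i;
    for (i = 0; i < pairs; ++i) {
        out[2 * i]     = (unsigned)i;
        out[2 * i + 1] = (unsigned)i + verts_per_turn;
    }
}
```

Under these assumptions, a 2000-index strip with 12 vertices per turn and a 24-entry cache misses on about half of its indices (roughly one transformed vertex per two triangles), while 100 vertices per turn misses on every index (about one per triangle). That is at least consistent in direction with the drop from 40M to 30M/sec.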
Questions:
- Are my numbers decent?
- Did I miss some trick that could boost my numbers?
- Is it correct that the locality of the vertices with respect to the strips is SO important?
Note 1: data formats:
glVertexPointer(3, GL_SHORT, 8, Vstart);
glNormalPointer(GL_SHORT, 8, Nstart);
glColorPointer(4, GL_UNSIGNED_BYTE, 0, Cstart);
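For readers unfamiliar with the quantized layout, here is a minimal sketch of how positions could be packed into the 8-byte GL_SHORT records that the stride above implies. The struct, the function names, and the per-block bounding interval [lo, hi] are assumptions for illustration; the post does not show the actual quantizer. Since GL_SHORT positions are passed to OpenGL as raw integer values, the inverse scale and offset would typically go into the modelview matrix for each block.

```c
#include <assert.h>
#include <stdint.h>

/* Three shorts plus one short of padding: 8 bytes, matching stride 8. */
typedef struct { int16_t x, y, z, pad; } PackedVec3;

/* Map v from [lo, hi] (the block's extent on one axis) to [-32767, 32767]. */
int16_t quantize_axis(float v, float lo, float hi)
{
    float t, q;
    assert(hi > lo);
    t = (v - lo) / (hi - lo);            /* normalize to 0..1 */
    if (t < 0.0f) t = 0.0f;              /* clamp values outside the block */
    if (t > 1.0f) t = 1.0f;
    q = t * 65534.0f - 32767.0f;
    return (int16_t)(q >= 0.0f ? q + 0.5f : q - 0.5f);  /* round to nearest */
}

/* Inverse mapping, i.e. what a per-block scale and translate would apply. */
float dequantize_axis(int16_t q, float lo, float hi)
{
    float scale = (hi - lo) / 65534.0f;
    return lo + ((float)q + 32767.0f) * scale;
}
```

With this mapping the worst-case position error per axis is about half a quantization step, (hi - lo) / 131068, which is one reason small blocks suit 16-bit coordinates.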
Note 2: I do not copy data into AGP memory each frame, since the whole model fits in it. Both of the meshes mentioned above have 1M triangles.
Note 3: I am using the NV_fence extension.
Note 4: I use a simple stripifier of my own, which seems to work faster and better than the NVIDIA one. Possibly I did not use the NVIDIA one properly; any suggestions are welcome.
Note 5: Rasterization does not seem to be the bottleneck, since performance did not increase when I enabled glCullFace(GL_FRONT_AND_BACK) or resized the window.
Thanks,