Hi there
Since the Advanced Forum still seems to be down, I'll post my question here. It isn't that advanced anyway.
So, I started a new engine. I began by rendering my sectors as simply as possible: sorted by texture and then brute force, letting the GPU do the rest (back-face culling, depth testing, etc.).
Speed was as expected. For 6500 textured triangles (no shaders), rendered in 8 batches (8 texture switches), I got 190 FPS.
After I added shaders, a z-only pass sped it up.
Now I thought I could improve the efficiency of that z-only pass by rendering it front to back. No problem: no textures, no color writes, only a few big batches, just the order of the indices changed.
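For illustration, here is a minimal sketch of that kind of index reordering. It is not the engine's actual code: it sorts triangles by the squared distance of their centroid to the eye, which is only an approximation of a true front-to-back order (a BSP traversal would give an exact one). The names `Vec3`, `distSq` and `sortTrianglesFrontToBack` are made up for this example.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Squared distance between two points (no sqrt needed for sorting).
static float distSq(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns a new index list with whole triangles ordered nearest-first.
// The vertex data itself is untouched; only the index order changes.
std::vector<std::uint32_t> sortTrianglesFrontToBack(
    const std::vector<Vec3>& positions,
    const std::vector<std::uint32_t>& indices,
    const Vec3& eye)
{
    struct Tri {
        std::array<std::uint32_t, 3> idx;
        float key; // squared centroid distance to the eye
    };

    std::vector<Tri> tris;
    tris.reserve(indices.size() / 3);
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        const Vec3& a = positions[indices[i]];
        const Vec3& b = positions[indices[i + 1]];
        const Vec3& c = positions[indices[i + 2]];
        const Vec3 centroid{(a.x + b.x + c.x) / 3.0f,
                            (a.y + b.y + c.y) / 3.0f,
                            (a.z + b.z + c.z) / 3.0f};
        Tri t;
        t.idx = {indices[i], indices[i + 1], indices[i + 2]};
        t.key = distSq(centroid, eye);
        tris.push_back(t);
    }

    // stable_sort keeps the original order for triangles at equal distance.
    std::stable_sort(tris.begin(), tris.end(),
                     [](const Tri& l, const Tri& r) { return l.key < r.key; });

    std::vector<std::uint32_t> sorted;
    sorted.reserve(tris.size() * 3);
    for (const Tri& t : tris)
        sorted.insert(sorted.end(), t.idx.begin(), t.idx.end());
    return sorted;
}
```

Note that a reordering like this leaves the vertices where they are in the buffer, so neighbouring triangles in the new index list can reference vertices that are far apart in memory.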
However, instead of speeding up a bit, it slowed down from 125 FPS to 35!
This is all on a Radeon 9600XT.
I read ATI's SDK, and there I found a passage which says that random vertex accesses are worse than sequential ones because of the pre-T&L cache.
Still, a slowdown of 90 FPS? Is this really expected behaviour?
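One rough way to judge how much a given index order scatters vertex fetches is to run it through a toy cache model and count the misses. The sketch below assumes a simple FIFO cache of memory lines; the line size and line count are free parameters, not the real figures for the Radeon 9600XT's pre-T&L cache, and `countLineMisses` is a made-up helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts how many memory lines have to be fetched when vertices are read
// in the order given by `indices`. Toy FIFO model; assumes vertexStride > 0
// and lineSize > 0.
std::size_t countLineMisses(const std::vector<std::uint32_t>& indices,
                            std::size_t vertexStride,
                            std::size_t lineSize,
                            std::size_t numLines)
{
    std::deque<std::size_t> cache; // resident line numbers, oldest first
    std::size_t misses = 0;

    for (std::uint32_t idx : indices) {
        const std::size_t firstByte = idx * vertexStride;
        const std::size_t lastByte = firstByte + vertexStride - 1;

        // A vertex may span several lines; touch each of them.
        for (std::size_t line = firstByte / lineSize;
             line <= lastByte / lineSize; ++line) {
            if (std::find(cache.begin(), cache.end(), line) != cache.end())
                continue; // hit
            ++misses;
            cache.push_back(line);
            if (cache.size() > numLines)
                cache.pop_front(); // evict the oldest line
        }
    }
    return misses;
}
```

Comparing the miss count of the original index list with that of the front-to-back list (same stride, same model parameters) gives a hint of how much locality the reordering costs, although it cannot predict actual frame rates.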
The SDK also says that aligning data on 32 bytes will increase random-access speed. My vertex data is 64 bytes per vertex. I use VBOs, so the driver should be able to align it very well, no?
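As a small sanity check on the alignment question, the sketch below uses a hypothetical 64-byte vertex layout (the actual attributes are not listed in this post, so `Vertex64` and its fields are invented) and a helper, `startsOnBoundary`, that tests whether a given vertex starts on a chosen boundary. It says nothing about where the driver really places VBO storage; it only shows the arithmetic.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 64-byte vertex; the field list is an assumption.
struct Vertex64 {
    float        position[3];  // 12 bytes
    float        normal[3];    // 12 bytes
    float        tangent[3];   // 12 bytes
    float        texCoord0[2]; //  8 bytes
    float        texCoord1[2]; //  8 bytes
    std::uint8_t color[4];     //  4 bytes
    std::uint8_t pad[8];       //  8 bytes of explicit padding
};
static_assert(sizeof(Vertex64) == 64, "vertex is expected to be 64 bytes");

// True if vertex number `index` begins exactly on a multiple of `boundary`,
// given the byte offset of the first vertex and the stride between vertices.
constexpr bool startsOnBoundary(std::size_t baseOffset, std::size_t index,
                                std::size_t stride, std::size_t boundary)
{
    return (baseOffset + index * stride) % boundary == 0;
}
```

Since 64 is a multiple of 32, every vertex starts on a 32-byte boundary whenever the first one does, so with this stride the only open question is the alignment of the buffer's base address.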
I don't understand this heavy slowdown. In any case, BSP trees seem to lose their advantage in 3D rendering because of the heavy cache misses they cause.
Jan.