I tried to implement hardware T&L in the graphics engine of Transport Magnat, which will be published in Europe and in the States this coming October. In the GL version we are using glDrawArrays for rendering our landscape and our meshes. The performance is, in my opinion, much too slow: 15 frames if you are looking at a forest, 40 if you are looking just at the landscape with some flowers, trees and stones.
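For reference, this is roughly the path we use today: plain client-side vertex arrays flushed with glDrawArrays. It is only a minimal sketch, the vertex layout and the names (Vertex, DrawMesh) are made up for illustration, not our actual engine code:

    #include <windows.h>
    #include <GL/gl.h>

    typedef struct { GLfloat x, y, z, u, v; } Vertex;

    /* Draws one buffered mesh with ordinary client-side vertex arrays. */
    void DrawMesh(const Vertex *verts, GLsizei vertexCount)
    {
        glEnableClientState(GL_VERTEX_ARRAY);
        glEnableClientState(GL_TEXTURE_COORD_ARRAY);
        glVertexPointer(3, GL_FLOAT, sizeof(Vertex), &verts[0].x);
        glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), &verts[0].u);
        glDrawArrays(GL_TRIANGLE_STRIP, 0, vertexCount);
        glDisableClientState(GL_TEXTURE_COORD_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
    }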
I experimented with glVertexArrayRange in a little demo program and the results there were absolutely great: 1.7 million polygons per second without VAR, 7.2 million polygons using VAR, so very similar results to the demo offered by NVidia (learning_var). So far so good. First of all I was very surprised that it was impossible to allocate AGP memory; I debugged the learning_var demo and even there it was impossible. As an alternative, learning_var then uses hardware (video) memory instead; at least the documentation says that if your last parameter is 1.0, it's hardware memory. OK, but the documentation also says that write access to this memory is very slow, so why can I write the vertex data for 7.2 million polygons per second, rendered as triangle strips, if it's officially slow? That is one of the things I find very mysterious, along with the fact that AGP memory cannot be allocated on a single PC in our whole company. Anyway, we have now allocated a block of memory, nobody knows where it really lives, but never mind. The experiment results were great, and everybody in the company was saying "Wow, is T&L fast, I want a GeForce toooo!".
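For the curious, this is roughly how I set the range up. It is only a sketch: the read/write/priority values are just the usual ones suggested for NV_vertex_array_range (priority 1.0 is supposed to mean video memory, around 0.5 AGP), and AllocVarMemory is a made-up name:

    #include <windows.h>
    #include <GL/gl.h>

    #define GL_VERTEX_ARRAY_RANGE_NV 0x851D

    typedef void * (APIENTRY *PFNWGLALLOCATEMEMORYNVPROC)(GLsizei size,
            GLfloat readFreq, GLfloat writeFreq, GLfloat priority);
    typedef void (APIENTRY *PFNGLVERTEXARRAYRANGENVPROC)(GLsizei length,
            const GLvoid *pointer);

    /* Tries AGP memory first, falls back to video memory, then binds the range. */
    void *AllocVarMemory(GLsizei size)
    {
        PFNWGLALLOCATEMEMORYNVPROC  pwglAllocateMemoryNV =
            (PFNWGLALLOCATEMEMORYNVPROC)wglGetProcAddress("wglAllocateMemoryNV");
        PFNGLVERTEXARRAYRANGENVPROC pglVertexArrayRangeNV =
            (PFNGLVERTEXARRAYRANGENVPROC)wglGetProcAddress("glVertexArrayRangeNV");
        void *mem;

        if (!pwglAllocateMemoryNV || !pglVertexArrayRangeNV)
            return NULL;                               /* extension not available */

        /* Priority ~0.5 is supposed to give AGP memory... */
        mem = pwglAllocateMemoryNV(size, 0.2f, 0.2f, 0.5f);
        /* ...but on all our machines only priority 1.0 (video memory) succeeds. */
        if (!mem)
            mem = pwglAllocateMemoryNV(size, 0.2f, 0.2f, 1.0f);
        if (!mem)
            return NULL;

        pglVertexArrayRangeNV(size, mem);
        glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV); /* enabled once, kept on */
        return mem;
    }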
Then came the practical test of implementing it directly in the game engine. I tried to draw the trees with the help of VAR and the results were absolutely awful: without VAR 15 frames, with VAR 10, and my jaw dropped as if I'd seen a ghost. So far so bad. I looked for what was costing the frames and found out that enabling and disabling GL_VERTEX_ARRAY_RANGE_NV is absolutely deadly. So I enabled it just once at the start of the program, commented out all the code that could have problems with this, and tried again: 20 frames without T&L, 19 frames with. And this on all PCs here, which all use a GeForce 2 MX but otherwise have completely different configurations. =(
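To make the difference concrete, this is roughly the two patterns (a sketch only, continuing from the allocation code above; the tree-drawing functions are made-up names):

    /* This per-object pattern turned out to be deadly: */
    void DrawTreeSlow(GLint first, GLsizei count)
    {
        glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
        glDrawArrays(GL_TRIANGLE_STRIP, first, count);
        glDisableClientState(GL_VERTEX_ARRAY_RANGE_NV);   /* this is what eats the frames */
    }

    /* What I do now: the range is enabled once at program start and left
       alone, so per object only the draw call remains. */
    void DrawTree(GLint first, GLsizei count)
    {
        glDrawArrays(GL_TRIANGLE_STRIP, first, count);
    }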
After two long nights of trying to reach good performance using VAR, I gave up: absolutely no performance increase, and even with a lot of limitations still the same framerate as without VAR. The only positive thing I noticed was that the Z-buffer test was faster, and that if I set the view range farther out than the Z-far plane, the framerate really did get a bit higher, but... that's useless for our engine, because normally we only draw objects that are in range anyway.
I am buffering all my objects and rendering them just before the buffer swap with only a few calls to glDrawArrays, roughly as sketched below.
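A sketch of that frame loop, continuing the examples above; the batch structure and names are placeholders for our real batching code:

    /* Batches collected during the frame. */
    typedef struct { Vertex *verts; GLsizei count; } Batch;
    static Batch batches[64];
    static int   batchCount = 0;

    /* Flush all collected batches with a handful of glDrawArrays calls, then swap. */
    void EndFrame(HDC hdc)
    {
        int i;
        for (i = 0; i < batchCount; ++i)
        {
            glVertexPointer(3, GL_FLOAT, sizeof(Vertex), &batches[i].verts[0].x);
            glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), &batches[i].verts[0].u);
            glDrawArrays(GL_TRIANGLES, 0, batches[i].count);
        }
        SwapBuffers(hdc);
        batchCount = 0;     /* start collecting the next frame */
    }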
So... is it really possible to increase performance by a factor of 4 through VAR? Is there any known good solution for increasing performance, or is the learning_var demo just a fake that gets all of its performance from the fact that most of its polygons are Z-culled (as in my experiment program as well), which saves fill-rate performance?
Michael Ikemann / Virtual XCitement Software Gmbh.