Originally posted by Madoc:
Seems strange that DrawArrays should be slower than immediate mode. It should still alleviate a good deal of CPU work, and I thought it facilitated DMA transfers. With some of the HW I used to work with way back, DrawArrays was actually the fastest method, faster than DrawElements. It's been far too long since I used it, so I can't say anything about recent experiences with it.
If you render a lot of static geometry with a single DrawArrays call, it should be faster, I agree.
If you're changing pointers frequently and rendering with lots of DrawArrays calls, it may well be slower.
If your geometry is dynamic, and you build the whole array up front, then you may not be getting the CPU/GPU parallelism that you would with immediate mode.
Mainly, I wanted to point out that "arrays are faster" is not a simple truism. To make things faster, the feature/mechanism must widen a bottleneck that is currently limiting performance.
Cass,
you made me want to bring up another question. There have been a few discussions about large vs many small VBOs. You said the cost was in the gl*Pointer calls. What I didn't find clear is whether the cost of these calls is greater when a different VBO is bound, or whether it's the same even under the same VBO.
In other words, as an example, would we be well off binding a single VBO and then specifying different offsets through gl*Pointer calls (possibly maintaining smaller index formats), or should we minimise the number of gl*Pointer calls, use larger indices, and rely on DrawRangeElements to reduce the index sizes?
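To make the two options concrete, here is a rough sketch of the arithmetic behind each one. It is only an illustration under assumptions: the MeshRange struct and the helper names are hypothetical, several meshes are assumed to be packed back to back in one VBO, and the GL calls appear only in comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mesh inside a shared VBO. */
typedef struct {
    size_t first_vertex;  /* index of the mesh's first vertex in the VBO */
    size_t vertex_count;  /* number of vertices belonging to the mesh */
} MeshRange;

/* Option A: re-point the arrays at the start of each mesh.
   The result is the byte offset that would be passed as the pointer
   argument of a gl*Pointer call while the VBO is bound. Indices stay
   mesh-local, so they start at 0 for every mesh and can stay small. */
size_t mesh_byte_offset(MeshRange m, size_t stride) {
    return m.first_vertex * stride;
}

/* Option B: set the pointers once at offset 0 and index into the whole
   buffer. Each mesh-local index is shifted by the mesh's first vertex. */
size_t global_index(MeshRange m, size_t local_index) {
    assert(local_index < m.vertex_count);
    return m.first_vertex + local_index;
}

/* Under option B, returns 1 if every index of the mesh still fits in
   16 bits (GL_UNSIGNED_SHORT), or 0 if 32-bit indices are needed.
   The start/end arguments of glDrawRangeElements for this mesh would be
   first_vertex and first_vertex + vertex_count - 1. */
int fits_in_16_bits(MeshRange m) {
    if (m.vertex_count == 0) return 1;
    return m.first_vertex + m.vertex_count - 1 <= UINT16_MAX;
}
```

With two meshes of 40000 vertices each, the second one starts at vertex 40000: option A keeps its indices below 40000 at the cost of an extra round of gl*Pointer calls, while option B needs indices up to 79999 and therefore a 32-bit index type.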
This will vary somewhat among implementations, but for NVIDIA's, the performance will be mostly driven by the number of gl*Pointer calls, not so much by how many VBOs are involved.
Too many VBOs and you pay some (marginal) penalty for more frequent VBO state changes. Too few VBOs and you pay a (potentially very high) penalty for forcing a coherent CPU/GPU view of an unnecessarily large chunk of memory. Forcing this coherency requires either synchronization stalling or lots of in-band data copying. This is a real waste if that coherency is not essential.
Small VBOs solve the coherency problem and make driver-side memory management much easier. In the long term, I expect one or two attribs for a few hundred vertexes per VBO to be "free". And it will never hurt (though it may not help much) to pack multiple attributes (perhaps from multiple objects) into a single VBO - if they are static or nearly static. This is probably a good idea if you have lots of static objects with very few vertices - though if you don't render these things all at the same time, immediate mode may be better still.
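As a rough illustration of packing several attributes into a single VBO, the sketch below assumes a hypothetical interleaved PackedVertex layout (position, normal, texcoord). The struct and helper names are made up for the example. The values they return are the stride and byte offsets that the corresponding gl*Pointer calls would receive.

```c
#include <stddef.h>

/* Hypothetical interleaved vertex: three attributes share one VBO. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} PackedVertex;

/* Byte distance between consecutive vertices; this would be the stride
   argument of every gl*Pointer call that reads from the packed VBO. */
size_t packed_stride(void) {
    return sizeof(PackedVertex);
}

/* Byte offsets of each attribute inside one vertex; with the VBO bound,
   these would be passed as the pointer argument of the matching
   gl*Pointer call (position sits at offset 0). */
size_t normal_offset(void) {
    return offsetof(PackedVertex, normal);
}

size_t texcoord_offset(void) {
    return offsetof(PackedVertex, texcoord);
}
```

Whether interleaving beats keeping each attribute in its own tightly packed range depends on the implementation, so treat this as one possible layout, not a recommendation.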
Does that help?
Thanks -
Cass
edit: clarification …
[This message has been edited by cass (edited 12-18-2003).]