Benefits of glDrawRangeElementsEXT

I've read the spec and it seems very straightforward, yet I can't think of a single instance where this extension will aid performance, and my testing has shown no performance gain. Any ideas where this extension will come in useful? My brain seems to have gone to sleep.

Ya know, for my own timings the OpenGL call paths (command lists, CVA, display lists) are all quite similar (perhaps because I intensively use strips) with FLOAT vertex coords and so on. The big difference comes when using the VAR extension, but no surprise there: the GPU knows its job. Anyway, when using SHORT vertex coords and range-optimized normals (SHORT or CHAR) it's another story, because display lists suddenly become really slow, while the other GL call paths get a bit of a boost! So what? I don't know. The one significant performance win among the extensions is VAR from NVIDIA, which is not part of a standard OpenGL implementation!

The problem with glVertexPointer and friends is that the driver doesn't know how many vertexes there are in the array. It just knows where to start looking, and when you call DrawElements(), it has to add the indexes you hand it (scaled by the stride) to the base pointer to figure out where each vertex lives.
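To make that concrete, here's a minimal sketch of the call sequence in question (NUM_VERTS and NUM_INDICES are placeholders; the arrays are assumed filled in elsewhere):

    #include <GL/gl.h>

    GLfloat  verts[3 * NUM_VERTS];    /* xyz per vertex */
    GLushort indices[NUM_INDICES];

    void draw(void)
    {
        glEnableClientState(GL_VERTEX_ARRAY);
        glVertexPointer(3, GL_FLOAT, 0, verts);  /* base pointer + stride only */

        /* For every index the driver computes verts + 3*index; it cannot
           know the extent of verts[] without reading every index first. */
        glDrawElements(GL_TRIANGLES, NUM_INDICES, GL_UNSIGNED_SHORT, indices);
    }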

For software T&L, here’s the implication:

This means that a driver either naively re-transforms a vertex each time it gets referenced in the index list, or that the driver needs to keep a cache of post-transform vertexes with a flag for whether that vertex has already been transformed. Drawing a vertex then adds the operation “check cache” before “transform vertex”. It is also impossible for the driver to take advantage of special memory streaming instructions, because it doesn’t know how far into the array it can index (unless it scans the index array up-front, which takes extra time).
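In rough pseudo-driver code, the cached variant might look like this (pure speculation; transform_vertex() and emit_vertex() are made-up names, not any real driver's internals):

    /* Hypothetical software-T&L inner loop with a post-transform cache. */
    typedef struct {
        float clip[4];   /* post-transform position */
        int   done;      /* has this vertex been transformed yet? */
    } CacheEntry;

    static void draw_indexed(const float *verts, const GLushort *indices,
                             int count, CacheEntry *cache)
    {
        int i;
        for (i = 0; i < count; ++i) {
            GLushort idx = indices[i];
            if (!cache[idx].done) {                     /* "check cache"...     */
                transform_vertex(cache[idx].clip,       /* ...before "transform */
                                 verts + 3 * idx);      /*    vertex"           */
                cache[idx].done = 1;
            }
            emit_vertex(cache[idx].clip);   /* scattered access pattern */
        }
    }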

For hardware T&L, here’s the implication:

The hardware doesn’t know how far it can fetch data; it has to do pretty much the same thing as the software driver in keeping a cache of post-transform vertexes and look up the vertex there before it decides to fetch it out of the original array – or just re-fetch and re-transform every time. Further, the driver doesn’t know how far to lock down memory to make accessing the vertex data using bus mastering easy, so it may have to spoon-feed vertex data to the hardware, or run a first pass over the index array just to figure out the minimum and the maximum for itself.
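That up-front scan is trivial but not free; it's one full pass over the index array just to learn the extent of the data:

    /* The min/max pass a range-less driver may be forced into: */
    GLuint lo = ~0u, hi = 0;
    int i;
    for (i = 0; i < count; ++i) {
        if (indices[i] < lo) lo = indices[i];
        if (indices[i] > hi) hi = indices[i];
    }
    /* only now can it lock verts[lo..hi] for bus mastering */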

DrawRangeElements helps the driver, by telling it “look, the vertex data lives between A and B, and I promise to not go outside this range.”
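In code, it's the same draw as before, but with the "A to B" promise made explicit (first and last being whatever minimum and maximum indices you actually use):

    glDrawRangeElementsEXT(GL_TRIANGLES,
                           first,             /* A: lowest index used  */
                           last,              /* B: highest index used */
                           NUM_INDICES,
                           GL_UNSIGNED_SHORT,
                           indices);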

For a software T&L card, this is good because the driver can use SSE or 3DNow! cache-control instructions to rip through all the vertexes once, storing them all in a post-transform buffer, which is significantly faster than transforming them in a scattered access pattern. It can then walk the index list and just pull post-transform data for each index, without having to worry about whether the vertex has already been transformed or not.
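Reusing the made-up names from the earlier sketch (post[] being a scratch buffer with last - first + 1 entries), the range-aware version collapses to two simple loops:

    /* One linear streaming pass over the whole promised range: */
    for (i = first; i <= last; ++i)
        transform_vertex(post[i - first].clip, verts + 3 * i);

    /* ...then the index walk is an unconditional lookup, no cache check: */
    for (i = 0; i < count; ++i)
        emit_vertex(post[indices[i] - first].clip);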

For a hardware T&L card, the benefit is similar; the driver can lock down the area of memory where it knows the vertexes live and just tell the card to have a go (depending on how the hardware works, it may do the full transform-of-everything here, or just set up bus mastering). The driver can do this without having to scan the index array for the minimum and maximum indexes used, thus saving time.

Note that the benefit (in theory) of DrawRangeElements is very similar to that of compiled vertex arrays (LockArrays). The additional savings in LockArrays is that if you re-use the same vertex data in a subsequent call to Draw{Range}Elements, the driver knows it doesn’t even need to re-transform the vertex data at all, and thus the speed-up for every pass after the first is substantial; you cannot make this optimization when using DrawRangeElements because it says nothing about what the application may do with the array after DrawRangeElements returns.
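For comparison, here's the CVA pattern (glLockArraysEXT / glUnlockArraysEXT from EXT_compiled_vertex_array): lock once, draw multiple passes over the same data.

    glLockArraysEXT(0, NUM_VERTS);     /* promise: vertex data won't change */

    glDrawElements(GL_TRIANGLES, NUM_INDICES, GL_UNSIGNED_SHORT, indices);
    /* ...switch blend mode / texture for a second pass... */
    glDrawElements(GL_TRIANGLES, NUM_INDICES, GL_UNSIGNED_SHORT, indices);
    /* the second pass may reuse post-transform results from the first */

    glUnlockArraysEXT();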

PS: all this explanation is pure speculation based entirely on my reading of the GL spec, and my intuition about how I'd design the hardware/driver interface if I worked on this kind of hardware. So I'd benefit from being proven right or wrong by someone who actually does that for a living :)

It seems to me that glDrawRangeElements has limited usefulness. I would think a faster method would be an interface where you lay the indices out in order in memory, then call some kind of glLock-type function that promises the driver you will not change this array of indexed strips (or triangles, or fans) until you call glUnlock. The user would then make one call to something like glDrawIndexElements, which draws the strip at the particular pointer value you give it (also passing the number of vertices in the strip).
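Purely hypothetically (none of these entry points exist in any real GL; the names just illustrate the proposal):

    glLockIndicesEXT(indices, NUM_INDICES); /* promise: indices won't change */

    glDrawIndexElements(GL_TRIANGLE_STRIP,
                        indices + stripOffset,  /* where this strip starts */
                        stripLength);           /* vertices in this strip  */
    /* ...more strips drawn from the same locked index memory... */

    glUnlockIndicesEXT();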

Or, they could use an asynchronous method like VAR, where the driver gets to pull the data out of the index array as needed. This, of course, requires NV_fence to synchronize access to the strips.
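A rough sketch of that NV_fence handshake, for reference: make sure the GPU has finished reading before the application rewrites the data.

    GLuint fence;
    glGenFencesNV(1, &fence);

    glDrawElements(GL_TRIANGLES, NUM_INDICES, GL_UNSIGNED_SHORT, indices);
    glSetFenceNV(fence, GL_ALL_COMPLETED_NV);  /* mark the end of GPU reads */

    /* ...later, before writing new data into the VAR memory... */
    glFinishFenceNV(fence);                    /* block until the GPU is done */
    glDeleteFencesNV(1, &fence);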

In any case, I don’t see glDrawRangeElements giving a large speed increase over glDrawElements. But, as long as it doesn’t slow things down, there’s no harm in using it.


Cheers guys, especially jwatte for the info on how it's done.
The reason I asked about glDrawRangeElementsEXT and its benefits is that I'm trying to construct a very artificial case (i.e. no rendering, just transforming) where using it gives an advantage over glDrawElements. So far I've yet to see an advantage like the one CVA gives, for example; if anything a slight decrease, which contradicts a part of the spec: "Performance should be at least as good as it was calling glDrawElements alone."
I'll keep trying, I guess.
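For reference, the kind of timing loop I mean (timer() stands in for whatever high-resolution timer you have; the glFinish() calls keep the measurement from cutting off mid-pipeline):

    double t0, t1, t2;
    int i;

    glFinish(); t0 = timer();
    for (i = 0; i < ITERATIONS; ++i)
        glDrawElements(GL_TRIANGLES, NUM_INDICES, GL_UNSIGNED_SHORT, indices);
    glFinish(); t1 = timer();

    for (i = 0; i < ITERATIONS; ++i)
        glDrawRangeElementsEXT(GL_TRIANGLES, 0, NUM_VERTS - 1,
                               NUM_INDICES, GL_UNSIGNED_SHORT, indices);
    glFinish(); t2 = timer();
    /* compare t1 - t0 against t2 - t1 */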

PS: I haven't got a GPU card, thus all the talk about VAR is not much use to me, for now.

Our newest drivers do support this and do get a speedup from it on GF.

  • Matt

Does the same apply to non-GPU cards, e.g. the TNT2 (Vanta)? I know it supports the extension, but does it give any performance benefit?