I suppose that this is somewhat driver related, but I thought I'd ask your opinion...
Would it be faster to replace a tight glPushMatrix/glPopMatrix pair with simply restoring the earlier matrix by loading it with glLoadMatrix?
You have to check that yourself, but pushing/popping is very fast. Loading is not quite the same procedure; it is more like calling glTranslate/glRotate.
Yeah, I'll run some tests on my own, I just hoped that someone could give me a quick insight before I knock up a test bed in vain.
Thanks, Madman!
Push/Pop is done directly in hardware with a hardware matrix stack (just make sure that you do not overflow it). Load is done from system memory over the AGP bus, so it is NOT at all as fast. I even think the GL spec says "use glLoadIdentity() instead of glLoadMatrix( my_identity_matrix ) since the latter may be slower", or something similar, which may give you a clue.
That's exactly what I wanted to hear, thanks, Marcus!
Marcus,
Are you sure the push and pop are in a hardware stack? I don't have any benchmarking data that would indicate that it's done in hardware. Seems to be a thing done in the driver, as far as I can tell.
What hardware are you thinking about?
"If you can't afford to do something right,
you'd better make sure you can afford to do it wrong!"
jwatte,
I don't have any hard proof. I think I have read it in (at least) one place before, but I may be wrong. It just seems odd to me to limit the matrix stack depth unless it's implemented in hardware. Also, it makes perfect sense to do it in hardware, since it really only requires a quite small memory buffer to implement it, and push/pops can be quite frequent in certain applications. You probably need a mirrored software stack too, in order to do fast glGets (without having to stall the pipe or go over the bus).
Anyway, that's my 2 cents. I haven't designed any GL hardware/drivers myself.
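To make the stack semantics under discussion concrete, here is a minimal software sketch of a fixed-depth matrix stack, roughly what a "mirrored software stack" could look like. It is only an illustration: the `MatrixStack` type, the `stack_*` function names and `STACK_DEPTH` are all hypothetical, and nothing here claims to show how any real driver or card implements glPushMatrix/glPopMatrix. The depth of 32 is chosen because that is the minimum modelview depth the OpenGL 1.x spec requires.

```c
#include <string.h>

/* Hypothetical depth; 32 is the minimum modelview depth in OpenGL 1.x. */
#define STACK_DEPTH 32

typedef struct {
    float m[STACK_DEPTH][16]; /* 4x4 matrices, column-major like OpenGL */
    int top;                  /* index of the current (top) matrix */
} MatrixStack;

/* Reset the stack to a single identity matrix. */
void stack_init(MatrixStack *s)
{
    memset(s->m, 0, sizeof s->m);
    s->top = 0;
    s->m[0][0] = s->m[0][5] = s->m[0][10] = s->m[0][15] = 1.0f;
}

/* Pointer to the 16 floats of the current matrix. */
float *stack_top(MatrixStack *s)
{
    return s->m[s->top];
}

/* Like glPushMatrix: duplicate the top matrix. Returns 0 on overflow. */
int stack_push(MatrixStack *s)
{
    if (s->top + 1 >= STACK_DEPTH)
        return 0;
    memcpy(s->m[s->top + 1], s->m[s->top], sizeof s->m[0]);
    s->top++;
    return 1;
}

/* Like glPopMatrix: discard the top matrix. Returns 0 on underflow. */
int stack_pop(MatrixStack *s)
{
    if (s->top == 0)
        return 0;
    s->top--;
    return 1;
}

/* Like glLoadMatrixf: overwrite the top matrix with 16 caller-supplied floats. */
void stack_load(MatrixStack *s, const float *src)
{
    memcpy(s->m[s->top], src, sizeof s->m[0]);
}
```

In this sketch a pop only moves an index, while a load copies 16 floats supplied by the caller. Whether that difference matters in a real driver is exactly the open question in this thread.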
It would make sense for a TnL card to use a hardware based stack.
On the other hand, the driver has to issue a command over the AGP bus anyway.
Assuming such a command is 32 bits, it could be sent in one AGP cycle. If you attach the matrix to the command (in case the stack is software based) you would have to transfer 32 bits + (16 * 32 bits) = 68 bytes, or 17 AGP cycles, for every push or pop.
So we are speaking of a difference of 3.76e-9 seconds vs. 6.39e-8 seconds at AGP4x.
Of course I don't know "how it's done", that's only what my math tells me.
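The arithmetic above can be reproduced with a few lines of C. This is only a back-of-the-envelope sketch: it assumes AGP 4x moves one 32-bit word per transfer at roughly 266 million transfers per second (the rounded figure that matches the numbers in the post), and it ignores latency, protocol overhead and command batching. The function and macro names are made up for this example.

```c
#include <stdio.h>

/* Assumption: AGP 4x, about 266 million 32-bit transfers per second. */
#define AGP4X_TRANSFERS_PER_SECOND 266.0e6

/* Seconds to move `words` 32-bit words, ignoring latency and overhead. */
double agp4x_transfer_seconds(unsigned words)
{
    return (double)words / AGP4X_TRANSFERS_PER_SECOND;
}

/* A bare push/pop command: one 32-bit word. */
unsigned command_only_words(void)
{
    return 1u;
}

/* A command followed by a 4x4 float matrix: 1 + 16 words = 68 bytes. */
unsigned command_plus_matrix_words(void)
{
    return 1u + 16u;
}

/* Print both estimates in the same notation as the post. */
void print_agp_estimate(void)
{
    printf("command only:     %.2e s\n",
           agp4x_transfer_seconds(command_only_words()));
    printf("command + matrix: %.2e s\n",
           agp4x_transfer_seconds(command_plus_matrix_words()));
}
```

Calling `print_agp_estimate()` gives the per-call estimates for both cases. Either way the absolute times are tiny, so any real difference would only show up with a very large number of pushes and pops per frame.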
[This message has been edited by HS (edited 04-20-2003).]
For the decision between glPush/PopMatrix and glLoadMatrix, it doesn't really matter if Push/Pop is implemented in hardware. I would use glPush/Pop because it COULD be implemented in hardware and save bandwidth, whereas glLoadMatrix will transfer the matrix for sure.
I am pretty sure that even on T&L cards the matrix stack is done by the driver in software, but who knows when the first card that can do matrix manipulation in hardware will appear?
This topic is very similar to the discussion about doing T&L yourself vs. letting OpenGL do it, back when both methods ended up using CPU power. Now, a few years later, it is obvious which method to prefer, because we have hardware T&L cards.
The point is, if OpenGL can do something for you, DON'T try to implement it in software, because the worst case is that the driver does it equally fast/slow, but it could be that it is faster.
Originally posted by Overmind:
The point is, if OpenGL can do something for you, DON'T try to implement it in software, because the worst case is that the driver does it equally fast/slow, but it could be that it is faster.
Exactly my opinion (it goes for many other things than OpenGL too, but OpenGL is a very good example of this philosophy).
There is usually very little point in trying to optimize something in software that can be done by drivers/APIs. If something is done in software in the drivers, chances are very good that the driver writers are very competent and likely to do a better job than you would anyway. Even IF you can do a better job, that marginal performance gain is usually wiped out in 6 months or so due to improved drivers, better hardware, new PC configurations, etc.
...and if people ask more from the drivers, HW vendors will be forced to do better drivers.