speed of GL_PROJECTION vs GL_MODELVIEW?

I have observed on my system (a low-end nVidia card of the Doom 3 era) that changing GL_PROJECTION seems to be much slower than changing GL_MODELVIEW.

This makes sense to me because most renderers set up the GL_PROJECTION space about once per frame but change GL_MODELVIEW about once per model.

But I’m curious about what specific overheads are likely to be going on in the card/driver during the GL_PROJECTION setup that makes it so much slower than GL_MODELVIEW.

What do you mean by “changing”?
i.e. calling glMatrixMode( GL_PROJECTION ),
or (with the matrix mode already set)
calling glLoadMatrix(), glMultMatrix(), etc.?

If the latter, then changing the projection matrix will alter the number of ‘contained’ vertices, i.e. how many end up inside the view frustum and hence the clipping workload (actually, so will changing the modelview matrix).

Are you testing without drawing anything?
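If not, something like this would isolate just the matrix calls (a rough sketch; timeNow() is a made-up stand-in for your platform's timer):

    /* rough sketch: time only the matrix calls, with no drawing */
    glFinish();                          /* drain queued GL work first */
    double t0 = timeNow();
    glMatrixMode( GL_PROJECTION );       /* swap in GL_MODELVIEW to compare */
    for( int ii = 0; ii < 100000; ++ii )
        glLoadIdentity();
    glFinish();                          /* wait until the driver is done too */
    double t1 = timeNow();
    /* report t1 - t0 for each mode */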

3e8 m/s

I’m sorry someone had to say it. :stuck_out_tongue:

No, but seriously, how are you doing your updates? If, like you say, you’re doing modelview, modelview, modelview for most of the frame, then perhaps the data you’re storing is forcing the data that forms the projection matrix out of the cache, or a few caches somewhere between point A (CPU) and point B (cathode ray tube/LCD pixel thingy). Also, when you set your projection matrix, are you using a call like gluPerspective or glFrustum? Those have to do a little more complicated math and work than glRotate, glTranslate, glScale, glAdIDidntSayBanana.
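If it’s the extra math in those helpers, one thing to try (just a sketch, untested) is paying for gluPerspective once and re-loading the cached values afterwards:

    /* sketch: do gluPerspective's math once, then restore the projection
       with a plain matrix load from then on */
    GLdouble proj[16];

    glMatrixMode( GL_PROJECTION );
    glLoadIdentity();
    gluPerspective( 60.0, 4.0 / 3.0, 1.0, 1000.0 );   /* trig + divides */
    glGetDoublev( GL_PROJECTION_MATRIX, proj );       /* cache the result */

    /* later, whenever the projection needs resetting: */
    glMatrixMode( GL_PROJECTION );
    glLoadMatrixd( proj );                            /* no math, just a copy */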

Perhaps what you’re actually seeing is the time spent by the previous frame swapping buffers and/or clearing buffers and not really the projection transform?

I’m not a doctor, I really don’t know what’s causing the “slow down” you describe; how massive is it? Do you think we should alert the military? Just kidding, sorry, it’s just that I have sarcasm compiled into my mental kernel. Seriously though, how much of a slowdown is it?

You can write your code using Vertex Programs only, so you don’t have to use the GL_MODELVIEW / GL_PROJECTION matrices.
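Roughly like this (my own sketch; u_mvp, prog and computeMvp() are invented names): compose the matrices yourself on the CPU and hand the product to the shader as one uniform, so glMatrixMode never enters the picture.

    /* sketch: bypass GL_MODELVIEW / GL_PROJECTION entirely */
    static const char* vsSource =
        "uniform mat4 u_mvp;\n"
        "void main() { gl_Position = u_mvp * gl_Vertex; }\n";

    GLfloat mvp[16];
    computeMvp( mvp );                   /* projection * modelview, on the CPU */
    glUseProgram( prog );                /* prog built from vsSource elsewhere */
    glUniformMatrix4fv( glGetUniformLocation( prog, "u_mvp" ),
                        1, GL_FALSE, mvp );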

Originally posted by execom_rt:
You can write your code using Vertex Programs only, so you don’t have to use the GL_MODELVIEW / GL_PROJECTION matrices.
Would a high-end GPU of today run (all of) e.g. DOOM 3? I don’t think I’ve seen any pins on the chips used for real-time input, such as kbd or mouse. :wink:

But seriously, even if I don’t see why a projection matrix switch would take any longer than a modelview matrix switch: is there anywhere (on the web) where the “cost” of different operations is recorded, both in terms of GPU and CPU time used, and in the GPU case both the time for the call and the time for the finished operation, for different chips/cards, drivers, buses and so on, wrt OpenGL?

I know I’m drifting OT now, but I’ve been thinking about this for a while, and since it still bugs me I suspect it could bug more people out there.

I myself have often wondered “is it really worth setting up this client state, or even server state, for just these n stray vertices, or would I be better off using plain glVertex etc. calls?” (used just as an example to communicate the idea; in general it often involves far more state changing and setup).
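Concretely, the kind of choice I mean, sketched with made-up data (verts, n):

    /* path A: set up client state for just these n vertices */
    glEnableClientState( GL_VERTEX_ARRAY );
    glVertexPointer( 3, GL_FLOAT, 0, verts );
    glDrawArrays( GL_TRIANGLES, 0, n );
    glDisableClientState( GL_VERTEX_ARRAY );

    /* path B: plain immediate mode */
    glBegin( GL_TRIANGLES );
    for( int i = 0; i < n; ++i )
        glVertex3fv( verts + 3 * i );
    glEnd();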

Doing local profiling, I can only observe the timings for my particular setup(s). It could be that what’s good for speed on chip X is bad on Y, or that Y on bus B2 is faster for that op than X on bus B1, or that X is CPU bound while Y is I/O bound. So without a publicly available source listing these numbers (with CPU usage numbers for the calls), aren’t we all basically left to profile ourselves on every card/machine combo we can get our hands on (if we care about the quality of what we release)? That means we’re almost always left with incomplete data to make a decision, and unsatisfied users as a result.

    for( int ii = 0; ii < 100000; ++ii )
    {
        glMatrixMode( GL_MODELVIEW );   /* the line swapped to GL_PROJECTION below */
        glLoadIdentity();
        glScalef( 1.0f, 1.0f, 0.5f );
    }

My PC can complete this loop once per frame at 104 frames per second. Change the matrix mode line to GL_PROJECTION and I see only 63 frames per second.

I must get to the bottom of this!!!

Obviously a different, lazier, driver writer worked on the GL_PROJECTION path. This person must be found and punished.

How often do you actually change the projection matrix in a frame??

I think driver developers have more important things to do…

I’m currently, in my renderer, doing this:

for( int i = 0; i < numModels; ++i )
{
    glMatrixMode( GL_MODELVIEW );
    glPushMatrix();
    glMultMatrixf( models[i].matrix );  /* append the per-model transform */
    drawModel( models[i] );             /* draws with a vertex shader;
                                           models[], drawModel() stand in
                                           for my own types */
    glPopMatrix();
}

I haven’t tested it, but is sending the matrix as a uniform faster? Like this:

for( int i = 0; i < numModels; ++i )
{
    /* upload the per-model matrix straight to a (different) vertex shader */
    glUniformMatrix4fv( modelMatrixLoc, 1, GL_FALSE, models[i].matrix );
    drawModel( models[i] );
}

In the second I save a lot of state changes and function calls, but the vertex shader is slower. Can anyone give me a hint before I go and change a lot of code for nothing? :smiley:
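Roughly, the two vertex shaders I’m comparing would look like this (simplified; u_model and u_viewProj are made-up names):

    /* first loop: the shader reads the fixed-function matrix stack */
    const char* vsFirst =
        "void main() { gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex; }";

    /* second loop: one extra mat4 multiply per vertex */
    const char* vsSecond =
        "uniform mat4 u_model;\n"        /* set per model */
        "uniform mat4 u_viewProj;\n"     /* set once per frame */
        "void main() { gl_Position = u_viewProj * (u_model * gl_Vertex); }";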

Originally posted by Overmind:
[b]How often do you actually change the projection matrix in a frame??

I think driver developers have more important things to do…[/b]
Is this to do with the depth slice rendering topic? In that case you’d change the projection matrix once per world slice pass, times two if running in stereo (e.g. four slices in stereo means eight projection loads per frame).
Even so, it’s still bugger all work on the scale of things.