display list still burning CPU cycles

hi,
I have an OpenGL program in which I put the entire rendering into a single display list, so my rendering loop is literally three calls:

glClear
glCallList
glutSwapBuffers
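
For reference, a minimal sketch of that loop (assuming GLUT; g_sceneList is a placeholder for the list ID, built once at startup):

void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glCallList(g_sceneList);   /* the entire scene, compiled once at startup */
    glutSwapBuffers();
}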

I noticed that my CPU is still 100% busy, just like in my other version that doesn’t use a DL. They run at the same frame rate, too. The only difference I noticed is that with the DL, 50% of the time is kernel time, versus almost 100% user time without the DL. Nothing else was running when I did my test.

I have the impression that display lists are precompiled streams and should not cost CPU cycles. So why is the CPU still so busy?

My rendering contains a lot of multitexturing and register combiner code (set up through OGL calls), so everything should be hardware accelerated.

any help/suggestions are greatly appreciated.

Ruigang

p.s. I have a GF4 Ti4600 with the latest driver from Nvidia (30.82)

Can you make sure vertical sync is enabled?

  • Matt

Vertical sync is on. But I don’t think it is related. My program can only run at 10-15 fps, so the GPU is really busy, but I don’t understand why the CPU is busy too.

Isn’t it the case that once the CPU sends the list ID to the graphics card, its job is done?

thanks

Ruigang

Isn’t it the case that once the CPU sends the list ID to the graphics card, its job is done?

Only assuming the graphics card has the ability to natively parse every GL command that can be copied into a DL.

I don’t have Matt’s experience in this field, but I want to know precisely what is in this display list. It is entirely possible that some OpenGL commands that are executed on the CPU (where they are fast) are being compiled into the DL. This would require the CPU to stall until that DL command is reached, process it, and then continue on. For example, if you’re using the matrix stack, you may need CPU intervention to do the matrix stack processing, depending on whether or not the matrix stack is in hardware. There are other things you may do that require CPU intervention and still get copied into DL’s.
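
For example (a sketch, assuming the matrix calls really are the culprit; drawScene and modelview are hypothetical placeholders), you could record the list without the matrix calls and issue them immediately each frame instead:

/* Build once: geometry and texturing only, no matrix calls. */
GLuint list = glGenLists(1);
glNewList(list, GL_COMPILE);
drawScene();                  /* hypothetical helper holding the rest of the GL calls */
glEndList();

/* Each frame: do the possibly-CPU-side matrix work outside the list. */
glMatrixMode(GL_MODELVIEW);
glLoadMatrixf(modelview);     /* 16 GLfloats computed on the CPU this frame */
glCallList(list);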

Originally posted by Korval:
…I want to know precisely what is in this display list.

The list contains many of the following commands:

glCombinerXXXX
glTexEnv
glTexParameter
glVertex
glBindTexture
glActiveTextureARB
glMultiTexCoord2fARB
glTexGen
glCopyTexSubImage2D
glEnable
glDisable
glColorMask
glBlendFunc
glLoadMatrixf
glMatrixMode
glViewport

Basically, I am doing multi-pass texturing with frequent changes of the register combiner and the texture environment.
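
Roughly, one pass as compiled into the list looks like this (a sketch; the texture IDs are placeholders and the combiner setup is elided):

glActiveTextureARB(GL_TEXTURE0_ARB);
glBindTexture(GL_TEXTURE_2D, tex0);
glActiveTextureARB(GL_TEXTURE1_ARB);
glBindTexture(GL_TEXTURE_2D, tex1);
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
/* ...glCombinerInputNV/glCombinerOutputNV calls to set up the combiners... */
glBegin(GL_TRIANGLES);
glMultiTexCoord2fARB(GL_TEXTURE0_ARB, 0.0f, 0.0f);
glMultiTexCoord2fARB(GL_TEXTURE1_ARB, 0.0f, 0.0f);
glVertex3f(0.0f, 0.0f, 0.0f);
/* ...remaining vertices... */
glEnd();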

Is there any way to tell which command(s) are causing the stall? Or is it documented anywhere?

Thanks a lot

Ruigang

Is there any way to tell which command(s) are causing the stall? Or is it documented anywhere?

It’s implementation-dependent. Maybe Matt can give you some tips about which ones are and are not CPU-side, but I think that might be proprietary information (and certainly of little use when it comes to ATi cards).

In general, however, you will probably be safe putting pure geometry in display lists: glDrawElements, things that go between glBegin/glEnd. You probably can’t get away with glBindTexture, as this could cause an AGP copy and probably requires some CPU intervention to work properly. Anything else would be nothing more than speculation about the internals of a GeForce and its drivers.

The only things you should put in display lists are those that will never change, or at least not often. They take a lot of time to build, and it would be much faster to just call the commands outright each frame if you’re creating the list every frame. Also, any 3D app that doesn’t call Sleep(0) or something similar after flipping the front/back buffers will use all of your CPU time. Need more info…

I only create the display list once. It is WITHIN the list that I have to change the texture environment frequently.

Maybe I can put my question this way: given the constraint that my app has to change the texture environment a lot within a rendering pass, is there any way to reduce the CPU usage?

I don’t know if I agree with your last sentence. It is only the case if vertical sync is off.

They’ve got a pretty good point, yang11. Where is the CPU busy: in the glCallList(), or in the glutSwapBuffers()?

If it is in glutSwapBuffers, then it can be ignored. This call will block the CPU until rendering is finished, so that it can then swap the frame buffers. After all, swapping the buffers makes no sense if it happens before rendering is finished.

If the time is in glCallList, then it is what I said before: some of your GL commands are being done in software, so the call must block until those GL commands can be processed.
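
One crude way to see where the blocking happens (a sketch; getTimeMsec is a hypothetical millisecond timer, and printf needs <stdio.h>; remember GL is pipelined, so this measures CPU blocking only, not GPU work):

double t0 = getTimeMsec();
glCallList(sceneList);                 /* sceneList: your display list ID */
double t1 = getTimeMsec();
glutSwapBuffers();
double t2 = getTimeMsec();
printf("glCallList: %.1f ms, glutSwapBuffers: %.1f ms\n", t1 - t0, t2 - t1);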

given the constraint that my app has to change the texture environment a lot within a rendering pass, is there anyway to reduce the CPU usage?

That depends, as I said. Assuming that the blocking is happening in glCallList, we can only speculate on which GL calls are causing it. Put only geometry in display lists, and call other functions outside of DL’s. This probably won’t actually help, however; maybe your rendering is just CPU-limited. If possible, eliminate redundant state changes and texture bindings, as in the sketch below.
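
Shadowing the state yourself is enough to skip redundant binds (a sketch; bindTexture2D is a hypothetical wrapper you would call everywhere instead of glBindTexture):

static GLuint currentTex2D = 0;

void bindTexture2D(GLuint tex)
{
    if (tex != currentTex2D) {   /* only touch the driver when the binding actually changes */
        glBindTexture(GL_TEXTURE_2D, tex);
        currentTex2D = tex;
    }
}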

I notice that you do a glCopyTexSubImage2D() inside your display list. This may cause the CPU to convert that texture image, depending on formats.

Another method: try taking things out one after another until you find the one that causes the problem.

Sorry to be a bit obvious here, but if his frame loop is a real frame loop (“process all messages, draw a frame, loop to process all messages…”), it will take 100% of CPU time whenever all other apps are waiting on user messages. Things like Explorer, Notepad, Outlook, and Word receive messages comparatively rarely, so the renderer simply happens to be the most CPU-intensive task on the system (relative to the other tasks).


So you’re asking whether OpenGL allows you to change texture environments in a display list? If what you change them to depends only on position within that display list, then, in theory, you should be able to. I’m not exactly up to date on the GL spec, and know nothing of these kinds of technicalities; I suggest dropping the display list altogether and using something like glDrawElements(). Or you could always create “sublists,” between which you may change the texture environments (see the sketch below)… do whatever works for you.
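
The “sublists” idea would look something like this (a sketch; the geometry and the texenv settings are placeholders):

/* Build once: one list per batch of geometry, no state changes inside. */
GLuint base = glGenLists(2);        /* two contiguous list names */
glNewList(base, GL_COMPILE);
/* ...glBegin/glVertex/glEnd for the first pass's geometry... */
glEndList();
glNewList(base + 1, GL_COMPILE);
/* ...geometry for the second pass... */
glEndList();

/* Each frame: the environment changes happen outside, between the sublists. */
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);
glCallList(base);
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
glCallList(base + 1);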

It seems that I can make calls to texenv within a display list, because my rendering results are correct.

I have to change the tex environment frequently within a DL for my app. So I guess there is really no way to get the CPU time down… that’s really bad.

yang11,

Your impression of how display lists work is somewhat incorrect. They are precompiled streams, but they are not necessarily compiled in a way that requires no CPU work.

GL implementations are completely free to implement display lists as a list of opcode/data groups (like GLX protocol) and play them back like this:

unsigned int *dlptr = list_start;            /* start of the compiled token stream */
while (*dlptr != END_OF_LIST) {
    switch (*dlptr) {
    case DLIST_GL_BEGIN:
        glBegin(dlptr[1]);                   /* one data word: the primitive mode */
        dlptr += 2;                          /* opcode + mode */
        break;
    case DLIST_GL_VERTEX_3F:
        glVertex3fv((GLfloat *)(dlptr + 1)); /* three data words: x, y, z */
        dlptr += 4;                          /* opcode + three floats */
        break;
    /* ...one case per command that can be compiled... */
    }
}

The SGI sample implementation looks very much like that, if I recall correctly.

The fact that commands are placed into display lists doesn’t mean that the CPU doesn’t have to do anything with them.

Oftentimes, commands (particularly state-changing commands) need CPU intervention. There are a number of things that drivers may have to worry about: queries, the interaction of various state configurations with how the hardware is programmed, and whatnot.

Even if a display list could be fully eaten by graphics hardware, CPU intervention may be desirable. Some “name-brand” OpenGL apps do things like query matrices or other state fairly frequently. If all the commands were eaten by hardware, implementing a query would basically require a Finish() so the CPU could get the proper state from the hardware.
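
For instance, something as innocent as this forces the driver to keep the current matrix available on the CPU side (or else sync with the hardware to answer):

GLfloat m[16];
glGetFloatv(GL_MODELVIEW_MATRIX, m);   /* the driver must produce this state immediately */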

A couple other considerations:

(1) A 100% CPU utilization metric means very little by itself. If your immediate-mode code runs at 50 fps while your DL code runs at 100 fps, the DL code uses far less CPU per frame.

(2) The 50% kernel time may be system idle time, where the CPU is just plain bored. If that’s the case (I can’t tell from your data), you may be hitting some other bottleneck, where your app is limited by the transform, setup, or fill-rate characteristics of your GPU, or by some other limiting factor. When all is said and done, on most devices the main job of the CPU is to tell the GPU what to do. (On software T&L devices, the CPU does the transform before telling the “GPU” what to do.) If the GPU keeps getting further and further behind, it’s pointless for the CPU to give it more work.

I’m going to have to try that last analogy on my manager…

Hope this helps,
Pat

I just want to point out that even dragging a window around causes the CPU usage to jump… plus, as long as the card can handle everything you throw at it, you will be working all the time to keep it happy (so you get higher fps).

The 50% kernel time when using display lists is because some part of them is executed in software (in the ring-0, kernel-mode driver); the CPU time for the list execution is now being shared between user-mode and kernel-mode code…
