yang11,
Your impression of how display lists work is somewhat incorrect. They are pre-compiled command streams, but they are not necessarily compiled in a way that requires no CPU work at playback time.
GL implementations are completely free to implement display lists as a list of opcode/data groups (like the GLX protocol) and play them back like:
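    unsigned int *dlptr;    /* assumed to point at the first opcode of the list */
    while (*dlptr != END_OF_LIST) {
        switch (*dlptr) {
        case DLIST_GL_BEGIN:
            glBegin(dlptr[1]);                   /* one operand: the primitive mode */
            dlptr += 2;                          /* skip opcode + 1 data word */
            break;
        case DLIST_GL_VERTEX_3F:
            glVertex3fv((GLfloat *)(dlptr+1));   /* three floats follow the opcode */
            dlptr += 4;                          /* skip opcode + 3 data words */
            break;
        …
        }
    }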
The SGI sample implementation looks very much like that, if I recall correctly.
The fact that commands are placed into display lists doesn’t mean that the CPU doesn’t have to do anything with them.
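To make that concrete without needing a GL context, here is a toy, self-contained sketch of the same idea: a "compile" step that appends opcode/data words to a buffer, and a playback loop the CPU has to walk. Every name in it (ToyList, DL_BEGIN, toy_play, and so on) is made up for illustration and is not taken from any real driver or from the SGI sample implementation. It also assumes a float fits in one unsigned int word.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes for a toy display list; not from any real driver. */
enum { DL_END = 0, DL_BEGIN = 1, DL_VERTEX_3F = 2 };

#define DL_MAX_WORDS 256

/* Assumption: a float occupies exactly one stream word. */
typedef char toy_float_is_one_word[(sizeof(float) == sizeof(unsigned int)) ? 1 : -1];

typedef struct {
    unsigned int words[DL_MAX_WORDS]; /* opcode/data stream */
    size_t used;                      /* words written so far */
} ToyList;

/* Counters standing in for the real work a driver would do per command. */
typedef struct {
    unsigned int begins;
    unsigned int vertices;
    float last_vertex[3];
} ToyStats;

void toy_init(ToyList *dl) { dl->used = 0; }

/* "Compile" step: append an opcode plus its single operand. */
void toy_record_begin(ToyList *dl, unsigned int mode)
{
    assert(dl->used + 2 < DL_MAX_WORDS); /* keep one word free for DL_END */
    dl->words[dl->used++] = DL_BEGIN;
    dl->words[dl->used++] = mode;
}

/* "Compile" step: append an opcode plus three float operands. */
void toy_record_vertex3f(ToyList *dl, float x, float y, float z)
{
    float v[3];
    v[0] = x; v[1] = y; v[2] = z;
    assert(dl->used + 4 < DL_MAX_WORDS);
    dl->words[dl->used++] = DL_VERTEX_3F;
    memcpy(&dl->words[dl->used], v, sizeof v); /* store the raw float bits */
    dl->used += 3;
}

/* Terminate the stream so playback knows where to stop. */
void toy_end(ToyList *dl)
{
    assert(dl->used < DL_MAX_WORDS);
    dl->words[dl->used++] = DL_END;
}

/* Playback: the CPU walks the stream and dispatches every command. */
void toy_play(const ToyList *dl, ToyStats *st)
{
    const unsigned int *p = dl->words;
    while (*p != DL_END) {
        switch (*p) {
        case DL_BEGIN:
            st->begins++;
            p += 2; /* opcode + 1 data word */
            break;
        case DL_VERTEX_3F:
            memcpy(st->last_vertex, p + 1, sizeof st->last_vertex);
            st->vertices++;
            p += 4; /* opcode + 3 data words */
            break;
        default:
            return; /* unknown opcode: stop rather than run off the end */
        }
    }
}
```

Recording one begin and two vertices, then calling toy_play(), runs the switch once per recorded command. The list saves the app from re-issuing the calls, but the CPU still touches every command on playback.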
Often, commands (particularly state-changing commands) need CPU intervention. There are a number of things that drivers may have to worry about: queries, the way various state configurations interact in how the hardware is programmed, and so on.
Even if a display list could be fully eaten by graphics hardware, CPU intervention may be desirable. Some “name-brand” OpenGL apps do things like query matrices or other state fairly frequently. If all the commands were eaten by hardware, implementing a query would basically require a Finish() so the CPU could get the proper state from the hardware.
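One way to picture the trade-off is a rough sketch in which the driver keeps a CPU-side copy of the matrix, so a query can be answered without draining the hardware. This is only an illustration under my own assumptions: ToyContext and the toy_* functions are hypothetical, the "hardware" is just a counter, and real drivers are free to handle this differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical driver-side context; everything here is illustrative. */
typedef struct {
    float shadow_modelview[16];      /* CPU copy of the current matrix */
    unsigned int hw_commands_queued; /* sent to hardware, maybe not executed yet */
    unsigned int finishes;           /* times we had to drain the hardware */
} ToyContext;

void toy_ctx_init(ToyContext *c)
{
    int i;
    memset(c, 0, sizeof *c);
    for (i = 0; i < 4; i++)
        c->shadow_modelview[i * 5] = 1.0f; /* identity diagonal */
}

/* State change: the CPU updates its shadow copy, then queues the command. */
void toy_load_matrix(ToyContext *c, const float m[16])
{
    memcpy(c->shadow_modelview, m, sizeof c->shadow_modelview);
    c->hw_commands_queued++;
}

/* Query answered from the CPU shadow: no stall, the queue is left alone. */
void toy_get_matrix_shadowed(const ToyContext *c, float out[16])
{
    memcpy(out, c->shadow_modelview, sizeof c->shadow_modelview);
}

/* Query with no usable shadow: wait for the hardware to catch up first. */
void toy_get_matrix_from_hw(ToyContext *c, float out[16])
{
    c->hw_commands_queued = 0; /* stand-in for a Finish(): drain everything */
    c->finishes++;
    /* Stand-in for a hardware readback; the toy just reuses the same array. */
    memcpy(out, c->shadow_modelview, sizeof c->shadow_modelview);
}
```

The shadowed query costs a memcpy, while the other path serializes the CPU with the GPU on every call. Keeping the shadow current is exactly the kind of CPU work that a state-changing command inside a display list can still require.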
A couple of other considerations:
(1) A 100% CPU utilization metric means very little by itself. If your immediate mode code runs at 50 fps while your DL code runs at 100 fps, both at 100% utilization, the DL code uses half as much CPU time per frame (10 ms instead of 20 ms).
(2) The 50% kernel time may be system idle time, where the CPU is just plain bored. If that’s the case (I can’t tell from your data), you may be hitting some other bottleneck where your app is limited by the transform, setup, or fill-rate characteristics of your GPU. Or some other limiting factor. When all is said and done, on most devices the main job of the CPU is to tell the GPU what to do. (On software T&L devices, the CPU does the transform before telling the “GPU” what to do.) If the GPU keeps getting further and further behind, it’s pointless for the CPU to give it more work.
I’m going to have to try that last analogy on my manager…
Hope this helps,
Pat