Why wouldn’t the driver be able to do frustum culling for display lists?
For one, with vertex programs in particular, vertices may not be transformed the way you expect. To a vertex program, the position data is just a group of floats; there is no guarantee that the user isn't doing something silly like using it as texture coordinates.
Granted, that's an outside chance, but it's entirely plausible that the user isn't running positions through the OpenGL modelview or projection matrices at all. And the driver certainly isn't going to execute the vertex program on the CPU just to find out where the output positions end up.
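To make that concrete, here is a minimal, entirely hypothetical ARB vertex program, loaded from C (assuming the ARB_vertex_program entry points are already resolved). It applies no modelview/projection transform at all and reuses the incoming position as a texture coordinate; a driver looking only at the vertex data has no way to frustum-cull geometry fed through something like this:

```c
#include <GL/gl.h>
#include <GL/glext.h>
#include <string.h>

/* Hypothetical program: the clip-space position comes straight from
 * the vertex data (no modelview/projection applied), and the same
 * floats are also reused as a texture coordinate. */
static const char *vp_src =
    "!!ARBvp1.0\n"
    "MOV result.texcoord[0], vertex.position;\n"
    "MOV result.position, vertex.position;\n"
    "END\n";

static void bind_odd_vertex_program(void)
{
    GLuint prog;
    glGenProgramsARB(1, &prog);
    glBindProgramARB(GL_VERTEX_PROGRAM_ARB, prog);
    glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(vp_src), vp_src);
    glEnable(GL_VERTEX_PROGRAM_ARB);
}
```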
Of course, the driver could simply disable that frustum culling whenever the user binds a vertex program, but then I'd still be at least partially correct. And as more work moves into vertex programs, optimizations for the non-vertex-program case become less and less important.
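For comparison, this is roughly the kind of test a driver could apply to fixed-function geometry (the names and structure here are my own illustration, not any actual driver's code): extract the six frustum planes from the current modelview-projection matrix and reject anything whose bounding sphere lies fully behind one of them. Once a vertex program is bound, those matrices mean nothing, so the test has to be skipped:

```c
/* A frustum plane in object space: a point is inside when
 * nx*x + ny*y + nz*z + d >= 0.  The six planes would be extracted
 * from the combined modelview-projection matrix -- exactly the
 * assumption a vertex program invalidates. */
typedef struct { float nx, ny, nz, d; } Plane;

static int sphere_outside_frustum(const Plane planes[6],
                                  float cx, float cy, float cz, float r)
{
    int i;
    for (i = 0; i < 6; ++i) {
        /* signed distance from the sphere center to this plane */
        float dist = planes[i].nx * cx + planes[i].ny * cy
                   + planes[i].nz * cz + planes[i].d;
        if (dist < -r)
            return 1;  /* entirely behind one plane: cull */
    }
    return 0;          /* touching or inside the frustum: draw */
}
```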
Driver issues aside, indexed VBOs can take advantage of the post-T&L vertex cache.
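As a sketch (assuming an OpenGL 1.5 context; the geometry here is made up), two triangles sharing an edge illustrate the point: the repeated indices let the shared vertices come out of the post-T&L cache instead of being transformed again:

```c
#include <GL/gl.h>

/* Four shared vertices, six indices: vertices 0 and 2 are each
 * referenced twice, so their transformed results can be served
 * from the post-T&L cache on the second use. */
static const GLfloat verts[] = {
    -1.f, -1.f, 0.f,
     1.f, -1.f, 0.f,
     1.f,  1.f, 0.f,
    -1.f,  1.f, 0.f,
};
static const GLushort indices[] = { 0, 1, 2,   0, 2, 3 };

static void draw_indexed_quad(void)
{
    GLuint vbo, ibo;

    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

    glGenBuffers(1, &ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices,
                 GL_STATIC_DRAW);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (const void *)0);

    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (const void *)0);
}
```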
Technically, there’s no reason why indexed data rendered into a display list can’t use the post-T&L cache either. After all, the display list code could easily just create a VBO itself.
That doesn't mean current DL implementations actually use the post-T&L cache; it just means that, technically, they could.
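Conceptually, that would look something like this (pure pseudo-driver code, nothing a real implementation exposes): the display-list compiler captures the geometry into internal buffer objects at glNewList time and replays it as an ordinary indexed draw at glCallList time, which is all the post-T&L cache needs:

```c
/* Hypothetical internal record built while compiling a display list. */
struct dl_geometry {
    GLuint  vbo;    /* vertex data captured at glNewList time */
    GLuint  ibo;    /* index data captured at glNewList time  */
    GLsizei count;  /* number of indices to draw              */
};

/* What glCallList could boil down to for such a record: an ordinary
 * indexed draw, no different from a user-created VBO. */
static void dl_replay(const struct dl_geometry *g)
{
    glBindBuffer(GL_ARRAY_BUFFER, g->vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, g->ibo);
    glVertexPointer(3, GL_FLOAT, 0, (const void *)0);
    glDrawElements(GL_TRIANGLES, g->count, GL_UNSIGNED_SHORT,
                   (const void *)0);
}
```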
Old Nvidia cards took a lot of cycles rendering DLs. Newer Nvidia cards and ATI cards took very, very few cycles.
Were these geometry-only display lists, or did they include state changes as well?