About a decade ago, I experimented with using C++ inlining to re-encapsulate the X Window API, taking a couple of levels out of the call chain along with lots of redundant error checking. The result was that throughput rose by a modest 10%, but CPU involvement on the client dropped a lot (by about 50%, as I recall).
Out of curiosity, I built Mesa 5.1 on my Linux box with debugging enabled and stepped through to see what the average glVertex and glColor call generates. My conclusion is that by providing some higher-level, object-oriented calls, you could flatten out some of the call hierarchy, eliminate redundant error checking, flag setting, etc., and probably speed up the client by a factor of 5, since the amount of per-operation manipulation in OpenGL is quite a bit higher than in X. I'm assuming you are creating a quad or polygon and setting colors on the vertices; there are enough calls in that sequence that there is a lot of room to squeeze. It's possible that the array-based stride functions provide a way to set the attributes while avoiding the overhead in a similar way; I intend to take a look.
The question is whether this would have a significant impact on the throughput of the system. The call sequence is complex enough that I don't fully understand it yet, so I thought I would just post this to get anyone's opinion. If the bottleneck is in rendering, then all this optimization would do is dramatically lower CPU utilization and only slightly increase performance.
I was also curious whether there are any resources explaining the limitations and performance characteristics of graphics cards and of Mesa. In the literature from hardware vendors you see numbers like 50 million polygons/second, but when I build a simple scene with 5 spheres, each made of 48x48 quads, on an NVIDIA FX 5200 card, it doesn't seem very fast. The only place there is a sleep in my code is in the keyboard routine; I assume the display is automatically limited by the refresh rate, not by the polygon count?
On the other hand, the card has 128MB on it, but when I tried to map a 10MB texture, it crashed. When I reduced the bitmap to ~1.5MB, it worked; I haven't zeroed in on the limit. Does anyone know what it is? Also, when mapping the 1.5MB image onto a sphere, the refresh rate noticeably slows down. The window is 1280x1024 for this test, and clearly the size of the window has something to do with it.