OpenGL Performance boost and questions

About a decade ago, I tried as an experiment to use C++ inlining to re-encapsulate the X Window API, taking a couple of levels out, along with lots of redundant error checking. The result was that performance rose by a modest 10%, but the client's CPU usage dropped a lot (probably about 50%, as I recall).

Out of curiosity, I built Mesa 5.1 on my Linux box with debugging and stepped through to see what typical glVertex and glColor calls generate. My conclusion is that by providing some higher-level object-oriented calls, you could flatten out some of the call hierarchy, eliminate redundant error checking, flag setting, etc., and probably speed up the client by a factor of 5, since the amount of manipulation in OpenGL per unit operation is quite a bit higher than in X. I'm assuming you are creating a quad or polygon and setting colors on the vertices. There are enough calls in the sequence that there is a lot of room to squeeze. It's possible that the array-based stride functions provide a way to set the attributes while avoiding the overhead in a similar way; I intend to take a look.
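For reference, here is a minimal sketch (assuming compatibility-profile OpenGL 1.1; the struct layout and function names are just illustrative) of the two submission styles being compared: per-vertex immediate-mode calls versus the stride-based array functions.

```c
/* Illustrative sketch: the same quad submitted two ways.
   Immediate mode makes one library call per attribute per vertex;
   the stride-based arrays hand OpenGL a pointer once and let a
   single glDrawArrays cover all vertices. */
#include <GL/gl.h>

struct Vertex {          /* interleaved color + position, 28 bytes */
    GLfloat r, g, b;
    GLfloat x, y, z, w;
};

static const struct Vertex quad[4] = {
    {1, 0, 0,  -1, -1, 0, 1},
    {0, 1, 0,   1, -1, 0, 1},
    {0, 0, 1,   1,  1, 0, 1},
    {1, 1, 1,  -1,  1, 0, 1},
};

/* Immediate mode: 8 calls plus glBegin/glEnd for a single quad. */
static void draw_immediate(void)
{
    int i;
    glBegin(GL_QUADS);
    for (i = 0; i < 4; i++) {
        glColor3fv(&quad[i].r);
        glVertex4fv(&quad[i].x);
    }
    glEnd();
}

/* Stride-based arrays: the per-vertex work moves inside the library. */
static void draw_arrays(void)
{
    glEnableClientState(GL_COLOR_ARRAY);
    glEnableClientState(GL_VERTEX_ARRAY);
    glColorPointer(3, GL_FLOAT, sizeof(struct Vertex), &quad[0].r);
    glVertexPointer(4, GL_FLOAT, sizeof(struct Vertex), &quad[0].x);
    glDrawArrays(GL_QUADS, 0, 4);
    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_COLOR_ARRAY);
}
```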

The question is, would this have a significant impact on the throughput of the system? It’s complex enough that I don’t fully understand the call sequence yet, so I thought I would just post this to get anyone’s opinion. If the bottleneck is at rendering, then all optimization would do is dramatically lower CPU utilization and slightly increase performance.

I was also curious whether there are any resources explaining the limitations and performance characteristics of graphics cards and of Mesa. In the literature from hardware vendors, you see numbers like 50 million polygons/second, but when I build a simple scene with 5 spheres, each made of 48x48 quads, on an NVIDIA FX 5200 card it doesn't seem very fast. The only place there is a sleep in my code is in the keyboard routine, so I assume the display is limited by the refresh rate, not the polygon count?

On the other hand, the card has 128 MB on it, but when I tried to map a 10 MB texture, it crashed. When I reduced the bitmap to ~1.5 MB, it worked; I haven't zeroed in on the limit. Does anyone know what it is? Also, when mapping a 1.5 MB image onto a sphere, the frame rate noticeably slows down. The window is 1280x1024 for this test, and clearly the size of the window has something to do with it.

The million-polygon/sec numbers for hardware refer to the internal pipeline. Usually they mean the peak rate the chip could process if it were working in an infinitely fast environment.
In practice you can reach those numbers only if you submit small geometry where it can be read quickly and generate either no pixels (color mask, culling) or only single-pixel triangles.
It's reachable, but normally other bottlenecks add up first and limit the performance.

The monitor refresh has an effect on animation if you sync to the vertical blank during SwapBuffers; then you cannot get an fps greater than the refresh rate.
The refresh rate also affects effective video memory bandwidth: higher refresh rates and higher display or color resolutions will result in lower performance if you need a lot of fill rate.
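One rough way to check whether you are locked to the refresh rate is simply to time a batch of frames; if the fps sits right at the monitor's rate (60/75/85 Hz), SwapBuffers is syncing to the vertical blank. A hedged sketch, assuming a POSIX system and a hypothetical draw_frame() that renders and swaps:

```c
/* Time a few hundred frames and compare the fps to the monitor's
   refresh rate. draw_frame() is a stand-in for your own rendering
   plus the buffer swap. */
#include <sys/time.h>
#include <stdio.h>

extern void draw_frame(void);   /* hypothetical: render + swap buffers */

static void measure_fps(int frames)
{
    struct timeval t0, t1;
    double seconds;
    int i;

    gettimeofday(&t0, NULL);
    for (i = 0; i < frames; i++)
        draw_frame();
    gettimeofday(&t1, NULL);

    seconds = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%.1f fps\n", frames / seconds);
}
```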

Video memory size is no guarantee of where the driver places a texture; it could also end up in AGP memory if the video memory is used for other buffers.
Check GL_MAX_TEXTURE_SIZE.
If enabling texturing slows down your rendering, and the window size matters too, you are seeing a fill-rate limit kicking in.
Your wallet already told you that the FX 5200 is not a high-end board.
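As a sketch of the GL_MAX_TEXTURE_SIZE check suggested above (the helper name and the RGBA8 format choice are just for illustration), you can also use the proxy texture target to ask whether a particular size/format combination would be accepted:

```c
/* Query the texture limits with a current GL context.
   GL_MAX_TEXTURE_SIZE gives the largest dimension the implementation
   accepts; the proxy target tests whether a specific width/height/
   format combination would actually fit. */
#include <GL/gl.h>
#include <stdio.h>

static void check_texture_limits(int width, int height)
{
    GLint max_size = 0, proxy_width = 0;

    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_size);
    printf("GL_MAX_TEXTURE_SIZE: %d\n", (int)max_size);

    /* Ask the proxy target whether an RGBA8 texture of this size fits;
       a reported width of 0 afterwards means it was rejected. */
    glTexImage2D(GL_PROXY_TEXTURE_2D, 0, GL_RGBA8,
                 width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glGetTexLevelParameteriv(GL_PROXY_TEXTURE_2D, 0,
                             GL_TEXTURE_WIDTH, &proxy_width);
    printf("%dx%d RGBA8 texture %s\n", width, height,
           proxy_width ? "fits" : "was rejected");
}
```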

There are different dispatching methods now than iterating through vertices. The basic idea is to set the data up in memory in such a way that it can be DMA'd to the card; ideally you want the data to live on the card. Look at using VBOs for this. There is a whole history of improving dispatch performance and eliminating application involvement with vertex data each time it's sent: vertex arrays, vertex array range, etc.
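A rough, period-appropriate sketch of the VBO path (using the ARB_vertex_buffer_object entry points; on Linux these are normally fetched via glXGetProcAddressARB, here GL_GLEXT_PROTOTYPES is assumed for brevity, and the helper names are made up):

```c
/* Upload the vertex data once into a buffer the driver can keep in
   video or AGP memory, then draw from it each frame instead of
   walking the vertices in the application. */
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

static GLuint vbo;

static void upload_mesh(const GLfloat *verts, GLsizei bytes)
{
    glGenBuffersARB(1, &vbo);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, bytes, verts, GL_STATIC_DRAW_ARB);
}

static void draw_mesh(GLsizei vertex_count)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    /* NULL pointer = offset 0 into the currently bound buffer. */
    glVertexPointer(3, GL_FLOAT, 0, NULL);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
}
```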

Even in the good old days vertex dispatch was often tuned carefully; for (i = 0; i < numverts; i++) { glVertex4fv(verts[i]); } would have been considered poor code unless you were simply filling a display list. Unrolled loops for various mesh lengths were the weapon of choice.
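For completeness, a small sketch of the display-list case mentioned above (the function name is illustrative): the naive loop is acceptable when it only runs once at setup and the compiled list is replayed each frame.

```c
/* Compile the per-vertex loop into a display list once; the per-call
   overhead is paid at build time, and glCallList replays it cheaply.
   numverts is assumed to be a multiple of 3. */
#include <GL/gl.h>

static GLuint compile_mesh(const GLfloat (*verts)[4], int numverts)
{
    GLuint list = glGenLists(1);
    int i;

    glNewList(list, GL_COMPILE);
    glBegin(GL_TRIANGLES);
    for (i = 0; i < numverts; i++)
        glVertex4fv(verts[i]);
    glEnd();
    glEndList();
    return list;            /* later: glCallList(list); each frame */
}
```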

If you want to understand how Mesa's calling path works with hardware, you should look into the DRI (Direct Rendering Infrastructure), but AFAIK it can call through very efficiently and supports efficient dispatch if you use the right calls.

Mesa is non-accelerated. In other words, it’s a software implementation of OpenGL.

If you want to use your computer's hardware to draw with OpenGL, you need to install the correct drivers and headers. The NVIDIA driver installer has an "--opengl-headers" parameter for installing the header files.

Follow the guide on NVIDIA's homepage for installing the drivers on Linux.

\hornet

Originally posted by hornet:
Mesa is non-accelerated. In other words, it’s a software implementation of OpenGL.
\hornet

Thanks, I have installed drivers from NVIDIA and thought Mesa was using them. I just found the section in the install guide that describes the library situation.

ldd myapp

is showing me that I am linking against Mesa, not the libGL that I want. I am trying to put NVIDIA's libraries first in the search path. But I don't see a GLUT library listed; is that still the same?
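One quick way to double-check at runtime which implementation you actually got (an illustrative addition, not from the original thread) is to print the GL strings once a context is current; the NVIDIA driver reports NVIDIA as the vendor, while the Mesa software path reports a Mesa string.

```c
/* Print which OpenGL implementation the program actually bound to.
   Requires a current GL context, e.g. inside a GLUT display callback. */
#include <GL/gl.h>
#include <stdio.h>

static void report_gl_implementation(void)
{
    printf("GL_VENDOR:   %s\n", (const char *)glGetString(GL_VENDOR));
    printf("GL_RENDERER: %s\n", (const char *)glGetString(GL_RENDERER));
    printf("GL_VERSION:  %s\n", (const char *)glGetString(GL_VERSION));
}
```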

Thanks to all for the excellent (and amazingly fast) responses.


The right way to link to OpenGL, and even to Mesa-based software rendering on Linux (which is what you get these days if you have no hardware acceleration), is to use the standard ABI. Link to the standard GL libs using the standard headers and it'll just work. The DRI is built into XFree86 with a Mesa-based software path, AFAIK. Linking directly to Mesa cuts out all of that and calls straight into a software implementation, so you were almost right.

If you really want to start analyzing performance, you'd be better off with another card, though. The NVIDIA drivers are excellent, but they don't use the standard DRI, so you're really dealing with a black box w.r.t. their implementation. There's no issue with development or linking, though, since the ABI is identical for all OpenGL implementations.