OS X OpenGL performance

Hi!

I am puzzled by OpenGL performance on OS X. I get very bad performance, much lower than with the corresponding hardware under Windows and/or Linux.

After trying several Macs, I am starting to believe it may be an OS/driver problem. This can be seen by running trispd.c, a nice benchmark written by B. Paul that renders triangles of similar size using large triangle strips.

(The code is part of Mesa, but for ease of access, I placed a copy, including a Mac OS X binary, under http://www.cse.ogi.edu/~csilva/trispd.)

On a PB G4 I get 2.7 Mtri/s for “-size 50” and even with “-size 1” the triangle rate is no higher than 2.8 Mtri/s. The pixel performance seems fine though. The maximum I get is around 400 Mpixel/s.

Is this a known problem? Is it supposed to be solved in the next release of the operating system?

Thanks,

Claudio.

As far as I know, there are no known performance problems with “standard” paths through OpenGL.

I’ve always found OpenGL performance on MacOSX to feel (i.e., subjectively) better than on PCs with similar graphics hardware.

There are some performance problems with glReadPixels and display lists in 10.1; my understanding is that those are vastly improved for 10.2.

The MacOSX GLUT implementation calls the display function twice per idle if the idle callback calls glutPostRedisplay.

It looks like that demo is testing immediate mode submission speeds (!), so you’re probably actually profiling the cost of calling an OpenGL function rather than the cost of rendering.
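To put a rough number on "the cost of calling an OpenGL function", here is a small sketch in C that counts the API calls needed to submit one triangle strip. It assumes exactly one glVertex call per vertex between glBegin and glEnd (trispd.c itself is not shown in this thread and may also set colors or other state), and both helper names are invented for illustration.

```c
#include <assert.h>

/* Hypothetical helper: API calls needed to submit one strip of
 * n_triangles triangles in immediate mode, assuming exactly one
 * glVertex* call per vertex. A strip of n triangles has n + 2 vertices. */
long immediate_mode_calls(long n_triangles)
{
    assert(n_triangles >= 1);
    long vertices = n_triangles + 2;
    return 1 /* glBegin */ + vertices /* glVertex* */ + 1 /* glEnd */;
}

/* Hypothetical helper: calls for the same strip once it lives in a
 * vertex array: one glVertexPointer plus one glDrawArrays,
 * regardless of how long the strip is. */
long vertex_array_calls(void)
{
    return 2;
}
```

For a long strip the immediate-mode count grows by one call per triangle, so the measured triangle rate is bounded by how many function calls per second the CPU can make.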

Nobody’s pretending that calling OpenGL functions isn’t expensive; it looks like it’s more expensive on PPC/Mach-O than on either Linux or Windows. That doesn’t surprise me.

Nobody wanting performance would ever actually use immediate mode for the bulk of their rendering, though, so the benchmark is meaningless.

I have a G4 733/GF2MX here; that program is reporting 1.7MTris/s where I know for a fact that it is possible to get more than 6MTris/s on the same hardware (I’ve done it).

That theory is borne out by further experimentation – by adding
#include <OpenGL/OpenGL.h>
#include <OpenGL/CGLMacro.h>
to the top of the file, and
CGLContextObj CGL_MACRO_CONTEXT = CGLGetCurrentContext();
to the top of every function making OpenGL calls (thus eliminating a significant part of the function-call overhead), I increased my performance according to the program to 2.4MTris/s.
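A rough picture of why this helps, as far as I understand it: a plain gl* entry point has to find the current context on every call, while the CGLMacro.h versions dispatch through the context already held in the local CGL_MACRO_CONTEXT. The sketch below is a mock in plain C, not Apple's implementation; every name in it (mock_context, mock_get_current_context, MOCK_MACRO_CONTEXT and so on) is made up for illustration.

```c
#include <stddef.h>

/* Mock context with a one-entry dispatch table. All names are hypothetical. */
typedef struct mock_context {
    void (*vertex2f)(struct mock_context *ctx, float x, float y);
    long vertices_seen;   /* how many vertices reached the "driver" */
    long lookups;         /* how often the current context was looked up */
} mock_context;

mock_context *g_current = NULL;   /* stands in for per-thread current-context state */

void mock_vertex2f_impl(mock_context *ctx, float x, float y)
{
    (void)x; (void)y;
    ctx->vertices_seen++;
}

void mock_make_current(mock_context *ctx) { g_current = ctx; }

mock_context *mock_get_current_context(void)
{
    if (g_current != NULL)
        g_current->lookups++;
    return g_current;
}

/* Plain entry point: pays for a context lookup on every call. */
void mock_glVertex2f(float x, float y)
{
    mock_context *ctx = mock_get_current_context();
    ctx->vertex2f(ctx, x, y);
}

/* Macro form: dispatches through a context variable already in scope. */
#define MOCK_MACRO_VERTEX2F(x, y) \
    (MOCK_MACRO_CONTEXT->vertex2f(MOCK_MACRO_CONTEXT, (x), (y)))

void submit_plain(int n)
{
    for (int i = 0; i < n; i++)
        mock_glVertex2f((float)i, 0.0f);
}

void submit_macro(int n)
{
    /* One lookup per function instead of one per call. */
    mock_context *MOCK_MACRO_CONTEXT = mock_get_current_context();
    for (int i = 0; i < n; i++)
        MOCK_MACRO_VERTEX2F((float)i, 0.0f);
}
```

Submitting 100 vertices through submit_plain records 100 context lookups; submit_macro records one. The real per-call saving depends on how the lookup is implemented, which this mock does not model.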

Adding -O3 -funroll-loops got me to 2.5MTris/s.

All that means is that basically, this isn’t a test of OpenGL performance; it’s a test of overhead in calling an OpenGL function, and to a lesser extent, a test of the compiler.
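As a back-of-envelope check on those numbers, the sketch below converts a triangle rate into a per-triangle time and CPU cycle budget. It assumes the G4 mentioned above runs at 733 MHz and that the CPU, not the graphics card, is the limit; the function names are invented for this example.

```c
#include <assert.h>

/* Nanoseconds available per triangle at a given rate in Mtri/s. */
double ns_per_triangle(double mtri_per_s)
{
    assert(mtri_per_s > 0.0);
    return 1000.0 / mtri_per_s;
}

/* CPU cycles per triangle: clock in MHz divided by rate in Mtri/s. */
double cycles_per_triangle(double cpu_mhz, double mtri_per_s)
{
    assert(mtri_per_s > 0.0);
    return cpu_mhz / mtri_per_s;
}
```

At 1.7 Mtri/s this works out to roughly 590 ns, or about 430 cycles at 733 MHz, per triangle; at 2.5 Mtri/s it is 400 ns, or about 290 cycles. Under those assumptions, that is the scale of overhead the benchmark is measuring.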

I get your point, but this is still quite disappointing. Using Linux with a GF3, I’ve been able to see rates of over 12 Mtri/s on an AMD 900 MHz PC. In my opinion, it should not be necessary to rewrite all of one’s code to achieve decent performance on OS X.

How did you get 6 Mtri/s? Using vertex arrays?

Thanks,

Claudio.

It shouldn’t be a rewrite – you should be using vertex arrays/display lists already. Immediate mode is for low-geometry situations or quick experimentation only.
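For anyone who hasn't used vertex arrays, here is a minimal sketch in C of the idea: build the strip's vertices once in memory, then hand the whole array to OpenGL in a fixed number of calls. The names build_strip, draw_strip and the DRAW_WITH_OPENGL switch are invented for this example; the drawing part uses only OpenGL 1.1 vertex array calls and needs a current GL context, so it is compiled only when that switch is defined.

```c
#include <assert.h>
#include <stddef.h>

#ifdef DRAW_WITH_OPENGL          /* hypothetical switch; requires a current GL context */
#ifdef __APPLE__
#include <OpenGL/gl.h>
#else
#include <GL/gl.h>
#endif
#endif

/* Fill xy with the 2D vertices of a horizontal triangle strip made of
 * n_triangles triangles with edge length size. Vertices alternate
 * between the bottom row (y = 0) and the top row (y = size).
 * xy must have room for 2 * (n_triangles + 2) floats.
 * Returns the number of vertices written. */
int build_strip(float *xy, int n_triangles, float size)
{
    assert(xy != NULL && n_triangles >= 1);
    int n_vertices = n_triangles + 2;
    for (int i = 0; i < n_vertices; i++) {
        xy[2 * i]     = (float)(i / 2) * size;       /* x advances every two vertices */
        xy[2 * i + 1] = (i % 2 == 0) ? 0.0f : size;  /* alternate bottom / top */
    }
    return n_vertices;
}

#ifdef DRAW_WITH_OPENGL
/* Submit the whole strip with a fixed number of calls, however long it is. */
void draw_strip(const float *xy, int n_vertices)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, xy);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, n_vertices);
    glDisableClientState(GL_VERTEX_ARRAY);
}
#endif
```

The array only has to be rebuilt when the geometry changes, so a static strip costs a handful of calls per frame instead of one call per vertex.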

My peak performance has been achieved with pre-release versions of 10.2, so I don’t know whether that kind of performance is possible with 10.1.

It’s not in Apple’s interests to spend their limited resources optimizing for a case that will never see heavy use in a shipping, commercial product.

Although I see your point, and I appreciate your help, I don’t really agree with you about immediate mode performance. I believe that immediate mode is quite important, and something that is worth optimizing for. I would certainly encourage Apple and their vendors to optimize that path.

Also, my further experimentation does not indicate that other OpenGL paths are optimized either. For instance, I changed the code to use display lists, and although it got faster by an impressive 50%, that only took me to 4Mtri/s on 10.1.5.

Anyway, it seems 10.2 would probably be better optimized, since OpenGL is such a major part of it. I am looking forward to trying it out.

Also, I think all of us should pressure Apple to really optimize the performance, and improve the quality of their drivers. I really like their machines, and I hope to use them for serious development.

Claudio.

To close this thread, I just wanted to say that I got a nice message from people at Apple that explained the performance issues related to driving OpenGL at full speed. The information correlates quite well with the previous explanation (from OneSadCookie) that the problem is the overhead of making the immediate mode calls. The good news is that there will be more efficient mechanisms for driving the 3D graphics hardware in 10.2 through the use of GL_APPLE_vertex_array_range.

The bottom line is that it is possible to achieve superb 3D graphics performance on OS X, although the techniques might be slightly different from those available elsewhere.

Finally, I plan to update the trispd.c code when 10.2 becomes available, and post it here for reference.
