Hello to the whole GL community,
This post is not a question but an attempt to give beginners an overview of OpenGL performance and to demystify some common delusions. Everything written here relates to NVIDIA's implementation in the 19x.xx drivers for Windows. I'd be glad if you would take part and broaden this overview to other vendors (AMD/ATI in the first place).
The first three delusions I'll comment on are:
1. Using OpenGL 3.2 Core Profile significantly boosts the application speed
2. Shaders are faster than fixed functionality
3. Bindless Graphics boosts speed by an order of magnitude
Some months ago, I read in one of the posts that the GL 3.2 Core Profile enables much cheaper function calls. I was very excited about that, and it was my prime reason for switching to the "new technology". But after spending some time "porting" an application to GL 3.2, I realized that the boost is negligibly small, if it exists at all. A change of a few percent was due to code reorganization, not to cheaper function calls. To be more direct: using the OpenGL 3.2 Core Profile on NVIDIA currently does not change the speed even a bit! Of course, my intent is not to turn you away from the new programming model. Furthermore, I am using it myself. I just want to emphasize that switching to GL 3.2 Core does not bring any speed boost. I measured the execution time of the glMultiDrawElements() function as well as the frame rate of the application, and noticed no change in either.
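For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing helper. It is not the code I actually used: the name medianMilliseconds and the choice of the median are assumptions made for illustration. In a real test the callable would wrap the glMultiDrawElements() call, optionally followed by glFinish() if the GPU time should be included as well.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper (an assumption, not from the original measurements):
// runs `draw` `samples` times and returns the median wall-clock time of a
// single run in milliseconds. Returns 0.0 when `samples` is not positive.
double medianMilliseconds(const std::function<void()>& draw, int samples) {
    if (samples <= 0) {
        return 0.0;
    }
    std::vector<double> times;
    times.reserve(static_cast<std::size_t>(samples));
    for (int i = 0; i < samples; ++i) {
        const auto start = std::chrono::steady_clock::now();
        draw();  // e.g. a lambda calling glMultiDrawElements(...) and glFinish()
        const auto stop = std::chrono::steady_clock::now();
        times.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    // The median is less sensitive to occasional spikes than the mean.
    std::sort(times.begin(), times.end());
    return times[times.size() / 2];
}
```

Running the same helper once with a compatibility context and once with a 3.2 Core context, on the same scene, gives two numbers that can be compared directly.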
Because fixed functionality is still supported (no matter how it is implemented), it is unlikely that those functions take different paths through the pipeline than shaders do. It is also unlikely that you can implement any of that functionality better than the system developers can. Furthermore, shaders are usually used to extend standard functionality, which means more computation in the shaders. To be more direct: shaders are at least as slow as fixed functionality, if not slower! On the other hand, we can use some tricks and skip some calculations to speed shaders up, but an implementation of the full fixed functionality in shaders will certainly be slower than, or in the best case equal to, the standard fixed functionality.
Bindless Graphics is one of the best things that happened in the OpenGL world in the past year. Porting an application to it was a very pleasant experience (although, until I resolved some bugs in my application, some very severe crashes happened). Because I'm using (tens of) thousands of VBOs in the scene, I thought the Bindless extensions were something I had to try, and I haven't regretted it. With 65025 VBOs in the scene (and from 600K to 10M triangles), the speed gain was from 50% to 70%. The greatest speed gain achieved with the bindless extensions (in all test cases) was 2x. Although I didn't achieve 7.5x, I'm very satisfied with "just" 1.5-2x. (For fewer than 1K VBOs there is no speed gain at all.) Another great feature of the Bindless extensions is that they support both fixed functionality and shaders.
Table 1 shows the results of testing on a textured and lit terrain. The values in the gray columns are pseudo frame rates (the reciprocal of the rendering time); greater values are better. The values in the yellow columns are speed gain factors.
Table 2 shows how the triangle and VBO counts map to the LODx and block size (64, 128 or 256) values.
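To make the two kinds of columns concrete, here is a small sketch of how such values can be computed. The function names are made up for illustration, and I am assuming that the speed gain factor is the ratio of the tested pseudo frame rate to the baseline one, so that a factor of 1.5 corresponds to a 50% gain.

```cpp
#include <stdexcept>

// Pseudo frame rate: the reciprocal of the rendering time (in seconds).
double pseudoFrameRate(double renderSeconds) {
    if (renderSeconds <= 0.0) {
        throw std::invalid_argument("rendering time must be positive");
    }
    return 1.0 / renderSeconds;
}

// Speed gain factor (assumed definition): tested pseudo frame rate divided
// by the baseline pseudo frame rate. Values above 1.0 mean the tested
// configuration is faster than the baseline.
double speedGain(double baselineSeconds, double testedSeconds) {
    return pseudoFrameRate(testedSeconds) / pseudoFrameRate(baselineSeconds);
}
```

With purely hypothetical rendering times of 10 ms for the baseline and 5 ms for the tested path, speedGain(0.010, 0.005) gives a factor of 2.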