1k triangles at 60fps - acceptable?
Yesterday I played around a little with OpenGL, and one thing astounded me: I can only draw 1k triangles before the frame rate drops to 60 fps. That's only 60k triangles per second, not much in my book.
Is this acceptable/expected performance, meaning I should strive to reduce the number of vertices that are processed? Or am I doing something terribly wrong with the API?
* Vertices in buffer: 3000
* Vertex type: 3 x GL_FLOAT
* Draw method: glDrawArrays (no indexing)
* Buffer usage type: GL_STATIC_DRAW (not updated every frame, obviously)
* Shaders: simplistic (pass through + constant color)
* Textures: no
* Multisampling: no
* Blending: no
* Depth: no
* Explicit synchronization: no
* GPU Perf Monitor CPU time: 0.2%
* GPU Perf Monitor GPU time: 99%
* GPU Perf Monitor GPU utilization: low!
* Windowed: yes (640x480)
* Machine: Radeon HD 4850 (512 MB), AMD Phenom X2, 6GB
* OpenGL version: Core OpenGL 3.3
Last edited by red1939; 08-23-2013 at 05:12 AM.
Reason: Too small vertices count.
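As a quick sanity check on the numbers in the list above, here is a back-of-envelope calculation. It is a sketch that assumes a 4-byte GL_FLOAT (OpenGL defines GLfloat as 32 bits) and three unindexed vertices per triangle, as with glDrawArrays; the variable names are illustrative only.

```python
# Back-of-envelope figures for the setup in the question.
# Assumptions: GL_FLOAT is 4 bytes; glDrawArrays without indexing uses
# 3 vertices per triangle; positions only (3 floats per vertex).
TRIANGLES_PER_FRAME = 1000
FPS = 60
VERTICES_PER_TRIANGLE = 3
FLOATS_PER_VERTEX = 3
BYTES_PER_FLOAT = 4

vertices = TRIANGLES_PER_FRAME * VERTICES_PER_TRIANGLE
triangles_per_second = TRIANGLES_PER_FRAME * FPS
vertex_buffer_bytes = vertices * FLOATS_PER_VERTEX * BYTES_PER_FLOAT
vertex_bytes_per_second = vertex_buffer_bytes * FPS

print(f"{vertices} vertices in the buffer")                 # 3000
print(f"{triangles_per_second} triangles/s")                # 60000
print(f"{vertex_buffer_bytes} bytes of vertex data")        # 36000
print(f"{vertex_bytes_per_second / 1e6:.2f} MB/s fetched")  # 2.16
```

About 2 MB/s of vertex data is tiny next to the tens of GB/s of framebuffer traffic estimated further down the thread, which suggests vertex count is probably not the bottleneck here.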
It depends on what else you're doing. You can easily drop your program well below 60 fps with nothing but a screen-aligned quad, which consists of only two triangles.
However, given your hardware, your obviously trivial shader code and the very low resolution, 60k seems ridiculously low. Are you sure you ever get above 60 fps, and aren't capped by vsync the entire time?
The problem is that I am not doing anything at all: no application logic, empty (trivial) shaders, no multisampling, etc.
It's not capped at 60 fps (I believe), as when I decrease the number of triangles, the fps goes higher.
I am using statically linked glfw v.3.0 and glew 1.1.0, together - of course - with opengl32.lib.
The only hint I could get from GPU Perf is that almost all of the GPU time is taken by the so-called "Interpolator". Sure, the triangles are as big as the whole window, but that doesn't explain such low performance.
Actually, the visible area covered by primitives does matter a great deal. The more fragments are generated, the more interpolation has to take place. With a growing number of exports (i.e. the stuff you pass out of one stage into another, e.g. vertex shader -> fragment shader) this overhead increases, not to mention that some values, such as depth, are always interpolated. Plus, the more fragments, the more fragment shader invocations (probably not a problem in your case, though).
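To make the point about covered area concrete, this sketch estimates fragment counts from screen-space triangle area using the shoelace formula. It assumes no multisampling (so roughly one fragment per covered pixel) and ignores the exact rasterization rules at triangle edges. Both scenes are hypothetical: the first models "triangles as big as the window" as 1,000 triangles each covering half of a 640x480 window, the second uses the same count of small 10x10-pixel triangles.

```python
# Rough estimate of per-frame fragment (and therefore interpolation) work.
# Assumption: no multisampling, so the fragment count is approximately
# the triangle's screen-space area in pixels.

def triangle_area_px(a, b, c):
    """Screen-space area of a triangle in pixels (shoelace formula)."""
    (x0, y0), (x1, y1), (x2, y2) = a, b, c
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2.0

def fragments_per_frame(triangles):
    """Approximate fragments generated by a list of (a, b, c) triangles."""
    return sum(triangle_area_px(*t) for t in triangles)

W, H = 640, 480
# Hypothetical scenes with the same triangle count:
big = [((0, 0), (W, 0), (0, H))] * 1000      # each covers half the window
small = [((0, 0), (10, 0), (0, 10))] * 1000   # each covers 50 pixels

print(fragments_per_frame(big))    # 153600000.0
print(fragments_per_frame(small))  # 50000.0
```

Same triangle count, over 3,000 times the fragment work: each of those fragments needs its interpolated values and a fragment shader invocation.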
Just as a comparison: if I'm not mistaken, a few years back I was able to push around 3M vertices through a GeForce 8600M GS at approx. 30 fps, and even at the time that was pretty crappy hardware. That was also with very simple shaders. Can you post some code? Shaders, state initialization, rendering loop? Do you actually have the depth test disabled?
Have you tried some OpenGL based game to see if you get bad performance there?
I've tried with the depth test both disabled and enabled, but I don't see any real difference in performance. Also, I haven't had many problems with OpenGL titles.
As the lovely link-prevention system blocks me from posting URLs, you will have to create the pastebin links manually from these:
Oh, it does. 640x480 x 500 quads x 60 fps = 9.2e9 pixels/second.
Originally Posted by red1939
That's 28 Gbyte/sec at 3 bytes/pixel or 37 Gbyte/sec at 4 bytes/pixel; depending upon the width of the memory bus and the clock rate that could realistically be saturating the memory bandwidth. That could explain why the "GPU utilization" says "low". Early-depth optimisation won't help in this case, as you'd just be replacing writes to the colour buffer with reads from the depth buffer.
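The fill-rate arithmetic above can be reproduced in a few lines. This sketch uses the post's own assumptions: 500 full-window quads per frame at 640x480 and 60 fps, every covered pixel written once to the colour buffer, and 3 or 4 bytes per pixel. Compression and caching are not taken into account, and the function names are illustrative.

```python
def pixels_per_second(width, height, layers, fps):
    """Pixels written per second when `layers` full-window quads are drawn each frame."""
    return width * height * layers * fps

def colour_write_gbytes(pixels, bytes_per_pixel):
    """Colour-buffer write traffic in GB/s (1 GB = 1e9 bytes), ignoring compression."""
    return pixels * bytes_per_pixel / 1e9

px = pixels_per_second(640, 480, 500, 60)
print(f"{px:.2e} pixels/s")  # 9.22e+09
for bpp in (3, 4):
    print(f"{bpp} bytes/pixel: {colour_write_gbytes(px, bpp):.1f} GB/s")
```

The results, about 27.6 and 36.9 GB/s, match the rounded 28 and 37 Gbyte/sec quoted in the post.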
If you're comparing the triangle counts against a game, games aren't drawing a thousand 640x480 triangles per frame.
I see, so in other words, a more realistic scenario would be to draw these triangles at some distance, or at least smaller?
Well, the bigger question is why you're drawing so many triangles, all of them overlapping, that close to the screen. Or, more to the point: what are you drawing?
Yes. A more realistic test would keep the overdraw (the average number of times any given pixel is drawn) in single figures.
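Average overdraw can be estimated as total fragments per frame divided by the number of screen pixels. This sketch assumes the thread's test scene (500 full-window quads at 640x480) and, for comparison, a hypothetical scene with an overdraw of 3 at the same resolution and 60 fps.

```python
def average_overdraw(fragments_per_frame, width, height):
    """Average number of times each screen pixel is covered per frame."""
    return fragments_per_frame / (width * height)

W, H, FPS = 640, 480, 60

# Test scene from the thread: 500 full-window quads per frame
test_scene_fragments = W * H * 500
print(average_overdraw(test_scene_fragments, W, H))  # 500.0

# Hypothetical scene kept in single figures: overdraw of 3
realistic_fragments = W * H * 3
print(realistic_fragments * FPS)                     # 55296000 pixels/s
```

At an overdraw of 3 the fill rate is about 5.5e7 pixels per second, far below the 9.2e9 pixels/second of the test scene.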
Originally Posted by red1939
Real programs make some effort to ignore parts which can't be seen, so the total fill rate is limited to some low multiple of the number of screen pixels.