1k triangles at 60fps - acceptable?

Hello everyone!

Yesterday I played around a little with OpenGL, and one thing astounded me: I can only draw 1k triangles before I drop below 60 fps. That’s only 60k triangles per second, not much in my book.

Is this acceptable/expected performance, meaning I should strive to reduce the number of vertices that are processed? Or am I doing something terribly wrong with the API?

Details:

  • Vertices in buffer: 3000
  • Vertices type: 3 x GL_FLOAT
  • Draw method: glDrawArrays (no indexing)
  • Buffer usage type: GL_STATIC_DRAW (not updated every frame, obviously)
  • Shaders: simplistic (pass through + constant color)
  • Textures: no
  • Multisampling: no
  • Blending: no
  • Depth: no
  • Explicit synchronization: no
  • GPU Perf Monitor CPU time: 0.2%
  • GPU Perf Monitor GPU time: 99%
  • GPU Perf Monitor GPU utilization: low!
  • Windowed: yes (640x480)
  • Machine: Radeon HD 4850 (512 MB), AMD Phenom X2, 6GB
  • OpenGL version: Core OpenGL 3.3

It depends on what else you’re doing. There is absolutely no problem dropping the performance of your program way below 60 fps with only a screen-aligned quad which consists of only two triangles.

However, given your hardware setup, your obviously trivial shader code, and the very low resolution, 60k triangles per second seems ridiculously low. Are you sure you are above 60 fps at any time and not capped by vsync the entire time?

The problem is that I am not doing anything at all: no application logic, empty (trivial) shaders, no multisampling, etc.

It’s not capped at 60 fps (I believe), as when I decrease the number of triangles, the fps goes higher.
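One way to back up that belief with numbers is to record frame times directly. The following is only a minimal sketch: `FrameStats` and `secondsBetween` are hypothetical helpers, not part of the poster's code, and they only do the bookkeeping.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not from the original code): accumulates frame
// durations and reports simple averages.
class FrameStats {
public:
    // Record one frame that took `seconds` of wall-clock time.
    void addFrame(double seconds) {
        total_seconds_ += seconds;
        ++frames_;
    }

    // Average frame time in milliseconds (0 if no frames were recorded).
    double averageMs() const {
        return frames_ == 0
                   ? 0.0
                   : 1000.0 * total_seconds_ / static_cast<double>(frames_);
    }

    // Average frames per second (0 if no time was recorded).
    double averageFps() const {
        return total_seconds_ <= 0.0
                   ? 0.0
                   : static_cast<double>(frames_) / total_seconds_;
    }

private:
    double total_seconds_ = 0.0;
    std::size_t frames_ = 0;
};

// Wall-clock seconds elapsed between two steady_clock time points.
inline double secondsBetween(std::chrono::steady_clock::time_point begin,
                             std::chrono::steady_clock::time_point end) {
    return std::chrono::duration<double>(end - begin).count();
}
```

In a render loop one would take a `std::chrono::steady_clock::now()` reading around each buffer swap and feed the difference to `addFrame`. If the average stays pinned near 16.7 ms no matter how little is drawn, vsync is the likely cap. With GLFW, `glfwSwapInterval(0)` requests unsynchronized swaps once a context is current, although a driver setting can still override it.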

I am using statically linked glfw v.3.0 and glew 1.1.0, together - of course - with opengl32.lib.

The only hint I could get from GPU Perf is that almost all of the GPU time is taken by the so-called “Interpolator”. Sure, the triangles are huge, as large as the whole window, but that doesn’t explain such low performance.

Actually, the visible area covered by primitives does matter a great deal. The more fragments are generated, the more interpolation has to take place. With a growing number of exports (i.e. the stuff you pass out of one stage into another, e.g. vertex shader -> fragment shader) this overhead increases - not to mention, there are values that are always interpolated, such as the depth value. Plus, the more fragments, the more fragment shader invocations. (Probably not a problem in your case though).

Just as a comparison: If I’m not mistaken, a few years back I was able to yank around 3M vertices through a GeForce 8600M GS at approx. 30fps - which, even at the time, was pretty crappy hardware. Also with very simple shaders. Can you post some code? Shaders, state inits, rendering loop? Do you actually have the depth test disabled?

Have you tried some OpenGL-based game to see if you get bad performance there?

I’ve tried with the depth test both disabled and enabled, but I don’t see any real difference in performance. I also didn’t seem to have many problems with OpenGL titles.

As the lovely anti-link system blocks me from posting URLs, you will have to manually create the pastebin links from these:
pastebin.com/rUHYYmBd main.cpp
pastebin.com/yzZJ19xx Context.hpp
Context.cpp - Pastebin.com Context.cpp
vs.glsl + ps.glsl - Pastebin.com Shaders

Oh, it does. 640x480 x 500 quads x 60 fps = 9.2e9 pixels/second.

That’s 28 Gbyte/sec at 3 bytes/pixel or 37 Gbyte/sec at 4 bytes/pixel; depending upon the width of the memory bus and the clock rate that could realistically be saturating the memory bandwidth. That could explain why the “GPU utilization” says “low”. Early-depth optimisation won’t help in this case, as you’d just be replacing writes to the colour buffer with reads from the depth buffer.
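The arithmetic above can be checked with a few lines of code. This is only a sketch under the same assumptions as the post: every quad covers the full 640x480 window, no fragment is rejected, and only colour-buffer writes are counted. The function names are made up for illustration.

```cpp
#include <cstdint>

// Pixels written per second when `quads` window-sized quads are drawn
// every frame (assumes each quad covers the whole window).
constexpr std::uint64_t pixelsPerSecond(std::uint64_t width,
                                        std::uint64_t height,
                                        std::uint64_t quads,
                                        std::uint64_t fps) {
    return width * height * quads * fps;
}

// Colour-buffer write traffic in bytes per second.
constexpr std::uint64_t bytesPerSecond(std::uint64_t pixels_per_second,
                                       std::uint64_t bytes_per_pixel) {
    return pixels_per_second * bytes_per_pixel;
}

// 640 x 480 x 500 quads x 60 fps = 9,216,000,000 pixels/s (about 9.2e9).
static_assert(pixelsPerSecond(640, 480, 500, 60) == 9216000000ULL,
              "fill rate of the test scene");
// About 28 GB/s at 3 bytes per pixel and about 37 GB/s at 4 bytes per pixel.
static_assert(bytesPerSecond(9216000000ULL, 3) == 27648000000ULL,
              "3 bytes per pixel");
static_assert(bytesPerSecond(9216000000ULL, 4) == 36864000000ULL,
              "4 bytes per pixel");
```

Comparing those figures against the card's specified memory bandwidth gives a rough idea of how close the test is to saturating it.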

If you’re comparing the triangle counts against a game, games aren’t drawing a thousand 640x480 triangles per frame.

I see, so in other words, a more realistic scenario would be to draw these triangles at some distance, or at least smaller?

Well, the bigger question is why you’re drawing so many triangles, all of which are overlapping, that close to the screen? Or more to the point: what are you drawing?

Yes. A more realistic test would keep the overdraw (the average number of times any given pixel is drawn) in single figures.

Real programs make some effort to ignore parts which can’t be seen, so the total fill rate is limited to some low multiple of the number of screen pixels.
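To put a number on overdraw for a given test, divide the fragments drawn per frame by the pixels on screen. A minimal sketch, with `overdraw` being a hypothetical helper and the fragment count supplied by the caller:

```cpp
#include <cstdint>

// Average overdraw: fragments drawn per frame divided by the number of
// pixels in the window. A value of 1.0 means each pixel is drawn once.
constexpr double overdraw(std::uint64_t fragments_per_frame,
                          std::uint64_t width,
                          std::uint64_t height) {
    return static_cast<double>(fragments_per_frame) /
           static_cast<double>(width * height);
}
```

For the test in this thread, 500 window-sized quads at 640x480 produce 153,600,000 fragments per frame, so `overdraw(153600000, 640, 480)` evaluates to 500, far above the single-figure range suggested above.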

I am just trying to prepare a simple framework (i.e. a baseline) for further performance analysis of various shader code, GL settings, draw methods, etc.

Yes. Small triangles will be vertex-limited. But large triangles will be fragment-limited. One triangle has 3 vertices, but up to 1 million fragments, each of which takes separate attention from the gpu. Try making tiny triangles, and see if the speed shoots up.

Your application will be limited by vertex processing if the overhead of vertex processing exceeds any other form of processing done by the GL or the application. Saying “small triangles will be vertex-limited” is kind of nonsense. And even if your triangles are small, the likelihood of becoming limited by vertex processing for a few hundred triangles is still very low unless your application is very, very trivial and your fragment shaders do absolutely nothing other than export a constant color. You cannot be sure of anything unless you get hard numbers, especially when stuff seems to be trivial.

Why up to 1 million? What if the triangle is large enough that it simply covers the whole screen after clipping? Does a full-HD framebuffer consist of only 1 million pixels? No.

[QUOTE=thokra;1254477]…
Why up to 1 million? What if the triangle is large enough that it simply covers the whole screen after clipping? Does a full-HD framebuffer consist of only 1 million pixels? No.[/QUOTE]

Oh. Newbie question: What is the maximum number of pixels a triangle could write to a “full-HD fragment buffer”? -thanks

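A full-HD buffer usually has a resolution of 1920 x 1080 pixels. That’s 2,073,600 pixels, roughly two million, so a triangle that covers the whole screen after clipping can produce about that many fragments.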

Are you using 1000 draw calls?

Nope. 1000 triangles (3000 vertices) via one glDrawArrays call.