
Thread: How to assess performance of OpenGL program?

  1. #1
    Junior Member Newbie
    Join Date
    Jun 2012
    Location
    London, UK
    Posts
    15

    How to assess performance of OpenGL program?

    Hello Gurus,

    I am learning OpenGL, with a hope of employment in the games industry.

    I think I am getting to grips with it: storing standard geometries on the GPU and re-using them (and the same for texture maps), varying level of detail by depth, drawing objects ordered by shader and texture, etc. I am also getting to grips with off-screen drawing for shadows and glows.

    My question: how can I assess the performance of my programs? I have a laptop (Intel HD integrated graphics) and a desktop (NVidia GeForce GPU). I can look up the specs, but I have no idea how to tell whether I am getting anywhere near the performance envelope of the chips or whether I have missed something and still have an inefficient implementation (and therefore still have things to learn).

    Any suggestions? I don't feel I need to wring every last ounce of performance out, but it would be nice to know that I am in the right order of magnitude of throughput / framerate.

    Thanks

  2. #2
    Senior Member OpenGL Pro
    Join Date
    Jan 2012
    Location
    Australia
    Posts
    1,097
    For NVidia GPUs with Visual Studio 2008/2010 you can use Nsight from NVidia.

  3. #3
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    3,123
    Quote Originally Posted by 2Wheels View Post
    My question: how can I assess the performance of my programs? I have a laptop (Intel HD integrated graphics) and a desktop (NVidia GeForce GPU). I can look up the specs, but I have no idea how to tell whether I am getting anywhere near the performance envelope of the chips or whether I have missed something and still have an inefficient implementation...
    Well, part of learning to optimize GPU rendering is getting familiar with the various ways you can get bottlenecked and what to do about them. You can get bottlenecked on your app side (solution: profile/optimize your app-side code). You can get bottlenecked doing state changes in the GL driver (solution: do more with fewer state changes). You can get bottlenecked on the GPU for various reasons, and so on. And your bottlenecks will change over the course of a frame.

    Re your NVidia GeForce desktop GPU, which GPU? As an example of a good "performance wall" to see if you can get close to with your rendering code (and one way you can get bottlenecked on the GPU), NVidia GeForce GPUs have a limit on the number of tris (triangles) they can setup per GPU core clock cycle. IIRC with anything GTX285 and newer, you get one tri setup per clock max for standard rasterization. Also, with Fermi+ (e.g. GTX4xx+), you seem to have near-free triangle frustum culling so those tris apparently don't count toward the limit. So assuming your GeForce GPU is recent... take your GPU core clock rate, and that's about how many std tris/sec you can push through the card.

    Now render a non-trivial scene start to finish and time it (with glFinish() inside the beginning and ending of your timing interval to ensure there's no stray GPU work leaking in/out). Note the tri count you throw at the GPU over that interval. Compute tris/sec over the interval, and divide that by the theoretical max tris/sec for your GPU that we established above. That'll give you a sense of what percentage of maximum GPU triangle throughput you're utilizing. Note: that's a separate question from whether you really need to be sending all those tris down the pipe in the first place, but it's a useful GPU utilization benchmark.
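
    To make that concrete, here's a minimal sketch (assuming an existing GL context, and a drawScene() helper of your own that returns the number of triangles it submitted); adapt as needed:

    Code:
    // Wall-clock timing of one frame, bracketed by glFinish(), compared against
    // the ~1 triangle setup per GPU core clock figure discussed above.
    #include <GL/gl.h>
    #include <chrono>
    #include <cstdio>

    // Placeholder: renders one frame and returns the triangle count it submitted.
    extern long long drawScene();

    void benchmarkFrame(double gpuCoreClockHz)      // e.g. 810e6 for an 810 MHz core
    {
        glFinish();                                 // drain any pending GPU work first
        auto t0 = std::chrono::steady_clock::now();

        long long tris = drawScene();               // render the non-trivial scene

        glFinish();                                 // wait until the GPU has finished
        auto t1 = std::chrono::steady_clock::now();

        double sec        = std::chrono::duration<double>(t1 - t0).count();
        double trisPerSec = tris / sec;
        double maxTrisSec = gpuCoreClockHz;         // ~1 tri setup per core clock
        std::printf("%.1f Mtris/s  (%.1f%% of theoretical setup rate)\n",
                    trisPerSec / 1e6, 100.0 * trisPerSec / maxTrisSec);
    }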

    Also note that on Fermi+ (GTX480+) you can push 4 tris/clock with tessellation (and you allegedly can hit this rate even with std triangle rasterization on Fermi+ Quadros). But if you're just doing std tri rasterization on a GeForce GTX285+, 1 tri/clock is a good benchmark to compare against.
    Last edited by Dark Photon; 04-11-2013 at 07:19 PM.

  4. #4
    Junior Member Newbie
    Join Date
    Jun 2012
    Location
    London, UK
    Posts
    15

    Thanks

    Quote Originally Posted by Dark Photon View Post
    ...
    Great reply ... a simple number like triangles compared to clock speed is just what I meant. Thanks very much. I have a GTX440 btw.

  5. #5
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    3,123
    Quote Originally Posted by 2Wheels View Post
    Great reply ... a simple number like triangles compared to clock speed is just what I meant. Thanks very much. I have a GTX440 btw.
    GTX440? I don't think there is such a beast, is there? Maybe you mean GT440. To see which you have, bring up "nvidia-settings" (or NVidia control panel). The GPU 0 tab will tell you what you have. Further, select the PowerMizer tab, and look at the highest clock rate in the Graphics Clock column to get your GPU core clock.

    If it is a GT440, it looks like there are two versions: retail (810 MHz core freq) and OEM (594 MHz core freq). Assuming you're running SYNC_TO_VBLANK with a 60 Hz LCD monitor in tow, you're talking a theoretical max throughput for std tri rasterization of about 13.5 Mtris/frame (retail) or 9.9 Mtris/frame (OEM).
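
    As a rough worked sketch (assuming 1 tri setup per core clock and a 60 Hz vsync'd display), the per-frame budget falls straight out of the core clock:

    Code:
    // Per-frame triangle budget = core clock / refresh rate (1 tri setup per clock assumed).
    #include <cstdio>

    int main()
    {
        const double refreshHz    = 60.0;
        const double coreClocks[] = { 810e6, 594e6 };   // GT 440 retail / OEM core clocks
        for (double clk : coreClocks)
            std::printf("%.0f MHz core -> ~%.1f Mtris/frame\n",
                        clk / 1e6, clk / (refreshHz * 1e6));
        return 0;
    }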
    Last edited by Dark Photon; 04-13-2013 at 05:19 PM.
