
Thread: Unified API for GPU performance counters/queries/monitors

  1. #1
    Junior Member Newbie
    Join Date
    Jul 2013
    Posts
    6

    Unified API for GPU performance counters/queries/monitors

    Hello all.
    It would be nice to have a unified API for performance counters in OpenGL. It would help to tune applications and to write cross-vendor tools for profiling OpenGL programs.
    Today we have different vendor-specific extensions from AMD (AMD_performance_monitor) and Intel (INTEL_performance_query), while NVIDIA provides perf counters via NVPerfKit, which works only on Windows.
    Hardware is very different, yes, but maybe we can provide a common interface (based on the existing extensions, for example), similar to ARB_texture_compression and/or ARB_get_program_binary, and query the supported counters at runtime with glGetIntegerv?

    Something like this:
    Code c:
    // Query how many counters the implementation exposes (hypothetical enums).
    GLint total;
    glGetIntegerv(GL_NUM_PERFQUERY_COUNTERS, &total);

    // Fetch the IDs of all available counters.
    GLint counters[total];
    glGetIntegerv(GL_PERFQUERY_COUNTERS, counters);

    // Collect counter values around a block of GL work.
    GLuint perf;
    glGenPerfQueries(1, &perf);
    glBeginPerfQuery(perf);
    // opengl calls here...
    glEndPerfQuery(perf);

    // for loop on available counters array to find index of some interesting counter here...

    // Ask for the data type of the counter we are interested in...
    GLenum type;
    glGetPerfQueryCounterType(perf, counters[required_index], &type);

    // ...and for the number of elements it returns.
    GLsizei length;
    glGetPerfQueryCounterLength(perf, counters[required_index], &length);

    GLuint data_uint[length];
    GLfloat data_float[length];

    // Read the result back in the appropriate type.
    switch (type) {
    case GL_UNSIGNED_INTEGER:
        glGetPerfQueryCounterData(perf, counters[required_index], data_uint);
        break;
    case GL_FLOAT:
        glGetPerfQueryCounterData(perf, counters[required_index], data_float);
        break;
    // other cases here...
    }

    glDeletePerfQueries(1, &perf);

    // do something with data

  2. #2
    Junior Member Newbie
    Join Date
    Jul 2013
    Posts
    6
    I spent a little time studying the AMD (spec) and Intel (spec) extensions, and it seems Intel's perfquery extension can be easily mapped onto AMD's perfmon. I have also found that Intel supports AMD_performance_monitor in their open source Mesa driver for Linux instead of their own extension (http://www.phoronix.com/scan.php?pag...tem&px=MTUyMjQ), and it seems Qualcomm supports it too, with some additions (spec). I'm not a hardware guy, but the AMD perfmon extension seems like a good start.
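    For reference, enumerating what AMD_performance_monitor actually exposes looks roughly like this (a minimal sketch based on the extension's entry points; fixed buffer sizes, no error handling, and it assumes a current GL context with the extension present):
    Code c:
    // Sketch: list the counter groups and counter names exposed by
    // GL_AMD_performance_monitor. printf comes from <stdio.h>.
    GLint numGroups;
    glGetPerfMonitorGroupsAMD(&numGroups, 0, NULL);

    GLuint groups[numGroups];
    glGetPerfMonitorGroupsAMD(&numGroups, numGroups, groups);

    for (GLint g = 0; g < numGroups; ++g) {
        GLchar groupName[256];
        glGetPerfMonitorGroupStringAMD(groups[g], sizeof(groupName), NULL, groupName);

        GLint numCounters, maxActive;
        glGetPerfMonitorCountersAMD(groups[g], &numCounters, &maxActive, 0, NULL);

        GLuint counters[numCounters];
        glGetPerfMonitorCountersAMD(groups[g], &numCounters, &maxActive,
                                    numCounters, counters);

        for (GLint c = 0; c < numCounters; ++c) {
            GLchar counterName[256];
            glGetPerfMonitorCounterStringAMD(groups[g], counters[c],
                                             sizeof(counterName), NULL, counterName);
            printf("%s / %s\n", groupName, counterName);
        }
    }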

    I also hope that if a performance monitor extension (or something similar) goes to core, it will be available in OpenGL ES too (and maybe in OpenCL as well). If every vendor starts writing its own perf extension from scratch... I don't think fragmentation is the right way. I hope this helps people write (or upgrade already available) OpenGL/GLES/CL-related tools, and I hope vendors expose as much information as possible.

  3. #3
    Senior Member OpenGL Pro Aleksandar's Avatar
    Join Date
    Jul 2009
    Posts
    1,144
    Quote Originally Posted by oh-la-la View Post
    I spent a little time studying the AMD (spec) and Intel (spec) extensions, and it seems Intel's perfquery extension can be easily mapped onto AMD's perfmon... I'm not a hardware guy, but the AMD perfmon extension seems like a good start.
    I completely agree that having a unified performance monitoring API could be very useful. So, you have my vote.

    But instead of just reading AMD_performance_monitor, did you try to use it?
    Try it and you'll find it totally useless (at least that's what I found two years ago). Take a look at the post. Even AMD discourages use of that extension by hiding the meaning/names of the counters. Also take a look at the status of the extension.

    Did you try Intel's extension? I didn't, and I'm not sure whether it is supported and in which drivers. I saw it for the first time in December last year, but at that time it was not supported in the newest HD2500 drivers (if I remember correctly).

  4. #4
    Super Moderator Frequent Contributor Groovounet's Avatar
    Join Date
    Jul 2004
    Posts
    934
    AMD doesn't hide the counters. The counters can't be specified because they are different even between a Radeon 7800 and a 7900.

    The counters reflect hardware blocks.

    Even with a standardized extension, it will take per-vendor effort to get something useful out of them.

    Regardless, I agree it would be nice to have such an extension.

  5. #5
    Senior Member OpenGL Pro Aleksandar's Avatar
    Join Date
    Jul 2009
    Posts
    1,144
    Quote Originally Posted by Groovounet View Post
    AMD doesn't hide the counters. The counters can't be specified because they are different even between a Radeon 7800 and a 7900.
    Why, then, is there no way to retrieve a meaningful name for each ID?
    It is not a problem to have different counters on different hardware, but there should be a way to know what they mean.

  6. #6
    Advanced Member Frequent Contributor arekkusu's Avatar
    Join Date
    Nov 2003
    Posts
    782
    Quote Originally Posted by Aleksandar View Post
    Why then there is no way to retrieve some meaningful names for each ID?
    Retrieving the names is not the same thing as retrieving the meaning.

    If you program a bunch of counter-collection logic and then actually use it to adjust your workload dynamically, what happens when your application runs on a brand new driver for the first time and encounters a bunch of completely different counter names?

    Specific low-level performance counters should be reacted to in a debug/design session, where you have the appropriate documentation explaining what the counters mean for that specific driver, not as a baked-in runtime query. Otherwise you end up tuning different performance aspects on every driver.

    For an analog, take a look at some low-level Intel CPU performance counters. Now try using them on a CPU from three years ago, or three years from now.

    The ARB_texture_compression example pointed out in the OP is actually a good example of how not to do this. You get queryable properties, like NUM_COMPRESSED_TEXTURE_FORMATS and COMPRESSED_TEXTURE_FORMATS. Great. Other than listing them in a GPU-Info page, what do you do with that? If COMPRESSED_TEXTURE_FORMATS returns 0xDEADBEEF, do you trust that enum as a run-time compression format for your artwork? Not without reading the extension that introduces that enum, and understanding the artifacts introduced by that particular compression scheme!
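    For reference, the query in question is just this (a minimal sketch; the returned values are raw enums that mean nothing without the specs that define them):
    Code c:
    // Sketch: enumerate the compressed texture formats the driver advertises.
    // printf comes from <stdio.h>; a current GL context is assumed.
    GLint numFormats;
    glGetIntegerv(GL_NUM_COMPRESSED_TEXTURE_FORMATS, &numFormats);

    GLint formats[numFormats];
    glGetIntegerv(GL_COMPRESSED_TEXTURE_FORMATS, formats);

    for (GLint i = 0; i < numFormats; ++i)
        printf("format 0x%04X\n", formats[i]);  // raw enum -- but which are safe to use for your artwork?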

  7. #7
    Senior Member OpenGL Pro
    Join Date
    Apr 2010
    Location
    Germany
    Posts
    1,128
    Quote Originally Posted by Groovounet
    Even with a standardized extension, it will take per-vendor effort to get something useful out of them.
    Isn't that the case with OpenGL in general?

    But seriously, it takes some effort to figure out which counters are exposed by the hardware and whether a meaningful mapping between vendors is possible, so that an extension can provide a standardized enum to identify each counter.

    Quote Originally Posted by Groovounet
    The counters can't be specified because they are different even between a Radeon 7800 and a 7900.
    Why is that? I can understand that a 7800 may expose fewer counters, but different counters? Could you be more specific as to what "different" means? Also, what is the difference between the 7000-series counter VSBusy and the universally supported counter ShaderBusyVS?

    Quote Originally Posted by arekkusu
    Retrieving the names is not the same thing as retrieving the meaning.
    Intuitively I'd say the semantics of some counters, and the very existence of those counters, are so obvious and essential that they don't (or shouldn't) change - independent of the chipset and vendor. For instance, every single vendor wants to provide a counter indicating how much time vertex shading takes, or how busy the ALU or the tex units are, or how many cache hits/misses you got. Why wouldn't it be possible for vendors to simply agree on calling the corresponding counters VS_BUSY, ALU_BUSY, CACHE_MISSES and so on...? How they handle such names internally is a completely different matter, of course.

    There are so many common concepts, it shouldn't be a problem to at least come up with counters for the most common subset.

    Quote Originally Posted by arekkusu
    Now try using them on a CPU from three years ago, or three years from now.
    The temporal argument isn't really valid, IMHO, since we'd start with hardware that is definitely exposing some counters. For instance, GPUPerfAPI returns a status value indicating whether the counter in question is available. There is no reason OpenGL couldn't provide such an API once a uniquely identifiable subset of counters has been established. It's the application developer's responsibility to keep up with GPU features when supporting such means of performance measurement - however, I'd much rather learn about a load of new counters added in version (n + 1) relative to version n than wade through three different docs to look up counters for Intel, AMD and NVIDIA. It's simply a huge pain in the rear.
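    To illustrate the idea - purely hypothetical names, nothing here exists in any current extension - an availability-checked query over a standardized counter set might look like:
    Code c:
    // Hypothetical sketch only: GL_PERF_COUNTER_VS_BUSY, glIsPerfCounterAvailable
    // and glGetPerfCounterValue are made-up names for the sake of the argument.
    GLenum wanted[] = { GL_PERF_COUNTER_VS_BUSY,
                        GL_PERF_COUNTER_ALU_BUSY,
                        GL_PERF_COUNTER_CACHE_MISSES };

    for (int i = 0; i < 3; ++i) {
        // A status check, similar in spirit to GPUPerfAPI's availability flag:
        // counters the hardware doesn't implement simply report unavailable.
        if (!glIsPerfCounterAvailable(wanted[i]))
            continue;

        GLuint64 value;
        glGetPerfCounterValue(wanted[i], &value);
        // use the value...
    }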

  8. #8
    Advanced Member Frequent Contributor
    Join Date
    Apr 2009
    Posts
    592
    This is my two cents on performance counters:
    1. Performance counters should be used by tools, not end applications, for the purpose of optimizing an application.
    2. Hardware varies a great deal; which counters are available and what exactly they are counting really depends on the hardware. Some hardware has a unified shader architecture, other hardware does not. Some hardware might implement certain fixed-function bits in dedicated hardware, other hardware might tack them onto the end of shaders. And so on.


    To that end, a performance query API should expose:
    • for each counter, its type (int, float, etc.)
    • a name for each counter, as a simple short description
    • a longer description that tries to explain what the counter is counting


    In that light, the Intel extension does the above. Also, there are patches in flight (i.e. not yet accepted) adding the feature to Mesa and then, quite likely, to i965, the Intel DRI Mesa driver.
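    For reference, enumerating counters through INTEL_performance_query looks roughly like this (a condensed sketch following the extension spec; fixed buffer sizes, no error handling, and it assumes the extension is present):
    Code c:
    // Sketch: walk the queries exposed by GL_INTEL_performance_query and print
    // each counter's name and long description. printf comes from <stdio.h>.
    GLuint queryId;
    glGetFirstPerfQueryIdINTEL(&queryId);

    while (queryId != 0) {
        GLchar queryName[256];
        GLuint dataSize, noCounters, noInstances, capsMask;
        glGetPerfQueryInfoINTEL(queryId, sizeof(queryName), queryName,
                                &dataSize, &noCounters, &noInstances, &capsMask);

        // Counter IDs within a query are numbered from 1 to noCounters.
        for (GLuint c = 1; c <= noCounters; ++c) {
            GLchar name[256], desc[1024];
            GLuint offset, size, typeEnum, dataTypeEnum;
            GLuint64 maxValue;
            glGetPerfCounterInfoINTEL(queryId, c,
                                      sizeof(name), name, sizeof(desc), desc,
                                      &offset, &size, &typeEnum, &dataTypeEnum,
                                      &maxValue);
            printf("%s: %s -- %s\n", queryName, name, desc);
        }

        glGetNextPerfQueryIdINTEL(queryId, &queryId);
    }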

  9. #9
    Junior Member Newbie
    Join Date
    Nov 2013
    Posts
    8
    What I'm looking for is simply a counter that can tell me the time between two frames.

    Maybe have a millisecond timer, perhaps microsecond and nanosecond timers, and a high-precision timer.
    (Not sure if those should be separate timers or consolidated into fewer timers.)
    And a timer for very long durations (hours to years), e.g. for off-screen rendering that is non-realtime and can take a long time.
    (Programmatically, of course; the hardware can be the same.)
    With different precision.
    Use the type that is natural for the counter; conversion to other types can be done by type conversion.
    Uniform, the same for all vendors, simple to use.
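    For what it's worth, plain frame timing on the GPU is already covered by timer queries (ARB_timer_query, core since OpenGL 3.3); a minimal sketch:
    Code c:
    // Sketch: measure GPU time between two points using timestamp queries.
    // Results are reported in nanoseconds as GLuint64 values.
    GLuint queries[2];
    glGenQueries(2, queries);

    glQueryCounter(queries[0], GL_TIMESTAMP);   // e.g. at the start of a frame
    // ... render the frame ...
    glQueryCounter(queries[1], GL_TIMESTAMP);   // at the end of the frame

    // Fetch the results (a real application would double-buffer the queries
    // instead of stalling here).
    GLuint64 start, end;
    glGetQueryObjectui64v(queries[0], GL_QUERY_RESULT, &start);
    glGetQueryObjectui64v(queries[1], GL_QUERY_RESULT, &end);

    double frameMs = (end - start) / 1.0e6;     // nanoseconds -> milliseconds

    glDeleteQueries(2, queries);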
