gpu_performance_state_info

It would be very useful if we could directly read the GPU performance state from within OpenGL. The extension could look like this:

Name Strings

[b]GL_gpu_performance_state_info[/b]

Overview

Modern GPUs usually implement a very aggressive power management policy. In order to have better insight during application profiling, we need to know which performance state the GPU is currently in. Although this can be done through other APIs, having native support in OpenGL is more convenient.

New Procedures and Functions

[b]none[/b]

New Tokens

Accepted by the <pname> parameter of GetIntegerv:

    [b]GPU_PERFORMANCE_INFO_GPU_FREQUENCY
    GPU_PERFORMANCE_INFO_MEMORY_FREQUENCY
    GPU_PERFORMANCE_INFO_SHADER_FREQUENCY

    GPU_PERFORMANCE_INFO_GPU_UTILIZATION
    GPU_PERFORMANCE_INFO_FB_UTILIZATION
    GPU_PERFORMANCE_INFO_VID_UTILIZATION[/b]
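
Usage would be trivial; a minimal sketch of what I have in mind (the token values below are placeholders I made up, since no enum values exist for a non-existent extension, and the units — MHz and percent — are my assumption, not something the proposal pins down):

#include <stdio.h>
#include <GL/gl.h>

/* Placeholder values only -- a real extension would assign these in glext.h. */
#define GPU_PERFORMANCE_INFO_GPU_FREQUENCY    0x9FF0
#define GPU_PERFORMANCE_INFO_GPU_UTILIZATION  0x9FF3

void print_gpu_perf_state(void)
{
    GLint gpuClockMHz = 0, gpuUtilization = 0;

    /* Requires a current GL context. */
    glGetIntegerv(GPU_PERFORMANCE_INFO_GPU_FREQUENCY, &gpuClockMHz);
    glGetIntegerv(GPU_PERFORMANCE_INFO_GPU_UTILIZATION, &gpuUtilization);

    printf("GPU clock: %d MHz, GPU utilization: %d%%\n",
           gpuClockMHz, gpuUtilization);
}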

Everything exposed here is already implemented in NVIDIA drivers, but it is not documented. I wouldn’t mind if it were implemented as a vendor-specific extension, although an ARB/EXT one would be preferable. In any case, having an OpenGL extension would make access to this information cleaner than it is now.

Such info is too hardware-specific and does not fit well with an abstract API like OpenGL.

In OpenGL you cannot even ask about the size of the video memory.
Even the notion of “video memory” does not exist. And you want to know its frequency?

Sometimes vendors provide such info through proprietary means, e.g. NVAPI.

…and the implementation of OpenGL is not hardware-specific… :)
Please, don’t say that. There are only two real players on the stage (for desktop GL). Both of them have the ability to expose frequencies and have already implemented access to that information.

Wrong. The total amount of “video memory” can be read by GL_NVX_gpu_memory_info (GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX) on NV, or by WGL_AMD_gpu_association (wglGetGPUInfoAMD(…WGL_GPU_RAM_AMD,…) on AMD. There are also other values related to memory state that can be retrieved with GL_ATI_meminfo or GL_NVX_gpu_memory_info.
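
For anyone who wants to try it, here is a minimal sketch of the NV path (the enum values are taken from the GL_NVX_gpu_memory_info spec, so verify them against your own glext.h; the queries report kilobytes):

#include <stdio.h>
#include <GL/gl.h>

/* From GL_NVX_gpu_memory_info -- results are in kilobytes. */
#define GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX          0x9047
#define GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX  0x9049

void print_nv_memory_info(void)
{
    GLint dedicatedKB = 0, availableKB = 0;

    /* Only meaningful if GL_NVX_gpu_memory_info is in the extension string. */
    glGetIntegerv(GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX, &dedicatedKB);
    glGetIntegerv(GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &availableKB);

    printf("Dedicated VRAM: %d MB, currently available: %d MB\n",
           dedicatedKB / 1024, availableKB / 1024);
}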

I know that. Furthermore, I’m using it. But, as I’ve already said, it is not documented, it requires the use of other APIs, … it is not part of OpenGL.

The performance state is the last link in the chain needed to get a complete picture of what is going on inside the GPU (the other two, performance counters and memory state, are already available).

So it won’t be in the core spec. Who cares.

I think this idea fits well in the category of hardware-specific extensions, like ATI_meminfo or AMD_performance_monitor or AMD_gpu_association.

Then it’s OK.

Personally, I don’t like such extensions. They are doomed to remain vendor-specific, and every time someone wants to make use of them, they have to write per-vendor code paths.

So it won’t be in the core spec. Who cares.

Then why is this in the forum labeled “Suggestions for the next release of OpenGL”? Wouldn’t this suggestion make more sense for the drivers forum instead?

A suggestion that may solve both your problem and mine:

It would be helpful if there was a cross-platform way via OpenGL to just tell the GPU to give you max performance (i.e. disable power management).

I think this would solve your problem too, without having to expose specific clock frequencies, power states, etc. Just a hint (GL_MAX_PERFORMANCE_HINT) that says “stop screwing with my clocks and give me full GPU power right now!”.

I say this having just spent/wasted a few hours at work over the last 2 days trying to figure out a wacky frame rate hiccup. Every 45 seconds the app would break frame for ~0.5 seconds, and then go back to making frame for the next 45 seconds. The eyepoint was fixed, same frustum every frame, no loading, nothing really changing at all that might cause this. Odd. This was NOT on a laptop, but on a rack-mount server system running GeForce 9800GTs.

Naturally, I presumed it was something we were doing (culling, drawing, message processing, fighting with some background process for the single-core processor (yeah, old system), a buggy ethernet driver locking strangely in the kernel, or even the NVidia driver doing housekeeping every 45 seconds).

After ruling out our app, and ruling out background processes, GPU dynamic clock tweaking occurred to me. Yep, that’s it. Every 45 seconds the NVidia driver was throttling the clocks back from 550MHz to 300MHz, and then realizing after 0.5 sec that it couldn’t get away with it and throttling them back up! NVidia calls this PowerMizer, and it is enabled by default. Of course that had to stop. This ain’t no laptop! We can’t break frame.

Now, how to disable it? Well, that comes down to OS-specific, vendor-specific hacks to tell the driver to knock it off and give you full perf. Annoying.

In this case, you can hack the following cryptic NVidia-specific and OS-specific directive into the kernel module configuration file, /etc/modprobe.d/options, and then bring down the X server, unload/reload the nvidia kernel module, and restart X (or just reboot):


options nvidia NVreg_RegistryDwords="PerfLevelSrc=0x2222"

or arrange to have this NVidia-specific command run every time after bringing up the X server:


nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"

These work, but are both vendor-specific and OS-specific.

So for OpenGL applications that either always want max performance, or selectively want it (e.g. for profiling), it sure would be nice to have a cross-vendor and cross-platform way via OpenGL to say “GIVE ME MAX GPU PERFORMANCE!”.
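
Roughly what I’m picturing, purely as an illustration (GL_MAX_PERFORMANCE_HINT is my made-up token, not an existing enum, and the value below is an arbitrary placeholder):

#include <GL/gl.h>

/* Hypothetical token -- does not exist in any GL header; placeholder value. */
#define GL_MAX_PERFORMANCE_HINT 0x9FFF

void request_max_gpu_performance(int enable)
{
    if (enable)
        glHint(GL_MAX_PERFORMANCE_HINT, GL_FASTEST);   /* "max the clocks and keep them there" */
    else
        glHint(GL_MAX_PERFORMANCE_HINT, GL_DONT_CARE); /* hand clock control back to the driver */
}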

Definitely what Dark Photon said, and the idea of making it a glHint is both nice and consistent with the design of previous features in the API. Vendor-specific extensions are the wrong way to go for any program that you intend to release and have used by real people whose hardware you have no control over.

It would be helpful if there was a cross-platform way via OpenGL to just tell the GPU to give you max performance (i.e. disable power management).

You can’t actually do that. Especially on some of the more powerful high-end cards.

Power management on GPUs isn’t just for saving electric bills these days. In the case of many high end cards from both NVIDIA and AMD, it keeps the GPUs from pulling more current than the PCIe specification allows. Once a card breaks 300W of power draw, which is the limit that PCIe allows, many things become possible.

Unpleasant things. Like setting someone’s computer on fire. Now yes, most high-end motherboards can handle more than 300W going to the PCIe card. But I’m fairly sure that customers won’t be appreciative if you start a fire in their computers, just because they spent extra money on a GPU and skimped a bit on their motherboard.

Also, this idea assumes that turning off power management would result in increased performance. It may not, depending on where the bottleneck is. For example, Furmark is able to screw with power management, drawing far more current than virtually any other rendering application. I think AMD called it a “power virus” or something to that effect.

If one application can pull more current than another, it is entirely possible that the lower-power application is lower-power simply because it can’t keep the GPU fed as efficiently. Dumping more Wattage down the GPU’s throat won’t change that.

It also came to my mind a long time ago, but since it is impossible to do from the application, I decided to just keep track of the current state. There is a setting in NV drivers for the preferred performance state, PREFERRED_PSTATE_ID, but that is only a preferred state (meaning a “hint”). I dislike hints since they don’t oblige vendors to implement a particular behavior; like with VBO usage hints.

The clock frequencies are already exposed. I’ve only asked to make them readable through the OpenGL API. On the other hand, a hint is just a hint. It does not oblige the implementor to do anything.

NV’s power management reacts much faster than every 45 sec. If the utilization in the current state stays below (let’s say) 25% for about 15 sec, the graphics card changes to a lower performance state. If the utilization goes above 60%, it switches to a higher one. It seems that in your case the utilization fluctuates between 20% and 30%; that’s why the state is not changed after 15 sec. Immediately after the state changes, utilization jumps to 60% or more. The reason is probably that the thresholds are not properly set, and in the lower state GPU utilization cannot stay below 50% (as it should).
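
In rough pseudo-C, the policy I’ve observed behaves something like this (the thresholds and hold time are my estimates from testing, nothing NVIDIA documents):

/* Sketch of the adaptive throttling heuristic as observed, not as specified. */
#define DOWN_THRESHOLD_PCT  25    /* drop a level below this utilization...  */
#define DOWN_HOLD_SECONDS   15.0  /* ...if it persists this long             */
#define UP_THRESHOLD_PCT    60    /* raise a level immediately above this    */

/* perfLevel is an abstract level, higher = faster clocks (note that NVIDIA's
 * own P-State numbering is inverted: P0 is the fastest state). */
void update_perf_level(int utilizationPct, double dtSeconds,
                       int *perfLevel, double *lowUtilSeconds)
{
    if (utilizationPct > UP_THRESHOLD_PCT) {
        (*perfLevel)++;                 /* jump to a higher performance state */
        *lowUtilSeconds = 0.0;
    } else if (utilizationPct < DOWN_THRESHOLD_PCT) {
        *lowUtilSeconds += dtSeconds;
        if (*lowUtilSeconds >= DOWN_HOLD_SECONDS) {
            (*perfLevel)--;             /* fall back to a lower one */
            *lowUtilSeconds = 0.0;
        }
    } else {
        *lowUtilSeconds = 0.0;          /* 25-60% band: stay where we are */
    }
}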

By the way, it is interesting that the 9800GT has performance state management at all. The model I’ve tested did not (it has just one state, with 738MHz/900MHz/1836MHz GPU/Mem/Shader frequencies). Fermi has the most aggressive power state policy. For example, a GTX470 may change the GPU frequency from 607.5MHz down to only 50.625MHz. 12 times less! The difference is greater on higher-end models, because the “idle state” is the same for all of them.

Disabling power-state management is a more radical request than allowing OpenGL to read frequencies that are already exposed through the drivers. Furthermore, it can be dangerous to disable P-States on some machines. Third, if the algorithm is so efficient that it can achieve an interactive frame rate with lower frequencies and power consumption, that is great. I just need to identify the state and know under what conditions something happens. If power-state management can be disabled through the API, it must be done explicitly. Not with a hint, but with an explicit command, and re-enabled either explicitly with a command or automatically if power consumption or heat crosses some threshold. That requires more work from the driver developers, and I don’t think they are willing to do it. :)

P.S. If anyone is interested in testing P-States on his/her graphics card, I could share a tiny application I’ve made for collecting information from some friends of mine. It is a Windows application and requires an NV card with 256+ drivers (maybe it could work with 195+, but I cannot guarantee it).

If power-state management can be disabled through the API, it must be done explicitly. Not with a hint, but with an explicit command, and re-enabled either explicitly with a command or automatically if power consumption or heat crosses some threshold. That requires more work from the driver developers, and I don’t think they are willing to do it.

The problem is this: what happens if you don’t do it right?

If you render incorrectly with buffer objects, you’re subject to slower framerates. If you take the long route to making your texture objects, you might be subject to more draw-time checks. Or whatever.

If you give an application direct, explicit control over the power management of a GPU, and it does the wrong thing with that control, you can damage the user’s hardware. And who will be blamed for that? It may be your application, but you can bet that there will be flak given to the hardware maker too.

The absolute most you would want to have is a switch or hint that says, “I’m a serious, high performance rendering application. Spin up the GPU as much as you can, and leave it there.” And even then, what about if you’re drawing a GUI? StarCraft II and NVIDIA got in some trouble when SC2’s GUI was running exceedingly fast (hundreds of FPS, maxing out the GPU). You could have something that you turn on and off, but again, what happens if the programmer screws it up? This isn’t like a bad rendering path; this could damage hardware.

No, I think it’s best to leave that stuff to the graphics cards themselves. AMD has some nice user-based power-management features for their Cayman cards. Things that give users fine-grained power-management control. Just let users overclock their GPUs with the tools available if they want faster performance. Giving applications the ability to do it is just putting it in the wrong place.

As I’ve said, drivers must control power consumption and prevent overheating. We didn’t require overclocking, just identifying the current P-State. In many cases it even has to be a low performance state, in order to decrease heating in some embedded systems. I just said I don’t like hints, since they are not obligatory.

NV’s power management reacts much faster than every 45 sec. If the utilization in the current state stays below (let’s say) 25% for about 15 sec, the graphics card changes to a lower performance state.
I acknowledge that you have seen it behave that way on some GPU and driver combo.

However, I can tell you most definitely that is not how it works on this 9800GT. Even with essentially no load (clear screen, draw a few lines for a frame rate graph, swap buffers), it was waiting a solid (and exact) 45 seconds between each attempt to throttle the clocks down. Each time it gave up after ~0.5 sec. Querying the driver clocks, I could see it change the clocks down and back up each time.

Disabling PowerMizer by changing the setting from 0 (Adaptive) to 1 (Prefer Maximum Performance) most definitely got rid of that annoying problem.

As to Alfonse’s concern about burning up everyone’s PC, there is already a setting (albeit vendor-specific) for disabling power management, which doesn’t even require root/admin access to change. This would just make it cross-vendor and cross-platform.

What Alfonse is concerned about (melting everyone’s PC) is taken care of by the thermal detection. Obviously if the system was built like crap and the GPU pushed past some thermal limits trying to max the clocks, then it would scale back regardless. That doesn’t invalidate the request.

GL_MAX_PERFORMANCE_HINT would max the clocks, subject to thermal limits.

That is very interesting. I don’t have any experience with Unix/Linux drivers. I’ve tested several dozen NV graphics cards on Windows platforms, and all the drivers/cards behave the same way.

The absolute most you would want to have is a switch or hint that says, “I’m a serious, high performance rendering application. Spin up the GPU as much as you can, and leave it there.” And even then, what about if you’re drawing a GUI? StarCraft II and NVIDIA got in some trouble when SC2’s GUI was running exceedingly fast (hundreds of FPS, maxing out the GPU). You could have something that you turn on and off, but again, what happens if the programmer screws it up? This isn’t like a bad rendering path; this could damage hardware.

One answer here: vsync. The cause of the SC2 failure was that vsync was off and the load to render the UI was really, really light, so the frame rate went very high and that made the chips hot. For what it’s worth, in the embedded world it can be much worse: a chip can easily go past its TDP. There is a nice talk about this from this year’s SIGGRAPH (PDF warning): http://bps11.idav.ucdavis.edu/talks/03-powerOf3DRendering-BPS2011-koduri.pdf

All that remains is to allow one to not wait for vsync if the vsync was missed anyway. Really, without 3D glasses there is no point in running past 60Hz, and with 3D glasses there is no point in running past 120Hz. Anything more is for benchmark monkeys.

You can say “max power usage” but leave vsync on. Not rocket science.
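
For reference, keeping vsync on from the application side is just a swap-interval call; a minimal sketch, assuming WGL_EXT_swap_control is available (on X11 the equivalent is glXSwapIntervalSGI / glXSwapIntervalEXT):

#include <windows.h>
#include <GL/gl.h>

/* From WGL_EXT_swap_control -- check the WGL extension string first. */
typedef BOOL (WINAPI *SwapIntervalFn)(int interval);

void enable_vsync(void)
{
    SwapIntervalFn wglSwapIntervalEXT =
        (SwapIntervalFn)wglGetProcAddress("wglSwapIntervalEXT");

    if (wglSwapIntervalEXT)
        wglSwapIntervalEXT(1);   /* wait for one vertical retrace per buffer swap */
}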