[shaders] glQueryProgramPerformance()

This function would be useful, because the new shading language allows implicit multipass.
Developers could use it to test their shaders' performance:

glLinkProgram(theProg);
GLfloat perf = glQueryProgramPerformance(theProg);
if (perf < 0.8f) {
    // returned performance value is below 80%: switch to a lighter shader or something else…
}
else {
    glUseProgramObject(theProg);
}

Implementations should be required to return 1.0 (GL_ONE) if the shader would be 100% hardware accelerated, or 0.0 (GL_ZERO) if it would be software emulated. However, they should be free to compute intermediate values (such as the 0.8f in my example) by whatever method they choose.

What do you think of that?

Julien.


So, what is hardware acceleration then? A vertex/pixel program is a piece of software that is running on a processor; it's just that the processor happens not to be the main CPU. As the GPU gets more and more advanced, the line between hardware and software rendering gets more and more blurred.

The GPU is not a generic arithmetic processor: it has specific capabilities, and the API exposes these caps along with many others. Using everything the API offers while ignoring what the hardware actually supports can result in very poor performance.

The problem with OpenGL 2.0 is that it's getting a bit more high-level, enabling developers to use multipass rendering without knowing they are actually doing it (as far as I understand the specs). Multipass algorithms just suck. They're not a feature but a necessity in certain circumstances, and they divide the framerate by the number of passes.

The purpose of such a function would be to warn developers about their shaders' performance pitfalls, and to let them develop alternative algorithms when the hardware does not suit their needs.

Julien.

You can query whether your shader would be multi-passed by the OpenGL implementation; if so, it will return how many passes it would take. You do that by querying SHADER_RELATIVE_SIZE. This is described in the Objects white paper.
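For example, something along these lines. This is only a sketch: the glGetObjectParameteriv-style query and the GL_SHADER_RELATIVE_SIZE enum name are taken from the white paper proposal, and the exact entry points may differ in a shipping implementation:

GLint passes;
glLinkProgram(theProg);
// Proposed query from the Objects white paper: ask the implementation
// how many passes the linked shader would take.
glGetObjectParameteriv(theProg, GL_SHADER_RELATIVE_SIZE, &passes);
if (passes > 1) {
    // the shader would be multi-passed; consider a lighter fallback
}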

Barthold
3Dlabs

Originally posted by barthold:
You can query if your shader would be multi-passed, or not, by the OpenGL implementation.

Next time I will read the whole spec before I post.
However, I was thinking of a general performance-query facility, not just a way to determine the number of passes. Specifying the result as a float would give implementors the freedom to perform proprietary checks and evaluations of the shader program…

Julien.

Actually, what would be nice is a cycle count of how long each pass (if the shader is multipassed) will take. Or, put another way, a count of how many cycles a particular pass will take, given the hardware setup.
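Purely hypothetical, but something with this shape (both glGetProgramPassCount and glGetProgramPassCycles below are invented names, just to illustrate the idea):

// Invented API, for illustration only: query a cycle estimate per pass.
GLint numPasses, pass;
glGetProgramPassCount(theProg, &numPasses);              // hypothetical
for (pass = 0; pass < numPasses; pass++) {
    GLint cycles = glGetProgramPassCycles(theProg, pass); // hypothetical
    // accumulate or inspect per-pass cost to pick a fallback shader
}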

Here’s my view on estimating performance. This can easily drag into a long long discussion though.

People often ask 'is this API call / feature implemented in hardware' when what they really mean is 'how fast is it'. Whether parts of OpenGL are implemented in software or in hardware shouldn't matter; only the performance matters, and its effect on whole-system performance. OpenGL has never allowed any kind of query to find out what is in hardware or software, or how long a certain operation takes. Instead, I do think that if you really care how long a certain operation takes, with a certain OpenGL state, on a certain CPU, with a certain amount of RAM, a certain version of a driver, and a certain revision of a graphics board, then you just measure it in your code. Once you have that number, you can decide whether it is fast enough for your purpose.
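For example, a minimal sketch of doing exactly that. clock() is coarse but portable; a high-resolution platform timer is better in practice, and draw_scene() stands in for rendering a representative frame of your own application:

#include <time.h>

// Measure sustained frame rate of the real workload on the real system.
double measure_fps(int frames)
{
    clock_t start, end;
    int i;

    glFinish();                    // make sure no earlier work is pending
    start = clock();
    for (i = 0; i < frames; i++)
        draw_scene();              // assumed: draws one representative frame
    glFinish();                    // wait until the GPU has really finished
    end = clock();

    return frames / ((double)(end - start) / CLOCKS_PER_SEC);
}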

Just getting a performance number for one little part of a whole OpenGL system is fairly useless. You need to know how fast each part of the whole system is to be able to identify where the bottleneck is. For example, knowing whether a vertex shader can process 10 or 11 million vertices per second is useless if the AGP bus limits throughput to 4 million per second. But if the driver stores your vertices on the graphics card, the AGP bus is out of the picture, and the throughput of the vertex shader becomes interesting again; in that case you might want to know the memory bandwidth of the graphics card instead, to decide where the bottleneck is.

Now, as an ISV you can use a tool like vTune to get a lot of information about the various parts of a system, except the OpenGL driver and the underlying hardware. SGI has some profiling extensions for their InfiniteReality engine that can tell you, for example, whether you are rasterizer- or geometry-limited. However, these kinds of OpenGL performance-measurement tools are going to be very IHV-specific, simply because they are so low level. It will be a challenge for sure to come up with a uniform set of API calls for profiling the OpenGL pipeline. It would be great if such an API existed, though!
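The underlying arithmetic is simply 'the slowest stage wins'; a toy illustration with the made-up numbers above:

// Toy illustration: effective vertex throughput is bounded by the
// slowest stage in the path. Rates are the made-up figures from above.
double shader_rate = 10e6;  // vertices/s the vertex shader could process
double bus_rate    = 4e6;   // vertices/s the AGP bus can deliver
double effective   = (bus_rate < shader_rate) ? bus_rate : shader_rate;
// effective == 4e6: knowing the shader could do 10 or 11 million is
// useless until the vertices live in graphics memory and the bus is
// out of the picture.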

Barthold

Barthold, I bow down to you, master

Let me add a few things:
The ability to query the number of passes is already there, so why do you need more? A pass count is an integer; what would a float even mean?

Clock cycles are moot as well. Suppose you have hardware that takes 100 cycles both for the most basic operations and for the most sophisticated shader you can come up with in your app. What will you do then? Reject the shader because some other hardware can do it in 5 cycles? If you take pipelining into account, a 100-cycle, fully pipelined operation (drool) might be faster than a 5-cycle serialized operation because of its higher throughput, despite the higher latency. These specifics are so low-level that it just wouldn't make any sense to expose them.
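To put numbers on that, here is a toy model using the hypothetical cycle counts above:

// Toy model: total cycles for n independent operations.
// Fully pipelined, 100-cycle latency: one result retires per cycle once
// the pipe is full, so n operations take 100 + (n - 1) cycles.
// Serialized, 5-cycle operation: each one waits for the previous, 5 * n.
long pipelined_cycles(long n)  { return 100 + (n - 1); }
long serialized_cycles(long n) { return 5 * n; }
// For n = 1000: 1099 vs 5000 cycles -- the high-latency pipelined design
// wins on throughput for any batch of roughly 25 operations or more.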