
How to measure timing in OpenGL?



Liufu
07-13-2010, 09:20 AM
Hello, folks!
I need to time a drawing function. But if I just use time() from C++, the result is wrong: sometimes it is 0 and sometimes 15 ms.
I found that someone did it this way:
glFlush(); //or glFinish();
float sTime=(GLfloat)glutGet(GLUT_ELAPSED_TIME);
glDrawElements(...);//drawing function
glFlush(); //or glFinish();
float eTime=(GLfloat)glutGet(GLUT_ELAPSED_TIME);
t = eTime - sTime;
Is this approach correct?
Thanks!

dorbie
07-13-2010, 11:45 AM
That is the correct way if you use glFinish, but you need to draw a lot of stuff to get a good measurement.
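
A minimal sketch of the "draw a lot of stuff" advice, assuming C++11. The helper name averageMillis and both callables are hypothetical, not from the thread: work stands in for the draw call and sync stands in for a wrapper around glFinish() with a context current. It times a whole batch with a monotonic clock and reports the average per call.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average wall-clock milliseconds per call of `work` over `iterations` calls.
// `sync` runs once before the clock starts and once before it stops; with
// OpenGL it would wrap glFinish() (assumption: a context is current).
// Both callables are placeholders supplied by the caller.
double averageMillis(const std::function<void()>& work,
                     const std::function<void()>& sync,
                     int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;  // monotonic clock
    sync();                                   // drain earlier work first
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    sync();                                   // wait for the whole batch
    const std::chrono::duration<double, std::milli> total = Clock::now() - start;
    return total.count() / iterations;
}
```

Synchronizing once around the whole batch, rather than around every call, keeps the pipeline full during the measurement.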

Also, things are pipelined, so there is a degree of parallelism when drawing that you defeat by measuring like this. Drawing many things together is more efficient than the total time through the pipeline for a smaller set of primitives would suggest.

There are advanced mechanisms where you can send a 'fence' or similar timing barrier down the pipeline and measure when it completes. This allows you to get transport delay / draw time through the pipeline without stalling most of it through emptying it with a finish.

You will also find that some vendors in some markets simply ignore your glFinish, because some driver developers consider it bad practice for applications; they care about benchmarks and raw numbers rather than, for example, the quality of user experience a developer might be aiming for through latency control.

Basically, because the pipeline has multiple stages with buffers and resource caching, it is inherently difficult to measure. It helps to have at least a mental picture of the graphics pipeline and to understand that you are only measuring at one end, and that emptying it with glFinish effectively cripples performance for small batches of data.

Rosario Leonardi
07-13-2010, 01:10 PM
Also check the GL_ARB_timer_query extension (or GL_EXT_timer_query, the same extension with a different name).
With this extension you will probably get more precise results (even than the performance timer), because the time is measured using the internal GPU clock.
Both ATI and NVIDIA should support it with the latest drivers.

For testing purposes it is excellent; if you want to release your code into the wild, use the technique described by dorbie.
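
Before relying on the extension, it may be worth checking that the driver actually advertises it. A rough sketch, assuming the extension list is available as one space-separated string (for example from glGetString(GL_EXTENSIONS) in a compatibility context); the helper name hasExtension is made up for illustration. It matches whole tokens only, so a longer name that merely starts with the one you want does not count.

```cpp
#include <cassert>
#include <cstring>

// True if `name` appears as a complete space-separated token in `extensions`.
// `extensions` may be null (e.g. no context current); that counts as "absent".
bool hasExtension(const char* extensions, const char* name) {
    assert(name != nullptr && *name != '\0');
    if (extensions == nullptr) {
        return false;
    }
    const std::size_t len = std::strlen(name);
    const char* p = extensions;
    while ((p = std::strstr(p, name)) != nullptr) {
        const bool startsToken = (p == extensions) || (p[-1] == ' ');
        const bool endsToken = (p[len] == ' ') || (p[len] == '\0');
        if (startsToken && endsToken) {
            return true;
        }
        p += len;  // partial match only; keep searching after it
    }
    return false;
}
```

A call such as hasExtension(ext, "GL_ARB_timer_query") || hasExtension(ext, "GL_EXT_timer_query") would cover both names mentioned above.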

Liufu
07-13-2010, 09:43 PM
Thanks a lot!
I will check the GL_ARB_timer_query extension.

Liufu
07-14-2010, 10:36 AM
I tried to use these functions:
glBeginQuery(GL_TIME_ELAPSED, queries[0]);
glEndQuery(GL_TIME_ELAPSED);
glGetQueryObjectui64v(queries[0], GL_QUERY_RESULT, &timeElapsed);
But GL_TIME_ELAPSED and glGetQueryObjectui64v are undefined, even though I downloaded the latest driver with OpenGL 4.0 support, got glext.h, and added this header file to the VC2005 include folder.
Who can tell me how to use these functions? And how do I update OpenGL to the latest version?
Thanks a lot.

Rosario Leonardi
07-15-2010, 06:33 AM
http://developer.nvidia.com/object/opengl_extensions_tutorial.html

Aleksandar
07-15-2010, 07:15 AM
...
But, I got GL_TIME_ELAPSED and glGetQueryObjectui64v undefined.
Is your GL context active at the moment those functions are called? If it is not, that's the reason.


Who can tell me how to use these function?

This is fairly complete code that demonstrates the usage of these functions.
http://www.opengl.org/discussion_boards/...8719#Post278719 (http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Main=53798&Number=278719#Post278719)



And how to update OpenGL to latest version?

Just install the latest drivers. For GL 4.0 you need new hardware, like ATI 5xxx or NVIDIA 4xx. glext.h only tells you what the function signatures look like. Everything is in the drivers.

Liufu
07-16-2010, 07:34 AM
Thanks a lot!
My video card is a Quadro 5800, driver 257.21.
But, I still can't use GL_TIME_ELAPSED and glGetQueryObjectui64v.
GL_TIME_ELAPSED is still undefined.

Aleksandar
07-16-2010, 09:34 AM
Unfortunately, the Quadro 5800 is an SM4 card, so it does not support OpenGL 4.0. Nevertheless, glGetQueryObject should work. Post the part of your code where the measurement is taken. So far I haven't had any trouble, even with much cheaper/older cards.

Liufu
07-17-2010, 12:01 AM
My code snippet:
GLuint queries[1];
GLuint timeElapsed = 0;
// Create a query object.
glGenQueries(1, queries);
// Query current timestamp 1
glBeginQuery(GL_TIME_ELAPSED, queries[0]);
glDrawElementsInstancedEXT(GL_QUADS, 150*4, GL_UNSIGNED_INT, 0, nBodies);
glEndQuery(GL_TIME_ELAPSED);
// See how much time the rendering of object i took in nanoseconds.
glGetQueryObjectui64v(queries[0], GL_QUERY_RESULT, &timeElapsed);

Thanks.

Aleksandar
07-17-2010, 12:26 AM
First, why don't you use glGetQueryObjectuiv instead of glGetQueryObjectui64v? If it really causes a problem, the 32-bit integer version may succeed.

Second, you have to wait until the result is available. Take a look at my code snippet. It is usually suggested to do something else before querying the result, or you should block execution until it is available.

Liufu
07-17-2010, 12:56 AM
GLuint m_iTimeQuery;
GLuint timeElapsed = 0;
// Create a query object.
glGenQueries(1, &m_iTimeQuery);
GLint available = 0;
glGetQueryObjectiv(m_iTimeQuery, GL_QUERY_RESULT_AVAILABLE, &available);
// See how much time the rendering of object i took in nanoseconds.
if(available){
glGetQueryObjectuiv(m_iTimeQuery, GL_QUERY_RESULT, &timeElapsed);
glBeginQuery(GL_TIME_ELAPSED, m_iTimeQuery);
}
glDrawElementsInstancedEXT(GL_QUADS, 150*4, GL_UNSIGNED_INT, 0, nBodies);
if(available)
glEndQuery(GL_TIME_ELAPSED);

I modified my code according to your code snippet.
But it still fails with error C2065: 'GL_TIME_ELAPSED' : undeclared identifier.

Aleksandar
07-17-2010, 04:24 AM
Oh, ok! That's the problem.
You haven't included glext.h properly.

You can just add the following code so that the timer query constants are defined.

#ifndef GL_ARB_timer_query
#define GL_TIME_ELAPSED 0x88BF
#define GL_TIMESTAMP 0x8E28
#endif

Or use the value 0x88BF instead of GL_TIME_ELAPSED.

Liufu
07-17-2010, 09:00 AM
Thanks very much!
Now, I can run the code.
But I find that glGetQueryObjectiv() returns available = 0.
Does that mean my video card doesn't support GL_ARB_timer_query?

Aleksandar
07-17-2010, 10:06 AM
No, it doesn't mean that your card does not support it, only that the result is not available at the moment. The code you have written is not correct. Take a look at my code again and you'll probably see where the problem is. I'm displaying results from the previous frame.

To begin with, try the following:


glBeginQuery(GL_TIME_ELAPSED, query1);

// Draw object
....

glEndQuery(GL_TIME_ELAPSED);

available = 0;
// Wait for all results to become available
while (!available) {
glGetQueryObjectiv(query1, GL_QUERY_RESULT_AVAILABLE, &available);
}
glGetQueryObjectuiv(query1, GL_QUERY_RESULT, &timeElapsed);

Dan Bartlett
07-17-2010, 12:54 PM
Querying GL_QUERY_RESULT_AVAILABLE is only really useful if you're performing some work in the loop while waiting for the result to become available. If you just query GL_QUERY_RESULT, then OpenGL will wait till the result becomes available anyway:



glBeginQuery(GL_TIME_ELAPSED, query1);
// Draw object
glEndQuery(GL_TIME_ELAPSED);
//slow - can't do anything more until result is available
glGetQueryObjectuiv(query1, GL_QUERY_RESULT, &timeElapsed);

or if you have something else you can perform while waiting for results:


glBeginQuery(GL_TIME_ELAPSED, query1);

// Draw object
....

glEndQuery(GL_TIME_ELAPSED);

available = 0;
// Wait for all results to become available
while (!available) {
DoSomeWork(); // <== This way we do some extra work while waiting
glGetQueryObjectiv(query1, GL_QUERY_RESULT_AVAILABLE, &available);
}
glGetQueryObjectuiv(query1, GL_QUERY_RESULT, &timeElapsed);

Liufu
07-17-2010, 11:47 PM
Thanks Aleksandar and Dan Bartlett.
Now, the code works with the loop while waiting.
// Wait for all results to become available
while (!available) {
DoSomeWork(); // <== This way we do some extra work while waiting
glGetQueryObjectiv(query1, GL_QUERY_RESULT_AVAILABLE, &available);
}

Now I find that glDrawElementsInstancedEXT takes longer than glCallList(torus_display_list), although instanced drawing should be faster. I need to draw 16k tori. I use the same tessellation for DrawElements and the display list: 150 quadrilaterals per torus. Can anyone give me some advice? Thanks.

Aleksandar
07-18-2010, 02:04 AM
Querying GL_QUERY_RESULT_AVAILABLE is only really useful if you're performing some work in the loop while waiting for the result to become available.
Usually it is better to issue all queries during drawing and read them before the next pass (the next drawing cycle), because SwapBuffers forces everything to finish and there is no busy-waiting.
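
One way to organize "issue now, read in a later pass" is a small ring of query objects instead of a single one. Note that this is a swapped-in variation, not the code linked earlier in the thread: the QueryRing type and its member names are hypothetical, and the sketch only covers the index bookkeeping. The slots would index an array of query names from glGenQueries.

```cpp
#include <cassert>
#include <cstddef>

// Index bookkeeping for a ring of timer query objects (use at least 2 slots).
// Each frame writes one slot and reads back the oldest slot, so the readback
// never targets the query that is still in flight for the current frame.
struct QueryRing {
    std::size_t slots;  // number of query objects in the ring, e.g. 2

    // Slot whose query is begun and ended while drawing frame `frame`.
    std::size_t writeSlot(std::size_t frame) const {
        assert(slots > 0);
        return frame % slots;
    }

    // False during the first frames, before the oldest slot was ever written.
    bool canRead(std::size_t frame) const {
        return frame + 1 >= slots;
    }

    // Slot to read during `frame`; it was written `slots - 1` frames earlier.
    std::size_t readSlot(std::size_t frame) const {
        assert(slots > 0 && canRead(frame));
        return (frame + 1) % slots;
    }
};
```

With two slots, frame 1 writes slot 1 and reads slot 0, which holds the timing for frame 0.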


If you just query GL_QUERY_RESULT, then OpenGL will wait till the result becomes available anyway
Untrue! OpenGL just retrieves the state, and if the result is not available, GL won't wait for it to become available; it just reads the value. That's why we have to check whether it is available before reading. But if you read the values from the previous pass, then querying availability has almost no meaning. I proposed busy-waiting just to illustrate the process of reading and to make clear to Liufu how to implement the simplest way of reading, but it is certainly not good practice. Check my previous example for the most efficient way of measuring drawing time.

Dan Bartlett
07-18-2010, 03:17 AM
I'm fairly sure that GL_QUERY_RESULT_AVAILABLE isn't required to get valid results with every GL_QUERY_RESULT call. If it were to more accurately describe what it does, it should perhaps have been named "GL_GETTING_QUERY_RESULT_WILL_CAUSE_DELAY".



If pname is QUERY_RESULT, then the query object's result value is returned as
a single integer in params. If the value is so large in magnitude that it cannot be
represented with the requested type, then the nearest value representable using the
requested type is returned. If the number of query counter bits for target is zero,
then the result is returned as a single integer with the value zero.
There may be an indeterminate delay before the above query returns. If pname
is QUERY_RESULT_AVAILABLE, FALSE is returned if such a delay would be required;
otherwise TRUE is returned. It must always be true that if any query object
returns a result available of TRUE, all queries of the same type issued prior to that
query must also return TRUE.
Querying the state for any given query object forces that occlusion query to
complete within a finite amount of time.
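
To make the "nearest value representable" sentence concrete for a 32-bit readback of a nanosecond counter, here is a small sketch in plain C++. The function names are invented for illustration and nothing here calls OpenGL; it only mimics the clamping rule as quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Mimics the quoted rule for a 32-bit readback of a 64-bit nanosecond count:
// values that do not fit are replaced by the largest 32-bit value.
std::uint32_t clampTo32(std::uint64_t nanoseconds) {
    const std::uint64_t max32 = std::numeric_limits<std::uint32_t>::max();
    const std::uint64_t clamped = nanoseconds < max32 ? nanoseconds : max32;
    assert(clamped <= nanoseconds);  // clamping never increases the value
    return static_cast<std::uint32_t>(clamped);
}

// Longest interval, in seconds, that fits in 32 bits of nanoseconds.
double maxSecondsIn32Bits() {
    return std::numeric_limits<std::uint32_t>::max() / 1e9;
}
```

So a 32-bit result is fine for frame times in the millisecond range and only runs out of range for intervals longer than about 4.29 seconds.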


I shouldn't have said it's only useful with a loop though, because as you said, it's better to check the result much later (next frame) if possible. If it's a non-critical query, and you determine by using GL_QUERY_RESULT_AVAILABLE that querying the result will cause a delay, then it's possible to skip it, and maybe check it the next frame instead. If the query is critical however, then using GL_QUERY_RESULT without GL_QUERY_RESULT_AVAILABLE will have the same effect as looping until GL_QUERY_RESULT_AVAILABLE returns GL_TRUE.
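
The skip-or-block choice described above can be sketched without any GL headers. Everything named here (ReadPolicy, readQuery, the two callables) is hypothetical: resultAvailable would wrap a glGetQueryObjectiv call with GL_QUERY_RESULT_AVAILABLE, and fetchResult would wrap glGetQueryObjectuiv with GL_QUERY_RESULT.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

enum class ReadPolicy {
    SkipIfNotReady,  // non-critical query: give up now, retry next frame
    Block            // critical query: fetch even if that means waiting
};

// Returns true and stores the result in `out` if a value was fetched.
// Returns false, leaving `out` untouched, if the read was skipped.
bool readQuery(ReadPolicy policy,
               const std::function<bool()>& resultAvailable,
               const std::function<std::uint32_t()>& fetchResult,
               std::uint32_t& out) {
    assert(resultAvailable && fetchResult);
    if (policy == ReadPolicy::SkipIfNotReady && !resultAvailable()) {
        return false;
    }
    out = fetchResult();  // under Block this call may stall the caller
    return true;
}
```

Keeping the policy explicit makes it visible at the call site which readbacks are allowed to stall the frame.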

ps. Spec should probably get rid of the "occlusion query" mention in the last part of that description, and change it to "query".

Aleksandar
07-18-2010, 09:39 AM
The only sentence that could be understood as a blocking wait until the result is available is:

There may be an indeterminate delay before the above query returns.
But all the code examples in timer_query.txt (http://www.opengl.org/registry/specs/ARB/timer_query.txt) wait until GL_QUERY_RESULT_AVAILABLE is true. One vague sentence against three pretty clean code examples. We could try to find out how a specific implementation (with the drivers we are currently using) deals with that, but it would not prove anything. Nevertheless, I'll try it. ;)

Aleksandar
07-19-2010, 01:20 AM
Excuse me for my temper, Dan. :(
You are right about NV 257.21 drivers. glGetQueryObject*() is a blocking function call with a severe performance penalty.



// First approach

CCounter count;
count.StartCounter();
//-------------------------------------------------
GLint available = 0;
glGetQueryObjectiv(m_iTimeQuery, GL_QUERY_RESULT_AVAILABLE, &available);
if(available)
{
glGetQueryObjectuiv(m_iTimeQuery, GL_QUERY_RESULT, &m_timeElapsed_ns);
glBeginQuery(GL_TIME_ELAPSED, m_iTimeQuery);
}
//-------------------------------------------------
DrawScene();
//-------------------------------------------------
if(available)
glEndQuery(GL_TIME_ELAPSED);
//-------------------------------------------------
cpuTime = count.StopCounter(RTT_SEC);
gpuTime = m_timeElapsed_ns;

cpuTime < 1.1ms
gpuTime = 12.2ms



// Second approach

CCounter count;
count.StartCounter();
//-------------------------------------------------
glBeginQuery(GL_TIME_ELAPSED, m_iTimeQuery);
//-------------------------------------------------
DrawScene();
//-------------------------------------------------
glEndQuery(GL_TIME_ELAPSED);
glGetQueryObjectuiv(m_iTimeQuery, GL_QUERY_RESULT, &m_timeElapsed_ns);
//-------------------------------------------------
cpuTime = count.StopCounter(RTT_SEC);
gpuTime = m_timeElapsed_ns;

cpuTime = 13.8ms
gpuTime = 12.2ms

As we can see, in the first approach CPU utilization is very low (about 1 ms for the whole drawing cycle plus four additional function calls and two condition checks), although GPU drawing time is about 12 ms. In the second approach, the CPU needs more than 13 ms.

Is it the same with ATI?

Nicolas Lelong
11-18-2010, 07:05 AM
Is it the same with ATI?

FWIW, so late after this post: from memory, yes, querying the result also forces a sync on ATI.