How to measure timing in OpenGL?

Hello, folks!
I need to time some drawing functions, but if I just use the time() function provided by C++, the timing is incorrect: sometimes it is 0, sometimes it is 15 ms.
I found someone did in this way:
glFlush(); //or glFinish();
float sTime=(GLfloat)glutGet(GLUT_ELAPSED_TIME);
glDrawElements(…);//drawing function
glFlush(); //or glFinish();
float eTime=(GLfloat)glutGet(GLUT_ELAPSED_TIME);
t = eTime - sTime;
Is this way correct?
Thanks!

That is the correct way if you use glFinish, but you need to draw a lot of stuff to get a good measurement.

Also, things are pipelined, so there is a degree of parallelism when drawing that you defeat by measuring like this; drawing many things together is more efficient than paying the full time through the pipeline for each smaller set of primitives.

There are advanced mechanisms where you can send a ‘fence’ or similar timing barrier down the pipeline and measure when it completes. This allows you to get transport delay / draw time through the pipeline without stalling most of it through emptying it with a finish.

You will also find that some vendors in some markets simply ignore your glFinish, because it is considered bad practice for applications by some driver developers who care about benchmarks and raw numbers rather than, for example, the quality of user experience a developer might be aiming for through latency control.

Basically, because you have a pipeline with multiple stages, buffers, and resource caching, it is inherently difficult to measure. It helps if you at least have a mental picture of the graphics pipeline and understand that you are only measuring at one end: glFinish empties the pipeline, which effectively cripples performance for small batches of data.

Also check out the GL_ARB_timer_query extension (or GL_EXT_timer_query, the same extension under a different name).
With this extension you will probably get more precise results (even more precise than the performance counter), because the time is measured using the internal GPU clock.
It should be supported by both ATI and NVIDIA with the latest drivers.

For testing purposes it is excellent; if you want to release your code into the wild, use the technique described by dorbie.

Thanks a lot!
I will check the GL_ARB_timer_query extension.

I tried to use these functions:
glBeginQuery(GL_TIME_ELAPSED, queries[0]);
glEndQuery(GL_TIME_ELAPSED);
glGetQueryObjectui64v(queries[0], GL_QUERY_RESULT, &timeElapsed);
But I get GL_TIME_ELAPSED and glGetQueryObjectui64v undefined,
even though I downloaded the latest driver supporting OpenGL 4.0, downloaded glext.h, and added this header file to the VC2005 include folder.
Can anyone tell me how to use these functions, and how to update OpenGL to the latest version?
Thanks a lot.

http://developer.nvidia.com/object/opengl_extensions_tutorial.html

Is your GL context active at the moment those functions are called? If it is not, that’s the reason.

Here is some fairly complete code that demonstrates the usage of these functions:
http://www.opengl.org/discussion_boards/…8719#Post278719

Just install the latest drivers. For GL 4.0 you need newer hardware, such as an ATI 5xxx or NVIDIA 4xx. glext.h only tells you what the function signatures look like; everything else is in the drivers.

Thanks a lot!
My video card is a Quadro 5800, driver 257.21.
But I still can’t use GL_TIME_ELAPSED and glGetQueryObjectui64v;
GL_TIME_ELAPSED is still undefined.

Unfortunately, the Quadro 5800 is an SM4 card, so it does not support OpenGL 4.0. Nevertheless, glGetQueryObject should work. Post the part of your code where the measurement is taken. So far I haven’t had any trouble even with much cheaper/older cards.

My code snippet:
GLuint queries[1];
GLuint64 timeElapsed = 0; // 64-bit, to match glGetQueryObjectui64v
// Create a query object.
glGenQueries(1, queries);
// Start timing the draw call.
glBeginQuery(GL_TIME_ELAPSED, queries[0]);
glDrawElementsInstancedEXT(GL_QUADS, 150*4, GL_UNSIGNED_INT, 0, nBodies);
glEndQuery(GL_TIME_ELAPSED);
// See how much time the rendering took, in nanoseconds.
glGetQueryObjectui64v(queries[0], GL_QUERY_RESULT, &timeElapsed);

Thanks.

First, why don’t you use glGetQueryObjectuiv instead of glGetQueryObjectui64v? If the 64-bit function is really the problem, the 32-bit integer version may succeed.

Second, you have to wait until the result is available. Take a look at my code snippet. It is usually best to do something else before querying the result, or else to block execution until it is available.

GLuint m_iTimeQuery;
GLuint timeElapsed = 0;
bool queryIssued = false; // true once the first query is in flight
// Create a query object (once, at initialization).
glGenQueries(1, &m_iTimeQuery);

// Each frame: read the previous frame's result if it is ready,
// then time this frame's draw call.
GLint available = 0;
if (queryIssued)
    glGetQueryObjectiv(m_iTimeQuery, GL_QUERY_RESULT_AVAILABLE, &available);
bool startNewQuery = !queryIssued || available;
if (available)
    // See how much time the previous frame's draw took, in nanoseconds.
    glGetQueryObjectuiv(m_iTimeQuery, GL_QUERY_RESULT, &timeElapsed);
if (startNewQuery) {
    glBeginQuery(GL_TIME_ELAPSED, m_iTimeQuery);
    queryIssued = true;
}
glDrawElementsInstancedEXT(GL_QUADS, 150*4, GL_UNSIGNED_INT, 0, nBodies);
if (startNewQuery)
    glEndQuery(GL_TIME_ELAPSED);

I modified my code according to your code snippet.
But it still gives error C2065: ‘GL_TIME_ELAPSED’: undeclared identifier.

Oh, OK! That’s the problem: you haven’t included glext.h properly.

You can just add the following code so that the timer query constants are defined.

#ifndef GL_ARB_timer_query
#define GL_TIME_ELAPSED 0x88BF
#define GL_TIMESTAMP 0x8E28
#endif

Or use value 0x88BF instead of GL_TIME_ELAPSED.

Thanks very much!
Now I can run the code.
But I find that glGetQueryObjectiv() returns available = 0.
Does that mean my video card doesn’t support GL_ARB_timer_query?

No, it doesn’t mean that your card does not support it, only that the result is not available at that moment. The code you have written is not correct. Take a look at my code again and you’ll probably see where the problem is: I’m displaying results from the previous frame.

To start with, try the following:


        glBeginQuery(GL_TIME_ELAPSED, query1);

        // Draw object
        ....

        glEndQuery(GL_TIME_ELAPSED);

        available = 0;
        // Wait for all results to become available
        while (!available) {
            glGetQueryObjectiv(query1, GL_QUERY_RESULT_AVAILABLE, &available);
        }
        glGetQueryObjectuiv(query1, GL_QUERY_RESULT, &timeElapsed); 

Querying GL_QUERY_RESULT_AVAILABLE is only really useful if you’re performing some work in the loop while waiting for the result to become available. If you just query GL_QUERY_RESULT, then OpenGL will wait till the result becomes available anyway:


glBeginQuery(GL_TIME_ELAPSED, query1);
// Draw object
glEndQuery(GL_TIME_ELAPSED);
//slow - can't do anything more until result is available
glGetQueryObjectuiv(query1, GL_QUERY_RESULT, &timeElapsed);

or if you have something else you can perform while waiting for results:


        glBeginQuery(GL_TIME_ELAPSED, query1);

        // Draw object
        ....

        glEndQuery(GL_TIME_ELAPSED);

        available = 0;
        // Wait for all results to become available
        while (!available) {
            DoSomeWork(); // <== This way we do some extra work while waiting
            glGetQueryObjectiv(query1, GL_QUERY_RESULT_AVAILABLE, &available);
        }
        glGetQueryObjectuiv(query1, GL_QUERY_RESULT, &timeElapsed);

Thanks Aleksandar and Dan Bartlett.
Now, the code works with the loop while waiting.
// Wait for all results to become available
while (!available) {
DoSomeWork(); // <== This way we do some extra work while waiting
glGetQueryObjectiv(query1, GL_QUERY_RESULT_AVAILABLE, &available);
}

Now I find that glDrawElementsInstancedEXT takes longer than glCallList(torus_display_list), although instanced drawing should be faster. I need to draw 16k tori, and I use the same tessellation for DrawElements and the display list: 150 quadrilaterals per torus. Can anyone give me advice? Thanks.

Usually it is better to issue all the queries during drawing and read them back before the next pass (the next drawing cycle), because SwapBuffers forces everything to finish and there is no busy-waiting.

Untrue! OpenGL just retrieves the state; if the result is not available, GL won’t wait for it to become available, it just reads the value. That’s the reason we have to check whether it is available before reading. But if you read the values from the previous pass, then querying availability has almost no meaning. I proposed busy-waiting just to illustrate the process of reading and to make clear to Liufu the simplest way of reading, but it is certainly not good practice. Check my previous example for the most efficient way of measuring drawing time.

I’m fairly sure that GL_QUERY_RESULT_AVAILABLE isn’t required for GL_QUERY_RESULT to return valid results. A name that more accurately described what it does would perhaps be “GL_GETTING_QUERY_RESULT_WILL_CAUSE_DELAY”.

I shouldn’t have said it’s only useful with a loop though, because as you said, it’s better to check the result much later (next frame) if possible. If it’s a non-critical query, and you determine by using GL_QUERY_RESULT_AVAILABLE that querying the result will cause a delay, then it’s possible to skip it, and maybe check it the next frame instead. If the query is critical however, then using GL_QUERY_RESULT without GL_QUERY_RESULT_AVAILABLE will have the same effect as looping until GL_QUERY_RESULT_AVAILABLE returns GL_TRUE.

PS. The spec should probably get rid of the “occlusion query” mention in the last part of that description and change it to just “query”.