Nested Timer Queries

I’m currently playing around with ARB_timer_query in order to profile some parts of my rendering code. Unfortunately, I hit a limitation quite fast: you cannot nest timer queries, i.e. you cannot wrap a timer around code that already uses a timer internally. That makes the timer API hard to use in a general way across code parts that are otherwise independent.

The current ARB_timer_query doesn’t allow you to call glBeginQuery(GL_TIME_ELAPSED, timer1) if you’ve called glBeginQuery(GL_TIME_ELAPSED, timer2) before and haven’t ended that query yet.
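For illustration, this is roughly the pattern the spec rejects (the draw calls are hypothetical placeholders); the inner glBeginQuery generates GL_INVALID_OPERATION because a GL_TIME_ELAPSED query is already active:

GLuint outerTimer, innerTimer;
glGenQueries(1, &outerTimer);
glGenQueries(1, &innerTimer);

glBeginQuery(GL_TIME_ELAPSED, outerTimer);      // time the whole pass
    renderShadowPass();                         // hypothetical rendering call

    glBeginQuery(GL_TIME_ELAPSED, innerTimer);  // error: GL_INVALID_OPERATION, target already active
        renderOneShadowCaster();                // hypothetical rendering call
    glEndQuery(GL_TIME_ELAPSED);
glEndQuery(GL_TIME_ELAPSED);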

The only reason for this limitation is that glEndQuery() does not take the query object as a parameter (I guess this is due to the occlusion-query legacy of the query API).

I propose to change the API to:
glEndQuery(GLenum target, GLuint query)
This way the problem would go away and the spec could easily allow nested timer queries. Also, as a bonus, it would make the query API 100% ‘DSA-style’.
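Purely as a sketch of the proposal (this signature does not exist in current GL, and the draw calls are hypothetical), nesting would then simply look like:

GLuint frameTimer, shadowTimer;
glGenQueries(1, &frameTimer);
glGenQueries(1, &shadowTimer);

glBeginQuery(GL_TIME_ELAPSED, frameTimer);
    glBeginQuery(GL_TIME_ELAPSED, shadowTimer);
        renderShadowPass();                       // hypothetical rendering call
    glEndQuery(GL_TIME_ELAPSED, shadowTimer);     // proposed: the query to end is named explicitly
    renderRestOfScene();                          // hypothetical rendering call
glEndQuery(GL_TIME_ELAPSED, frameTimer);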

As an additional feature, I guess calling glBeginQuery() on a query that has already been started could be allowed; it would just restart the query. This is no big deal, though, as I can code around this limitation.

I’ve made a working wrapper using glQueryCounter(q, GL_TIMESTAMP): you just call it twice, at the beginning and end of the section, and compute the difference.
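A minimal sketch of such a wrapper, assuming a GL 3.3 / ARB_timer_query context with a function loader already set up (the struct and function names are made up for illustration):

struct GpuTimer {
    GLuint begin;   // timestamp query issued at the start of the section
    GLuint end;     // timestamp query issued at the end of the section
};

void timerInit(GpuTimer* t) {
    glGenQueries(1, &t->begin);
    glGenQueries(1, &t->end);
}

void timerBegin(GpuTimer* t) { glQueryCounter(t->begin, GL_TIMESTAMP); }
void timerEnd(GpuTimer* t)   { glQueryCounter(t->end,   GL_TIMESTAMP); }

// Call this as late as possible: GL_QUERY_RESULT blocks if the result
// isn't available yet (or poll GL_QUERY_RESULT_AVAILABLE first).
GLuint64 timerElapsedNs(const GpuTimer* t) {
    GLuint64 t0 = 0, t1 = 0;
    glGetQueryObjectui64v(t->begin, GL_QUERY_RESULT, &t0);
    glGetQueryObjectui64v(t->end,   GL_QUERY_RESULT, &t1);
    return t1 - t0;   // timestamps are in nanoseconds
}

Because each timer owns its own pair of query objects, these sections nest freely.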

Mhhhm, so in order to keep the ability to have asynchronous, non-blocking timings, you’d need two query objects per “MFORTTimer” object, right? One query object that stores the beginning timestamp of an operation, one that stores the end timestamp.
Nice idea :)

Great idea. However, I agree it would still be nice to have nested query support in GL.

Yes.

Actually, my own wrapper uses many more query objects than just two (one pair). The problem arises when you want to measure tight loops: you can never wait for the query result immediately at the end of the measured section. So it is better to keep a vector of begin/end query pairs and later, at some appropriate place (e.g. after the whole scene has been rendered), generate all the statistics.
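Roughly what that looks like, as a sketch (the names are made up; this assumes timestamp query pairs have already been issued during rendering and a GL loader is included):

#include <cstdio>
#include <vector>

struct TimedSection {
    const char* label;
    GLuint begin, end;   // GL_TIMESTAMP query objects written during rendering
};

std::vector<TimedSection> g_sections;   // one entry pushed per measured section

// Called at an appropriate place, e.g. after the whole scene has been rendered,
// so the readback doesn't stall the measured sections themselves.
void reportTimings() {
    for (const TimedSection& s : g_sections) {
        GLuint64 t0 = 0, t1 = 0;
        glGetQueryObjectui64v(s.begin, GL_QUERY_RESULT, &t0);
        glGetQueryObjectui64v(s.end,   GL_QUERY_RESULT, &t1);
        std::printf("%s: %.3f ms\n", s.label, (t1 - t0) * 1e-6);
    }
    g_sections.clear();
}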

I’d rather remove glBeginQuery/glEndQuery altogether. It just pollutes the API. When you do CPU benchmarking, you do not have any begin/end functions; you can only get the current time at a declared precision.

Well, actually you’re right. Maybe it is better to solve these kinds of problems in application code, since the application writer knows best how he/she wants to measure. Especially in that loop case, nested queries wouldn’t help much either :)

I totally agree: the best APIs are the simplest ones that can still fully do their job.
While they’re at it, they might also look at the fence sync objects. They should not be objects at all (and glDeleteSync should not exist). Instead, glFenceSync should just return an order number which can later be queried or waited on, like on the Xbox 360. This is simply the value of an internal implementation counter that increments by one with each GL command submitted. This way the implementation does not need to manage a pool of live sync objects; it only needs to compare the requested number with the index of the last executed command.
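As a sketch of the idea (these functions are hypothetical and do not exist in GL), a fence would then be just a number, with nothing to create or delete:

GLuint64 fence = glInsertFence();   // hypothetical: returns the current command counter
submitMoreWork();                   // hypothetical rendering calls
if (glIsFencePassed(fence)) {       // hypothetical: true once the index of the last
    recycleBuffers();               // executed command has reached 'fence'
}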

I totally agree: the best APIs are the simplest ones that can still fully do their job.

No, the best APIs are the APIs that do their job and can’t be used wrongly. Simplicity is a secondary concern.

Also, you’re not getting rid of glBegin/EndQuery. It may not be absolutely necessary for timers, but it’s still vital for occlusion queries.

Instead, glFenceSync should just return an order number which can later be queried or waited on, like on the Xbox 360. This is simply the value of an internal implementation counter that increments by one with each GL command submitted. This way the implementation does not need to manage a pool of live sync objects.

You’re right. And obviously, you know that NVIDIA and Intel’s hardware works exactly the same way. As does PowerVR and anyone else who might someday want to implement OpenGL.

Because if they don’t, you’re effectively saying that they can’t implement ARB_sync at all. So you wouldn’t make this suggestion unless you had certain knowledge that all hardware platforms implement this feature the same way. Without that knowledge, such a suggestion would be both inappropriate and shortsighted.

So I’m curious: how did you manage to obtain internal information about NVIDIA’s, Intel’s, and PowerVR’s hardware?

Sorry korval, I’m not going to waste time with you.