Thread: When exactly does glBeginQuery set the counter to zero?

  1. #1

    When exactly does glBeginQuery set the counter to zero?

    Hi, I'm trying to determine the time between the moment I call glQueryCounter on the CPU and the moment it is actually executed on the GPU. The reference pages say that when you use glBeginQuery with GL_TIME_ELAPSED, the time counter is set to zero. Is that exactly when glBeginQuery is called on the CPU, or is it set to zero once the call is executed on the GPU? If the latter is correct, then using glBeginQuery and glEndQuery will return the same result as taking the difference between two glQueryCounter timestamps, which also means I can't measure the time between the CPU call and the GPU execution. If that's the case, is there any other way of measuring that delay?

    Normally, an OpenGL application would look like this:

    Code :
    CPU: A------B---------------
    GPU: ------A---------------B

    A is glBeginQuery, B is glEndQuery.

    The CPU calls A, then issues a bunch of other OpenGL calls, and finally calls B. Meanwhile, the GPU is still taking care of OpenGL calls issued before A, and the OpenGL calls between A and B take more time to execute on the GPU than they took to issue on the CPU. This means CPU time and GPU time are unsynchronized. However, GL_TIMESTAMP and GL_TIME_ELAPSED can be used to measure the time between A and B on the GPU. So right now I can measure the CPU time between A and B with std::chrono::high_resolution_clock and the GPU time with a query, but how can I get the time from the CPU A to the GPU A?
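    Just to make it concrete, here's roughly what I'm doing now (a rough sketch; query-object creation and error checking are omitted, and queryId is just a placeholder name):

    Code :
    // CPU-side span measured with std::chrono, GPU-side span with a timer query
    auto cpuA = std::chrono::high_resolution_clock::now();
    glBeginQuery(GL_TIME_ELAPSED, queryId);              // A
    // ... other OpenGL calls ...
    glEndQuery(GL_TIME_ELAPSED);                         // B
    auto cpuB = std::chrono::high_resolution_clock::now();

    // later, once the query result is available:
    GLuint64 gpuElapsedNs = 0;
    glGetQueryObjectui64v(queryId, GL_QUERY_RESULT, &gpuElapsedNs);
    auto cpuElapsedNs = std::chrono::duration_cast<std::chrono::nanoseconds>(cpuB - cpuA).count();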
    Last edited by FrankBoltzmann; 08-29-2014 at 08:50 AM.

  2. #2
    No experts on this? I was hoping to hear from Alfonse or someone else; too bad you can't tag forum members!

  3. #3
    Aleksandar (Senior Member, OpenGL Pro)
    Alfonse hasn't been active on the forum for a long time. It's a pity; the forum has lost much of its liveliness since he left.

    Back to the question. Although I'm not an expert, let's try to find an answer.

    Yes, timer_query measures GPU time: the time elapsed from the moment all instructions prior to one query command have been executed to the moment all instructions prior to the next query have been fully executed (finalized). The specification is pretty clear about this. By the way, I wouldn't recommend using glBeginQuery/glEndQuery, since they don't allow overlapping queries.
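    For example, with timestamp queries you can time overlapping intervals, which Begin/End pairs cannot do. A rough sketch (the query objects q[0]..q[3] are assumed to be generated elsewhere):

    Code :
    // Two overlapping intervals timed with glQueryCounter
    glQueryCounter(q[0], GL_TIMESTAMP);   // start of interval 1
    // ... commands ...
    glQueryCounter(q[1], GL_TIMESTAMP);   // start of interval 2, while interval 1 is still "open"
    // ... commands ...
    glQueryCounter(q[2], GL_TIMESTAMP);   // end of interval 1
    // ... commands ...
    glQueryCounter(q[3], GL_TIMESTAMP);   // end of interval 2

    // later, when the results are available:
    GLuint64 t[4];
    for (int i = 0; i < 4; ++i)
        glGetQueryObjectui64v(q[i], GL_QUERY_RESULT, &t[i]);
    GLuint64 interval1 = t[2] - t[0];
    GLuint64 interval2 = t[3] - t[1];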

    It is very hard to determine the latency of command execution on the GPU relative to the time the CPU issues it. As far as I know, it is impossible in real time. Why do you need that? You can hardly imagine what's actually happening in the meantime. It is not just a consequence of asynchronous GPU/CPU execution, but of the ultimate fight between different processes for the GPU: scheduling, multi-level command-queue writing/reading, user-mode graphics driver execution, kernel-mode graphics driver execution, DMA transfers, a trip through PCIe, through several levels of memory and caches, etc.

    At best, you could estimate the time by offline analysis of the time span between writing to a context CPU queue and removal from the hardware queue. On Windows this can be accomplished with the Windows Performance Toolkit (WPT); the analysis of the event log can be done with GPUView.

  4. #4
    carsten neumann (Advanced Member, Frequent Contributor)
    Couldn't you at least approximate it by issuing a glFenceSync after the command and then doing a glWaitSync()?

  5. #5
    Hi guys, thanks for the responses! My replies are below each quote.

    Quote Originally Posted by Aleksandar View Post
    Alfonse hasn't been active on the forum for a long time. It's a pity; the forum has lost much of its liveliness since he left.

    Back to the question. Although I'm not an expert, let's try to find an answer.

    Yes, timer_query measures GPU time: the time elapsed from the moment all instructions prior to one query command have been executed to the moment all instructions prior to the next query have been fully executed (finalized). The specification is pretty clear about this. By the way, I wouldn't recommend using glBeginQuery/glEndQuery, since they don't allow overlapping queries.

    This seems to be wrong, according to what I found out.

    From the specification: using glQueryCounter(GL_TIMESTAMP) will record a time point (GPU time) at the moment the glQueryCounter command is executed, after all previous OpenGL calls have been fully executed by the pipeline.

    From what I read: glBeginQuery(GL_TIME_ELAPSED) will measure the time from when glBeginQuery enters the pipeline until glEndQuery leaves the pipeline, which would mean the time from when glBeginQuery is called by the CPU to when glEndQuery has been executed on the GPU. This means using glBeginQuery/glEndQuery and glQueryCounter would result in two different durations: the first would include the delay I want, the second wouldn't.

    Is this correct, or does glBeginQuery actually wait for all previous calls to be done?


    It is very hard to determine the latency of command execution on the GPU relative to the time the CPU issues it. As far as I know, it is impossible in real time. Why do you need that? You can hardly imagine what's actually happening in the meantime. It is not just a consequence of asynchronous GPU/CPU execution, but of the ultimate fight between different processes for the GPU: scheduling, multi-level command-queue writing/reading, user-mode graphics driver execution, kernel-mode graphics driver execution, DMA transfers, a trip through PCIe, through several levels of memory and caches, etc.

    This is exactly what I want to determine. I want to be able to pull out time graphs of how the CPU and GPU work in parallel. I know I won't be able to determine what exactly is causing the delay (scheduling, bus, etc.), but at least I'll see how long that delay is and I'll be able to visualise the asynchronous timeline.

    At best, you could estimate the time by offline analysis of the time span between writing to a context CPU queue and removal from the hardware queue. On Windows this can be accomplished with the Windows Performance Toolkit (WPT); the analysis of the event log can be done with GPUView.
    Quote Originally Posted by carsten neumann View Post
    Couldn't you at least approximate it by issuing a glFenceSync after the command and then doing a glWaitSync()?

    How would this help me measure the delay? Care to explain thoroughly? I don't want to block the CPU.

  6. #6
    carsten neumann (Advanced Member, Frequent Contributor)
    Code :
    /* issue GL command */
    cpu_timer.start();
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glWaitSync(fence, 0, GL_TIMEOUT_IGNORED); // or is glClientWaitSync more useful here?
    cpu_timer.stop();

    I think this gives an approximation of how long it takes from the moment the command is issued until the GL says it is complete. Sorry, no idea how to do that non-blocking.

  7. #7
    Aleksandar (Senior Member, OpenGL Pro)
    Quote Originally Posted by FrankBoltzmann View Post
    From what I read: glBeginQuery(GL_TIME_ELAPSED) will measure the time from when glBeginQuery enters the pipeline until glEndQuery leaves the pipeline, which would mean the time from when glBeginQuery is called by the CPU to when glEndQuery has been executed on the GPU. This means using glBeginQuery/glEndQuery and glQueryCounter would result in two different durations: the first would include the delay I want, the second wouldn't.
    We are probably not reading the same spec.
    Quote Originally Posted by ARB_timer_query Rev.13
    When BeginQuery and EndQuery are called with a <target> of
    TIME_ELAPSED, the GL prepares to start and stop the timer used for
    timer queries. The timer is started or stopped when the effects from all
    previous commands on the GL client and server state and the framebuffer
    have been fully realized.
    What you could eventually do is measure the time between a command reaching the server (it's not quite clear what that means, i.e. which command queue has been reached) and its execution time.

    Quote Originally Posted by ARB_timer_query Rev.13
    The current time of the GL may be queried by calling GetIntegerv or
    GetInteger64v with the symbolic constant TIMESTAMP. This will return the
    GL time after all previous commands have reached the GL server but have
    not yet necessarily executed. By using a combination of this synchronous
    get command and the asynchronous timestamp query object target,
    applications can measure the latency between when commands reach the GL
    server and when they are realized in the framebuffer.
    As for glClientWaitSync(), it could measure some time, but I'm not sure it is what you need. It gives a total CPU time from issuing to being signaled back, but I'm not sure how precise it is: it just guarantees the previous commands are finished, not how much time has passed since that event. And, yes, it is a blocking call. glWaitSync() is used for inter-context synchronization; it cannot be used in this case.
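    Following the second quote, something like this should give you the latency between commands reaching the server and being realized (just a sketch; the query object latencyQuery is assumed to be generated elsewhere):

    Code :
    // Synchronous get: GL time when all previous commands have reached the server
    GLint64 reachedServer = 0;
    glGetInteger64v(GL_TIMESTAMP, &reachedServer);
    // Asynchronous query: GL time when those commands are fully realized
    glQueryCounter(latencyQuery, GL_TIMESTAMP);

    // later, when the result is available:
    GLuint64 realized = 0;
    glGetQueryObjectui64v(latencyQuery, GL_QUERY_RESULT, &realized);
    GLuint64 latencyNs = realized - (GLuint64)reachedServer;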

  8. #8
    Quote Originally Posted by Aleksandar View Post
    We are probably not reading the same spec.


    What you could eventually do is measure the time between a command reaching the server (it's not quite clear what that means, i.e. which command queue has been reached) and its execution time.


    As for glClientWaitSync(), it could measure some time, but I'm not sure it is what you need. It gives a total CPU time from issuing to being signaled back, but I'm not sure how precise it is: it just guarantees the previous commands are finished, not how much time has passed since that event. And, yes, it is a blocking call. glWaitSync() is used for inter-context synchronization; it cannot be used in this case.
    Thanks for that quote from the spec; I couldn't find that anywhere! I think this should also be in the man page explaining the use of glBeginQuery (https://www.opengl.org/sdk/docs/man3...BeginQuery.xml). And why is that specification still ARB? Aren't timer queries already in core?

    I guess what I want to do isn't really possible. And how about if I issue a glQueryCounter with GL_TIMESTAMP at the beginning of the application, then record the CPU time when the query result is available and record the query result as well? I would have a CPU and a GPU timestamp of (approximately) the same time point, would I not? Then I could fetch other OpenGL timestamps and CPU time_points and use the difference between them and the initial timestamps as timestamps relative to the application's initialization. Does this make any sense?

    EDIT: I mean, if I do something like this:
    Code :
    glQueryCounter(gpu_start_query, GL_TIMESTAMP);
    GLsync f1 = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glClientWaitSync(f1, GL_SYNC_FLUSH_COMMANDS_BIT, GLuint64(-1)); // block until the GPU has caught up
    cpu_start = clock::now();
    {
       // do a lot of OpenGL stuff
    }
    glQueryCounter(gpu_stop_query, GL_TIMESTAMP);
    GLsync f2 = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glClientWaitSync(f2, GL_SYNC_FLUSH_COMMANDS_BIT, GLuint64(-1));
    cpu_stop = clock::now();
    won't the duration from cpu_start to cpu_stop always be the same (with a tiny bit of jitter) as the duration from gpu_start to gpu_stop, since they are synchronized?

    If so, I could just take a similarly synchronized timestamp at the beginning of my application and use it to calculate correlated GPU and CPU timestamps relative to the start of the application. Am I missing something?
    Last edited by FrankBoltzmann; 09-02-2014 at 08:39 PM.

  9. #9
    Aleksandar (Senior Member, OpenGL Pro)
    Quote Originally Posted by FrankBoltzmann View Post
    ... why is that specification still ARB? Aren't timer queries already in core?
    Yes, timer_query has been part of the core specification since version 3.3, but I like to use the Registry instead of searching through the core specification.

    Quote Originally Posted by FrankBoltzmann View Post
    I guess what I want to do isn't really possible.
    That is probably a correct statement, but let's see what can be done.

    Quote Originally Posted by FrankBoltzmann View Post
    And how about if I issue a glQueryCounter with GL_TIMESTAMP at the beginning of the application, then record the CPU time when the query result is available and record the query result as well? I would have a CPU and a GPU timestamp of (approximately) the same time point, would I not?
    The CPU timestamp will be behind the GPU timestamp by an undetermined amount of time. But OK, let's say it is the same moment.

    Quote Originally Posted by FrankBoltzmann View Post
    Then I could fetch other OpenGL timestamps and CPU time_points and use the difference between them and the initial timestamps as timestamps relative to the application's initialization.
    I don't get this point. If the GPU is perfectly synchronized with the CPU, Tgpu2-Tgpu1 should be equal to Tcpu2-Tcpu1. If they are not, you'll actually measure the latency of reading the results, not the waiting in command queues or whatever else precedes GPU execution. Or maybe I missed something... Also, the precision of the counters is not the same; I don't know how clock::now works.

    Also, I don't get why you are using glFenceSync/glClientWaitSync. glGetQueryObjectui64v can be used instead, probably with less overhead.
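    Something along these lines, just a sketch (gpu_start_query is assumed to be generated elsewhere):

    Code :
    // Reading the result with GL_QUERY_RESULT blocks until it is available,
    // so it can replace the glFenceSync/glClientWaitSync pair in your sketch
    glQueryCounter(gpu_start_query, GL_TIMESTAMP);
    GLuint64 gpu_start = 0;
    glGetQueryObjectui64v(gpu_start_query, GL_QUERY_RESULT, &gpu_start);
    cpu_start = clock::now();

    // And if you don't want to block, poll for availability first:
    GLuint available = 0;
    glGetQueryObjectuiv(gpu_start_query, GL_QUERY_RESULT_AVAILABLE, &available);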

  10. #10
    Response and explanation of my logic:

    Quote Originally Posted by Aleksandar View Post
    The CPU timestamp will be behind the GPU timestamp by an undetermined amount of time. But OK, let's say it is the same moment.
    Exactly. Let's say you capture the GPU time at program start and it is A. You then capture the CPU time when A is available; there will always be a small delay between the time A was captured and the time its value is stored in the C++ code, so let's call that CPU time B. A and B have no correlation whatsoever, but we know they were taken at almost the same absolute real time, except that B was taken X nanoseconds after A. Now, our GPU and CPU timestamps are calculated by subtracting A and B respectively. So Tgpu = gpu_now - A and Tcpu = cpu_now - B. If the GPU and CPU are synchronized (let's ignore the delay I'm trying to find for now), Tgpu will be Tcpu + X, am I right?

    Quote Originally Posted by Aleksandar View Post
    I don't get this point. If the GPU is perfectly synchronized with the CPU, Tgpu2-Tgpu1 should be equal to Tcpu2-Tcpu1. If they are not, you'll actually measure the latency of reading the results, not the waiting in command queues or whatever else precedes GPU execution.
    According to what you say, Tgpu2-Tgpu1 = Tcpu2-Tcpu1 every time, if they both have the same nanosecond precision, right? That said, I can calculate the delay I wanted by computing Tgpu1-Tcpu1, because Tgpu = gpu_time - A and Tcpu = cpu_time - B, where A and B are different counter values that refer to the same absolute time point. Do you understand? Tgpu1 and Tcpu1 are actually durations from the start of the program, not readings from two different counters that were initialized at two different, unrelated, undefined time points. Am I making any sense?

    Quote Originally Posted by Aleksandar View Post
    Or maybe I missed something... Also, the precision of the counters is not the same; I don't know how clock::now works.
    My CPU timer has 1 ns precision, and so does the GPU timer, I hope.

    Quote Originally Posted by Aleksandar View Post
    Also, I don't get why you are using glFenceSync/glClientWaitSync. glGetQueryObjectui64v can be used instead, probably with less overhead.
    You are right, but that was just to make it clear there was a sync. Basically:

    DELAY CALCULATION:

    Program initialization:
    Code :
    glQueryCounter(id,GL_TIMESTAMP);
    glGetQueryObjectui64v(id,GL_QUERY_RESULT,&gpu_start_time); // blocks until the result is available
    cpu_start_time = clock::now();

    During execution:
    Code :
    cpu_begin_time = clock::now();
    glQueryCounter(id2,GL_TIMESTAMP);
    {
          // gl calls being timed
    }
    glQueryCounter(id3,GL_TIMESTAMP);
    cpu_end_time = clock::now();

    When results are available:
    Code :
    // get query results
    glGetQueryObjectui64v(id2,GL_QUERY_RESULT,&gpu_begin_time);
    glGetQueryObjectui64v(id3,GL_QUERY_RESULT,&gpu_end_time);
     
    // calculate relative timestamps from start of program
    cpu_begin_time -= cpu_start_time;
    cpu_end_time   -= cpu_start_time;
    gpu_begin_time -= gpu_start_time;
    gpu_end_time   -= gpu_start_time;
     
    // calculate durations
    gpu_duration = gpu_end_time - gpu_begin_time;
    cpu_duration = cpu_end_time - cpu_begin_time;
     
    // calculate OpenGL pipeline delay (from call to execution)
    async_delay = gpu_begin_time - cpu_begin_time;

    Makes sense?
    Last edited by FrankBoltzmann; 09-04-2014 at 02:08 PM.
