Response and explanation of my logic:
Exactly. Let’s say you capture the GPU time at program start and call it A. You then capture the CPU time as soon as A is available and call it B; there will always be a small delay between the moment A was captured and the moment its value is read back into the C++ code. A and B come from unrelated counters, but we know they were taken at almost the same absolute real time, except that B was taken X nanoseconds after A. Now, the GPU and CPU timestamps are calculated by subtracting A and B respectively, so Tgpu = gpu_now - A and Tcpu = cpu_now - B. If the GPU and CPU are synchronized (let’s ignore the delay I’m trying to find for now), Tgpu will be Tcpu + X, am I right?
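In code, the capture at program start might look like this. A minimal sketch, not the actual code: the variable names, the use of std::chrono::steady_clock, and the assumption of a current OpenGL 3.3+ context are all mine.

#include <chrono>
#include <GL/glew.h>

GLuint64 A = 0;                           // GPU timestamp at start, in ns
std::chrono::steady_clock::time_point B;  // CPU timestamp at start

void capture_start_times()
{
    GLuint q;
    glGenQueries(1, &q);
    glQueryCounter(q, GL_TIMESTAMP);                // GPU latches A
    // Blocking read: returns once the GPU has written the result,
    // so B below is captured X ns after A was latched.
    glGetQueryObjectui64v(q, GL_QUERY_RESULT, &A);
    B = std::chrono::steady_clock::now();
    glDeleteQueries(1, &q);
}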
[QUOTE=Aleksandar;1261431]
I don’t get this point. Tgpu2-Tgpu1 if it is perfectly synchronized with the CPU, should be equal to Tcpu2-Tcpu1. If they are not, you’ll actually measure latency of the results reading, not waiting in commands queues or something that precedes a GPU execution.[/QUOTE]
According to what you say, Tgpu2 - Tgpu1 = Tcpu2 - Tcpu1 every time, if both counters have the same nanosecond precision, right? That said, I can calculate the delay I want as Tgpu1 - Tcpu1, because Tgpu = gpu_time - A and Tcpu = cpu_time - B, where A and B are counter values from different clocks that refer to (almost) the same absolute time point. Tgpu1 and Tcpu1 are both durations measured from the start of the program, not durations from two counters initiated at two different, unrelated, undefined time points. Am I making sense?
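To make the arithmetic concrete, here is a made-up numeric example (every value invented for illustration). Suppose B was captured X = 200 ns after A, a GL call is issued 1,000,000 ns after A, and the GPU actually executes it 500 ns later:

Tcpu1 = 1,000,000 - 200 = 999,800 ns   (measured from B)
Tgpu1 = 1,000,500 ns                   (measured from A)
Tgpu1 - Tcpu1 = 700 ns = 500 ns of pipeline delay + the 200 ns offset X

So the measured delay carries the constant offset X inside it, consistent with the Tgpu = Tcpu + X relation above.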
[QUOTE=Aleksandar;1261431]
Or maybe I missed something… Also, the precision of the counters is not the same. I don’t know how clock::now works. [/QUOTE]
My CPU timer has 1 ns precision, and so does the GPU timer, I hope.
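Rather than hoping, both resolutions can be checked. A sketch, assuming C++11 <chrono> and an OpenGL 3.3+ context; note that GL timestamps are specified in nanoseconds, but GL_QUERY_COUNTER_BITS only reports the counter width, and the effective resolution is implementation-dependent:

#include <chrono>
#include <cstdio>
#include <GL/glew.h>

void report_timer_precision()
{
    // CPU: tick period of the clock, in nanoseconds
    using period = std::chrono::high_resolution_clock::period;
    std::printf("CPU tick: %g ns\n",
                1e9 * double(period::num) / double(period::den));

    // GPU: width of the timestamp counter; 0 means timestamps are unsupported
    GLint bits = 0;
    glGetQueryiv(GL_TIMESTAMP, GL_QUERY_COUNTER_BITS, &bits);
    std::printf("GPU timestamp counter: %d bits\n", bits);
}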
[QUOTE=Aleksandar;1261431]
Also, I don’t get why you are using glFenceSync/glClientWaitSync. It can be used glGetQueryObjectui64v instead with probably less overhead.[/QUOTE]
You are right, but that was just to make the synchronization point explicit. Basically:
DELAY CALCULATION:
Program initialization:
glQueryCounter(id,GL_TIMESTAMP);
glGetQueryObjectui64v(id,GL_QUERY_RESULT,&gpu_start_time); // blocks until the GPU result (A) is ready
cpu_start_time = clock::now(); // B, taken X ns after A
During execution:
cpu_begin_time = clock::now();
glQueryCounter(id2,GL_TIMESTAMP);
{
// gl calls being timed
}
glQueryCounter(id3,GL_TIMESTAMP);
cpu_end_time = clock::now();
When results are available:
// get query results
glGetQueryObjectui64v(id2,GL_QUERY_RESULT,&gpu_begin_time);
glGetQueryObjectui64v(id3,GL_QUERY_RESULT,&gpu_end_time);
// calculate relative timestamps from start of program
// (with std::chrono, store these as durations: time_point - time_point)
cpu_begin_time -= cpu_start_time;
cpu_end_time -= cpu_start_time;
gpu_begin_time -= gpu_start_time;
gpu_end_time -= gpu_start_time;
// calculate durations
gpu_duration = gpu_end_time - gpu_begin_time;
cpu_duration = cpu_end_time - cpu_begin_time;
// calculate OpenGL pipeline delay (from call to execution)
async_delay = gpu_begin_time - cpu_begin_time;
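And since you suggested dropping the fence: the "when results are available" step can poll the query object itself, which never stalls. A sketch using the same ids as above:

GLint available = 0;
glGetQueryObjectiv(id3, GL_QUERY_RESULT_AVAILABLE, &available);
if (available) {
    // id2 was issued before id3, so its result is available too
    glGetQueryObjectui64v(id2, GL_QUERY_RESULT, &gpu_begin_time);
    glGetQueryObjectui64v(id3, GL_QUERY_RESULT, &gpu_end_time);
    // ...then do the subtractions above
}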
Makes sense?