The problem: as things stand (at least on NVIDIA), the driver implements glClientWaitSync with a busy-wait instead of releasing the CPU.
I know that releasing the CPU involves a context switch, which is a heavy operation with higher latency, but sometimes it is really needed.
For example, in one of my applications I need a “waiter” thread whose sole purpose is to block on fences and raise flags when the fences are passed, while consuming as little CPU as possible,
whereas various other threads are doing heavy work on the CPU (the OpenGL drawing is done by another thread with a shared context).
The worker threads need all the available CPU, and wasting it on busy-waiting is extremely undesirable: it degrades overall performance a great deal.
In contrast, the higher latency of glClientWaitSync, if it blocked instead of busy-waiting, would be completely OK.
My suggestion: please define a new flag for glClientWaitSync that forces the driver to block the thread (release the CPU) instead of busy-waiting.
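Until such a flag exists, the closest an application can get is roughly the following: poll the fence with a zero timeout and sleep between polls. This is only a minimal sketch, not a recommended implementation. `wait_for_fence`, `is_signaled`, `passed` and `poll_interval` are made-up names for illustration, and the GL call is kept behind a callback so the loop itself does not depend on a GL context.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of OpenGL or any driver API.
// Polls `is_signaled` and sleeps between polls, so the waiter thread
// yields the CPU to the worker threads instead of spinning.
// Raises `passed` once the fence is signaled and returns the number of polls.
//
// With a real fence, `is_signaled` would be something like:
//   [fence] {
//       GLenum r = glClientWaitSync(fence, 0, 0);  // zero timeout: test only
//       return r == GL_ALREADY_SIGNALED || r == GL_CONDITION_SATISFIED;
//   }
// (GL_WAIT_FAILED is not handled in this sketch.)
inline int wait_for_fence(const std::function<bool()>& is_signaled,
                          std::atomic<bool>& passed,
                          std::chrono::microseconds poll_interval)
{
    int polls = 1;  // counts the check made by the loop condition
    while (!is_signaled()) {
        // Give up the CPU for a while; this is where the extra latency comes from.
        std::this_thread::sleep_for(poll_interval);
        ++polls;
    }
    passed.store(true, std::memory_order_release);
    return polls;
}
```

The drawback is that the application has to guess `poll_interval`: too short and it wastes CPU again, too long and the flag is raised late. A blocking wait inside the driver would not need that guess, which is why a flag on glClientWaitSync is still preferable.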
Also, it appears that the driver does other internal busy-waits. This shows up as abnormal CPU consumption by internal driver
threads for no apparent reason.
Again, there are cases where the latency of the wait operations is less important than CPU utilization.
Please provide a means for the application to express its preference between lower latency and lower CPU waste by the driver. Maybe use the OpenGL hint mechanism.