RFC by the ARB synchronization working group

In the ARB synchronization working group, we have been developing a new extension intended to support a general model of synchronization objects associated with different types of events. Initially only fence events are defined, analogously to the NV_fence extension, but the framework we’re defining is intended to include other events such as display retrace, timers, etc.

We would like to solicit comments and feedback on this extension from the opengl.org developer community. Please follow up in this thread after reviewing the current extension specification, which is located here.

Thanks!

Jon Leech (for the arb-async working group)

I only skimmed through the spec, but this looks very interesting, especially since it aims to be generic.

If I understood correctly, this would add a fine-grained glFlush that depends on several conditions.

I am wondering whether this proposed extension, including a display retrace event, would be able to provide the following feature, “vsync only if faster than refresh rate”:

  • if at least one display retrace occurred, draw as fast as possible (i.e. no vsync)
  • else wait for the next retrace (vsync)

Pros:

  • no resources wasted displaying faster than the screen refreshes
  • no tearing if the graphics card is fast enough
  • much faster framerate when just below the display refresh rate

Cons:

  • still tearing if the card is too slow

Another question: would this extension be able to provide D3D10 “predicated rendering”? Or would that need an evolution of occlusion queries?
It is mentioned here:
ftp://download.nvidia.com/developer/presentations/2006/gdc/2006-GDC-DX10-Prepping-Your-Engine.pdf
Quoting Zeross :
“It’s exactly like occlusion query excepted that instead of polling the results on the CPU, everything is done on the GPU : the results of the query is inserted in the command queue. If the result is not available at the time of the drawing command then the GPU considers the result to be true and execute the command anyway.”

I am very excited to get this functionality in GL. Would it be possible to add unique support for texture loading? A common use for fences is to perform synchronized paging of textures. How would this extension be affected by multiple GPUs?

Originally posted by ZbuffeR:

I only skimmed through the spec, but this looks very interesting, especially since it aims to be generic.

If I understood correctly, this would add a fine-grained glFlush that depends on several conditions.

I am wondering whether this proposed extension, including a display retrace event, would be able to provide the following feature, “vsync only if faster than refresh rate”:

  • if at least one display retrace occurred, draw as fast as possible (i.e. no vsync)
  • else wait for the next retrace (vsync)

That would depend on how the retrace event was constructed. We haven’t thought about this a lot yet in the context of this extension, although 3Dlabs worked through similar issues in their proposed GL2_async_core extension. It is likely that a retrace event would be a “pulsed” event that simply toggles the sync object status on and off (which makes polling basically useless for these types of events, but would release any ClientWaitSyncs blocked on a retrace sync object).

To do what you’re suggesting, it seems like the retrace event would need to place some associated data in the sync object, like a frame count (in OpenML dmedia terminology, the “SBC” or “Swap Buffer Count”). Then you could compare the latest SBC recorded in the sync object and decide what to do based on that.
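The comparison described above can be sketched as a small decision function. This is only an illustration under assumptions: the draft extension does not define a retrace counter, so `should_wait_for_retrace` and both of its parameters are hypothetical names standing in for "the frame count read from the retrace sync object at the last swap" and "the frame count read now".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of any GL API.
 * count_at_last_swap: frame count sampled when the previous frame was swapped.
 * count_now:          frame count sampled just before the next swap.
 * Returns true when no retrace has happened since the last swap, i.e. the
 * application is rendering faster than the display and should wait (vsync).
 */
bool should_wait_for_retrace(uint32_t count_at_last_swap, uint32_t count_now)
{
    /* Unsigned subtraction wraps modulo 2^32, so this stays correct
     * when the counter rolls over. */
    uint32_t retraces_elapsed = count_now - count_at_last_swap;
    return retraces_elapsed == 0;
}
```

With this rule the application blocks on the retrace sync object only when it is ahead of the display, and swaps immediately once at least one retrace has already been missed.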

Another question: would this extension be able to provide D3D10 “predicated rendering”? Or would that need an evolution of occlusion queries?

Again, this extension could be the basis of this functionality, but we’d need more. In the future we already expect to have WaitSync, which is the equivalent of ClientWaitSync but blocks the GPU command stream instead of the application. But WaitSync doesn’t seem like the right thing for predicated rendering; instead we’d want some way to (a) query the sync object status in the command stream (TestSync?) and (b) execute (or not) some block of commands / display list based on that status.

There is some ongoing discussion of predicated rendering for OpenGL ES, although I haven’t been following that closely. I believe Khronos considers it a bigger win on the types of hardware found in mobile devices.

Occlusion queries could be merged into the sync object framework easily. I suspect that will happen downstream in the “OpenGL 3.0” timeframe.

Jon

Originally posted by jtipton:
I am very excited to get this functionality in GL. Would it be possible to add unique support for texture loading? A common use for fences is to perform synchronized paging of textures. How would this extension be affected by multiple GPUs?

Do you mean support for asynchronous texture loading, so you can download a large texture and go off and issue other commands while that’s happening? If so, then that is definitely on the agenda; there are a lot of different vendor-specific extensions for this today. With the base async functionality, you could load a texture in one thread, issue a fence right after the TexImage, and block on that fence in the rendering thread when that texture is required. In the future, something more like an explicit TexImageAsync() might be useful, so this can all be done within a single thread. Some of the pixel buffer object APIs may be useful here, too. This is discussed to some extent in the issues list, but we don’t have a complete answer yet.

For multiple GPUs, it depends on how they’re wired up. In something like SLI, where the driver really hides the fact that there are multiple independent GPUs from the app, then sync object implementation would be a lot more complicated under the hood - but it would be up to the driver to sort that out, not the app. With something like Chromium or OpenGL Multipipe, where the multiple GPUs are running under different graphics drivers and a frontend layer does command redistribution, implementing sync objects transparently to the app may be quite difficult. The command redistributor doesn’t have access to driver internals and would have to do all the notification using only the public sync APIs. TBH I’m not sure that’s even possible, and if it is, it seems likely to introduce a good deal of latency to the use of sync objects.

Jon

It is likely that a retrace event would be a “pulsed” event that simply toggles the sync object status on and off (which makes polling basically useless for these types of events, but would release any ClientWaitSyncs blocked on a retrace sync object).

Yes, exactly. This is what we need: a better mechanism that avoids CPU polling, with its very high CPU usage or needless thrashing of the CPU. Polling is a crude and terrible waste of valuable processing resources; it can cause performance problems and overheating, and it hampers multitasking.

The sooner we get a retrace event the better for OpenGL, it is long overdue in my opinion.

Anyway, brilliant stuff and keep up the excellent work!

Originally posted by ZbuffeR:
Another question: would this extension be able to provide D3D10 “predicated rendering”? Or would that need an evolution of occlusion queries?
It is mentioned here:
ftp://download.nvidia.com/developer/presentations/2006/gdc/2006-GDC-DX10-Prepping-Your-Engine.pdf
Quoting Zeross :
“It’s exactly like occlusion query excepted that instead of polling the results on the CPU, everything is done on the GPU : the results of the query is inserted in the command queue. If the result is not available at the time of the drawing command then the GPU considers the result to be true and execute the command anyway.”

Predicated rendering has actually been implemented in Nvidia hardware since NV40. They even have an extension for it called GL_NVX_conditional_render, but for some reason a spec has not been released. It’s not hard to figure out how it works, though, by looking at the driver binaries.

Originally posted by Eric Lengyel:
They even have an extension for it called GL_NVX_conditional_render, but for some reason a spec has not been released. It’s not hard to figure out how it works, though, by looking at the driver binaries.

You mean you are using it successfully?

Originally posted by ZbuffeR:

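  // Wrap the geometry in an occlusion query; the sample count is stored in m_query_obj.
  void render_query()
  {
    glBeginQueryARB(GL_SAMPLES_PASSED_ARB, m_query_obj);
    render();
    glEndQueryARB(GL_SAMPLES_PASSED_ARB);
  }

  // Draw the geometry conditionally on the result of the query in m_query_obj.
  // The NVX entry points come from the unpublished GL_NVX_conditional_render extension.
  void render_conditional()
  {
    glBeginConditionalRenderNVX(m_query_obj);
    render();
    glEndConditionalRenderNVX();
  }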

The first function does the occlusion query; the second one renders if the result of the query is that it has rendered 0 samples.

Hope this helps.

Yep – that’s it. The conditional render will happen, though, if the occluder rendered at least 1 fragment without failing the depth test.
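The rule discussed in this thread (draw when at least one sample passed, and, per the D3D10 description quoted earlier, draw anyway when the result is not yet available) can be modelled on the CPU as a tiny decision function. This is a sketch only: `predicate_allows_draw` and its parameters are hypothetical names, and whether GL_NVX_conditional_render treats a pending result the same way is an assumption, since no spec has been released.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical CPU-side model of the predicate; not a real GL entry point.
 * result_available: whether the occlusion query result is ready.
 * samples_passed:   number of samples that passed the depth test.
 */
bool predicate_allows_draw(bool result_available, uint32_t samples_passed)
{
    if (!result_available) {
        /* Assumed behaviour: a pending result counts as "true",
         * so the draw is executed rather than stalling. */
        return true;
    }
    /* Draw only if the occluder test rendered at least one sample. */
    return samples_passed > 0;
}
```

The point of doing this on the GPU is that the application never reads the query result back, so the CPU does not stall waiting for it.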

Can I talk you into markers?
Roughly speaking, the changes are:

  • Fence() would return a marker as an int value.
  • ClientWaitSync would additionally take a marker and return when the most recently received marker is newer.
    ( newer = (receivedMarker - clientMarker) >= 0 )
  • Add a query for the most recently received marker.

My motives are:

  • No race conditions in “stacking” (easier to specify, implement and use). (Addresses issues 8 and 21.)
  • Can easily monitor how much is buffered to the GPU, since I can have multiple markers on a single sync object.
  • Easily extends to other uses, such as keeping track of vertical retrace count, using a specialized sync object.

We did something like this in RealTimeX (at Concurrent Computer) about 12 years ago, and it worked well.
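The "newer" test above is the usual wrap-safe sequence comparison. A minimal sketch in C follows, assuming 32-bit markers; `marker_reached` and `markers_in_flight` are hypothetical helper names, not part of the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when received_marker is the same as, or newer than, client_marker.
 * The subtraction is done on unsigned values so it wraps modulo 2^32
 * instead of overflowing; the difference is then read as a signed distance.
 * (The unsigned-to-signed conversion is implementation-defined in C, but
 * behaves as two's complement on mainstream compilers.)
 * Valid while the two markers are less than 2^31 apart.
 */
bool marker_reached(uint32_t received_marker, uint32_t client_marker)
{
    return (int32_t)(received_marker - client_marker) >= 0;
}

/* Number of markers issued but not yet received, i.e. how much work
 * is still buffered ahead of the GPU. */
uint32_t markers_in_flight(uint32_t last_issued, uint32_t last_received)
{
    return last_issued - last_received;
}
```

Because only the difference matters, the counter can roll over without the application having to reset or re-create anything.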


On a different train of thought, can you provide a way to bind a sync object to one of the three major subsystem sections: download, draw and upload? This would let me monitor asynchronous uploads and downloads more conveniently within a single context. (This addresses Issue 16.)

Thanks, Ray Tice

I haven’t yet read the proposal, but I will soon. I was just wondering if it would be possible to have a ms or better time stamp for when the fence was complete. This would allow asynchronous notification via a synchronous function call interface. Kinda the best of both worlds for application developers and driver writers. No callbacks, but reasonably exact timings. Just a thought…

Originally posted by ccbrianf:
I was just wondering if it would be possible to have a ms or better time stamp for when the fence was complete.

We know what the API for adding timing information to sync objects would look like - basically you would specify an additional flag at sync creation time that says “record event time”, and there would be an additional queryable property that returns the timestamp of the last event that signalled the sync. However, we chose not to support that in the initial extension. I suspect it will come as a layered extension on top of the base async extension fairly soon, though.
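The shape of that design (a creation-time flag plus a queryable timestamp) can be mocked up in plain C. Everything here is hypothetical: `sync_object`, `SYNC_FLAG_RECORD_EVENT_TIME` and the three functions are illustrative names for a CPU-side model, not the extension's API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical creation flag meaning "record event time". */
enum { SYNC_FLAG_RECORD_EVENT_TIME = 1u << 0 };

/* Hypothetical CPU-side model of a sync object. */
typedef struct {
    unsigned flags;
    bool     signaled;
    bool     has_timestamp;
    uint64_t last_event_time_ns;
} sync_object;

sync_object sync_create(unsigned flags)
{
    sync_object s = { flags, false, false, 0 };
    return s;
}

/* Called when the associated event (e.g. a fence completing) occurs. */
void sync_signal(sync_object *s, uint64_t now_ns)
{
    s->signaled = true;
    if (s->flags & SYNC_FLAG_RECORD_EVENT_TIME) {
        s->last_event_time_ns = now_ns;
        s->has_timestamp = true;
    }
}

/* Returns false if no timestamp was recorded (flag not set, or the
 * sync has never been signalled); otherwise stores it in *out_ns. */
bool sync_query_event_time(const sync_object *s, uint64_t *out_ns)
{
    if (!s->has_timestamp) {
        return false;
    }
    *out_ns = s->last_event_time_ns;
    return true;
}
```

Making the timestamp opt-in at creation time means sync objects that do not need timing pay nothing for it, which fits the idea of a layered extension.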

Jon Leech

Thanks for the update. This is interesting stuff which might come in very handy.

Is the first implementation of sync objects already meant for use by multiple threads? So would it be possible for one thread to put fences into the command stream which are queried by another thread?

Is the first implementation of sync objects already meant for use by multiple threads? So would it be possible for one thread to put fences into the command stream which are queried by another thread?

Yes, that is the idea. One thread can insert a fence into a command stream for one context while another thread (using a second OpenGL context) waits for that fence to complete.

Barthold
ARB Async workgroup chair

Finally… I really need this. Is this extension only for upcoming GPUs, or can it be implemented on older hardware?