avoiding the default framebuffer blit overhead
First I will describe the problem.
As we know, the default framebuffer (0) is a remnant from the past which, for some mysterious reason, OpenGL is still dragging along like a bag of stones.
It is very inflexible and totally alien to many modern ways of doing things, e.g. deferred rendering.
One often needs to freely combine various color/depth/stencil buffers, which is easy with the FBO infrastructure.
But when we need to display something, there is a problem. The final image to be displayed is often not generated in the default framebuffer,
because we need the flexibility of FBOs. For example, we may need the depth buffer used to render the scene to be available as a texture.
Then we need to blit to the default framebuffer. This adds overhead, which may be something like 1-2 milliseconds per frame.
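For reference, the blit path can be sketched roughly as below. This is a minimal illustration under assumptions, not the code that was measured: GlApi and present_via_blit are hypothetical names, the function pointers stand in for entry points that a loader would fetch (for example via wglGetProcAddress), and the few GL types and constants are redeclared only so the snippet compiles without GL headers.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles without GL headers. A real program takes
// these from <GL/glcorearb.h>; the values are the ones in the OpenGL registry.
using GLenum = unsigned int;
using GLuint = unsigned int;
using GLint = int;
using GLbitfield = unsigned int;
constexpr GLenum GL_READ_FRAMEBUFFER = 0x8CA8;
constexpr GLenum GL_DRAW_FRAMEBUFFER = 0x8CA9;
constexpr GLbitfield GL_COLOR_BUFFER_BIT = 0x00004000;
constexpr GLenum GL_NEAREST = 0x2600;

// Hypothetical table of loaded entry points (glBindFramebuffer and
// glBlitFramebuffer). Calling-convention macros such as APIENTRY are omitted.
struct GlApi {
    void (*BindFramebuffer)(GLenum target, GLuint framebuffer);
    void (*BlitFramebuffer)(GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1,
                            GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1,
                            GLbitfield mask, GLenum filter);
};

// Copy the color buffer of scene_fbo into the default framebuffer (0).
// The caller still has to call SwapBuffers() afterwards.
void present_via_blit(const GlApi& gl, GLuint scene_fbo, GLint width, GLint height) {
    assert(gl.BindFramebuffer && gl.BlitFramebuffer);
    gl.BindFramebuffer(GL_READ_FRAMEBUFFER, scene_fbo);
    gl.BindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
    gl.BlitFramebuffer(0, 0, width, height,   // source rectangle
                       0, 0, width, height,   // destination rectangle
                       GL_COLOR_BUFFER_BIT, GL_NEAREST);
}
```

The extra full-screen copy in BlitFramebuffer is the step whose cost this post is about.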
In Direct3D, the color buffer that can be displayed (the swap chain back buffer) is a pure color-only object from the renderer's point of view and can be combined with other buffers just like the non-displayable ones.
This is unlike the OpenGL default framebuffer, which drags along its own depth buffer (or has none) and cannot be changed.
I experimented a bit with the NVIDIA WGL_NV_DX_interop2 extension.
I created a D3D11 device with its swap chain, then, using the extension, set up an OpenGL renderbuffer that corresponds to the swap chain back buffer.
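The setup step might look something like the sketch below. It is only an outline under assumptions: InteropApi, InteropTarget, Handle and wrap_back_buffer are hypothetical names, the function pointers mirror the signatures of the real wglDX*NV and GL entry points, and the access mode (WGL_ACCESS_READ_WRITE_NV) is a guess since the post does not say which one was used. The back buffer pointer is assumed to be the ID3D11Texture2D obtained from IDXGISwapChain::GetBuffer(0, ...).

```cpp
#include <cassert>

// Stand-ins so the sketch compiles without GL/WGL headers.
using GLenum = unsigned int;
using GLuint = unsigned int;
using GLint = int;
using GLsizei = int;
using Handle = void*;  // stands in for the Win32 HANDLE type
constexpr GLenum GL_RENDERBUFFER = 0x8D41;
constexpr GLenum GL_DRAW_FRAMEBUFFER = 0x8CA9;
constexpr GLenum GL_COLOR_ATTACHMENT0 = 0x8CE0;
constexpr GLenum WGL_ACCESS_READ_WRITE_NV = 0x0001;

// Hypothetical table of loaded entry points; calling conventions omitted.
struct InteropApi {
    void   (*GenRenderbuffers)(GLsizei n, GLuint* names);
    void   (*GenFramebuffers)(GLsizei n, GLuint* names);
    void   (*BindFramebuffer)(GLenum target, GLuint framebuffer);
    void   (*FramebufferRenderbuffer)(GLenum target, GLenum attachment,
                                      GLenum rbTarget, GLuint renderbuffer);
    Handle (*DXOpenDeviceNV)(void* dxDevice);
    Handle (*DXRegisterObjectNV)(Handle device, void* dxObject, GLuint name,
                                 GLenum type, GLenum access);
    int    (*DXLockObjectsNV)(Handle device, GLint count, Handle* objects);
};

struct InteropTarget {
    Handle device = nullptr;   // interop handle for the D3D11 device
    Handle object = nullptr;   // interop handle for the back buffer
    GLuint renderbuffer = 0;   // GL name that aliases the back buffer
    GLuint framebuffer = 0;    // FBO with the renderbuffer as color attachment
};

// Expose the swap chain back buffer as the color attachment of a GL FBO.
bool wrap_back_buffer(const InteropApi& api, void* d3d_device,
                      void* back_buffer, InteropTarget& out) {
    assert(d3d_device && back_buffer);
    out.device = api.DXOpenDeviceNV(d3d_device);
    if (!out.device) return false;

    api.GenRenderbuffers(1, &out.renderbuffer);
    out.object = api.DXRegisterObjectNV(out.device, back_buffer, out.renderbuffer,
                                        GL_RENDERBUFFER, WGL_ACCESS_READ_WRITE_NV);
    if (!out.object) return false;

    // Lock before GL touches the object (the post locks once and never unlocks).
    if (!api.DXLockObjectsNV(out.device, 1, &out.object)) return false;

    api.GenFramebuffers(1, &out.framebuffer);
    api.BindFramebuffer(GL_DRAW_FRAMEBUFFER, out.framebuffer);
    api.FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                GL_RENDERBUFFER, out.renderbuffer);
    return true;
}
```

A real program would also check framebuffer completeness and release the handles on shutdown; that is left out to keep the sketch short.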
Then I did some rendering with OpenGL while using D3D's way of presenting the image to a window.
After some tweaking I got that to run faster than OpenGL's own path using the blit.
All the rendering was just a glClear(GL_COLOR_BUFFER_BIT), followed by presenting the result.
I tested 3 cases:
a) opengl clear + opengl present (using blit to the default fb)
b) opengl clear + d3d present
c) d3d clear + d3d present.
b) and c) are equally fast and a) is noticeably slower than them.
The tweaking mentioned above included removing the per-frame synchronization calls (wglDXLockObjectsNV and wglDXUnlockObjectsNV).
I only call wglDXLockObjectsNV once and the objects stay locked all the time (otherwise OpenGL generates GL_INVALID_FRAMEBUFFER_OPERATION).
The render loop is basically:
glClearColor(0, rand()%256*(1.0f/256), 0, 1);
The back buffer of the swap chain is bound to the OpenGL draw framebuffer.
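Put together, one iteration of case b) could be sketched as follows. This is an assumption-laden outline rather than the exact loop: FrameApi and render_frame are hypothetical names, and Present stands in for a small wrapper around IDXGISwapChain::Present, whose arguments the post does not specify.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-ins so the sketch compiles without GL headers.
using GLbitfield = unsigned int;
constexpr GLbitfield GL_COLOR_BUFFER_BIT = 0x00004000;

// Hypothetical per-frame entry points (glClearColor, glClear, and a wrapper
// that calls Present on the D3D11 swap chain).
struct FrameApi {
    void (*ClearColor)(float r, float g, float b, float a);
    void (*Clear)(GLbitfield mask);
    void (*Present)();
};

// One frame: OpenGL clears the wrapped back buffer, D3D presents it.
// Note that there is no wglDXLockObjectsNV / wglDXUnlockObjectsNV pair here;
// the object was locked once during setup and stays locked.
void render_frame(const FrameApi& api) {
    assert(api.ClearColor && api.Clear && api.Present);
    // Random green shade so that every presented frame is visibly different.
    api.ClearColor(0, std::rand() % 256 * (1.0f / 256), 0, 1);
    api.Clear(GL_COLOR_BUFFER_BIT);
    api.Present();
}
```

Keep in mind that the extension spec describes locking the objects before OpenGL access and unlocking them before D3D access, so skipping the per-frame calls relies on observed driver behaviour rather than on anything the spec promises.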
Also, when the swap chain is created, BufferUsage must include the DXGI_USAGE_RENDER_TARGET_OUTPUT flag, otherwise the performance is crippled.
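As an illustration, a swap chain description with that flag set could look like the sketch below. Only the BufferUsage line comes from the post; make_swap_chain_desc is a hypothetical helper, and the format, buffer count and swap effect are assumed values chosen to make the example complete. The code is Windows-only, hence the guard.

```cpp
#if defined(_WIN32)
#include <windows.h>
#include <dxgi.h>

// Hypothetical helper: fills in a description that can be passed to
// D3D11CreateDeviceAndSwapChain.
DXGI_SWAP_CHAIN_DESC make_swap_chain_desc(HWND window, UINT width, UINT height) {
    DXGI_SWAP_CHAIN_DESC desc = {};
    desc.BufferDesc.Width = width;
    desc.BufferDesc.Height = height;
    desc.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;  // assumed format
    desc.SampleDesc.Count = 1;                            // no multisampling
    desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;   // the flag from the post
    desc.BufferCount = 2;                                 // assumed
    desc.OutputWindow = window;
    desc.Windowed = TRUE;
    desc.SwapEffect = DXGI_SWAP_EFFECT_DISCARD;           // assumed
    return desc;
}
#endif
```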
It is a shame that this ugly hack actually outperforms OpenGL's native way of outputting its graphics.
I think it is about time they get rid of the default framebuffer.
They can look at the iPad for an idea of how to do it.