I know, I know. This is a topic that has been discussed a lot. I've never seen anyone suggest a solution like the one I'm trying though, so bear with me.

In the past I did all the GL calls and SwapBuffers() calls in a single thread and everything was great. However, my windows were actually created in another thread, so all the window events and the message pump are handled by that other thread.
When Kepler cards came along, my SwapBuffers() calls would hang under certain conditions (certain Quadro profiles + Mosaic). Nvidia told me this was because DeviceContexts are actually thread-affine, and GDI functions should only be called on them in the threads that created them. Quote from Nvidia: "Calling GDI functions using an HDC from the non-hdc-affine thread has always been wrong (OpenGL is a GDI api and GDI objects like HDCs are thread-affine), but the failure cases are in general hard to repro."

So I moved my SwapBuffers() to the window's thread with its own GL context. Using GLsync objects and a renderbuffer, I still do all my rendering in the main thread, and blit a framebuffer onto the DeviceContext before doing the SwapBuffers() in the window's thread. This works, but seems to have a large performance impact.

My new idea is to do only the SwapBuffers() in the window's thread, and not have a 2nd GL context at all. My theory is that SwapBuffers() is actually a GDI function and doesn't care about GL at all.
So I've tried it out and it actually works. I now do all my rendering in my main thread with a single context, including blitting the final image to the DC, then I call SwapBuffers() from the window's thread. No GL context is ever made current in the window's thread. I had update issues at first, but as a test I called glFinish() before sending the message to the window's thread telling it to swap the buffers, and the update issues went away, which makes sense.

The questions are:
1. Am I just lucky that this is working, and will it not work on other random drivers/GPUs?
2. How do I properly synchronize this method, so SwapBuffers() happens after the blit is finished, and I don't start blitting before the previous SwapBuffers() has finished? Since I don't have a 2nd GL context in the window's thread I can't use GLsync objects there, although I can use them in the main thread. For the 2nd part, will attempting to blit to the DC from the main thread implicitly synchronize with the SwapBuffers()?
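To make question 2 concrete, here is a rough sketch of the CPU-side ordering I'm after, written with plain C++ threads so it runs without a window or a GL context. Everything in it is made up for illustration: FrameHandshake and run_frames are hypothetical names, "render" stands in for draw + blit + glFinish(), and "swap" stands in for SwapBuffers(). In the real app the window thread runs a message pump and gets a posted message instead of blocking on a condition variable. This only shows the handshake; it says nothing about whether the driver is happy with the approach.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical handshake: the back buffer is either free to draw into
// (Drawable) or finished and waiting to be presented (ReadyToSwap).
class FrameHandshake {
public:
    // Render thread: call once the blit has completed (e.g. after glFinish()).
    void frame_ready() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            state_ = State::ReadyToSwap;
        }
        cv_.notify_all();
    }

    // Render thread: block until the previous swap has returned.
    void wait_swap_done() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return state_ == State::Drawable; });
    }

    // Window thread: block until a frame is pending; false means shut down.
    bool wait_frame_ready() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return state_ == State::ReadyToSwap || quit_; });
        return state_ == State::ReadyToSwap;
    }

    // Window thread: call right after the swap call has returned.
    void swap_done() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            state_ = State::Drawable;
        }
        cv_.notify_all();
    }

    // Render thread: ask the window thread to leave its loop.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        cv_.notify_all();
    }

private:
    enum class State { Drawable, ReadyToSwap };
    std::mutex mutex_;
    std::condition_variable cv_;
    State state_ = State::Drawable;
    bool quit_ = false;
};

// Runs `frames` render/swap cycles and returns the order in which the two
// placeholder steps executed.
std::vector<std::string> run_frames(int frames) {
    FrameHandshake handshake;
    std::vector<std::string> log;
    std::mutex log_mutex;
    auto record = [&](const std::string& step) {
        std::lock_guard<std::mutex> lock(log_mutex);
        log.push_back(step);
    };

    std::thread window_thread([&] {
        while (handshake.wait_frame_ready()) {
            record("swap");  // placeholder for SwapBuffers(hdc)
            handshake.swap_done();
        }
    });

    for (int i = 0; i < frames; ++i) {
        handshake.wait_swap_done();  // don't blit before the last swap returned
        record("render");            // placeholder for draw + blit + glFinish()
        handshake.frame_ready();
    }
    handshake.wait_swap_done();      // let the final swap finish
    handshake.shutdown();
    window_thread.join();
    return log;
}
```

Calling run_frames(3) gives a log that strictly alternates between "render" and "swap", which is the ordering I want. The cost is that the main thread stalls for the whole swap, so there is no overlap between rendering and presenting.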

Thoughts? I'm also asking Nvidia this and will respond with anything they give me.