LP: Multi-threading?

One area that I haven’t seen mentioned anywhere, but that I think is important, is multi-threading. With the rise of multi-core CPUs, LP would be a good time to make clear what can and cannot be done with multiple threads (which is not well defined in 2.1).

I can see a few areas where MT might be interesting, mainly creating expensive objects (shaders etc.) and filling buffers. The latter might not even need a full context in the thread, there just need to be some guarantees about visibility of mapped memory in different threads. For the former and for more generality a context might be needed in each thread, but it might not have to be a full context, as the secondary threads will never try to draw anything and will not need direct access to the hardware (besides memory).
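To make the "visibility of mapped memory" point concrete, here is a minimal sketch in plain C++, not in GL or LP. It assumes the mapped region behaves like ordinary process memory; `MappedRegion` and the function names are hypothetical stand-ins, and no real driver call is made. The worker publishes its writes with a release store and the consumer picks them up with an acquire load, which is the kind of guarantee the API would have to spell out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a mapped buffer: plain process memory plus a
// "ready" flag. No real GL or LP call is made here.
struct MappedRegion {
    std::vector<float> data;
    std::atomic<bool> ready{false};
    explicit MappedRegion(std::size_t n) : data(n, 0.0f) {}
};

// Worker: fill the region, then publish it with a release store.
void fill_region(MappedRegion& region, float value) {
    for (float& x : region.data) x = value;
    region.ready.store(true, std::memory_order_release);
}

// Consumer: spin until the acquire load sees the flag, then read the data.
float wait_and_sum(MappedRegion& region) {
    while (!region.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    float sum = 0.0f;
    for (float x : region.data) sum += x;
    return sum;
}

// Fills on a secondary thread and consumes on the calling thread.
float fill_in_worker_and_sum(std::size_t count, float value) {
    assert(count > 0);
    MappedRegion region(count);
    std::thread worker(fill_region, std::ref(region), value);
    float sum = wait_and_sum(region);
    worker.join();
    return sum;
}
```

The secondary thread here never "draws" anything; it only needs access to the memory and a well-defined hand-off point.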

Any other ideas of what you would like to see in that context? Any comments from the ARB on what the current thoughts on that are? Or is that something that will have to wait for ME?

Thanks

Dirk

The Longs Peak object model was designed with consideration for multithreading. While GL2 has an all-or-nothing sharing model, LP allows objects to be shared (or not shared) on a per-object basis. For example you might wish to share one buffer but not all buffers. In addition, the objects themselves have some interesting properties:

  • state objects (e.g. shaders, texture filters) are immutable, thus no race conditions are possible with their usage.

  • data objects (e.g. buffers, images) have immutable structure but mutable data (e.g. texels), thus no “dangerous” race conditions are possible, although a badly written app can get unexpected results.

  • container objects (e.g. VAOs, FBOs) are inherently unsafe to share, thus we disallow sharing of these objects. They are inexpensive to create however, so creating a copy per context (in the uncommon case where this is required) should be efficient.
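The three categories above can be mirrored in plain C++ to show why each one is or is not safe to share. This is only an analogy under stated assumptions: `FilterState`, `ImageData`, `Container` and `Context` are hypothetical names, not Longs Peak types, and nothing here reflects the actual LP API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// State object: every field is fixed at construction, so there is no writer
// for concurrent readers to race against.
struct FilterState {
    const int min_filter;
    const int mag_filter;
};

// Data object: the size (structure) is fixed, the contents are mutable.
class ImageData {
public:
    explicit ImageData(std::size_t texel_count) : texels_(texel_count, 0u) {}
    std::size_t size() const { return texels_.size(); }  // never changes
    void set_texel(std::size_t i, unsigned v) { texels_.at(i) = v; }
    unsigned texel(std::size_t i) const { return texels_.at(i); }
private:
    std::vector<unsigned> texels_;
};

// Container object: refers to shared objects but is itself owned by one context.
struct Container {
    std::shared_ptr<const FilterState> filter;
    std::shared_ptr<ImageData> image;
};

struct Context {
    Container container;  // cheap to build, so each context gets its own
};

Context make_context(std::shared_ptr<const FilterState> f,
                     std::shared_ptr<ImageData> img) {
    return Context{Container{std::move(f), std::move(img)}};
}

// Two contexts share one filter and one image but own separate containers.
bool demo_per_object_sharing() {
    std::shared_ptr<const FilterState> filter =
        std::make_shared<FilterState>(FilterState{1, 2});
    auto image = std::make_shared<ImageData>(4);
    Context a = make_context(filter, image);
    Context b = make_context(filter, image);
    a.container.image->set_texel(0, 9u);             // write through context a
    return b.container.image->texel(0) == 9u         // visible through context b
        && a.container.filter == b.container.filter  // same immutable state object
        && &a.container != &b.container;             // containers are not shared
}
```

Note that `ImageData` is deliberately unsynchronized: two threads writing the same texel would get "unexpected results", but the buffer can never change size underneath a reader.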

The class of multithreaded operations you describe (e.g. object creation and data propagation) sounds like a reasonable approach. We will document the steps required to ensure an object modified in one context is “ready to use” in another.

One of the nice things about moving to the new object model is that it gives us the opportunity to precisely define sharing behavior that is undefined by the 2.1 spec and de facto vendor-dependent. We don’t want that to happen again, so we are being careful to fully specify the behavior of shared objects and operations on them.

I am sure this will be possible, but just to let you know what my intended usage pattern would be:

I think in the future it will be more and more important to stream data to the GPU, for faster loading times at startup and for more detailed worlds in general.

Therefore, I’d like to be able to have a streaming thread that loads textures and meshes while the rendering thread continues to render. One big burden in the past was that you needed to bind objects (like textures) to load them, which could mess up your state machine. That one will be gone, and I am happy about that.
However, I don’t know if I want my objects to be fully shareable if all I want to do is upload them once from a separate thread. If sharing objects carries no penalty, or at least none as long as I don’t use them on several contexts simultaneously, I don’t mind. If it incurs a big penalty in general, I’d like to have a special “upload in parallel” flag so that I don’t need to make them shareable.
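The usage pattern could be sketched roughly like this in plain C++. Everything here is an assumption for illustration: `LoadedAsset`, `ReadyQueue` and the function names are hypothetical, the "loading" is faked, and no GL or LP upload call is made. The point is only the hand-off: the streaming thread finishes an asset completely before the rendering thread ever sees it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical asset record; a real loader would carry texel or vertex data.
struct LoadedAsset {
    std::string name;
    std::vector<unsigned char> bytes;
};

// Queue of finished uploads, shared by the streaming and rendering threads.
class ReadyQueue {
public:
    void push(LoadedAsset asset) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(asset));
    }
    // Non-blocking, so the render loop never stalls on the loader.
    bool try_pop(LoadedAsset& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<LoadedAsset> queue_;
};

// Streaming thread body: "load" each asset (faked here) and hand it over.
void stream_assets(const std::vector<std::string>& names, ReadyQueue& ready) {
    for (const std::string& name : names) {
        ready.push(LoadedAsset{name, std::vector<unsigned char>(16, 0xFF)});
    }
}

// Render loop stand-in: keeps running and adopts assets as they arrive.
std::size_t render_until_loaded(const std::vector<std::string>& names) {
    ReadyQueue ready;
    std::thread loader(stream_assets, std::cref(names), std::ref(ready));
    std::size_t received = 0;
    while (received < names.size()) {
        LoadedAsset asset;
        while (ready.try_pop(asset)) ++received;  // adopt finished assets
        std::this_thread::yield();                // a real loop would draw a frame here
    }
    loader.join();
    return received;
}
```

With a hand-off like this the object is only ever touched by one thread at a time, which is the case where one would hope sharing costs nothing.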

Jan.

I’d like to see rendering contexts that belong to different GPUs (but the same vendor/OpenGL driver) be able to share objects, too. This would make working with multiple GPUs so much easier…

I know it might be problematic to share anything that the GPU renders into (RTT targets, transform feedback buffers, etc.). But this can be solved by disallowing sharing of such objects, or by providing functions where my application can tell OpenGL “now please synchronize the following objects between context A and B (maybe C…)”.

I think multi-GPU stuff is an entirely different conversation.

Sorry, didn’t want to hijack this thread… just thought it might fit.

Nothing in the Longs Peak design precludes that which you ask. In fact that’s a benefit of per-object sharing, in contrast to per-context sharing. Shadowing resources across GPUs can be expensive, so allowing it to be done selectively and at user control is a Good Thing.

However as Korval said, that’s a different conversation. Longs Peak itself will not provide APIs for assigning contexts to specific GPUs.

Therefore, I’d like to be able to have a streaming thread that loads textures and meshes while the rendering thread continues to render.
Are contexts per-thread or per-process? It seems a little silly to create a whole context for a thread that’s never actually going to bind anything to the context. Since LP is object-based, as long as the rendering thread is not aware of/using the texture until the upload is done, I don’t see the point of making a new context when it can just say, “lpImageData2D”.

Should I be wrong in any of the following, I expect a slap on the head. :)

Korval, a context is a per-process object. Isn’t this one of the intended benefits of LP, that you can effectively just hand off an LP object to any of a number of worker threads and let them “do their thing” with it, whether that is creating textures, uploading vertices or compiling programs? That works so long as all of that processing is finished by the point of usage (and usage itself can still only happen from the thread “owning” the context, just like it is in 1.2+).

Also, when it comes to mapping/mapped data, it’s pretty obvious it’s process-wide. The address-space for a thread is shared by the process it belongs to (obviously). Hardware usually won’t remap anything of the process’ address space for each thread (except maybe for a page or two of thread<->operating system communication area, but that’s about the extent of it and way off topic).

This is AFAICT the only reasonable way forward. Compare it with e.g. C++, and replace “LP handle” with “C++ object”. They are thread oblivious (by design!), and if you need concurrent access from multiple threads it’s up to your code to provide that synchronization (to not slow down the cases not needing the synchronization).
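A minimal C++ sketch of that comparison, with hypothetical names (`UploadCounter`, `count_uploads`): the object itself carries no lock, and only the code that chooses to share it across threads adds one.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// A thread-oblivious object: no locks inside, so single-threaded users pay nothing.
struct UploadCounter {
    long uploads = 0;
    void record() { ++uploads; }
};

// The caller decides that this particular instance is shared and guards it.
long count_uploads(int thread_count, int per_thread) {
    UploadCounter counter;
    std::mutex counter_mutex;  // synchronization lives in application code
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&counter, &counter_mutex, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);
                counter.record();
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return counter.uploads;
}
```

Without the `lock_guard` the increments would race and the total would be unpredictable, which is the "up to your code" part.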

Again, slap me on the head if I’m wrong.

Originally posted by tamlin:
Also, when it comes to mapping/mapped data, it’s pretty obvious it’s process-wide. The address-space for a thread is shared by the process it belongs to (obviously). Hardware usually won’t remap anything of the process’ address space for each thread (except maybe for a page or two of thread<->operating system communication area, but that’s about the extent of it and way off topic).
That’s what I would think, too, but in a multi-processor system with multiple graphics cards, I wouldn’t bet on having everything mapped into each thread’s address space. That’s a major point I would like to see some official clarification for…

Dirk

I wouldn’t bet on having everything mapped into each thread’s address space.
That’s (one of) the differences between a thread and a process: threads don’t have address spaces. Processes do. There is such a thing as “thread-local data”, but it only exists to the extent that you don’t pass pointers from one thread to another. If you do, then any thread can access it.

If a GL implementation maps a buffer, and you pass that pointer to another thread, that thread can access the memory just fine. Now granted, this is incredibly dangerous, since that other thread can unmap the buffer at any time. But it is legal and there’s nothing the GL implementation can do to stop you.
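The point about thread-local data can be shown in a few lines of standard C++ (the variable and function names are made up for this sketch). Each thread gets its own copy of a `thread_local` variable, but once a pointer to one copy is handed to another thread, that thread can write through it like any other memory in the process.

```cpp
#include <cassert>
#include <thread>

thread_local int per_thread_value = 0;  // each thread gets its own copy

// Returns the calling thread's copy after another thread wrote through a pointer to it.
int write_through_pointer_from_other_thread() {
    per_thread_value = 1;
    int* owner_copy = &per_thread_value;  // address of the calling thread's copy
    std::thread other([owner_copy] {
        per_thread_value = 7;  // touches only the new thread's own copy
        *owner_copy = 42;      // reaches into the caller's copy via the pointer
    });
    other.join();              // join orders the write before the read below
    return per_thread_value;   // 42, not 1 and not 7
}
```

A pointer to a mapped buffer is no different in this respect, which is also why nothing stops the other thread from using it after the mapping has gone away.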

If a GL implementation maps a buffer, and you pass that pointer to another thread, that thread can access the memory just fine.
People were even encouraged to do so, for instance in order to decode video into a PBO in a separate thread while the main thread continues rendering (without, of course, touching the mapped PBO while the other thread writes data into it). IMHO a very useful thing.

Here is an excerpt from the ARB_vertex_buffer_object extension.

More restrictive rules were considered (for example, “after calling MapBuffer, all GL commands except for UnmapBuffer produce errors”), but this was considered far too restrictive. The expectation is that an application might map a buffer and start filling it in a different thread, but continue to render in its main thread (using a different buffer or no buffer at all). So no commands are disallowed simply because a buffer happens to be mapped.
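The pattern the excerpt anticipates might look roughly like the following, sketched in plain C++ with no GL calls. `MappedBuffer` is a hypothetical stand-in for the memory a real application would obtain from MapBuffer, and `draw_from` stands in for rendering; both are assumptions made so the example can run without a context.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Stand-in for the memory a real application would get back from MapBuffer.
using MappedBuffer = std::vector<int>;

// Worker: writes into the mapped buffer only.
void fill_mapped(MappedBuffer& mapped, int value) {
    for (int& x : mapped) x = value;
}

// Main-thread stand-in for drawing: reads only the buffer that is not mapped.
int draw_from(const std::vector<int>& buffer) {
    return std::accumulate(buffer.begin(), buffer.end(), 0);
}

// Renders from one buffer while a second thread fills the other, then swaps.
int render_while_filling(std::size_t n) {
    std::vector<int> front(n, 1);  // buffer in use by rendering
    MappedBuffer back(n, 0);       // "mapped" buffer being filled
    std::thread filler(fill_mapped, std::ref(back), 2);
    int drawn = draw_from(front);  // main thread never touches `back` here
    filler.join();                 // a real app would call UnmapBuffer after this point
    std::swap(front, back);        // the filled buffer becomes the draw source
    return drawn + draw_from(front);
}
```

The main thread keeps issuing work the whole time; the only rule it follows is that it stays away from the mapped buffer until the filler is done.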