OpenGL thread loading

Is it possible to implement threaded loading in OpenGL?

I want my worker thread to do the following while the main (rendering) thread
draws a simple preloaded, looping animated texture.

1. Load textures from disk using a PBO (needs GL calls).
2. Load model data (nothing to do with GL).
3. Initialize VAO and VBO data from the data loaded from disk (needs GL calls).

Since an OpenGL context can only be current in a single thread at a time,
how am I going to handle the concurrency?

Thanks in advance.

Instead of dealing with a single OpenGL context to share between two threads,

  1. Create one OpenGL context per thread, with the contexts sharing the same resources (PBOs, VBOs, texture objects, display lists), and make each context current in its own thread.

  2. You still need some thread synchronization code. If you have OpenGL 3.2 or later, you can use fence sync objects.

  3. Creation of OpenGL contexts sharing the same resources:

With GLX you can do that with glXCreateNewContext() with the “share_list” argument:

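// Prototype:
GLXContext glXCreateNewContext(Display *dpy,
                               GLXFBConfig config, int render_type,
                               GLXContext share_list, Bool direct);

// The main context shares with nothing (share_list is NULL).
GLXContext mainContext = glXCreateNewContext(dpy, config, render_type, NULL, direct);

// The worker context shares resources with the main context.
GLXContext workerContext = glXCreateNewContext(dpy, config, render_type, mainContext, direct);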

On Windows, you can use wglShareLists(); see “wglShareLists function (wingdi.h)” on Microsoft Learn.

On Mac,

  • Use the share argument for the initWithFormat:shareContext: method of the NSOpenGLContext class.
    or
  • Use the share parameter for the function CGLCreateContext.

Scroll down to the end, Sharing Rendering Context Resources:
http://developer.apple.com/mac/library/d…1987-CH216-SW12

Look also at the end of this page about “Use Multiple OpenGL Contexts”:

http://developer.apple.com/mac/library/d…01987-CH409-SW1

  1. Thread synchronization. If you have OpenGL 3.2 or later, you can use fence sync objects; otherwise you are limited to traditional thread synchronization.

For details, see the OpenGL spec 4.1 core ( http://www.opengl.org/registry/doc/glspec41.core.20100725.pdf ), page 312, section “5.3 Sync Objects and Fences”. See also Appendix D “Shared Objects and Multiple Contexts”, page 409.

Not sure if I understand this right.

  1. My application uses the OpenGL 3.2 core profile and doesn’t use any display lists. Does the wglShareLists method still apply? (The reference doesn’t mention anything shareable besides display lists.)

2. Is this order of operations correct?

1. Create a worker thread (passing the main thread’s HGLRC as the thread argument).

2. In the worker thread function, before doing anything else, create a new rendering context with the same parameters as the main thread’s. (This is going to be a problem, since I use GLFW/GLEW to initialize an old OpenGL context and then delete it to create the OpenGL 3.2 context.)

3. Call wglShareLists in the worker thread.

4. Do the loading and GL buffer filling in the worker thread.

5. Delete the worker thread’s context before exiting the worker thread.

Thanks

@samboon

Stay away from having an OpenGL worker thread.

You do not need an OpenGL context to fill the PBO/VBO/…
Create the PBO/… in the OpenGL thread, map it to memory, and pass the pointer to a pure CPU worker thread that fills in the data. Once the worker is done, signal that in some way. Then the main OpenGL thread can unmap the buffer and use it.

This actually works; having two OpenGL contexts is not a solution for better performance. Some graphics drivers have a global lock, which blocks the main thread.
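A minimal sketch of the hand-off described above, assuming C++11 threads. The names UploadJob, fill_job and demo_upload are made up for illustration, and a plain std::vector stands in for the pointer that glMapBuffer or glMapBufferRange would return in real code; the GL calls themselves are only indicated in comments.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical job record shared between the GL thread and the loader thread.
struct UploadJob {
    void*             dst  = nullptr;  // real code: pointer returned by mapping the PBO/VBO
    std::size_t       size = 0;        // size of the mapped range in bytes
    std::atomic<bool> done{false};     // set by the worker once the copy is finished
};

// Runs on the worker thread: pure CPU work, no GL calls, no GL context needed.
inline void fill_job(UploadJob& job, const std::vector<std::uint8_t>& src) {
    assert(job.dst != nullptr);
    const std::size_t n = src.size() < job.size ? src.size() : job.size;
    std::memcpy(job.dst, src.data(), n);
    job.done.store(true, std::memory_order_release);  // publish the written data
}

// Stand-in for the GL thread; ordinary memory plays the role of the mapped buffer.
inline bool demo_upload() {
    std::vector<std::uint8_t> mapped(16, 0);           // pretend this is the mapped PBO
    const std::vector<std::uint8_t> pixels(16, 0xAB);  // pretend this came from disk
    UploadJob job;
    job.dst  = mapped.data();
    job.size = mapped.size();

    std::thread worker(fill_job, std::ref(job), std::cref(pixels));

    // A real render loop would draw the loading animation and check the flag once per frame.
    while (!job.done.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    worker.join();

    // Real code would now unmap the buffer on the GL thread and issue the upload call.
    return mapped == pixels;
}
```

The atomic flag is just one way to "signal it in some way"; a condition variable or a job queue would do as well. The point is only that the worker touches memory, never the GL.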

If you do use a context per thread, VAOs are not shared among contexts. As overlay suggested, see Appendix D of the spec. You can still create your VBOs in the worker thread but the VAO needs to be created back in the rendering thread.

Regards,
Patrick

But even for that first frame, is it noticeable? Not so much. You end up with something like client arrays perf for those, which isn’t too bad. Worst case, maybe you break just the first frame after you bring up a level if you were close to breaking anyway.

Where are the textures in all this? Do you simply upload all the textures there are, and then rely on GL to swap them in and out? The client arrays analogy also worries me; it means all batches need to be kept around in system memory for possible uploading.

Yes, more specifically, just append it using the Streaming VBO approach Rob Barris describes here.

But how do you keep track of which batches are in the cache and which aren’t? I’ve used std::map<BatchHash, void const*> for something similar before and it featured as a prominent time waster in the profiler report.

Maybe a part of the cache VBO is permanent and only parts are discardable?

I’ve had the following vbo layout in mind:

|
| really, really important batches
|
| streaming batches

The really, really important batches stay in there all the time, while the streaming batches are overwritten after the VBO fills up. But you probably can’t do this, because a worker thread is doing the VBO loading and you need to orphan the VBO so that the GL can still fish out of it anything it may need.
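To make the layout idea concrete, here is a rough sketch of the offset bookkeeping only, with no GL calls. CacheLayout and alloc_stream are hypothetical names, and the sketch does not solve the orphaning concern raised above: when the streaming region wraps, the caller would still have to make sure the GL is done reading the old contents.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for one VBO split into two regions:
//   [0, permanent_end)         batches that are never overwritten
//   [permanent_end, capacity)  streaming batches, reused once the region is full
struct CacheLayout {
    std::size_t capacity;       // total size of the VBO in bytes
    std::size_t permanent_end;  // first byte of the streaming region
    std::size_t cursor;         // next free byte in the streaming region

    CacheLayout(std::size_t cap, std::size_t permanent)
        : capacity(cap), permanent_end(permanent), cursor(permanent) {
        assert(permanent <= cap);
    }

    // Returns the byte offset for a new streaming batch. Sets `wrapped` to true
    // when the streaming region was restarted, i.e. older batches are about to
    // be overwritten and the caller must synchronize with the GL first.
    std::size_t alloc_stream(std::size_t size, bool& wrapped) {
        assert(size <= capacity - permanent_end);
        wrapped = false;
        if (cursor + size > capacity) {
            cursor = permanent_end;
            wrapped = true;
        }
        const std::size_t offset = cursor;
        cursor += size;
        return offset;
    }
};
```

With a 1024-byte buffer and 256 permanent bytes, streaming allocations start at offset 256 and return to 256 once the region is exhausted.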

Re alignment, I use the trick offered in that thread of just rounding offsets up to multiples of 64 or something nice. No real cost or benefit for it that I’ve seen, but you can if you want. And no problems here with dumping attributes and indices into the same VBO.
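For reference, the rounding trick is a one-liner. This is a small sketch with a made-up helper name (align_up), assuming the alignment is a power of two such as 4 or 64:

```cpp
#include <cassert>
#include <cstddef>

// Rounds `offset` up to the next multiple of `alignment`.
// Assumes `alignment` is a nonzero power of two (e.g. 4 or 64).
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

Each new batch would then be appended at align_up(current_end, 64) rather than at current_end, at the cost of a few wasted bytes per batch.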

Yeah, in my test, I realigned to a 4-byte boundary. I’ve found 64 wasteful, but I’ll probably try it someday. None of the alignment stuff is particularly well documented.

Think so, but I’d round each up to a multiple of 64 (latest-gen cache line size) just for kicks to see if that fixes your problem.

It does.

Interesting. Which card is that? I’d like to bookmark that thought.

It’s the Radeon HD 5450. The ATI drivers have historically had many problems, so this bug does not surprise me at all. I don’t see how it could be mine. Why would GL render new random stuff from frame to frame when I change neither the viewpoint nor the geometry, i.e. nothing in the scene changes from frame to frame, yet the results differ? When I change the alignment to 2 (bad performance) or 4 (good performance), the scene starts rendering correctly.