Is this a GPU hang?

debinair · June 29, 2017, 9:40pm

I have two threads, A and B, each with its own context in ES. A updates the FBO texture by rendering into it, and B draws that texture to the screen as a render texture. B runs continuously without ever blocking. After 2 frames, A's fence object is never signaled and the wait times out. So I tried putting glFinish() in A, and now that call never returns. It seems the GPU is stuck on a particular instruction.
A makes calls like this:

  
    glBindFramebuffer(GL_FRAMEBUFFER, renderTexture_gl_frame_buffer_);
    glViewport(0, 0, width, height);
    glScissor(0, 0, width, height);
    glDepthMask(GL_TRUE);
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LEQUAL);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

After this, it calls glFinish(). I have commented out the rendering code for A for now.

Does the GPU have only 1 ring buffer for all the contexts? If so, commands from B are executed successfully while a command from A is stuck. Is that possible with 1 ring buffer? What's going on here?

Silence · June 30, 2017, 12:42am

From what I know, and what is said in the wiki, FBOs are not sharable between contexts.
See the wiki:

"Any OpenGL object types which are not containers are sharable."
So you must use 2 distinct FBOs to do what you want to do. You can also try to use PBOs.

Depending on what you want to do, you might also adopt a different design. Commonly, worker threads produce data and the main thread consumes it, feeding it to GL and rendering it. That way only a single thread needs to communicate with GL, which avoids such troublesome behavior.

debinair · June 30, 2017, 1:01am

I am sharing textures here, which is allowed.

GClements · June 30, 2017, 4:09am

It’s possible that a deadlock is occurring due to each thread waiting on an internal mutex which is held by the other thread, although I can’t really foresee any specific mechanism which might cause this with one thread reading and one writing. Does it help if B calls glFlush() periodically? What if each thread unbinds the texture from the FBO or texture unit between uses?

Dark_Photon · June 30, 2017, 5:38am

What are you hoping to gain with 2 threads/contexts talking to OpenGL ES that you can’t get with one?

There are a few reasons to do this, but not as many as you might think.

Does the GPU have only 1 ring buffer for all the contexts?

It depends on the driver architecture. One possible architecture is for the user-mode GL driver to have a separate command queue (ring buffer) per context, and for the kernel-mode GL driver to multiplex these command queues together into a single device command queue for the GPU. That is, the kernel-mode driver decides when each context’s commands get to run on the single, shared GPU resource. So at any given time, only one context’s commands could be executing on the GPU.

That’s oversimplifying a bit because some GL commands can be handled by the driver alone, sometimes GPUs have separate command queues for different types of requests (e.g. upload/copy vs. render), etc. But you get the idea: in most cases, there’s only one GPU doing the work.
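To make the multiplexing idea concrete, here is a toy model in plain C. It is only a sketch: `ContextQueue`, `ctx_push` and `multiplex` are made-up names, the round-robin policy is an assumption, and a real kernel-mode driver schedules whole batches with time slices and priorities rather than single commands.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CMDS 16

/* Hypothetical per-context command queue owned by the user-mode driver. */
typedef struct {
    int    cmds[MAX_CMDS];
    size_t head;   /* index of the next command to submit */
    size_t count;  /* number of queued commands */
} ContextQueue;

/* Append one command to a context's private queue. */
static void ctx_push(ContextQueue *q, int cmd) {
    assert(q->count < MAX_CMDS);
    q->cmds[(q->head + q->count) % MAX_CMDS] = cmd;
    q->count++;
}

/* Hypothetical kernel-mode scheduler: visits the contexts round-robin and
 * moves one command per visit into the single device queue.
 * Returns the number of commands written to `device`. */
static size_t multiplex(ContextQueue ctx[], size_t n_ctx,
                        int device[], size_t cap) {
    size_t out = 0;
    int progressed = 1;
    while (progressed && out < cap) {
        progressed = 0;
        for (size_t i = 0; i < n_ctx && out < cap; i++) {
            if (ctx[i].count > 0) {
                device[out++] = ctx[i].cmds[ctx[i].head];
                ctx[i].head = (ctx[i].head + 1) % MAX_CMDS;
                ctx[i].count--;
                progressed = 1;
            }
        }
    }
    return out;
}
```

With commands 100, 101 queued on one context and 200, 201, 202 on the other, the device queue ends up interleaved, so the GPU only ever sees one stream even though each thread wrote to its own queue.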

Now B is running continuously without any block. A after 2 frames doesn’t signal fence object at all, timeout occurs. So I tried putting glFinish() in A, now this call never returns. … What's going on here?

It is possible to do what you’re talking about, but you have to be careful.

This is a standard producer/consumer model where A is producing generated textures for B to consume, and B is producing “free” textures for A to consume.

Use fence sync events to hand off ownership of the texture(s) between threads without forcing expensive GPU stalls. I’d recommend multiple textures because you’ll very likely need that to keep a mobile GPU from stalling. Mobile GPUs have deep pipelines and rasterize a frame late, so you need multiple textures in flight to avoid big pipeline bubbles (i.e. GPU stalls). glFinish() introduces a massive pipeline bubble on mobile due to this architecture and should be avoided like the plague unless you want to cut your frame rate by a factor of 2 or 3.
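The ownership hand-off can be sketched as two FIFO queues, one of free textures and one of ready textures. This is a single-threaded toy model, not real GL code: `TexPool`, `producer_step`, `consumer_step` and the pool size of 3 are all assumptions for illustration, and a real version would need a mutex or lock-free queue plus an actual fence (in ES 3.0, `glFenceSync` and `glWaitSync`/`glClientWaitSync`) travelling with each texture.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 3  /* assumption: three textures in flight */

/* Fixed-capacity FIFO of texture ids. */
typedef struct {
    unsigned tex[POOL_SIZE];
    size_t   head, count;
} TexQueue;

static int tq_push(TexQueue *q, unsigned tex) {
    if (q->count == POOL_SIZE) return 0;
    q->tex[(q->head + q->count) % POOL_SIZE] = tex;
    q->count++;
    return 1;
}

static int tq_pop(TexQueue *q, unsigned *tex) {
    if (q->count == 0) return 0;
    *tex = q->tex[q->head];
    q->head = (q->head + 1) % POOL_SIZE;
    q->count--;
    return 1;
}

/* free_q: textures A may render into; ready_q: textures B may draw. */
typedef struct { TexQueue free_q, ready_q; } TexPool;

static void pool_init(TexPool *p) {
    p->free_q.head = p->free_q.count = 0;
    p->ready_q.head = p->ready_q.count = 0;
    for (unsigned i = 0; i < POOL_SIZE; i++)
        tq_push(&p->free_q, i + 1);  /* hypothetical texture ids 1..3 */
}

/* Thread A: take a free texture, render into it, publish it.
 * Returns 0 when no texture is free, i.e. A would have to wait for B. */
static int producer_step(TexPool *p, unsigned *rendered) {
    unsigned t;
    if (!tq_pop(&p->free_q, &t)) return 0;
    /* Real code: render into t, create a fence, glFlush(), pass the fence on. */
    tq_push(&p->ready_q, t);
    *rendered = t;
    return 1;
}

/* Thread B: take a ready texture, draw it, hand it back to A.
 * Returns 0 when nothing is ready yet. */
static int consumer_step(TexPool *p, unsigned *drawn) {
    unsigned t;
    if (!tq_pop(&p->ready_q, &t)) return 0;
    /* Real code: wait on A's fence, draw with t, fence B's own use of t. */
    tq_push(&p->free_q, t);
    *drawn = t;
    return 1;
}
```

With a pool of 3, A can get up to three frames ahead before it has to wait for B to return a texture; with a single texture the two threads would run in lock-step.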

A tip: After you issue a sync event, be sure to issue a glFlush() to force it to be pushed into the GL driver for immediate processing. If you don’t, it’s possible that you can get into a deadlock situation where thread A has signaled a texture is ready (issued a sync event) thinking that it’s told B, but the sync event is stuck in A’s outgoing command queue and never processed because the driver was waiting for more GL commands to be issued before implicitly flushing A’s queue to the kernel-mode driver for processing. Meanwhile, thread B is waiting for A’s sync event and never sees it because it never left A’s command queue. So B is hung, and A eventually hangs because B is not finishing rendering with and releasing textures back to A.
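Here is a toy model of that failure mode in plain C, with no GL involved. Everything in it (`Context`, `Fence`, `ctx_insert_fence`, `ctx_flush`, `wait_fence`) is a hypothetical stand-in: inserting a fence plays the role of `glFenceSync`, `ctx_flush` plays the role of `glFlush()`, and the bounded poll stands in for a wait with a timeout such as `glClientWaitSync`. It also assumes the "GPU" completes submitted work instantly, which a real one does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

typedef struct { bool signaled; } Fence;

/* Hypothetical model of one context's user-mode command buffer. */
typedef struct {
    Fence *pending[MAX_PENDING];  /* fences queued but not yet submitted */
    size_t n_pending;
} Context;

/* Stand-in for glFenceSync: the fence command is queued in the context,
 * but nothing has been handed to the kernel-mode driver yet. */
static void ctx_insert_fence(Context *c, Fence *f) {
    assert(c->n_pending < MAX_PENDING);
    f->signaled = false;
    c->pending[c->n_pending++] = f;
}

/* Stand-in for glFlush: submit everything queued so far. In this toy the
 * submitted work completes immediately, so the fences signal right away. */
static void ctx_flush(Context *c) {
    for (size_t i = 0; i < c->n_pending; i++)
        c->pending[i]->signaled = true;
    c->n_pending = 0;
}

/* Stand-in for a wait with a timeout in the other thread:
 * polls up to max_polls times, returns true once the fence is signaled. */
static bool wait_fence(const Fence *f, int max_polls) {
    for (int i = 0; i < max_polls; i++)
        if (f->signaled) return true;
    return false;
}
```

In this model, a fence inserted without a following flush never signals no matter how long the other side polls, and a single `ctx_flush` releases every fence queued before it.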

My guess is that something like this is happening with your hang.