glBindFramebuffer is causing performance drop in OpenGL ES



debonair
03-07-2016, 05:57 PM
I am rendering into 3 FBOs, so I have 3 passes. When I call glBindFramebuffer() to render to the 3rd FBO, my application runs at a lower fps. If I use only the 3rd FBO for rendering in all 3 passes, I get the same reduced fps, but if I use the 1st FBO for all 3 passes, I get a higher fps. What might be the reason for this behavior?

__bob__
03-08-2016, 03:55 AM
The bind operation has a huge cost: if you bind 3 times, the cost is bigger. Maybe you can use GL_TEXTURE_2D_ARRAY to have one FBO with 3 layers that is bound just once...

This kind of problem will disappear with Vulkan....

Dark Photon
03-08-2016, 06:14 AM
I am rendering into 3 FBOs, so I have 3 passes. When I call glBindFramebuffer() to render to the 3rd FBO, my application runs at a lower fps. If I use only the 3rd FBO for rendering in all 3 passes, I get the same reduced fps, but if I use the 1st FBO for all 3 passes, I get a higher fps. What might be the reason for this behavior?
You said OpenGL ES, right? Which GPU(s)?

Given OpenGL ES, there's a fair bet you're targeting an embedded tile-based (sort-middle) GPU such as PowerVR, Mali, or (sometimes) Adreno. Unlike desktop GPUs, these have an extremely high cost of switching framebuffers, especially if you use them in a way that triggers a full pipeline flush (and sometimes a sync). What you have to keep in mind with these GPUs is that (unlike desktops), while you're submitting work for one framebuffer, normally none of the fragment work is done during the entire frame in which you're submitting it. All that fragment work is queued on the framebuffer object for later execution. When this fragment work is executed, it's executed on screen tiles in very high-speed on-chip cache.

If you do something like reconfigure an FBO that you just rendered to, you could very well trigger a full pipeline flush and sync, which will really hurt your performance. You may need a pool of FBOs that you cycle through in LRU order to ensure that you don't stall the pipeline. Check the GPU vendor's developer documentation or support forums for details. You should also be conscious about telling the GPU which framebuffer buffers not to read in from slow CPU DRAM at the beginning of the frame (with glClear) and not to write out at the end of the frame (with gl{Invalidate,Discard}Framebuffer), as DRAM reads/writes for large framebuffers are expensive.

If you post some code showing all of your framebuffer binds, FBO reconfigs (e.g. glFramebufferTexture2D), glClear*, and gl{Invalidate,Discard}Framebuffer calls, folks here might be able to offer some tips to help you optimize things.

Dark Photon
03-08-2016, 06:26 AM
The bind operation has a huge cost: ... This kind of problem will disappear with Vulkan....

IANAVE (I am not a Vulkan expert), but...

As far as I know, no. In Vulkan we have render passes (http://blog.imgtec.com/powervr/trying-out-the-new-vulkan-graphics-api-on-powervr-gpus). Just as now you can create bottlenecks by using more framebuffer binds and reconfigs than necessary, in Vulkan you can use more render passes than necessary. The burden still falls on the developer to be efficient here.

Dark Photon
03-08-2016, 06:30 AM
By the way, is there any potential to restructure your code using MRT instead of 3 separate passes? Possibly writing to some RTs in earlier passes and then reading from those RTs to write the other RTs in subsequent passes? That is something that can often be made very efficient on tile-based GPUs (because the RT data can often be kept completely on-chip). You'd potentially only have one framebuffer write to DRAM and zero read-ins from DRAM, not 3 framebuffer writes and 2 reads (or worse).
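To put rough numbers on the "one write versus 3 writes and 2 reads" comparison, here is a minimal sketch of the arithmetic. The helper names are hypothetical, and the figures assume a single uncompressed color target with no depth/stencil traffic, so treat the result as an order-of-magnitude estimate rather than a measurement.

```c
#include <stddef.h>

/* Hypothetical helper: bytes moved for one full-surface transfer between
   on-chip tile memory and DRAM (assumes no framebuffer compression). */
size_t surface_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Hypothetical helper: per-frame DRAM traffic, given how many times the
   framebuffer is written out and read back in during the frame. */
size_t frame_traffic_bytes(size_t surface, size_t writes, size_t reads)
{
    return surface * (writes + reads);
}
```

For an assumed 1024x1024 RGBA8 target, surface_bytes(1024, 1024, 4) is 4 MiB, so three separate passes (3 writes, 2 reads) move about 20 MiB per frame, while a single on-chip MRT pass (1 write, 0 reads) moves about 4 MiB.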

debonair
03-08-2016, 10:50 PM
I am using a Mali GPU and ES.
I am using an external library that gives me an FBO id to render to, so I have to do my final render into that FBO. If I do all 3 of my passes in that FBO alone, I still get an fps drop. But if I create my own FBO, do all 3 passes in it, and don't use the FBO from the external lib, I get an fps gain. So I captured the GL calls this library is making:



D/libEGL (21072): glGetError();
D/libEGL (21072): glEnable(GL_BLEND);
D/libEGL (21072): glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
D/libEGL (21072): glDepthMask(GL_FALSE);
D/libEGL (21072): glDisable(GL_CULL_FACE);
D/libEGL (21072): glActiveTexture(GL_TEXTURE0);
D/libEGL (21072): glBindTexture(GL_TEXTURE_2D, 2);
D/libEGL (21072): glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
D/libEGL (21072): glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
D/libEGL (21072): glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
D/libEGL (21072): glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
D/libEGL (21072): glUseProgram(24);
D/libEGL (21072): glUniform4f(1, 1, value);
D/libEGL (21072): glBindVertexArray(6);
D/libEGL (21072): glDrawElements(GL_TRIANGLES, 0, GL_UNSIGNED_SHORT, (const GLvoid *) 0x00000000);
D/libEGL (21072): glBindVertexArray(0);
D/libEGL (21072): glEnable(GL_CULL_FACE);
D/libEGL (21072): glDisable(GL_BLEND);
D/libEGL (21072): glDepthMask(GL_FALSE);
D/libEGL (21072): glGetError();
D/libEGL (21072): glDisable(GL_DEPTH_TEST);
D/libEGL (21072): glDisable(GL_CULL_FACE);
D/libEGL (21072): glClearColor(0, 0, 0, 1);
D/libEGL (21072): glEnable(GL_SCISSOR_TEST);
D/libEGL (21072): glScissor(0, 0, 1024, 1);
D/libEGL (21072): glClear(GL_COLOR_BUFFER_BIT);
D/libEGL (21072): glScissor(0, 1023, 1024, 1);
D/libEGL (21072): glClear(GL_COLOR_BUFFER_BIT);
D/libEGL (21072): glScissor(0, 0, 1, 1024);
D/libEGL (21072): glClear(GL_COLOR_BUFFER_BIT);
D/libEGL (21072): glScissor(1023, 0, 1, 1024);
D/libEGL (21072): glClear(GL_COLOR_BUFFER_BIT);
D/libEGL (21072): glScissor(0, 0, 1024, 1024);
D/libEGL (21072): glDisable(GL_SCISSOR_TEST);
D/libEGL (21072): glInvalidateFramebuffer(GL_FRAMEBUFFER_OES, 2, (const GLenum*) 0x72c98f4c);
D/libEGL (21072): glBindFramebuffer(GL_FRAMEBUFFER_OES, 0);
D/libEGL (21072): glFlush();
D/libEGL (21072): glBindFramebuffer(GL_FRAMEBUFFER_OES, 5);
D/libEGL (21072): glViewport(0, 0, 1024, 1024);
D/libEGL (21072): glScissor(0, 0, 1024, 1024);
D/libEGL (21072): glDepthMask(GL_TRUE);
D/libEGL (21072): glEnable(GL_DEPTH_TEST);
D/libEGL (21072): glDepthFunc(GL_LEQUAL);
D/libEGL (21072): glInvalidateFramebuffer(GL_FRAMEBUFFER_OES, 3, (const GLenum*) 0x72c98f78);
D/libEGL (21072): glClear(GL_DEPTH_BUFFER_BIT);



I don't think there is any suspicious call above that would cause the fps drop. I don't think binding the default FBO and calling glFlush() would make the app run slower. I might have to use DS5 for this now.

nileshshah89
03-09-2016, 02:00 AM
That's correct: switching the framebuffer will flush the commands for the previously bound framebuffer, which will be very costly.
That said, modern drivers have a few optimizations in place to avoid flushes, based on whether the current framebuffer's rendering depends on the previous framebuffer's rendering.

Dark Photon
03-09-2016, 06:38 AM
A few thoughts after looking at your code: if you're rendering to the same FBO every frame, this is unlikely to parallelize well. In other words, when you start to render to the FBO again, this could cause the driver to stall until the previous rendering commands associated with the FBO are complete. If this is your case, try using 2 or 3 FBOs and round-robining across them.
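The round-robin idea can be sketched as a small ring of FBO names. This is only an illustration: FboPool, fbo_pool_init, and fbo_pool_acquire are hypothetical names, the FBOs are assumed to have been created and fully configured up front, and whether 2 or 3 entries is enough depends on the driver.

```c
#include <assert.h>
#include <stddef.h>

#define FBO_POOL_MAX 4

/* Hypothetical helper type: a fixed ring of framebuffer object names. */
typedef struct {
    unsigned int ids[FBO_POOL_MAX]; /* names obtained from glGenFramebuffers */
    size_t count;                   /* number of valid entries */
    size_t next;                    /* index handed out by the next acquire */
} FboPool;

void fbo_pool_init(FboPool *pool, const unsigned int *ids, size_t count)
{
    assert(count >= 1 && count <= FBO_POOL_MAX);
    for (size_t i = 0; i < count; ++i)
        pool->ids[i] = ids[i];
    pool->count = count;
    pool->next = 0;
}

/* Returns the FBO to render into for this frame and advances the ring,
   so consecutive frames never target the same FBO when count > 1. */
unsigned int fbo_pool_acquire(FboPool *pool)
{
    unsigned int id = pool->ids[pool->next];
    pool->next = (pool->next + 1) % pool->count;
    return id;
}
```

Each frame, the caller would pass the returned name to glBindFramebuffer(GL_FRAMEBUFFER, id) instead of reusing a single FBO, leaving the attachments of each pooled FBO untouched after setup.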

Also (important, but less so), it's not clear that you are clearing and invalidating all the buffers that you could (though you might be), so you might double-check that. In particular, I would issue a glClear for the entire screen for all buffers in your framebuffer (with all writemasks enabled and no scissor rectangle) immediately after a glBindFramebuffer call.

debonair
03-09-2016, 06:12 PM
Thanks, guys, for your help!

I found that my external library is already using 6 FBOs instead of one. When I render into the FBO it gives me, I now call glInvalidateFramebuffer() with the depth, stencil, and color attachments. That removed my bottleneck at the FBO bind. But I am still wondering how that helped, since the color attachment had already been used earlier, after a bind call, for render-to-texture, so the GPU should not be waiting for rendering into it to complete.
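For reference, the attachment list passed to glInvalidateFramebuffer differs between the default framebuffer and an application-created FBO in OpenGL ES 3.0. A minimal sketch of building that list follows; invalidate_list is a hypothetical helper, and the enum values are repeated (guarded) only so the snippet compiles without the GL headers.

```c
#include <stddef.h>

/* Enum values as defined by the OpenGL ES 3.0 headers; guarded so the
   snippet also compiles when <GLES3/gl3.h> is already included. */
#ifndef GL_COLOR
#define GL_COLOR   0x1800
#define GL_DEPTH   0x1801
#define GL_STENCIL 0x1802
#endif
#ifndef GL_COLOR_ATTACHMENT0
#define GL_COLOR_ATTACHMENT0  0x8CE0
#define GL_DEPTH_ATTACHMENT   0x8D00
#define GL_STENCIL_ATTACHMENT 0x8D20
#endif

/* Hypothetical helper: fills out[0..2] with the color, depth, and stencil
   tokens valid for the given framebuffer name and returns the count.
   Framebuffer 0 (the default framebuffer) uses GL_COLOR/GL_DEPTH/GL_STENCIL;
   application FBOs use the *_ATTACHMENT tokens. */
size_t invalidate_list(unsigned int fbo, unsigned int out[3])
{
    if (fbo == 0) {
        out[0] = GL_COLOR;
        out[1] = GL_DEPTH;
        out[2] = GL_STENCIL;
    } else {
        out[0] = GL_COLOR_ATTACHMENT0;
        out[1] = GL_DEPTH_ATTACHMENT;
        out[2] = GL_STENCIL_ATTACHMENT;
    }
    return 3;
}
```

With the target FBO bound, the result would be passed as glInvalidateFramebuffer(GL_FRAMEBUFFER, (GLsizei)n, list). Color should only be included when its old contents are genuinely not needed, for example before every pixel is overwritten.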