Clearing DEPTH_COMPONENT of type UNSIGNED_SHORT in gles2 using OES_depth_texture



tikotus
05-29-2017, 07:41 AM
Hi,

I'm trying to implement shadow mapping in OpenGL ES 2.0 and ran into a problem.

I'm rendering the depth to a GL_DEPTH_COMPONENT texture with type GL_UNSIGNED_SHORT. I have made sure OES_depth_texture is available. Rendering works fine but clearing isn't working (properly). To be on the safe side I'm enabling and clearing anything I can find. This should work, right?



glClearDepthf(1.0f);
glDepthMask(GL_TRUE);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);


For some reason this doesn't clear the texture on my android device using Mali-400 MP. Same implementation works in OSX and even in WebGL. I haven't tried other android devices.

My guess is that this has something to do with the fact that the depth texture is of integer type. In gles3 glClearBufferiv was introduced for clearing integer buffers. This isn't available in gles2 (probably because integer types aren't a valid format without extensions). Could this be the case?

How are you guys clearing depth attachment FBOs? Does it work for you on all devices?

Any suggestions for going around this issue?
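For readers who want to reproduce the availability check mentioned in this post, here is a minimal sketch in C. It assumes the extension list is the space-separated string returned by glGetString(GL_EXTENSIONS); the helper name has_extension is hypothetical and not from the original post. It matches whole tokens, because a plain substring search would also accept longer names that merely start with the one being looked for.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in `ext_list`.
 * `ext_list` is assumed to come from glGetString(GL_EXTENSIONS).
 * has_extension is a hypothetical helper name. */
static int has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;

    if (ext_list == NULL || name == NULL)
        return 0;
    len = strlen(name);
    if (len == 0)
        return 0;

    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* A real match must start and end on a token boundary. */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token   = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len; /* skip this partial match and keep searching */
    }
    return 0;
}
```

With a GL context current, the call would look like has_extension((const char *)glGetString(GL_EXTENSIONS), "GL_OES_depth_texture").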

tikotus
05-31-2017, 01:29 AM
An update on this:
It's not only the depth texture that isn't cleared. I get similar problems also with a normal depth buffer when rendering to texture. The issue almost disappears when I reduce the size of the texture, but not quite, still some flickering. Might be a memory issue? I'll try to optimize my memory usage and see what happens.

Silence
05-31-2017, 03:20 AM
Is the FBO bound when you call these functions?
Which draw buffers are set before your clear attempt?

mhagain
05-31-2017, 03:20 AM
.....The issue almost disappears when I reduce the size of the texture.....

Have you scissoring enabled (and a scissor rectangle set)?

tikotus
05-31-2017, 04:40 AM
Is the FBO bound when you call to these functions ?
Which draw buffers are set before your clear attempt ?

Yes, the correct FBO is bound.
glDrawBuffers isn't available in OpenGL es 2.0, so I guess no draw buffers are set? Or what do you mean by this?


Have you scissoring enabled (and a scissor rectangle set)?

I haven't enabled scissoring at any point. Trying to disable it anyways yields no result.


I really don't think this is a programming issue, unless I've forgotten some critical step in setting up the buffers. The behaviour is too random. Sometimes the buffer is cleared every few frames, sometimes almost never, sometimes it clears it but nothing is rendered on it (or it's cleared for some reason after rendering?)... It feels like the driver is buggy or I'm doing some out of bounds memory operations that cause this random behaviour. I don't think it's the latter because the same solution works on osx and even WebGL. What could cause such weird buggy behaviour on this GPU? My FPS is pretty low, is it normal that GPUs start acting buggy when overloaded?

tikotus
05-31-2017, 05:12 AM
More input: I'm using SDL for windowing. If I add a 50ms delay after SDL_GL_SwapWindow, everything looks good (except for the frame rate). 10ms delay doesn't affect the result much. Maybe there's an issue with doublebuffering? I tried to disable it but it didn't do anything. Not sure if it's even possible to disable it.

Silence
05-31-2017, 06:07 AM
You might need a call to glFinish at some points of your code.
You are most certainly not managing any double buffering in your FBOs. Or are you? (And even if you were, it would not be handled by any SwapBuffers function.)

Plus, answering prior questions would definitely help to solve your problem.

tikotus
05-31-2017, 06:26 AM
I tried to answer the questions but the reply went to moderator review. I guess it appears at some point. But no, I don't use scissors and gles2 doesn't have drawbuffers, so the question is irrelevant, right?

No, I'm not double buffering FBOs. Just wondering if SDL's double buffering messes up something somewhere. Just a wild guess.

I tried glFinish. And it actually does something!
It drops the framerate to what I had with 50ms delay. It also produces the same result as the 15ms delay. Not sure if it works by accident (because of the slowdown) or if it actually causes a finish that SDL_SwapWindow for some reason doesn't do.

Dark Photon
05-31-2017, 06:37 AM
I tried glFinish. And it actually does something!
It drops the framerate to what I had with 50ms delay. It also produces the same result as the 15ms delay.

I would expect that since you're using a mobile GPU. A full pipeline flush is very expensive on mobile as it defeats the DRAM-bandwidth-saving features of tile-based GPUs.

To get more ideas from folks, you might post some code. Best case, make this a standalone GLUT or SDL test program folks can compile and try locally. That would at least give you more input on results with other drivers.

tikotus
05-31-2017, 06:47 AM
I would expect that since you're using a mobile GPU. A full pipeline flush is very expensive on mobile as it defeats the DRAM-bandwidth-saving features of tile-based GPUs.

To get more ideas from folks, you might post some code. Best case, make this a standalone GLUT or SDL test program folks can compile and try locally. That would at least give you more input on results with other drivers.

So yeah, glFinish is slow. But it fixes the issue. Does this mean the GPU is messing up its pipeline? Any faster ways to help the GPU understand what it should do than glFinish? glFlush doesn't seem to help.

The code is part of a bigger game engine. Would take some time to rip out the rendering part and make it readable.

mhagain
05-31-2017, 07:08 AM
Can you show your context initialization code? It might be a useful place to start, and at the very least would give us a minimal subset of code to rule out potential issues in.

Silence
05-31-2017, 07:22 AM
Not sure if it works by accident (because of the slowdown) or if it actually causes a finish that SDL_SwapWindow for some reason doesn't do.

SDL_SwapWindow has nothing to do with FBOs at all.

tikotus
05-31-2017, 08:06 AM
Here is my context setup: https://gist.github.com/tikotus/ddd9f7a3eb20e008f213adb06a10ff64


SDL_SwapWindow has nothing to do with FBOs at all.

Not directly, but it has something to do with flushing, and the issue seems to be with flushing.

Silence
05-31-2017, 09:58 AM
Here is my context setup: https://gist.github.com/tikotus/ddd9f7a3eb20e008f213adb06a10ff64

Not directly, but it has something to do with flushing, and the issue seems to be with flushing.

True. But what I meant is that since there is an implicit glFlush/glFinish when calling swapBuffers, one might believe that swapBuffers can be used for this kind of synchronization, which might not work, as you experienced.
My guess is that the driver is clever enough to see that the synchronization has to be done with commands issued to the window framebuffer, not any potential FBOs. I might be wrong...

Dark Photon
05-31-2017, 04:18 PM
For testing purposes, try detaching the depth texture from the FBO before you use it for rendering in the shadow application pass.

Oh, and after rendering to your depth texture, when setting up for the shadow application pass, verify that you are binding your depth texture to the texture unit. IIRC that's how the driver knows there's an implicit flush needed for the FBO render (particularly in the absence of detaching the depth texture from the FBO).

You could also try binding 0 to the texture unit (i.e. unbinding whatever texture was bound) and then binding your depth texture before the shadow application pass to make sure the driver gets the picture.

Could be a driver bug you're chasing, but it could also be a usage problem in your program.


My FPS is pretty low, is it normal that GPUs start acting buggy when overloaded?

No.

The first thing of course is to get your shadow rendering working properly without any expensive waits or explicit flushes. However, after you've solved that...

One thought on the performance: drivers often use an FBO as a placeholder for all of the rendering that's targeted to that FBO. If you need to have renders for multiple off-screen render targets in-flight at the same time (which you're more than likely to need on mobile), then you shouldn't render everything through one FBO but rather use a small pool of FBOs -- enough to get you through 2-3 frames without re-use. That should let the driver efficiently parallelize rendering to different render targets.

However, there is a per-FBO memory cost associated with these FBOs, so don't just create a pool of dozens of them, or you could blow your GPU memory budget.
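The small pool described above could be cycled round-robin. The following is only a sketch: FboPool, fbo_pool_init, fbo_pool_acquire and the pool size of 3 are assumptions made for illustration, and the handles are plain integers that would normally come from glGenFramebuffers.

```c
#include <stddef.h>

/* Assumption: 3 entries, i.e. enough for 2-3 frames without re-use. */
#define FBO_POOL_SIZE 3

/* Hypothetical pool type; `fbo` holds handles created elsewhere
 * (for example with glGenFramebuffers). */
typedef struct {
    unsigned int fbo[FBO_POOL_SIZE];
    size_t next; /* index of the FBO to hand out next */
} FboPool;

/* Copies FBO_POOL_SIZE already-created handles into the pool. */
static void fbo_pool_init(FboPool *pool, const unsigned int *handles)
{
    size_t i;
    for (i = 0; i < FBO_POOL_SIZE; ++i)
        pool->fbo[i] = handles[i];
    pool->next = 0;
}

/* Returns the FBO to render into now and advances round-robin, so the
 * same FBO is not handed out again until the others have been used. */
static unsigned int fbo_pool_acquire(FboPool *pool)
{
    unsigned int handle = pool->fbo[pool->next];
    pool->next = (pool->next + 1) % FBO_POOL_SIZE;
    return handle;
}
```

Each off-screen pass would then bind the handle returned by fbo_pool_acquire instead of one fixed FBO. Whether this helps on a given driver is something to confirm with a profiler.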

What you really should do when you get to optimization is to pull out an ARM Mali GPU profiler and see how your workload is mapping to the GPU's functional units. If you see big timing gaps on the vertex or fragment pipe, or you don't see those pipes executing in parallel, then you've got something to fix on your side.

GClements
05-31-2017, 06:12 PM
So yeah, glFinish is slow. But it fixes the issue.
It's not that glFinish() is slow per se. It's that glFinish() waits for rendering to complete, and the rendering may be slow. If omitting glFinish() means that it runs faster but produces garbage, that suggests that you're skipping much of the rendering, i.e. displaying what has been rendered by that point and discarding whatever is queued up. If that is what's happening, there isn't any solution that will be both fast and correct.

Provided that the CPU is mostly idle, glFinish() by itself shouldn't have much of an effect upon performance. Although the glFinish() will take a while, the subsequent GL commands can be executed immediately; whereas without glFinish(), subsequent commands will just be queued up for the future. The main situation where synchronisation has a major performance penalty is if the CPU needs to do a lot of work to generate the data to pass to the GL. In that case, synchronisation means that execution alternates between the CPU and GPU, rather than the two working concurrently.
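The argument above can be put into numbers with a deliberately simplified timing model. It assumes a fixed CPU cost and a fixed GPU cost per frame, and it ignores the frame-deferred behaviour of tile-based mobile GPUs. The function names are hypothetical.

```c
/* Frame time when the CPU and GPU take turns, as with a glFinish()
 * every frame: the two costs add up. */
static double frame_ms_serialized(double cpu_ms, double gpu_ms)
{
    return cpu_ms + gpu_ms;
}

/* Steady-state frame time when the CPU prepares the next frame while
 * the GPU is still rendering: the slower side sets the pace. */
static double frame_ms_overlapped(double cpu_ms, double gpu_ms)
{
    return cpu_ms > gpu_ms ? cpu_ms : gpu_ms;
}
```

With 1 ms of CPU work and 10 ms of GPU work, the model gives 11 ms against 10 ms, which is a small difference. With 8 ms of CPU work it gives 18 ms against 10 ms, which is the case where synchronisation hurts.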

tikotus
06-01-2017, 03:01 AM
For testing purposes, try detaching the depth texture from the FBO before you use it for rendering in the shadow application pass.

I did this now for testing purposes, kind of. Now I attach texture, depth texture and depth buffer (whichever are available) to the FBO before draw call and detach right after. Same with clear. I also set all texture units to 0 after draw call and made sure I'm binding them before draw call. It changed the result a bit. Depending on my setup and FPS it glitches in different ways. With very low FPS I get results like these (with and without glFinish, the white quad is the depth texture. Weird that it's white during the quad rendering but works partially during the shadowing?):

urls because I'm having issues with image uploads
https://drive.google.com/file/d/0B1KNlPk-OGgiRVRqY3pmRTlMaFU/view?usp=sharing
https://drive.google.com/file/d/0B1KNlPk-OGgiaUdJWjBHdmx6R2c/view?usp=sharing

EDIT: It looks like the depth texture is cleared halfway through the shadowing pass. The ground is rendered in more than one pass because of the vertex count. Seems like one of the passes succeeds while the other one leaves polygons unshadowed. Only once in a few seconds there is a frame when all polygons are shadowed correctly, and once in a few seconds the depth texture shows correctly. Whenever the depth texture shows correctly, the shadows are also correct, but not vice versa. To me it seems that the depth texture is cleared before all meshes are rendered, except sometimes.


If omitting glFinish() means that it runs faster but produces garbage, that suggests that you're skipping much of the rendering, i.e. displaying what has been rendered by that point and discarding whatever is queued up.

This makes sense to me. Still I would like to figure out why I need the explicit glFinish. Why does the driver discard the work?

Dark Photon
06-01-2017, 06:46 AM
Provided that the CPU is mostly idle, glFinish() by itself shouldn't have much of an effect upon performance.

This is somewhat true for a discrete desktop GPU (if you disable read-ahead), but less true for a mobile/embedded GPU where by-design the GPU is still rendering the work submitted last frame "this" frame. That is, it requires parallel CPU/GPU operation to avoid overrunning the low bandwidth of ordinary DRAM.

On mobile / GLES, glFinish() (if even honored correctly by the driver) may force the CPU to wait for several VSync clocks for the GPU to catch up, even though the CPU itself may have been totally idle. At that point, the app has totally missed the opportunity to submit a frame, which clearly reduces performance.

Dark Photon
06-01-2017, 07:03 AM
EDIT: It looks like the depth texture is cleared halfway through the shadowing pass. The ground is rendered in more than one pass because of the vertex count. Seems like one of the passes succeeds while the other one leaves polygons unshadowed.

Only once in a few seconds there is a frame when all polygons are shadowed correctly, and once in a few seconds the depth texture shows correctly. When ever the depth texture shows correctly, the shadows are also correct, but not vice versa. To me it seems that the depth texture is cleared before all meshes are rendered, except sometimes.

Good find! That's definitely something you can work with.

When you say that the ground is rendered in more than one pass because of the vertex count, how do you know that? Are you talking about you are rendering it in multiple passes, or under-the-hood the GPU is rendering it in multiple passes?

If the latter, then I think you're saying that when rendering into your shadow map, the geometry you're submitting for your ground pass is overrunning the size of the tiled primitive buffer passed between the vertex and the fragment pipes, causing multiple read/rasterize/write passes to be performed when rendering to your shadow map (aka a pipeline flush).

If so, then this is going to reduce your performance of course, but unless you're rendering with MSAA when the pipeline flush occurs, this shouldn't generate any rendering artifacts unless there's a bug involved (either in your graphics driver or in your app).

Here are a few things you might check into:


- the logs for your graphics driver to see if it indicates what the size of that driver primitive buffer (associated with the framebuffer) is,
- look for ways to tune up the size of that primitive buffer in your graphics driver configuration, and
- try reducing the amount of geometry you're sending to your shadow map to see if you can get it to consistently work. That would at least help you nail down the root cause.


What I'm wondering is if your driver experiences a primitive buffer overflow, is it properly busting up your rendering into multiple passes, or is it possible it's just dumping the whole primitive buffer in the bit bucket when it experiences an overflow? The logs emitted by your graphics driver may help answer this question.

tikotus
06-01-2017, 07:39 AM
Are you talking about you are rendering it in multiple passes
This one. And actually the ground isn't even rendered to the shadowmap. The shadowmap is just used to shadow the ground.


try reducing the amount of geometry you're sending to your shadow map to see if you can get it to consistently work. That would at least help you nail down the root cause.
I started with this. Now down to 3254 triangles.


The logs emitted by your graphics driver may help answer this question.
Can't see anything interesting in the logs.

I further simplified the case (should have done this earlier). Getting really weird. Now I'm only rendering the simplified scene (around 3200 triangles) to a texture with a depth texture. I render the depth texture on a quad. FPS is over 90 and the quad is flickering. It looks correct maybe 70% of the frames, otherwise it's empty. With glFinish it's correct again.

Doesn't seem like this is related to GPU being too busy. It's something else in the pipeline.

tikotus
06-01-2017, 09:46 PM
Some more pinpointing and recapping:

The issue clearly is with operations on the depth buffer of an FBO being discarded. It doesn't matter if the depth buffer is a texture or a render buffer, they behave the same. Also seems like operations on the FBO depth buffer can occur at wrong times in the pipeline: Sometimes the depth buffer is cleared between two draw calls (which only read from the depth buffer). The issue can be "fixed" by giving time to the renderer with a delay or low render time + vsync. Also glFinish fixes the issue, glFlush doesn't seem to do anything.

The FBO's color attachment texture works correctly (except for errors caused by broken depth testing). It's only the depth buffer that doesn't seem to sync up well with rest of the rendering.

I've narrowed the whole issue to a very simple case. It's just an FBO with a depth buffer. I doubt it's only the hardware because many games would break on this device otherwise. I've exported a simple scene with shadows from Unity3D and they don't have the same issue.

Getting really stuck here. I think I need to get some other test devices and perhaps just do the glFinish on this GPU.

Thanks everyone for the help so far.

EDIT: Oh, looking into Mali Graphics Debugger now. Seems like what I need, didn't know such tools exist.
EDIT: Ok, great tool. But when I connect it to the app, rendering works. When I disconnect it, glitches appear again. (Probably because it slows down the pipeline for logging)
EDIT: With a fairly complex scene I can see the depth buffer glitching while running MGD. Here is the output of one frame (fairly long): https://gist.github.com/tikotus/f02ccb8915744a5f7e61b8128efa522c

Silence
06-02-2017, 12:31 AM
glFlush doesn't seem to do anything.

This is because glFlush is asynchronous, unlike glFinish: glFlush only submits the queued commands and returns, whereas glFinish blocks until they have completed.

Silence
06-02-2017, 12:50 AM
All of this reminds me of an article someone wrote about shadows in mobile games. Mobile platforms can have many constraints that prevent you from doing things the same way as on PC. You might be interested in this article (https://www.gamedev.net/resources/_/technical/graphics-programming-and-theory/shadows-and-light-on-kepler-22-r4514).

Other things you might try: reduce the resolution of your depth map to something that is acceptable for both speed and appearance... You might also try to render directly into a texture without using any FBOs.

tikotus
06-02-2017, 05:06 AM
Interesting things found in Mali Graphics Debugger. I found the "Render pass dependencies" feature and the output doesn't match what I think should be happening.

Frame 2:
===
Renderpass 4(FrameBuffer2) (Renders shadowmap)
-Depends on Renderpass 2(FrameBuffer 2) in Frame 2 due to Texture 12 (Clears the FBO color and depth, texture 12 contains the color texture and isn't actually used for anything)
-Depends on Renderpass 4(FrameBuffer 2) in Frame 1 due to Texture 13 (Depth texture rendering from previous frame! 13 is the depth texture)

Renderpass 5(FrameBuffer 0) (Applies shadowmap)
-Depends on Renderpass 4(FrameBuffer 2) in Frame 2 due to Texture 13 (This dependency actually makes sense, depth texture is used by the fragment shader)
===

Renderpass 4 depending on a render pass from previous frame because of the depth texture is suspicious. I'm clearing the depth buffer bit in renderpass 2 which should tell the driver that the depth from previous frame isn't needed. Right?

EDIT: Furthermore, renderpass 4 NOT depending on renderpass 2 due to texture 13 is also weird. It's cleared in that pass just like texture 12. Supports the idea that the driver doesn't recognize this dependency.

Dark Photon
06-02-2017, 07:06 AM
Interesting things found in Mali Graphics Debugger. I found the "Render pass dependencies" feature and the output doesn't match what I think should be happening.

Frame 2:
===

Renderpass 4(FrameBuffer2) (Renders shadowmap)
- Depends on Renderpass 2(FrameBuffer 2) in Frame 2 due to Texture 12 (Clears the FBO color and depth, texture 12 contains the color texture and isn't actually used for anything)
- Depends on Renderpass 4(FrameBuffer 2) in Frame 1 due to Texture 13 (Depth texture rendering from previous frame! 13 is the depth texture)

Renderpass 5(FrameBuffer 0) (Applies shadowmap)
- Depends on Renderpass 4(FrameBuffer 2) in Frame 2 due to Texture 13 (This dependency actually makes sense, depth texture is used by the fragment shader)


This is really useful info, and especially important on mobile (where the GPU is reading/writing to sloooow DRAM rather than super-fast VRAM).

On mobile, it's important to prevent the GPU from:
1) Reading in the previous framebuffer buffer contents FROM DRAM when it starts rendering a tile, and
2) Writing out the current framebuffer buffer contents TO DRAM when it finishes rendering a tile (if you don't need the contents later).

And how you do that is:

1) Call glClear( <BUFFER>_BIT ) immediately after binding the framebuffer and before rendering to it, and
2) Call glInvalidateFramebuffer() or glDiscardFramebuffersEXT() at the end of your rendering to that framebuffer IF you don't need the contents written out to DRAM.


You should be able to clear up those needless renderpass dependencies by:

In the Generate Shadowmap Pass:

1) Call glClear() on both the COLOR and DEPTH buffers immediately after binding the shadow FBO.
- Make sure you've disabled scissor test and have your buffer write masks set to allow the glClear() to clear ALL bits in the ENTIRE render target.
- Also, why do you have a COLOR buffer here? If you don't need it, I'd get rid of it.

2) Call gl{Invalidate,Discard}Framebuffer*() on the COLOR buffer after rendering to the shadow FBO if you don't need it.
- Obviously you want to write out DEPTH, so don't call it for that.

In the Apply Shadowmap Pass:

1) Call glClear() on all buffers in your framebuffer (COLOR/DEPTH/STENCIL/etc.) immediately after binding that framebuffer.
2) Call gl{Invalidate,Discard}Framebuffer*() on ALL buffers except COLOR after rendering to that framebuffer (obviously you want to keep COLOR).
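The call order recommended above can be sketched in C as follows. To keep the sketch compilable without a GL context, the gl* functions here are local recording stubs with the real signatures, and the types and constants are stand-ins for what <GLES2/gl2.h> and <GLES2/gl2ext.h> provide. draw_shadow_casters, draw_scene, clear_whole_target, shadow_pass and scene_pass are hypothetical names. On a real GLES2 device glDiscardFramebufferEXT comes from the EXT_discard_framebuffer extension and usually has to be fetched with eglGetProcAddress.

```c
#include <stddef.h>

/* Stand-in types and constants so the sketch compiles without GL headers. */
typedef unsigned int  GLenum;
typedef unsigned int  GLuint;
typedef unsigned int  GLbitfield;
typedef unsigned char GLboolean;
typedef int           GLsizei;

#define GL_TRUE               1
#define GL_SCISSOR_TEST       0x0C11
#define GL_DEPTH_BUFFER_BIT   0x00000100
#define GL_COLOR_BUFFER_BIT   0x00004000
#define GL_FRAMEBUFFER        0x8D40
#define GL_COLOR_ATTACHMENT0  0x8CE0
#define GL_DEPTH_EXT          0x1801

/* Recording stubs: each appends one letter to `trace`, no GPU involved. */
static char trace[64];
static size_t trace_len;
static void record(char c)
{
    if (trace_len + 1 < sizeof trace)
        trace[trace_len++] = c;
}

static void glBindFramebuffer(GLenum target, GLuint fbo)
{ (void)target; (void)fbo; record('B'); }
static void glDisable(GLenum cap) { (void)cap; record('S'); }
static void glDepthMask(GLboolean flag) { (void)flag; record('S'); }
static void glColorMask(GLboolean r, GLboolean g, GLboolean b, GLboolean a)
{ (void)r; (void)g; (void)b; (void)a; record('S'); }
static void glClear(GLbitfield mask) { (void)mask; record('C'); }
static void glDiscardFramebufferEXT(GLenum target, GLsizei n,
                                    const GLenum *attachments)
{ (void)target; (void)n; (void)attachments; record('X'); }

/* Hypothetical stand-ins for the application's draw calls. */
static void draw_shadow_casters(void) { record('D'); }
static void draw_scene(void)          { record('D'); }

/* Make sure nothing restricts the clear, then clear everything. */
static void clear_whole_target(void)
{
    glDisable(GL_SCISSOR_TEST);
    glDepthMask(GL_TRUE);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
}

static void shadow_pass(GLuint shadow_fbo)
{
    static const GLenum unused[] = { GL_COLOR_ATTACHMENT0 };
    glBindFramebuffer(GL_FRAMEBUFFER, shadow_fbo);
    clear_whole_target();            /* clear right after the bind */
    draw_shadow_casters();
    /* Depth is sampled later, so only the color attachment is discarded. */
    glDiscardFramebufferEXT(GL_FRAMEBUFFER, 1, unused);
}

static void scene_pass(void)
{
    static const GLenum unused[] = { GL_DEPTH_EXT };
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    clear_whole_target();
    draw_scene();
    /* Color is presented, so only depth is discarded. */
    glDiscardFramebufferEXT(GL_FRAMEBUFFER, 1, unused);
}
```

Calling shadow_pass followed by scene_pass once per frame gives the bind, clear, draw, discard order for each render target. In a real program the stubs and stand-in definitions would be dropped in favour of the GLES headers.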



Renderpass 4 depending on a render pass from previous frame because of the depth texture is suspicious. I'm clearing the depth buffer bit in renderpass 2 which should tell the driver that the depth from previous frame isn't needed. Right?

Ideally, yes. Make sure your scissor test is off and DepthMask is set to allow the entire depth texture to be cleared.


EDIT: Furthermore, renderpass 4 NOT depending on renderpass 2 because of texture 13 is also weird. It's cleared in that pass just like texture 12. Supports the idea that the driver doesn't recognize this dependency.

What are renderpasses 1,2,3 in your frame? Only render passes 4 and 5 are listed above.

tikotus
06-02-2017, 10:43 AM
What are renderpasses 1,2,3 in your frame? Only render passes 4 and 5 are listed above.
2 was explained, it clears the shadow texture. 1 and 3 are clearing unused FBOs, should have removed them.

Some progress. I managed to simplify the renderpasses by clearing the FBO right after attaching it and before rendering. I used to clear all FBOs at the beginning of the frame and then start rendering. While logically correct, this might have confused the driver, and it was not optimal.

Still some weird things going on (in addition to depth buffer still not working). Sometimes I get a neat one dependency, the depth renderpass which is used to create the shadows. Awesome. But then randomly the dependencies look very weird (might be fps related, haven't really pinned down what changes it. EDIT: with a simpler scene and lower fps I get the weird outcome all the time)

Frame 9

Renderpass 0(Framebuffer 2) Depends on Renderpass 0(Framebuffer 2) in frame 8 due to Texture 12. (Dependency to previous frame! Texture 12 is the color texture of the shadow FBO)
Renderpass 1(Framebuffer 0) Depends on Renderpass 0(Framebuffer 2) in frame 9 due to Texture 13. (This is correct according to my understanding)

What's up with the first dependency? I know I don't need the color buffer so I should discard it, but I'm having problems including the discarding extension (not part of standard gles2) and also it shouldn't be the solution to this weirdness.

Here are the opengl calls during frame 8 and 9 with some comments: https://gist.github.com/tikotus/c40fdbf8086150aa50d03cd1b1837033
"RENDER SHADOWMAP" under "FRAME 9" is the weird one. FBO 2 is cleared right after attaching but still it depends on Texture 12 which is bound to FBO 2. And why on earth the color texture? Blend isn't enabled.

EDIT: I now discard the shadowmap's color attachment. It didn't affect this dependency issue.

EDIT: EXCITING! While I'm still seeing that weird dependency thing, the depth buffer actually seems to work. I've been trying different setups and haven't seen a single glitch yet. Not sure which step did it. Probably many of these great suggestions. I'll try to work backwards and see what (hopefully) fixed it.

tikotus
06-02-2017, 12:12 PM
Alright! Thank you very much! I believe this is solved. I'm not fully sure which step actually fixed it. Probably a combination of many. The biggest culprit was probably me clearing all the buffers first, then rendering to them. It was just too complicated for the driver I guess.

Also the fps is now a lot higher!

I still have that weird dependency thing I mentioned in my previous post. Would be nice to see how high the fps gets without it. Too bad gles2 doesn't allow FBO without color attachment (right?)

Thanks again. Let's hope I'm not just lucky.

EDIT: Now I actually have a setup where the shadows are buggy when I don't discard the color buffer. Interesting.

Dark Photon
06-02-2017, 07:03 PM
2 was explained, it clears the shadow texture. 1 and 3 are clearing unused FBOs, should have removed them.

Ooops. Sorry I missed it. I was reading too fast.


Renderpass 4 depending on a render pass from previous frame because of the depth texture is suspicious. I'm clearing the depth buffer bit in renderpass 2 which should tell the driver that the depth from previous frame isn't needed. Right?

Ooh. Yeah, that's bad. Here's what happens:


RenderPass 2:
1) Bind FBO
2) Clear // Tells driver NOT to read in the old contents of the COLOR and DEPTH buffers

RenderPass 3:
1) Bind another framebuffer


Because you didn't invalidate any buffers in your FBO from Pass 2, on this framebuffer switch, the driver has to write out two full-screen buffers, one for your COLOR buffer and one for your DEPTH buffer. Then later...


RenderPass 4:
1) Bind FBO
2) Render into your shadow map.


When the driver sees this FBO bind "without" a Clear at the beginning, it has to go to DRAM and fetch in those two full-screen buffers it wrote back at the end of Pass 2 when you switched away from the FBO without calling gl{Invalidate,Discard}Framebuffer.


Some progress. I managed to simplify the renderpasses by clearing the FBO right after attaching it and before rendering. I used to clear all FBOs at the beginning of the frame and then start rendering. While logically correct, this might have confused the driver, and not optimal.

Exactly! This is the right solution. It gets rid of these needless full-screen writes/reads of multiple framebuffer buffers to/from DRAM. In the end, you want none of them read, and none of them to write "except" for your depth buffer (write to your shadow map texture, that is) in the shadow gen renderpass.


Still some weird things going on (in addition to depth buffer still not working). ... randomly the dependencies look very weird (...EDIT: with a simpler scene and lower fps I get the weird outcome all the time)

Renderpass 0(Framebuffer 2) Depends on Renderpass 0(Framebuffer 2) in frame 8 due to Texture 12. (Dependency to previous frame! Texture 12 is the color texture of the shadow FBO)
...

...
What's up with the first dependency?

Hmm... So the shadow gen renderpass thinks the FBO depends on itself, presumably because the COLOR texture is both an INPUT and an OUTPUT of this renderpass.

Well, that it is an OUTPUT of the renderpass is intended. But why is it that it thinks it's an INPUT? Do you have the COLOR texture bound to a texture unit when performing your shadow gen renderpass? I'd make sure it isn't bound to a texture unit here. In fact, as a test, I'd add in some code to bind 0 (i.e. No texture) to all texture units before you perform this renderpass.


Here are the opengl calls during frame 8 and 9 with some comments: https://gist.github.com/tikotus/c40fdbf8086150aa50d03cd1b1837033
"RENDER SHADOWMAP" under "FRAME 9" is the weird one. FBO 2 is cleared right after attaching but still it depends on Texture 12 which is bound to FBO 2. And why on earth the color texture?

I'll take a look here shortly.


EDIT: EXCITING! While I'm still seeing that weird dependency thing, the depth buffer actually seems to work. I've been trying different setups and haven't seen a single glitch yet. Not sure which step did it. Probably many of these great suggestion. I'll try to work backwards and see what (hopefully) fixed it

Cool -- progress!

Dark Photon
06-02-2017, 07:11 PM
Alright! Thank you very much! I believe this is solved. I'm not fully sure which step actually fixed it. Probably a combination of many. The biggest culprit was probably me clearing all the buffers first, then rendering to them. It was just too complicated for the driver I guess.

Also the fps is now a lot higher!

Yeah, getting rid of those 2 full-framebuffer writes of your shadow gen COLOR and DEPTH, and the 2 full-framebuffer read-ins of your shadow gen COLOR and DEPTH got rid of a "ton" of DRAM bandwidth.

And it sounds like you had the same thing going on with your system framebuffer too, which was probably 2 more full-framebuffer writes and 2 full-framebuffer reads.

And by adding glInvalidateFramebuffer for your COLOR framebuffer in the shadow gen pass and for the DEPTH buffer in your scene render pass, that'll get rid of 2 more full-framebuffer writes.

Total savings = 10 full-framebuffer reads or writes to DRAM! Without these savings, it makes sense that it'd be painfully slow!
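The tally above can be reproduced with a small counting model in C. This is only a sketch under stated assumptions: both the shadow FBO and the system framebuffer have two buffers (COLOR and DEPTH), a pass that does not start with a clear loads every buffer from DRAM, and every buffer that is not discarded is stored when the pass ends. The Pass struct and function names are hypothetical, and real drivers may behave differently.

```c
#include <stddef.h>

/* One render pass in the simplified model. */
typedef struct {
    int attachments;       /* buffers attached to the framebuffer        */
    int cleared_at_start;  /* 1 if all buffers are cleared after bind    */
    int discarded_at_end;  /* buffers invalidated/discarded after drawing */
} Pass;

/* Counts full-buffer DRAM loads plus stores over a list of passes. */
static int count_transfers(const Pass *passes, size_t n)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        int loads  = passes[i].cleared_at_start ? 0 : passes[i].attachments;
        int stores = passes[i].attachments - passes[i].discarded_at_end;
        total += loads + stores;
    }
    return total;
}

/* Old order: clear every framebuffer first, then render into each. */
static int old_frame_transfers(void)
{
    static const Pass passes[] = {
        { 2, 1, 0 },  /* shadow FBO: clear only             */
        { 2, 1, 0 },  /* system framebuffer: clear only     */
        { 2, 0, 0 },  /* shadow FBO: render, no clear       */
        { 2, 0, 0 },  /* system framebuffer: render         */
    };
    return count_transfers(passes, sizeof passes / sizeof passes[0]);
}

/* New order: clear right after bind, discard what is not needed. */
static int new_frame_transfers(void)
{
    static const Pass passes[] = {
        { 2, 1, 1 },  /* shadow pass: keep DEPTH, discard COLOR */
        { 2, 1, 1 },  /* scene pass: keep COLOR, discard DEPTH  */
    };
    return count_transfers(passes, sizeof passes / sizeof passes[0]);
}
```

Under these assumptions the old order costs 12 full-buffer transfers per frame and the new order costs 2, which matches the saving of 10 counted above.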

That's one of the most tricky things about mobile GPU programming: knowing how the driver is actually going to process the work you give it, and using that knowledge to keep it from doing something that's going to royally suck performance-wise. Desktop GPUs are much more forgiving of inefficiencies like this.


I still have that weird dependency thing I mentioned in my previous post.

I'll look into this here shortly.


Would be nice to see how high the fps gets without it. Too bad gles2 doesn't allow FBO without color attachment (right?)

It doesn't? Why do you think that?

I just re-read the "Framebuffer Completeness" section in the OpenGL ES 2.0 Spec (https://www.khronos.org/registry/OpenGL/specs/es/2.0/es_full_spec_2.0.pdf), and I didn't see anything about having to have a COLOR attachment. That said, I might have missed something.

On desktop GL (ver 3.0-4.1), you need this for a depth-only FBO (call this during setup after binding the FBO):


glDrawBuffer(GL_NONE);
glReadBuffer(GL_NONE);


However, supposedly this isn't needed for GLES. And I see mixed reports on whether depth-only FBO rendering works on GLES2 devices without a color attachment (see the following links).

Related: See this stackoverflow post: LINK (https://stackoverflow.com/questions/28313782/porting-opengl-es-framebuffer-to-opengl), and this OpenGL.org Forum Post: LINK (https://www.opengl.org/discussion_boards/showthread.php/168740-Shadowmapping-on-iphone-3GS).


EDIT: Now I actually have a setup where the shadows are buggy when I don't discard the color buffer. Interesting.

Huh. That's weird. I'm actually really curious how you create and set up your color and depth buffer attachments and FBO for the shadow gen pass. Could be the root cause lies there. Could you post this?

Dark Photon
06-02-2017, 08:11 PM
Here are the opengl calls during frame 8 and 9 with some comments: https://gist.github.com/tikotus/c40fdbf8086150aa50d03cd1b1837033
"RENDER SHADOWMAP" under "FRAME 9" is the weird one. FBO 2 is cleared right after attaching but still it depends on Texture 12 which is bound to FBO 2. And why on earth the color texture?

Let's see the setup code for the shadow map attachments and FBO. I suspect you may have the color texture still bound somewhere. I'm also curious about this w.r.t. not discarding color triggering depth corruption.

Here are some questions I had and things I noticed when scanning your frame 8 and 9 GL call trace:



1. You're clearing STENCIL for both the shadow gen pass and the scene render pass. Do you have a stencil buffer in either?
2. The RENDER SHADOWMAP pass is missing a glDiscardFramebufferEXT for the COLOR (and possibly STENCIL) buffers.
3. The SCENE render pass is missing a glDiscardFramebufferEXT for the DEPTH (and possibly STENCIL) buffers.
4. There's a redundant bind of the system framebuffer (framebuffer=0) at the end of each frame before you call eglSwapBuffers. Ideally this should be a no-op. But depending on the driver it could be a very expensive no-op. I'd remove this.
5. Why do you have eglWaitNative() and eglWaitGL() at the end of your frames? The latter is effectively a glFinish(), and glFinish() is very bad for performance on mobile (forcing a full pipeline flush on more compliant drivers, which implies more needless full-framebuffer write-outs and read-ins, and "big" pipeline stalls preventing GPU parallelism).
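
For the two missing discards, something along these lines is what I have in mind. This is only a sketch: it assumes the driver advertises GL_EXT_discard_framebuffer, that the shadow pass renders into an FBO and the scene pass into the system framebuffer, and the function names end_shadow_pass and end_scene_pass are made up. On some platforms the entry point has to be fetched with eglGetProcAddress instead of being declared via GL_GLEXT_PROTOTYPES.

```c
#define GL_GLEXT_PROTOTYPES 1   /* expose extension prototypes in gl2ext.h */
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

/* Call after the shadow map draw calls, while the shadow FBO is still bound.
 * Only DEPTH is sampled later, so COLOR need not be written back to DRAM.
 * User-created FBOs take the *_ATTACHMENT names. */
static void end_shadow_pass(void)
{
    const GLenum attachments[] = { GL_COLOR_ATTACHMENT0 };
    glDiscardFramebufferEXT(GL_FRAMEBUFFER, 1, attachments);
}

/* Call after the scene draw calls, while framebuffer 0 is still bound and
 * before eglSwapBuffers. The system framebuffer takes the *_EXT names.
 * Only COLOR is displayed, so DEPTH can be thrown away. */
static void end_scene_pass(void)
{
    const GLenum attachments[] = { GL_DEPTH_EXT };
    glDiscardFramebufferEXT(GL_FRAMEBUFFER, 1, attachments);
}
```

If either framebuffer had a stencil buffer, GL_STENCIL_ATTACHMENT (or GL_STENCIL_EXT for the system framebuffer) would be added to the corresponding list.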

tikotus
06-03-2017, 02:57 AM
Do you have the COLOR texture bound to a texture unit when performing your shadow gen renderpass?
I now verified with MGD that no texture unit is bound when rendering the shadowmap.


It doesn't? Why do you think that?
glDrawBuffer doesn't exist for gles2. Not having a color attachment gives GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT with glCheckFramebufferStatus. Same problem exists on WebGL (http://blog.tojicode.com/2012/07/using-webgldepthtexture.html). Seems like I have to use color masks to disable writing to it and then discard the buffer.


1. You're clearing STENCIL for both the shadow gen pass and the scene render pass. Do you have a stencil buffer in either?
That was just a desperate test. I don't have stencil buffers and stencil tests are disabled.



2. The RENDER SHADOWMAP pass is missing a glDiscardFramebufferEXT for the COLOR (and possibly STENCIL) buffers.
3. The SCENE render pass is missing a glDiscardFramebufferEXT for the DEPTH (and possibly STENCIL) buffers.

I hadn't implemented it in that version yet. Anyway, even after implementing it, I can't see it in the log. I think MGD fails to hook into it because it's an extension.


4. There's a redundant bind of the system framebuffer (framebuffer=0) at the end of each frame before you call eglSwapBuffers
This is for osx because of this (https://wiki.libsdl.org/SDL_GL_SwapWindow) and should be removed on Android, thanks for pointing it out.


5. Why do you have eglWaitNative() and eglWaitGL() at the end of your frames
I don't know. I'm only calling SDL_GL_SwapWindow. It seems to add them.

I can still confirm that discarding the shadowmap FBO's color attachment after rendering the depth fixes the issue I'm experiencing, but not the dependency. The dependency still shows up only some of the time, seemingly at random. Also, a very simple case now works without the discard, so it's not the discard alone that fixes it. Feels like a driver bug, and with these simplifications the driver seems to cope better?

Here I create the shadowmap framebuffer: https://gist.github.com/tikotus/4eb06ad1ef839b57abb54bc9f6d74c14
In case you're wondering about the GL_UNSIGNED_SHORT_5_6_5, I'm just minimizing the memory footprint caused by this necessary but redundant color texture. I tried GL_ALPHA, but that breaks depth rendering; I haven't looked into why.
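
For readers who can't open the gist, here is a rough sketch of the kind of setup described in this thread. It is not the gist's actual code: the struct, the function names, and the choice of filter and wrap modes are assumptions. It assumes OES_depth_texture is available and uses an RGB565 color texture purely to make the framebuffer complete.

```c
#include <stddef.h>
#include <GLES2/gl2.h>

/* Hypothetical container for the objects the shadow pass needs. */
typedef struct { GLuint fbo, depth_tex, color_tex; } ShadowMap;

/* Allocates storage for the currently bound 2D texture with settings
 * that are valid for any size in GLES2 (no mipmaps, clamped). */
static void alloc_tex(GLenum format, GLenum type, GLsizei size)
{
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexImage2D(GL_TEXTURE_2D, 0, (GLint)format, size, size, 0,
                 format, type, NULL);
}

/* Returns 1 on success, 0 if the framebuffer is not complete. */
static int create_shadow_map(GLsizei size, ShadowMap *sm)
{
    glGenTextures(1, &sm->depth_tex);
    glBindTexture(GL_TEXTURE_2D, sm->depth_tex);
    alloc_tex(GL_DEPTH_COMPONENT, GL_UNSIGNED_SHORT, size); /* 16-bit depth */

    glGenTextures(1, &sm->color_tex);
    glBindTexture(GL_TEXTURE_2D, sm->color_tex);
    alloc_tex(GL_RGB, GL_UNSIGNED_SHORT_5_6_5, size);       /* 16-bit color */
    glBindTexture(GL_TEXTURE_2D, 0);

    glGenFramebuffers(1, &sm->fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, sm->fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                           GL_TEXTURE_2D, sm->depth_tex, 0);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, sm->color_tex, 0);

    int ok = glCheckFramebufferStatus(GL_FRAMEBUFFER) == GL_FRAMEBUFFER_COMPLETE;
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    return ok;
}

/* Start of the shadow pass: clear everything with the masks enabled,
 * then turn color writes off for the depth-only draw calls. */
static void begin_shadow_pass(const ShadowMap *sm, GLsizei size)
{
    glBindFramebuffer(GL_FRAMEBUFFER, sm->fbo);
    glViewport(0, 0, size, size);
    glDisable(GL_SCISSOR_TEST);            /* glClear honors the scissor */
    glDepthMask(GL_TRUE);                  /* glClear honors the masks too */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glClearDepthf(1.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
}
```

The color mask has to be switched back on before the scene pass, since it is global state and not stored per framebuffer.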

Dark Photon
06-03-2017, 07:07 AM
glDrawBuffer doesn't exist for gles2. Not having a color attachment gives GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT with glCheckFramebufferStatus. Same problem exists on WebGL (http://blog.tojicode.com/2012/07/using-webgldepthtexture.html). Seems like I have to use color masks to disable writing to it and then discard the buffer.

That's interesting. It was worth a shot anyway.


Here I create the shadowmap framebuffer: https://gist.github.com/tikotus/4eb06ad1ef839b57abb54bc9f6d74c14

Looks good to me. This leaves the depth texture bound, but you said you've verified no textures are bound to texunits when rendering to the shadow map. So that's not it.

I'm not sure what's causing that dependency. You might see if there is a Mali graphics developer forum you could post the question to. A quick web search turns up this:

* https://community.arm.com/graphics/

tikotus
06-03-2017, 08:04 AM
I'm not sure what's causing that dependency. You might see if there is a Mali graphics developer forum you could post the question to.

I will continue my quest there if it becomes a bigger problem. Now that my shadows seem to be working I can continue my project. Thanks a lot for the great help! Learned a bunch about OpenGL and mobile GPUs.

Dark Photon
06-04-2017, 07:18 AM
No problem! Glad you got it working.