Memory-eating FBO bug on Intel graphics cards?

If I repeatedly call the code below when using an Intel graphics card, virtual memory usage increases without bound. That doesn’t seem to happen when using an NVIDIA card. But I suppose it’s still possible that there’s a bug in my code, so please let me know if I’m doing anything fishy.

Here I use multisample renderbuffers. If I use 0 for the sample counts instead of 4, then it still eats VM, but more slowly. (The difference is a factor of 4, as you would guess.)
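
(Back-of-the-envelope: at 1244 × 700, one RGB color buffer plus one 24/8 depth/stencil buffer comes to roughly 7 MB at one sample per pixel and roughly 28 MB at four samples, which matches the 4× difference in growth rate.)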


    GLuint frameBufferID;
    GLuint colorRenderBufferID, depthRenderBufferID;
    
    // Create a framebuffer object; bind it for both drawing and
    // reading so that glReadBuffer/glDrawBuffer apply to it
    glGenFramebuffers( 1, &frameBufferID );
    glBindFramebuffer( GL_FRAMEBUFFER, frameBufferID );
    glReadBuffer( GL_COLOR_ATTACHMENT0 );
    glDrawBuffer( GL_COLOR_ATTACHMENT0 );
    
    // Create multisample color renderbuffer
    glGenRenderbuffers( 1, &colorRenderBufferID );
    glBindRenderbuffer( GL_RENDERBUFFER, colorRenderBufferID );
    // Use a sized internal format; unsized GL_RGB is not color-renderable in core GL
    glRenderbufferStorageMultisample( GL_RENDERBUFFER, 4, GL_RGB8, 1244, 700 );
    glFramebufferRenderbuffer( GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
        GL_RENDERBUFFER, colorRenderBufferID );
    
    // Create multisample depth/stencil renderbuffer
    glGenRenderbuffers( 1, &depthRenderBufferID );
    glBindRenderbuffer( GL_RENDERBUFFER, depthRenderBufferID );
    glRenderbufferStorageMultisample( GL_RENDERBUFFER, 4, GL_DEPTH24_STENCIL8,
        1244, 700 );
    glFramebufferRenderbuffer( GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
        GL_RENDERBUFFER, depthRenderBufferID );
    glFramebufferRenderbuffer( GL_DRAW_FRAMEBUFFER, GL_STENCIL_ATTACHMENT,
        GL_RENDERBUFFER, depthRenderBufferID );
    
    // Check FBO status
    GLenum drawStat = glCheckFramebufferStatus( GL_DRAW_FRAMEBUFFER );
    if ( drawStat != GL_FRAMEBUFFER_COMPLETE )
    {
        NSLog( @"FBO status: %X", drawStat );
    }
    else
    {
        NSLog( @"FBO is OK." );
        glClear( GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT );
    }
    
    // Clean up
    glBindRenderbuffer( GL_RENDERBUFFER, 0 );
    glBindFramebuffer( GL_FRAMEBUFFER, 0 );
    
    // Destroy all that was built!
    glDeleteRenderbuffers( 1, &colorRenderBufferID );
    glDeleteRenderbuffers( 1, &depthRenderBufferID );
    glDeleteFramebuffers( 1, &frameBufferID );

Originally I said the problem only occurred when using multisampling. I corrected my post to say that multisampling just eats VM faster.

By the way, I have seen this on Mac laptops using Iris Pro and HD 4000 graphics.

As to your finding and your question…

In practice, you wouldn’t (and shouldn’t) be creating and deleting framebuffer objects (FBOs) and their attachments on the fly during rendering, much less doing so many times over! So what exactly are you trying to prove here?

As to why you see this behavior from the Intel driver, only an Intel driver engineer is going to be able to tell you with absolute certainty what’s really going on down there. However, based on what little I do know, here’s a theory as to what might be going on, and the background behind it:

Inside the GL driver, framebuffers (e.g. FBOs and the system framebuffer) are heavyweight objects to which GL draw calls are attached. They’re especially heavyweight and long-lived in mobile and integrated GPU drivers because of the difference in GPU architecture (a unified-memory design lets them place framebuffers and textures in slower system RAM rather than requiring faster dedicated video RAM). Intel’s driver should fall into this category.

Take the use case where you create an FBO, issue draw work against it, and then delete it, and repeat this a number of times. Now consider that the driver pipelines the work all the way to the display. From that you can see that the driver cannot delete the FBO the moment you ask. It has to wait until all of the work issued for that FBO (which is essentially attached to it) has fully completed, and only then can it come along and lazily process the deletion. In other words, when you tell the driver to delete the FBO, it isn’t really deleted; the driver just queues a request to delete it later, and from then on the GL client library “pretends” it’s deleted when talking to you.

On a mobile GPU, where fragment work may be delayed by a full frame or two, that queued FBO deletion may not be processed until 2-3 frames later. In the meantime, if you queue up a bunch of FBO creates and deletes, it may be that none of the deletes has actually been processed yet, because the work attached to those FBOs hasn’t drained from the tail end of the GPU pipeline.
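
If you want to observe that drain point yourself, a sync object will tell you when everything issued so far has actually completed on the GPU. Here’s a minimal sketch (not your code; it assumes a GL 3.2+ context with sync objects available):

    // Insert a fence after the last command that touches the FBO
    GLsync fence = glFenceSync( GL_SYNC_GPU_COMMANDS_COMPLETE, 0 );
    
    // Block (with a 1-second timeout, given in nanoseconds) until all
    // commands issued before the fence have completed on the GPU
    GLenum waitStat = glClientWaitSync( fence, GL_SYNC_FLUSH_COMMANDS_BIT,
        1000000000 );
    glDeleteSync( fence );
    
    if ( waitStat == GL_ALREADY_SIGNALED || waitStat == GL_CONDITION_SATISFIED )
    {
        // Only now has the pipeline drained; a deletion requested at
        // this point is one the driver could process immediately
        glDeleteFramebuffers( 1, &frameBufferID );
    }

Even then, the driver remains free to defer the actual reclamation, but at least you know the GPU is no longer holding the object busy.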

But you’re thinking, “Wait! I didn’t actually queue any work on those FBOs! So they should be deleted immediately, right?” Not necessarily. As part of FBO creation and setup, the driver could be (and probably is) queuing work on the FBO internally. So when you say “delete this FBO,” it doesn’t know for sure that there’s no queued work that could affect the rendered result, unless it contains a special case to test for exactly that. And this usage pattern is inefficient and wasteful enough that I can’t see why they would bother adding a special case for it.

All that said, I’m not an Intel GL driver engineer, so I don’t really know what the driver is doing. But this is my educated (if blindfolded) guess as to what might be going on.

If I understand you, you’re saying that there may be a delay between requesting that an FBO be deleted and actually reclaiming the memory. I can buy that, but in this case the memory NEVER comes back. (OK, I haven’t actually waited until the end of time, but maybe an hour.)

As to what I’m doing: at times I render images at a resolution that varies depending on user choices, so I create an FBO of the desired size, render, and delete it. Yes, with some code restructuring I could do some caching and less creation/deletion.

I tried keeping the framebuffer object and renderbuffer objects around, and just changing the storage allocated to the renderbuffers, but the memory problem persists.
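
In sketch form, that variation looked roughly like this (re-using the renderbuffer IDs from the code above; newWidth and newHeight stand in for the user-chosen size):

    // Objects are created once, up front; only the storage is
    // re-specified when the requested size changes
    glBindRenderbuffer( GL_RENDERBUFFER, colorRenderBufferID );
    glRenderbufferStorageMultisample( GL_RENDERBUFFER, 4, GL_RGB8,
        newWidth, newHeight );
    
    glBindRenderbuffer( GL_RENDERBUFFER, depthRenderBufferID );
    glRenderbufferStorageMultisample( GL_RENDERBUFFER, 4, GL_DEPTH24_STENCIL8,
        newWidth, newHeight );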

I’ve submitted this as a bug report to Apple. I’m not sure whether the Mac drivers are written by Intel, Apple, or some collaboration.

[QUOTE=James W. Walker;1286557]If I repeatedly call the code below when using an Intel graphics card, virtual memory usage increases without bound. …

… in this case the memory NEVER comes back. (OK, I haven’t actually waited until the end of time, but maybe an hour.)[/QUOTE]

OK, and I’m guessing you’ve already called swapbuffers plenty of times after those deletions during that hour, yet allocating additional FBOs still grows VM usage.

I can see VM growing and then never shrinking (just because of how pages are committed to processes, even if you free the underlying heap memory). However, I can’t see growth without bound unless there’s a leak, either in your code or in the driver.

… I tried keeping the framebuffer object and renderbuffer objects around, and just changing the storage allocated to the renderbuffers, but the memory problem persists.

That’s interesting. Have you tried not reallocating the storage and just re-using the last-attached storage?

Typically what you’d do is create a small pool of FBOs (enough to provide decent parallelism in the driver), create and attach your render targets, and then just re-use the FBOs and render targets, so that no dynamic creation of FBOs or render targets occurs at steady state (as this is potentially a very slow operation). This strategy may make the issue you’re chasing less pressing, and potentially sidestep it entirely. A minimal sketch follows.
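
Here’s what that might look like (a sketch only; kPoolSize, the Target struct, and AcquireTarget are hypothetical names, and it assumes the same fixed 4-sample formats used earlier in this thread):

    #define kPoolSize 3
    
    typedef struct
    {
        GLuint  fbo;
        GLuint  colorRB;
        GLuint  depthRB;
        GLsizei width, height;
    } Target;
    
    static Target gPool[ kPoolSize ];  // zero-initialized: fbo == 0 means "not yet created"
    static int    gNext = 0;
    
    // Hand out pool entries round-robin, creating the GL objects on
    // first use and reallocating storage only when the size changes
    Target* AcquireTarget( GLsizei w, GLsizei h )
    {
        Target* t = &gPool[ gNext ];
        gNext = (gNext + 1) % kPoolSize;
        
        if ( t->fbo == 0 )
        {
            glGenFramebuffers( 1, &t->fbo );
            glGenRenderbuffers( 1, &t->colorRB );
            glGenRenderbuffers( 1, &t->depthRB );
        }
        
        if ( t->width != w || t->height != h )
        {
            glBindRenderbuffer( GL_RENDERBUFFER, t->colorRB );
            glRenderbufferStorageMultisample( GL_RENDERBUFFER, 4, GL_RGB8, w, h );
            glBindRenderbuffer( GL_RENDERBUFFER, t->depthRB );
            glRenderbufferStorageMultisample( GL_RENDERBUFFER, 4,
                GL_DEPTH24_STENCIL8, w, h );
            
            glBindFramebuffer( GL_DRAW_FRAMEBUFFER, t->fbo );
            glFramebufferRenderbuffer( GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                GL_RENDERBUFFER, t->colorRB );
            glFramebufferRenderbuffer( GL_DRAW_FRAMEBUFFER,
                GL_DEPTH_STENCIL_ATTACHMENT, GL_RENDERBUFFER, t->depthRB );
            
            t->width  = w;
            t->height = h;
        }
        return t;
    }

With two or three entries in the pool, a target that’s still in flight on the GPU won’t be touched again until a frame or two later, which is usually enough to avoid stalls.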

The bug is fixed in the upcoming macOS 10.13 (High Sierra).