Part of the Khronos Group
OpenGL.org

Thread: avoiding the default framebuffer blit overhead

  1. #1
    Junior Member Regular Contributor
    Join Date: Apr 2004
    Posts: 228

    avoiding the default framebuffer blit overhead

    Hi,

    First, I will describe the problem.
    As we know, the default framebuffer (0) is a remnant from the past that, for some mysterious reason, OpenGL is still dragging along like a bag of stones.
    It is very inflexible and totally alien to many modern-day ways of doing things, e.g. deferred rendering.
    One often needs to freely combine various color/depth/stencil buffers, which is easy with the FBO infrastructure.
    But when we need to display something, there is a problem. The final image to be displayed is often not generated in the default framebuffer,
    because we need the flexibility of FBOs. For example, we may need the depth buffer used to render the scene to be available as a texture or similar.

    Then we need to blit to the default framebuffer. This adds overhead, which may be something like 1-2 milliseconds per frame.
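    For scale, assuming a 60 Hz refresh target (an assumption; the post does not name one), an overhead of 1-2 ms is a sizeable slice of the frame budget:

```latex
t_{\text{frame}} = \frac{1000\,\text{ms}}{60} \approx 16.7\,\text{ms},\qquad
\frac{1\,\text{ms}}{t_{\text{frame}}} = \frac{60}{1000} = 6\%,\qquad
\frac{2\,\text{ms}}{t_{\text{frame}}} = \frac{120}{1000} = 12\%
```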

    In Direct3D, the displayable color buffer (the swap chain's back buffer) is, from the renderer's point of view, a pure color-only object that can be combined with other buffers just like the non-displayable ones.
    This is unlike the OpenGL default framebuffer, which drags along its own depth buffer (or has none) and cannot be changed.

    I experimented a bit with the NVIDIA WGL_NV_DX_interop2 extension.
    I created a D3D11 device with its swap chain, then, using the extension, set up an OpenGL renderbuffer that corresponds to the swap chain's back buffer.
    Then I did some rendering with OpenGL while using D3D's way of presenting the image to a window.
    After some tweaking, I managed to get this to run faster than OpenGL's own way, which uses a blit.

    All the rendering was just a glClear(GL_COLOR_BUFFER_BIT) followed by presenting the result.

    I tested 3 cases:
    a) OpenGL clear + OpenGL present (using a blit to the default framebuffer)
    b) OpenGL clear + D3D present
    c) D3D clear + D3D present

    b) and c) are about equally fast, and a) is noticeably slower than both.

    The tweaking mentioned above included removing the synchronization calls (wglDXLockObjectsNV and wglDXUnlockObjectsNV).
    I call wglDXLockObjectsNV only once, and the object stays locked all the time (without locking it, OpenGL generates GL_INVALID_FRAMEBUFFER_OPERATION).
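    For contrast, here is a minimal sketch of the per-frame order that, as far as I read the WGL_NV_DX_interop specification, the extension intends: lock the registered object before GL draws into it, and unlock it before D3D touches it again (Present). The commented-out lock/unlock lines in the test source follow the same order; the post above deliberately skips them for speed. The names `interop_frame_ops`, `interop_frame` and the `trace_*` stubs are hypothetical; in a real program the callbacks would wrap wglDXLockObjectsNV, the GL drawing, wglDXUnlockObjectsNV and the swap chain's Present. The stubs only record call order, so the sketch runs without a GPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table. In the test program these would wrap:
 *   lock    -> wglDXLockObjectsNV(idev, 1, &ibb)
 *   draw    -> glClearColor(...); glClear(GL_COLOR_BUFFER_BIT);
 *   unlock  -> wglDXUnlockObjectsNV(idev, 1, &ibb)
 *   present -> sc->lpVtbl->Present(sc, 0, 0)                     */
typedef struct {
    bool (*lock)(void *user);
    void (*draw)(void *user);
    bool (*unlock)(void *user);
    void (*present)(void *user);
    void *user;
} interop_frame_ops;

/* One frame in lock -> draw -> unlock -> present order.
 * Returns false and skips Present if locking or unlocking fails
 * (after a failed unlock the object may still be locked). */
static bool interop_frame(const interop_frame_ops *ops)
{
    assert(ops != NULL);
    if (!ops->lock(ops->user))
        return false;            /* GL must not use the shared renderbuffer */
    ops->draw(ops->user);        /* GL owns the back buffer while it is locked */
    if (!ops->unlock(ops->user))
        return false;
    ops->present(ops->user);     /* unlocked: D3D may access it again */
    return true;
}

/* Dry-run stubs: each call appends one letter to a small log string. */
typedef struct {
    char log[8];
    size_t len;
} call_trace;

static void trace_add(void *user, char c)
{
    call_trace *t = user;
    if (t->len + 1 < sizeof t->log) {
        t->log[t->len++] = c;
        t->log[t->len] = '\0';
    }
}

static bool trace_lock(void *u)    { trace_add(u, 'L'); return true; }
static void trace_draw(void *u)    { trace_add(u, 'D'); }
static bool trace_unlock(void *u)  { trace_add(u, 'U'); return true; }
static void trace_present(void *u) { trace_add(u, 'P'); }
```

    A dry run with the tracing stubs returns true and records "LDUP": lock, draw, unlock, present. Whether that per-frame unlock costs measurable time on a given driver is exactly the kind of thing the thread's benchmark would have to show.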

    The render loop is basically:
    glClearColor(0, rand()%256*(1.0f/256), 0, 1);
    glClear(GL_COLOR_BUFFER_BIT);
    glFlush();
    sc->Present(0, 0);
    The swap chain's back buffer is bound to the OpenGL draw framebuffer.

    Also, when the swap chain is created, its BufferUsage must include the DXGI_USAGE_RENDER_TARGET_OUTPUT flag; otherwise performance is crippled.

    It is a shame that this ugly hack actually outperforms OpenGL's native way of outputting its graphics.
    I think it is about time they got rid of the default framebuffer.
    They can look at the iPad for an idea of how to do it.

  2. #2
    Junior Member Regular Contributor
    Join Date: Apr 2004
    Posts: 228
    Here is the test source, if anyone is interested in trying it.
    Change "mode" to select among the 3 cases I mentioned (see the comment).
    Also, "start" is the program entry point (I set that in the linker options); you can rename it to WinMain or whatever you like.

    Code :
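    #include <stdio.h>
    #include <stdlib.h>  /* rand */
    #include <string.h>  /* memset */
    #define INITGUID     /* emit the IID_* GUIDs used below in this file */
    #include <windows.h>
    #include <GL/gl.h>
    #include <d3d11.h>   /* also includes dxgi.h; link with d3d11.lib, dxgi.lib, opengl32.lib */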
     
    static LRESULT CALLBACK wnd_proc(HWND wnd, UINT msg, WPARAM wp, LPARAM lp)
    {
        switch (msg) {
            case WM_PAINT: ValidateRect(wnd, NULL); return 0;
            case WM_CLOSE: ExitProcess(0); return 0;
            default: return DefWindowProcA(wnd, msg, wp, lp);
        }
    }
     
    #define WIDTH 1024
    #define HEIGHT 768
     
    // 0 = gl_clear/gl_present, 1 = gl_clear/d3d_present, 2 = d3d_clear/d3d_present
    int mode = 1;
     
    void start()
    {
        // window
        WNDCLASSA wc;
        RECT rc;
        HWND wnd;
        // d3d
        ID3D11Device *d3ddev;
        ID3D11DeviceContext *d3dctx;
        IDXGISwapChain *sc;
        DXGI_SWAP_CHAIN_DESC scd;
        ID3D11Texture2D *d3dbb;
        ID3D11RenderTargetView *view;
         // opengl
        HDC dc;
        PIXELFORMATDESCRIPTOR pfd;
        int pf;
        HGLRC ctx;
        #define WGL_ACCESS_READ_WRITE_NV          0x0001
        HANDLE (WINAPI *wglDXOpenDeviceNV)(void *dxDevice);
        HANDLE (WINAPI *wglDXRegisterObjectNV)(HANDLE hDevice, void *dxObject, GLuint name, GLenum type, GLenum access);
        BOOL (WINAPI *wglDXLockObjectsNV)(HANDLE hDevice, GLint count, HANDLE *hObjects);
        BOOL (WINAPI *wglDXUnlockObjectsNV)(HANDLE hDevice, GLint count, HANDLE *hObjects);
        HANDLE idev;
        #define GL_READ_FRAMEBUFFER               0x8CA8
        #define GL_DRAW_FRAMEBUFFER               0x8CA9
        #define GL_RENDERBUFFER                   0x8D41
        #define GL_COLOR_ATTACHMENT0              0x8CE0
        void (APIENTRY *glGenFramebuffers) (GLsizei n, GLuint *framebuffers);
        void (APIENTRY *glBindFramebuffer) (GLenum target, GLuint framebuffer);
        void (APIENTRY *glFramebufferRenderbuffer) (GLenum target, GLenum attachment, GLenum renderbuffertarget, GLuint renderbuffer);
        void (APIENTRY *glGenRenderbuffers) (GLsizei n, GLuint *renderbuffers);
        void (APIENTRY *glBindRenderbuffer) (GLenum target, GLuint renderbuffer);
        void (APIENTRY *glRenderbufferStorage) (GLenum target, GLenum internalformat, GLsizei width, GLsizei height);
        GLenum (APIENTRY *glCheckFramebufferStatus) (GLenum target);
        void (APIENTRY *glBlitFramebuffer) (GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);
        GLuint bb, fb;
        HANDLE ibb;
     
    // create window
        memset(&wc, 0, sizeof(wc));
        wc.lpfnWndProc = wnd_proc;
        wc.lpszClassName = "test_wc";
    wc.hCursor = LoadCursor(NULL, IDC_ARROW); // IDC_ARROW is already a MAKEINTRESOURCE value
        RegisterClassA(&wc);
        rc.left = rc.top = 0;
        rc.right = WIDTH;
        rc.bottom = HEIGHT;
        AdjustWindowRect(&rc, WS_CAPTION|WS_SYSMENU, FALSE);
        wnd = CreateWindowExA(0, wc.lpszClassName, "window", WS_CAPTION|WS_SYSMENU, 0, 0, rc.right - rc.left, rc.bottom - rc.top, NULL, NULL, NULL, NULL);
        ShowWindow(wnd, SW_SHOW);
     
        if (mode) {
            IDXGIFactory *factory;
            IDXGIAdapter *adapter;
            IDXGIOutput *output;
            DXGI_OUTPUT_DESC od;
            CreateDXGIFactory(&IID_IDXGIFactory, (void **)&factory);
            factory->lpVtbl->EnumAdapters(factory, 0, &adapter);
            adapter->lpVtbl->EnumOutputs(adapter, 0, &output);
            output->lpVtbl->GetDesc(output, &od);
            output->lpVtbl->Release(output);
     
            // create d3d device
            memset(&scd, 0, sizeof(scd));
            scd.BufferDesc.Width = WIDTH;
            scd.BufferDesc.Height = HEIGHT;
            scd.BufferDesc.RefreshRate.Numerator = 60;
            scd.BufferDesc.RefreshRate.Denominator = 1;
            scd.BufferDesc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
            scd.SampleDesc.Count = 1;
            scd.BufferUsage = DXGI_USAGE_BACK_BUFFER|DXGI_USAGE_RENDER_TARGET_OUTPUT;
            scd.BufferCount = 1;
            scd.OutputWindow = wnd;
            scd.Windowed = TRUE;
            D3D11CreateDeviceAndSwapChain(adapter, D3D_DRIVER_TYPE_UNKNOWN, NULL, D3D11_CREATE_DEVICE_SINGLETHREADED,
                NULL, 0, D3D11_SDK_VERSION, &scd, &sc, &d3ddev, NULL, &d3dctx);
            sc->lpVtbl->GetBuffer(sc, 0, &IID_ID3D11Texture2D, (void **)&d3dbb);
     
            if (mode > 1) {
                D3D11_RENDER_TARGET_VIEW_DESC vd;
                D3D11_VIEWPORT vp;
                vd.Format = DXGI_FORMAT_UNKNOWN;
                vd.ViewDimension = D3D11_RTV_DIMENSION_TEXTURE2D;
                vd.Texture2D.MipSlice = 0;
                d3ddev->lpVtbl->CreateRenderTargetView(d3ddev, d3dbb, &vd, &view);
                d3dctx->lpVtbl->OMSetRenderTargets(d3dctx, 1, &view, NULL);
                vp.TopLeftX = vp.TopLeftY = 0;
                vp.Width = WIDTH;
                vp.Height = HEIGHT;
                vp.MinDepth = 0;
                vp.MaxDepth = 1;
                d3dctx->lpVtbl->RSSetViewports(d3dctx, 1, &vp);
            }
        }
     
        if (mode < 2) {    
            dc = GetDC(wnd);
            memset(&pfd, 0, sizeof(pfd));
            pfd.nSize = sizeof(pfd);
            pfd.nVersion = 1;
            pfd.dwFlags = PFD_DRAW_TO_WINDOW|PFD_SUPPORT_OPENGL|PFD_DEPTH_DONTCARE;
            pf = ChoosePixelFormat(dc, &pfd);
            SetPixelFormat(dc, pf, NULL);
            ctx = wglCreateContext(dc);
            wglMakeCurrent(dc, ctx);
            glGetString(GL_RENDERER); // return value ignored; only useful when inspected in a debugger
            *(PROC *)&glGenRenderbuffers = wglGetProcAddress("glGenRenderbuffers");
            *(PROC *)&glGenFramebuffers = wglGetProcAddress("glGenFramebuffers");
            *(PROC *)&glBindFramebuffer = wglGetProcAddress("glBindFramebuffer");
            *(PROC *)&glFramebufferRenderbuffer = wglGetProcAddress("glFramebufferRenderbuffer");
            *(PROC *)&glCheckFramebufferStatus = wglGetProcAddress("glCheckFramebufferStatus");
            *(PROC *)&glBindRenderbuffer = wglGetProcAddress("glBindRenderbuffer");
            *(PROC *)&glRenderbufferStorage = wglGetProcAddress("glRenderbufferStorage");
            *(PROC *)&glBlitFramebuffer = wglGetProcAddress("glBlitFramebuffer");
            glGenFramebuffers(1, &fb);
            glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
            glBindFramebuffer(GL_READ_FRAMEBUFFER, fb);
            glGenRenderbuffers(1, &bb);
     
            if (mode) {
                *(PROC *)&wglDXOpenDeviceNV = wglGetProcAddress("wglDXOpenDeviceNV");
                *(PROC *)&wglDXRegisterObjectNV = wglGetProcAddress("wglDXRegisterObjectNV");
                *(PROC *)&wglDXLockObjectsNV = wglGetProcAddress("wglDXLockObjectsNV");
                *(PROC *)&wglDXUnlockObjectsNV = wglGetProcAddress("wglDXUnlockObjectsNV");
                idev = wglDXOpenDeviceNV(d3ddev);
                ibb = wglDXRegisterObjectNV(idev, d3dbb, bb, GL_RENDERBUFFER, WGL_ACCESS_READ_WRITE_NV);
                GetLastError(); // return value ignored; wglDXRegisterObjectNV reports failures here (and returns NULL)
                wglDXLockObjectsNV(idev, 1, &ibb);
                glFramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, bb);
                glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER);
                //wglDXUnlockObjectsNV(idev, 1, &ibb);
            } else {
                glBindRenderbuffer(GL_RENDERBUFFER, bb);
                glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, WIDTH, HEIGHT);
                glFramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, bb);
                glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER);
            }
        }
     
        while (1) {
            MSG msg;
            while (PeekMessageA(&msg, NULL, 0, 0, PM_REMOVE))
                DispatchMessageA(&msg);
     
            if (mode > 1) {
                float col[4] = {rand()%256*(1.0f/256),0,0,1};
                d3dctx->lpVtbl->ClearRenderTargetView(d3dctx, view, col);
                sc->lpVtbl->Present(sc, 0, 0);
            } else {
                //if (mode) wglDXLockObjectsNV(idev, 1, &ibb);
                glClearColor(0,rand()%256*(1.0f/256),0,1);
                glClear(GL_COLOR_BUFFER_BIT);
     
                if (mode) {        
                    glFlush();
                    //wglDXUnlockObjectsNV(idev, 1, &ibb);
                    sc->lpVtbl->Present(sc, 0, 0);
                } else {
                    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
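                    // copy the FBO (still bound as the read framebuffer) to the default framebuffer, flipped vertically
                    glBlitFramebuffer(0, 0, WIDTH, HEIGHT, 0, HEIGHT, WIDTH, 0, GL_COLOR_BUFFER_BIT, GL_LINEAR);
                    glGetError(); // return value ignored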
                    glFlush();
                    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
                }
            }
     
            {
            // show fps in the window title bar. don't update it on every frame to avoid crippling the performance
                static DWORD fc, last;
                DWORD now = GetTickCount();
                fc += 1;
                if (!last) last = now;
                else if (now - last > 300) {
                    char txt[64];
                    sprintf(txt, "fps: %.4f", 1000.0f * fc / (float)(now - last));
                    SetWindowTextA(wnd, txt);
                    fc = 0;
                    last = now;
                }
            }
        }
    }

  3. #3
    Senior Member OpenGL Guru
    Join Date: May 2009
    Posts: 4,948
    Code :
    DWORD now = GetTickCount();

    I don't think this makes for a good test, considering that the resolution of this function is poor. Try using QueryPerformanceCounter, which has much higher resolution and is the usual means of doing serious timing on Windows.
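    A minimal sketch of the suggested change. GetTickCount is limited to the system timer resolution, which Microsoft documents as typically 10 to 16 ms, a large error on a 300 ms window at 1000 fps. The rate computation is kept separate and portable; the QueryPerformanceCounter/QueryPerformanceFrequency wrappers compile only on Windows. `fps_from_ticks`, `qpc_now` and `qpc_frequency` are illustrative names, not part of the original test.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Frames per second from a raw counter delta:
 * frames / (elapsed_ticks / ticks_per_second), computed as
 * frames * ticks_per_second / elapsed_ticks. */
static double fps_from_ticks(uint64_t frames, uint64_t elapsed_ticks,
                             uint64_t ticks_per_second)
{
    assert(elapsed_ticks > 0);
    return (double)frames * (double)ticks_per_second / (double)elapsed_ticks;
}

#ifdef _WIN32
/* Windows only: current value of the high-resolution performance counter. */
static uint64_t qpc_now(void)
{
    LARGE_INTEGER t;
    QueryPerformanceCounter(&t);
    return (uint64_t)t.QuadPart;
}

/* Windows only: counter ticks per second (fixed at system boot). */
static uint64_t qpc_frequency(void)
{
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return (uint64_t)f.QuadPart;
}
#endif
```

    In the test's title-bar counter, `fc` and `last` would be widened to uint64_t, GetTickCount() replaced by qpc_now(), the threshold `now - last > 300` replaced by `now - last > qpc_frequency() * 3 / 10`, and the rate computed as fps_from_ticks(fc, now - last, qpc_frequency()). This still measures CPU-side frame rate, not GPU time.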

    Also, you never said what your actual results were, only that one was "noticeably slower". Oh, and I would be curious to see what you would get with query objects, that is, measuring GPU time rather than CPU time.

  4. #4
    Junior Member Regular Contributor
    Join Date: Apr 2004
    Posts: 228
    You have the source; feel free to test with QueryPerformanceCounter, queries, and whatever else you like. The results I got were telling enough for me.

    Both D3D-present cases ran at about 1000 fps on my machine, and the OpenGL-present case ran at somewhere between 500 and 600 fps.
    The GL-clear/D3D-present case was a bit slower than pure D3D, but the difference was marginal.

    To me it is clear that the gl-present case has one additional buffer copy than the d3d-present cases
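    Converting the reported frame rates into frame times shows the size of the gap. A small sketch; `frame_ms` and `overhead_ms` are hypothetical helper names:

```c
#include <assert.h>

/* Frame time in milliseconds for a given frame rate. */
static double frame_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Extra time per frame spent by the slower path compared with the faster one. */
static double overhead_ms(double slow_fps, double fast_fps)
{
    return frame_ms(slow_fps) - frame_ms(fast_fps);
}
```

    With the numbers from this post, overhead_ms(500.0, 1000.0) is 1.0 ms and overhead_ms(600.0, 1000.0) is about 0.67 ms, i.e. roughly 0.7 to 1.0 ms of extra time per frame for the blit path in this clear-only test.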
    Last edited by l_belev; 11-17-2012 at 05:03 PM.

  5. #5
    Senior Member OpenGL Guru
    Join Date: May 2009
    Posts: 4,948
    The biggest question I have is this... what if you're not doing it the way you describe?

    Consider the case of actually rendering something for real. You're doing deferred rendering; OK, fine. You have your g-buffers, where you have your actual data. Then you convert this into light reflectance as seen by the camera. But if you're doing HDR (which, let's be honest, is far more of a no-brainer than deferred rendering by this point), you're doing all of this accumulation into a floating-point buffer. You can't "present" that; you need to tone-map it first. Not only that, you probably have some transparent objects to render, so you need to do some blending. This should presumably be done in HDR space.

    Now it's time to tone-map down to SRGB8_ALPHA8. But where should the output go? Why not... the default framebuffer?

    In short, I'm not seeing the problem here. Your problem seems to be that you don't want to use the default framebuffer (as stated by your passive/aggressive introduction). That's fine, but... it's still there.

    No matter how many threads on this forum you make, no matter how many alternative rendering systems you write, no matter how much you want it to be so, it's still there. It was there in OpenGL 4.1. It was there in OpenGL 4.2. It was there in OpenGL 4.3. Next year, it will still be there in OpenGL 4.4/5.0/etc. Whether you want to use it or not, it is there and available for use. So if you can, use it. And in most real cases, you can. So use it, and you won't have to worry about that copy being slow, since you won't be doing a copy.

    If you spent more time using the API you have, rather than the API you want, you'd be a lot happier.

    Quote:
    To me it is clear that the gl-present case has one additional buffer copy than the d3d-present cases
    You're not honestly showing this code off because you had the revelation that copying is slower than not copying, are you?

  6. #6
    Intern Contributor
    Join Date: Mar 2010
    Location: Winston-Salem, NC
    Posts: 62
    Quote (originally posted by Alfonse Reinheart):
    (as stated by your passive/aggressive introduction).
    It's simply an aggressive introduction, which you don't happen to agree with. Could you please skip the psychoanalysis in the future?
