Performance of FBOs in various situations.



Jose Goruka
05-15-2012, 03:37 PM
Hi! I remember reading some old ATI or nVidia papers which said that, in your application, FBOs should be created as early as possible for maximum performance.
Is this still true today? If I want to create a large FBO for a shadow map after having loaded a few resources (buffer objects, textures, etc.), does it still hold?

To add a bit more depth to the discussion: I am familiar with how ATI and nVidia hardware works, and I know that framebuffers are allocated in a video memory region called "Tiled Memory", which ensures that reading and writing pixels linearly is more cache friendly. I know textures are not stored there because they are instead swizzled on upload, and uploading buffer objects or shaders to tiled memory doesn't make any sense. Is this why papers recommend creating FBOs as early as possible? Is there any other reason? Or is it just not important or necessary nowadays?

Thanks!

aqnuep
05-15-2012, 06:08 PM
There is no such thing as a "large FBO", as FBOs only hold state, not actual resources. The textures and renderbuffers you use as FBO attachments are the resources that can be "large". It is probably still preferable to allocate the textures and renderbuffers that you plan to use as FBO attachments as early as possible, because they are then more likely to fit in video memory rather than in GPU-addressable system memory. However, as long as you don't overrun your video memory budget, that shouldn't be an issue.

Also, there is no such concept as "tiled memory". Textures and renderbuffers do have a tiled internal structure, i.e. the texels are not stored linearly but in a swizzled/tiled layout, however, that has no connection with the actual memory location of the resource itself, so they can be either in video memory or system memory. Buffers and 1D textures, in fact, have a linear layout, obviously, but every other texture and renderbuffer will most likely use a different (tiled) layout.
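To make the difference between a linear and a tiled layout concrete, here is a minimal sketch of how the byte offset of a texel could be computed in each case. The square-tile scheme and the function names are purely illustrative assumptions; real GPU layouts are vendor-specific and generally not documented.

```cpp
#include <cstddef>

// Byte offset of texel (x, y) in a plain row-major (linear) image.
constexpr std::size_t linearOffset(std::size_t x, std::size_t y,
                                   std::size_t width, std::size_t bytesPerTexel) {
    return (y * width + x) * bytesPerTexel;
}

// Byte offset of texel (x, y) in a HYPOTHETICAL tiled layout: the image is
// split into tile x tile blocks stored one after another, and the texels
// inside each block are row-major. Assumes width is a multiple of tile.
constexpr std::size_t tiledOffset(std::size_t x, std::size_t y,
                                  std::size_t width, std::size_t bytesPerTexel,
                                  std::size_t tile) {
    const std::size_t tilesPerRow = width / tile;
    const std::size_t tileIndex = (y / tile) * tilesPerRow + (x / tile);
    const std::size_t inTile = (y % tile) * tile + (x % tile);
    return (tileIndex * tile * tile + inTile) * bytesPerTexel;
}

// In an 8-texel-wide RGBA8 image, vertically adjacent texels are a full row
// apart in the linear layout but stay close together inside a 4x4 tile.
static_assert(linearOffset(0, 1, 8, 4) == 32, "one full row apart");
static_assert(tiledOffset(0, 1, 8, 4, 4) == 16, "one tile row apart");
```

Either function only describes how texels are arranged inside the allocation; neither says anything about whether that allocation lives in video memory or system memory.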

So to sum it up:
- It is not FBO creation that matters but resource creation (i.e. texture and/or renderbuffer creation).
- FBOs hold only state data, not resource data.
- You shouldn't worry about the memory type used for the FBO attachment creation as long as you don't overrun your video memory budget with your buffers and textures.
- There is no such thing as "tiled memory", only a "tiled layout", which is independent of the memory type used.
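Since the points above come down to staying within the video memory budget, a rough back-of-the-envelope estimate of what the attachments cost can help. The sketch below is only an approximation under simple assumptions (no padding, alignment, compression or mipmaps, all of which a real driver may add); the function name is made up for illustration.

```cpp
#include <cstdint>

// Approximate storage for one FBO attachment (texture or renderbuffer).
// samples == 0 is treated like 1, i.e. a non-multisampled attachment.
// This ignores driver padding, alignment, compression and mipmaps.
constexpr std::uint64_t attachmentBytes(std::uint32_t width, std::uint32_t height,
                                        std::uint32_t bytesPerPixel,
                                        std::uint32_t samples) {
    const std::uint64_t s = (samples == 0) ? 1 : samples;
    return std::uint64_t{width} * height * bytesPerPixel * s;
}

// A 2048x2048 shadow map with a 32-bit depth format (sizes assumed here,
// the thread does not state them) is 16 MiB.
static_assert(attachmentBytes(2048, 2048, 4, 1) == 16u * 1024u * 1024u,
              "16 MiB depth attachment");
```

Summing this over all attachments, plus the other textures and buffers, gives a first idea of whether the budget is at risk. The FBO object itself adds nothing meaningful to the total.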

What paper did you read that recommends creating FBOs as early as possible?

I think what confused you is that earlier hardware had a limit on how many depth textures could have Hi-Z support (due to the special on-chip memory used for them). If that's the case, you shouldn't worry about it: modern GPUs have unified handling of resources and do the Hi-Z construction (including compression and decompression) on demand.

mhagain
05-16-2012, 02:10 AM
That's most likely based on advice from the DirectX SDK which recommends the very same for render target textures (and other default pool resources), with the vendors extrapolating to OpenGL too. Yes, the thinking was to ensure that they have a higher chance of being allocated in GPU memory.

Youkakun
08-09-2012, 12:52 PM
Hi there,

I'm trying to use OpenGL for video frame processing (inside a filter for frameserving). For this purpose, I wrote the following class for an offscreen OpenGL context on Windows:

OGLContext.h

#pragma once

#include <GLEW/glew.h>
#include <GLEW/wglew.h>
#include <GL/glu.h>
#include <string>

class OGLContext
{
public:
    OGLContext(unsigned int, unsigned int, GLenum, unsigned char);
    ~OGLContext();
    void Activate(bool);
    void ReadPixels(GLubyte*);
    void DrawPixels(GLubyte*);

private:
    // Windows resources
    std::wstring inst_name; // Unique class name
    HWND hwnd;
    HDC hdc;
    HGLRC ctx;
    // FBO resources
    GLuint tex_color, fbo_transfer, rbo_color, rbo_depth_stencil, fbo_render;
    // Context
    unsigned int width, height;
    GLenum colorspace;
};

OGLContext.cpp

#include "OGLContext.h"
#include "resources.h" // holds DLL module handle 'void *dll_module'
#include <cstring> // memset

//Window callback
static LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam){return DefWindowProc(hwnd, msg, wParam, lParam);}

// Create and activate OpenGL context
OGLContext::OGLContext(unsigned int width, unsigned int height, GLenum colorspace, unsigned char antiAliasing) : width(width), height(height), colorspace(colorspace){
    // Find unique instance name
    WNDCLASSEX wcx;
    this->inst_name = L"YoukaOffscreen00";
    for(unsigned char i = 0; i <= 100; i++){
        if(i == 100)
            throw "Cannot use more than 100 instances!";
        inst_name[14] = 48 + (i/10);
        inst_name[15] = 48 + (i%10);
        if(!GetClassInfoEx(reinterpret_cast<HINSTANCE>(dll_module), this->inst_name.c_str(), &wcx))
            break;
    }
    // Window class
    wcx.cbSize = sizeof(WNDCLASSEX);
    wcx.style = CS_OWNDC;
    wcx.lpfnWndProc = WndProc;
    wcx.cbClsExtra = 0;
    wcx.cbWndExtra = 0;
    wcx.hInstance = reinterpret_cast<HINSTANCE>(dll_module);
    wcx.hIcon = LoadIcon(NULL, IDI_APPLICATION);
    wcx.hCursor = LoadCursor(NULL, IDC_ARROW);
    wcx.hbrBackground = (HBRUSH)GetStockObject(BLACK_BRUSH);
    wcx.lpszMenuName = NULL;
    wcx.lpszClassName = this->inst_name.c_str();
    wcx.hIconSm = LoadIcon(NULL, IDI_WINLOGO);
    RegisterClassEx(&wcx);
    // Create window
    this->hwnd = CreateWindowEx(0, this->inst_name.c_str(), this->inst_name.c_str(), WS_POPUP, 0, 0, this->width, this->height, NULL, NULL, reinterpret_cast<HINSTANCE>(dll_module), 0);
    // Get window context
    this->hdc = GetDC(this->hwnd);
    // Set window context pixel format
    PIXELFORMATDESCRIPTOR pfd;
    memset(&pfd, 0, sizeof(PIXELFORMATDESCRIPTOR));
    pfd.nSize = sizeof(PIXELFORMATDESCRIPTOR);
    pfd.nVersion = 1;
    pfd.dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    pfd.cRedBits = 8;
    pfd.cGreenBits = 8;
    pfd.cBlueBits = 8;
    pfd.cAlphaBits = 8;
    pfd.cDepthBits = 24;
    pfd.cStencilBits = 8;
    pfd.iLayerType = PFD_MAIN_PLANE;
    int pformat = ChoosePixelFormat(this->hdc, &pfd);
    if(!SetPixelFormat(this->hdc, pformat, &pfd)){
        ReleaseDC(this->hwnd, this->hdc);
        DestroyWindow(this->hwnd);
        UnregisterClass(this->inst_name.c_str(), reinterpret_cast<HINSTANCE>(dll_module));
        throw "Couldn't find a fitting pixel format!";
    }
    // Create OGL context
    this->ctx = wglCreateContext(this->hdc);
    // Initialize glew for OpenGL >1.1 and check needed version & extensions
    this->Activate(true);
    if(glewInit() || !GLEW_VERSION_2_1 || !GLEW_ARB_framebuffer_object){
        this->Activate(false);
        wglDeleteContext(this->ctx);
        ReleaseDC(this->hwnd, this->hdc);
        DestroyWindow(this->hwnd);
        UnregisterClass(this->inst_name.c_str(), reinterpret_cast<HINSTANCE>(dll_module));
        throw "Couldn't initialize GLEW or OpenGL 2.1 & ARB_framebuffer_object isn't supported!";
    }
    // Create transfer FBO
    glGenTextures(1, &this->tex_color); // Color
    glBindTexture(GL_TEXTURE_2D, this->tex_color);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, this->width, this->height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glBindTexture(GL_TEXTURE_2D, 0);
    glGenFramebuffers(1, &this->fbo_transfer); // Attach
    glBindFramebuffer(GL_FRAMEBUFFER, this->fbo_transfer);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, this->tex_color, 0);
    if(glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE){
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
        glDeleteFramebuffers(1, &this->fbo_transfer);
        glDeleteTextures(1, &this->tex_color);
        this->Activate(false);
        wglDeleteContext(this->ctx);
        ReleaseDC(this->hwnd, this->hdc);
        DestroyWindow(this->hwnd);
        UnregisterClass(this->inst_name.c_str(), reinterpret_cast<HINSTANCE>(dll_module));
        throw "Bad framebuffer status!";
    }
    // Create render FBO
    glGenRenderbuffers(1, &this->rbo_color); // Color
    glBindRenderbuffer(GL_RENDERBUFFER, this->rbo_color);
    glRenderbufferStorageMultisample(GL_RENDERBUFFER, antiAliasing, GL_RGBA, this->width, this->height);
    glGenRenderbuffers(1, &this->rbo_depth_stencil); // Depth & stencil
    glBindRenderbuffer(GL_RENDERBUFFER, this->rbo_depth_stencil);
    glRenderbufferStorageMultisample(GL_RENDERBUFFER, antiAliasing, GL_DEPTH_STENCIL, this->width, this->height);
    glBindRenderbuffer(GL_RENDERBUFFER, 0);
    glGenFramebuffers(1, &this->fbo_render); // Attach
    glBindFramebuffer(GL_FRAMEBUFFER, this->fbo_render);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, this->rbo_color);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT, GL_RENDERBUFFER, this->rbo_depth_stencil);
    if(glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE){
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
        glDeleteFramebuffers(1, &this->fbo_render);
        glDeleteRenderbuffers(1, &this->rbo_color);
        glDeleteRenderbuffers(1, &this->rbo_depth_stencil);
        glDeleteFramebuffers(1, &this->fbo_transfer);
        glDeleteTextures(1, &this->tex_color);
        this->Activate(false);
        wglDeleteContext(this->ctx);
        ReleaseDC(this->hwnd, this->hdc);
        DestroyWindow(this->hwnd);
        UnregisterClass(this->inst_name.c_str(), reinterpret_cast<HINSTANCE>(dll_module));
        throw "Bad framebuffer status!";
    }
    // All done; deactivate context for now
    this->Activate(false);
}

// Deactivate and destroy OpenGL context
OGLContext::~OGLContext(){
    // The constructor leaves the context inactive, so make it current before issuing GL calls
    this->Activate(true);
    // Free FBOs
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteFramebuffers(1, &this->fbo_render);
    glDeleteRenderbuffers(1, &this->rbo_color);
    glDeleteRenderbuffers(1, &this->rbo_depth_stencil);
    glDeleteFramebuffers(1, &this->fbo_transfer);
    glDeleteTextures(1, &this->tex_color);
    // Free OGL context
    this->Activate(false);
    wglDeleteContext(this->ctx);
    // Free window context
    ReleaseDC(this->hwnd, this->hdc);
    // Free window
    DestroyWindow(this->hwnd);
    // Unregister window class
    UnregisterClass(this->inst_name.c_str(), reinterpret_cast<HINSTANCE>(dll_module));
}

// (De)Activates OpenGL context for current thread
void OGLContext::Activate(bool active){
    if(active)
        wglMakeCurrent(this->hdc, this->ctx);
    else
        wglMakeCurrent(this->hdc, NULL);
}

// Reads image from framebuffer
void OGLContext::ReadPixels(GLubyte *image){
    glBindFramebuffer(GL_READ_FRAMEBUFFER, this->fbo_render);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, this->fbo_transfer);
    glBlitFramebuffer(0, 0, this->width, this->height, 0, 0, this->width, this->height, GL_COLOR_BUFFER_BIT, GL_NEAREST);
    glBindFramebuffer(GL_FRAMEBUFFER, this->fbo_render);
    glBindTexture(GL_TEXTURE_2D, this->tex_color);
    glGetTexImage(GL_TEXTURE_2D, 0, this->colorspace, GL_UNSIGNED_BYTE, image);
    glBindTexture(GL_TEXTURE_2D, 0);
}

// Sends image to framebuffer
void OGLContext::DrawPixels(GLubyte *image){
    glBindTexture(GL_TEXTURE_2D, this->tex_color);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, this->width, this->height, this->colorspace, GL_UNSIGNED_BYTE, image);
    glBindTexture(GL_TEXTURE_2D, 0);
    glBindFramebuffer(GL_READ_FRAMEBUFFER, this->fbo_transfer);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, this->fbo_render);
    glBlitFramebuffer(0, 0, this->width, this->height, 0, 0, this->width, this->height, GL_COLOR_BUFFER_BIT, GL_NEAREST);
    glBindFramebuffer(GL_FRAMEBUFFER, this->fbo_render);
}


It's important for me to render with multisampling and to get good performance. Unfortunately, pixel transfer via the member functions ReadPixels and DrawPixels is extremely slow, so streaming a video at 24 frames per second stutters badly (with nothing more than a simple pixel transfer per frame, no drawing).
For comparison: before this, I tried a single FBO without multisampling and glReadPixels + glDrawPixels for pixel transfer, which gave much better performance with no stuttering.

I don't want to require multisampled textures, but is there an alternative for better performance?
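To put the transfer cost in perspective, here is a small sketch that estimates how many bytes per second the upload plus readback at 24 frames per second actually moves. The 1920x1080 RGBA frame size is an assumption made for illustration (the post does not state the video resolution), and the helper names are made up.

```cpp
#include <cstdint>

// Size of one uncompressed frame in bytes.
constexpr std::uint64_t frameBytes(std::uint32_t width, std::uint32_t height,
                                   std::uint32_t bytesPerPixel) {
    return std::uint64_t{width} * height * bytesPerPixel;
}

// Bytes moved per second when every frame is transferred transfersPerFrame
// times (e.g. 2 = one upload via DrawPixels plus one readback via ReadPixels).
constexpr std::uint64_t bytesPerSecond(std::uint64_t bytesPerFrame,
                                       std::uint32_t fps,
                                       std::uint32_t transfersPerFrame) {
    return bytesPerFrame * fps * transfersPerFrame;
}

// ASSUMED frame size: 1920x1080, 4 bytes per pixel (RGBA, 8 bits each).
static_assert(frameBytes(1920, 1080, 4) == 8294400u, "about 7.9 MiB per frame");
static_assert(bytesPerSecond(frameBytes(1920, 1080, 4), 24, 2) == 398131200u,
              "about 380 MiB per second for upload + readback");
```

Under that assumption the raw data volume is a few hundred MiB per second. If the stutter appears at volumes like this, it may be worth checking whether the time goes into the extra blit, a format conversion, or a synchronization stall rather than into the copy itself. That is only a hypothesis to test with a profiler or timers, not a diagnosis.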