Preflighting creation of large FBOs, detecting errors

My app uses FBOs to save 3D views to disk as JPEG or TIFF files. I create two renderbuffers for my FBO, one for the front color buffer and one for a depth buffer. Everything works well as long as I don’t try to create an image that’s too big.

I first query OpenGL to make sure FBOs are supported, and get the maximum renderbuffer dimensions by calling glGetIntegerv(GL_MAX_RENDERBUFFER_SIZE_EXT, &maxRenderbufferSize).

I also check each OpenGL call in my setup for errors using glGetError(). None of my OpenGL calls are returning errors.
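
Here’s roughly what that preflight looks like (simplified; the strstr-based extension check is just one way to do the support test, and opengl_errchk is my own glGetError wrapper):

	//Drain any stale errors so later glGetError calls report only our own
	while (glGetError() != GL_NO_ERROR)
		;
	//One way to test for FBO support: look for the extension string
	const GLubyte *ext = glGetString(GL_EXTENSIONS);
	BOOL has_fbos = ext && strstr((const char *)ext, "GL_EXT_framebuffer_object");
	GLint maxRenderbufferSize = 0;
	if (has_fbos)
		{
		glGetIntegerv(GL_MAX_RENDERBUFFER_SIZE_EXT, &maxRenderbufferSize);
		opengl_errchk("glGetIntegerv(GL_MAX_RENDERBUFFER_SIZE_EXT)");
		}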

My development machine, a recent MacBook Pro with an NVIDIA 8600 graphics chip and 256 MB of VRAM, claims to support renderbuffers up to 8192x8192.

However, at a little over 4000x4000 pixels, the resulting image comes out black. If I increase the image size much beyond that, it frequently locks up my machine to the point where I have to do a force shutdown and restart.

I suspect that I am taking up too much VRAM, and leaving the Quartz system unable to handle drawing to the screen.

Sometimes after using my app the system will also get VERY sluggish, as if it is having to page the contents of VRAM back and forth to main memory. I have not had this happen since.

How can I preflight my setup to avoid this, and better detect errors when they do happen?

I am a fair newbie when it comes to OpenGL, so it may be that I am missing something obvious.

Here is the code for the key part of my routine to save an image to disk using an FBO:


	//Code to set up an NSBitmapImageRep and get a handle to the bitmap data in theRepData not shown
	if (use_FBOs || force_fbos)
		{
		//if use_FBOs:
		//		create an FBO sized to save_width, save_height
		//		create two renderbuffers sized to save_width, save_height
		//		bind the FBO
		//		attach the renderbuffers to the FBO
		//		perform an OpenGL copy, with or without a selection
		//		unbind the renderbuffers and FBO
		//		delete the FBO

		//Set up an FBO with color and depth renderbuffer attachments
		GLuint framebuffer = 0;
		GLuint  renderbuffers[2] = {0,0};
		glGenFramebuffersEXT(		1, 
									&framebuffer);					//Generate a new framebuffer id
		opengl_errchk("glGenFramebuffersEXT");
		if (!framebuffer) return nil;
		glBindFramebufferEXT(		GL_FRAMEBUFFER_EXT, 
									framebuffer);					//Bind to it
		opengl_errchk("glBindFramebufferEXT");
		glGenRenderbuffersEXT(		2, renderbuffers);				//Generate 2 new renderbuffer ids,
																	//one for the color data and one for the depth buffer
		opengl_errchk("glGenRenderbuffersEXT");
		if (!renderbuffers[0] || !renderbuffers[1])
			{														//clean up so we don't leak the framebuffer
			glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
			glDeleteFramebuffersEXT(1, &framebuffer);
			return nil;
			}
		glBindRenderbufferEXT(		GL_RENDERBUFFER_EXT, 
									renderbuffers[0]);					//Bind the first renderbuffer
		opengl_errchk("glBindRenderbufferEXT");
		glRenderbufferStorageEXT(	GL_RENDERBUFFER_EXT, 
									GL_RGB, 
									save_width, 
									save_height);					//Create storage for the renderbuffer object
		opengl_errchk("glRenderbufferStorageEXT");
		glFramebufferRenderbufferEXT(
									GL_FRAMEBUFFER_EXT, 
									GL_COLOR_ATTACHMENT0_EXT,
									GL_RENDERBUFFER_EXT, 
									renderbuffers[0]);				//Attach the renderbuffer to the FBO's color attachment point
									
		opengl_errchk("glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT...");
		//Now bind a depth buffer to the FBO
		glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, renderbuffers[1]);
		opengl_errchk("glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, renderbuffers[1])");
		glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT, save_width, save_height);
		opengl_errchk("glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT, save_width, save_height)");
		glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT, GL_RENDERBUFFER_EXT, renderbuffers[1]);
		opengl_errchk("glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT, GL_RENDERBUFFER_EXT, renderbuffers[1])");
		
		
		GLenum status = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT);	//Make sure the framebuffer is "complete" (ready for drawing)
		if (status != GL_FRAMEBUFFER_COMPLETE_EXT)
			{														//Error. clean up before returning
			//unbind the frame buffer.
			glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
			glDeleteRenderbuffersEXT(2, renderbuffers);
			glDeleteFramebuffersEXT(1, &framebuffer);
			NSLog(@"Error. Frame buffer not complete.");
			return nil;
			}
		// code to draw content to the renderbuffer
		
		
		if (!save_selection)
			glViewport (0, 0, openGLRect.size.width, openGLRect.size.height);
		else
			{// shift and scale our drawing
			scale_factor = ((float)save_width) / view_selection_rect.size.width;
			glViewport (-view_selection_rect.origin.x *	scale_factor,
						-view_selection_rect.origin.y *	scale_factor,
						camera.viewWidth *				scale_factor,
						camera.viewHeight *				scale_factor );  
			}
			
		glClear (GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

		[theMeshObject drawMesh];
		

		glPixelStorei(GL_PACK_ALIGNMENT, 8);
		glReadPixels(	openGLRect.origin.x, 
						openGLRect.origin.y, 
						openGLRect.size.width, 
						openGLRect.size.height, 
						GL_RGB, 
						GL_UNSIGNED_BYTE, 
						theRepData);			//theRepData points to bitmapData from an NSBitmapImageRep set up with its rows aligned on 8-byte boundaries
		opengl_errchk("glReadPixels");
		
		glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0); //Now unbind the framebuffer 
		opengl_errchk("glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0)");

		glDeleteRenderbuffersEXT(2, renderbuffers);	//delete both renderbuffer objects
		opengl_errchk("glDeleteRenderbuffersEXT(2, renderbuffers)");
		glDeleteFramebuffersEXT(1, &framebuffer);
		opengl_errchk("glDeleteFramebufferEXT(GL_FRAMEBUFFER_EXT, framebuffer)");
		glViewport (0, 0, camera.viewWidth, camera.viewHeight);	//restore the viewport back to it's original value.
		opengl_errchk("glViewport");
		}


One thing’s for sure, your system will get sluggish because of the limited amount of VRAM. If you do the math:

  4096 x 4096 x 4 bytes (GL_RGB is typically stored as RGBA8, i.e. 4 bytes/pixel) = 64 MB
+ 4096 x 4096 x 4 bytes (GL_DEPTH_COMPONENT is most likely packed 24-bit depth plus 8-bit stencil, i.e. 4 bytes/pixel) = 64 MB
= 128 MB -> half your VRAM, before counting textures/models for your app and OS-reserved memory.

I’m not sure how renderbuffers are implemented in the driver, but if it’s the same as 2D textures, it’s possible that they use data padding to round dimensions up to powers of two. In that case allocating a 4097x4097 renderbuffer would take up the space of an 8192x8192 renderbuffer, which is well over your VRAM limit. Try allocating 4096x4096 renderbuffers first and then 4097x4097 renderbuffers and see if the performance changes dramatically. If that is the case, you can try to circumvent the issue by using texture rectangles and binding the textures to your FBO rather than renderbuffers.
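
Roughly like this, if you go the texture-rectangle route (an untested sketch; texture rectangles take arbitrary sizes, so there’s no power-of-two padding question):

	GLuint colorTex = 0;
	glGenTextures(1, &colorTex);
	glBindTexture(GL_TEXTURE_RECTANGLE_ARB, colorTex);
	glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB8,
				save_width, save_height, 0,
				GL_RGB, GL_UNSIGNED_BYTE, NULL);			//allocate storage, no initial data
	glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
				GL_TEXTURE_RECTANGLE_ARB, colorTex, 0);		//use the texture as the color attachment
	//the depth attachment can stay a renderbuffer, exactly as in your code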

NiCo,

Having the machine get sluggish during the short time I’m writing a large 3D image to disk is acceptable to me. I’m not trying to do animation on these huge images, just create a still on disk.

I’m creating my renderbuffer as RGB8, so it’s 3 bytes/pixel. I’m not using a stencil buffer, but I am using a depth buffer, which I think is 16 bits/pixel. At 4096x4096 that works out to 4096 x 4096 x (3 + 2) bytes, about 84 MB, not counting the padding to give me 8-byte row alignment. Your point about using up a lot of VRAM is well taken.

Do you really think the driver might be allocating a square power-of-two buffer for a renderbuffer? Why would it do that? That would waste a horrific amount of memory. It’s a big reason I decided to use renderbuffers instead of texture objects.

I’d like to get the code working properly on my machine, but more importantly I’d like to get it working on ANY Mac running OS 10.4 or later, and have a way of figuring out how big a renderbuffer I can create without creating a black image, or worse yet locking up the machine. Are there ways to figure out what the driver can handle?

I’m beginning to think that for large plots (like 2000x2000 or larger) I should set up my viewport to render part of the plot, then copy it in pieces to a smaller NSBitmapImageRep, and then use Cocoa calls to copy my smaller NSBitmapImageRep over to my full-sized NSBitmapImageRep. That would be a lot of work, but it should let me create an arbitrary sized image.

Also, is there a straightforward way to tell if your app is leaking VRAM? A few times my app has rendered my Mac REALLY slow, even after I quit. That makes me think I might be failing to release a large data structure in VRAM, since the application’s footprint in main memory is released when the app quits. I’ve used the OpenGL Driver Monitor a little, but the statistics it shows on VRAM use are not very fine-grained.

I’m not sure that it’s actually padding the data, but I know this is true for 2D textures, as you can see in this post. I was merely pointing out that it’s possible. The only way to check is experimental: allocate a 4096x4096 renderbuffer, then a 4097x4097 one, and see if the performance changes dramatically.

I believe the limits on the graphics card depend only on the chip architecture and do not take memory size into account. So large renderbuffers would easily fit on a GF8800 Ultra, while the same allocation would cause a performance drop on a GF8xxx card with less memory.

In theory, if your query of the max renderbuffer size is larger than the size you’re allocating, it should work. But if the memory size is too small it will have to start swapping which leaves more room for possible bugs in the driver. Swapping shouldn’t be a problem if it can swap currently unused data, but I believe it’s logical to assume that the whole renderbuffer should be in VRAM before you can start drawing to it.

So unless you’re writing apps for architectures with a minimum VRAM requirement, it’s best to split the work into smaller chunks, and better still, re-use a single smaller renderbuffer and modify your projection matrix to render different chunks of your whole viewport, just like you said.

I don’t think there’s a straightforward way to tell if your app is leaking VRAM. If you encapsulated objects such as textures in your code, you can try adding a static variable that tracks the number of objects of that type you created, and check it against the GLuint value returned by the glGen call for that object type. AFAIK OpenGL returns the first unused name, so if you have 5 objects and the next glGen call of that type returns 10, you probably forgot to delete some objects. I’m only speculating right now, so don’t shoot me if I’m wrong :slight_smile:
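
A rough sketch of the kind of bookkeeping I mean (the names here are hypothetical, and the first-unused-name behavior is a driver implementation detail, not something the spec guarantees):

	#include <stdio.h>

	static GLuint live_textures = 0;					//how many texture objects we believe are alive

	GLuint tracked_gen_texture(void)
		{
		GLuint name = 0;
		glGenTextures(1, &name);
		live_textures++;
		//Heuristic only: many drivers hand out the lowest unused name, so a name
		//far above the live count suggests earlier objects were never deleted.
		if (name > live_textures + 1)
			printf("Possible texture leak: name %u but only %u live textures\n",
					name, live_textures);
		return name;
		}

	void tracked_delete_texture(GLuint name)
		{
		glDeleteTextures(1, &name);
		live_textures--;
		}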

Sounds like a driver bug. Just like any RAM used by your app is freed when the app shuts down, any VRAM used should be freed by the driver.

It’s also possible that the driver is swapping other stuff back into VRAM - how long does the slowdown last after closing down your app?

[quote="PaladinOfKaos"]
Sounds like a driver bug. Just like any RAM used by your app is freed when the app shuts down, any VRAM used should be freed by the driver.

It’s also possible that the driver is swapping other stuff back into VRAM - how long does the slowdown last after closing down your app? [/QUOTE]

PaladinOfKaos,

Does the driver keep track of the process/thread that creates objects in VRAM, then release them when the app quits?

As best I can tell, the slowdown is permanent. I’ve had to restart my machine to fix it. Other times the machine locks up completely, and I’ve had to force a restart to get it back.

NiCo,

I’m pretty sure the driver isn’t allocating square power-of-two renderbuffers. I don’t see any real change in performance between a 4096x4096 and a 4097x4097 renderbuffer, and I can also create an 8192x512 renderbuffer just fine, at least in terms of having enough memory. (My code does have some sort of calculation problem: I get the WRONG part of the 3D image when I create such a long, skinny renderbuffer. I think I’m hitting rounding errors in the way I set up my viewport. I tried converting the viewport calculation to double precision with rounding, but it’s still off. See the code I posted above; look for the call to glViewport.)

I think you may be right about bugs in the driver. I think it’s getting unstable when it’s VRAM-starved.

I figured out why my code is exporting the wrong part of my drawing when I create a small selection and try to write that part of the image bigger.

It turns out that there is a fairly small limit on the height/width of the viewport you can create. My code scales the viewport to be larger than the renderbuffer so that the rendering is scaled up; when I selected a small part of the 3D window and tried to output it at a larger size, I was exceeding the maximum viewport size.

On my machine at least, the maximum height/width of a viewport is 8192, which is the same as the maximum size of a renderbuffer. That means that I can’t expand a small selected area of my 3D drawing very much before I hit the limit on the size of a viewport.
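
The limit can be queried up front, so this check now goes into my preflight. A minimal sketch (scaled_width and scaled_height stand in for whatever my viewport math produces):

	GLint maxViewportDims[2] = {0, 0};
	glGetIntegerv(GL_MAX_VIEWPORT_DIMS, maxViewportDims);	//returns max width and max height
	if (scaled_width > maxViewportDims[0] || scaled_height > maxViewportDims[1])
		NSLog(@"Viewport %d x %d exceeds this renderer's %d x %d limit",
			scaled_width, scaled_height, maxViewportDims[0], maxViewportDims[1]);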

I guess if I want to be able to output an arbitrarily large version of my 3D view I will need to cut the image into a grid of rectangles and write them to smaller bitmaps, which I then assemble together into a larger bitmap. This is a real pain in the a**, because I’ll have to allocate a smaller bitmap object for my grid of rectangles, plus separate bitmap objects for the leftover sections when my output size isn’t evenly divisible by a useful value.
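
For the record, here’s the shape of the loop I’m imagining (just a sketch: drawScene and copyTileIntoImage are hypothetical stand-ins, and the sub-frustum math is the standard tiled-rendering trick of carving the full-image frustum into per-tile pieces):

	//Render a full_w x full_h image in tile_w x tile_h pieces.
	//frustum_l/r/b/t/n/f describe the frustum for the WHOLE image.
	for (int y0 = 0; y0 < full_h; y0 += tile_h)
		{
		for (int x0 = 0; x0 < full_w; x0 += tile_w)
			{
			int tw = (x0 + tile_w <= full_w) ? tile_w : full_w - x0;	//edge tiles may be smaller
			int th = (y0 + tile_h <= full_h) ? tile_h : full_h - y0;

			//Carve out the piece of the full frustum that this tile covers
			double l = frustum_l + (frustum_r - frustum_l) * x0 / full_w;
			double r = frustum_l + (frustum_r - frustum_l) * (x0 + tw) / full_w;
			double b = frustum_b + (frustum_t - frustum_b) * y0 / full_h;
			double t = frustum_b + (frustum_t - frustum_b) * (y0 + th) / full_h;

			glMatrixMode(GL_PROJECTION);
			glLoadIdentity();
			glFrustum(l, r, b, t, frustum_n, frustum_f);
			glMatrixMode(GL_MODELVIEW);

			glViewport(0, 0, tw, th);					//always within GL_MAX_VIEWPORT_DIMS
			glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
			drawScene();

			glReadPixels(0, 0, tw, th, GL_RGB, GL_UNSIGNED_BYTE, tileData);
			copyTileIntoImage(tileData, x0, y0, tw, th);	//paste into the full-size NSBitmapImageRep
			}
		}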

In my reading I haven’t seen any documentation on how to tell how much VRAM is available, so that I can tell whether my code is likely to lock up the machine.

The only code I’ve found to do this is Core Graphics Library code. Is that what I need to do? I would think there would be OpenGL calls to let me query the amount of total/available VRAM in the current renderer.
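
For what it’s worth, the renderer query I ran across looks roughly like this (CGL, so Mac-only; as far as I can tell kCGLRPVideoMemory reports a renderer’s total VRAM in bytes, not what’s currently free):

	#include <OpenGL/OpenGL.h>
	#include <ApplicationServices/ApplicationServices.h>

	CGLRendererInfoObj info;
	GLint numRenderers = 0;
	if (CGLQueryRendererInfo(CGDisplayIDToOpenGLDisplayMask(CGMainDisplayID()),
			&info, &numRenderers) == kCGLNoError)
		{
		for (GLint i = 0; i < numRenderers; i++)
			{
			GLint vramBytes = 0;
			CGLDescribeRenderer(info, i, kCGLRPVideoMemory, &vramBytes);	//total VRAM for this renderer
			NSLog(@"Renderer %d: %d bytes of VRAM", (int)i, (int)vramBytes);
			}
		CGLDestroyRendererInfo(info);
		}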

I would need to know the total VRAM size, the amount that’s currently free, and how much COULD be free if the system flushed what it could back to main memory.

Since I’m doing static saves, having the machine stall for a few seconds while I write a file to disk is acceptable. Having it create a black image or lock up the machine is clearly NOT acceptable.

Guess what, that has been solved before. Look at this:
http://www.mesa3d.org/brianp/TR.html
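
If I remember the TR API correctly, usage is along these lines (check it against the page above; drawScene stands in for your own rendering code):

	#include "tr.h"

	TRcontext *tr = trNew();
	trTileSize(tr, 256, 256, 0);						//tile width, height, border
	trImageSize(tr, full_w, full_h);					//dimensions of the final image
	trImageBuffer(tr, GL_RGB, GL_UNSIGNED_BYTE, imageData);
	trPerspective(tr, 45.0, (double)full_w / full_h, 0.1, 100.0);

	int moreTiles;
	do	{
		trBeginTile(tr);								//sets the viewport and per-tile projection
		drawScene();
		moreTiles = trEndTile(tr);						//copies this tile's pixels into imageData
		} while (moreTiles);

	trDelete(tr);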

I have no clue about developing on Macs, but core OpenGL generally doesn’t offer any information of this kind. It’s window-system agnostic.

Any reading of the amount of free VRAM is outdated the moment you look at it, because VRAM is a shared system resource: no matter what you had at one point in time, another application can have taken it the next moment.

(It’s even worse on Vista, where this is part of the OS: every application might get the full amount of VRAM reported, and the OS tries to keep the fighting applications under control. People know how “good” that worked out.)

[quote=“Relic”]

Guess what, that has been solved before. Look at this:
http://www.mesa3d.org/brianp/TR.html [/QUOTE]

Relic,

Wow. Thank you. That’s awesome. I wish I had known about this about 2 weeks ago. That looks like it does everything I need. It looks like it can even use the back buffer of the current drawing context rather than FBOs. My code currently checks for FBO support and only offers to render my images at larger than screen size if FBOs are available.

[quote="Relic"]
I have no clue about developing on Macs, but core OpenGL generally doesn’t offer any information of this kind. It’s window-system agnostic.

Any reading of the amount of free VRAM is outdated the moment you look at it, because VRAM is a shared system resource: no matter what you had at one point in time, another application can have taken it the next moment.

(It’s even worse on Vista, where this is part of the OS: every application might get the full amount of VRAM reported, and the OS tries to keep the fighting applications under control. People know how “good” that worked out.) [/QUOTE]

Relic,

Recent versions of Mac OS X are much the same way. Everything is done with OpenGL at the system level, and apps have to compete with the system for video resources.

s/Recent/All/

???

http://en.wikipedia.org/wiki/Perl#Uses

ZbuffeR,

Did you mean to link to that wiki article? That is an article on Perl, and I don’t see anything in it related to OpenGL.

Yes, I really meant it.
You seemed surprised by the s/Recent/All/ from arekkusu. He simply meant “replace Recent with All”; it’s a substitution syntax you can use in Perl and vi/vim, as I pointed out with that helpful Perl example on Wikipedia.

Ah the kids nowadays… :slight_smile:

ZbuffeR,

I’m no kid. I’ve been working with computers since the late 1970s, and doing it professionally since the early 1980s. However, I’ve never learned Perl. The number of specialized things one can learn is just about limitless. I can still write 6502 assembler off the top of my head, for example, but that’s a skill that doesn’t get much use these days.

I still don’t get the meaning of the “replace Recent by All.” I’m up to my neck in code, and don’t know the shorthand for this board. Can somebody take pity on me and translate it into English?

Oops, sorry grandpa! Now it’s me that looks like an annoying kid :smiley:

Back on topic:
“All versions of Mac OS X are much the same way. Everything is done with OpenGL at the system level, and apps have to compete with the system for video resources.”
I would even say that it is the same for all multitasking systems.