PBO + FBO performance

I have an FBO that I am rendering to offscreen.

I want to read the contents of this FBO into main memory so I can use it.

I’ve tried …

glGetTexImage <- SLOW

then

glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
glReadPixels()

which is also slow.

Then I tried creating a PBO, something like this:

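glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 10000);

glPushAttrib(GL_PIXEL_MODE_BIT);
glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
// with a pack PBO bound, the last argument is a byte offset into the PBO, not a pointer
glReadPixels(0, 0, textureWidth, textureHeight, GL_BGR_EXT, GL_UNSIGNED_BYTE, BUFFER_OFFSET(0));
glPopAttrib();

// map the PBO to get at the pixels (the unmap below needs a matching map)
buffer = (UCHAR*) glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY);
//aviCapture->captureFrame(buffer);

glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);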

Anyway, that works, but it’s exactly the same speed as plain glReadPixels with glReadBuffer(GL_COLOR_ATTACHMENT0_EXT); in some cases it might actually be slower.

My program normally runs at about 75fps. With glReadPixels into a bound PBO I get about 18fps, and with glReadPixels the conventional way I also get 18fps.

What am I doing wrong? I appear to be getting no fps improvement at all.

I am using Vista and a Quadro 3400/400 card, which is something like an NVIDIA 6800 card.

A normal glReadPixels is a blocking call, but glReadPixels into a bound PBO is a non-blocking call. What you have to do is create two PBOs… Once per frame, copy the data from the first PBO to sysmem (or the codec) and use the second PBO to start a new glReadPixels. Then swap the PBOs. In the next frame, do the same.

You will get the framebuffer data one frame late, but it will not stall your CPU.
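
To make the swapping concrete, here is a rough sketch of just the index bookkeeping, with no GL calls in it. The names PboSlots and pbo_slots_for_frame are made up for illustration, and it assumes the PBOs are used in strict round-robin order. With two PBOs the data you map is one frame old, as described above.

```c
#include <assert.h>

/* Hypothetical helper, not part of OpenGL: decides which PBO is used for what. */
typedef struct {
    int read_into;  /* PBO to bind before this frame's glReadPixels */
    int map_from;   /* PBO to map and copy out this frame, -1 if none is ready yet */
} PboSlots;

PboSlots pbo_slots_for_frame(unsigned frame, unsigned pbo_count)
{
    PboSlots s;

    assert(pbo_count >= 2);  /* one PBO cannot overlap transfer and copy */

    /* Fill the PBOs in round-robin order. */
    s.read_into = (int)(frame % pbo_count);

    /* The next slot in the ring holds the oldest pending readback.
       Nothing has been written there during the first pbo_count - 1 frames. */
    if (frame < pbo_count - 1)
        s.map_from = -1;
    else
        s.map_from = (int)((frame + 1) % pbo_count);

    return s;
}
```

Each frame you would bind PBO read_into and call glReadPixels with a zero offset, then bind PBO map_from (if it is not -1), map it, copy the pixels out and unmap it. More than two PBOs gives the transfer more time to finish at the cost of extra latency.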

Hmm, I tried this:

if(primaryBuffer) {
	pfn_glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB,10000);
	buffer = (UCHAR*) pfn_glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY); 

	aviCapture->captureFrame(buffer);

	pfn_glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);

	pfn_glBindBufferARB	(GL_PIXEL_PACK_BUFFER_ARB,10001);
	pfn_glReadPixels	(0,0,textureWidth,textureHeight,GL_BGR_EXT,GL_UNSIGNED_BYTE,BUFFER_OFFSET(0));		 
	pfn_glBindBufferARB	(GL_PIXEL_PACK_BUFFER_ARB,0);

	primaryBuffer = false;
}
else {
	pfn_glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB,10001);
	buffer = (UCHAR*) pfn_glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY);  

	aviCapture->captureFrame(buffer);

	pfn_glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);

	pfn_glBindBufferARB	(GL_PIXEL_PACK_BUFFER_ARB,10000);
	pfn_glReadPixels	(0,0,textureWidth,textureHeight,GL_BGR_EXT,GL_UNSIGNED_BYTE,BUFFER_OFFSET(0));		
	pfn_glBindBufferARB	(GL_PIXEL_PACK_BUFFER_ARB,0);

	primaryBuffer = true;
}

But I still only get 18fps. No speed increase at all over plain glReadPixels :( Without the calls above I get 75fps.

75, 18 fps ? You are vsynced, not good for benchmarking…
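
To show why vsynced numbers are bad for benchmarking, here is a small sketch. It assumes a 75 Hz refresh rate and plain double buffering (neither is stated in the thread), in which case a frame always occupies a whole number of refresh intervals. The function name vsynced_fps is made up for illustration.

```c
#include <assert.h>

/* Frame rate you would see under vsync when one frame needs frame_us
   microseconds of work. Assumes double buffering, so every frame is
   rounded up to a whole number of refresh intervals. */
double vsynced_fps(unsigned refresh_hz, unsigned long long frame_us)
{
    unsigned long long intervals;

    assert(refresh_hz > 0 && frame_us > 0);

    /* ceil(frame_us * refresh_hz / 1e6) in integer arithmetic */
    intervals = (frame_us * refresh_hz + 999999ULL) / 1000000ULL;

    return (double)refresh_hz / (double)intervals;
}
```

Under these assumptions a 41 ms frame and a 53 ms frame both show up as 18.75fps (75/4), so a real speedup from the PBO path could be completely hidden. Turn vsync off and compare frame times instead.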

Try benchmarking without the aviCapture->captureFrame(buffer) calls.
Do you have any GL errors? Call glGetError before and after the readback.

And finally… you say it’s an NV4x-based GPU. What about the chipset? Is it Intel, SiS or VIA? Is it AGP or PCI-E?

It’s PCI-E
Intel chipset

and I was benchmarking without the aviCapture calls

No expert in this, but some wild thoughts:

  • I see you’re using GL_BGR_EXT, but isn’t it likely that the framebuffer has an alpha channel (even though you didn’t ask for one)? Also, is that the native format of the framebuffer (rather than GL_RGBA)? Afaik, if it isn’t, the driver has to convert the pixels before handing them to you. I’d try different pixel formats and see if the speed improves.

  • Are you bandwidth limited, i.e. if you decrease the size of the glReadPixels call, does the speed increase?
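
To put numbers on both points, here is a minimal sketch of how many bytes one glReadPixels call writes for GL_UNSIGNED_BYTE data. It assumes the other pack parameters (GL_PACK_ROW_LENGTH, skips) are left at their defaults; GL_PACK_ALIGNMENT defaults to 4, so 3-byte BGR rows get padded to a multiple of 4 bytes. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes from the start of one row to the start of the next, for 1-byte
   components and the given GL_PACK_ALIGNMENT (1, 2, 4 or 8). */
size_t packed_row_stride(size_t width, size_t bytes_per_pixel, size_t alignment)
{
    size_t row;

    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);

    row = width * bytes_per_pixel;
    /* round up to the next multiple of alignment */
    return (row + alignment - 1) / alignment * alignment;
}

/* Size to allocate for the PBO (or client buffer) holding the whole image. */
size_t packed_image_size(size_t width, size_t height,
                         size_t bytes_per_pixel, size_t alignment)
{
    return packed_row_stride(width, bytes_per_pixel, alignment) * height;
}
```

For example, packed_image_size(w, h, 3, 4) is the size for GL_BGR_EXT and packed_image_size(w, h, 4, 4) the size for GL_BGRA_EXT. The 4-byte format moves about a third more data, but it may still be faster if it matches the framebuffer layout and the driver can skip the conversion. Multiply the size by your frame rate to see how much bandwidth you are asking for.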

For my FBO I only requested an RGB format, but I could try requesting RGBA. And yes, if the resolution is lower the frame rate increases.

Very strange… did you try that on another machine? Do you have an example that reproduces the problem, so I can test it here? Do you use dualview mode?

I don’t have any other machines to test it on currently.
I do use dualview; I’ll try disabling that. Worth a shot.

I’m thinking perhaps my gfx card is just too old.

sm4 hardware can be had for around 50 clams…

one of the nvidia SDK demos is a PBO texture performance demo
http://developer.download.nvidia.com/SDK/9.5/Samples/DEMOS/OpenGL/TexturePerformancePBO.zip

Anyway… on my Quadro card there’s very little difference in speed (<5%) when downloading textures between plain glReadPixels and glReadPixels with a PBO. Is that normal? There is a difference when uploading textures to the gfx card with multiple PBOs, but it’s maybe 10-15%. Not huge.

I was expecting somewhat more.

This is something related to your hw setup… chipset, driver or gfx card. Try to borrow a proper gfx card, or try your code on another machine.

What sort of performance difference do you get on your h/w with the above demo?

On my 8800GT, download rates are:
glTexSubImage: 1162
PBO: 1760
Multi PBO: 1625

For readback, it’s actually slightly slower using PBO:
glReadPixels: 1190
PBO Readback: 1045

With the default program settings (i.e. PBO source)

my readback rates are

94 with readpixels
94 with readpixels + PBO

which is somewhat surprising. What’s more surprising is that your card has 10x the readback speed of mine :)