glReadPixels with PBOs stalls every nth iteration

I use PBOs for fast pixel readback, i.e., so that glReadPixels() no longer stalls. The strange problem is that every n-th iteration glReadPixels() stalls anyway, where n depends on the size of the image being read (n=7 for 1920x1080 and n=31 for 720x576). This happens even if I call glReadPixels() several times in a row with a 100 ms Sleep() in between to make sure the previous command has finished. The trial code that produces this behavior looks as follows:


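renderScene();
// With a pack PBO bound, the last argument is a byte offset into the PBO, not a client pointer.
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[0]);
for (int i = 0; i < 100; i++) {
  glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, 0);  // this is the call being timed
  Sleep(100);  // give the previous transfer time to finish
}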

The timings I get for glReadPixels at 1920x1080 are around 10 ms for every 7th call and about 0.05 ms for all other calls.
I’m using an NVIDIA GeForce 8800 GTS 512. Because the problem depends on width and height, I could imagine that there is some OpenGL memory issue, but I really have no idea what’s going wrong.

Any help and ideas are highly appreciated, Thx, Ritsch

This code sample is wrong. When a PBO is bound, glReadPixels is a non-blocking call. Measuring CPU time in this case is meaningless, because it only measures how long the driver needs to insert glReadPixels into the command stream, which is useless information. I suggest you do PBO double buffering:
bind pbo1
post readpixels
bind pbo2
map buffer and copy data
bind 0
swap pbo1 and pbo2 names

With large buffers two PBOs might not be enough, so you can easily extend this example to use more than two PBOs and do triple or quad buffering. In that case, map buffer N and copy its data while posting the read pixels into the 1st PBO. Then just shift the PBO names.
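The name shifting described above can be sketched as plain index arithmetic. This is only an illustration under assumptions: `PboRoles`, `rolesForFrame` and `hasPendingData` are hypothetical helper names, not part of OpenGL, and the GL calls appear only in comments to show where each index would be used.

```cpp
#include <cstddef>

// Hypothetical helper type: which PBO plays which role in a given frame.
struct PboRoles {
    std::size_t readInto;  // bind this PBO, then post glReadPixels(..., 0) into it
    std::size_t mapFrom;   // bind this PBO, then glMapBuffer and copy its data
};

// With pboCount buffers used round-robin, frame f posts its read into
// buffer f % pboCount. The buffer right after it in the ring holds the
// oldest pending read, so it is the one most likely to be finished.
constexpr PboRoles rolesForFrame(std::size_t frame, std::size_t pboCount) {
    return PboRoles{frame % pboCount, (frame + 1) % pboCount};
}

// During the first pboCount - 1 frames the buffer chosen for mapping
// has not received any read yet, so the copy step should be skipped.
constexpr bool hasPendingData(std::size_t frame, std::size_t pboCount) {
    return frame + 1 >= pboCount;
}
```

With two buffers this reduces to the swap in the list above: frame 0 reads into PBO 0 and has nothing to map yet, frame 1 reads into PBO 1 and maps PBO 0, and so on.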

If you need to measure time on the GPU side, read the GL_EXT_timer_query specification.

edit:
I just did some tests… on my C2D T7700 @ 2.4GHz + Quadro 1600M (comparable to an 8600GTS) + WinXP SP2, readback speed is ~1050 MB/sec. My app grabs a 1920x1200xBGRA screen @ ~120fps
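As a quick sanity check, the two numbers in that edit agree with each other. A minimal sketch, assuming 4 bytes per BGRA pixel and that "MB" means 1024*1024 bytes; `readbackMiBPerSecond` is a hypothetical helper name:

```cpp
// Readback throughput in MiB/s for frames of the given size.
// Assumes every frame is read back in full at the stated frame rate.
constexpr double readbackMiBPerSecond(int width, int height,
                                      int bytesPerPixel,
                                      double framesPerSecond) {
    return static_cast<double>(width) * height * bytesPerPixel *
           framesPerSecond / (1024.0 * 1024.0);
}
```

For 1920x1200 at 4 bytes per pixel and 120 fps this gives roughly 1055 MiB/s, in line with the quoted ~1050 MB/sec.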

That’s exactly the problem: it should be non-blocking, but every n-th time it blocks anyway. The example I posted was just to show what happens; in my real code I do use multiple buffering. There it turned out that it makes no difference whether I use two or more buffers.

Uhm… I can confirm this… On my machine (HP 8710w) glReadPixels stalls after every ~47MB of data read back. I can read back a lot of small screens or several huge screens, but as soon as the total crosses that magic limit, glReadPixels stalls.

I tested back buffer readback and FBO readback, and it happens in both cases.

Maybe the driver uses a fixed CPU memory block for readback data and has to perform some sort of garbage collection after a while?

I tested a wide range of NVIDIA cards in our office… from 6600AGP, Quadro1500M, 8500GT, … with different drivers, and all suffer from the same problem… readback stalls after 47.5MB.
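The 47.5MB figure appears to match the n values from the first post. A minimal sketch of the arithmetic, assuming 4 bytes per BGRA pixel, that "MB" means 1024*1024 bytes, and that the stall lands on the first read that pushes the running total past the limit. The names below are hypothetical and the threshold is simply the observed number, not a documented driver constant:

```cpp
#include <cstdint>

// Observed limit from this thread: 47.5 * 1024 * 1024 bytes (assumption, see above).
constexpr std::int64_t kStallThresholdBytes = 49807360;

// Size of one readback in bytes.
constexpr std::int64_t frameBytes(int width, int height, int bytesPerPixel) {
    return static_cast<std::int64_t>(width) * height * bytesPerPixel;
}

// 1-based index of the first read whose cumulative size exceeds the threshold.
constexpr std::int64_t firstReadPastThreshold(std::int64_t bytesPerFrame,
                                              std::int64_t thresholdBytes) {
    return thresholdBytes / bytesPerFrame + 1;
}

// 1920x1080: 8,294,400 bytes per read, six reads fit, the 7th crosses.
static_assert(firstReadPastThreshold(frameBytes(1920, 1080, 4),
                                     kStallThresholdBytes) == 7, "n for 1920x1080");
// 720x576: 1,658,880 bytes per read, thirty reads fit, the 31st crosses.
static_assert(firstReadPastThreshold(frameBytes(720, 576, 4),
                                     kStallThresholdBytes) == 31, "n for 720x576");
```

So both n=7 for 1920x1080 and n=31 for 720x576 would follow from a single byte limit of about 47.5MB, which fits the idea of a fixed-size block on the driver side.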

Can anyone from NVIDIA explain this?

Can anyone who has experienced this please post the driver version they are using? We’ll take a look at this.

Thanks!
Barthold

It happens with FW 6.14.11.7474 on Quadro 1600M (in HP 8710w).

This should be fixed now. The beta driver 177.79 is now available with this fix in it. To find your driver version, start the NVIDIA control panel and go to help->system information.

http://www.nvidia.com/Download/Find.aspx?lang=en-us

Barthold

I see a beta driver for GeForce but not Quadro FX.

I’m running Quadro FX cards - any news on a fix for those? I’m experiencing the same problem on Vista64 using the latest Quadro FX driver version 169.96.