Which is the best way to download an image?

Download (glReadPixels()) performance is key to my program, so I have come up with three ways to do it:

Method 1:
void RenderFunc()
{
    glReadPixels();   // synchronous read into system memory
    RenderScene();
}

Method 2:
void RenderFunc()
{
    RenderScene();
    // Start an asynchronous read into pbo0.
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo0);
    glReadPixels();
    // Map pbo1 and copy its contents out.
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo1);
    BYTE *pMapBuffer = glMapBufferARB();
    copyBuffer(pDstBuffer, pMapBuffer);
    glUnmapBufferARB();
}

Method 3:
void RenderFunc()
{
    // Start an asynchronous read into pbo0.
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo0);
    glReadPixels();
    RenderScene();
    // Map the same PBO and copy its contents out.
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo0);
    BYTE *pMapBuffer = glMapBufferARB();
    copyBuffer(pDstBuffer, pMapBuffer);
    glUnmapBufferARB();
}

Methods 2 and 3 use a PBO for asynchronous download, so I think they are better than method 1 (my tests confirm this). But between methods 2 and 3, which is better?

I would say method 2 seems to allow more parallelism between RenderScene() and glReadPixels().
Why do you need the copyBuffer?

glReadPixels is a blocking call if you pass it a pointer to system memory. But if you bind a PBO before the glReadPixels call, the function returns immediately.

The best way is PBO double-buffering: post the glReadPixels into pbo1 and read (memcpy) the previous frame from pbo2. Then just swap the pbo1 and pbo2 names.
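To make the ordering concrete, here is a minimal CPU-only sketch of that ping-pong scheme. It does not call OpenGL: two plain arrays stand in for the two PBOs, and the comments name the GL call each step is meant to represent. The `Readback` type and the helpers `post_read`, `fetch_previous`, `swap_buffers` and `run_frame` are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PBO_COUNT = 2, FRAME_BYTES = 4 };

/* CPU-only stand-in for two pixel pack buffers (hypothetical type). */
typedef struct {
    unsigned char storage[PBO_COUNT][FRAME_BYTES];
    int filled[PBO_COUNT]; /* 1 once a read has been posted into that buffer */
    int write_index;       /* buffer that receives this frame's read */
} Readback;

static void readback_init(Readback *rb) {
    assert(rb != NULL);
    memset(rb, 0, sizeof *rb);
}

/* Stands for: bind the "write" PBO, then glReadPixels() into it. */
static void post_read(Readback *rb, unsigned char frame_value) {
    assert(rb != NULL);
    memset(rb->storage[rb->write_index], frame_value, FRAME_BYTES);
    rb->filled[rb->write_index] = 1;
}

/* Stands for: bind the other PBO, map it, memcpy, unmap.
   Returns 1 if dst received a frame, 0 if no earlier frame exists yet. */
static int fetch_previous(const Readback *rb, unsigned char *dst) {
    assert(rb != NULL && dst != NULL);
    int read_index = 1 - rb->write_index;
    if (!rb->filled[read_index]) {
        return 0;
    }
    memcpy(dst, rb->storage[read_index], FRAME_BYTES);
    return 1;
}

/* Stands for: swapping the two PBO names. */
static void swap_buffers(Readback *rb) {
    assert(rb != NULL);
    rb->write_index = 1 - rb->write_index;
}

/* One frame: post this frame's read, fetch the previous frame, swap. */
static int run_frame(Readback *rb, unsigned char frame_value, unsigned char *dst) {
    post_read(rb, frame_value);
    int got = fetch_previous(rb, dst);
    swap_buffers(rb);
    return got;
}
```

The point of the model is the one-frame latency: the first call to `run_frame` has nothing to fetch, and every later call returns the data posted one frame earlier, so the map never touches the buffer that was just handed to glReadPixels.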

As yooyo already hinted, you won't see a performance gain with your code, because there is no parallelism. It is true that in your second method the ReadPixels starts asynchronously, but calling MapBuffer immediately thereafter blocks the CPU until the read is complete. Similarly, in the third method the rendering of the scene has to wait until the read operation is complete. Either use two buffers (PBO double-buffering), as yooyo suggested, or introduce some heavy CPU work in your method 2 (insert SomeHeavyCPUCode() between ReadPixels and MapBuffer).
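A rough way to picture this is a simplified model, not a measurement: assume the transfer takes a fixed time and that the map call simply waits for whatever part of it is still outstanding. The function name `map_stall_ms` and both parameters are hypothetical, and real drivers may behave differently.

```c
#include <assert.h>

/* Simplified model: milliseconds the CPU would wait inside the map call,
   given how long the transfer takes and how much other work runs between
   glReadPixels() and the map. Both inputs are assumed values. */
static double map_stall_ms(double transfer_ms, double work_between_ms) {
    assert(transfer_ms >= 0.0 && work_between_ms >= 0.0);
    double remaining = transfer_ms - work_between_ms;
    return remaining > 0.0 ? remaining : 0.0;
}
```

Under this model, mapping right after the read (`work_between_ms` of 0) pays the full transfer time, while any work at least as long as the transfer hides it completely.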

Sorry, the method 2 code should look like this:

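void RenderFunc()
{
    RenderScene();
    // Start an asynchronous read of this frame into pbo0.
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo0);
    glReadPixels();
    // Map pbo1, which holds the previous frame, and copy it out.
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo1);
    BYTE *pMapBuffer = glMapBufferARB();
    copyBuffer(pDstBuffer, pMapBuffer);
    glUnmapBufferARB();
    // Exchange the roles of the two PBOs for the next frame.
    swap(pbo0, pbo1);
}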

I should explain one fact: RenderFunc() is called every 80 ms. So in method 2 there should be no waiting, because the PBO being mapped was filled by the glReadPixels() call issued 80 ms earlier, and the download should be complete within 80 ms. In method 3 the RenderScene() call takes 5-6 ms, so does that count as the “HeavyCPUCode”?

It is very strange: on my machine, method 3 is the fastest, and the key seems to be the glReadPixels() call. In method 2, if RenderScene() is not heavy work, glReadPixels() returns immediately, but if it is heavy, glReadPixels() takes about 5 ms. In method 3, glReadPixels() always returns immediately. Can anyone explain this?