The fastest way to get pixels from the display card

My app needs to read pixels back from the display card. I use glReadPixels() to do it, but the performance of that function doesn't satisfy me. Is there any OpenGL extension that serves the same purpose but is faster than glReadPixels()?

NV_pixel_data_range significantly helped us with glReadPixels performance… not necessarily that the call was faster, but that it could execute asynchronously.
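Roughly, the read path with that extension looks like the following. This is a sketch only: it assumes the NV_pixel_data_range and NV_fence entry points have already been fetched with wglGetProcAddress, that width and height describe the region being read, and the wglAllocateMemoryNV frequency/priority values are just example numbers.

// Allocate driver-optimized memory for the readback (parameter values are only an example).
GLubyte *pdrBuffer = (GLubyte *)wglAllocateMemoryNV(width * height * 4,
                                                    1.0f,   /* read frequency */
                                                    0.0f,   /* write frequency */
                                                    1.0f);  /* priority */

// Declare the range that readbacks may target and enable the read PDR path.
glPixelDataRangeNV(GL_READ_PIXEL_DATA_RANGE_NV, width * height * 4, pdrBuffer);
glEnableClientState(GL_READ_PIXEL_DATA_RANGE_NV);

GLuint fence;
glGenFencesNV(1, &fence);

// With PDR enabled, this call can return immediately and copy asynchronously.
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, pdrBuffer);
glSetFenceNV(fence, GL_ALL_COMPLETED_NV);

// ... do other CPU work here ...

// Block until the readback has actually finished before touching pdrBuffer.
glFinishFenceNV(fence);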

Aren’t there like 3 identical questions on this board right now? Curious.

Anyway, to read back with reasonable speed, you should make sure that you ask for the same format of pixels that your card uses internally, and ideally with the same alignment.

This typically (for 32-bit contexts on x86 machines) translates into the GL_BGRA, GL_UNSIGNED_BYTE external pixel format. If you try to read back anything else, or if you read back into an unaligned buffer, or if your internal format is not 32-bit RGBA, you'll get a slow software path on many cards.
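For example, a readback that matches a typical 32-bit framebuffer might look like this (a minimal sketch; width and height are assumed to be the dimensions of the area you want, and the buffer must be large enough):

// 4-byte pixels with 4-byte-aligned rows match the common 32-bit framebuffer layout.
glPixelStorei(GL_PACK_ALIGNMENT, 4);

GLubyte *pixels = (GLubyte *)malloc(width * height * 4);

// BGRA / unsigned byte lets the driver do a straight copy instead of a format conversion.
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, pixels);

(GL_BGRA needs OpenGL 1.2 or the EXT_bgra extension, which any reasonably recent card exposes.)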

Thanks, OneSadCookie and jwatte, but what I want to know is the read-back bandwidth. The result I get is only 2-3 MB per second, which I can't believe! I think my method may have some problem, so could you please tell me what bandwidth I should expect for an AGP read-back?

JWatte, do you think the destination buffer should be aligned to any specific amount?

You might want to try what is mentioned in the green book on page 403. It certainly makes a difference for me. Just in case you don't have the green book, it suggests minimizing the per-fragment operations during read/draw and copy pixel operations.

glDisable(GL_ALPHA_TEST);
glDisable(GL_LIGHTING);
glDisable(GL_LOGIC_OP);
glDisable(GL_TEXTURE_1D);
glDisable(GL_TEXTURE_2D);
glDisable(GL_DITHER);
glDisable(GL_STENCIL_TEST);
glDisable(GL_DEPTH_TEST); // seems to have a tremendous effect on performance
glDisable(GL_BLEND);
glDisable(GL_FOG);
glBindTexture(GL_TEXTURE_2D,0); // seriously affects the CopyPixels performance
glBlendFunc(GL_ONE,GL_ZERO);

glPixelZoom(1.0,1.0);
/* Disable all unnecessary pixel transfer modes */
glPixelTransferi(GL_MAP_COLOR, GL_FALSE);
glPixelTransferi(GL_MAP_STENCIL, GL_FALSE);
glPixelTransferi(GL_INDEX_SHIFT, 0);
glPixelTransferi(GL_INDEX_OFFSET, 0);

glPixelTransferf(GL_RED_SCALE,1.0);
glPixelTransferf(GL_GREEN_SCALE,1.0);
glPixelTransferf(GL_BLUE_SCALE,1.0);
glPixelTransferf(GL_ALPHA_SCALE,1.0);
glPixelTransferf(GL_DEPTH_SCALE,1.0);

glPixelTransferf(GL_RED_BIAS,0.0);
glPixelTransferf(GL_GREEN_BIAS,0.0);
glPixelTransferf(GL_BLUE_BIAS,0.0);
glPixelTransferf(GL_ALPHA_BIAS,0.0);
glPixelTransferf(GL_DEPTH_BIAS,0.0);
/* Pixel store alignment */
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
glPixelStorei(GL_PACK_ALIGNMENT, 1);

Hope this helps.

Heath.

There was some article at nvidia about using GDI functions for getting back the front buffer. Look it up.

It’s better than a plain glReadPixels

From the faq on the NVidia site:

"BGRA is and always has been the fastest format to use. (There are some cases where RGBA is OK, and usually BGR is better than RGB, but in general, BGRA is the safest mode.)

The fastest performance you'll get on a readback is approximately 160-180 MB/s (~45 MPix/s) for RGBA/BGRA, which is the GPU hardware limit (due to PCI reads on the memory interface). This is with a P4 1.5GHz and above class system. The readback rate doesn't change significantly with the GeForce FX family. Note that you'll get the highest performance when you read back large areas as opposed to small ones."
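If you want to compare your own numbers against that figure, a rough timing loop is enough. This is just a sketch: it assumes a current GL context and a pre-allocated width*height*4 byte buffer, and clock() is coarse (it measures CPU time on some platforms), so substitute a high-resolution timer for serious measurements.

#include <stdio.h>
#include <time.h>

/* Read the same region many times and report an approximate readback rate in MB/s. */
void measure_readback(int width, int height, void *buffer)
{
    const int iterations = 100;
    int i;
    double seconds, megabytes;
    clock_t start = clock();

    for (i = 0; i < iterations; ++i)
        glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, buffer);
    glFinish();  /* make sure the last read has really completed */

    seconds   = (double)(clock() - start) / CLOCKS_PER_SEC;
    megabytes = (double)iterations * width * height * 4.0 / (1024.0 * 1024.0);
    printf("readback: %.1f MB/s\n", megabytes / seconds);
}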

Originally posted by V-man:
[b]There was some article at nvidia about using GDI functions for getting back the front buffer. Look it up.

It’s better than a plain glReadPixels[/b]

I can’t find the article, how much faster is it?

Originally posted by Adrian:
I can’t find the article, how much faster is it?

For source code, look at the NVidia SDK under Demos\OpenGL\src\shared\MovieMaker.cpp

Avi

If you do a glDisable(GL_BLEND); then the glBlendFunc() call is irrelevant. To be consistent, why doesn't the code also change the AlphaFunc and DepthFunc to something easy?

Regarding alignment, I’d assume each row needs to be aligned on at least 4 bytes. The next bigger alignment size that might make sense is 8 bytes; the next up is cacheline size; the next up is page size. I don’t think anything > 8 bytes alignment is likely to matter.
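If you want to experiment with alignment, one simple trick is to over-allocate with malloc and round the pointer up yourself. This is just a sketch; aligned_alloc_simple and aligned_free_simple are made-up helper names, and 16 is only an example alignment.

#include <stdlib.h>

/* Allocate size bytes aligned to "alignment" (a power of two), keeping the
   original malloc pointer just before the aligned block so it can be freed. */
void *aligned_alloc_simple(size_t size, size_t alignment)
{
    unsigned char *raw = (unsigned char *)malloc(size + alignment + sizeof(void *));
    unsigned char *aligned;

    if (!raw)
        return NULL;

    aligned = (unsigned char *)
        (((size_t)(raw + sizeof(void *)) + alignment - 1) & ~(alignment - 1));
    ((void **)aligned)[-1] = raw;   /* stash the original pointer for freeing */
    return aligned;
}

void aligned_free_simple(void *p)
{
    if (p)
        free(((void **)p)[-1]);
}

Pass the returned pointer straight to glReadPixels, e.g. aligned_alloc_simple(width * height * 4, 16), and release it with aligned_free_simple().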

Yes that’s it!
http://cvs1.nvidia.com/DEMOS/OpenGL/inc/shared/MovieMaker.h
http://cvs1.nvidia.com/DEMOS/OpenGL/src/shared/MovieMaker.cpp
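The core of that capture path (not the actual MovieMaker code, just a rough sketch of a typical GDI BitBlt capture; hwnd, width and height are assumed to describe your GL window) looks something like this:

#include <windows.h>

// Grab the visible contents of the GL window through GDI instead of glReadPixels.
HDC        windowDC = GetDC(hwnd);
HDC        memDC    = CreateCompatibleDC(windowDC);
void      *pixels   = NULL;
BITMAPINFO bmi;
HBITMAP    dib;
HGDIOBJ    old;

ZeroMemory(&bmi, sizeof(bmi));
bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
bmi.bmiHeader.biWidth       = width;
bmi.bmiHeader.biHeight      = -height;   /* negative height = top-down rows */
bmi.bmiHeader.biPlanes      = 1;
bmi.bmiHeader.biBitCount    = 32;        /* 32-bit BGRA, same layout as the framebuffer */
bmi.bmiHeader.biCompression = BI_RGB;

dib = CreateDIBSection(windowDC, &bmi, DIB_RGB_COLORS, &pixels, NULL, 0);
old = SelectObject(memDC, dib);

// Copy the window contents into the DIB; "pixels" then holds the BGRA data.
BitBlt(memDC, 0, 0, width, height, windowDC, 0, 0, SRCCOPY);

SelectObject(memDC, old);
/* ... use pixels ... */
DeleteObject(dib);
DeleteDC(memDC);
ReleaseDC(hwnd, windowDC);

Note that this reads whatever is visible in the window, which matches the "front buffer" use described above.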

Thanks, yes I have seen that before. I was hoping there would be some information as to how and why it is supposedly faster. I'm a little sceptical. If it were faster I would have expected to find information and benchmarks via Google. I would also expect NVidia's ReadPixels FAQ to recommend it as an alternative method, but there is no mention of it. It doesn't add up.

Originally posted by jwatte:
I don’t think anything > 8 bytes alignment is likely to matter.

I don’t know about Windows/Linux, but on the Mac, it helps a great deal to have pointers 16-byte aligned (many system routines use AltiVec, and AltiVec requires 16-byte-aligned pointers), and it helps a great deal to align large buffers to the size of a cacheline (32 bytes for G3 & G4; 128 bytes for G5). Page alignment seems mostly to be overkill, but at least you’re guaranteed that it’s aligned the best way possible.

OneSadCookie,

On x86, it also helps to align buffers on 16 bytes if you want to use parallel instructions. Unfortunately, the data bus of the CPU is only 64 bits wide, so the wider alignment won’t give you any speed in copy operations.

Similarly, we’re copying large chunks from uncacheable memory to cacheable system memory (assuming it goes through the CPU) so the only benefit of aligning on cache lines would be avoiding the partial cache line eviction at the beginning/end of the large block – but the cost of the block would totally dwarf that.

If you manage to hit a fully-DMA path on the hardware, then the hardware doesn’t even see the cache, so anything more than 4-byte alignment would probably not be necessary – make that 8 for good measure :)

To V-man:
You say the method of using GDI is better than glReadPixels(). Why? I have browsed the code, and I know the core of it is a screen capture, but do you really think a screen capture is faster than an API that operates on the hardware directly? I have always thought the performance of GDI functions was not good, and MS has since released GDI+, so I think the NVSDK method is not better than glReadPixels(). Do you think that's right?

Originally posted by pango:
To V-man:
You say the method of using GDI is better than glReadPixels(). Why? I have browsed the code, and I know the core of it is a screen capture, but do you really think a screen capture is faster than an API that operates on the hardware directly? I have always thought the performance of GDI functions was not good, and MS has since released GDI+, so I think the NVSDK method is not better than glReadPixels(). Do you think that's right?

You can always benchmark and see for yourself. I haven't benchmarked it, but I think (and others have said so) that it is faster.

#1 GDI is hardware accelerated (some functions may not be available)
#2 GDI+ is GDI with a few extras, plus it is OO. The primary reason for its existence is OO design, not performance or hw accel.

Originally posted by V-man:
[b] You can always benchmark and see for yourself. I haven't benchmarked it, but I think (and others have said so) that it is faster.

#1 GDI is hardware accelerated (some functions may not be available)
#2 GDI+ is GDI with a few extras, plus it is OO. The primary reason for its existence is OO design, not performance or hw accel.

[/b]

I benchmarked it a while back (before I knew of NV_pixel_data_range) and GDI was much faster. I'm not sure how it compares with the extension, but it's important to test. I don't remember the actual numbers.

The reason the GDI func was fast, I recall, was that it’s used for MS Video for Windows, which was a high priority for MS (both fast reads and writes to the framebuffer for obvious reasons).

Avi

I haven’t had the chance to look at the GDI specs, but will it allow you to specify the type of pixels you want to read back? (GL_READ, GL_DEPTH etc?)

~Main
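For what it's worth, a GDI BitBlt only sees the window's visible colour contents, while glReadPixels itself can already read depth or stencil. A minimal depth readback (just a sketch; width and height are assumed) would be:

// Read the depth buffer back as floats.
GLfloat *depth = (GLfloat *)malloc(width * height * sizeof(GLfloat));
glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, depth);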