Reading the framebuffer - fast

For reasons I won’t go into, I need to read the entire frame buffer, RGBA, every frame, and blit it to another, non-GL graphics card in the same system.

The primary GL card is an Nvidia (GF2 or 3). glReadPixels() seems to be pathetically slow. I hope I’m doing it right (using 8_8_8_8 packed formats etc.).

Is there a better way of doing this (on Win32)? Perhaps using DirectX?

Cas
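As a sanity check on feasibility, here is a back-of-envelope sketch in C. The 720×576 RGBA resolution at 50 Hz is an assumption for a PAL-sized frame; the post doesn’t state the actual figures.

```c
#include <stddef.h>

/* Assumed full-frame size; the post doesn't give the real resolution. */
enum { WIDTH = 720, HEIGHT = 576, BYTES_PER_PIXEL = 4, FPS = 50 };

/* One RGBA frame is about 1.6 MB... */
static size_t frame_bytes(void)
{
    return (size_t)WIDTH * HEIGHT * BYTES_PER_PIXEL;
}

/* ...so sustaining 50 fps means roughly 79 MB/s across the bus, once
   on the way back from the GL card and once again on the way out to
   the second card. */
static size_t bytes_per_second(void)
{
    return frame_bytes() * FPS;
}
```

On paper that is within AGP and even PCI burst rates, which suggests the bottleneck being complained about here is the driver’s readback path rather than raw bus bandwidth.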

I don’t think they have DMA channels that go that way. Well, they might, as they do have to read back textures if you use glCopyTexSubImage() and that texture then gets purged from VRAM. Or maybe that purging would just be slow.

Even if they have DMA channels going toward the bus, their drivers don’t expose them for you to actually use, AFAICT. Perhaps if you sent their devsupport an e-mail and bags of money, they could help you, assuming the hardware is at all cooperative.

You can send me the bags of money and I’ll make sure the dev support helps you out.

-SirKnight

Why does everybody think they’re doing top-secret work! :)
Half the time I find that what they’re trying to do has already been done and they just don’t know it.
What do you mean by every frame: video frame speed of, say, 60 frames a second, or monitor refresh rates of 70+?
Do you want to do any type of video processing to the image on the way from one card to the other, or maybe use a mask?
If you’re just copying data from one point to another, then maybe you should not use OpenGL at all, but an assembly routine.

Then again, there may be something in Windows DirectDraw that could do this, since Windows supports multiple video cards.

Originally posted by cix>foo:
[b]For reasons I won’t go in to I need to read the entire frame buffer, RGBA, every frame, and blit it to another, non-GL graphics card in the same system, every frame.

The primary GL card is Nvidia (GF2 or 3). glReadPixels() seems to be pathetically slow. I hope I’m doing it right (using 8_8_8_8 etc)

Is there a better way of doing this? (On Win32)? Perhaps using DirectX?

Cas [/b]

No, it’s not top secret really; it’s just to avoid a lot of anal holier-than-thou comments about not using glReadPixels and other back-across-the-bus GL methods, mainly on the strength that Nvidia’s drivers are a bit slow in this respect (an SGI Onyx, for example, runs like greased lightning, but that’s another story…)

I need to do it at PAL frame rate, which is fortunately only 50 Hz. I seem to be achieving this OK with EXT_packed_pixels, reading into AGP RAM and then using a custom memcpy routine to copy to system RAM (NEVER use memcpy! The ****ers! It does it a byte at a time!)

Oh yeah, and I couldn’t write a line of x86, so that’s out of the question. Besides, I haven’t any way of getting past the drivers, or knowledge in that respect.

Cas

[This message has been edited by cix>foo (edited 02-14-2002).]
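The custom copy routine mentioned above presumably does something along these lines. `copy_words` is an illustrative name, and note that a good compiler-supplied memcpy is normally at least this fast, so measure before swearing it off.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a pixel buffer one 32-bit word at a time instead of byte by
   byte. Assumes both buffers are 4-byte aligned and the size is a
   multiple of 4, which holds for RGBA frames. A sketch of the idea,
   not a recommendation over a well-optimised memcpy. */
static void copy_words(void *dst, const void *src, size_t bytes)
{
    uint32_t       *d = dst;
    const uint32_t *s = src;
    size_t n = bytes / 4;

    while (n--)
        *d++ = *s++;
}
```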

Originally posted by cix>foo:
[b]I need to do it at PAL frame rates which is fortunately 50hz. I seem to be achieving this OK with EXT_packed_pixels and reading into AGP ram and then a custom memcpy routine to system RAM (NEVER use memcpy! The ****ers! It does it a byte at a time!)[/b]

I cannot see any significant performance increase when I read back to AGP memory…

What is your transfer rate in MB/s? And what is your screen resolution?

– Niels

Originally posted by cix>foo:
[b]No, it’s not top secret really, it’s just to avoid a lot of anal holier-than-thou comments about not using glReadPixels and other back-across-the-bus GL methods mainly on the strength that nvidia’s drivers are a bit slow in this respect (SGI Onyx for example runs like greased lightning but that’s another story…)

I need to do it at PAL frame rates which is fortunately 50hz. I seem to be achieving this OK with EXT_packed_pixels and reading into AGP ram and then a custom memcpy routine to system RAM (NEVER use memcpy! The ****ers! It does it a byte at a time!)

Oh yeah, and I couldn’t write a line of x86 so that’s out of the question Besides I haven’t any way of getting past the drivers. Or knowledge in that respect.

Cas

[This message has been edited by cix>foo (edited 02-14-2002).][/b]

As a point of interest, this code path (glReadPixels) is accelerated on Radeon cards. It runs quite fast on Win2K and WinXP, and acceleration will be in the Win9x drivers Real Soon Now.

Originally posted by chrisATI:
as a point of interest, this code path (glReadPixels) is accelerated on Radeon cards. It runs quite fast on Win2k and WinXP and acceleration will be in Win9x drivers Real Soon Now.

Chris - when can we see official ATI drivers for Radeon on Linux?

– Niels

You know, they make hardware that converts NTSC video to PAL video.

I am not sure about the ATI cards; they offer both NTSC and PAL versions… are they hardware-fixed or software-controlled? That would be a nice feature if you needed to switch between the two.

That is what made the Amiga a nice system: one piece of hardware to do it all…


Originally posted by cix>foo:
… methods mainly on the strength that nvidia’s drivers are a bit slow in this respect (SGI Onyx for example runs like greased lightning but that’s another story…)

I timed this operation at around 4.4 ms for PAL field resolution.
AFAIR the Onyx is not much faster, if at all.
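Taking the 4.4 ms at face value, and assuming “PAL field resolution” means 720×288 RGBA (the post doesn’t say exactly), the implied sustained readback rate works out to roughly 180 MB/s:

```c
/* Assumed field size: half the lines of a 720x576 PAL frame. */
enum { FIELD_W = 720, FIELD_H = 288, BPP = 4 };

/* Readback rate in MB/s implied by a given transfer time. */
static double implied_mb_per_sec(double seconds)
{
    double bytes = (double)FIELD_W * FIELD_H * BPP; /* 829,440 per field */
    return bytes / seconds / (1024.0 * 1024.0);
}
```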

Chris, that’s interesting; we’ve got a Radeon kicking about, so we could try it out. Say, why have you not got a GL card which will output YUV and a separate linear key (e.g. the alpha buffer) as a composite signal? You do realise the broadcast market would practically die from heart attacks if you did? (Needs genlock and an SDI option too, mind.) Only Matrox have an offering that does this (the CG2000), but to the best of my knowledge it has no GL support, just D3D.

Moshe, I haven’t timed it yet; to be honest it’s second-hand info from the BBC’s 3D graphics team, who use an Onyx2, but he was fairly adamant that the Onyx could read back pretty fast. Unless of course he doesn’t know about EXT_packed_pixels and the like on PC hardware, in which case it’ll take him rather longer than 4 ms.

I seem to be getting an acceptable frame rate, so no worries right now. The big worry is blitting it down the PCI bus to a Matrox DigiSuite. If this takes 4 ms as well, it means there’s less than 10 ms to render the whole frame. Normally that’s not too much of a problem, but we aren’t allowed a single glitch or dropped frame, so we’ve got to err on the side of caution.

Cas

Originally posted by cix>foo:

Moshe, I haven’t timed it yet, and to be honest it’s second hand info from the BBC’s 3d graphics team who use an Onyx 2 but he was fairly adamant that the Onyx could read pretty fast. Unless of course he doesn’t know about EXT_packed_pixels and the like on PC hardware in which case it’ll take him rather longer than 4ms.

No, I was not using packed pixels when I timed this; just GL_UNSIGNED_BYTE on the host memory side and a 32-bit window buffer. I hope you understand that my 4 ms time is for the Nvidia card, not the Onyx!


The big worry is blitting it down the PCI bus to a Matrox Digisuite. If this takes 4ms as well it means there’s less than 10ms to render the whole frame. Normally not too much of a problem but we aren’t allowed a single glitch or dropped frame so we’ve got to err on the side of caution.

IMHO, a decent video PCI board API should not make you wait for the transfer: you should issue the command and let it DMA on its own without blocking your process. Sadly, GL doesn’t allow for this, but on the PCI side it should run in parallel.
Lucky for us, we don’t have to use the readback method at all…
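The non-blocking scheme described above can be sketched against a hypothetical board API. `begin_transfer` and `wait_transfer` are made-up names, and the “DMA” here is simulated with a plain memcpy, so the overlap is only notional; a real board driver would return immediately from `begin_transfer` and do the work in the background.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES (720 * 576 * 4)

static uint8_t board_memory[FRAME_BYTES];

/* Stand-in for a hypothetical async API: kicks off a transfer to the
   board. In this sketch it just copies synchronously. */
static void begin_transfer(const uint8_t *src)
{
    memcpy(board_memory, src, FRAME_BYTES);
}

/* A real API would block here until the DMA engine signals done. */
static void wait_transfer(void)
{
}

/* Double-buffered pump: start shipping frame N to the board, then
   immediately render/read frame N+1 into the other buffer. Returns
   the first byte of board memory so the pattern can be checked. */
static int run_demo(int nframes)
{
    static uint8_t bufs[2][FRAME_BYTES];
    for (int i = 0; i < nframes; i++) {
        uint8_t *cur = bufs[i & 1];
        memset(cur, i, FRAME_BYTES);  /* stand-in for glReadPixels */
        if (i > 0)
            wait_transfer();          /* previous frame must be done */
        begin_transfer(cur);
    }
    wait_transfer();
    return board_memory[0];
}
```

With two buffers ping-ponging like this, the bus transfer of one frame hides behind the rendering and readback of the next.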

You do know about the NewTek Video Toaster card for the PC? It turns a PC into a real-time video editing machine/genlock/etc.
It has been around for some time; lots of TV stations use it for putting text/graphics on the screen for live video feeds.


Nexusone,
I do know about the Video Toaster etc., and indeed most other proprietary solutions in the industry, after spending a hell of a lot of time researching! The point we’re trying to get across to the BBC is to stop spending all this money on weirdo proprietary stuff and go with an open API. Starting with OpenGL, running on Java; OpenML later, when someone actually releases hardware that supports it.

SGI should be quaking in their boots; the days of expensive proprietary hardware are numbered.

Cas

That is how the Video Toaster came about: TV stations needed a low-cost solution for video graphics and overlay.

I wish you luck; that is how companies get started, with an idea.

BTW, my major is electronics engineering; computers and graphics are a hobby. But in one project a few years back, I designed a real-time video frame grabber.


If you can figure out how to tweak a GeForce so we can genlock it, we’d pay you a lot of money.

Cas

You would have more luck using an ATI Radeon All-in-Wonder card. ATI has been big in video playback and processing chipsets.
The GeForce is more targeted at the 3D gaming market.


I really couldn’t care less which hardware it was, so long as it was under a grand and had working, reliable GL drivers for more than one OS.

And I’d pay £2k if it had a composite linear key output and YUV.

And £3k if it had SDI.

Then I’d buy about, ooh, forty of them. To begin with.

Cas

Originally posted by cix>foo:
If you can figure out how to tweak a geforce so we can genlock it we’d pay you a lot of money

The Nvidia boards (Quadro2 Pro) that SGI used in their Graphics Clusters were able to synchronize [with something called ImageSync], but I’m not really sure whether this would be accurate enough for TV production. You could take a look at the 3Dlabs Wildcat II/Wildcat III; they come with a genlock option.

– Niels

The Wildcat is sadly useless, as it costs £2,500 and doesn’t have a separate linear key output, so we’d probably just be better off using a cheapo GeForce and doing the framebuffer copy to a genlockable PCI card. Which is what we’re attempting.

Cas