Reading the framebuffer - fast



cix>foo
02-11-2002, 03:44 PM
For reasons I won't go into, I need to read the entire frame buffer, RGBA, every frame, and blit it to another, non-GL graphics card in the same system.

The primary GL card is an NVIDIA (GF2 or 3). glReadPixels() seems to be pathetically slow. I hope I'm doing it right (using 8_8_8_8 etc.).

Is there a better way of doing this (on Win32)? Perhaps using DirectX?

Cas :)

jwatte
02-11-2002, 03:47 PM
I don't think they have DMA channels that go that way. Well, they might, as they do have to read back textures if you use glCopyTexSubImage() and that texture then gets purged from VRAM. Or maybe that purging would just be slow.

Even if they have DMA channels going toward the bus, their drivers don't expose them for you to actually use, AFAICT. Perhaps if you sent their devsupport e-mail and bags of money, they could help you -- assuming the hardware is at all cooperative.

SirKnight
02-11-2002, 05:53 PM
You can send me the bags of money and I'll make sure the dev support helps you out. ;)

-SirKnight

nexusone
02-14-2002, 05:46 AM
What, does everybody think they are doing top-secret work? :)
Half the time I find that what they are trying to do has already been done and they just don't know it.
What do you mean by every frame: video frame rates of, say, 60 frames a second, or monitor refresh rates of 70+?
Do you want to do any type of video processing on the image on its way from one card to the other, or maybe use a mask?
If you are just copying data from one point to another, then maybe you should not use OpenGL at all, but an assembly routine.

Then again, there may be something in Windows DirectDraw that could do this, since Windows supports multiple video cards.



cix>foo
02-14-2002, 06:37 AM
No, it's not top secret really; it's just to avoid a lot of anal holier-than-thou comments about not using glReadPixels and other back-across-the-bus GL methods, mainly on the strength that NVIDIA's drivers are a bit slow in this respect :) (SGI Onyx, for example, runs like greased lightning, but that's another story...)

I need to do it at PAL frame rates, which is fortunately 50 Hz. I seem to be achieving this OK with EXT_packed_pixels, reading into AGP RAM and then using a custom memcpy routine to copy to system RAM (NEVER use memcpy! The ****ers! It does it a *byte* at a time!)

Oh yeah, and I couldn't write a line of x86, so that's out of the question :) Besides, I haven't any way of getting past the drivers. Or knowledge in that respect.

Cas :)



[This message has been edited by cix>foo (edited 02-14-2002).]

Husted
02-14-2002, 09:28 AM
Originally posted by cix>foo:
I need to do it at PAL frame rates, which is fortunately 50 Hz. I seem to be achieving this OK with EXT_packed_pixels, reading into AGP RAM and then using a custom memcpy routine to copy to system RAM (NEVER use memcpy! The ****ers! It does it a *byte* at a time!)

I cannot see any significant performance increase if I read back to AGP memory...

What is your transfer rate in MB/s? And what is your screen resolution?

-- Niels
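
For a sense of scale, the transfer rate asked about here can be estimated from the frame dimensions. The sketch below is only illustrative: the thread never states the resolution, so the 720x576 PAL frame (or 720x288 field) is an assumption, and `readback_bytes_per_second` is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second that must cross the bus if every frame is read back.
 * All four parameters are supplied by the caller; nothing here is measured. */
uint64_t readback_bytes_per_second(uint32_t width, uint32_t height,
                                   uint32_t bytes_per_pixel,
                                   uint32_t frames_per_second)
{
    assert(bytes_per_pixel > 0);
    /* Widen before multiplying so large frames cannot overflow 32 bits. */
    return (uint64_t)width * height * bytes_per_pixel * frames_per_second;
}
```

Under those assumptions, RGBA frames of 720x576 at 25 Hz and RGBA fields of 720x288 at 50 Hz both come to 41,472,000 bytes per second, roughly 41 MB/s, and that is before any second copy across the PCI bus to the output card.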

chrisATI
02-14-2002, 09:30 AM
As a point of interest, this code path (glReadPixels) is accelerated on Radeon cards. It runs quite fast on Win2k and WinXP, and acceleration will be in the Win9x drivers Real Soon Now.

Husted
02-14-2002, 09:50 AM
Originally posted by chrisATI:
as a point of interest, this code path (glReadPixels) is accelerated on Radeon cards. It runs quite fast on Win2k and WinXP and acceleration will be in Win9x drivers Real Soon Now.

Chris - when can we see official ATI drivers for Radeon on Linux?

-- Niels

nexusone
02-14-2002, 10:00 AM
You know they make hardware that converts NTSC video to PAL video.

I am not sure about the ATI cards; they offer both NTSC and PAL versions. Is the standard fixed in hardware or software controlled?
That would be a nice feature if you needed to switch between the two.

That is what made the Amiga a nice system, one hardware to do them all....



Moshe Nissim
02-14-2002, 11:43 AM
Originally posted by cix>foo:
... methods mainly on the strength that nvidia's drivers are a bit slow in this respect :) (SGI Onyx for example runs like greased lightning but that's another story...)


I timed this operation at around 4.4 ms for PAL field resolution.
AFAIR Onyx is not much faster, if at all.

cix>foo
02-15-2002, 02:53 AM
Chris, that's interesting; we've got a Radeon kicking about so we could try it out. Say, why have you not got a GL card which will output YUV and a separate linear key (e.g. the alpha buffer) as a composite signal? You do realise the broadcast market would practically die from heart attacks if you did? (Needs genlock and an SDI option too, mind.) Only Matrox have an offering that does this (CG2000), but to the best of my knowledge it has no GL support, just D3D.


Moshe, I haven't timed it yet, and to be honest it's second-hand info from someone on the BBC's 3D graphics team, who use an Onyx 2, but he was fairly adamant that the Onyx could read pretty fast. Unless of course he doesn't know about EXT_packed_pixels and the like on PC hardware, in which case it'll take him rather longer than 4 ms.

I seem to be getting an acceptable framerate so no worries right now. The big worry is blitting it down the PCI bus to a Matrox Digisuite. If this takes 4 ms as well it means there's less than 10 ms to render the whole frame. Normally not too much of a problem, but we aren't allowed a single glitch or dropped frame so we've got to err on the side of caution.

Cas :)
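
The time budget described in this post is simple arithmetic and can be written down directly. A minimal sketch, assuming a 50 Hz output rate and the roughly 4 ms figures quoted in the thread for the readback and for the PCI blit; `render_budget_ms` is a hypothetical helper, not part of any API.

```c
#include <assert.h>

/* Milliseconds left for rendering once both copies have been paid for.
 * rate_hz     : output rate (50 for PAL fields)
 * readback_ms : time to read the framebuffer back from the GL card
 * blit_ms     : time to push the image over PCI to the output card */
double render_budget_ms(double rate_hz, double readback_ms, double blit_ms)
{
    assert(rate_hz > 0.0);
    double period_ms = 1000.0 / rate_hz;   /* 20 ms at 50 Hz */
    return period_ms - readback_ms - blit_ms;
}
```

With 4 ms for each copy this leaves 12 ms of the 20 ms period on paper. The more cautious "less than 10 ms" in the post presumably reserves headroom, since a single dropped frame is not acceptable.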

Moshe Nissim
02-15-2002, 04:07 AM
Originally posted by cix>foo:

Moshe, I haven't timed it yet, and to be honest it's second hand info from the BBC's 3d graphics team who use an Onyx 2 but he was fairly adamant that the Onyx could read pretty fast. Unless of course he doesn't know about EXT_packed_pixels and the like on PC hardware in which case it'll take him rather longer than 4ms.

No, I was not using packed pixels when I timed this. Just GL_UNSIGNED_BYTE on the host memory side, and a 32-bit window buffer. I hope you understand that my 4ms time is for the nVidia, not the Onyx!



... The big worry is blitting it down the PCI bus to a Matrox Digisuite. If this takes 4ms as well it means there's less than 10ms to render the whole frame. Normally not too much of a problem but we aren't allowed a single glitch or dropped frame so we've got to err on the side of caution.


IMHO, a decent PCI video board API should not make you wait for the transfer; you should issue the command and let it DMA on its own without blocking your process. Sadly, GL doesn't allow for this. But for the PCI transfer, it should run in parallel.
Lucky for us, we don't have to use the readback method at all...
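
The two host-side formats compared in this post differ in how a pixel is laid out in memory. As a sketch (the function names are made up for illustration): with GL_RGBA and GL_UNSIGNED_BYTE each pixel is four separate bytes in R, G, B, A order, while the GL_UNSIGNED_INT_8_8_8_8 type from EXT_packed_pixels treats the pixel as one 32-bit word with the first component in the most significant bits.

```c
#include <assert.h>
#include <stdint.h>

/* One RGBA pixel as a single 32-bit word, GL_UNSIGNED_INT_8_8_8_8 style:
 * R in bits 31..24, G in 23..16, B in 15..8, A in 7..0. */
uint32_t pack_rgba_8_8_8_8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b << 8)  |  (uint32_t)a;
}

/* The same pixel as four consecutive bytes, GL_UNSIGNED_BYTE style:
 * memory order is R, G, B, A regardless of CPU endianness. */
void store_rgba_bytes(uint8_t out[4], uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    assert(out != 0);
    out[0] = r;
    out[1] = g;
    out[2] = b;
    out[3] = a;
}
```

Which layout matches the card's native framebuffer format, and so avoids a per-pixel conversion in the driver, depends on the hardware and driver; the thread does not settle it.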

nexusone
02-15-2002, 06:20 AM
You do know about the NewTek Video Toaster card for the PC? It turns a PC into a real-time video editing machine/genlock/etc.
It has been around for some time; lots of TV stations use it for putting text/graphics on the screen for live video feeds.



cix>foo
02-15-2002, 11:12 AM
Nexusone,
I do know about the Video Toaster etc., and indeed most other proprietary solutions in the industry, after spending a hell of a lot of time researching! The point we're trying to get across to the BBC is to stop spending all this money on weirdo proprietary stuff and go with an open API. Starting with OpenGL, running on Java; OpenML later, when someone actually releases hardware that supports it.

SGI should be quaking in their boots; the days of expensive proprietary hardware are numbered.

Cas :)

nexusone
02-15-2002, 11:45 AM
That is how the Video Toaster came about: TV stations needed a low-cost solution for video graphics and overlay.

I wish you luck; that is how companies get started, with an idea.


BTW my major is electronics engineering; computers and graphics are a hobby. But for one project a few years back, I designed a real-time video frame grabber.




cix>foo
02-15-2002, 01:40 PM
If you can figure out how to tweak a GeForce so we can genlock it we'd pay you a lot of money :)

Cas :)

nexusone
02-15-2002, 02:24 PM
You would have more luck using an ATI Radeon All-in-Wonder card. ATI has been big in video playback and processing chipsets.
GeForce is more targeted at the 3D gaming market.


:)



cix>foo
02-16-2002, 03:30 AM
I really couldn't care less which hardware it was so long as it was under a grand and it had working, reliable GL drivers for more than one OS.

And I'd pay £2k if it had a composite linear key output and YUV.

And £3k if it had SDI.

Then I'd buy about, ooh, forty of them. To begin with.

Cas :)

Husted
02-17-2002, 11:00 PM
Originally posted by cix>foo:
If you can figure out how to tweak a geforce so we can genlock it we'd pay you a lot of money :)

The NVIDIA boards (Quadro2 Pro) that SGI used in their Graphics Clusters were able to synchronize [with something called ImageSync], but I'm not really sure if this would be accurate enough for TV production. You could take a look at the 3DLabs Wildcat II/Wildcat III; they come with a genlock option.

-- Niels

cix>foo
02-18-2002, 12:39 AM
The Wildcat's sadly useless as it costs £2,500 and doesn't have a separate linear key output, so we'd probably just be better off using a cheapo Geforce and doing the framebuffer copy to a genlockable PCI card. Which is what we're attempting.

Cas :)

nexusone
02-18-2002, 09:35 AM
I went back to double-check some specs on the ATI Radeon. It has a built-in video overlay system (genlock), with multiple video output options. So in real time you can render your 3D graphics over an incoming video signal, with both PAL and NTSC support.

They also have an SDK for video editing and playback, which looks like it could give you a good starting point.




[This message has been edited by nexusone (edited 02-18-2002).]

cix>foo
02-18-2002, 10:09 AM
Unfortunately we can't take an input; we're only allowed output. This ain't cable ya know :)

Cas :)

nexusone
02-18-2002, 10:43 AM
From this post I am lost as to what you are trying to do.
If you want to do something with a genlock-type feature, normally it is to mix two different video signals together.
If you don't need a video input, then there is no reason to use two video cards, since you are just outputting a video image.





nexusone
02-18-2002, 10:46 AM
Here is the information on the ATI video processing features: (Genlock, etc.)

http://www.ati.com/na/pages/resource_centre/dev_rel/atirdv.pdf

henryj
02-18-2002, 11:21 AM
nexusone
TV is a bit different. It's not a matter of syncing two video sources, but of outputting your video in sync with a timing pulse that is common to all station devices. The sync is for frame timing, so you don't get roll, and for colour, so that your colours match.

cix>foo
Have you tried a video overlay card and a 3D card, doing what the old 3dfx cards used to do? Take the output from the 3D card and plug it into the input on the overlay card. It'll be analogue but it might work.

nexusone
02-18-2002, 12:30 PM
I understand how video sync works!
What I am now confused by is what he is trying to do.

If you want sync, alpha or any other signal, you could access it right off the video processing chip on the ATI.

Say, on the ATI chipset, they use a second chip to perform the PAL/NTSC/SECAM conversion. Just grab the signal before this chip. I don't have the specs, but from what I have seen the chip has the following listed: RGBA and YUV, so you could grab these at the chip. You can also get the H-sync and V-sync there.

Another feature of the ATI card's genlock is the color filtering: it processes the color information of both sources to make sure the output of the combined sources is correct.

Also, since all the video timing information can be programmed in the ATI's video hardware, I don't see why an external sync source would be a problem.



jwatte
02-18-2002, 01:26 PM
cix,

I understand why you'd think the ATI consumer input converters aren't broadcast quality, but they'd be good enough to hitch a ride on house blackburst, no? Assuming the consumer TV out converters are at all useable, that is.

Set a high black level on your GL image, then go crazy on outboard mixing gear as much as you want :-)

Husted
02-19-2002, 12:07 AM
Originally posted by cix>foo:
The Wildcat's sadly useless as it costs £2,500 and doesn't have a separate linear key output, so we'd probably just be better off using a cheapo Geforce and doing the framebuffer copy to a genlockable PCI card. Which is what we're attempting.

Cas :)

Could you please recommend any cheap video cards for this?

-- Niels

cix>foo
02-19-2002, 05:21 AM
Quite an interesting topic this one eh?
We're attempting to use a Matrox Digisuite for the output. It's got SDI and YUV output, separate linear key from the alpha buffer, and genlockable. It's not a great solution because we're going to lose maybe 10ms rendering time copying the framebuffer about. It may not work yet - I start on that tomorrow. Buggers charged us £700 for the SDK as well, which is just bloody criminal of Matrox. We only want one function call.

As for ATI - well, anyone who knows exactly how to put a straight old 50 Hz genlock sync into it, and get YUV out of it so we don't have to scan convert, and get a *separate* output which is (preferably) the framebuffer alpha but otherwise could be the dualhead display output, *without* a degree in electronics, should mail me directly. If you make it work you will have earned yourself a lot of money; trouble is you have maybe 2 weeks to get a working board to me in Guildford.

We tried a luminance key but, well, black is black and when you get down to the dark reds the mixing goes into noise. Besides we need a linear key so we can make bits more transparent than others, not a luminance key.

Anyone who doesn't know exactly what we use genlock for in broadcast, know this: we use it to sync the entire studio, not just one card and an incoming picture. The entire studio goes through one big Sony digital mixer. We supply two graphic feeds: one for "supers" - that's the transparent overlays we need the keys for really, such as captions; and one for fullforms, which take up the whole screen.

There's no way they'd entrust our crappy PCs to mix the output of a live programme; all our remit is, is to produce graphics and transparency keys, and that's it.

Cas :)

cix>foo
02-19-2002, 05:24 AM
Oh yeah; if I tried to put the TV out from the ATI on air I'd probably be shot by the producer :)

BTW the Digisuite costs about £3500 for the SDI version I think. In other words, waaaay too expensive. But we're running out of choices.

Cas :)

nexusone
02-19-2002, 06:57 AM
What about the ATI S-Video output or the Digital video output?

You say you have talked to Matrox, how about to ATI? They are supporting open source, by providing tech data for the people writing Linux drivers for their cards.

I bet you would get their SDK without any charge; I am sure they are open to pushing into other areas.





[This message has been edited by nexusone (edited 02-19-2002).]

Devulon
02-20-2002, 03:20 AM
I completely agree: don't use glReadPixels and don't use memcpy. What you want to do is set up a 32-bit memory move, moving 32 bits at a time instead of 8. That in itself is a huge bump in speed. I used to have a 32-bit memcpy embedded in my old 16-bit DOS apps. Those were fun days. Basically, set up (forgive me if this is wrong) edi and esi as destination and source, set up for forward copying, and rep mov the data. Oh yeah, I believe it's ecx that gets the size (number of 32-bit longs). Something like that; I don't remember the exact x86 assembly code. But you can find the reference on Intel's web site.
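
The idea described here, moving 32 bits per step rather than 8, can be sketched in portable C. This is not the x86 `rep movsd` sequence itself (which takes the source in esi, the destination in edi and the dword count in ecx, with the direction flag cleared for a forward copy). `copy_words32` is a hypothetical name, and the sketch assumes both pointers are 4-byte aligned and the regions do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 32-bit words from src to dst, front to back.
 * Assumes 4-byte-aligned, non-overlapping buffers. */
void copy_words32(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(count == 0 || (dst != NULL && src != NULL));
    while (count--) {
        *dst++ = *src++;   /* one 32-bit load and one 32-bit store per step */
    }
}
```

For a 32-bit RGBA framebuffer, `count` is simply the number of pixels, since each pixel is exactly one word.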

jwatte
02-20-2002, 12:02 PM
Devulon,

The implementation of memcpy() uses rep movsd, which is an "accelerated" 32 bit move.

Unfortunately, it will pollute your cache if you're working with cacheable memory, but when it comes to frame buffers, it's pretty much as efficient as you can get. Intel has special hardware in their chips to make it Do The Right Thing (tm) in that case.

I believe the main problem is that cix has very specific output signal needs, and there aren't any cheap cards around that fulfill these needs. In general, the difference between "pro" and "consumer" gear is OFTEN mostly in the connectors, and the level of care taken to implement things like buffer drivers and stuff. Witness XLR vs 3.5-millimeter plugs for sound cards as an obvious example.

He thinks £3500 is too expensive for a Digisuite? Well, that would be true if there were actually some piece of hardware that does the same thing, cheaper. Haven't seen it yet :-)

cix>foo
02-20-2002, 02:52 PM
Bad news for frame buffer copies; it ain't fast enough: 17fps to the Digisuite and that's with no rendering...

Good news for all else concerned: hidden away in Matrox's product lineup is the CG2000. This has every conceivably useful output, costs not too much, and plugs directly into a G450 using a funny little ribbon cable.

More news on this when we get our hands on one, probably Monday. Stay tuned.

I hope Matrox's drivers aren't still ****.

Cas :)

marcus256
02-21-2002, 06:51 AM
I may be way off here (not sure how the CPU/PCI/AGP buses cooperate/conflict), but wouldn't it be possible to parallelize the copying, so that reading and writing can run more or less simultaneously?

Idea 1: 1 CPU, 2 threads
Thread 1 - Read from GL card
Thread 2 - Write to display card

If we have 2 frame buffers in RAM (the "in-between" buffers), it would be possible for the GL card to write to one buffer while the other thread reads from the other (old) buffer (that is, if you can afford 1 frame of delay).

Possible problem 1: I don't know if the CPU is free while doing glReadPixels (maybe it is on the Radeon, but not on the GeForce?). Can maybe be solved with a dual CPU board?

Possible problem 2: the buses may be choked already by only reading or writing, so doing both at the same time is not possible.
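
The two-buffer scheme in Idea 1 can be illustrated without any real threads or graphics calls. The sketch below is a single-threaded simulation of the bookkeeping only: on each step one buffer is filled with the new frame while the other, holding the previous frame, is handed to the output stage, which gives the one frame of delay mentioned above. The names `PingPong`, `pingpong_step` and the tiny `FRAME_BYTES` are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FRAME_BYTES = 16 };  /* tiny stand-in for a real frame size */

typedef struct {
    uint8_t  buf[2][FRAME_BYTES]; /* the two in-between buffers in RAM */
    unsigned frame_no;            /* number of frames read in so far */
} PingPong;

/* One step of the pipeline. The "read" stage fills one buffer with the
 * new frame (here just a constant byte value); the "write" stage consumes
 * the other buffer, which holds the previous frame.
 * Returns the first byte of the frame that was output, or -1 on the very
 * first step, when no completed frame exists yet. */
int pingpong_step(PingPong *pp, uint8_t new_frame_value)
{
    assert(pp != NULL);
    unsigned fill  = pp->frame_no % 2;  /* buffer being written this step */
    unsigned drain = 1 - fill;          /* buffer holding the previous frame */
    int out = -1;

    if (pp->frame_no > 0) {
        out = pp->buf[drain][0];        /* stands in for the blit to the output card */
    }
    memset(pp->buf[fill], new_frame_value, FRAME_BYTES); /* stands in for the readback */
    pp->frame_no++;
    return out;
}
```

Whether the two stages can truly overlap on real hardware depends on the bus contention raised under possible problem 2; the simulation says nothing about that.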

cass
02-21-2002, 07:55 AM
Cas,

Unless you're doing some strange packing or other pixel transferism, you should get good perf. Can you email me exactly what you're doing?

Thanks -
Cass

jwatte
02-21-2002, 09:43 AM
Marcus,

The bottleneck is not the CPU. The bottleneck is the various pathways between devices in the system, i e the PCI bus (video capture), AGP bus (graphics card) and, most importantly, memory bus.

Cix,

When you got 17 fps, what kind of system was that on? I e, what busses were the cards using, what chip set and bios, and what memory? (See my previous post for what a reasonable target might be)

cix>foo
02-21-2002, 01:54 PM
jw, I'm afraid I don't know what BIOS is in the machine off the top of my head (it's 130 miles away) except that it's a brand new modern system in a swishy black case :)

Reading into AGP RAM from the GF3 was very fast but then of course incredibly slow to copy out to the Digisuite; reading into system RAM, the framerate plummets to 17fps. Reading directly into the Digisuite's framebuffer gives us 17fps as well.

Also looking at the possibility of finding the magic genlocking connectors in the ATI Radeon 8500 All-in-Wonder thing. Perhaps just feeding it an input signal will do? Then at least we can use dualhead and a pair of cheapo scan converters.

Cas :)

cix>foo
02-21-2002, 02:38 PM
And while I'm here, where do I get the headers and extension specs for ATI's drivers, anyone?

Cas :)

jwatte
02-21-2002, 02:52 PM
The ATI web site has a developer section. You can get public specs from there. To get actually useful drivers, you gotta be in the developer program, though; you can register on the site. I've found them to be friendly, helpful, and having a lot to do, so response latencies are high.

FXO
02-21-2002, 03:00 PM
You can get the ATI OpenGL headers at:
http://www.ati.com/na/pages/resource_centre/dev_rel/sdk/RadeonSDK/Html/Info/Prog3D.html

nexusone
02-21-2002, 04:10 PM
I sent them an e-mail about the project to see if they were interested. Have not got a reply yet...




Sundy
02-22-2002, 08:27 AM
Unless OpenGL has some mechanism to access the framebuffer directly, there is no way to read it "fast", this has to be done in software unless NVIDIA or ATI wants to add some special hardware to copy frame buffer contents to system memory. Even in that case, transferring from video to system memory would be the bottleneck.

I am sure that this is one of the biggest limitations of OpenGL; if you really want to read back the frame buffer or perform some operations on it, use DirectX instead... or wait for OpenGL 2.5 or sumthin. I hope the ARB is looking at this as one of the most important needs of OpenGL...

-Sundar

farid
02-24-2002, 05:34 AM
Hi,
5d's Cyborg system uses a Wildcat for OpenGL rendering and DVS's SDstation for realtime video output. As far as I know, DVS (www.dvs.de) offer an API for Win/Linux/IRIX, and the quality of the SDI board itself is excellent. A bit pricey though (~5K UK pounds).

5d do exactly what you want to do with their Cyborg. They use Win2k, OpenGL and the DVS board. You can see on the broadcast monitor what is going on on your computer monitor, with realtime updates.

farid.

marcus256
02-25-2002, 12:32 AM
cix, what kind of memcopy are you doing? 32-bit?

I think that you can get a speed increase if you do 64-bit transfers (cast to double precision float vectors). Isn't there a "block move" instruction in the x86 instruction set? I think the Motorola 68040 had a MOVE16 instruction (move 16 32-bit words). [Come to think of it, even on the 68000 you could load or store 16 32-bit words with one instruction]

Otherwise SIMD instructions could even be faster (?), since they use 128-bit registers. Not sure, I have not benchmarked these things.

Eric
02-25-2002, 01:11 AM
On the 68000, you didn't really copy blocks with the 32-bit move like:

loop:
    move.l (a0)+,(a1)+;
    dbf d0,loop;

Instead, you did:

loop:
    movem.l (a6)+,d1-d7/a0-a4;
    movem.l d1-d7/a0-a4,(a5);
    lea 48(a5),a5;
    dbf d0,loop;


If you were clever, you could even use the a7 register (well, that was the stack so you needed to be careful with the ssp and interrupts if you were in supervisor mode !)...

Actually, on the Atari ST and Amigas, you used to pre-generate the code with loads of movem to save on the dbf... On the Atari Falcon 030, you could leave the dbf and keep the loop in the cache.

These were great days! ;)

Regards.

Eric

marcus256
02-25-2002, 02:39 AM
Originally posted by Eric:

loop:
    movem.l (a6)+,d1-d7/a0-a4;
    movem.l d1-d7/a0-a4,(a5);
    lea 48(a5),a5;
    dbf d0,loop;

Yes, that was what I meant. It could also be used for clearing memory - I think it was even faster than the hardware blitter on the Amiga 500. ;)

Originally posted by Eric:
These were great days! ;)

...when you could actually understand assembly language, 16 32-bit GPRs, flat memory addressing, memory mapped I/O, etc, etc! Someone should be punished for the x86 ISA (not just us coders).