Render to Texture vs. CopyTexSubImage

12-01-2003, 05:58 AM
I made a very small test, and contrary to what I expected, Render to Texture with context switches seems to be faster !

Would you mind testing it and posting your fps/hardware setup here ?

The two programs are located here :

Thank you for your feedback,

[This message has been edited by ZbuffeR (edited 12-01-2003).]

12-01-2003, 06:33 AM
Radeon 9700 Pro -- CoD HotFix drivers (Cat 3.9+)
AMD AthlonXP 2100

FRAPS reports approximately 211 fps on average for the CopyTexSubImage version. The P-Buffer version says it can't find the entry points for WGL_ARB_pbuffer. Oddly enough, the extension string is missing from the drivers I'm using. . .

12-01-2003, 06:42 AM
With older drivers, render-to-texture was slower. With Forceware I get 10-20% better performance.

12-01-2003, 07:07 AM
Thanks for your feedback.

to Ostsol:
It's impressive, 211 fps for CopyTexSub. As for the render to texture version, I'll soon improve the extension detection mechanism.

I didn't make it clear, but the fps are written to a text console behind the GL window.
You just have to drag the GL window a bit so you can see the fps. (I don't know for sure what FRAPS calls a 'frame').

Furthermore, could you test what fps increase you get by turning off final render (hit B) ?


12-01-2003, 07:27 AM
Ah. . . Well, it looks like FRAPS is pretty accurate, as it agrees with the value in the console window. Pressing 'b' increases the framerate to around 600 fps.

[This message has been edited by Ostsol (edited 12-01-2003).]

12-01-2003, 07:39 AM
to Ostsol :
I've updated the detection of Render to Texture, you may try it now. I can still hardly believe your 200+ fps ;-) . When I got only 5 fps on my gf3, I wanted to drop it... but if it works so well on a Radeon 9700, I may keep it.


12-01-2003, 07:43 AM
Ah. . . Works perfectly, now. I get 108 fps normally, then 220 after pressing 'b'. Quite a bit lower than the glCopyTexSubImage version. . . It looks like the p-buffer version uses a much higher resolution render target, though, so perhaps the performance difference isn't so strange.

12-01-2003, 07:45 AM
Radeon 9700 Pro, Athlon XP 2500+

Render texture: 110FPS, pressing b: 220FPS
CopyTexSubImage: 211FPS, pressing b: 780FPS

12-01-2003, 07:57 AM
Thank you all for your figures.
Yes, the CopyTex version is 128x128 instead of 512x512, because on my setup the full res CopyTex was soooo crawwwwwliing.... I will post a high res version later this week.

Anybody with other video cards ?


12-01-2003, 09:35 AM
P4 2.4GHz / GeForce FX 5900 Ultra

pbuffer : 116 fps / 'b' : 214 fps
copytexsubimage : 6 fps / 'b' : 6 fps

I had been thinking that my own CopyTexSubImage calls were very slow... is there a trick?


12-01-2003, 09:37 AM

rtt: 55/113(b)
copy: 7/7(b)

12-01-2003, 10:01 AM
Copy to texture from a pbuffer can be slow.. why don't you copy to texture from the backbuffer instead?

12-01-2003, 01:32 PM
Originally posted by Mazy:
Copy to texture from a pbuffer can be slow.. why don't you copy to texture from the backbuffer instead?

Good idea, I did not know that copying from a backbuffer rather than a pbuffer would make any difference. I will try that, thank you.

EDIT: spelling...

[This message has been edited by ZbuffeR (edited 12-01-2003).]

12-02-2003, 02:13 AM
ZBuffer : the difference will show up on my 6 fps, it will rise maybe to 400 ...


12-02-2003, 03:19 AM
So, I made a new version following Mazy's advice (copying texture from backbuffer). You can test it here:

It is the third program.
However, my Geforce3 still prefers the render to texture :

Render to texture: 37 FPS
CopyTex from pbuffer: 6 FPS (128x128)
CopyTex from backbuffer: 30 FPS

And you ?

12-02-2003, 05:19 AM
Dual Athlon 2.4 GHz, 2 GB RAM, Radeon 9700 Pro:

Render to texture: 107 FPS / 170 FPS (B)
CopyTex from pbuffer: 212 FPS / 430 FPS (B)
CopyTex from backbuffer: 90 FPS / 153 FPS (B)


12-02-2003, 05:25 AM
I would like to see the source of that one,
I have never managed to get anything faster than a copy from the backbuffer to a texture :)

12-02-2003, 05:38 AM
Has anyone tested with an fx5200? ... I tested it with an fx5200 (forceware 52.16 and amd athlon 1.8GHz) and got really low fps ...

1) 6fps/6fps
2) 31fps/41fps
3) 5fps/5fps
(no b key/b key)

not sure why this is so slow ...

12-02-2003, 07:42 AM
a)117 / 223
b)101 / 169
c)240 / 683

Radeon 9800Pro

But the resolution of test C was pretty bad.

12-02-2003, 09:29 AM
Mazy, I guess you changed the order of tests, no ?

Well, I updated the "copy from pbuffer" version so that the texture resolution is 512x512, like the others. My figures:

1) RTT : 37 fps
2) Copy from pbuffer : 0.29 fps (!)
3) Copy from backbuffer : 30 fps

Maybe RTT is the only way to go on Geforces, and Copy from back is to be used on Radeon... Any more tests with a GF FX 5200 ?

12-02-2003, 10:06 AM
heck, what's wrong with my test results then??? (using fx5200, forceware 52.16, amd 1.8GHz) For RTT, I got 6fps while getting 31 fps for copy from buffer (2nd one) ... lol

12-02-2003, 10:19 AM
aha, sorry..


thats my order

but I still think it's strange that BB copy is so slow.. sure you don't setCurrent or do any other strange stuff?

[This message has been edited by Mazy (edited 12-02-2003).]

12-02-2003, 01:40 PM
RTT - 120fps
pbuffer copy - 0.32fps
bb copy - 97fps

Which I find curious because RTT is slower in my code (but I had noticed sometime ago that CopyTexSubImage from a pbuffer was completely stuffed - not that that worried me as RTT should always be quicker).

[EDIT] geforce fx5900 Ultra - v52.16

[This message has been edited by rgpc (edited 12-02-2003).]

12-02-2003, 01:47 PM
Originally posted by Mazy:
but I still think it's strange that BB copy is so slow.. sure you don't setCurrent or do any other strange stuff?
Mazy, would you mind testing the 3 newer versions of the progs ? I'm sure that this time BB copy will be faster than pbuffer copy (equal resolutions now).

Speaking about the code, the bb version is by far the simplest, no wgl stuff, no extensions, very few gl commands. Very portable, but a bit slower than RTT on GF's.

I tried commenting out the glCopyTexSubImage call in the BB version, and I merely go from 30 fps to 42, so it is largely fillrate limited. Even if the 12 internal renderings at 512x512 are kept the same, resizing the window to 1024x768 drops the framerate to about 24, and resizing to a very small (still visible) window gives more than 1600 fps.

I still can't believe the GF3Ti200 is pushed to the max with 800x600x12 RGBA additive blended pixels. Only 5.76 megapixels per frame ? Did I do something wrong ?
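
For reference, a quick sanity check of that figure. This is only a back-of-the-envelope sketch: the 800x600 window, the 12 blended layers and the ~30 fps come from the posts above, the function names are made up for illustration, and the cost of the twelve 512x512 internal renderings is ignored.

```python
# Rough fill-rate estimate for the final blended pass only.
# Assumptions (hypothetical helper names): every one of the `layers`
# quads covers the whole window, and internal 512x512 renders are ignored.

def blended_pixels_per_frame(width, height, layers):
    """Pixels written per frame when `layers` full-window quads are blended."""
    return width * height * layers


def pixels_per_second(width, height, layers, fps):
    """Blended pixel throughput at a given frame rate."""
    return blended_pixels_per_frame(width, height, layers) * fps


per_frame = blended_pixels_per_frame(800, 600, 12)
per_second = pixels_per_second(800, 600, 12, 30)
print(f"{per_frame / 1e6:.2f} Mpixels per frame")
print(f"{per_second / 1e6:.1f} Mpixels per second at 30 fps")
```

So 5.76 million is the per-frame count; the per-second throughput at 30 fps is 30 times that.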

I will try to post my GL code later, I need to clean it a bit, sorry.

12-02-2003, 01:50 PM
Originally posted by rgpc:
RTT - 120fps
pbuffer copy - 0.32fps
bb copy - 97fps

Could you post your graphic card/system spec please ? (just a guess : GF FX 5900 ? )

I do think RTT can be faster because it does not need to actually copy texture data. As the amount of texture data is quite high (512x512x12 rgb pixels), the copy may be slower than just switching contexts 12+1 times.
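
To put a rough number on that amount of data, here is a small sketch. It assumes tightly packed pixels at 3 bytes for RGB (a driver may well store 4 bytes per pixel internally), and the 37 fps is just the RTT figure I measured on my GF3; the helper names are illustrative only.

```python
# Estimate of the texture data a copy-based path has to move per frame.
# Assumptions: square textures, tightly packed pixels, no mipmaps.

def copied_bytes_per_frame(size, textures, bytes_per_pixel):
    """Bytes copied per frame for `textures` square textures of `size` pixels."""
    return size * size * textures * bytes_per_pixel


def copy_bandwidth_mib_s(size, textures, bytes_per_pixel, fps):
    """Copy traffic in MiB/s needed to sustain `fps` frames per second."""
    return copied_bytes_per_frame(size, textures, bytes_per_pixel) * fps / (1024 * 1024)


rgb_bytes = copied_bytes_per_frame(512, 12, 3)
rgba_bytes = copied_bytes_per_frame(512, 12, 4)
print(rgb_bytes // (1024 * 1024), "MiB per frame as RGB")
print(rgba_bytes // (1024 * 1024), "MiB per frame as RGBA")
print(copy_bandwidth_mib_s(512, 12, 3, 37), "MiB/s to keep up with 37 fps")
```

Whether that copy costs more than 13 context switches is still up to the driver, so take this as an order of magnitude and nothing more.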

[This message has been edited by ZbuffeR (edited 12-02-2003).]

12-02-2003, 02:43 PM
Originally posted by ZbuffeR:
Could you post your graphic card/system spec please ? (just a guess : GF FX 5900 ? )

D'oh! I edited my post above, but FYI yes, it is an fx5900 Ultra (det 52.16)

I do think RTT can be faster because it does not need to actually copy texture data. As the amount of texture data is quite high (512x512x12 rgb pixels), the copy may be slower than just switching contexts 12+1 times.

It has been said on this board before that RTT is faster when you are dealing with large(r) volumes of data and I think yours is a case where that is so. In my case I am dealing with relatively small volumes so the context switch becomes too expensive.

One suggestion I have (assuming I read your page correctly): why not render 1 texture per frame rather than all 12? You can simply store the last 12 frames in a rotating list of textures (unless of course you are deliberately trying to stress the GPU).
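
A minimal sketch of what I mean by a rotating list, with the GL calls left out. The class name, the method names and the texture ids 100..111 are all made up for illustration; in a real program the ids would come from glGenTextures.

```python
# Ring of texture ids: each frame renders into the oldest slot, and the
# blur pass draws every slot that already holds a valid frame.

class TextureRing:
    """Tracks which texture receives the next frame and which hold history."""

    def __init__(self, texture_ids):
        self.texture_ids = list(texture_ids)
        self.next_slot = 0   # index of the texture to overwrite next
        self.filled = 0      # how many textures hold a rendered frame

    def slot_for_new_frame(self):
        """Return the texture to render into this frame and advance the ring."""
        tex = self.texture_ids[self.next_slot]
        self.next_slot = (self.next_slot + 1) % len(self.texture_ids)
        self.filled = min(self.filled + 1, len(self.texture_ids))
        return tex

    def history(self):
        """Texture ids holding valid frames, newest first."""
        n = len(self.texture_ids)
        return [self.texture_ids[(self.next_slot - 1 - i) % n]
                for i in range(self.filled)]


ring = TextureRing(range(100, 112))  # 12 hypothetical texture ids
for frame in range(15):
    target = ring.slot_for_new_frame()
    # render the scene into `target` here (RTT or glCopyTexSubImage)
print(ring.history())
```

Note that this blurs over the last 12 displayed frames rather than over 12 samples inside one frame, so the look depends on the frame rate.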

Oh and copy from pbuffer has always been slower (on NV) but prior to the 50 release (perhaps even prior to 42.xx?) it was useable, unlike what you have found (It's almost like they copy it to system memory and back again - it's that slow).

12-06-2003, 09:45 AM
Pentium 4 2.4
256MB DDR 2100 RAM
INTEL 82845 Graphics Card ( Came with Dell PC)
I couldn't get the first two programs to run on my machine; I don't think the graphics card supports them.
Render from backbuffer achieved an astounding 2 fps.
These Intel cards aren't worth a @#$&%

[This message has been edited by dj_indo_420 (edited 12-06-2003).]

12-06-2003, 03:56 PM
Bring on the crappy mainstream cards :)

Radeon 9200 vanilla (250/200MHz)
Athlon XP2400+

RTT 41 fps
copytexsubimage 25 fps
bb_copytexsub 27 fps

12-06-2003, 06:30 PM
amd athlon xp 1800 + gf4 ti 4600

truemotionblur_RTT.exe 66 fps
truemotionblur_bb_copytexsub.exe 55 fps
truemotionblur_copytexsubimage.exe 0.3 fps

12-07-2003, 08:12 AM
P4 2.8 / 512 MB RAM / GeForceFX 5600 Go / 52.16 ForceWare

truemotionblur_RTT.exe 46 fps
truemotionblur_copytexsubimage.exe 0.24 fps
truemotionblur_bb_copytexsub.exe 35 fps

All tests without window resizing.

[This message has been edited by ScottManDeath (edited 12-07-2003).]

12-08-2003, 02:30 AM
Athlon XP 2100+ + GF4 TI4400 + 512 MB 333MHz DDR + 45.23 (Forceware drivers tend to crash for me)

copytexsubimage: 0.40 FPS
BB_copytexsub: 48 FPS

EDIT: A little more details

[This message has been edited by coelurus (edited 12-08-2003).]

12-08-2003, 03:52 AM
One thing I have noticed with pbuffers is that if you intend to copy them to a texture you should not set the pbuffer as being a render to texture buffer (this attrib: WGL_BIND_TO_TEXTURE_RGBA_ARB). Simply don't set this attrib (or the NV depth one) and I think matters should improve somewhat.


12-08-2003, 07:00 AM
Originally posted by MattS:
(this attrib: WGL_BIND_TO_TEXTURE_RGBA_ARB). Simply don't set this attrib (or the NV depth one) and I think matters should improve somewhat.
It sounded like a good idea, but when I don't set it, everything is the same (sub-1fps framerates with my prog on geforce).
I wonder if NVidia is aware of that, it sounds more like a driver un-optimisation.

12-08-2003, 08:08 AM
Ah, I was so sure that was it. There are some other attributes that I only set if I intend to render to texture. I'll list them below (in pairs). Perhaps one of these is the problem. Try not setting any of these as well. I have noticed that I do get problems with bordered textures, which I find a pain because I use ARB_shadow.

To be clear these are flags when creating the pbuffer. The previous flag (WGL_BIND_TO_TEXTURE_RGBA_ARB) would be used when choosing the pixel format.





[This message has been edited by MattS (edited 12-08-2003).]

12-08-2003, 12:23 PM
That did the trick ! Ya-hoooo ! Heee-haa !
You're a savior man !
Hum, sorry, well, it works now.
To be more precise, simply removing the two following pbuffer parameters makes glCopyTexSubImage perform at a similar speed (maybe 15% slower) to render-to-texture :
) with :

Indeed, it gives 100 times the speed (from 0.31 fps -> 32fps) !!! Woa, thank you again for bringing such a solution.

(The other parameters you mentioned did nothing WGL_MIPMAP_TEXTURE_ARB,TRUE, and for the pfd: WGL_BIND_TO_TEXTURE_RGBA_ARB )

12-08-2003, 03:58 PM
Did I understand it right ? To get optimal (pbuffer-)performance on NVIDIA cards, I should avoid using NVIDIA-specific-extensions/features ???

12-08-2003, 07:01 PM
No, I believe that ZbuffeR & MattS are trying to say that if you don't want to use Render To Texture, don't set the RTT attributes when creating the pbuffer.

I just implemented this in my code and it works as stated. Nice one.

12-08-2003, 10:56 PM
with geforce ti 4600 :

RTT : 67 fps
BB : 52 fps
Copy pbuffer : 0.23 fps

12-09-2003, 12:28 AM
Sorry equentric, I have just updated my webpage, would you mind testing the new prog ?
It should work as expected now, I would just like some more Radeon tests.

12-09-2003, 01:28 AM
Glad to be of help. I've found lots of solutions on this forum so it's nice to provide a solution for once.

I've never raised this with nVidia because I wasn't sure whether it would be considered a bug or not. Perhaps they would like to comment....

I'm currently developing on a Radeon 9800 128MB(non-Pro) with Cat 3.9 (not hot fix) on a P4 2.8 with 512 MB memory and these are my results from your new builds.

bb_ctsi 87
ctsi 15
pbuffer_ctsi 19
rtt 105

pressing 'B' roughly doubles the speed on the two high frame rates but has very little impact on the low frame rate ones.

I can't explain why the results are so bad for two of the ctsi, esp. considering that other Radeon owners do not have the same problem. As mentioned earlier I have problems with bordered textures and pbuffers so possibly that is it. It may be a driver issue I suppose. Any advice would be gratefully received.


12-09-2003, 01:40 AM
No snickering now :P because this topic is more relevant to me than to some of you, but... on my gf2 I get these results...

rtt 14/33
pb 11/18.72
ct 0.27/0.27
bb 10/15(but blank)

After the slash is the fps after pressing "b" and "(but blank)" comment refers to the second figure.

I am specifically developing for low-end gfx cards and had not realised until recently that pbuffers were supported on my GF2 card. Even once I did, I ignored them, assuming (wrongly, I see now) that they would be slow. I'm still rendering to the bb for dynamic textures (anything up to 512x512) and get ~17fps, just about usable for my app (pretty sure the low fps is 50/50 a result of maxing out the card and my overburdened rendering code). All this is on a 600MHz PIII so I don't expect top-notch performance :D @ZbuffeR - any chance of posting your rtt code? For anyone who's interested here's my card caps (excuse long post)...

Vendor: NVIDIA Corporation
Renderer: GeForce2 MX/AGP/SSE
Version: 1.4.0

Tex Units: 2
Aux Buffers: 0

OpenGL Extensions:


WGL Extensions:


12-09-2003, 02:05 AM
My low frame rates turned out to be a driver issue. After installing the hot fix drivers my results are now (second figure when 'B' has been pressed)

bb_ctsi 87, 148
ctsi 98, 184 (interestingly pressing v on this one results in an increase to 108, 184 ???)
pbuffer_ctsi 97, 184
rtt 105, 209

This is without resizing the window.


12-09-2003, 07:33 AM
Originally posted by ZbuffeR:
Sorry equentric, I have just updated my webpage, would you mind testing the new prog ?
It should work as expected now, I would just like some more Radeon tests.

My Radeon 9200 results didn't change.

All of the programs freeze after a few frames on my Radeon 7200, though. What extensions do these programs need?

12-09-2003, 08:10 AM
All of the programs freeze after a few frames on my Radeon 7200, though. What extensions do these programs need?

There is nothing special for the fourth one, it should work without problems (just copyTexSub from backbuffer).

Just pbuffers for the first three, and renderToTexture for the very first one.

Beware, if you leave another window in front of the GL one, it will seem to freeze, as described on my site. Just drag it a bit to see the fps behind.

If the 4 programs display something and then freeze, it is probably bad programming or bad drivers. It should not be extension-related.

I have only one different case of failure: it's on a Quadro FX 500, where only the non-pbuffer version works, at 1fps, with drivers 52.16

Edit: spelling
Edit2: apparently the quadro results match those of the Intel 82845 Graphics Card described on this forum... ?

[This message has been edited by ZbuffeR (edited 12-09-2003).]

12-10-2003, 12:34 AM
ZBuffer, please post your code.
I never saw any application using pbuffers that ran faster than a simple frame buffer copy.
The result of the test may be heavily implementation dependent.
Even if your code is not as clean as you wish, without it your results are of no interest to anyone but you. And we will all help you make the cleanest code possible.


12-10-2003, 08:37 AM
I never saw any application using pbuffers that ran faster than a simple frame buffer copy.
As I said earlier, high-res render to texture must be faster than anything involving a copy.
SeskaPeel, I really admire what you and your team have done with the flight game. (And good luck with what comes next; the video game business in France is pretty broke. Keep up the good work and good luck!)

So i FINALLY posted (most of) my code at my webpage. At last... http://www.chez.com/dedebuffer/

Beware though, apart from the RTT one, I have not worked a lot on them recently. And I messed a bit with the different versions. I started that just as a small test to do motion blur, not to fine tune perfs. I was only surprised to have 100 times slower ctsi than pbuffer.

Speaking about it, I saw Prince of Persia (the new one) and it has a lot of effects like glare and fast (fake) motion blur. You may check this video, watch carefully the fast camera move near the beginning: http://www.prince-of-persia.com/fr_html/game/trap/gam_trp_1.avi
Even if it is not physically correct (I have watched it frame by frame), it is very nice to see when playing. I am wondering how they do the blur, with many fullscreen quads like me or something clever in pixel programs ?

Thanks a lot for every poster here, you've all helped me a lot already. Feel free to comment further my (poorly written) code.

[This message has been edited by ZbuffeR (edited 12-10-2003).]

12-10-2003, 09:04 AM
ZBuffer, thanks for the feedback about our game.

For the prince of persia glow stuff, I have an idea about the trick. Actually I'm convinced they did it this way. Here it is :

First render your scene, second grab the frame buffer, blur (mip map or gauss or box filter) that grabbed texture, and then render it full screen, above the real frame buffer. They use some way of specifying which part of the screen should be more or less blurred. I think they grab at different stages in their rendering, and have multiple blurred textures.

The trick is how you render your blurred texture above the real frame buffer. When you blend it the classic way (linear interpolation), you can specify the amount of the real frame buffer, and the amount of the blurred version. When they start a camera animation, they lerp this amount from say 0.1 blurred texture / 0.9 frame buffer up to 1.0 blurred / 0.0 fb. Then, when approaching near the end of the camera animation, they lerp back to the original 0.1/0.9.
So, when the camera moves, you simply see the "blurred version" of the frame buffer. And that makes it ...
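
Roughly, the blend weight over the course of a camera animation could be sketched like this. It is only my guess at the scheme, not their actual code: the 0.1 and 1.0 weights are the ones mentioned above, while the `ramp` fraction and all function names are assumptions for illustration.

```python
# Weight of the blurred texture during a camera move, plus the classic
# linear blend of one colour channel. Values are plain floats in [0, 1].

def lerp(a, b, t):
    """Linear interpolation: a at t=0, b at t=1."""
    return a * (1.0 - t) + b * t


def blur_weight(progress, rest=0.1, peak=1.0, ramp=0.25):
    """Blur weight for animation `progress` in [0, 1].

    Ramps from `rest` up to `peak` over the first `ramp` fraction of the
    move, holds `peak`, then ramps back down over the last `ramp` fraction.
    """
    progress = min(max(progress, 0.0), 1.0)
    if progress < ramp:
        return lerp(rest, peak, progress / ramp)
    if progress > 1.0 - ramp:
        return lerp(peak, rest, (progress - (1.0 - ramp)) / ramp)
    return peak


def blend(framebuffer_value, blurred_value, weight):
    """Final value: (1 - weight) * framebuffer + weight * blurred."""
    return lerp(framebuffer_value, blurred_value, weight)


for step in range(5):
    p = step / 4
    print(f"progress {p:.2f} -> blur weight {blur_weight(p):.2f}")
```

In GL terms the weight would simply be the alpha of the full screen quad drawn with standard alpha blending.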

Thanks for the code, I'll have a look at that soon.


12-12-2003, 05:03 AM
Dmoc, have you succeeded in writing your RTT codepath ?

SeskaPeel, the PoP motion blur is indeed a bit more involved. I recorded avis with FRAPS, and they seem to use 2 different methods.

When the camera moves from normal third-person to first-person (small move), one grabbed rendered frame is put 4 times on the screen, slightly translated based on the camera move.

When the camera moves from a very distant viewpoint to third-person, they put the grabbed frame on the screen so many times that it is not countable even on screenshots (at least more than 15 times), and they add the blurred version once on top. I am quite surprised that it does not slow down my gf3, because of fillrate.

I have re-read a presentation covering how to do fast texture blur for Tron 2.0 glow. It is better than just mipmaps: maybe it is what you use SeskaPeel? If I blur in only one direction, it may be usable for "insane number of samples" motion blur. I believe that just by using multitexture I can divide by 4 the number of passes I do, and maybe it will reduce fillrate time.

read "Special Effects in DirectX 9": http://developer.nvidia.com/object/GDC_2003_Presentations.html
(beware the pdf is in black&white)
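
To show what I mean by blurring in only one direction, here is a plain 1D box filter over a single row of values. This is not the technique from the presentation, just the simplest possible stand-in; the function name is made up, and the clamping of indices at the ends plays the role that clamp-to-edge texture addressing would play on the GPU.

```python
# One-direction box blur over a row of samples. Indices that fall outside
# the row are clamped to the nearest valid sample, so the borders are not
# darkened by missing neighbours.

def box_blur_1d(row, radius):
    """Average each sample with its neighbours within `radius`."""
    n = len(row)
    out = []
    for i in range(n):
        total = 0.0
        for offset in range(-radius, radius + 1):
            j = min(max(i + offset, 0), n - 1)  # clamp to the row edges
            total += row[j]
        out.append(total / (2 * radius + 1))
    return out


print(box_blur_1d([0, 0, 3, 0, 0], 1))
print(box_blur_1d([5, 5, 5], 2))
```

A uniform row stays uniform with clamping, which is the behaviour I would want at the screen borders.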

For PoP though, I am still wondering how they managed to avoid screen border problems. The screen borders are not darker, even with a motion blur length covering half the screen. And even when two opposite sides are very contrasted, there is no apparent bleeding. Maybe GL_CLAMP ?

Well, I have to try out all these ideas, and frame rate will improve.

And sorry for the long post, maybe I should have kept the details for my web page...

12-12-2003, 07:53 AM

I'm still convinced they use blurred (mip map blurred) textures when the camera moves. Gauss or box filtering on whole screen ... I suppose the hardware can't do this yet. I admit they may use multiple of them.

For the border problem... well, I don't see where your problem is. You can have multiple targets to grab the screen at each frame (for instance 25), and when you finally render you can use as many of them as you want, to get a longer or shorter motion blur. You can even tweak the fov of the camera between each frame, so you will have a slight fov blur.


12-14-2003, 12:58 AM
Why is pbuffer_copytexsub so slow on my computer? 2.4G + 1G DDR + Radeon 9800SE


02-02-2004, 07:19 AM
On a Win2K PIII-800 with a GeForce fx5900, I got frame rates of:
RTT: 103 fps
PBC: 90 fps
BBC: 85 fps

ZbuffeR - is there any chance you could share your backbuffer code? I'm working on some scientific computing applications on GPU, and I have to support Linux and Windows. I've had p-buffer problems in Linux - the backbuffer with glCopyTexSubImage like you're using it might be a stopgap.

Any help would be greatly appreciated. Thanks for the great demos.

Prof. Payne

02-03-2004, 11:43 AM
is there any chance you could share your backbuffer code?

Yeah, sorry, as I was not really interested in the backbuffer version, I totally forgot it.

So, I finally put it up, and cleaned up the web page a bit. The code was hacked a lot from old versions and is probably different from the compiled version. But it works.

Hope it will help you.

02-04-2004, 11:57 AM
Athlon XP 2200
Radeon 9700 Pro 128MB
OmegaCorner.com 2.5.14 Catalyst 4.1 drivers

bbc: 92/157
cts: 60
pbc: 102/196
rtt: 109/174

[This message has been edited by Defiance (edited 02-04-2004).]

02-08-2004, 03:25 AM
Here I have a Radeon 9600 and an AMD Athlon 2000+ with Catalyst 3.1. Here are my results:

(no b key / b key)

truemotionblur_copytexsubimage.exe : 59.9 / 59.9
truemotionblur_bb_copytexsub.exe : 54 / 87 (when pressing b the screen appears black??)
truemotionblur_RTT.exe : 62 / 114

02-08-2004, 03:50 AM
P4 2.8C
Radeon 9800 XT with Catalyst 3.1

(no b key / b key)

truemotionblur_RTT.exe: 132 / 263
truemotionblur_pbuffer_copytexsub.exe: 122 / 231
truemotionblur_copytexsubimage.exe: 134 / 231
truemotionblur_bb_copytexsub.exe: 110 / black screen

02-08-2004, 02:37 PM
Athlon XP 2000+
512 MB RAM (333 Mhz)
W2K Pro
Radeon 9800Pro, Catalyst 4.1

with final rendering / without final rendering

truemotionblur_RTT.exe: 114 / 180
truemotionblur_pbuffer_copytexsub.exe: 94 / 156
truemotionblur_copytexsubimage.exe: 99 / 100
truemotionblur_bb_copytexsub.exe: 84 / 130

truemotionblur_bb_copytexsub.exe gives with pressed "b" a very dark screen (nearly black).