Which method should I use: RTT or CopyTexSubImage()?

My program combines two textures and copies the result to a third texture unit. I can think of two ways to do it:
- combine the textures in a pbuffer with the RTT (render-to-texture) function, and bind the third texture ID to the pbuffer;
- combine the textures in a pbuffer, and then use glCopyTexSubImage2D() to copy the result into the third texture.

In the first case, I must create 12 pbuffers with the RTT function, so I am afraid the frequent RC (rendering context) switches (there are 12 pbuffer RCs) will hurt the program's performance, but that is only my guess.

In the second case, I create 1 pbuffer and 12 textures, but I found that glCopyTexSubImage2D() is not faster than RTT on an NVIDIA card.

So which one should I use: 12 RTT pbuffers, or 1 pbuffer and 12 textures?

It might help to understand what it is you are actually trying to do, rather than how you are doing it. Please describe your scenario clearly.

I made tests for similar situations; you can read them here:
www.chez.com/dedebuffer

In short, RTT seems to be efficient even with the RC switch.

By the way, your description is a bit unclear: first you speak about 2 textures, then 12?

I'm writing a video processing engine using OpenGL; the engine receives some video streams and applies effects to them. In my case, I think the pbuffer is the best place for rendering, so there are two ways I can go:
- use a pbuffer for rendering, then use glCopyTexSubImage2D() to copy the result to a texture, so that the pbuffer is free for the next rendering;
- create several RTT pbuffers, each bound to a texture ID. Each pbuffer is then both the place where a video stream is written and the place where rendering happens. For example, for a video blend effect the operation can be described by the equation Ta + Tb = Tc, where Ta, Tb and Tc are all RTT pbuffers. I can write the video streams to Ta and Tb, then make Tc current and do the rendering in Tc. This way there is no need for glCopyTexSubImage2D(), which I found to be slower on some cards, so I prefer this method: RTT is faster than glCopyTexSubImage2D().

I have described my program, so here is my problem: in case 2, I must create 12 pbuffers, each 720x576. Would they hurt the system's performance? How much does switching between them cost?

Originally posted by ZbuffeR:
I made tests for similar situations; you can read them here:
www.chez.com/dedebuffer

In short, RTT seems to be efficient even with the RC switch.

By the way, your description is a bit unclear: first you speak about 2 textures, then 12?
ZbuffeR, thanks for your reply. I only meant that the task of my program is to combine two textures, but my program uses more than 2 texture units, because it does more than one combination. I just want to know which of the following is faster:
- render in one place, use glCopyTexSubImage() to save the result to a texture, then start a new rendering;
- render into an RTT pbuffer, then switch to the next pbuffer to start a new rendering.

An RC switch has a (roughly) fixed cost, so with large buffers like yours it will probably be a win to avoid glCopyTexSubImage() completely, since its cost depends directly on the resolution.

So try the second option, and only if it is not fast enough, test the other.

(As general advice, it is quite difficult to predict performance with hardware-accelerated 3D, as it depends on many factors. So program, benchmark, and repeat until it is fast enough.)