WGL_ARB_render_texture gives bad performance?

Ok, now this is weird. When I use WGL_ARB_render_texture in a cubic environment mapping sample for my engine, I get about half the frame rate compared to simply rendering into a pbuffer and doing a glCopyTexImage.

I only noticed this when I ported my sample app to Linux, where GLX_ARB_render_texture is not available (why?), so I had to copy the image from the frame buffer to the texture. After that I also changed the Windows version of my sample to not use WGL_ARB_render_texture…and the frame rate doubled all of a sudden.

So why is this? Even if the driver can't do anything other than copy the image instead of using the pbuffer's color buffer directly, it shouldn't be slower than my app performing a manual copy, should it? Am I missing something here?
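
To be clear, the manual copy path I'm comparing against looks roughly like this (a simplified sketch with placeholder names, not the actual code from my engine; I use glCopyTexSubImage2D into a pre-allocated cube map here, glCopyTexImage works the same way):

    /* Render each face of the environment cube into the pbuffer, then copy
       the pbuffer's color buffer into the corresponding cube map face.
       cubeTex is assumed to be a cube map already created with glTexImage2D,
       and the pbuffer context is assumed to share textures with the main
       context (wglShareLists). SetupCameraForCubeFace() and RenderScene()
       are placeholder helpers. */
    int face;
    for (face = 0; face < 6; ++face)
    {
        wglMakeCurrent(pbufferDC, pbufferRC);   /* draw into the pbuffer */
        SetupCameraForCubeFace(face);
        RenderScene();

        glBindTexture(GL_TEXTURE_CUBE_MAP_ARB, cubeTex);
        /* The six face enums are consecutive, so POSITIVE_X + face works. */
        glCopyTexSubImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X_ARB + face,
                            0,                  /* mip level */
                            0, 0,               /* offset in the texture */
                            0, 0, size, size);  /* region of the pbuffer */
    }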

I’m using a GF4 with the 40.41 drivers.

[This message has been edited by Asgard (edited 09-27-2002).]

Just installed the 40.71 drivers. Same thing. Bad performance with ARB_render_texture.

Another note for the NVIDIA driver guys: calling wglCreatePbuffer with the attributes parameter set to NULL (which is allowed according to the spec) causes an access violation in NVOGLNT.dll. I just modified the simple_render_texture example, and it exhibits the same problem.
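
Something along these lines is enough to trigger it (hDC, pixelFormat, width and height set up as usual; this is just a repro sketch, not the example's exact code):

    /* According to the WGL_ARB_pbuffer spec the attribute list may be NULL
       (or empty), but this call dies with an access violation in NVOGLNT.dll
       on the drivers mentioned above. */
    HPBUFFERARB pbuf = wglCreatePbufferARB(hDC, pixelFormat, width, height, NULL);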

And another thing: Please please please, for all us Europeans who have PAL as TV standard here, add the 768x576 and 720x480 display modes to nv4_disp.inf in the drivers. Now that they finally support overscan for the TV out, these modes give the best quality on PAL TVs (768x576 is the native PAL format and 720x480 is used for PAL60). Until now I always had to hack the inf file myself, which is kind of annoying…especially seeing that you guys pump out a driver release every few weeks (which is a good thing, but…).

I have the same problem. In my “Shadows that don’t suck” demo (available here), which depends heavily on rendering to a cubemap, I get 90 fps on a Radeon 8500 in default windowed mode. Under the same conditions, a GF3 with the 30.xx drivers gets something like 10-15 fps, and with the 40.xx drivers about 20 fps. The Radeon also scales as expected at different resolutions; the GF3/4, on the other hand, have very similar performance at all resolutions, which rules out a fillrate issue.

It is something of a known issue (though word hasn’t gotten around very far) that nVidia’s drivers don’t do well with ARB_render_texture. For whatever reason, their implementation is slower than a copy operation.

I get 30 fps on my GF4 4600, with 40.52.

That’s quite poor.

80 fps with a Radeon 8500!

Humus, you should’ve read this before using render to texture.
http://opengl.nutty.org/forum/viewtopic.php?t=19

  1. RenderToTexture extension.

PROS:

i) Renders directly into texture, supposedly the fastest.

CONS:

i) I’ve heard ppl say it’s the slowest of the lot.

ii) Not supported very widely.

hehe …

I thought NVIDIA fixed the slow nature of ARB_r_t. Because I remember when they released the 40.xx drivers PH said that ARB_r_t ran as fast as glCopyTexSubImage2D for the first time. Weird.

-SirKnight

Because I remember when they released the 40.xx drivers PH said that ARB_r_t ran as fast as glCopyTexSubImage2D for the first time.

That’s not a fix; it’s just an improvement. A fix would be render-to-texture running faster than a copy, like ATi’s. After all, there shouldn’t be a copy at all.

I did not say it was a complete fix. I said they fixed the slow nature of ARB_r_t, which means it’s not slower than glCopyTexSubImage2D; it’s the same speed, according to PH.

-SirKnight

Yes, it did improve to the point where it was as fast as CTT for 2D textures. I haven’t tried rendering to cubemaps, so that might be the problem. I suspect RTT to be implemented as CTT in the NVIDIA 40.xx drivers. I can’t prove that, but I’m almost certain that’s the case (at least on GF3 for 2D textures).

Also, the case that I have tested did not use autogenerated mipmaps.
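
Just to be clear about what I mean by RTT: with ARB_render_texture there is no copy in the application at all, only a bind/release around the rendering. Roughly like this (a sketch with placeholder handles and helpers, assuming the extension entry points were fetched via wglGetProcAddress):

    /* Render into the pbuffer... */
    wglMakeCurrent(pbufferDC, pbufferRC);
    RenderScene();

    /* ...then use the pbuffer's color buffer directly as the texture image
       of the currently bound texture object, with no glCopyTexSubImage2D. */
    wglMakeCurrent(windowDC, windowRC);
    glBindTexture(GL_TEXTURE_2D, rttTex);
    wglBindTexImageARB(hPbuffer, WGL_FRONT_LEFT_ARB);
    DrawSomethingUsingTheTexture();

    /* Release before rendering into the pbuffer again. */
    wglReleaseTexImageARB(hPbuffer, WGL_FRONT_LEFT_ARB);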

[This message has been edited by PH (edited 09-27-2002).]

I suspect RTT to be implemented as CTT in the NVIDIA 40.xx drivers. I can’t prove that but I’m almost certain that’s the case (GF3 at least).

Well, that would explain why they both run at the same speed now. If they do do this, it’s an OK hack until it’s completely fixed. Of course, I don’t want to say that this IS what they do, because if it’s not, that would be bad. Are any NVIDIA people around who can tell us whether this IS what they do or not?

-SirKnight

I don’t think it’s ever going to change for the GF3/4. Matt has already stated why RTT could be slower than CTT. I don’t remember the exact details, but NV_texture_rectangle textures were supposedly more efficient for the hardware.

Originally posted by PH:
Matt has already stated why RTT could be slower than CTT.

But I don’t get that. Can’t the driver just do CTT internally? Then it would be equally fast (which is good enough for me right now, seeing that with RTT my sample drops to half the frame rate of CTT).

[This message has been edited by Asgard (edited 09-29-2002).]

I suppose they could do that for cubemaps too (if they in fact do a copy). Perhaps the ARB_render_texture spec is worded such that they cannot efficiently implement rendering to cubemaps.
You’ll have to ask NVIDIA for the correct answer, though.
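
For what it’s worth, the spec does define a way to render to the individual faces: you switch the pbuffer’s cube map face attribute between passes and bind the whole cube map afterwards. Roughly like this (again just a sketch with placeholder names, assuming the pbuffer was created with WGL_TEXTURE_TARGET_ARB set to WGL_TEXTURE_CUBE_MAP_ARB):

    int face, attribs[3];
    for (face = 0; face < 6; ++face)
    {
        /* Select which cube map face subsequent rendering goes to
           (the six WGL face enums are consecutive). */
        attribs[0] = WGL_CUBE_MAP_FACE_ARB;
        attribs[1] = WGL_TEXTURE_CUBE_MAP_POSITIVE_X_ARB + face;
        attribs[2] = 0;
        wglSetPbufferAttribARB(hPbuffer, attribs);

        wglMakeCurrent(pbufferDC, pbufferRC);
        SetupCameraForCubeFace(face);
        RenderScene();
    }

    /* Back in the main context, use the whole cube map without a copy. */
    wglMakeCurrent(windowDC, windowRC);
    glBindTexture(GL_TEXTURE_CUBE_MAP_ARB, cubeTex);
    wglBindTexImageARB(hPbuffer, WGL_FRONT_LEFT_ARB);

Whether the driver actually renders into the faces directly or copies behind the scenes is of course exactly what we’re wondering about.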

Hey all, I’ve checked with the driver team, and there are a couple of issues that were hurting RTT performance. These issues are being corrected as we type, and the fixes should be available in an upcoming driver.

Thanks -
Cass

Am I alone in finding that scary? It implies nobody really cared seriously about RTT’s implementation in OpenGL before… and you make it sound like it was an easy fix…

Y.

Well, I think it’s good that it’s getting fixed. I use RTT a lot, and it annoyed me quite a bit that the GL extension for RTT is slower than doing CTT myself on NVIDIA cards.
The sad thing is, however, that all my RTT samples still run a whole lot faster in DirectX 8 (I’m working on a platform- and API-independent engine, so basically the code for my samples is the same for OpenGL and DirectX; other samples not using RTT give pretty much equal performance).