pbuffer + VAR = SLOW

Hi,
I found that render-to-texture (pbuffer) is very slow. I’m using the PBuffer class from NVidia SDK 8.0.
Like this (viewport 256x256):

PBuffer *pbuffer;
pbuffer = new PBuffer("rgba");
pbuffer->Initialize(256, 256, false, true);

...

if (enabled)pbuffer->Activate();
RenderScene();
if (enabled)pbuffer->Deactivate();

This is without even binding the pbuffer as a texture, just rendering into it (the scene is big). With enabled=false (not using the PBuffer) I get 150 FPS, and with the pbuffer I get 40 FPS.
I know that it should be slower, but not by 110 FPS!

What do you think about that?
Maybe I’m doing something wrong?

Thanks for help

Are you using floating point buffers? What graphics card are you using?

As you can see, I only use “rgba” with no floats. I took this from the NVidia demos (there was something like ati_float=16, but I don’t want floats).
I just bought a GF 6800, so I wanted to see how this great new “Render to texture” thing works…

I made a new program (this time using RenderTexture from NVidia SDK 8.0, initialized with just the “rgba” parameter):

// Pass 1: render the scene, either into the pbuffer or into the back buffer.
if (pbuffer_enabled) RenderTex->Activate();
RenderScene();
if (pbuffer_enabled)
{
	RenderTex->Deactivate();
}
else
{
	// Copy path: grab the 256x256 region of the back buffer into 'tex'.
	glBindTexture(GL_TEXTURE_2D, tex);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
	glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 256,
		256, 0, GL_RGB, GL_UNSIGNED_BYTE, NULL);

	glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 256, 256);
}

// Pass 2: draw a quad textured with the result.
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

glEnable(GL_TEXTURE_2D);
if (pbuffer_enabled) RenderTex->Bind();
else glBindTexture(GL_TEXTURE_2D, tex);

DrawQuad();

Results:
Scene with a single quad (64 tris):
glCopyTexSubImage2D: 200 FPS
RenderTexture: 140 FPS
Very big scene (a couple of thousand tris):
glCopyTexSubImage2D: 50 FPS
RenderTexture: 4 FPS

So where is this great new RenderTexture actually faster than plain glCopyTexSubImage2D??
What am I doing wrong?
I’m using the NV implementation of PBuffer, so I think it should be good.

UPDATE:
I found where the problem is. The pbuffer doesn’t like it when thousands of triangles are drawn using glDrawElements or similar functions. It even works faster when I draw the triangles using plain glVertex3f.
Now I’m gonna find where the bug is. Keep your fingers crossed :wink:

UPDATE2:
It works fast if I don’t use GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV. I don’t get it. I cannot read vertices from AGP memory, because the pbuffer gets slow when I do.
I think it is a bug in the driver or something, because lots of AGP reads cause the pbuffer to work veeeery slowly (the pbuffer is in AGP memory, I think).
Or maybe I have to do something to speed it up?

You need to separately enable the vertex array range in each context, because it’s not shared with ShareLists(). Further, you have to re-establish the array range (using VertexArrayRangeNV()) in each context, although you only need to allocate the memory once.

That being said, the Vertex Buffer Object extension is recommended these days; Vertex Array Range is deprecated for new software.
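
A rough sketch of the per-context point above, as a toy C++ model rather than real GL code. ContextState, setup_var and uses_fast_path are hypothetical names made up for illustration; the comments name the real entry points (glVertexArrayRangeNV, glEnableClientState) that would be called with the matching context current. It assumes the memory block was allocated once (for example with wglAllocateMemoryNV) and is reused by both contexts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the VAR state of ONE rendering context
// (the window context or the pbuffer context). This state is per-context
// and is not shared by wglShareLists().
struct ContextState {
    const void* var_base = nullptr;  // set by glVertexArrayRangeNV(size, base)
    std::size_t var_size = 0;
    bool var_enabled = false;        // glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV)
};

// Re-establish and enable the array range in one context. In real code this
// runs while that context is current; the memory itself is allocated once.
void setup_var(ContextState& ctx, const void* base, std::size_t size) {
    assert(base != nullptr && size > 0);
    ctx.var_base = base;     // glVertexArrayRangeNV(size, base)
    ctx.var_size = size;
    ctx.var_enabled = true;  // glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV)
}

// Toy check: can a draw call reading from 'ptr' use the VAR path in 'ctx'?
bool uses_fast_path(const ContextState& ctx, const void* ptr) {
    if (!ctx.var_enabled || ctx.var_base == nullptr) return false;
    const auto p = reinterpret_cast<std::uintptr_t>(ptr);
    const auto b = reinterpret_cast<std::uintptr_t>(ctx.var_base);
    return p >= b && p - b < ctx.var_size;
}
```

In this model, calling setup_var only on the window context leaves the pbuffer context with uses_fast_path returning false for the same pointer, which mirrors the case where draws into the pbuffer read from AGP memory without VAR being set up there.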

Thanks! It really helped!

Are you sure about VBO and VAR? I know that VBO is better on ATI (is there even VAR on ATI?), but is VBO faster on NVidia cards? I’m not so sure about that.

You know, it helped, but the pbuffer still works slower than glCopyTexSubImage2D. I tested sizes from 256x256 to 1024x1024: for 1024 there is no difference (24 FPS), but for 256 there is a big difference: 190 FPS using glCopyTexSubImage2D and 130 FPS using the pbuffer.
I use pbuffer with “rgba depth stencil” now.
I don’t know why it works slower.
Using VAR or not doesn’t change anything.

Pbuffers are slower than glCopyTexSubImage2D for smaller textures because of the context switch (which tends to be fairly expensive - especially on nVidia AFAIK). I wouldn’t expect it to be as slow as what you are experiencing (30% or so drop) though.
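
To see why a fixed per-frame cost hurts small textures more, it can help to compare frame times instead of FPS. The sketch below is plain arithmetic on the numbers quoted in this thread; the function names are made up for illustration, and treating the whole difference as one constant overhead is an assumption, not a measurement.

```cpp
#include <cassert>
#include <cmath>

// Frame time in milliseconds for a given frame rate.
double frame_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra per-frame cost implied by dropping from fast_fps to slow_fps.
double overhead_ms(double fast_fps, double slow_fps) {
    return frame_ms(slow_fps) - frame_ms(fast_fps);
}

// Frame rate after adding a fixed cost to a frame that ran at base_fps.
double fps_with_overhead(double base_fps, double extra_ms) {
    assert(extra_ms >= 0.0);
    return 1000.0 / (frame_ms(base_fps) + extra_ms);
}
```

With the 256x256 numbers, overhead_ms(190.0, 130.0) comes out at roughly 2.4 ms per frame. Adding that same cost to a 24 FPS frame (about 41.7 ms) via fps_with_overhead gives roughly 22.7 FPS, so the relative drop shrinks from about 30% to about 5%. The thread reports no difference at all at 1024x1024, possibly because the copy done by glCopyTexSubImage2D grows with texture area, which this simple model ignores.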

I also had some problems with the nvsdk 8.0 pbuffer a while ago. I had to change NVidia’s pbuffer code to get it working at proper speed. I regret to say that I don’t have this code anymore.

But try:
pbuffer = new PBuffer("rgba texture");
instead of
pbuffer = new PBuffer("rgba");

Here are some tips from Zbuffer on modifying the pbuffer code in order to set up your pbuffer correctly:
http://www.chez.com/dedebuffer/

Greetz,

Nico

Thanks for help.
Of course I had “texture” (I have “rgba depth=24 stencil texture2D”).
The link you posted explains how to make “glCopyTexSubImage2D from pbuffer” as fast as render-to-texture. What I’m trying to do is make render-to-texture faster than “glCopyTexSubImage2D from screen (backbuffer)”, because my render-to-texture (this SDK 8.0 implementation) is slower than simple glCopyTexSubImage2D for 512x512.
In a scene where I use only one pass (just one texture), render-to-texture is 1-10% slower, and for a scene with 9 lights (10 passes, 5000 tris) the pbuffer gets 80 FPS and glCopyTexSubImage2D gets 120 FPS.
I also found that binding the pbuffer and drawing with it is not what is expensive. I cut the “quad rendering” part from the code (so after the pbuffer render there is only a black screen) and found that just rendering to the pbuffer is much slower than rendering to the backbuffer: 80 FPS instead of 120 FPS for 10 passes using a Cg fragment program with the FP30 profile (diffuse and specular lighting).
I just don’t get it… What did you do to your render-to-texture-from-SDK8.0 so that it worked well?

UPDATE:
I found something. I have to set VAR twice: once for the normal context and once for the pbuffer. I found that if I don’t set it for the pbuffer, then the pbuffer works as fast as CopyTex (of course not in the same program, because if I don’t set VAR for the normal context it works extremely slowly). So I think the problem is solved.
There is something wrong with VAR (so the advice to use VBO may be a great one now!).
For one light pass the pbuffer is now faster (from 290 to 400 FPS), and for 9 lights it is faster but not by much (0-10%).

Thanks everyone for your time!