Shadow map performance

I use the same shadow map texture size (512x512) in both cases.
First, I create the shadow map with glCopyTexSubImage2D, and the FPS is 330.
Then I switch to a pbuffer, but the FPS is 270. It is puzzling, because pbuffer eliminates the bus requirement of transferring the depth data.
Any suggestions?
Thanks

It is puzzling, because pbuffer eliminates the bus requirement of transferring the depth data.

In theory. Some implementations are unable to implement pbuffers in hardware and must internally do a copy.

The more likely problem is that changing contexts and/or drawables costs more than the copy.

Thanks -
Cass
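
To put a number on the gap reported in the original post: 330 FPS and 270 FPS correspond to roughly 3.03 ms and 3.70 ms per frame, so the pbuffer path costs about 0.67 ms more per frame. The C sketch below does that conversion. The function names are made up for illustration. Attributing the whole gap to context/drawable switches is an assumption, as is the figure of two switches per frame (to the pbuffer and back), so the per-switch value is only an upper bound under that hypothesis.

```c
/* Convert a frame rate (frames per second) to a frame time in milliseconds. */
double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

/* Extra time per frame spent by the slower path relative to the faster one. */
double extra_ms_per_frame(double fast_fps, double slow_fps)
{
    return frame_time_ms(slow_fps) - frame_time_ms(fast_fps);
}

/* ASSUMPTION: the entire gap is caused by context/drawable switches.
 * Under that assumption this is an upper bound on the cost of one switch. */
double max_ms_per_switch(double fast_fps, double slow_fps, int switches_per_frame)
{
    return extra_ms_per_frame(fast_fps, slow_fps) / switches_per_frame;
}
```

With the figures from the first post, `extra_ms_per_frame(330.0, 270.0)` comes to about 0.67 ms, and `max_ms_per_switch(330.0, 270.0, 2)` to about 0.34 ms.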

I found render-to-texture PBuffers to have a lower fillrate than windows, at least on NVidia hardware.

Originally posted by cschueler:
[b]
I found render-to-texture PBuffers to have a lower fillrate than windows, at least on NVidia hardware.

[/b]

If you’re rendering exclusively to pbuffers (that is, not calling MakeCurrent every frame), I would be surprised to see a difference. However, it does depend on the driver version - this feature does generally get better perf over time.

Originally posted by cass:
The more likely problem is that changing contexts and/or drawables costs more than the copy.

I’m somewhat shocked by that. I knew that changing contexts was slow, but I didn’t realize it was so slow.

Is there a possibility that context switches will be faster in the future? I heard the latest VPU from 3Dlabs (P10) has something to accelerate this task. I don’t know if this is true and if it gives serious advantages, but I hope this capability will be improved.

Originally posted by Obli:
[b] [quote]Originally posted by cass:
The more likely problem is that changing contexts and/or drawables costs more than the copy.

I’m somewhat shocked by that. I knew that changing contexts was slow, but I didn’t realize it was so slow.

Is there a possibility that context switches will be faster in the future? I heard the latest VPU from 3Dlabs (P10) has something to accelerate this task. I don’t know if this is true and if it gives serious advantages, but I hope this capability will be improved.

[/b][/QUOTE]

The context switching overhead could be reduced. Some of it is due to the general driver design assumption that context switches don’t happen terribly often.

ARB_superbuffers should provide a significantly lighter-weight mechanism for offscreen rendering within a single context. That will hopefully be a more attractive API and more efficient to use.

Thanks -
Cass

Cass, is the ARB specification for ARB_superbuffers already approved and available? When will drivers that support this extension be available?

Kosta

Originally posted by cass:
If you’re rendering exclusively to pbuffers (that is, not calling MakeCurrent every frame), I would be surprised to see a difference. However, it does depend on the driver version - this feature does generally get better perf over time.

I tested this with a minimal setup: Render everything into a pbuffer and then one final texture blast onto the screen. (So there’s one context change per frame)

Not only is glClear() slower by an order of magnitude, but when I move the camera close to objects so triangles get large, the framerate drops more than when rendering directly into the screen.

All this on a GeForce2 MX, with detonator 45.xx

If it had been approved, we would have known about it… It looks frozen - no news for a month!!! I’m going to the damn net every day only to see if it’s already there… but …

Originally posted by cschueler:
Not only is glClear() slower by an order of magnitude

That’s strange, you’re the second one to complain about slow clears for pbuffers (see the beginners forum). I wonder why a pbuffer clear would be that much slower (unless it’s a 128-bit pixel format)?
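
As a rough sanity check on the pixel-format idea, here is how much memory a colour clear has to touch at different pixel sizes. This is only a sketch in C. The function name is hypothetical, the 512x512 size is borrowed from the shadow map at the top of the thread rather than from cschueler’s setup, and it assumes clear cost scales linearly with bytes written, which real hardware with fast-clear paths may not obey.

```c
#include <stddef.h>

/* Bytes written by one full colour clear of a width x height buffer.
 * ASSUMPTION: no fast-clear or compression; every pixel is written once. */
size_t clear_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    return width * height * (bits_per_pixel / 8);
}
```

Under those assumptions a 512x512 clear touches 1 MiB at 32 bits per pixel and 4 MiB at 128 bits per pixel, a factor of four, so a wider pixel format on its own would not obviously account for an order-of-magnitude slowdown.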

Originally posted by cschueler:
[b]
I tested this with a minimal setup: Render everything into a pbuffer and then one final texture blast onto the screen. (So there’s one context change per frame)

Not only is glClear() slower by an order of magnitude, but when I move the camera close to objects so triangles get large, the framerate drops more than when rendering directly into the screen.

[/b]

You mean two context changes… (Back and forth)

How are you getting the data into the texture - render to texture or CopyTexSubImage2D? Have you tried just rendering the scene to the pbuffer without copying it to the window? (i.e. just do the context change, maybe display a frame rate, but don’t actually transfer the data from the pbuffer).

Maybe it’s the transfer that is slowing you down and not the actual rendering?

Originally posted by rgpc:
[b] How are you getting the data into the texture - render to texture or CopyTexSubImage2D? Have you tried just rendering the scene to the pbuffer without copying it to the window? (i.e. just do the context change, maybe display a frame rate, but don’t actually transfer the data from the pbuffer).

Maybe it’s the transfer that is slowing you down and not the actual rendering?[/b]

I render directly into the pbuffer.
I could try your suggestion, but to make a point, I also see performance scale badly with large triangles and overdraw when rendering into the pbuffer.

I suspect the swizzled memory layout of power-of-two textures is responsible for the poor rasterizing performance.

Sidenote: Do you have the NVSDK and the NV effects browser? There’s a pbuffer example, and even this pbuffer example never gets above 20 fps or so, even on new cards.

Originally posted by cass:
[b]…
ARB_superbuffers should provide a significantly lighter-weight mechanism for offscreen rendering within a single context. That will hopefully be a more attractive API and more efficient to use.

Thanks -
Cass
[/b]

That’s good news; however, I am somewhat curious about the compatibility of that functionality. I am not really worried, since I’m mainly targeting NV3x and RD3xx hardware, but if it runs even on lower-end hardware, so much the better.

Obli, I guess it should run on older hardware. It is just another approach to memory management. I assume it would be a VBO-like situation.

Originally posted by Zengar:
Obli, I guess it should run on older hardware. It is just another approach to memory management. I assume it would be a VBO-like situation.

Wow, I hope you’re right. This would be a real “final solution”
Can’t wait for them.

It looks frozen - no news for a month!!!

There was no news for some time after the GDC where superbuffers were first revealed as an idea. Since then, the only real news we had was a PowerPoint presentation by ATi. The spec they presented had pretty decent basic functionality, but it was too restrictive in terms of which buffers you could use in a frame buffer (everything having to be the same size, etc.), which other APIs don’t require.

Originally posted by cschueler:
I render directly into the pbuffer.

So you use Render to texture (not copy)…?

Originally posted by cschueler:
[b] Sidenote: Do you have the NVSDK and the NV effects browser? There’s a pbuffer example, and even this pbuffer example never gets above 20 fps or so, even on new cards.

[/b]

I certainly get more than 20 fps in the pbuffer examples (not sure of the actual figures). My own project uses pbuffers for dynamic cubemaps and gets 100-200 fps on a GeForce 3 and 600-800 fps on a 5900 when using render to texture. The same project gets around 150 fps on the 5900 if I use pbuffers and copy.

Not only is glClear() slower by an order of magnitude, but when I move the camera close to objects so triangles get large, the framerate drops more than when rendering directly into the screen.

All this on a GeForce2 MX, with detonator 45.xx

You’re obviously fill bound. Or to be more precise, memory bandwidth bound. The GeForce2 MX has very little bandwidth; it’s basically memory bandwidth bound at 640x480 in 32-bit colour (at least my old one was). Try decreasing the colour depth to 16 bits and I’m willing to bet you’ll see a large increase in framerate.
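
A back-of-the-envelope version of this argument in C: colour-buffer write traffic per frame as a function of resolution, colour depth and overdraw. The function name and the overdraw factor are illustrative assumptions, and depth-buffer reads/writes and texture fetches are ignored, so this only shows the trend, not real bandwidth figures for any card.

```c
/* Bytes of colour data written per frame.
 * ASSUMPTION: each covered pixel is written `overdraw` times on average;
 * depth-buffer and texture traffic are deliberately left out. */
double colour_write_bytes(int width, int height, int bits_per_pixel, double overdraw)
{
    return (double)width * (double)height * (bits_per_pixel / 8.0) * overdraw;
}
```

At 640x480 with an overdraw of 1, that is 1,228,800 bytes per frame at 32 bits and 614,400 bytes at 16 bits, so halving the colour depth halves the colour writes. Large, close-up triangles raise the effective overdraw, which scales the traffic the same way.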

Originally posted by harsman:
You’re obviously fill bound. Or to be more precise, memory bandwidth bound. The GeForce2 MX has very little bandwidth; it’s basically memory bandwidth bound at 640x480 in 32-bit colour (at least my old one was). Try decreasing the colour depth to 16 bits and I’m willing to bet you’ll see a large increase in framerate.

Sure, the MX has little fill, but my point is, rendering into pbuffers gives even less fill.