OpenGL depth buffer copy

Hi.
I have to copy depth information from a buffer into the active Z-buffer. glCopyPixels is too slow, and wglCreateBufferRegionARB isn’t available on ATI cards, although it works fine on NVIDIA. How can I do this on ATI in an accelerated way? Can I do it with a pbuffer?
Thanks for your help.

You can use a pbuffer and WGL_ARB_make_current_read together with glCopyPixels to accelerate your depth buffer writes. This works fine on newer Radeons (i.e. 9500 Pro/9600/9700), but it is still slow on “older” ones like the 7000 or 8500.
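
For reference, a minimal sketch of that path (assuming the pbuffer already exists with a depth buffer, that the WGL_ARB_make_current_read entry point has been fetched via wglGetProcAddress, and that windowDC, pbufferDC and ctx are hypothetical handles created elsewhere):

/* sketch only: draw into the window, read from the pbuffer, then copy depth */
wglMakeContextCurrentARB(windowDC, pbufferDC, ctx);

glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_ALWAYS);              /* write the copied depth unconditionally */
glDepthMask(GL_TRUE);
glRasterPos2i(0, 0);                 /* assumes matrices that place this at the window's lower-left */
glCopyPixels(0, 0, 640, 480, GL_DEPTH);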
According to ATI’s DirectX 9 optimization papers, it’s not a good idea to write into the depth buffer on Radeons, because this disables any hierarchical Z optimization. And:
“[…] Also most of the RADEON family chips only support swizzled depth buffer formats, which means CPU will have to un-swizzle depth buffer and swizzle it back whenever buffer is locked. This can totally destroy application performance. […]”

And that’s true. The only conclusion is: you can’t write into the depth buffer fast enough on (older) Radeons.
But wait, isn’t hardware shadow mapping an enhanced depth buffering method? It can be used to simulate a depth buffer if you simply use the same projection/modelview matrix for the depth buffer and the color buffer.
(For code, see the shadow mapping demos from the NVIDIA SDK, especially the 8-bit versions.)

You can create an 8-bit depth buffer using standard OpenGL with the texture_env_combine extension, or even 16/24-bit depth buffering using register combiners on NVIDIA (OK, they don’t need it because they have working ARB_buffer_region support) or ATI_fragment_shader on Radeons.

This will work fine (with some restrictions, but it’s better than no support at all) on every graphics card which supports projective texturing correctly…
…and there we are again: e.g. the Radeon 7000 doesn’t!!! (Kyro cards too.)
(I haven’t tried how it works when I generate the mapping coordinates on the CPU.)

So when you want to write into your depth buffer constantly, you have to code at least three different solutions for different hardware… and this still won’t cover all graphics hardware available.

I am currently looking for a fourth way…

Surprisingly, today I’ve finally discovered an acceptably performing method to write depth.

Basic idea:
Allocate a vertex buffer with as many vertices as you have pixels on screen. Fill in a vertex position at each pixel center (to prevent antialiasing issues).

Draw that with glDrawArrays(GL_POINTS, 0, 640*480); you get the idea.

Bonus point: you can source colors from a second vertex buffer (I suggest using disjoint arrays).
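
A minimal sketch of what I mean (not my exact code; the window size, matrix setup and the depth/color sources are assumptions here):

#include <GL/gl.h>

/* Per-pixel point arrays. Assumes a 640x480 window, identity modelview and a
   glOrtho(0, 640, 0, 480, 0, 1) projection, so a vertex z of -d lands in the
   depth buffer as d. */
#define W 640
#define H 480

static GLfloat points[W * H * 3];   /* x, y, z per pixel                       */
static GLubyte colors[W * H * 4];   /* optional RGBA, kept as a disjoint array */

void draw_depth(const GLfloat *depth)   /* W*H depth values in [0,1] */
{
	int x, y, i = 0;

	for (y = 0; y < H; ++y)
		for (x = 0; x < W; ++x, ++i)
		{
			points[i * 3 + 0] = x + 0.5f;    /* pixel center */
			points[i * 3 + 1] = y + 0.5f;
			points[i * 3 + 2] = -depth[i];   /* see projection note above */
		}

	glEnable(GL_DEPTH_TEST);
	glDepthFunc(GL_ALWAYS);                  /* write depth unconditionally */
	glDepthMask(GL_TRUE);

	glEnableClientState(GL_VERTEX_ARRAY);
	glVertexPointer(3, GL_FLOAT, 0, points);
	glEnableClientState(GL_COLOR_ARRAY);
	glColorPointer(4, GL_UNSIGNED_BYTE, 0, colors);

	glDrawArrays(GL_POINTS, 0, W * H);
}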

My little prototyping app needs ~17 ms to write color and depth this way, using regular malloc’ed system memory vertex arrays. Of course this still sucks somewhat, but then the fastest way to draw depth directly (glDrawPixels(<…>, GL_DEPTH_COMPONENT, GL_FLOAT, <…>);) takes 30 ms, and that’s without color.
I should expect a further small boost from using VBOs.
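
For instance (just a sketch of what I have in mind, assuming the ARB_vertex_buffer_object entry points have been fetched via wglGetProcAddress; `points` is the array from the sketch above):

GLuint vbo;
glGenBuffersARB(1, &vbo);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(points), points, GL_STREAM_DRAW_ARB);

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);   /* offset into the bound VBO */
glDrawArrays(GL_POINTS, 0, 640 * 480);

glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);               /* back to client-side arrays */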

(Radeon 9500Pro, Cat 3.6)


Addendum:

GL_POINTS drawing to depth alone comes in at 11.5 ms (per 640x480 frame), vs. the already mentioned 30 ms for glDrawPixels with the GL_FLOAT source format (the fastest one).

A combined glDrawPixels to color and depth takes 60 ms, vs. the 17 ms achieved with GL_POINTS.

Btw,
ATI people, if you’re reading, I’ve found an implementation bug.

void
blit()
{
	glColorMask(1, 1, 1, 1);
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

	/* write color only */
	glDrawPixels(640, 480, GL_RGBA, GL_UNSIGNED_BYTE, shredder + offset);

	/* mask color off, write depth only */
	glColorMask(0, 0, 0, 0);
	glDrawPixels(640, 480, GL_DEPTH_COMPONENT, GL_FLOAT, shredder + offset);
}

If I remove the glColorMask calls, I get a black screen. Removing the depth write alleviates the problem. I.e., writing to depth kills the color buffer when done this way, which it clearly shouldn’t. I haven’t verified whether writing color similarly kills the depth buffer; you may want to check that.


zeckensack,

I don’t know why you are getting a black screen, but you should also disable depth testing and depth writes while doing your glDrawPixels, unless of course you are already doing that.

V-Man,
I’ve skipped my init function, but depth testing is disabled. I’m not aware of any depth values being written while drawing to the color buffer. The wording of the spec is not entirely clear about what happens, but table 3.6 (IMO) indicates that a pixel write always has exactly one target buffer and the others should not be touched.

The 1.4 spec elaborates on this issue further when it discusses stencil writes (section 4.3.1).

Are you suggesting that any glDrawPixels call should always produce (and write, if not masked off) color, depth and stencil?

Sorry, I was away for a while.

Maybe I have misunderstood the spec, but I think it says something like “glDrawPixels generates fragments, these fragments have associated color/depth/whatever values just like any other fragments, and they go through the standard pipeline.”

What I gathered from actually running a program is that depth is written and depth testing is performed if you don’t switch them off. The depth value comes from the third component of glRasterPos.

Of course, you need to use GL_RGB and related color types for this to occur.
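
In other words, something like this (a sketch of the state I mean; `pixels` is just a placeholder 640x480 RGBA image):

glDisable(GL_DEPTH_TEST);            /* no depth compare for the blit */
glDepthMask(GL_FALSE);               /* and no depth writes           */

glRasterPos3f(0.0f, 0.0f, 0.0f);     /* the z here is what would become the fragments' depth;
                                        assumes matrices that put this at the window's lower-left */
glDrawPixels(640, 480, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

glDepthMask(GL_TRUE);
glEnable(GL_DEPTH_TEST);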

I just tested these things again on my ATI. GDI Generic proves me right; ATI proves me wrong.

glDrawPixels(…, GL_DEPTH_COMPONENT, …);

just doesn’t work on ATI no matter what I do, but works as expected on GDI Generic.

Certain raster positions that are supposed to be valid (and are on GDI) are invalid as far as the ATI renderer is concerned.

In any case, I have to turn writes off in my apps.

Actually, in real life it doesn’t matter what the spec says. The only thing that really matters is how the drivers behave…
Most users/gamers aren’t interested in OpenGL spec details and don’t care about them. They just start the app, and when it crashes or looks wrong, even because of a driver bug, they blame you and your application for it. They simply say: “but my other games work!”
Usually it’s hard to explain to a customer that the application he paid for doesn’t work because he uses crappy drivers that don’t follow the spec. The usual answer is: “What?! You know the problem? Why don’t you fix it?”

Btw, that’s what the publishers say to you as well (“I don’t care whose fault it is. All I want is for you to deliver a correctly running application on all systems. End of discussion.”).

And that’s the reason why I created so many versions for this depth buffering problem.

So, back to the actual topic:

zeckensack, I have tested your proposed method, but it still has a drawback: it only runs at reasonable fps on fast hardware (GF3/ATI 9500 Pro), and on that hardware simply using glDrawPixels works fine anyway (it’s already well accelerated there via buffer_region or a pbuffer).
I tested it on my low-end testing system (a PIII 450 with a low-end card; currently a Kyro, which also has depth-buffer-writing problems).
On this card I can only draw a region of 150x100 pixels; any bigger region is far too slow… (= less than 5 fps).
I think with a faster CPU I could get a few more fps, but I don’t think it would be much faster.

I have some other thoughts about this problem:
Actually, all I have to care about are the regions of the depth buffer where the 3D objects in the scene are placed. In my current game, 1-25 characters are visible on screen at the same time; of course, when there are 10-25 characters in the scene, the camera is placed very high above it. So this method could work if I create a screen-space bounding box for every character/object and only update those regions.
(Hmm, could I use the stencil buffer to limit the area of the depth buffer to update?)
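
Something along these lines, say (a rough sketch; it assumes the 640x480 per-pixel point array from zeckensack’s method, stored row-major, and hypothetical screen-space bounds x0, y0, w, h for one object):

int row;

glEnable(GL_SCISSOR_TEST);           /* guard the region, as with the stencil shadow code */
glScissor(x0, y0, w, h);

for (row = y0; row < y0 + h; ++row)
	glDrawArrays(GL_POINTS, row * 640 + x0, w);   /* one row of the box at a time */

glDisable(GL_SCISSOR_TEST);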

…but it’s a lot of work. Oh no, just this moment I realized that I already have these routines! My stencil shadow optimization code has some scissor optimizations; all I have to do is rewrite it a little.

Another new idea: speaking of crappy hardware with crappy drivers, I can’t rely on VAR/VAO/VBO support, but I can use display lists!
I have to check out how they are accelerated and how they could be used… I’d better start now.
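
Roughly what I’d try first (just a sketch; `points` is the per-pixel array from zeckensack’s post, and note that a display list snapshots the vertex data at compile time, so it has to be rebuilt whenever the depth data changes):

GLuint list = glGenLists(1);

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, points);

glNewList(list, GL_COMPILE);
glDrawArrays(GL_POINTS, 0, 640 * 480);   /* array contents are captured into the list here */
glEndList();

/* later, per frame (as long as the data hasn't changed): */
glCallList(list);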

Did anyone try it with a pbuffer? Copy from the framebuffer to a pbuffer, and from the pbuffer to the Z-buffer?

Originally posted by cedric guillemet:
Did anyone try it with a pbuffer? Copy from the framebuffer to a pbuffer, and from the pbuffer to the Z-buffer?

Yes, read my first post. It’s because I tried it that I can say so much about this topic… It works, but it doesn’t solve the actual problem.

Unanswered stuff. Hmmm …

Originally posted by AdrianD:
zeckensack, I have tested your proposed method, but it still has a drawback: it only runs at reasonable fps on fast hardware (GF3/ATI 9500 Pro), and on that hardware simply using glDrawPixels works fine anyway (it’s already well accelerated there via buffer_region or a pbuffer).
I tested it on my low-end testing system (a PIII 450 with a low-end card; currently a Kyro, which also has depth-buffer-writing problems).

This is bad, I must admit. I own a Kyro myself, so I can sympathize.

However, I think it’s not inherently a “low end” issue; it really is a Kyro issue, because of the way that hardware works. With one distinct visible surface per screen pixel, its deferred, tile-based approach gains nothing and still has to sort 640x480 individual points, which is about the absolute worst case you can throw at it. It’s not as bad on other “low end” cards.

Every card will suffer somewhat. You’ll probably only get one pixel pipeline to actually do something.
If I take a Radeon 7200 for example (which I can conveniently grab off my shelf), point rendering is bottlenecked at 166 MPix/s. Vertex transform is bottlenecked much earlier, at 15~20 M points/s, which is still enough. In practice, my 9500 Pro isn’t that much faster than my 7200 (~50%).
My GeForce2 MX is a little slower than the 7200, but still faster than glDrawPixels to depth.

Overall, I found this to be a very acceptable path for dynamic depth writes.

Your bad results on the Kyro are IMO a combination of hardware abuse and overloading the weak CPU (which has to do the T&L there; this method takes a lot of transform power).