
Laying down depth information



zed
03-04-2009, 01:13 PM
I'm curious as to what others are doing.

An ideal scenario would be being able to blit the depth info from the framebuffer to another buffer (but since you can't do this with differing sizes it's a no-go; though you can stick the depth info in a texture and then draw that at any size you want, so the same-size argument seems a bit weak).

Anyway, I find the fastest method when I want to lay down the same depth info in another texture/FB is to just redraw the whole scene again.
Is this what others have also experienced?

ta zed

knackered
03-04-2009, 02:41 PM
Don't worry, I'm listening to you, zed.
I do all my drawing to a fully equipped fbo, if that's any help. The windows framebuffer is so 5 years ago.

Jackis
03-05-2009, 02:36 AM
Actually, I didn't play with that.
But I suppose that if your geometry is not very highly tessellated then, as you said, it might be more efficient to draw the whole scene again, because the alternatives are not very convenient: a full-screen quad with a depth-replace shader, or a full-screen point cloud where each pixel has its own point and the vertex shader reads the depth texture and recomposes the output homogeneous position.
I did the latter method, and it was 1.5X faster than doing depth replace in the fragment shader.

zed
03-05-2009, 12:02 PM
Don't worry, I'm listening to you, zed.

and so you should, for my word is gospel


The windows framebuffer is so 5 years ago.

FBO + AA only came about a couple of years ago, not to mention that not all hardware supports it (btw 'not to mention' is a nonsense sentence.)
there's also the fact that rendering to a FBO first + blitting to the screen is quite a bit slower (IIRC >10% last time I checked) than just rendering to the framebuffer.

true, using a FBO gives far more flexibility, but it does have some drawbacks


Because alternatives are not very convenient - full-screen quad with depth-replace shader or full-screen point cloud, each pixel has its point, and accessing depth texture in vertex shader, recomposing output homogeneous position.
I did the latter method, and it was 1.5X faster, than doing depth replace in fragment shader.

sounds funky, so funky that I've no idea what you're talking about :)

knackered
03-05-2009, 01:55 PM
FBO + AA only came about a couple of years ago, not to mention that not all hardware supports it (btw 'not to mention' is a nonsense sentence.)
there's also the fact that rendering to a FBO first + blitting to the screen is quite a bit slower (IIRC >10% last time I checked) than just rendering to the framebuffer.

true using a FBO gives far more flexibility, but it does have some drawbacks
You've backed me into a corner, zed. I don't like this corner. It smells of urine and despair. What will become of me?

Jackis, I too was left with a feeling of disorientation after reading your prose. Speak ye bullshit?

Ilian Dinev
03-09-2009, 11:03 PM
What Jackis meant about depth-replace:
draw a fullscreen quad with this frag shader:
void main(){ gl_FragDepth = texture2D(depthTex,coord).x; }

What Jackis meant about point-cloud:
draw 1280*720 points with this vert shader:
void main(){gl_Position= gl_Vertex; gl_Position.w = texture2D(depthTex,coord).x;}
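
The point-cloud variant needs one vertex per pixel of the target (921,600 for 1280*720, 65,536 for 256*256). A minimal C sketch of building that vertex data is below; the `DepthPoint` layout and the name `build_depth_points` are made up for illustration and are not taken from anyone's code in this thread. It only fills a CPU-side array, uploading it to a VBO is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex layout: one point per pixel of the target buffer. */
typedef struct {
    float x, y;   /* pixel centre in NDC, range (-1, 1) */
    float u, v;   /* matching texture coordinate, range (0, 1) */
} DepthPoint;

/* Fills out[0 .. width*height-1] in row-major order and returns the count.
   The caller must provide room for width*height elements. */
size_t build_depth_points(DepthPoint *out, int width, int height)
{
    assert(out != NULL && width > 0 && height > 0);
    size_t n = 0;
    for (int j = 0; j < height; ++j) {
        for (int i = 0; i < width; ++i) {
            /* Sample at the pixel centre so each point lands on one pixel. */
            float u = ((float)i + 0.5f) / (float)width;
            float v = ((float)j + 0.5f) / (float)height;
            out[n].x = 2.0f * u - 1.0f;
            out[n].y = 2.0f * v - 1.0f;
            out[n].u = u;
            out[n].v = v;
            ++n;
        }
    }
    return n;
}
```

The vertex shader would then fetch the stored depth at (u, v) and write it into the output position.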

Jackis
03-10-2009, 05:30 AM
Thanks, Ilian, you're totally right - that's what I meant.
You helped knackered to understand my "bullshit".

[EDIT]: Ilian, in my point vertex code I did slightly more sophisticated math, because of DepthRange and because of the perspective division. So I left W as 1, and recomputed Z as texRECT(depthMap,...)*2.0f-1.0f to put it in the NDC unit post-projection cube.
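
The *2.0f-1.0f remap is the default-DepthRange case of inverting the window-depth mapping. A small C sketch of the general form follows; the function name is hypothetical, and the shader version would do the same arithmetic on the texture fetch.

```c
#include <assert.h>

/* Map a stored window-space depth d back to NDC z in [-1, 1].
   range_near and range_far are the glDepthRange values (defaults 0 and 1).
   Inverts z_window = (far - near) / 2 * z_ndc + (far + near) / 2. */
float window_depth_to_ndc_z(float d, float range_near, float range_far)
{
    assert(range_far != range_near);
    return (2.0f * d - (range_far + range_near)) / (range_far - range_near);
}
```

With the default range of 0 to 1 this reduces to d*2-1; with W left at 1 the perspective division then leaves Z unchanged.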

zed
03-10-2009, 11:30 PM
draw 1280*720 points with this vert shader:
void main(){gl_Position= gl_Vertex; gl_Position.w = texture2D(depthTex,coord).x;}
I'm shocked if this is optimal, that's nearly a million vertices
Though I do take your word for it.

Jackis
03-11-2009, 02:45 AM
zed,
I was shocked too, but that was the case: points were faster (not by much, about 20-40%) than doing depth-replace in the fragment shader. That was for a 256*256 FBO. Maybe for large screens it would even be slower, I didn't actually test it.

Ilian Dinev
03-11-2009, 06:46 AM
Meh, my old 7600GT could dynamically generate a VBO with 1280x720x2 points, and draw 1280x720 lines at 100fps iirc, with blending enabled, VBO being transferred from a FBO via a PBO, and having the structure: "hvec2 pos; DWORD colorRGBA;". It was only problematic when all lines were 50+ px long and/or very wide.
:)

Brolingstanz
03-11-2009, 06:56 AM
I'm using what I term a Quasi Monte Carlo Chicken Bone Simulation to construct a probability lattice for a hierarchical depth discarding predicate. The idea is to throw an assortment of virtual chicken bones whose shapes -- largely described by cylinders and spheres -- uniquely determine the outcome of the simulation but are not in feature completeness known a priori. A "roll of the bones" is then followed by a Minkowski sum of those bones falling within a predetermined volume of interest, principally the volume of the projected pixel under consideration. The volume of intersection of the Minkowski Bone and the probability lattice determines the likelihood that the depth test will fail, thus serving to predicate the depth fill semi deterministically with the desirable aperiodic property of NTMC. It's a crap shoot, in so many words.

CatDog
03-11-2009, 07:36 AM
Good heavens modus, what happened to your nickname?

CatDog

Brolingstanz
03-11-2009, 08:10 AM
Name's Stan - Housewares.

modus... was a bit moded and I got a good deal on a new one.

zed
05-24-2009, 12:32 AM
to revisit this old exciting topic

here's my pipeline, simplified
http://www.zedzeek.com/junk/pipeline.png

I'm actually finding it much faster (~30%),
when I need to redraw the depth info (3x in the above picture), to just say bugger it, I'll redraw the whole depth info again,
instead of first creating a depth texture + then blitting that later on, i.e. I redraw an extra ~500 meshes per frame since it's cheaper than 'quickly' blitting the depth info again.

I find this crazy, perhaps others here are doing it the 'slow' way without realising it

Jackis
05-25-2009, 03:54 AM
Interesting.
What's the overdraw ratio for your typical scenes? If you have high overdraw, your approach should be slower than blitting depth somehow. But I wouldn't count on it, and I totally believe your experience on that.
By the way, how did you perform depth blitting?

Madoc
05-25-2009, 05:13 AM
Yeah, I would basically ask the same questions as Jackis... You say ~30% faster and ~500 additional draw calls but what about your depth complexity? How much geometry? What method exactly are you using for the blit (have you tried ARB_copy_buffer)? What hardware is this on, and have you tried on different hw? What about depth formats?

Still, presumably your depth complexity is just over 1, poly counts not very high etc. Doesn't make much sense (but then what about Jackis' points! lol!), would suggest some poorly optimised function on the driver side.

Jackis
05-25-2009, 05:55 AM
Well, Madoc, that's not LOL with points )) 2 years ago that was the only way to make some very specific task run quite fast. Yeah, it's not elegant at all, but depth-replace was more costly.

Madoc
05-25-2009, 12:49 PM
I'm not denying that it was faster, that's what I find funny. It seems completely absurd that drawing millions of points should be faster than very simple per fragment operations. It's either funny or just plain scary.

Jackis
05-25-2009, 02:32 PM
Actually, I was speaking only about a 256*256 viewport, and that's only 64K points. Maybe for larger resolutions it won't be the case, as I've mentioned. But right now it doesn't matter, because there are much more convenient ways to blit depth ))
But okay, we're still waiting for zed's responses )

zed
05-25-2009, 10:04 PM
I haven't tried ARB_copy_buffer, first time I've seen it, but from a quick look it seems to be only available on new hardware (+ new drivers, cause it doesn't show up in my driver string), so that's out of the question.

I'll try a much simpler case in a couple of hours + report my results, IIRC there's quite a performance hit just from storing the depth buffer in a FBO. but I'll get back to yous on that.

zed
05-26-2009, 12:13 AM
this is a simplified version: no DOF, glow etc., just draw the scene + then redraw depth X number of times afterward (clearing first)
http://www.zedzeek.com/junk/KEA_2009-05-26_0.jpg
:o
whoops, looks like my FBO version was using a fp16 texture by default, hence a large slice of the performance drop. Another possible cause for the slowdown WRT linear depth is that the normal z-depth has to be finished first (thus the depth texture exists) before you can create the linear depth, so the order is different from the redraw-mesh method. I believe binding FBOs is the most expensive operation you can do in OpenGL.
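
For reference, a C sketch of how a linear depth is commonly derived from a finished depth-buffer value, which is why the ordinary depth has to exist first. It assumes a standard perspective projection and the default depth range of 0 to 1; the function name is hypothetical and the post does not say which formula was actually used.

```c
#include <assert.h>

/* Convert a depth-buffer value d in [0, 1] to a positive eye-space distance.
   Assumes a standard perspective projection with planes z_near and z_far
   and the default glDepthRange(0, 1). */
double linear_eye_depth(double d, double z_near, double z_far)
{
    assert(z_near > 0.0 && z_far > z_near);
    assert(d >= 0.0 && d <= 1.0);
    return (z_near * z_far) / (z_far - d * (z_far - z_near));
}
```

With z_near = 1 and z_far = 100, a stored value of 0.5 corresponds to a distance of only about 1.98, which shows how non-linear the stored depth is.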

240 meshes in frustum, no occlusion test, RGBA8
Columns: redraw all meshes | blit depth tex | number of lay-down-depth iterations (done either by redrawing all meshes or by blitting)

redraw   blit   iterations
36.6     48.4   10
46.6     53.5   5
55.4     56.5   2
59.2     58.0   1

115 meshes in view frustum

redraw   blit   iterations
48.4     54.3   10
59.5     60.6   5
68.7     65.1   2
72.7     66.4   1

ok, faced with this data, there's not really much in it for a standard scene (which would be laying down depth perhaps 2-3x extra)
sorry about the panic :cool:
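
One way to read the figures above is as cost per extra depth pass. The C sketch below assumes the two numeric columns are frames per second (the post does not state the unit); the function names are made up for illustration.

```c
#include <assert.h>

/* Frame time in milliseconds for a given frame rate. */
double frame_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Average cost in ms of one extra depth pass, from two measurements
   taken with different numbers of passes. */
double ms_per_extra_pass(double fps_few, int passes_few,
                         double fps_many, int passes_many)
{
    assert(passes_many > passes_few);
    return (frame_ms(fps_many) - frame_ms(fps_few))
           / (double)(passes_many - passes_few);
}
```

For the 240-mesh rows, ms_per_extra_pass(59.2, 1, 36.6, 10) gives roughly 1.16 ms per redraw pass and ms_per_extra_pass(58.0, 1, 48.4, 10) roughly 0.38 ms per blit pass, while at a single pass the two frame times are within about 0.4 ms of each other.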

Brolingstanz
05-26-2009, 01:58 AM
Awesome, zed. Makes me want to get back into game programming again.

zed
05-26-2009, 01:56 PM
ta Brolingstanz
actually, thinking about this some more, I've still got the feeling I was right with my initial assertion; it's hard to test in a game with a lot of extra stuff going on which can influence the result.

I'll try to code up a simple demo tonight

zed
05-27-2009, 02:14 AM
http://www.zedzeek.com/overdraw_tester.7z
1,2,3 switch between the buffers (1,3 are most valid)
-= for adding rocks
[] for number of extra times to draw the depth info

summary, for 99% of cases redrawing the whole scene is gonna be quicker

I couldn't get glBlitFramebufferEXT working for the depth (I'm pretty sure I've had it working in the past; the nvidia drivers would crash as the app exits if I tried, no errors reported)

http://www.zedzeek.com/screenshot.jpg

Sanctus
05-27-2009, 05:22 AM
Come to think of it, if you store normal and depth you can do the lighting separately as well, without redrawing the geometry again and again. But you have to use the deferred technique for this, I guess. Anyway, nice performance :)
I tried your demo but it's capped at 60 fps. Why is that?

Hampel
05-28-2009, 01:47 AM
I tried your demo but it's capped at 60 fps. Why is that?

vertical sync; try disabling it in the driver settings...

Madoc
05-28-2009, 04:48 AM
Just tried the test program on my laptop (9800M gtx). 2 seems the fastest overall. 3 wins with very low depth complexity which is what I'd expect, but it takes a lot of depth complexity for 1 to beat 3.

What exactly does 2 do?

And yeah, I had to force vsync off in the driver CPL.

CatDog
05-28-2009, 06:46 AM
GTX285, 100 rocks, view not modified, default driver settings except vsync forced off:

1: FPS ~1600 ~69MVerts/sec ~90MTris/sec
2: FPS ~2000 ~78MVerts/sec ~120MTris/sec
3: FPS ~2000 ~155MVerts/sec ~240MTris/sec

CatDog

zed
05-28-2009, 04:37 PM
version 2 doesn't do much, I was just testing to see if there's a speed difference between a FBO with a depth texture2d vs a FBO with a depth renderbuffer.
btw no difference on my gf9500.

in a typical game scene, how many objects are there onscreen?
perhaps ~100
In this scenario, if you need to write the depth to the buffer again then you're best off redrawing the whole scene again (at least on nvidia cards).

@CatDog - your card's too fast; whilst you can change the window size, the FBO does not change with it.
question - how would I do this best?
I could destroy + recreate the FBO + corresponding textures, but when the user is changing the window size by dragging the borders this seems like overkill, is it the only way?


-----------------
btw here's a newer version (with vsync off ;) also I do culling of rocks outside the view frustum + the ground plane gets drawn into the buffer)

http://www.zedzeek.com/overdraw_tester.7z
1,2,3 switch between the buffers (1,3 are most valid)
-= for adding rocks
[] for number of extra times to draw the depth info
left + right mouse buttons to change camera

Madoc
05-29-2009, 04:26 AM
This hasn't much to do with objects, it's mainly fill. You are either drawing a full view quad (depth complexity = 1) or a few objects with a presumably slightly lower fill cost. Until the overall cost (likely when depth complexity > 1) of the objects exceeds that of the quad you're probably going to get better performance from the redraw. Any relatively complex scene (e.g. with vegetation) taking up at least most of the view is bound to perform better with stored depth.
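
The fill argument can be put as a rough comparison of fragments written. This C sketch is a deliberately crude model with hypothetical names: it counts fragments only and ignores vertex work, draw-call overhead and early-z rejection, so it is an illustration of the reasoning rather than a predictor.

```c
#include <assert.h>

/* Fragments written by redrawing the scene, relative to one full-view quad.
   coverage: fraction of the view covered by geometry (0..1).
   depth_complexity: average fragments per covered pixel (>= 1). */
double redraw_fill_ratio(double coverage, double depth_complexity)
{
    assert(coverage >= 0.0 && coverage <= 1.0);
    assert(depth_complexity >= 1.0);
    return coverage * depth_complexity;
}

/* Nonzero when the redraw touches fewer fragments than the quad would. */
int redraw_has_less_fill(double coverage, double depth_complexity)
{
    return redraw_fill_ratio(coverage, depth_complexity) < 1.0;
}
```

A few rocks covering half the view at depth complexity 1.5 give a ratio of 0.75, favouring the redraw; a view-filling scene at depth complexity 3 gives 3.0, favouring stored depth.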

Your tester's results are just what I'd expect, nothing unusual about the results as far as I can tell.

CatDog's perf sounds right, even my laptop did ~1200 fps.

Edit: About resizing, I've never seen any adverse effects from just redefining the tex attachment dimensions and changing the glViewport parameters, why destroy the FBO?

CatDog
05-29-2009, 09:08 AM
Ok, another one using the new version and increased load.

GTX285, rocks=1000 depth-passes=1

1: FPS ~515 ~200MVerts/sec ~300MTris/sec
2: FPS ~515 ~200MVerts/sec ~300MTris/sec
3: FPS ~335 ~255MVerts/sec ~394MTris/sec

GTX285, rocks=1000 depth-passes=20

1: FPS ~325 ~125MVerts/sec ~190MTris/sec
2: FPS ~515 ~200MVerts/sec ~300MTris/sec
3: FPS ~33 ~270MVerts/sec ~410MTris/sec

GTX285, rocks=1000 depth-passes=100

1: FPS ~119 ~45MVerts/sec ~70MTris/sec
2: FPS ~515 ~200MVerts/sec ~300MTris/sec
3: FPS uh...~7 ~270MVerts/sec ~410MTris/sec

It's also interesting to watch the Process Explorer. Mode 2 seems to have much less CPU load! In mode 3, the app gets jerky when depth-passes are increased.

*edit*
And a last one, out of curiosity:

GTX285, rocks=5000 depth-passes=100

1: FPS ~72 ~137MVerts/sec ~210MTris/sec
2: FPS ~138 ~263MVerts/sec ~405MTris/sec
3: FPS <1, screwed

CatDog

zed
05-29-2009, 06:02 PM
This hasn't much to do with objects, it's mainly fill. You are either drawing a full view quad (depth complexity = 1) or a few objects with a presumably slightly lower fill cost. Until the overall cost (likely when depth complexity > 1) of the objects exceeds that of the quad you're probably going to get better performance from the redraw. Any relatively complex scene (e.g. with vegetation) taking up at least most of the view is bound to perform better with stored depth.

coincidentally, vegetation is what I'm doing today,
btw the camera can be moved so it's pointing downwards, thus 100% of the screen has depth info



Edit: About resizing, I've never seen any adverse effects from just redefining the tex attachment dimensions and changing the glViewport parameters, why destroy the FBO?

yes you're right, I realized afterwards there's no need to recreate the FBO

cheers all, for the info