Shadow Mapping FPS Woes - Pos. ATI Driver issue



bobvodka
04-01-2005, 10:34 AM
evening all, i've a slightly vexing issue with shadow mapping in general.

I'm trying to track down if this problem is a symptom of drivers, hardware or both.

I've got an X800XT PE running the Cat5.3 driver set, and when I run the shadow mapping code from Paul's Projects (http://www.paulsprojects.net/tutorials/smt/smt.html) I get a mind-blowing 4fps out of it.

I've tested it on a 9800xt running the Cat4.9 drivers and it gives fps up in the couple of hundreds. Two other people have tested it: one got fps in the mid 100s and is reporting a driver version of 6.14.10.6517 on his X300, and the other has an X800XT and is also getting around 4fps.

So, in an effort to track down if this is a driver or hardware thing I was wondering if a few people could run the test program and report back the fps;
clicky (http://members.gamedev.net/phantom/smt.rar)

I'm mostly interested in ATI cards/driver versions to work this out, coz if it is indeed a driver thing someone has REALLY dropped the ball somewhere (my main suspect is the glCopyTexSubImage2D() call for pulling the depth details into a texture).

edit: yeah, as zeckensack has done below, which control panel version might be handy as well, i'm using the CCC edition

zeckensack
04-01-2005, 10:41 AM
640 fps
Radeon 9800Pro
Catalyst 5.3 (CP version) at defaults
Athlon 64 3200+
VIA K8T800Pro
Windows 2000 Pro SP4

ZbuffeR
04-01-2005, 11:43 AM
750 fps (after forcing vsync off, app. controlled was vsyncing)
Geforce 6800le
Forceware 71.84 (quality defaults)
Athlon 64 3200+
windows 2000 pro sp4

Java Cool Dude
04-01-2005, 07:55 PM
840 FPS
GeForce 6800 NU @ 360/800
AMD XP 1.92 GHZ
768 Ram

Roderic (Ingenu)
04-02-2005, 12:02 AM
803 FPS (after disabling VSync)
GeForce 6600GT PCIExpress
ForceWare 71.84 (quality default -vsync)
Athlon64 3200+

CrazyButcher
04-02-2005, 01:08 AM
320 FPS (no VSync) (anyone less than this ;) )
GeForce 4 Ti 4200
ForceWare 71.90
P4 2.2

Ysaneya
04-02-2005, 02:06 AM
5 fps

Athlon 64 3500+
Radeon X850 XT
Win XP Pro
Catalyst 5.2

Sunray
04-02-2005, 06:23 AM
530 FPS
Radeon 9700 Pro
Catalyst 5.3
Pentium 4 2.66 Ghz
Windows XP Pro

kansler
04-02-2005, 12:31 PM
618 fps
Geforce 5900XT
Forceware 61.77
Athlon 2600+

238 fps / 415 fps
Radeon 9500 / Radeon 9500 + 9700 hack
Cat 5.3
Athlon 1000

azcoder
04-02-2005, 01:49 PM
6.42 fps Radeon Mobility 9800 Driver version 6.14.10.6517

bobvodka
04-02-2005, 07:32 PM
Originally posted by azcoder:
6.42 fps Radeon Mobility 9800 Driver version 6.14.10.6517

Originally posted by Ysaneya:
5 fps

Athlon 64 3500+
Radeon X850 XT
Win XP Pro
Catalyst 5.2

Now, it's little blips like those two, specifically the second one as it's basically the same chipset as mine, which make me think someone at ATI driver central has really dropped the ball. The first one I can't explain, but it doesn't look that good either.

Keep the reports coming, the NV ones are handy as a comparison thing, if Humus doesnt comment here before I get a chance to poke devrel about it all feedback will be handy :)

PsychoLns
04-02-2005, 11:27 PM
6.4 fps
X800 Pro
Catalyst 5.3 CP
Barton 3200+
WinXP

execom_rt
04-03-2005, 12:10 AM
Try recompiling the application using 'glCopyTexImage2D' instead of glCopyTexSubImage2D for copying the shadow map back to the texture.

I had problems with glCopyTexSubImage2D (tracked down with GLIntercept): it either returned an invalid operation error or was slow.

I also suggest supporting the 'WGL_NV_render_depth_texture' extension where available and using a PBuffer.

Shadow mapping looks much better on a GeForce3.

Alas, the equivalent function is not available on ATI; the Radeon architecture simply doesn't support it (I hope it will be in the R520, at last).

bobvodka
04-03-2005, 01:55 AM
The crazy thing is, on older hardware (9800s and below) and apparently the X300 the code works fine. Heck, going from a 9800xt to an X800XT is what made me notice (well, initially I thought it was my fault as I'd changed something else, talk about a waste of two days). It just seems to be something breaking in the X800 series, as the few ATI results above show.

I knew about the other methods, but as you point out that's not much use on ATI hardware anyway. (In fact, I was doing a pack to RGBA and extracting in a fragment shader, which is fine for my target hardware of cards with GLSL support.)
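
The pack-to-RGBA idea mentioned in this post can be illustrated on the CPU. What follows is a minimal C++ sketch and not the code used in this thread: it assumes the depth lies in [0, 1] and stores it as 32-bit fixed point split across four 8-bit channels, which is only one of several possible encodings. The names `Rgba8`, `kDepthScale`, `packDepth` and `unpackDepth` are made up for illustration; a fragment shader would do the equivalent arithmetic in floating point.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// One RGBA8 texel: four 8-bit channels (hypothetical helper type).
using Rgba8 = std::array<std::uint8_t, 4>;

// Largest 32-bit fixed-point value, used as the scale factor.
constexpr double kDepthScale = 4294967295.0;

// Pack a depth in [0, 1] into four bytes, most significant byte in R.
Rgba8 packDepth(double depth) {
    assert(depth >= 0.0 && depth <= 1.0);
    const auto fixed =
        static_cast<std::uint32_t>(std::llround(depth * kDepthScale));
    return Rgba8{
        static_cast<std::uint8_t>((fixed >> 24) & 0xFFu),
        static_cast<std::uint8_t>((fixed >> 16) & 0xFFu),
        static_cast<std::uint8_t>((fixed >> 8) & 0xFFu),
        static_cast<std::uint8_t>(fixed & 0xFFu)};
}

// Rebuild the depth from the four channels written by packDepth.
double unpackDepth(const Rgba8& texel) {
    const std::uint32_t fixed =
        (static_cast<std::uint32_t>(texel[0]) << 24) |
        (static_cast<std::uint32_t>(texel[1]) << 16) |
        (static_cast<std::uint32_t>(texel[2]) << 8) |
        static_cast<std::uint32_t>(texel[3]);
    return static_cast<double>(fixed) / kDepthScale;
}
```

Calling `unpackDepth(packDepth(d))` returns `d` to within the quantisation step of the 32-bit encoding, which is the property the shadow test in the second pass relies on.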

execom_rt
04-03-2005, 04:10 AM
Have you tried :

http://www.delphi3d.net/download/shadowmap.zip

it's another shadow map program using the plain GL1.4 implementation. There is a fps counter on it. Does it run slow ?

bobvodka
04-03-2005, 04:46 AM
errm, are you sure that is a plain GL1.4 impl? Coz it totally blew up on me with an access violation and two nulls, so i'm suspecting an NV extension in use...

execom_rt
04-03-2005, 07:00 AM
the demo from delphi3d uses GL_ARB_multitexture, GL_ARB_depth_texture and GL_ARB_shadow.

You need GLUT32.dll and it works on an ATI 9800 (no NV specific code)

yooyo
04-03-2005, 09:42 AM
1262 fps
6800U
FW 76.43
P4 2.8
WinXP SP2

yooyo

bobvodka
04-03-2005, 12:57 PM
Originally posted by execom_rt:
the demo from delphi3d uses GL_ARB_multitexture, GL_ARB_depth_texture and GL_ARB_shadow.

You need GLUT32.dll and it works on an ATI 9800 (no NV specific code)

hmmm i'm not convinced you got the right demo there;
- it's short an fps counter
- a quick look at the source doesn't show it using any of those functions at all

and as fun as your replies are, they don't address the main issue, which is that it looks like someone, somewhere, has dropped the ball with copying to depth textures from the framebuffer.

execom_rt
04-03-2005, 02:08 PM
Sorry, I was linking the wrong file.
It's shadowmap.zip, not shadows.zip ;)

Shadowmap (http://www.delphi3d.net/download/shadowmap.zip)

Hope it will work. It is using
glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT, 0, 0, S_SIZE, S_SIZE, 0);
in Main.pas, line 104, which differs from Paul's Projects, which uses glCopyTexSubImage2D.

bobvodka
04-03-2005, 02:51 PM
ah, that one also scores a massive 4fps...

execom_rt
04-03-2005, 03:49 PM
OK, so it's a driver bug; that's all I wanted to know.

Try contacting ATI for support, and maybe it will be fixed in a future version of Catalyst.

Alternatively, you can rewrite the whole shadow mapping in OpenGL by:

- Writing a vertex and fragment shader for shadow map generation (1st pass)

- Writing a vertex and fragment shader for the shadow map test (2nd pass). You could also implement a 2x2 PCF at the same time.

- Using a PBuffer (with floating point buffers, for example).

bobvodka
04-03-2005, 04:21 PM
yeah, i've already done the shaders method before now (packing into a RGBA texture via GLSL), but having a broken copy command is kinda annoying anyways ;)

I'll poke ATI devrel at some point (I'm tempted to after the next driver release, which is about a week or so away), although i wouldnt expect a fix soon due to the driver pipeline being rather deep :|

I'm still amused that something like this managed to make it into the drivers in the first place and that it doesnt appear to have been noticed before... :eek:

execom_rt
04-04-2005, 12:02 AM
So, why don't you use a PBuffer (http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_pbuffer.txt) with GL_RGB_FLOAT16_ATI?

(Sadly you can't use the new EXT_framebuffer_object. It won't be supported by ATI, hardware issue again, due to the render to the depth buffer which is not supported).

Then you won't have to copy. Maybe glCopyTexImage2D is simply broken, so don't use it.

It would be nice to have a kind of 'certified OpenGL driver', like the 'certified DirectX driver', where drivers are tested thoroughly by SGI or another big company. The drivers would have to pass some rendering tests, tests of all the extensions, etc.

Finally, it's bad news for me. I've implemented shadow mapping using the ARB_shadow extension, and I need to redo a completely new version using vertex / fragment shaders, because apparently it doesn't work on the X800.
:mad:

LogicalError
04-04-2005, 01:37 AM
Originally posted by execom_rt:
Sadly you can't use the new EXT_framebuffer_object. It won't be supported by ATI, hardware issue again, due to the render to the depth buffer which is not supported

omg i hope you're joking..
Are you sure they won't support a partial or an emulated implementation?

execom_rt
04-04-2005, 01:52 AM
Well, it is not supported.

DST is not an official DX9 feature and ATI didn't implement it.

(Remember the story with 3DMark'03 and DST support? Google '3dmark DST': it looked different, better in fact, on nVidia than on ATI.)

- Creating a DST in DirectX 9 is not possible: the function returns 'not available'.

- There is no proprietary OpenGL extension that supports it (no WGL_NV_render_depth_texture equivalent).

Finally, EXT_framebuffer_object is not available.

I think the specification says that you need to support GL_DEPTH_COMPONENT as a render target, but this will fail.

So either ATI implements a 'partial' EXT_framebuffer_object with just render to a texture, or chooses not to implement it at all (what a surprise, ATI was against the 'ARBisation' of GL_EXT_framebuffer_object, now you can guess why...).

But with ARB_fragment_program and GLSL, it is possible to write your own shadow mapping without using a depth component texture (using ATI texture float).

Of course it will be much slower than the nVidia render path, but it will be a 'portable' solution.

ATI has a long history of problems with the Z-buffer: glReadPixels with GL_DEPTH_COMPONENT didn't work well on the Radeon 8500, principally due to the 'HyperZ' thing that compresses the Z-buffer, so reading back the Z-buffer needs the hardware to decompress the HyperZ data, and that operation seems buggy at some point.

I think what's happening is that HyperZ on the X800 has changed again (now HyperZ™ HD, before it was HyperZ III™+), so reading it back is even slower.

Since no (?) OpenGL games are using that feature (DST), ATI didn't bother to optimize that part and is focused on other problems.

bobvodka
04-04-2005, 07:09 AM
"The framebuffer object extension is currently available in beta drivers from both NVIDIA and ATI. It will be fully supported on NV30 and R300 and later, and possibly on NV1x and R200 as well." (http://www.gamedev.net/columns/events/coverage/feature.asp?feature_id=75)

and until someone from ATI says otherwise I'll be believing we'll get it (even if it doesn't support rendering to depth directly, I'm not that bothered as I don't intend on using pure depth-only rendering for shadow mapping stuff anyway, so the minor cost of packing and unpacking isn't an issue)

zed
04-04-2005, 11:21 AM
Since no (?) OpenGL games is using that feature (DST)

mine does, perhaps that's why ati hardware can't run it!

Humus
04-04-2005, 07:04 PM
Originally posted by execom_rt:
(Sadly you can't use the new EXT_framebuffer_object. It won't be supported by ATI, hardware issue again, due to the render to the depth buffer which is not supported).

Exactly where did you get this information, or are you just speculating? FBOs will be supported. I don't know what the plans are for depth-renderables, but it's of course possible to implement by simply doing a copy under the hood.

Edit: And I don't see anything in the spec that makes depth-renderables required for FBOs.
I'll take a closer look at this issue tomorrow and report back what I find out.

execom_rt
04-05-2005, 02:00 AM
I guess that GL_EXT_framebuffer_object has some error handling for when one tries to create a 'depth component' render target.

At worst, it would return an 'unsupported' error code, which is fine. It won't stop this extension from shipping on ATI graphics boards.

But what I wanted to say is that this extension is not shipping yet as of 5 April 2005, so the immediate solution right now is using PBuffers (using WGL_TYPE_RGBA_FLOAT_ATI for a 32-bit floating point texture and writing a shader that outputs the depth coordinates into that buffer), and not using glCopyTexImage2D at all.

And this is valid for everybody who tries to implement shadow mapping in OpenGL. I guess ATI should write a 'white paper' on how to implement shadow mapping on their boards, and more generally how to implement shadow mapping in 2005 (using PBuffers, 32-bit FP textures, implementing a 2x2 PCF in GLSL, etc.).

The document 'Shadow mapping in Today's OpenGL Hardware' is getting quite old. I'm saying this for everybody: don't read that document anymore, there are now better solutions than this, which are far more portable.

To finish, I'm linking this post (http://www.ampoff.org/modules.php?name=Forums&file=viewtopic&t=15&sid=929e2d5849cd4501bd0715f0715f0c825aadb)

with an implementation of shadow mapping in GLSL. Nicely explained. Alas, it is missing the depth map generation and a 2x2 PCF, which are easily doable.

I will probably post an implementation I've just done in GLSL in the near future.

A demo using GL_EXT_framebuffer_object, GLSL, 32-bit FP textures, 2x2 PCF and, why not, a blur effect on the texture would be a nice plan for a demo for you, Humus ;)

bobvodka
04-05-2005, 07:13 AM
heh, oddly enough it was that tutorial you linked to above where I learnt how to do the whole shadow mapping lark.

As for FBO, I'm holding out some hope it'll be in the next driver update (possibly by the 7th, if not early next week by my guess); it really depends on the depth of ATI's driver pipeline... if not, well, it's a wait until next month, but I've got plenty to do in the meantime anyway :) (in case you are wondering, I very quickly developed a dislike for the whole pbuffer system, so I refuse to touch 'em with someone else's bargepole...)

hmmm 2x2 PCF.. hadn't thought about adding that... might look into how that works and see if I can come up with my own demo.. might be handy for an upcoming project ;)

execom_rt
04-05-2005, 08:01 AM
For the 2x2 PCF you could write something like this:

- Based on the source code from the link, just define RcpSampleSize = 1/TEXTURE_SIZE (i.e. 1/256 for a 256x256 map); r is the final value -


vec2 texelpos = projectiveBiased.xy / RcpSampleSize;
vec2 lerps = fract(texelpos);
float z = projectiveBiased.z;
vec4 k;
k.x = texture2D( shadowMap, projectiveBiased.xy ).r < z ? 1.0 : 0.0;
k.y = texture2D( shadowMap, projectiveBiased.xy + vec2(RcpSampleSize, 0.0) ).r < z ? 1.0 : 0.0;
k.z = texture2D( shadowMap, projectiveBiased.xy + vec2(0.0, RcpSampleSize) ).r < z ? 1.0 : 0.0;
k.w = texture2D( shadowMap, projectiveBiased.xy + vec2(RcpSampleSize, RcpSampleSize) ).r < z ? 1.0 : 0.0;
float r = mix( mix( k.x, k.y, lerps.x ), mix( k.z, k.w, lerps.x ), lerps.y );

This code really 'shines' with floating point textures.
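
For anyone who wants to check the filtering arithmetic outside a shader, here is a rough CPU reference for the same 2x2 PCF in C++. It is only a sketch under assumptions: the `ShadowMap` struct, its clamp-to-edge `fetch` and the function name `pcf2x2` are invented for illustration, lookups are nearest-neighbour, and the result follows the snippet's convention that 1.0 means the stored depth is in front of the fragment.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Square depth map with nearest-neighbour lookups (hypothetical helper type).
struct ShadowMap {
    int size;                  // width and height in texels
    std::vector<float> depth;  // row-major, size * size entries

    // Fetch one texel, clamping coordinates to the edge of the map.
    float fetch(int x, int y) const {
        assert(depth.size() ==
               static_cast<std::size_t>(size) * static_cast<std::size_t>(size));
        x = std::min(std::max(x, 0), size - 1);
        y = std::min(std::max(y, 0), size - 1);
        return depth[static_cast<std::size_t>(y) *
                         static_cast<std::size_t>(size) +
                     static_cast<std::size_t>(x)];
    }
};

// 2x2 percentage-closer filter: compare four neighbouring texels against the
// fragment depth, then blend the four 0/1 results bilinearly.
float pcf2x2(const ShadowMap& map, float u, float v, float fragDepth) {
    const float tx = u * static_cast<float>(map.size);
    const float ty = v * static_cast<float>(map.size);
    const int x0 = static_cast<int>(std::floor(tx));
    const int y0 = static_cast<int>(std::floor(ty));
    const float fx = tx - static_cast<float>(x0);  // same role as lerps.x
    const float fy = ty - static_cast<float>(y0);  // same role as lerps.y

    auto occluded = [&](int x, int y) {
        return map.fetch(x, y) < fragDepth ? 1.0f : 0.0f;
    };
    const float k00 = occluded(x0, y0);
    const float k10 = occluded(x0 + 1, y0);
    const float k01 = occluded(x0, y0 + 1);
    const float k11 = occluded(x0 + 1, y0 + 1);

    const float top = k00 + (k10 - k00) * fx;
    const float bottom = k01 + (k11 - k01) * fx;
    return top + (bottom - top) * fy;
}
```

The point of comparing first and filtering second is that depths themselves are never averaged; only the pass/fail results are blended, which is what softens the shadow edge.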

Else, for generating the Z-Depth



// Vertex shader
void main(void)
{
gl_Position = ftransform();
}

// Fragment shader
void main(void)
{
gl_FragColor = vec4(gl_FragCoord.z);
}

Alas, it is slow on my machine: reading gl_FragCoord.z silently forces software rendering (I hate when it does that). I guess there is a better method for this one.

Humus
04-05-2005, 08:48 PM
Originally posted by execom_rt:
Alas, it is slow on my machine: Reading gl_FragCoord.z silently forces to sw rendering (I hate when it does that). I guess there is a better method for this one.

Using depth bias and reading gl_FragCoord.z will get you into software rendering. Without the depth bias it should work fine. You can solve the problem by putting the bias on the other pass instead.

execom_rt
04-06-2005, 12:13 AM
Yes, removing the glEnable(GL_POLYGON_OFFSET_FILL); fixed it, it's running at correct speed now.
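
Moving the bias to the other pass can be pictured as applying it in the depth comparison rather than while the shadow map is rendered. The C++ sketch below shows that comparison on the CPU. It assumes a constant bias, which is a simplification (polygon offset is slope-scaled, so the two are not equivalent), and `inShadow` is a hypothetical name, not something from the code discussed in this thread.

```cpp
#include <cassert>

// Returns true when the fragment counts as shadowed. The bias is added to the
// stored depth at comparison time instead of being baked into the shadow map,
// so the map itself can be rendered without polygon offset.
bool inShadow(float storedDepth, float fragmentDepth, float bias) {
    assert(bias >= 0.0f);
    return storedDepth + bias < fragmentDepth;
}
```

With a bias of 0 a surface that is only marginally behind its own stored depth tests as shadowed (the usual self-shadowing artefact), while a small positive bias lets it pass and still rejects fragments that are clearly behind an occluder.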

WyZ
04-06-2005, 01:44 PM
Hi,

I noticed the same problem about 6 months ago when I tried to implement depth peeling on my X800XT PE. The problem occurred only when I copied the depth buffer into a texture. The problem did not seem to exist on my 9600 pro.

I just put the project on the ice while waiting for ATI to accept my registration in their developer program so I can report bugs and get access to more information. I guess they forgot all about me. Well, I will try to register for the third time in 6 months.

I will be watching this thread closely to see if anyone comes up with a solution.

Cheers!

bobvodka
04-07-2005, 02:21 PM
Just for the record, the OGL driver with the Cat5.4's has the same problem still (and no FBO extension either, so thats another month to wait.. )

Java Cool Dude
04-07-2005, 10:28 PM
Speaking of Shadow mapping, does this demo run on ATi boards?
http://www.realityflux.com/abba/C++/Point%20Shadow%20Maps/PointShadowMaps.zip
http://www.realityflux.com/abba/C++/Point%20Shadow%20Maps/PointShadowMaps.jpg

Chuck0
04-07-2005, 11:17 PM
Originally posted by Java Cool Dude:
Speaking of Shadow mapping, does this demo run on ATi boards?
http://www.realityflux.com/abba/C++/Point%20Shadow%20Maps/PointShadowMaps.zip
http://www.realityflux.com/abba/C++/Point%20Shadow%20Maps/PointShadowMaps.jpg

yep it does (radeon 9700 cat 5.1)

Adrian
04-08-2005, 12:41 AM
Originally posted by Java Cool Dude:
Speaking of Shadow mapping, does this demo run on ATi boards?
http://www.realityflux.com/abba/C++/Point%20Shadow%20Maps/PointShadowMaps.zip
http://www.realityflux.com/abba/C++/Point%20Shadow%20Maps/PointShadowMaps.jpg

I get an '<X>Unable to create pbuffer' error in the log file.

I have a GF5900 Ultra 75.90.

Bryan Dudash
04-12-2005, 05:07 PM
Originally posted by Adrian:

Originally posted by Java Cool Dude:
Speaking of Shadow mapping, does this demo run on ATi boards?
http://www.realityflux.com/abba/C++/Point%20Shadow%20Maps/PointShadowMaps.zip
http://www.realityflux.com/abba/C++/Point%20Shadow%20Maps/PointShadowMaps.jpg

I get an '<X>Unable to create pbuffer' error in the log file.

I have a GF5900 Ultra 75.90.

Just figured I'd pass along a little info that we found when running this app. On the FX5900, we get the same error, but this is due to an app bug. For some reason, the program is passing an invalid pixelFormat value to wglCreatePbufferARB() when run on an nv35. The value is very large, much larger than the index of the last pixel format in the list. And so wglCreatePbufferARB correctly fails to create the pbuffer with this invalid format.

Thanks!

-B

PsychoLns
04-13-2005, 12:08 PM
Originally posted by bobvodka:
Just for the record, the OGL driver with the Cat5.4's has the same problem still (and no FBO extension either, so thats another month to wait.. )

Ehm.. 5.4 doesn't expose GL_EXT_framebuffer_object but all the entry points seem to be there - has anyone tried using them? (hacking some fbo example to not test for the extension..)
And btw.. the 5.5 beta is running those 6 fps too

execom_rt
04-13-2005, 01:00 PM
Yes, this is something I've noticed in OpenGL Ext. Viewer; it seems they have already started to implement GL_EXT_framebuffer_object.

Maybe there is a registry tweak to enable it...

bobvodka
04-13-2005, 01:16 PM
Well, I've just tried it out by forcing GLee to load the extension, it links fine however when validating a framebuffer it returns 0, so I think thats a clue that it isnt ready yet ;)

If you skip the validation part you just get a nice black texture (and probably numerous glError()s but I dont test for them) and I'm pretty sure I havent ballsed up (I used the code from NV's GDC05 talk as a base to work from).

So, on the upside I do have a program to test FBOs in future, on the downside its not even slightly working, which based on what I was told the other day about having to 'wait a couple of months' I'm not overly shocked about :)

Java Cool Dude
04-13-2005, 03:39 PM
Originally posted by Bryan Dudash:
I have a GF5900 Ultra 75.90.
Just figured I'd pass along a little info that we found when running this app. On the FX5900, we get the same error, but this is due to an app bug. For some reason, the program is passing an invalid pixelFormat value to wglCreatePbufferARB() when run on an nv35. The value is very large, much larger than the index of the last pixel format in the list. And so wglCreatePbufferARB correctly fails to create the pbuffer with this invalid format.

This is the code that I wrote that handles the creation of a pBuffer, please point out any mistakes that I've unintentionally done.

code:

pixelBuffer.initialize(TCMSize,TCMSize, 24, 0,
PBUFFER_FORMAT_FLOAT |
PBUFFER_TEXTURE_CUBE_MAP |
PBUFFER_RENDER_TO_TEXTURE);

bool GLPBuffer::initialize(int newWidth, int newHeight,
int newDepthBits, int newStencilBits,
int pBufferFormat)
{
//Check for pbuffer support
if(!GLEE_WGL_ARB_extensions_string

Java Cool Dude
04-13-2005, 03:41 PM
^ Unregistred huh?

code:

bool GLPBuffer::initialize(int newWidth, int newHeight,
int newDepthBits, int newStencilBits,
int pBufferFormat)
{
//Check for pbuffer support
if(!GLEE_WGL_ARB_extensions_string