Cube maps for shadows on NV40

Hi,

I am trying to implement shadow mapping for point lights. What I was hoping to do is use a cube map so I can do just one lookup in the shader (as opposed to six lookups).
I am writing my code targeting NV40 class hardware.

So here are my questions.

  1. Does NV40 support DEPTH_COMPONENT CUBE_MAP targets?
  2. I did code up a path where I render to a (32,0,0,0)-bit pbuffer, and I would like to copy it to a (32,0,0,0)-bit cube map. Using the LUMINANCE_FLOAT_ATI format, this is really slow; using the R32_FLOAT_NV format, it appears not to work at all.

So I am wondering what I should do to get this to work at a decent speed, or should I just have 6 maps (I really hope not)?

--X

You can try to render to standard RGBA. You can find details in the Humus demo “Shadows that rocks”: http://www.humus.ca/index.php?page=3D&ID=28.

yooyo

No card as of yet supports shadow cubemaps. But there are things you can do to use shadow mapping with point lights. As yooyo said, you can use a standard RGB cubemap and do the shadow calculations yourself in a shader (there's a rough sketch of the idea at the end of this post). This won't let you take advantage of the GPUs' hardware shadow-map features, since you're doing everything yourself, and you still have to render 6 times to fill the cubemap.

A plus about doing the 6-shadow-map method is that, for one, you don't have to always render 6 shadow maps (it depends on what is currently in view), and the other thing is you can vary the size of each shadow map according to the scene. For example, if one area of the scene can get by with a 512x512 shadow map but another part of the scene needs a 2048x2048, you don't have to use six 2048s, taking up a lot of memory and slowing you down for no good reason. From what I understand, this is what John Carmack is doing in his next engine.

And of course there is always the dual-paraboloid method, which only requires you to render 2 times for each light. But this method is not all that great because it requires you to highly tessellate everything, even surfaces where a high tessellation doesn't make sense, like a wall or floor. It's an interesting method to implement and play with, but I don't see it really being used seriously.
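Very roughly, that do-it-yourself comparison could look something like this. It's untested, and the uniform names, the normalized-distance convention, and the bias value are all assumptions for illustration, not code from any of the demos mentioned:

```c
/* Untested sketch: GLSL fragment shader for a manual cube shadow test.
   Assumes each cube face stores the light-to-occluder distance divided
   by the light's far range. All names are made up for illustration. */
const char *cubeShadowFS =
    "uniform samplerCube shadowCube;                                \n"
    "uniform vec3  lightPos;    /* light position, world space */   \n"
    "uniform float lightRange;  /* far distance of the light */     \n"
    "varying vec3  worldPos;    /* passed from the vertex shader */ \n"
    "void main()                                                    \n"
    "{                                                              \n"
    "    vec3  L        = worldPos - lightPos;                      \n"
    "    float fragDist = length(L) / lightRange;                   \n"
    "    /* one lookup: the direction vector selects the face */    \n"
    "    float occDist  = textureCube(shadowCube, L).r;             \n"
    "    /* small bias against shadow acne; tune per scene */       \n"
    "    float lit      = (fragDist - 0.002 < occDist) ? 1.0 : 0.0; \n"
    "    gl_FragColor   = vec4(vec3(lit), 1.0);                     \n"
    "}                                                              \n";
```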

-SirKnight

Thanks for the comments.

So one option I'd like to have is to use a 16-bit float cube map where each face is only one channel (say LUMINANCE_FLOAT16_ATI) that would be recomputed at runtime by drawing to a pbuffer and using a copy-to-texture. It would appear that I cannot get this working at a decent speed on NV40 (i.e. it goes from 30fps to about 1fps when the copy is turned on).
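For reference, the copy path I'm testing is basically the sketch below (pbuffer creation and the per-face rendering are omitted, 512 is just an example size, and the function names are my own):

```c
#include <GL/gl.h>
#include <GL/glext.h>   /* for the cube map targets and ATI_texture_float enums */

static const GLsizei kFaceSize = 512;   /* example face size */

/* Allocate a one-channel fp16 cube map (LUMINANCE_FLOAT16_ATI). */
GLuint createDistanceCubeMap(void)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_CUBE_MAP, tex);
    for (int face = 0; face < 6; ++face)
        glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, 0,
                     GL_LUMINANCE_FLOAT16_ATI, kFaceSize, kFaceSize, 0,
                     GL_LUMINANCE, GL_FLOAT, NULL);
    return tex;
}

/* Called once per face, with the pbuffer current and the face already
   rendered into it; assumes the texture is shared with the pbuffer's
   context (wglShareLists). */
void copyFaceFromPbuffer(GLuint tex, int face)
{
    glBindTexture(GL_TEXTURE_CUBE_MAP, tex);
    glCopyTexSubImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, 0,
                        0, 0, 0, 0, kFaceSize, kFaceSize);  /* the slow part */
}
```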

Ideas on this anybody?

Thanks!!!

--X

So one option I'd like to have is to use a 16-bit float cube map where each face is only one channel (say LUMINANCE_FLOAT16_ATI) that would be recomputed at runtime by drawing to a pbuffer and using a copy-to-texture. It would appear that I cannot get this working at a decent speed on NV40 (i.e. it goes from 30fps to about 1fps when the copy is turned on).

Ideas on this anybody?
This is a fine idea, but until EXT_FBO starts getting stable implementations, you're going to have to accept some performance problems. However, a guess as to the cause of the performance drop would be that perhaps the NV40 doesn't really support luminance float 16 textures (which would suck, btw), and has to emulate them with RGBA versions. You can always try an RGB F16 texture, where you render the same output to all three color channels.
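i.e., allocate the faces with a multi-channel fp16 internal format and write the same value to every channel in the shader. Something like the following (untested; GL_RGBA_FLOAT16_ATI is the ATI_texture_float enum, RGB fp16 would work the same way, and the function name is just for illustration):

```c
#include <GL/gl.h>
#include <GL/glext.h>

/* Same cube map as before, but with four-channel fp16 faces instead of
   luminance fp16, in case the single-channel format is being emulated.
   Assumes the cube map texture is already bound. */
void allocateRgbaF16CubeFaces(GLsizei size)
{
    for (int face = 0; face < 6; ++face)
        glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, 0,
                     GL_RGBA_FLOAT16_ATI, size, size, 0,
                     GL_RGBA, GL_FLOAT, NULL);
    /* in the shader, output the distance to every channel,
       e.g. gl_FragColor = vec4(dist);  and read only .r on lookup */
}
```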

I'm going to assume that LUMINANCE_FLOAT16_ATI is what nvidia is calling the R16F format in their GPU Programming Guide. If that is true (I'm pretty sure it is), then this format is not supported, so what Korval said is true: it's being emulated. However, R32F is supported, so try that. I'm going to guess this would be LUMINANCE_FLOAT32_ATI; I haven't looked it up, which is why that's a guess. :wink:
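If that guess is right, it would just mean swapping the internalformat in the allocation, something like this (again, the enum is my guess at the ATI_texture_float name, and I haven't checked whether the driver keeps it native):

```c
#include <GL/gl.h>
#include <GL/glext.h>

/* One cube face allocated as a single-channel 32-bit float, which is
   what R32F should map to if the enum guess is right.
   Assumes the cube map texture is already bound. */
void allocateL32FCubeFace(GLenum faceTarget, GLsizei size)
{
    glTexImage2D(faceTarget, 0, GL_LUMINANCE_FLOAT32_ATI,
                 size, size, 0, GL_LUMINANCE, GL_FLOAT, NULL);
}
```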

I don’t have a direct link but go to nvidia’s dev site and look for the GPU Programming Guide. It tells you everything about supported texture formats among a ton of other stuff you should know about.

-SirKnight

Here is a doc from the nvidia dev page that could be of interest: http://developer.nvidia.com/object/nv_ogl_texture_formats.html

According to this paper, LUMINANCE_FLOAT16_ATI is substituted with LUMINANCE_ALPHA_FLOAT16_ATI.

yooyo

A plus about doing the 6-shadow-map method is that, for one, you don't have to always render 6 shadow maps (it depends on what is currently in view)
Yeah, it's what I do, simple to implement.

and the other thing is you can vary the size of each shadow map according to the scene. For example, if one area of the scene can get by with a 512x512 shadow map but another part of the scene needs a 2048x2048, you don't have to use six 2048s, taking up a lot of memory and slowing you down for no good reason
I can only see this working well with static lights, but still, every little bit helps.

I can only see this working well with static lights, but still, every little bit helps.

Yeah, probably so. I haven't tried doing this; it's just something I have thought a little about. For dynamic lights I wouldn't imagine re-evaluating 6 shadow-map sizes would be a good thing to do every frame. Perhaps every x number of frames or something. I don't know; this is something that needs to be done and played with to see exactly what happens. Although for static lights this would definitely be a good thing.

-SirKnight

According to this paper, LUMINANCE_FLOAT16_ATI is substituted with LUMINANCE_ALPHA_FLOAT16_ATI.
Sounds like an alignment issue or something: a single 16-bit float texture is something the hardware can't handle on its own. However, for a depth texture, you might consider having a full 32-bit floating-point depth range anyway.