Fast *Dynamic* Cubic Environment Map

What is the most efficient way to do fast dynamic cubic environment maps using GL extensions (particularly using Nvidia GF2 & 3 hardware)?

I am currently simply rendering the scene 6 times into the 6 cube texture maps, using the GL_TEXTURE_CUBE_MAP_ARB family of extensions. This is obviously very slow, but the most straightforward way to do it.
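
Roughly, the per-frame update I'm doing looks like this (a simplified sketch; the 256x256 face size, the orientation table and RenderScene() are just stand-ins for my actual code, and it assumes glu.h plus glext.h for the ARB cube map enums):

// Brute-force cube map update: render the scene once per face into a small
// viewport, then copy the framebuffer into that face of the cube map.
static const GLenum faces[6] = {
	GL_TEXTURE_CUBE_MAP_POSITIVE_X_ARB, GL_TEXTURE_CUBE_MAP_NEGATIVE_X_ARB,
	GL_TEXTURE_CUBE_MAP_POSITIVE_Y_ARB, GL_TEXTURE_CUBE_MAP_NEGATIVE_Y_ARB,
	GL_TEXTURE_CUBE_MAP_POSITIVE_Z_ARB, GL_TEXTURE_CUBE_MAP_NEGATIVE_Z_ARB };

// Per-face view directions and up vectors (the usual cube map orientation).
static const float dir[6][3] = { { 1,0,0},{-1,0,0},{0, 1,0},{0,-1,0},{0,0, 1},{0,0,-1} };
static const float up [6][3] = { {0,-1,0},{0,-1,0},{0,0, 1},{0,0,-1},{0,-1,0},{0,-1,0} };

extern void RenderScene();   // same draw path as the main view

void UpdateCubeMap(GLuint cubeTex, const float eye[3])
{
	glViewport(0, 0, 256, 256);
	glMatrixMode(GL_PROJECTION);
	glLoadIdentity();
	gluPerspective(90.0, 1.0, 0.1, 1000.0);   // 90 degree FOV, square aspect
	glMatrixMode(GL_MODELVIEW);

	for(int f = 0; f < 6; f++)
	{
		glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
		glLoadIdentity();
		gluLookAt(eye[0], eye[1], eye[2],
		          eye[0]+dir[f][0], eye[1]+dir[f][1], eye[2]+dir[f][2],
		          up[f][0], up[f][1], up[f][2]);

		RenderScene();

		glBindTexture(GL_TEXTURE_CUBE_MAP_ARB, cubeTex);
		glCopyTexSubImage2D(faces[f], 0, 0, 0, 0, 0, 256, 256);
	}
}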

I had been thinking of rendering the static geometry only once and then rendering just the dynamic geometry into the cube maps every frame, though I really don’t think that alone will get me the speed I need.

It would seem that there would be a way to submit the geometry only once to the T&L HW (Nvidia GF2/3) and have the T&L do the 6 cube maps plus the 1 final displayed render of the geometry with some obscure but simple GL extensions switches. Though it’s not quite obvious (I can’t find any documentation) how to do this short of writing my own custom vertex shader (which I am not going to do, though if someone has example code I would be happy to beg/borrow/steal it ;-)…

BTW: Just in case it matters for static geometry I’m using display lists and for dynamic geometry I’m using glInterleavedArrays/glDrawElements w/GL_TRIANGLE_STRIP.

“It would seem that there would be a way to submit the geometry only once to the T&L HW (Nvidia GF2/3) and have the T&L do the 6 cube maps plus the 1 final displayed render of the geometry with some obscure but simple GL extensions switches.”

<sarcasm> Of course there is. Why wouldn’t there be an extension to render the “scene”, even though OpenGL has no concept of a “scene”, 6 times and then render the entire “scene” again, all from different perspectives. </sarcasm>

Even if such an extension existed, it wouldn’t have much better performance than doing it manually. You’d only be saving the bandwidth of sending the vertex data 7 times.

A vertex program can’t do it either, since it only operates at the per-vertex level (not the per-1000 vertex level that you need). Nor does the language allow for looping.

Dynamic cube mapping has its drawbacks. It’s just not going to be fast, since it requires rendering at least a portion of the scene 6 times.

It would be nice if you could simply render onto a cubemap buffer (a pbuffer with cube map faces), so everything could be rendered in one pass, no per-face clipping needed, nothing…

tip for you:

sort your data into 6 spaces, one for each face, and render only that data when drawing the corresponding face…
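
Something like this, for instance (just a sketch of the idea; Object and its bounding-sphere fields are placeholders for your own scene structures, and the cube map is assumed to be centred at 'center'). Each face sees a 90-degree frustum, so a bounding sphere can be rejected for a face if it lies entirely behind one of that frustum's four side planes:

#include <vector>

struct Object { float pos[3]; float radius; };

// True if a sphere (position relative to the cube centre) can be seen by the
// face looking along +/- 'axis'. The four side planes of a 90-degree frustum
// looking down +X are x+y>=0, x-y>=0, x+z>=0, x-z>=0 (normals scaled by 1/sqrt(2)).
static bool SphereTouchesFace(const float p[3], float r, int axis, float sign)
{
	float a = sign * p[axis];
	float b = p[(axis + 1) % 3];
	float c = p[(axis + 2) % 3];
	const float k = 0.70710678f;   // 1/sqrt(2)
	return (a + b) * k >= -r && (a - b) * k >= -r &&
	       (a + c) * k >= -r && (a - c) * k >= -r;
}

// faceList[0..5] receive the objects potentially visible from each face,
// in the order +X, -X, +Y, -Y, +Z, -Z.
void BucketByFace(const Object* objs, int count, const float center[3],
                  std::vector<const Object*> faceList[6])
{
	for(int i = 0; i < count; i++)
	{
		float p[3] = { objs[i].pos[0] - center[0],
		               objs[i].pos[1] - center[1],
		               objs[i].pos[2] - center[2] };
		for(int f = 0; f < 6; f++)
			if(SphereTouchesFace(p, objs[i].radius, f / 2, (f % 2) ? -1.0f : 1.0f))
				faceList[f].push_back(&objs[i]);
	}
}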

micahp, there’s probably no really fast way to do dynamic cube mapping. The way I’d go about speeding it up (and mind you, I haven’t done it yet) would be to use really low-detail LODs and to eliminate altogether objects that don’t take up much screen space. Unless the user is really close to the object and the reflection is really sharp (i.e., the object is close to a mirror and doesn’t have other things blended over it to interfere with the reflection), the cube map can be pretty low-res and inaccurate and still look convincing. This means that you can render a lot less geometry than you do when rendering the scene for display. You may even drop some rendering effects or passes, like detail textures. This could result in a pretty nice speedup.

This is what I have been doing for my degree thesis. I have managed to increase the speed by up to 600% over normal techniques. The only problem is that it can use a lot of memory. Basically, render all the static objects for the cube maps, but save the colour and depth buffers using the buffer region extension, one for each face. Then you restore the buffers just before you render the dynamic objects, and then generate the cube maps. The quality of the reflection is actually better too, because you can render inter-reflections for the static parts of the scene. One drawback is that the cube mapped object can’t translate by a relatively large amount.
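
In rough outline, per face it looks something like this (just a sketch, written against WGL_ARB_buffer_region as one possible buffer region extension; the 256x256 face size and the SetCubeFaceCamera / RenderStaticScene / RenderDynamicObjects helpers are placeholders for your own code, and it assumes windows.h plus wglext.h/glext.h):

extern void SetCubeFaceCamera(int face);   // your per-face view/projection setup
extern void RenderStaticScene();
extern void RenderDynamicObjects();

HANDLE faceRegion[6];   // one saved colour+depth region per cube face

// Done once (or whenever the cube map centre moves too far): render the
// static geometry for each face and save the resulting colour and depth.
void BuildStaticFaceRegions(HDC hdc)
{
	for(int f = 0; f < 6; f++)
	{
		SetCubeFaceCamera(f);
		glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
		RenderStaticScene();

		faceRegion[f] = wglCreateBufferRegionARB(hdc, 0,
			WGL_BACK_COLOR_BUFFER_BIT_ARB | WGL_DEPTH_BUFFER_BIT_ARB);
		wglSaveBufferRegionARB(faceRegion[f], 0, 0, 256, 256);
	}
}

// Done every frame, per face: restore the saved static colour+depth,
// draw only the dynamic objects on top, then copy into the cube map face.
void UpdateCubeFace(int f, GLuint cubeTex)
{
	wglRestoreBufferRegionARB(faceRegion[f], 0, 0, 256, 256, 0, 0);
	SetCubeFaceCamera(f);
	RenderDynamicObjects();

	glBindTexture(GL_TEXTURE_CUBE_MAP_ARB, cubeTex);
	glCopyTexSubImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X_ARB + f, 0,
	                    0, 0, 0, 0, 256, 256);
}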

I think this method is better than using the pixel buffer because the buffer regions can be set to use main memory instead of video memory at almost no extra cost.

The technique gives the best performance increase when there is a high ratio of static to dynamic polygons in highly complex scenes. Lower ratios of static to dynamic polygons do not work anywhere near as well, but you should still expect at least a 160% increase in performance. My computer is only a Pentium 200 with the PCI version of the GeForce2 MX, so the higher bandwidth of an AGP port should increase the performance gain even further.

Certainly OpenGL has no explicit concept of a scene, but it does have a concept of a render primitive (tris, points, strips, fans, etc.). It would seem to make sense that somehow (using GL and/or custom shader code) one could submit a render primitive once with multiple rendering contexts/targets specified.

If this is just crazy talk, which it seems to be, then what is the best way, at the GL level (with extensions) or lower (shaders), to set up and organize my data? Note I am not talking about application-level culling and LOD techniques, but GL and lower level. VAR & fences? Vertex buffer locking/unlocking extensions? P-buffers? Stencil buffers?

BTW What I am trying to do is highly realistic water with reflection and refraction and a ton of dynamic objects on the water. If you take a look at the following excellent demo, this is what I am trying to achieve:
http://cgi3.tky.3web.ne.jp/~tkano/tlwater.shtml

First, do some gross frustum culling for each of the 7 different renders.

Second, yes, Vertex_Array_range and fences. glDrawRangeElements too. Triangle strips are good too. If you’re looking for performance, these are the extensions you are looking for.
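
For the dynamic meshes, a minimal NV_vertex_array_range + NV_fence setup looks roughly like this (a sketch only; the entry points are from the NV extension specs and assumed to have been fetched with wglGetProcAddress, the buffer size is illustrative, and the mesh structs are stand-ins for your own):

// Minimal stand-ins for the mesh structures used elsewhere in this thread.
struct MeshElem { int numVerts; unsigned int* vindices; };
struct Mesh     { float* verts; int numVerts; int numElements; MeshElem** meshElemArray; };

const GLsizei VAR_SIZE = 2 * 1024 * 1024;
void*  varMem = 0;
GLuint fence  = 0;

void InitVAR()
{
	// AGP memory: read frequency 0, write frequency 0, priority below 1.0.
	varMem = wglAllocateMemoryNV(VAR_SIZE, 0.0f, 0.0f, 0.75f);
	glVertexArrayRangeNV(VAR_SIZE, varMem);
	glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

	glGenFencesNV(1, &fence);
	glSetFenceNV(fence, GL_ALL_COMPLETED_NV);   // so the first finish below is valid
}

void DrawDynamicMesh(const Mesh& mesh)
{
	// Wait until the GPU has finished reading last frame's data before
	// overwriting the VAR memory with this frame's vertices.
	glFinishFenceNV(fence);
	memcpy(varMem, mesh.verts, mesh.numVerts * 8 * sizeof(float));   // T2F_N3F_V3F = 8 floats

	glInterleavedArrays(GL_T2F_N3F_V3F, 0, varMem);
	for(int i = 0; i < mesh.numElements; i++)
		glDrawRangeElements(GL_TRIANGLE_STRIP, 0, mesh.numVerts - 1,
			mesh.meshElemArray[i]->numVerts, GL_UNSIGNED_INT,
			mesh.meshElemArray[i]->vindices);

	// Mark the point after which the VAR memory may safely be rewritten.
	glSetFenceNV(fence, GL_ALL_COMPLETED_NV);
}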

Originally posted by Korval:
[b]First, do some gross frustum culling for each of the 7 different renders.

Second, yes, Vertex_Array_range and fences. glDrawRangeElements too. Triangle strips are good too. If you’re looking for performance, these are the extensions you are looking for.[/b]

I am certainly using these techniques. The question is how can I use these techniques more effectively. Here is my rendering pipeline:

foreach object to render:

	glInterleavedArrays(GL_T2F_N3F_V3F, 0, this->mesh.verts);
	glEnable(GL_TEXTURE_2D);
	glColor4f(1.0f, 1.0f, 1.0f, 1.0f);

	uint32 mat = 0;

	for(int i = 0; i < this->mesh.numElements; i++)
	{
		// Only rebind when the material's texture actually changes.
		if(mat != materialBlock[this->mesh.meshElemArray[i]->texIdx].ui32[0])
		{
			mat = materialBlock[this->mesh.meshElemArray[i]->texIdx].ui32[0];
			glBindTexture(GL_TEXTURE_2D, mat);
		}

		glDrawElements(GL_TRIANGLE_STRIP, this->mesh.meshElemArray[i]->numVerts,
			GL_UNSIGNED_INT, this->mesh.meshElemArray[i]->vindices);
	}

About as simple and reasonably fast as it gets for generic/vanilla gl (of course the static objects are compiled into a display list).

To get the cubemaps I go through this render cycle 6 additional times using GL_TEXTURE_CUBE_MAP_ARB and glCopyTexSubImage2D.

Now, ignoring culling, LOD, static vs. dynamic geometry, and any other application-level optimization: what, if anything, can be done to improve this GL code for the specific purpose of creating dynamic cube maps?

If, like in that demo, only the water is reflective, and not the objects, then I don’t think that you even need a full cube map. The water never reflects things below it, and you may even be able to use just one reflection image (depending on the possible view angles).

Originally posted by ET3D:
If, like in that demo, only the water is reflective, and not the objects, then I don’t think that you even need a full cube map. The water never reflects things below it, and you may even be able to use just one reflection image (depending on the possible view angles).

I’m not sure that I completely follow the logic that having the water (or any single object) as the only reflective surface means that I don’t need a full cube map.

Now I might be able to get away with only doing 5 dynamic reflection maps plus one static map for the refraction texture (everything that’s underwater), but this is not the precise problem I’m asking for help on.

If you want to take the demo I pointed out above into account with the problem I am having, then my question to this group would be: how would I implement the exact same effect using GL + Nvidia extensions? My application has roughly the same amount of geometry, but runs at 1/6th of the framerate trying to get the exact same effect. Is it because the demo uses D3D? I doubt it… And I doubt the guy who wrote the application is using any “fancy” culling techniques or assumptions about what is being or not being reflected/refracted… Though he may not be using cube maps at all…

I don’t think you will be able to get a good reflection effect for objects floating on water, because cube mapping does not work very well with objects that are relatively close to the cube mapped surface. Also, even if the water is rippled it will still be essentially flat, and therefore the cube mapped image will again not work very well. Cube maps generally work much better on surfaces with a higher degree of curvature.

I suggest you look at the waves demo written by nVidia. I think they do it by generating a single texture map and adjusting the texture coordinates depending on the water’s surface normals. I don’t know if custom shaders can be written to speed this up.

micahp, here’s an explanation of what I meant. It’s not “having a single object”, but “having an object that reflects in a limited angle range”, like the water. The cube map faces correspond to different angles of reflection. If some of these angles can never be seen, then there is no reason to generate that face of the map. For example, if a half sphere is standing on a table, it will not reflect the table itself, and there’s no need to draw the bottom face of the cube map. Depending on the shape of the reflector and where it is, you could possibly limit reflection to one face of the cube map. This might be a nice way to introduce some reflection effects into a scene without them costing a lot.

I haven’t really looked in depth about optimizing real time cube maps. The best thing you can do is reduce the size of the cube map renderings significantly. This speeds up the process lots.

As for PBuffers, I think they’re actually slower, as you have to make calls to change the gl context.

[PLUG]
real time dynamic cube map demo on www.nutty.org :)
[/PLUG]

Nutty

Actually, a half sphere on a table would require the whole cube map to get proper reflections. Think about looking at it from directly above. At the edges of the half sphere, the reflections will be straight down into the table.

Maybe a better example would be a mirror or piece of aluminum foil lying on the table.

j

Originally posted by Nutty:
[b]I haven’t really looked in depth about optimizing real time cube maps. The best thing you can do is reduce the size of the cube map renderings significantly. This speeds up the process lots.

As for PBuffers, I think they’re actually slower, as you have to make calls to change the gl context.

[PLUG]
real time dynamic cube map demo on www.nutty.org
[/PLUG]

Nutty[/b]

Actually, I based my original dynamic cube map code on your example! Excellent, keep it up!!!

To everyone who has contributed to my enquiries: thank you all very much.

I think I have an idea of how to get most of the effect I am looking for without dynamic cube maps. Rudi_P has a very good point:

“I don’t think you will be able to get a good reflection effect for objects floating on water, because cube mapping does not work very well with objects that are relatively close to the cube mapped surface. Also, even if the water is rippled it will still be essentially flat, and therefore the cube mapped image will again not work very well. Cube maps generally work much better on surfaces with a higher degree of curvature.”

So the idea is to use a traditional static cube map for the environment, and then use a mixture of dynamic and pregenerated static reflection maps and project them onto the water like shadow maps, with some clever distortion of the UVs. Plus a static refraction map for everything under the water. Good old traditional game programmer sleight of hand! :)
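
For the projection part, the setup I have in mind is the same trick shadow maps use: EYE_LINEAR texgen plus a texture matrix built from the camera that rendered the reflection map. A rough sketch (the bias matrix is the standard [-1,1] to [0,1] remap; reflCamProj / reflCamView are just names I’m using for the reflection camera’s matrices, and the ripple UV distortion would be layered on top of this):

void SetupProjectedReflection(GLuint reflTex,
                              const float reflCamProj[16],
                              const float reflCamView[16])
{
	// Maps clip space [-1,1] into texture space [0,1] (column-major).
	static const float bias[16] = {
		0.5f, 0.0f, 0.0f, 0.0f,
		0.0f, 0.5f, 0.0f, 0.0f,
		0.0f, 0.0f, 0.5f, 0.0f,
		0.5f, 0.5f, 0.5f, 1.0f };

	glBindTexture(GL_TEXTURE_2D, reflTex);
	glEnable(GL_TEXTURE_2D);

	// Texture matrix = bias * reflection projection * reflection view.
	glMatrixMode(GL_TEXTURE);
	glLoadMatrixf(bias);
	glMultMatrixf(reflCamProj);
	glMultMatrixf(reflCamView);
	glMatrixMode(GL_MODELVIEW);

	// Identity eye planes, specified while the modelview holds only the
	// main camera's view transform, generate world-space coordinates that
	// the texture matrix then projects into the reflection map.
	static const float sP[4] = {1,0,0,0}, tP[4] = {0,1,0,0},
	                   rP[4] = {0,0,1,0}, qP[4] = {0,0,0,1};
	glTexGeni(GL_S, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
	glTexGeni(GL_T, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
	glTexGeni(GL_R, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
	glTexGeni(GL_Q, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
	glTexGenfv(GL_S, GL_EYE_PLANE, sP);
	glTexGenfv(GL_T, GL_EYE_PLANE, tP);
	glTexGenfv(GL_R, GL_EYE_PLANE, rP);
	glTexGenfv(GL_Q, GL_EYE_PLANE, qP);
	glEnable(GL_TEXTURE_GEN_S);
	glEnable(GL_TEXTURE_GEN_T);
	glEnable(GL_TEXTURE_GEN_R);
	glEnable(GL_TEXTURE_GEN_Q);
}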

Originally posted by j:
Actually, a half sphere on a table would require the whole cube map to get proper reflections.

Oops, I guess I can always blame senility.

I know it’s a stupid question, but do you use the newest drivers? It’s simply because Nutty’s demo works smoothly here with more than enough frames with 12.00, but with 6.50… the latest officials… I get less than one fps (after disabling the automipmap).

Yes I always use the latest beta drivers. I think I developed that one with 11.01.

Very odd that it runs so slow with 6.odd drivers. Hardware cube mapping support has been in those drivers for ages.

My home PC is awaiting a new power supply… so I haven’t really had time to test anything out, or do any of those 3D texture tests people were on about in another thread.

Nutty

AHA! I know what it is: glCopyTexSubImage. I remember Cass announcing that it had been severely optimized by Matt in the beta Detonators. The 6.?? drivers will be using the old slow version.

Nutty