"Better" Generation of Cubemap Mipmaps

Here is a suggestion for after GL3 is finished…

Given that the hardware doesn’t support filtering around the seams for cubemaps, and that borders are not really an option either, using mipmaps on a cubemap has technical issues with no true “clean” solution.

glGenerateMipmapEXT(GL_TEXTURE_CUBE_MAP);

Generates mipmaps on all the faces, but given that there is no filtering around cubemap seams, dynamically generating cubemap mipmaps per frame is a mess to clean up (manually filtering all the edges).
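
A minimal sketch of that call in its per-frame context, assuming “cubemap” names a complete GL_TEXTURE_CUBE_MAP that was just rendered into:

/* Regenerate the mip chain for a dynamically rendered cubemap. */
glBindTexture(GL_TEXTURE_CUBE_MAP, cubemap);
glGenerateMipmapEXT(GL_TEXTURE_CUBE_MAP);
/* Each face is minified independently, so texels across the face
   seams are never averaged together -- hence the visible edges. */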

It would be nice to have GL support for generating cubemap mipmaps with filtered seams.

How’s Atom going?
I’m afraid I find the graphics way too abstract to appreciate the LOD and procedural techniques you’re touting as revolutionary. It looks a bit messy. Maybe if you used more distinct features it might reveal the sense of scale we’re supposed to be experiencing.

See PM…

:)

Back to topic … but what about GL auto generation of seamless mipmaps?

It seems like something the GL user ideally shouldn’t have to do (doing it manually is very inefficient). The best way I could think of doing this (and did) was to render all the new seams for all the mipmap levels into an FBO using a special shader, and then issue a huge number of glCopyTexSubImage2D() calls to copy the seams back into all the cubemap mipmap levels (from the FBO).
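
For what it’s worth, a hedged sketch of that fixup loop; drawSeamStripsForLevel(), seam_fbo, and srcX()/srcY() are hypothetical stand-ins for the real bookkeeping, and only one edge per face is shown:

/* Render the averaged seam strips for each mip level into a scratch
   FBO with a special blending shader, then copy each strip back into
   the matching cubemap face level. A real version handles all four
   edges per face plus the corners. */
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, seam_fbo);
glBindTexture(GL_TEXTURE_CUBE_MAP, cubemap);
for (int level = 1; level <= max_level; ++level) {
    drawSeamStripsForLevel(cubemap, level);  /* special seam shader */
    for (int face = 0; face < 6; ++face) {
        glCopyTexSubImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + face,
                            level,
                            0, 0,                    /* dest offset  */
                            srcX(face), srcY(face),  /* FBO position */
                            size >> level, 1);       /* one edge row */
    }
}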

I know, about 0% of OpenGL programmers are doing runtime-generated mipmapped cubemaps, so at least this idea will be in the suggestion archives…

Hardware mipmap generation already exists for 2D textures; it would make sense to do it correctly for cubemaps too. Especially when dynamic cubemaps are generated, e.g. to fake dynamic global illumination.

A hardware vendor extension would be great.

But … how’s Atom going?
I would like a PM too … Or you can post it here, no problem.

R600 and G80 support filtering across cubemap face edges (since it’s a requirement for DX10). For earlier hardware, CubeMapGen is an excellent tool to filter the cubemap mips for you (it won’t solve it for cubemap render targets though).

Are you saying that doing a bilinear lookup on the edge/corner of a cubemap in Direct3D 10 filters between texels across the 2 (edge) or 3 (corner) adjacent faces?

If so, and if there is a way to enable this in OpenGL then please post it here!

I can see this being “emulated” with shader operations (R600 uses hidden shader ops anyway for a cubemap lookup, right?), but I was unaware of any direct hardware support.

Alright, since this is actually very OpenGL related…

Atom is going great, but development has slowed down since I started migrating code (full algorithms and parts of algorithms) from the CPU to the GPU (GPGPU stuff). While classic OpenGL stuff (pre SM4.0) has somewhat obvious fast paths, SM4.0 opens up multiple ways (perhaps too many) to solve basic problems, and now it isn’t always obvious what method is going to be the best on the hardware.

Multiple possible OpenGL GPGPU pipeline examples:

  1. VERTEX SHADER ONLY. The advantage here is that you can make use of the input assembler hardware (fetching from attribute streams) and also now do texture lookups. Turn off rasterization and simply use transform feedback to write into another VBO. However, output is limited to up to 16 FP32 outputs per vertex. There is great ability to expand data here without using the geometry shader: for example, turn a single logical object (GL_POINT) into multiple objects for another vertex shader pass (simply write up to 16 FP32 outputs and have the next pass fetch from this interleaved data to create more GL_POINTS). Also, VS VBO output can be mapped to a texture buffer object for integer-indexed texture fetch in the next VS pass (instead of using the input assembler). See the setup sketch after this list.

  2. VERTEX + GEOMETRY SHADER ONLY. Same idea as in (1.), except now you can look up the VS results of adjacent vertices in the GS and write a variable amount of output data. From my experience, this path has not yielded good performance, perhaps from too much overhead in the GS pipe, or more likely from the per-vertex synchronization constraints that the GL pipe puts on the GPU thread scheduler.

  3. VERTEX + FRAGMENT SHADER. Obviously scatter with GL_POINTS, but now with many output options due to texture arrays and MRTs. The advantage of texture arrays is that you can attach layers of the texture array to different MRTs, and later, in another VS or FS pass, use the “r” coordinate to dynamically select a layer of the texture array. Another method here is stencil routing, where multiple pixels, MRTs, or even separate color channels in the pixels (masking color channel writes on multiple passes) are grouped into one logical “bin” with multiple “sub-bins”.
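
A minimal setup sketch for path (1.), using the EXT_transform_feedback entry points (the NV extension on G80 used slightly different ones); “prog”, “src_vbo”, “dst_vbo”, and the single “result” varying are assumptions:

static const char *varyings[] = { "result" };
glTransformFeedbackVaryingsEXT(prog, 1, varyings,
                               GL_INTERLEAVED_ATTRIBS_EXT);
glLinkProgram(prog);  /* feedback varyings take effect at link time */

glUseProgram(prog);
glEnable(GL_RASTERIZER_DISCARD_EXT);  /* VS only, no rasterization */
glBindBufferBaseEXT(GL_TRANSFORM_FEEDBACK_BUFFER_EXT, 0, dst_vbo);
glBindBuffer(GL_ARRAY_BUFFER, src_vbo);  /* input assembler fetch */
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(0);
glBeginTransformFeedbackEXT(GL_POINTS);
glDrawArrays(GL_POINTS, 0, num_elements);
glEndTransformFeedbackEXT();
glDisable(GL_RASTERIZER_DISCARD_EXT);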

Anyway,

I’ve been doing a lot of GPGPU programming with method (1.), VS + transform feedback only. Even with many passes, it is proving to be much faster than using the geometry shader. I haven’t really seen many other people writing about this method for GPGPU or rasterization stuff, but it seems (and has been for me) tremendously useful…
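
The texture buffer object trick mentioned in (1.) is just a rebinding of the feedback VBO; a sketch with EXT_texture_buffer_object, where “tbo_tex” is a hypothetical texture name:

/* Expose the feedback results to the next VS pass as a samplerBuffer
   instead of re-fetching them through the input assembler. */
glBindTexture(GL_TEXTURE_BUFFER_EXT, tbo_tex);
glTexBufferEXT(GL_TEXTURE_BUFFER_EXT, GL_RGBA32F_ARB, dst_vbo);
/* In GLSL (EXT_gpu_shader4): texelFetchBuffer(buf, int(index)). */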

For a project that describes itself as such:

Atom started with the idea of going back to PC gaming’s roots (low-risk investment, experimenting with technology, fun timeless gameplay, taking a wild idea from concept to market), while taking advantage of the power of modern hardware. Atom in its final state will be a massively multiplayer online 3D first-person-perspective game set in an atomic or microscopic cell-like world where the players are small micro-organisms battling in a huge living host world.

You seem to be spending a lot of time dealing with graphics, hardware, and rendering performance issues, rather than, you know, the game itself. Which, as an MMO, is going to require lots of network coding, server-to-client stuff, etc. Rendering is going to be maybe 2-5% of your coding time.

SM4.0 opens up multiple ways (perhaps too many) to solve basic problems, and now it isn’t always obvious what method is going to be the best on the hardware.

That’s why I stopped caring about the minutiae. Maybe there’s method X that’s 10-15% faster than mine, but until I have a profiling run that says I need that 10-15%, I don’t see the point in bothering right now.

If you are thinking of rendering (and 10-15% improvements) here then you have completely missed the point of my reply.

The “GPGPU” paths are just that: migration of CPU-limited general-purpose computation to the GPU using efficient methods available through OpenGL 2.1.

Well I understand Korval’s doubts about porting MMORPG server code to GPU …

I guess it should be automatic if it’s supported on the R600 and G80. What’s it doing on your GPU?

G8x chipset with Linux GL drivers,

LEFT SIDE is from a middle level of a mipmapped cubemap. Note the sharp seams between the faces of the cubemap (this is a fisheye projection, BTW). This is also the result of the driver’s glGenerateMipmapEXT() support for cubemaps.

RIGHT SIDE shows the results of averaging between the pixels on the seams, which gives you an idea of what mip level is being used.

Now if the hardware were filtering across the faces, you would see something completely different from both the left and right images … something that actually looked correct!

The right side is just a band-aid fixup for the seams. ATI’s CubeMapGen does a better job and feathers the blend more…

One other option would be to use a 1-pixel border on the faces of the cubemap at each mipmap level, then copy pixels from the adjacent face seams into the border. Of course, you would have to manually generate the mipmaps. The hardware would then filter the seams correctly even though it only takes one face into consideration when filtering. However, the NVidia driver doesn’t support attaching a cubemap face with a border to an FBO, so you cannot do this dynamically (which is what I’m using).
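
A sketch of what the upload for one bordered face and mip level looks like; “pixels” is assumed to already hold the face data with the adjacent faces’ edge texels copied into the 1-pixel border:

/* border = 1 widens the image to (size + 2) x (size + 2), so the
   hardware's single-face filtering still averages correct values
   at the edges. */
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X, level, GL_RGBA8,
             size + 2, size + 2, 1 /* border */,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);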

The ideal solution, in the case where the hardware doesn’t support filtering across the faces, would be the ability to attach a cubemap face with a border to an FBO, and have glGenerateMipmapEXT() correctly generate the mipmaps and correct for the seams (fill in the border pixels) at the same time.
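
In other words, something like the following, which current drivers reject when the face has a border (so this is purely the proposed usage, not working code):

glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                          GL_TEXTURE_CUBE_MAP_POSITIVE_X + face,
                          bordered_cubemap, 0);
/* ...render the face... */
glGenerateMipmapEXT(GL_TEXTURE_CUBE_MAP);  /* would also fill borders */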

You could switch to nearest filtering and do your own interpolation in the shader, of course, but that would be much slower.

Yes. However, the corner case looked outright broken on G80 when I tested this a while ago (drivers may have fixed it since then, and in typical smooth cubemaps you probably wouldn’t notice anyway).

I haven’t tested, but it ought to just work.

There’s direct hardware support for this; it would be quite a bit too much work to emulate it in the shader.
Yes, the R600 has shuffled some work over to the ALU units for cubemap lookups and projected textures, so a cubemap lookup will be a couple of ALU instructions too.

Unfortunately that would be a violation of the spec, even though it’s the desired behaviour in the vast majority of cases. Somewhat similar to point sprite clipping.

Unfortunately that would be a violation of the spec, even though it’s the desired behaviour in the vast majority of cases. Somewhat similar to point sprite clipping.

I have to agree. What if you are using the cube map for non-environment-based rendering, e.g. some form of math lookup, etc.?

You would not want one platform blending these values while others don’t.

Hmm. Good point. I just tested and indeed, it’s not filtering across faces. I suppose there will need to be an extension for this functionality.

With texture borders, the filtering should work correctly. (The FOV at rendering has to be a little bit more than 90 degrees.)
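
A quick sketch of that FOV adjustment, assuming an N-texel face with a 1-texel border on each side: for a 90-degree face, the half-width N/2 maps to tan(45°) = 1, so covering one extra texel per side needs a half-angle of atan((N/2 + 1)/(N/2)).

#include <math.h>
/* N = face size in texels, not counting the border. */
double borderFovDegrees(int N)
{
    return 2.0 * atan((N / 2.0 + 1.0) / (N / 2.0)) * 180.0 / M_PI;
}
/* e.g. N = 256 gives roughly 90.45 degrees instead of 90. */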

A texture border would fix the problem, except that, at least with the NVidia drivers, you cannot attach a cubemap face which has a border to a framebuffer object to render into.

BTW, if someone did manage to get this working please post how. Perhaps this will get fixed on newer drivers…