Ptex extension

Please develop an extension to work with Ptex textures.
We really need it for CAD, tools, etc.

Ptex is a library developed by Disney that maps a separate texture to each face of the model, giving awesome texture detail. The main problem is that it is all done in software. We need a way to load Ptex textures efficiently into GPU VRAM and to perform the proper filtering across faces.

Thanks.

And while they’re at it, give us glEnable(GL_REYES). Actually, I take that back: just give us GL_ARB_Photorealistic_Renderman.

If you’re going to ask for completely unrealistic things, why go for the little stuff?

Did you even look up what Ptex is? Did you read the academic paper it comes from? Your response is beyond rude.

At any rate, looking over the Ptex paper (casually), I see the following:

  • It looks quite feasible to write an OpenCL and/or CUDA renderer that does Ptex. As a side comment, there are (google for them) CUDA/OpenCL ray-casting renderers that get something like 60 fps on GeForce 2xx class hardware; and I have to admit that quite likely most renderfarms nowadays already use OpenCL/CUDA to make bits faster.
  • Looking over the paper, it could also be feasible to implement Ptex using GL4 and tessellation. The sticky issue of the edges makes me nervous about saying “it is guaranteed to be possible”, but I see a potential path… plenty of details missing, though. I am not saying subpixel quads, but the idea has potential. The method itself looks quite feasible to hardware-accelerate, though saying “draw quads in place of triangles” might be a touch heretical to some :smiley: . Custom filtering is encouraged in some ways in GL4 nowadays, since there is textureGather, textureLod, etc.

Mind, we are not going for ultra-ultra tessellated models in interactive bits, but I can see it as feasible to do ptex: it parallelizes well, it uses local memory access. Additionally the academic paper pointed out that using ptex reduced the I/O and CPU load on their renderfarm. That alone says something too.

Isn’t Ptex just a form of texture streaming? As I understood it, it circumvents complex UV parameterization by simply assigning each quad (patch?) its own texture space. All per-patch textures together are somehow stored in a gigantic out-of-core representation.

Additionally the academic paper pointed out that using ptex reduced the I/O and CPU load on their renderfarm. That alone says something too.

I think this is due to the fact that they only load what they really need. Patches seem to offer a fine-enough granularity to get along with a fairly small working set in memory.

I agree with Alfonse: this technique has no place in GL as-is. But maybe we could wish for better support for streaming textures, like incomplete mipmap pyramids or some kind of feedback about which parts of a texture were referenced by texturing in the last frame…

“Incomplete mipmap pyramids”? You do know that you can cut off the mipmap pyramid at some level, don’t you?

Just use glTexParameteri with GL_TEXTURE_BASE_LEVEL and GL_TEXTURE_MAX_LEVEL.
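
For reference, that is just this (assuming a complete mip chain already uploaded to a bound 2D texture; the level numbers here are arbitrary):

    #include <GL/gl.h>

    /* Restrict sampling to levels 2..5 of the currently bound 2D texture.
     * Whether the driver keeps or drops storage for the other levels is
     * implementation-defined. */
    void clamp_mip_range(void)
    {
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 2);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL,  5);
    }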

Other than that, I do agree that some kind of help from the GL to better be able to implement streaming would be nice.

Jan.

“Incomplete mipmap pyramids”? You do know that you can cut off the mipmap pyramid at some level, don’t you?
Just use glTexParameteri with GL_TEXTURE_BASE_LEVEL and GL_TEXTURE_MAX_LEVEL.

Yep, but GL_TEXTURE_BASE_LEVEL gives you no guarantees on memory usage. How do you know that GL won’t allocate the memory for the whole pyramid once you do the first glTexImage() call on the texture object? Also, GL won’t free the memory for mip levels below GL_TEXTURE_BASE_LEVEL.
Texture streaming is not mainly about successively loading textures, but about keeping memory usage minimal.

Well, if you see it like that… OpenGL NEVER gave any reliable guarantees about memory usage. We can’t even query the amount of available memory of a GPU through GL, because we are not supposed to rely on such info to configure our algorithms.

So in that regard you either use GL_TEXTURE_BASE_LEVEL and GL_TEXTURE_MAX_LEVEL and hope for the IHVs to optimize that case (or ask them), or you do something where you have more control yourself (like mega-texturing). But yes, in the end you are correct: OpenGL does not provide us with many options to implement such things easily and with strong guarantees.

Jan.

http://www.disneyanimation.com/library/ptex/

I had seen that Ptex stuff a year ago.
I don’t know anything about streaming textures.
It seems to be about giving each polygon its own texture. I guess the surfaces are parametrized (Bézier surfaces, NURBS, or Catmull-Clark subdivision surfaces), so perhaps it is 1 texture per patch, not 1 texture per polygon.
This avoids texture-mapping work. It allows the artist to paint on the surface.

But 1 texture per polygon (or parametric patch) is just way too much.

There are other algorithms that do automatic mapping of a single texture onto an entire mesh model. Then the artist paints on the model.
Perhaps map 3 or a few textures onto that model.

On top of that, Ptex is patented:

http://www.wipo.int/pctdb/en/wo.jsp?WO=2009012464

That would cause trouble for the Mesa implementation:

http://www.phoronix.com/scan.php?page=news_item&px=NzU3Nw

But 1 texture per polygon (or parametric patch) is just way too much.

One can view Ptex as “one texture” per patch plus a way to handle the texture values at the edges. As for the streaming-texture thing, that is being done already (though we are still talking traditional UV texture mapping there).

But saying “1 texture per patch” sends one down the wrong road of thinking that a texture is just a 2D image, whereas Ptex just says that a texture is image data to be applied to a mesh. The fact that Ptex was done on the CPU, parallelized well, and reduced I/O (compared to standard UV texture mapping) suggests that thinking of a texture as only a mipmap pyramid of 2D images is not the best way to go… For hardware acceleration, what one wants/needs to make it feasible is good local memory access (Ptex has this) and high parallelism (again, Ptex has this). The current model of UV mapping is painful; ask most modelers and they hope for a tool (one colleague likes “BodyPaint”) which generates the UV mapping into one common 2D image, but it all smells like forcing people into UV mapping when there might be better ways to get texturing done. Automatic UV generation is hard and usually requires human intervention (at least that is what the adherents of Ptex will say).

What it is not is traditional texture mapping, since we are not talking simple UVs anymore, but I’d imagine the main criteria (parallelism, local memory access, and drawing 3D stuff) are there, and it does fit with doing 3D graphics of course. As of now, fitting it into the current “draw triangles” model of GL is not so simple, but my knee-jerk thoughts are still the same: I don’t view this as out in left field. It seems doable on future (or maybe even current) hardware. The trick is to take a different view of the image data applied to polygons, and perhaps to lessen the grip of the tyranny of triangles. Indeed, in a large number of places the thinking is quads, not triangles (and to render them every quad becomes 2 triangles, at which point the interpolation is potentially junked or one needs a geometry shader).

As for the patent stuff, plenty of bits of GL4 are patented: floating point render targets, tessellation and more (see https://www.khronos.org/files/ip-disclosures/opengl/ ).

Well, we really need a few things to accelerate Ptex:

  1. Image array support allowing different image sizes.
  2. A texture filtering method using face adjacency.
  3. Texture fetching based on a faceID instead of UVs (which will index the image array).
  4. Improved GPU<->CPU transfers (not a problem for Fusion APUs).

Just some thoughts:

  2. A texture filtering method using face adjacency.

This is likely the biggest hurdle. The trick is that when one makes the patch, one also sends along the edge data so that a fragment shader can do the right thing. This is the part I do not see how to do easily or efficiently with the current GL API. For the filtering part, GL4 has textureLod, textureGather, etc. to let one do custom filtering more efficiently, but the “across the edge” bit has me nervous.
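
To make that concrete, here is roughly what the per-face part of such custom filtering could look like (only a sketch; the GLSL is shown as a C string, every uniform/varying name is invented, and it deliberately does nothing yet about the neighbouring face across the edge):

    /* Hypothetical GLSL 4.00 fragment shader: manual bilinear of the red channel
     * via textureGather, with the 2x2 footprint clamped so it never leaves this
     * face's sub-rectangle in the atlas. */
    static const char *ptex_filter_fs =
        "#version 400 core\n"
        "uniform sampler2D u_atlas;     // atlas holding the per-face images\n"
        "uniform vec2      u_atlasSize; // atlas size in texels\n"
        "in  vec4 v_faceRect;           // this face's (x, y, w, h) in texels\n"
        "in  vec2 v_faceUV;             // face-local UV in [0,1]\n"
        "out vec4 o_color;\n"
        "void main() {\n"
        "    // keep the sample point half a texel inside the face rectangle\n"
        "    vec2 texel = v_faceRect.xy + clamp(v_faceUV * v_faceRect.zw,\n"
        "                                       vec2(0.5), v_faceRect.zw - 0.5);\n"
        "    vec4 g = textureGather(u_atlas, texel / u_atlasSize, 0);\n"
        "    vec2 f = fract(texel - 0.5);\n"
        "    float r = mix(mix(g.w, g.z, f.x), mix(g.x, g.y, f.x), f.y);\n"
        "    o_color = vec4(r);\n"
        "}\n";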

  1. Image array support allowing different image sizes.

This is the second major pain point in current GL. The way out of it is to tile the images into a common image and then do the filtering oneself. The pain comes in packing those tiles well into a big rectangle, etc.

  3. Texture fetching based on a faceID instead of UVs (which will index the image array).

In GL3 and GL4, primitives are given a primitive ID (gl_PrimitiveID), so that part of the battle is already won. The ugly part is using the faceID to get to the correct image data… The typical way it is done is a dependent texture lookup: faceID -> location in a large 2D texture -> compute UVs into that sub-rectangle.
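
Roughly, that dependent lookup could look like this (again just a sketch with invented names; gl_PrimitiveID stands in for the faceID, and the per-face sub-rectangles sit in a buffer texture):

    /* Hypothetical GLSL 4.00 fragment shader: gl_PrimitiveID selects this face's
     * sub-rectangle, which then remaps the face-local UV into the big atlas. */
    static const char *ptex_lookup_fs =
        "#version 400 core\n"
        "uniform samplerBuffer u_faceRects;  // per-face (x, y, w, h) in texels\n"
        "uniform sampler2D     u_atlas;      // atlas holding every per-face image\n"
        "uniform vec2          u_atlasSize;  // atlas size in texels\n"
        "in  vec2 v_faceUV;                  // face-local UV in [0,1]\n"
        "out vec4 o_color;\n"
        "void main() {\n"
        "    vec4 rect = texelFetch(u_faceRects, gl_PrimitiveID);\n"
        "    vec2 uv   = (rect.xy + v_faceUV * rect.zw) / u_atlasSize;\n"
        "    o_color   = texture(u_atlas, uv);\n"
        "}\n";

Plain texture() here still filters across the sub-rectangle edge, which is exactly the across-the-edge problem above.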

  4. Improved GPU<->CPU transfers (not a problem for Fusion APUs).

This area is the core pain in doing texture streaming. However, one can find working examples of texture streaming for not so new hardware already.

With this in mind, it appears to me that the hardest, nastiest part of getting Ptex to work on today’s hardware is the filtering at edges, and that comes down to somehow transmitting the edge data and seeing it correctly within a fragment shader.

No, but what was the peak CPU mem B/W vs. GPU mem B/W ratio again? :frowning: On an integrated GPU, you pay the piper repeatedly and ceaselessly, whereas with an add-on GPU you pay once up-front (the cost of which can be amortized over multiple frames) and then you’re done. …and then there’s the difference in shader core count. Bottom line, when talking perf, Fusion/Sandy Bridge isn’t a good example to hold up. But low power? Sure.

Also, with today’s mechanisms, streaming of texture data isn’t hard. What’s really ugly is the huge expense of dynamically allocating/reallocating GPU memory to upload that texture data to. It’d be hugely beneficial for streaming purposes to speed that up so it’s practical.

Driver guys will have to chime in here. What do “they” need to make this “much” more efficient (API changes, abstraction changes, etc.), to where it could be done dynamically at run-time with consistent frame rates, rather than on startup?

Re dynamic allocation, it’d also be nice to have a mechanism to force the allocation to occur, rather than the current situation where (under some circumstances) the driver seems to wait until you actually issue a render with the texture bound before it allocates the texture and gets it ready on the GPU.

I’m not doing any texture streaming, but aren’t you supposed to preallocate all your textures (glTexImage2D, glTexImage3D)? You just leave the pixels undefined until you need them.
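
Something like this, I mean (just a sketch; the 1024×1024 RGBA8 size and the helper name are made up, and it assumes a current GL context):

    #include <GL/gl.h>
    #include <stddef.h>

    /* Reserve storage for a full mip chain without uploading any pixels
     * (data == NULL); the streamer fills levels later with glTexSubImage2D. */
    void preallocate_texture(GLuint tex)
    {
        glBindTexture(GL_TEXTURE_2D, tex);
        for (int level = 0, size = 1024; size >= 1; ++level, size /= 2)
            glTexImage2D(GL_TEXTURE_2D, level, GL_RGBA8, size, size, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    }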

Right. That’s what you have to do. But after you’ve done so, those preallocated textures have both:

  • a fixed internal format, and
  • a fixed resolution (WxH + MIP chain)

Life would be so much simpler if the memory could be retasked cheaply (resized, retyped). Then the need for the domain-specific preallocation hacks which can still fail just goes away.

I agree with Dark Photon. Texture allocation is painful in OpenGL. There is something in NVIDIA CUDA that we can all admire: a low-level memory allocation API in GPU space, something like malloc/free. Then you can create a texture out of that data. The texture is more or less a definition of how the driver should interpret the data (width/height/format/…). So you can allocate one large memory block and create many small textures inside it. This approach is more low-level than OpenGL, but you have the situation under better control.

Having this possibility in OpenGL would be fantastic.
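
For what it’s worth, here is roughly how that looks with the CUDA runtime’s texture-object API (a sketch only; the sizes, the offset, and the single-float format are arbitrary, and error checking is omitted):

    #include <cuda_runtime.h>
    #include <string.h>

    /* Allocate one big device block up front, then expose a slice of it to the
     * sampler as a linear texture; many such views can share the same block. */
    int make_texture_view(cudaTextureObject_t *tex, void **pool)
    {
        cudaMalloc(pool, 64u << 20);             /* one 64 MB block, suballocated by hand */

        struct cudaResourceDesc res;
        memset(&res, 0, sizeof res);
        res.resType                = cudaResourceTypeLinear;
        res.res.linear.devPtr      = (char *)*pool + (1u << 20);  /* some slice of the pool */
        res.res.linear.desc        = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
        res.res.linear.sizeInBytes = 4u << 20;

        struct cudaTextureDesc td;
        memset(&td, 0, sizeof td);
        td.readMode = cudaReadModeElementType;

        return (int)cudaCreateTextureObject(tex, &res, &td, NULL);
    }

(The nearest GL analogue, a buffer texture, gives you no filtering and no 2D layout, so it is not quite the same thing.)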

About dynamic texture allocation… perhaps we won’t need it for Ptex (although it would be fantastic to have).

We could allocate a BIG texture and “atlas” the Ptex faces.
Then, we just need an optimized dynamic rectangle insertion scheme like lightmap packers do:

http://www.blackpawn.com/texts/lightmaps/default.html

Probably what Megatexture does.
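
For illustration, even a naive “shelf” packer gives the idea (a sketch with invented names; the tree-based packer from the linked article packs tighter):

    #include <stdbool.h>

    /* Naive shelf packer: place each face image left-to-right on the current
     * row, opening a new row when the current one is full. */
    typedef struct { int x, y; } Point;

    typedef struct {
        int atlas_w, atlas_h;   /* size of the big texture */
        int cur_x, cur_y;       /* insertion cursor */
        int row_h;              /* height of the current shelf */
    } ShelfPacker;

    bool shelf_insert(ShelfPacker *p, int w, int h, Point *out)
    {
        if (w > p->atlas_w || h > p->atlas_h)   /* can never fit */
            return false;
        if (p->cur_x + w > p->atlas_w) {        /* shelf full: start a new row */
            p->cur_x  = 0;
            p->cur_y += p->row_h;
            p->row_h  = 0;
        }
        if (p->cur_y + h > p->atlas_h)          /* atlas full */
            return false;
        out->x = p->cur_x;
        out->y = p->cur_y;
        p->cur_x += w;
        if (h > p->row_h)
            p->row_h = h;
        return true;
    }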

Yeah, but then there’s the loss of HW-accelerated filtering (MIP/linear, aniso, wrap mode, etc.). Gonna roll your own, or just forgo it?

You can use HW filtering if you add a border around each subtexture, with pixels copied from topologically adjacent subtextures. This has some drawbacks:

  • subtexture bleeding at coarse mip levels
  • not quite correct at vertices shared by more or fewer than 4 subtextures, though good enough in most cases
  • not perfect at edges shared by adjacent subtextures of different resolutions

Implementing Ptex in hardware would be hard, and implementing it with regular shaders would be very slow. Maybe there could be some low-level functionality that facilitates Ptex as well as other filtering/wrap modes, including those currently implemented with fixed function (anisotropic filtering, cubemap wrapping, etc.): something like a filtering shader that determines the number, weights, positions, and mip levels of the texture samples.

Now that the AMD Radeon 7xxx series supports Partially Resident Textures, I wonder how they are solving the filtering-across-edges problem…

http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review/6

And I’m also wondering whether there will be an OpenGL extension for that…