Further separation of sampler objects

Right now in D3D I can do this:

// C++ code
context->PSSetShaderResources (0, 1, &texture);

// HLSL code
SamplerState sampler0 : register(s0);
SamplerState sampler1 : register(s1);

Texture2D tex0 : register(t0);

tex0.Sample (sampler0, ...);
tex0.Sample (sampler1, ...);

As in, bind the texture to the pipeline once and once only, taking up just a single texture unit, but sample it twice using different sampling parameters each time. This is useful for post-processing effects where you may have different “control” images in each colour channel of a single texture, and may wish one to be clamped but others to be wrapped, for example.

In GL I can’t; I need to bind the texture twice, to two separate texture units, owing to the way sampler objects use the same texture units as texture objects. This (1) is awkward, (2) burns an extra texture unit unnecessarily, and (3) doesn’t really reflect the way hardware works.
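
For illustration, here’s roughly what I’m forced to do today (a minimal sketch; clampSampler and wrapSampler are assumed to be sampler objects already created with glGenSamplers and configured with glSamplerParameteri):

// Current GL workaround: bind the same texture object to two texture
// units so it can be paired with two different sampler objects.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texture);
glBindSampler(0, clampSampler);         // texture unit 0: clamped sampling

glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, texture);  // same texture again - second unit burned
glBindSampler(1, wrapSampler);          // texture unit 1: wrapped sampling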

The solution is simple - define a separate set of “sampler units” using an API similar to glVertexAttribBinding - let’s call it “glTextureSamplerBinding”. It would take two GLuint params, one for the texture unit and one for the sampler unit to “bind” it to (I’m not too happy about the word “bind” here - I’d prefer “attach” - but precedent is set by glVertexAttribBinding, so “bind” is at least consistent). A texture unit may be bound to more than one sampler unit, and by default each texture unit is bound to the sampler unit with the same number (i.e. texture unit 0 bound to sampler unit 0, and so on).
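
A hypothetical usage sketch (glTextureSamplerBinding doesn’t exist, of course, and I’m also assuming that sampler objects would bind to the new sampler units rather than to texture units):

// Hypothetical: one texture unit, two sampler units, proposed API.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texture);  // texture bound once, to texture unit 0

// Assumed: sampler objects now bind to the separate sampler units
glBindSampler(0, clampSampler);
glBindSampler(1, wrapSampler);

// Proposed call: attach texture unit 0 to both sampler units
glTextureSamplerBinding(0, 0);
glTextureSamplerBinding(0, 1);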

doesn’t really reflect the way hardware works.

What evidence do you have that hardware works in this way?

http://www.x.org/docs/AMD/R6xx_R7xx_3D.pdf

Page 14, section 3.6: “Texture Setup”. Enough?

I have learned my lesson about UBOs… if mhagain says hardware works that way, then at the very least AMD hardware does. What I don’t follow is how the current GL API (bind a texture to a texture unit, then set the sampler uniform to the texture unit to sample from) does not do exactly what is requested… i.e.:


glActiveTexture(GL_TEXTURE0 + Foo);    // glActiveTexture takes GL_TEXTURE0 + unit
glBindTexture(GL_TEXTURE_BAR, TexId);
// ...
glUniform1i(sampler0, Foo);            // sampler uniforms are set via glUniform1i
glUniform1i(sampler1, Foo);


Oh, wait, I’m being dense… you want to decouple the sampler from the texture unit… smack head… so ideally a “texture unit” does not specify how to sample, just what to sample… the question is how to fit it into the current GL API… it is already inconsistent, since it is called a texture in C code and a sampler in GLSL… in a hacked way, there would be a new uniform call for samplers:


glSamplerUniform(GLuint location, GLuint textureUnit, GLuint samplerName);

where it says the sampler at the named location uses the named textureUnit, but filtered through the named sampler… or introduce a new type to GLSL to get it right, but the naming will all be messed up.
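
Usage would then be something like this (entirely hypothetical; samplerLoc is just a placeholder for a queried uniform location):

// Hypothetical: the uniform at samplerLoc reads texels from texture
// unit 2, but filters them through the sampler object wrapSampler.
glActiveTexture(GL_TEXTURE0 + 2);
glBindTexture(GL_TEXTURE_2D, texture);
glSamplerUniform(samplerLoc, 2, wrapSampler);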

http://www.x.org/docs/AMD/R6xx_R7xx_3D.pdf

Page 14, section 3.6: “Texture Setup”. Enough?

OK, allow me to rephrase:

“What evidence do you have that all hardware works in this way?”

I’m not saying it doesn’t. But you’re not talking about an AMD extension here; you’re talking about an OpenGL feature, which needs to run on a variety of hardware, AMD, NVIDIA, and Intel alike.

Even if all relevant hardware (GL 3.x+) can support it, I’m just not seeing a burning need for exposing this ability of the hardware. So you save one or two texture units. You’ve got 16 per stage to work with; it’s not like you’re going to run out. Is that really worth adding more ugliness and confusion to a part of the API that’s already incredibly ugly and confused?

Can you cite any performance advantages from this? Does each texture unit have a separate cache in some hardware, such that accessing the same texture through one unit won’t necessarily get cache hits from accesses from the other unit? This isn’t like ARB_vertex_attrib_binding, which likely offers real performance advantages. And that at least made the API prettier and less confusing (no more binding to GL_ARRAY_BUFFER followed by glVertexAttribPointer); I’m counting the days until I can rewrite the Vertex Specification page on the wiki to be in terms of vertex_attrib_binding (with an explanation after the fact of how glVertexAttribPointer works).

Also, this feature as described seems to be missing half the point of the D3D equivalent. Namely, the ability to use different sampling parameters on textures from within the shader. That is, if we’re going to have this feature, it ought to be done via shader logic, not external OpenGL calls. Just give shaders the ability to define samplers within them, then to use their pre-made samplers on textures arbitrarily.

D3D doesn’t have the ability to use different sampling parameters from within the shader; you’re confusing HLSL with the Effects framework. The Effects framework is just a wrapper; the shader compiler will pull out those sampler states and make standard API calls behind the scenes. D3D provides an Effect state filter object that you can hook up to an Effect to watch those calls being made; you can also see it happening in PIX. Besides, even if emulated through the API, the capability isn’t that great anyway. Look at GL_ARB_sampler_objects, issue 7:

Separating samplers and texture images in the shader allows for a run-time combinatorial explosion of sampler-texture pairs that would be difficult to count.

Anyone who’s ever written D3D code using the Effects framework can vouch for the truth of this. Beyond simplistic techdemos and tutorials it’s a nightmare to manage.
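
For reference, this is the kind of .fx state block in question - a minimal sketch of Effects framework syntax (not core HLSL), which the Effects runtime turns into ordinary CreateSamplerState/PSSetSamplers calls behind the scenes:

// Effects framework (.fx) state block - not core HLSL.
SamplerState samClamp
{
    Filter   = MIN_MAG_MIP_LINEAR;
    AddressU = Clamp;
    AddressV = Clamp;
};

Texture2D tex0;

float4 PS (float2 uv : TEXCOORD0) : SV_Target
{
    return tex0.Sample (samClamp, uv);
}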

On to the evidence. Here’s the NVIDIA GTX 280: “GT200: Nvidia GeForce GTX 280 analysis” | bit-tech.net

Each texture processing cluster inside the GT200 core has access to its own texture sampling unit, meaning there are ten texture samplers in a fully-functional GeForce GTX 280 graphics processing unit. These texture sampling units are able to address and also apply bilinear filtering to eight textures per clock; alternatively, each texture sampler can address and filter four 2:1 anisotropic filtered or four FP16 bilinear-filtered pixels per clock.

And here’s an NVIDIA patent discussing the difference between textures and samplers: http://www.freepatentsonline.com/7948495.html

Again, it’s clear that “textures” and “samplers” are different things. Which makes sense, because a texture is just a big bunch of data, while a sampler specifies how the texture is sampled.

Does it add ugliness and confusion to the API? I argue no. It provides a cleaner and more sensible separation of data from state.
Does it provide real performance advantages? Probably not, although it doesn’t trash prior state on that second texture unit I mentioned so that’s an (admittedly minor) advantage.
Does it fix problems with the API? Hell, yes. By not recognizing the difference between textures and samplers, by not recognizing that there may be different numbers of each, it’s possible to call glBindSampler (31, sampler) and overflow a hardware limit.

I would like to clarify a few misunderstandings:

  1. The number of hardware texture sampling units on a particular GPU has nothing to do with the number of texture units in OpenGL terminology. In this case, even though the GTX 280 has only 10 texture sampling units, it doesn’t mean it has only 10 OpenGL texture units; the figure just defines the maximum texture sampling bandwidth of that GPU.
  2. D3D10 in fact supports 128 bound textures and 16 bound samplers, and you can mix-and-match them; it’s not an Effects-framework-specific thing. Both were hardware limitations of earlier GPU generations; I don’t think the latest generation of GPUs has any such limitations.

I think the fact that OpenGL doesn’t have separate state for textures and samplers, but has both tightly coupled into a so-called “texture unit”, has historical reasons. First, before D3D10-class hardware, texture and sampler state in fact was one common state. Why we don’t have the separation now, I think, has to do with backwards compatibility: in core OpenGL the texture unit is really only a pair of a texture object and a sampler object, which would be trivial to separate; in compatibility OpenGL, however, the texture unit also has the so-called texture environment state, which affects both texture state and sampler state, so it is almost impossible to separate texture and sampler state in OpenGL if one wants to maintain backwards compatibility.

Here’s the NVIDIA GTX280: http://www.bit-tech.net/hardware/gra...ture-review/11

I don’t see anything there that states that NVIDIA hardware has a division between hardware texture units and hardware sampler units. Indeed, it says quite the opposite: “Each texture processing cluster inside the GT200 core has access to its own texture sampling unit”. Each texture unit has its own sampler.

Just like OpenGL says.

Also, note what Aqnuep said.

Again, it’s clear that “textures” and “samplers” are different things. Which makes sense, because a texture is just a big bunch of data, while a sampler specifies how the texture is sampled.

Does it add ugliness and confusion to the API? I argue no. It provides a cleaner and more sensible separation of data from state.

But we already have that separation of data from state now. We have texture objects and we have sampler objects. You combine them into a texture unit.
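
In code terms, the separation we already have (standard GL 3.3 calls):

// Texture object (the data) and sampler object (the state) are already
// separate objects; they are combined by binding both to the same unit.
glActiveTexture(GL_TEXTURE0 + unit);
glBindTexture(GL_TEXTURE_2D, textureObj);  // what to sample
glBindSampler(unit, samplerObj);           // how to sample it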

What you’re proposing is creating another layer of indirection, where a “texture unit” has a texture object and a reference to a sampler that is bound to an independent set of “sampler unit” bind points. That doesn’t make the API less confusing.

Does it fix problems with the API? Hell, yes. By not recognizing the difference between textures and samplers, by not recognizing that there may be different numbers of each, it’s possible to call glBindSampler (31, sampler) and overflow a hardware limit.

It wouldn’t help because you still need backwards compatibility with the way things are now. If you’re using ARB_vertex_attrib_binding as a reference, note that it specifically states that there are (at least) 16 formats and 16 buffer binding points. Thus, it can redefine glVertexAttribPointer entirely in terms of the new API.

Right now, texture units have textures and samplers. If you create this indirection, where the number of available sampler units can be less than the number of texture units, then you don’t have the ability to redefine glBindSampler in terms of the new API.

Right now, texture units have textures and samplers. If you create this indirection, where the number of available sampler units can be less than the number of texture units, then you don’t have the ability to redefine glBindSampler in terms of the new API.

Um… not true at all… you can create a level of indirection as follows, tweaking what I wrote above:


/*
 Bind the samplerObject named samplerName to the samplerUnit, the value of
 samplerUnit must be in the range [0, numberSamplerUnits)
*/
glBindSampler(uint samplerUnit, uint samplerName);

/*
 Set the sampler uniform to use the texture at the named texture unit
 but to sample from the named sampler unit
*/
glUniformSampler(uint uniformLocation, uint textureUnit, uint samplerUnit);



There is a hitch: since texture units implicitly consume a sampler unit, the above is a little icky-ish, in that the number of sampler units is the number of hardware sampler units minus the number of texture units reported by GL…
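
A quick hypothetical usage, to show the indirection (again, none of these entry points exist today; loc0 and loc1 are placeholder uniform locations):

glBindSampler(0, clampSampler);  // sampler unit 0
glBindSampler(1, wrapSampler);   // sampler unit 1

glUniformSampler(loc0, 5, 0);    // texture unit 5 via sampler unit 0
glUniformSampler(loc1, 5, 1);    // same texture, different sampling state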

There is a hitch: since texture units implicitly consume a sampler unit, the above is a little icky-ish, in that the number of sampler units is the number of hardware sampler units minus the number of texture units reported by GL…

Yeah, that “hitch” is what makes it not backwards compatible.

At this stage you come across as though you’re just looking for excuses to be disagreeable.

Were VAOs backwards compatible with client-side arrays? Is glVertexAttribPointer backwards compatible with other glPointer calls? Do VBOs modify the meaning of all glPointer calls?

It doesn’t need to be backwards compatible, it’s modified behaviour, a simple glEnable is all that’s needed to tell the driver “OK, I’m using this modified behaviour now, so if I call glBindSampler (31, samplerNum) don’t assume that you need to fix up the world-of-crazy I’ve just fed you but give me a nice clean error instead”.
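
Something like this, purely for illustration (the enum is invented):

// Hypothetical opt-in; GL_SEPARATE_SAMPLER_UNITS is an invented name.
glEnable(GL_SEPARATE_SAMPLER_UNITS);
// After the enable, glBindSampler's first parameter would name a sampler
// unit, and out-of-range values would give a clean GL_INVALID_VALUE.
glBindSampler(31, samplerNum);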

Were VAOs backwards compatible with client-side arrays? Is glVertexAttribPointer backwards compatible with other glPointer calls? Do VBOs modify the meaning of all glPointer calls?

These are all backwards compatible changes because old code still works; that’s what it means to be backwards compatible. Yes, VAOs do work with client-side arrays; the spec is quite clear on this. Texture storage is backwards compatible with the old glTexImage functions because those functions still exist. Creating a new way to do things is how you maintain backwards compatibility.

There’s a difference between user code doing something that changes how a function works (which was impossible before the extension), and user code doing nothing at all, which was perfectly legal before and is now broken. glBindSampler takes a texture unit. It ranges from 0 to GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS. That’s the specification. If you change what glBindSampler takes, such that it now binds to a different range of legal values, perfectly functioning code becomes broken.

That is the definition of backwards incompatible changes: when old code that worked before stops working.
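
Concretely, code like this is perfectly valid today and would break under such a redefinition (the limit query is standard GL):

// Perfectly legal today: bind a sampler to the highest texture unit.
GLint maxUnits;
glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS, &maxUnits);
glBindSampler(maxUnits - 1, sampler);  // breaks if this range is redefined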

It doesn’t need to be backwards compatible, it’s modified behaviour, a simple glEnable is all that’s needed to tell the driver “OK, I’m using this modified behaviour now, so if I call glBindSampler (31, samplerNum) don’t assume that you need to fix up the world-of-crazy I’ve just fed you but give me a nice clean error instead”.

So, you want a glEnable that switches what the parameter of glBindSampler means. And you consider this to not “add ugliness and confusion to the API”?

The reason why the gl*Pointer thing with VBOs is such a terrible API is because the very meaning of a function changes based on information not provided in that function’s signature (that, and the need to pretend an integer is a pointer). The reason why glActiveTexture/glBindTexture is confusing is because you’re using two different functions to do one simple thing: bind a texture to a texture unit. Again, there’s the non-local information: glBindTexture’s meaning changes based on the most recent glActiveTexture call. And now you’re proposing to take a simple, obvious command like glBindSampler and inflict the same API cruft upon it.
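
The pattern in question, for reference:

// glBindTexture's target unit appears nowhere in its own signature;
// it depends entirely on the most recent glActiveTexture call.
glActiveTexture(GL_TEXTURE3);           // selects the unit...
glBindTexture(GL_TEXTURE_2D, texture);  // ...that this call actually affects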

This sort of thinking is exactly how OpenGL got into the API hell it’s in now. glActiveTexture was added because it was the easiest, backwards-compatible solution to multitexturing. The whole GL_ARRAY_BUFFER thing was added because it was the easiest, backwards-compatible solution to VBOs. In both cases, they overloaded existing APIs, keying off of a switch from a new API, so that they didn’t have to create entirely new functions.

Isn’t that what your whole DSA crusade is about? Making it so that non-local information doesn’t affect how commands operate? For someone so gung-ho about wanting DSA everywhere, it’s interesting that you’re willing to deliberately inflict more of this kind of API on OpenGL.

I’m not saying I want a glEnable, I’m suggesting it as one possible approach that could solve this, admittedly in an ugly way, but it’s better than just being negative about everything. That’s what this is all about - identifying if this is a problem, identifying if it’s worth solving, and talking through possible solutions. Suggesting something positive rather than constantly being first to jump in with bad vibes.

I’m not saying I want a glEnable, I’m suggesting it as one possible approach that could solve this

… if you didn’t actually want it, why would you suggest it? It’s like you’re saying things but you never actually mean what you’re saying. If you suggest something, I’m going to take a wild leap and assume that you actually want what you’ve suggested and will respond accordingly.

That’s what this is all about - identifying if this is a problem, identifying if it’s worth solving, and talking through possible solutions.

But I don’t believe it’s worth solving. Even if you were to find a way to implement it without sacrificing backwards compatibility or making the API make less sense, I don’t believe that the idea itself has any real merit. It’s exposing a possible hardware limitation that was always there but never seemed to bother any real applications before now.

This is a solution looking for a problem. An OpenGL suggestion is useful only if it solves a real problem for users. You have yet to demonstrate that this does. The absolute most it buys you is saving a single texture unit. Running out of texture units per-stage is hardly a pressing concern for OpenGL developers.

Every piece of hardware has its own idiosyncrasies. We shouldn’t be modifying OpenGL for the sole purpose of exposing some limitation on one hardware platform. AMD approved ARB_sampler_objects, just like everyone else; indeed, over half of the credited Contributors to the extension are from AMD. If they felt that it was an onerous burden and wanted to expose a secondary sampler limit, they could have changed it then to match their hardware better.

AMD seems to have tamed the “world-of-crazy” in their drivers, so what’s the problem?