DOOM3 texture unit use.

It's interesting to note Carmack's comments on the final use of the various texture units in the DOOM3 shaders (for the ARBfp path):

!!ARBfp1.0 
OPTION ARB_precision_hint_fastest;

# texture 0 is the cube map
# texture 1 is the per-surface bump map
# texture 2 is the light falloff texture
# texture 3 is the light projection texture
# texture 4 is the per-surface diffuse map
# texture 5 is the per-surface specular map
# texture 6 is the specular lookup table

I won’t post the actual business end of the code; it’s there if you look.
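
That said, for the curious, here’s a rough sketch (mine, not id’s code) of how a per-pixel interaction pass typically combines inputs like the units listed above. Which texcoord carries what, and the exact combine order, are purely my assumptions:

!!ARBfp1.0
OPTION ARB_precision_hint_fastest;
# NOT the id code, just an illustrative sketch of a bump/diffuse/specular
# interaction pass using the texture unit layout quoted above
PARAM scaleTwo = { 2.0, 2.0, 2.0, 2.0 };
PARAM subOne   = { -1.0, -1.0, -1.0, -1.0 };
TEMP lightVec, halfVec, normal, falloff, projection, diffuse, specMap, specDot, color;

# normalize the interpolated to-light vector via the normalization cubemap (unit 0)
TEX lightVec, fragment.texcoord[0], texture[0], CUBE;
MAD lightVec, lightVec, scaleTwo, subOne;

# fetch and range-expand the per-surface normal (unit 1)
TEX normal, fragment.texcoord[1], texture[1], 2D;
MAD normal, normal, scaleTwo, subOne;

# light falloff (unit 2) and projected light texture (unit 3)
TEX falloff, fragment.texcoord[2], texture[2], 2D;
TXP projection, fragment.texcoord[3], texture[3], 2D;

# diffuse term: N.L times the per-surface diffuse map (unit 4)
TEX diffuse, fragment.texcoord[4], texture[4], 2D;
DP3_SAT color, normal, lightVec;
MUL color, color, diffuse;

# specular term: N.H through the specular lookup table (unit 6),
# masked by the per-surface specular map (unit 5)
TEX halfVec, fragment.texcoord[5], texture[0], CUBE;
MAD halfVec, halfVec, scaleTwo, subOne;
DP3_SAT specDot, normal, halfVec;
TEX specMap, fragment.texcoord[4], texture[5], 2D;
TEX specDot, specDot, texture[6], 2D;
MAD color, specDot, specMap, color;

# attenuate by the light's falloff and projection
MUL color, color, falloff;
MUL result.color, color, projection;
END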

P.S. As a complete aside, is this a possible disadvantage of ASCII-based API interface calls? This could be dumped for glslang C-like source code too, with no disassembly or reverse compilation required. I suppose it’s not exactly the crown jewels, but it is interesting to the curious.

It’s strange that he picked texture unit 4 for the diffuse map. Most hardware that “cheats” tends to assume that TMU0 is the primary texture.

Maybe this is Carmack’s way of getting hardware developers to stop doing stuff like that.

Depends how you define primary. AFAIK they’re all active when this shader is used, and the images for the various per-surface terms are the same resolution. Remember that this shader gets split in two for other multipass implementations; that may have set the order early at the application level, and he merely ran with it for simplicity. Illumination & attenuation in the first 4 units, material properties in the latter 3, would be the right split for a 4-texture-unit multipass. (EDIT: dang, the bump map screws up that theory with specular.)

OTOH, given the dependency of other instructions on fetches from the cubemaps & bump maps, and the cache-friendly nature of something like a cubemap fetch per fragment (the cubemap is only used to normalize the interpolated light direction vector), it may even make sense to keep them coherent (if indeed there’s an implementation preference).

Hmm, a lot of those textures seem to be purely lookup tables. It would make sense on R200-level hardware to use textures for that due to instruction set limits, but I wouldn’t be surprised if things ran faster and looked better on R300 hardware if the normalizing cubemap and the specular lookup table were replaced with math. Maybe even the light falloff texture too.
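
For instance, a minimal sketch (my own, not from the game) of what replacing the normalization cubemap with math looks like in ARB fp, assuming the unnormalized to-light vector arrives in texcoord[0]:

!!ARBfp1.0
# minimal sketch, not id's code: normalize with DP3/RSQ/MUL instead of a
# cubemap fetch; the to-light vector is assumed to be in texcoord[0]
TEMP toLight;
DP3 toLight.w, fragment.texcoord[0], fragment.texcoord[0];
RSQ toLight.w, toLight.w;
MUL toLight.xyz, fragment.texcoord[0], toLight.w;
MOV result.color, toLight;
END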

There are two versions of the shaders, one uses math for some stuff, the other does dependent reads, but still has the texture unit comments.

Edit: this is a bit bogus. Despite the comments, the code does most of the same fetches (it’s clearly been edited a few times, with some code commented out; in one case a math normalization is commented out, in another it is implemented), which makes the shaders similar w.r.t. fetches if not dependent reads. But it was captured on NVIDIA 6800 hardware; I don’t know what happens on other hardware.

FYI: You don’t have to use an interception tool to get the shaders; they are in the data files.

Simply rename pak000.pk4 to pak000.zip and open it up with a zip program. The shaders for all code paths are in the directory “glprogs”.

For speed, nVidia suggests normalizing wildly changing vectors with math, due to texture cache misses, while using cubemaps for all smoothly varying vectors. I wonder if this is different with ATi hardware.

Anyway, I’m surprised there’s no explicit stage for an environment reflection cube, unless that doubles in the specular unit (simple 2D fudge). Don’t tell me there’s no reflection going on. They need one more unit to make an even 8 anyway!
:slight_smile:

Id is most neighborly to share.

Thanks for the pointer, sqrt.

Q, there are other shaders for environment mapping. You cannot correctly include an environmental reflection term in a shader that is applied when stencil shadow testing is on. Sure you could ignore the texture based illumination terms but that stencil test is going to win in the end. :slight_smile:

Whew, that’s good news. I’d like to get a look at this stuff myself.

Depends how you define primary.
The diffuse texture; the one that, regardless of shading system, is virtually required in order to have decent graphics.

but I wouldn’t be surprised if things would run faster and look better on R300 hardware if the normalizing cubemap and the specular lookup table was replaced with math.
I would, to some degree.

Remember, you get 1 vector op and 1 texture op per cycle. If you can keep both busy, then you’re better off than being math-heavy. It’s almost impossible to use more texture accesses than math ops, so, performance-wise, it’s best to move as much as possible into textures. Cache misses aside, of course.

Aside from that, I noticed that JC is using program parameters for anything and everything. As far as I knew up until today, updating those parameters is fairly expensive. So I usually (ab)used gl-state variables to upload my data (especially matrices). I did not benchmark it against using program parameters.

So my question is, is using program parameters now a recommended practice? Do I gain anything by using glstate to upload data into the vertex and fragment programs?

Originally posted by Korval:
Remember, you get 1 vector op and 1 texture op per cycle. If you can keep both busy, then you’re better off than being math-heavy. It’s almost impossible to use more texture accesses than math ops, so it’s best, performance wise, to move as much as possible into textures. Cache misses aside, of course.
Don’t forget filtering. You only get one bilinear filtered sample per cycle.

So my question is, is using program parameters now a recommended practice? Do I gain anything by using glstate to upload data into the vertex and fragment programs?
I don’t think that native glstate was ever any faster than program parameters.

Don’t forget filtering. You only get one bilinear filtered sample per cycle.
Good point. But, then again, you’re usually not trilinear/anisotropically filtering lookup tables.

I noticed that there is no self-shadowing in Doom 3? Why? Annoying popping in cut scenes?

Originally posted by Korval:
Remember, you get 1 vector op and 1 texture op per cycle. If you can keep both busy, then you’re better off than being math-heavy. It’s almost impossible to use more texture accesses than math ops, so it’s best, performance wise, to move as much as possible into textures. Cache misses aside, of course.
Well, with that many textures active at the same time, and I assume fairly short shaders (guessing, haven’t bought the game yet, will do that tomorrow), I wouldn’t be surprised if the cost of accessing textures is the largest performance-defining factor.

Originally posted by skynet:
[b]Aside from that, I noticed that JC is using program parameters for anything and everything. As far as I knew up until today, updating those parameters is fairly expensive. So I usually (ab)used gl-state variables to upload my data (especially matrices). I did not benchmark it against using program parameters.

So my question is, is using program parameters now a recommended practice? Do I gain anything by using glstate to upload data into the vertex and fragment programs?[/b]
Constant updates can be expensive if you do it excessively, but there’s seldom any reason to be paranoid about it. The cost is usually comparatively small and hidden by the cost of actually executing the shader, unless you have really small batches and draw few pixels. In any case, it’s certainly not recommended to abuse gl-state. It won’t speed anything up; you’re just specifying the constant another way. The GL-state is there for convenience, not for performance. Abusing it only leads to unreadable code.
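
As an illustration (my own sketch, not Doom3 code), here is the same transform written against GL state and against program local parameters; the per-vertex cost is identical, only how the application supplies the matrix differs:

!!ARBvp1.0
# minimal sketch, not id's code: two equivalent ways to bind a matrix.
# mvpState tracks GL state automatically; mvpLocal must be filled by the
# application via glProgramLocalParameter4fvARB (four calls, one per row).
PARAM mvpState[4] = { state.matrix.mvp };
PARAM mvpLocal[4] = { program.local[0..3] };
TEMP pos;
# using the GL-state binding here; swap mvpState for mvpLocal to use the
# application-supplied parameters instead
DP4 pos.x, mvpState[0], vertex.position;
DP4 pos.y, mvpState[1], vertex.position;
DP4 pos.z, mvpState[2], vertex.position;
DP4 pos.w, mvpState[3], vertex.position;
MOV result.position, pos;
MOV result.color, vertex.color;
END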

Doom3…great to see per-pixel lighting in something more than a spinning cube demo at last.
Although I had the same feeling when I first played Chronicles of Riddick on the xbox months ago.
Dorbie, I’m a bit puzzled as to why you think the texture stage number is important in any way in these shaders…

Sunray, there is self-shadowing in Doom3. The player doesn’t seem to cast a shadow, but that’s not a technical issue; objects and monsters self-shadow in the technical sense, and Carmack’s reverse makes player shadow clipping a non-issue. I’ll bet you dollars for donuts you can turn player shadow casting on via the console. This is definitely not a technical limitation; it would cost a bit extra to draw, but it’s either an artistic or a performance compromise (probably artistic). It’s the sort of crap that gets debated inside game companies (heck, I worked for idiots who ripped working stencil shadows out of a game at the last minute to make XBOX look like PS2, so don’t be surprised by the outcome of ‘artistic decisions’).

Knackered, the texture usage is interesting and was the subject of debate a while back. Although it was pretty much predicted, the actual numbers were just in the comments from the shader code; I wasn’t enumerating. Yup, Riddick looked good; AFAIK it pretty much got the lighting & shadowing right with interesting shaders (a bit shiny though).

I’ll bet you dollars for donuts…

Better be Krispy Kreme. :smiley:

It seems I was right about getting a speed boost by changing the shader to use math instead of lookup tables, but I didn’t expect this much:
http://www.beyond3d.com/forum/viewtopic.php?t=14874

At best, in-game it’s up to 40% faster with max AA/AF just by replacing the dependent texture read with POW, and in the timedemo it’s about 18% faster. :eek:
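
For reference, a minimal sketch (mine, not the modified shader from that thread) of replacing the specular lookup table (texture 6) with a POW; the exponent value and the interpolant assignments are assumptions:

!!ARBfp1.0
# minimal sketch, not id's code: specular falloff via POW instead of a
# dependent read into the specular lookup table; the exponent is arbitrary
PARAM specExp = { 16.0, 16.0, 16.0, 16.0 };
TEMP specDot, specular;
# assume texcoord[0] carries the surface normal and texcoord[1] the
# half-angle vector, both already normalized
DP3_SAT specDot.x, fragment.texcoord[0], fragment.texcoord[1];
POW specular.x, specDot.x, specExp.x;
MOV result.color, specular.x;
END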