Originally posted by mlb:
So there’s no first-level cache in the nv40?
There may well be two levels, but I wouldn’t call them L1 and L2. In any case, the lowest-level texture cache has no measurable impact on the outside world. You just see the large one.
And the largest texture cache level on NV40 looks like it is 8 kB (thx for the fresh set of numbers). That’s twice the amount of NV3x, and the same amount as R300. The new/additional cache level relative to NV3x, if present, must be smaller.
You could call the larger one L2 and the smaller one L1, probably. Or you could call the larger one L1 and the smaller one L0, or predecode buffer, or anything you fancy. Or (and I prefer that) you could just call the large one “the cache” and leave it at that.
A texture cache has an effect that can be measured. It increases performance for smaller data sets. You can verify its effects, and you can see performance gains in the real world.
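To make “measurable” concrete, something like this little GL probe is what I mean (just a sketch; the GLUT setup, texture sizes and pass counts are arbitrary values I picked, not a proper benchmark): redraw a textured quad while the texture grows, and look for the knee where throughput drops because the working set no longer fits the texture cache.

```c
#include <GL/glut.h>
#include <stdio.h>
#include <stdlib.h>

#define PASSES 500

static double passes_per_sec(int side)      /* side = texture width/height */
{
    unsigned char *texels = malloc((size_t)side * side * 4);
    GLuint tex;
    int i, t0, t1;

    for (i = 0; i < side * side * 4; ++i)   /* incompressible random data */
        texels[i] = (unsigned char)rand();

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, side, side, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, texels);
    free(texels);
    glEnable(GL_TEXTURE_2D);

    glFinish();                             /* flush pending work before timing */
    t0 = glutGet(GLUT_ELAPSED_TIME);
    for (i = 0; i < PASSES; ++i) {          /* full-window quad, samples the whole texture */
        glBegin(GL_QUADS);
        glTexCoord2f(0, 0); glVertex2f(-1, -1);
        glTexCoord2f(1, 0); glVertex2f( 1, -1);
        glTexCoord2f(1, 1); glVertex2f( 1,  1);
        glTexCoord2f(0, 1); glVertex2f(-1,  1);
        glEnd();
    }
    glFinish();
    t1 = glutGet(GLUT_ELAPSED_TIME);
    if (t1 == t0) t1 = t0 + 1;              /* avoid division by zero on fast runs */

    glDeleteTextures(1, &tex);
    return PASSES * 1000.0 / (t1 - t0);
}

int main(int argc, char **argv)
{
    int side;
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGB | GLUT_DOUBLE);
    glutCreateWindow("texture cache probe");
    /* 16x16 RGBA8 = 1 kB working set ... 128x128 = 64 kB */
    for (side = 16; side <= 128; side *= 2)
        printf("%3dx%-3d (%3d kB): %.0f passes/s\n",
               side, side, side * side * 4 / 1024, passes_per_sec(side));
    return 0;
}
```

On an NV40 you’d expect the knee to show up somewhere around the 8 kB working set, if the number above is right.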
Having an extra decode buffer farther away from memory has no such effect. It’s purely an engineering detail. It may affect die size, power, manufacturability, clock speeds and what-have-you. Who knows … and who really cares …
The fact of the matter is that ATI might have the same mechanism in their chips, to simplify the texture sampler itself, to make broadcasting texture data across the die easier, whatever, and they just don’t care to market it as a cache. We really can’t tell from the outside; these chips are black boxes to us. If we can’t tell the difference, marketing shouldn’t imply that we can, IMO.
…
but what about the maximum speed-up? Is it really only 1.3×?
Yes, in this specific case it is. The left column is fillrate. As you can see, this is a rather simplistic scenario: just perfectly isotropic single-texturing. Texture memory access, even when spilling the cache, should be perfectly coherent, so the memory interface can stream texels nearly as fast as the cache can serve them, and the gap stays small.
If you want more real-world impressions, you could play around with extreme negative LOD bias vs LOD bias 0.0 in some games, at modest resolutions (I suggest the UT series, as it’s easy to change this setting in the .ini).
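If you’d rather not fiddle with game .ini files, you can force the same thing in a small GL test of your own via the EXT_texture_lod_bias extension (check for it in the extension string first); the -3.0 below is just an aggressively bad value I picked as an example:

```c
#include <GL/gl.h>
#include <GL/glext.h>   /* for the EXT_texture_lod_bias tokens */

/* Apply a global LOD bias to the current texture unit.
   bias = 0.0f is normal; strongly negative values (e.g. -3.0f)
   force over-detailed mip levels and thrash the texture cache. */
void set_lod_bias(float bias)
{
    glTexEnvf(GL_TEXTURE_FILTER_CONTROL_EXT,
              GL_TEXTURE_LOD_BIAS_EXT, bias);
}
```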
Negative LOD bias, besides making textures look crappy in motion but oh-so-sharp on screenshots, kills coherency. The texture caches thrash all the time, which IMO reproduces quite well the performance profile you would get without any texture cache.
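To put a rough number on “kills coherency”: every unit of negative bias selects a mip level one step more detailed than the pixel footprint warrants, which doubles the texel distance between adjacent screen pixels. Back-of-envelope (my own illustration, not measured data):

```c
#include <math.h>
#include <stdio.h>

/* Approximate texel distance between adjacent screen pixels in the
   sampled mip level, as a function of LOD bias (0.0 = matched mip). */
double texel_stride(double lod_bias)
{
    return pow(2.0, -lod_bias);   /* each -1.0 of bias doubles it */
}

int main(void)
{
    double b;
    for (b = 0.0; b >= -3.0; b -= 1.0)
        printf("bias %+.1f -> ~%2.0f texels apart\n", b, texel_stride(b));
    return 0;
}
```

At -3 that’s eight texels between neighbouring pixels, so bilinear footprints stop sharing cache lines and nearly every fetch misses.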
I’m sure you’ll see much worse than a 30% drop. Be wary of app-specific “optimizations” though …