Post Transform Cache
The Post Transform Cache (sometimes called the "post-T&L cache") is a hardware feature of modern GPUs that improves rendering performance. It is part of the rendering pipeline: a memory buffer holding vertex data that has passed through the vertex processing stage but has not yet been assembled into primitives.
Vertex processing with vertex shaders is a strictly deterministic process. A set of vertex attributes enters the vertex shader, and a set of post-transform data comes out; the output of this stage depends solely on its inputs. Therefore, if it can be detected that the same vertex attribute inputs have been received before, the vertex processing does not have to be repeated: if the outputs for that input set are still in the cache, the cached data can be used instead. This saves vertex processing work.
In the absolute best case, you never have to process the same vertex more than once.
The key for testing whether a vertex has been processed before is the index of that vertex. Therefore, when doing non-indexed rendering, you have no access to the post transform cache.
If the index for the vertex is in the post transform cache, then that vertex data is not even read from the stream again. It skips the entire read and vertex processing steps, and simply adds another copy of that vertex's post-transform data to the output stream for primitive assembly.
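The lookup described above can be modeled in a few lines. This sketch counts how often the vertex shader would run for a given index list. It assumes a first-in-first-out replacement policy and a cache size measured in vertices; both are simplifying assumptions rather than a description of any particular GPU, and `shader_invocations` is a hypothetical name.

```python
from collections import deque


def shader_invocations(indices, cache_size):
    """Count vertex shader runs for an index list under a simple FIFO cache model.

    The cache is keyed by vertex index. FIFO replacement is an assumption made
    for illustration; real hardware may use a different policy.
    """
    fifo = deque()   # cached indices, oldest on the left
    cached = set()   # same contents as `fifo`, for O(1) lookups
    runs = 0
    for index in indices:
        if index in cached:
            continue  # hit: reuse the stored post-transform outputs
        runs += 1     # miss: fetch the attributes and run the vertex shader
        fifo.append(index)
        cached.add(index)
        if len(fifo) > cache_size:
            cached.discard(fifo.popleft())  # evict the oldest entry
    return runs


# A quad drawn as two indexed triangles that share vertices 1 and 2.
indexed = [0, 1, 2, 2, 1, 3]
# The same quad drawn non-indexed: every corner is a separate vertex, so there
# is no shared index that could be found in the cache.
non_indexed = [0, 1, 2, 3, 4, 5]

print(shader_invocations(indexed, 16))      # 4
print(shader_invocations(non_indexed, 16))  # 6
```

The indexed quad runs the shader 4 times for 6 indices, while the non-indexed version runs it once per index.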
As with any memory buffer, the cache has a maximum size. In the early days of post transform caches, when GPUs used fixed-function pipelines rather than generic vertex attributes, the size of the cache was measured in the number of vertices it could store. Today, since the format of a vertex is generic and variable, the caches are more like traditional memory buffers. Thus the number of vertices that fit in the post transform cache depends on how many outputs your vertex shader writes.
Even so, you can expect hardware with vertex shaders to hold a fairly large number of vertices, on the order of 20 or more.
The size of the post transform cache can affect how you optimize your triangles. A mesh optimized for a cache that holds many vertices may get poor post transform cache behavior on hardware whose cache holds fewer. Some optimization algorithms, however, do not depend on the cache size at all.
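One way to see this effect is to run a single index list through simulated caches of different sizes and compare the average cache miss ratio (ACMR: vertex shader runs per triangle, lower is better). The sketch below assumes FIFO replacement and uses a hypothetical row-by-row grid as the test mesh. Note that FIFO caches can exhibit Belady's anomaly, so misses are not guaranteed to fall monotonically as the cache grows.

```python
from collections import deque


def count_misses(indices, cache_size):
    """Vertex shader runs under an assumed FIFO post transform cache."""
    fifo, cached, misses = deque(), set(), 0
    for index in indices:
        if index not in cached:
            misses += 1
            fifo.append(index)
            cached.add(index)
            if len(fifo) > cache_size:
                cached.discard(fifo.popleft())
    return misses


def grid_row_major(quads_x, quads_y):
    """Triangle list for a grid of quads, emitted one row of quads at a time."""
    verts_x = quads_x + 1
    indices = []
    for y in range(quads_y):
        for x in range(quads_x):
            a = y * verts_x + x   # top-left corner of the quad
            b = a + 1             # top-right
            c = a + verts_x       # bottom-left
            d = c + 1             # bottom-right
            indices += [a, c, b, b, c, d]
    return indices


mesh = grid_row_major(8, 8)   # 81 vertices, 128 triangles
triangles = len(mesh) // 3
for size in (8, 32):
    print(size, count_misses(mesh, size) / triangles)
```

With 32 entries, the previous row of 9 vertices stays cached while the next row is drawn, so each of the 81 vertices is shaded exactly once. With 8 entries the shared row no longer fits, and vertices are shaded repeatedly.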
Using the cache
As long as you do indexed vertex rendering, you will have some chance of using it. However, there are strategies one can employ to maximize the number of cache hits one gets when rendering a mesh. These strategies are primarily for optimizing the vertex index list, though some of them can suggest changes to the vertex attribute data as well.
NVTriStrip is a small library that NVIDIA developed quite a while ago. It takes a list of triangles defining the topology (just the vertex indices) and returns either a single large triangle strip or a set of triangle strips.
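A triangle strip is compact because each triangle after the first reuses the previous two indices: a strip of n triangles needs n + 2 indices, against 3n for a triangle list. To make this concrete, the sketch below expands a strip using OpenGL's `GL_TRIANGLE_STRIP` rule, in which every second triangle swaps its first two vertices to keep a consistent winding. The helper `strip_to_triangles` is hypothetical and not part of NVTriStrip.

```python
def strip_to_triangles(strip):
    """Expand a triangle strip into a list of (a, b, c) triangles.

    Triangle i uses strip[i], strip[i + 1], strip[i + 2]; for odd i the first
    two are swapped, matching GL_TRIANGLE_STRIP winding. Degenerate triangles
    (a repeated index), often used to join strips together, are dropped.
    """
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if a == b or b == c or a == c:
            continue
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles


print(strip_to_triangles([0, 1, 2, 3, 4]))
# [(0, 1, 2), (2, 1, 3), (2, 3, 4)]
```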
The library has some limitations. Notably, it cannot handle meshes in which more than two triangles share the same edge. Although it is an NVIDIA library, it works fine on non-NVIDIA hardware, because its functions take a parameter specifying the size of the post transform cache (in vertices).
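Since NVTriStrip cannot handle an edge shared by more than two triangles, it can be worth checking a mesh before passing it in. The hypothetical helper below (not part of the library) counts how many triangles use each undirected edge and reports the offending edges.

```python
from collections import Counter


def overshared_edges(triangles):
    """Return the undirected edges used by more than two triangles."""
    uses = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            uses[(min(u, v), max(u, v))] += 1  # ignore edge direction
    return sorted(edge for edge, count in uses.items() if count > 2)


# Three triangles fanning out from the edge (0, 1): a non-manifold "fin".
print(overshared_edges([(0, 1, 2), (1, 0, 3), (0, 1, 4)]))  # [(0, 1)]
print(overshared_edges([(0, 1, 2), (2, 1, 3)]))             # []
```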
Tom Forsyth's vertex cache optimization algorithm is more modern. Unlike NVTriStrip, it creates an ordered triangle list, not a strip. Thus, the index data may be larger than for a triangle strip.
Unlike most other algorithms, it does not depend on the size of the cache. It is generally useful, achieving quite good performance with both small and large caches on most arbitrary meshes.
Details can be found here.
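The core idea is a greedy loop: give each vertex a score based on its position in a small simulated LRU cache and on how many not-yet-emitted triangles still use it, then repeatedly emit the triangle whose vertices score highest. The sketch below follows that structure, with scoring constants as commonly quoted from Forsyth's description; treat them as tunable assumptions. It is deliberately simplified: it rescans every remaining triangle on each step, so it runs in quadratic time, unlike the linear-speed original. `forsyth_order` and `vertex_score` are hypothetical names.

```python
# Scoring constants as commonly quoted from Forsyth's write-up (assumptions).
CACHE_DECAY_POWER = 1.5
LAST_TRI_SCORE = 0.75
VALENCE_BOOST_SCALE = 2.0
VALENCE_BOOST_POWER = 0.5
SIM_CACHE_SIZE = 32  # simulated LRU used only for scoring, not the GPU's size


def vertex_score(cache_pos, tris_left):
    """Score a vertex from its LRU position (-1 if absent) and remaining uses."""
    if tris_left == 0:
        return -1.0                   # nothing left to draw with this vertex
    score = 0.0
    if 0 <= cache_pos < 3:
        score = LAST_TRI_SCORE        # used by the most recent triangle
    elif cache_pos >= 3:
        fade = 1.0 - (cache_pos - 3) / (SIM_CACHE_SIZE - 3)
        score = fade ** CACHE_DECAY_POWER
    # Favor vertices with few triangles left, so they can leave the cache sooner.
    score += VALENCE_BOOST_SCALE * tris_left ** -VALENCE_BOOST_POWER
    return score


def forsyth_order(triangles):
    """Greedy, score-based reordering of a triangle list (simplified sketch)."""
    tris_left = {}
    for tri in triangles:
        for v in tri:
            tris_left[v] = tris_left.get(v, 0) + 1
    pending = list(range(len(triangles)))
    lru = []                          # simulated cache, most recent first
    order = []

    def tri_score(t):
        return sum(vertex_score(lru.index(v) if v in lru else -1, tris_left[v])
                   for v in triangles[t])

    while pending:
        best = max(pending, key=tri_score)   # first maximum wins ties
        pending.remove(best)
        tri = triangles[best]
        order.append(tri)
        for v in tri:
            tris_left[v] -= 1
        # Move this triangle's vertices to the front and trim the LRU.
        lru = list(tri) + [v for v in lru if v not in tri]
        del lru[SIM_CACHE_SIZE:]
    return order


print(forsyth_order([(0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5), (6, 7, 8)]))
```

Measuring the ACMR of a mesh with a cache simulator before and after reordering is a simple way to check whether the result helps for that mesh.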
A regular grid is a rectangular array of vertices, with triangles connecting adjacent vertices. Note that this describes only the topology; the actual positions of the vertices can be anywhere.
Optimizing a regular grid for a vertex cache is somewhat easier than optimizing an arbitrary mesh. Details for an algorithm to do this are found here.
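As a hedged illustration of why grids are easier, the sketch below uses one common approach, not necessarily the referenced algorithm: split the grid into vertical bands narrow enough that two rows of band vertices fit in the cache, then walk each band row by row. It assumes a FIFO cache of 16 vertices, and the band-width rule of thumb and helper names are hypothetical.

```python
from collections import deque


def grid_indices(quads_x, quads_y, band_width):
    """Triangle list for a quads_x by quads_y grid, emitted in vertical bands.

    Inside each band, quads are emitted row by row. With band_width >= quads_x
    this is plain row-major order.
    """
    verts_x = quads_x + 1
    indices = []
    for x0 in range(0, quads_x, band_width):
        for y in range(quads_y):
            for x in range(x0, min(x0 + band_width, quads_x)):
                a = y * verts_x + x   # top-left corner of the quad
                b = a + 1             # top-right
                c = a + verts_x       # bottom-left
                d = c + 1             # bottom-right
                indices += [a, c, b, b, c, d]
    return indices


def fifo_misses(indices, cache_size):
    """Vertex shader runs under an assumed FIFO post transform cache."""
    fifo, cached, misses = deque(), set(), 0
    for index in indices:
        if index not in cached:
            misses += 1
            fifo.append(index)
            cached.add(index)
            if len(fifo) > cache_size:
                cached.discard(fifo.popleft())
    return misses


CACHE = 16
# Rule of thumb (assumption): two rows of band vertices should fit in the
# cache, with a little slack: 2 * (6 + 1) = 14 <= 16.
band = CACHE // 2 - 2
row_major = grid_indices(64, 64, 64)
banded = grid_indices(64, 64, band)
print(fifo_misses(row_major, CACHE), fifo_misses(banded, CACHE))
```

In row-major order each row of 65 vertices is evicted before the next row of quads reuses it, so almost every vertex is shaded twice. In the banded order, vertices inside a band are shaded once; only the columns on band boundaries are shaded twice, once per band.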