They don't use Virtual Memory and paging because that would make them SLOW.
A common misconception. Virtual memory wouldn't make them slow. Indeed, some (Carmack, but I don't like appeals to authority) suggest that it would make them faster. And it's not an unreasonable suggestion.
If you're not using the top mip level of a texture, it's wasting space while it sits on the GPU. A virtual memory system would allow the GPU to hold only the texture data that is actually needed. Which means any given texture is now more likely to be resident. Which means you can use more textures without a significant performance penalty.
The upload would be somewhat slow, and the GPU would have to wait for it to complete before continuing, but it wouldn't be so bad. Certainly compared to the alternatives.
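To make that concrete, here's a minimal sketch of the idea in C. It's purely hypothetical, not any real driver interface: `upload_mip_to_gpu` stands in for whatever DMA path the hardware would actually use, and each mip level is treated as a page with a residency bit that gets faulted in on first use.

```c
/* Hypothetical sketch of per-mip demand paging; not any real driver API. */
#include <stdbool.h>
#include <stddef.h>

#define MAX_MIPS 16

typedef struct {
    bool   resident[MAX_MIPS];  /* per-mip residency bits, like a page table */
    size_t mip_bytes[MAX_MIPS]; /* size of each level's data */
    int    mip_count;
} Texture;

/* Stub standing in for the actual upload; assumed, not a real API. */
static void upload_mip_to_gpu(Texture *t, int mip)
{
    (void)t; (void)mip; /* real hardware would DMA mip_bytes[mip] here */
}

/* Called when the sampler first touches a given mip level. Only levels
 * that are actually sampled ever occupy GPU memory, so an unused top
 * mip costs nothing. The fault path stalls, as noted above, but only
 * on first use. */
static void touch_mip(Texture *t, int mip)
{
    if (!t->resident[mip]) {        /* the "page fault" */
        upload_mip_to_gpu(t, mip);  /* GPU waits for this to complete */
        t->resident[mip] = true;
    }
}

int main(void)
{
    Texture t = { .mip_count = 11 };
    touch_mip(&t, 3);  /* first use: faults the level in, stalls once */
    touch_mip(&t, 3);  /* already resident: effectively free */
    return 0;
}
```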
What we have now is a thrashing system. If you consistently try to use more textures in a frame than you have memory for, you thrash a whole bunch, hurting performance. Plus, the CPU (driver) has to get involved, so you're hurting CPU performance too.
Now, the current plain memory approach is faster for the best case: all textures in use are resident. But if you dare use more textures than you have memory for, you're immediately screwed far more than virtual memory would hurt you.
So, in effect, you're looking at a situation something like this:
Plain memory, no thrash: 100%.
Cache memory, no thrash: 99.9% (the VM cost when not paging in is virtually negligible).
Plain memory, thrash: 25%.
Cache memory, thrash: 75% (bringing in less data and not involving the CPU. Much.).
And do note that, because the cache case stores less data per texture (only what is needed), it's going to thrash much later (in terms of number of textures) than the plain memory case.
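The footprint arithmetic backs that up. Here's a quick back-of-the-envelope sketch (the 1024x1024 RGBA8 texture is just an illustrative choice, not a number from anything above): each mip level is a quarter the size of the one above it, so a full chain costs about 4/3 of the base level, and paging out just one unused top level leaves only a quarter of the footprint resident.

```c
#include <stdio.h>

int main(void)
{
    /* 1024x1024 RGBA8: 4 bytes per texel. Illustrative numbers only. */
    double base = 1024.0 * 1024.0 * 4.0;

    /* Each mip is 1/4 the size of the one above, so the whole chain
     * sums to roughly base * (1 + 1/4 + 1/16 + ...) = base * 4/3. */
    double full_chain = base * 4.0 / 3.0;         /* mips 0..n resident */
    double from_mip1  = (base / 4.0) * 4.0 / 3.0; /* mips 1..n resident */

    printf("full chain resident: %.2f MiB\n", full_chain / (1024 * 1024));
    printf("top mip paged out:   %.2f MiB\n", from_mip1 / (1024 * 1024));
    /* ~5.33 MiB vs ~1.33 MiB: skipping one unused level leaves a
     * quarter of the footprint, so thrashing starts much later. */
    return 0;
}
```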
The virtual memory approach gives you, the developer, the right to break the cardinal rule of GPU development: never use more textures than you have memory for.
dual core? - how many "cores" are on today's GPUs?
1
To be fair, all the parallelism and so forth in a GPU doesn't stop it from still rendering one triangle at a time. And there's no concurrency, because GPUs are specifically designed to make race conditions impossible. That's the primary reason you can't read from a pixel.
A true core must be a completely independent processing unit. And GPUs only have one independent processing unit: the GPU. It may have many different pieces, some in parallel with one another. But that's not much different from the pipelines in modern CPUs.
While I agree with your general idea that "mlb" is being rather unfair in his exaggerated portrayal of the graphics card industry, you're simply doing the same thing, but from the other direction.