Very interesting read:

Good to finally get high-performance asynchronous data streaming to and from device memory.

But the thing that annoys me is the note on page 14: "Having two separate threads running on a Quadro graphics card with the consumer NVIDIA® Fermi architecture or running on older generations of graphics cards the data transfers will be serialized resulting in a drop in performance."

Why the hell not enable it for consumer products (if, and I may be wrong here, the hardware feature is present on all high-end Fermi chips)? Texture streaming is extremely important there too. I work in scientific visualization, and while we do have access to Quadro boards, we cannot afford one of these cards for every workstation where we develop and demonstrate large volume and image rendering software. The Fermi Quadro boards are currently extremely expensive, so access to them is almost impossible for us.

Data transport to the GPU is almost always the main bottleneck for us, so the decision to cut this feature (next to quad buffer stereo) is very sad. And I can imagine that D3D, at least for some games, will make use of the extra copy engines... So D3D gets stereo rendering (OK, I know, no QBS, but still) and the other cool GPU features.

Sorry, but I get mad at such decisions.