
Originally Posted by aqnuep
Alfonse is perfectly right. The internal swizzling/tiling used by the hardware is something you should not have to care about. It can change from vendor to vendor and from GPU generation to GPU generation. Even if there were a meaningful way to expose this layout to the application, the number of different layouts an application might have to handle would be impractical to tackle.
Also, even if the application knew the swizzling, uploads into these swizzled structures would be non-trivial, so they would not even reach the best-case scenario of a pure CPU memcpy. GPUs, on the other hand, have DMA engines or other ways to copy directly from linear to swizzled memory at full speed without using any CPU power, so a memcpy to a PBO plus a hardware upload is almost guaranteed to be faster despite the intermediate copy.
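To make the "non-trivial" part concrete, here is a rough sketch of what a CPU-side swizzled upload could look like next to a plain linear copy. The layout is purely an assumption for illustration: it uses Morton (Z-order) addressing on a square, power-of-two texture of 32-bit texels. No vendor is claimed to use exactly this, and the names `spread_bits`, `morton_index`, `upload_linear` and `upload_swizzled` are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Spread the low 16 bits of v so that bit i ends up at bit 2*i. */
static uint32_t spread_bits(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* ASSUMED layout: texel (x, y) is stored at its Morton (Z-order) index.
 * Real hardware layouts differ and are not exposed by the API. */
static uint32_t morton_index(uint32_t x, uint32_t y)
{
    return spread_bits(x) | (spread_bits(y) << 1);
}

/* Linear destination: the whole image is one contiguous copy. */
static void upload_linear(uint32_t *dst, const uint32_t *src, uint32_t size)
{
    memcpy(dst, src, (size_t)size * size * sizeof *src);
}

/* Swizzled destination: every texel needs its own address computation
 * and a scattered write. size must be a power of two (at most 65536). */
static void upload_swizzled(uint32_t *dst, const uint32_t *src, uint32_t size)
{
    for (uint32_t y = 0; y < size; ++y)
        for (uint32_t x = 0; x < size; ++x)
            dst[morton_index(x, y)] = src[(size_t)y * size + x];
}
```

The linear path is a single memcpy, while the swizzled path does per-texel index math and scattered stores, and it would need a different variant for every layout the application had to support. That is the work a PBO upload leaves to the GPU's copy hardware.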