I have to point out that this two-step method is unusable on AMD cards, because glCopyBufferSubData is very slow in their current OpenGL implementation.

The best way to deal with read-backs, and with buffers in general, on AMD is to use the AMD_pinned_memory extension.
http://www.opengl.org/registry/specs...ned_memory.txt
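
To make the idea a bit more concrete, here is a rough sketch of how a pinned read-back buffer could be set up. Treat it as an illustration under assumptions, not a tested recipe: the helper names (round_up_to_page, alloc_page_aligned, create_pinned_readback_buffer), the PINNED_EXAMPLE_WITH_GL switch, the 4096-byte alignment, and the GL_STREAM_READ usage hint are my own choices, and GLEW is assumed only as a convenient loader. The OpenGL part is compiled only when PINNED_EXAMPLE_WITH_GL is defined, since it needs a live context on a driver that exposes GL_AMD_pinned_memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 4096-byte pages; query the real page size in production code. */
#define PINNED_ALIGNMENT 4096u

/* Round a byte count up to the next multiple of PINNED_ALIGNMENT. */
static size_t round_up_to_page(size_t bytes)
{
    return (bytes + PINNED_ALIGNMENT - 1u) & ~(size_t)(PINNED_ALIGNMENT - 1u);
}

/* Allocate page-aligned client memory (C11 aligned_alloc).
 * Returns NULL for a zero-sized request or on allocation failure.
 * On success, *out_size (if non-NULL) receives the padded size. */
static void *alloc_page_aligned(size_t bytes, size_t *out_size)
{
    if (bytes == 0)
        return NULL;
    size_t padded = round_up_to_page(bytes);
    assert(padded % PINNED_ALIGNMENT == 0);
    void *mem = aligned_alloc(PINNED_ALIGNMENT, padded);
    if (mem != NULL && out_size != NULL)
        *out_size = padded;
    return mem;
}

#ifdef PINNED_EXAMPLE_WITH_GL
#include <GL/glew.h> /* assumption: any loader that exposes GL 1.5+ entry points */

#ifndef GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD
#define GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD 0x9160
#endif

/* Wrap caller-owned, page-aligned memory in a buffer object.
 * The memory must stay allocated for as long as the buffer exists. */
static GLuint create_pinned_readback_buffer(void *mem, size_t size)
{
    GLuint buf = 0;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, buf);
    /* With this target the driver uses 'mem' as the buffer storage
     * instead of copying from it. */
    glBufferData(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD,
                 (GLsizeiptr)size, mem, GL_STREAM_READ);
    glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, 0);
    return buf;
}
#endif /* PINNED_EXAMPLE_WITH_GL */
```

After that, the buffer can be bound to GL_PIXEL_PACK_BUFFER and used as the destination of glReadPixels. Once a fence (glFenceSync followed by glClientWaitSync) signals that the transfer has finished, the pixels can be read straight from the application's own pointer, without glMapBuffer and without an extra glCopyBufferSubData step.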