Loading textures without GPU clocks?

I am looking for a method for loading resources asynchronously. Many posts direct me to this page: OpenGL PBO. For the CPU, this is a DMA method, since it does not take CPU clocks. But the author did not clarify the situation on the GPU. Does the GPU itself need to copy the data from main memory to video memory, or does the GPU have its own DMA controller, so that the GPU can also be free of these loading operations?
If the GPU does have its own DMA controller, I think this is a perfect method. Is there a better method?

Thanks in advance!

What you are asking for is not possible on most graphics cards.

Downloads and uploads still involve a GPU context switch and cannot be done in parallel with GPU processing or drawing, except on NVIDIA Quadro cards. The technology is known as “dual copy engine”.

I think it is possible on the Fermi architecture generally, but for some “unknown” reason it is not enabled in the drivers.

PBO is one method if you want to do it asynchronously. How it is implemented doesn’t concern OpenGL.

Another method is to use another GL context.
Some people create another window in another thread and set up the same pixel format as the first window. Then they share GL resources between the two contexts and use the second GL context to upload textures and other resources, which then become available to the first GL context.

Does the GPU itself need to copy the data from main memory to video memory, or does the GPU have its own DMA controller, so that the GPU can also be free of these loading operations?

What does it matter? If it does need the GPU, then there’s nothing you can do to change or get around that.

It does matter, since loading could then be done in parallel with rendering, but … it is not possible/allowed on most graphics cards now.

PBO enables asynchronous data transfer from “driver memory” to the graphics card. If the CPU has something better to do in the meantime, PBO can save some time for that. But it doesn’t save GPU time. It is “CPU asynchronous”, but not “GPU asynchronous”.

A programmer doesn’t have to know the actual implementation, but it does concern the OpenGL implementation (i.e. the drivers’ implementation).

A shared context group is a possible method to isolate loading in a separate thread and upload immediately when the resource is ready in main memory, but it requires very fine synchronization using sync objects. It requires GL 3.2+ and is not very easy for beginners. Since two or more contexts cannot execute GL simultaneously, it is better to prepare the resource in a separate thread but still upload to GL in the single (drawing) thread. In this way the synchronization is much easier, and the main thread has control over the amount of data being uploaded (and can correlate it with the time needed for the drawing itself).
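
To illustrate the "prepare in a separate thread, upload in the drawing thread" idea, here is a minimal C++ sketch. It is only an assumption of how such a scheme might look: the names PendingTexture, UploadQueue, drain_uploads and run_demo are hypothetical, the byte budget and texture sizes are made up, and the actual GL upload is replaced by a callback so the example runs without a GL context.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical decoded resource, ready to be handed to the drawing thread.
struct PendingTexture {
    int id;
    std::vector<std::uint8_t> pixels;
};

// Mutex-protected FIFO shared by the loader thread and the drawing thread.
class UploadQueue {
public:
    void push(PendingTexture tex) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(tex));
    }

    // Pops the front item if it fits into max_bytes, or unconditionally when
    // ignore_limit is set. Returns false if nothing was popped.
    bool try_pop(std::size_t max_bytes, bool ignore_limit, PendingTexture& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        if (!ignore_limit && items_.front().pixels.size() > max_bytes) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<PendingTexture> items_;
};

// Called once per frame on the drawing thread. Uploads queued items until the
// byte budget is used up and returns the number of bytes uploaded. The first
// item of a frame is always accepted, so an item larger than the whole budget
// cannot block the queue forever.
std::size_t drain_uploads(UploadQueue& queue, std::size_t budget_bytes,
                          const std::function<void(const PendingTexture&)>& upload) {
    std::size_t used = 0;
    PendingTexture tex{};
    while (used < budget_bytes &&
           queue.try_pop(budget_bytes - used, used == 0, tex)) {
        upload(tex);
        used += tex.pixels.size();
    }
    return used;
}

struct DemoResult {
    std::size_t total_bytes;
    std::size_t max_frame_bytes;
    std::vector<int> upload_order;
};

// Demo: a loader thread "decodes" 5 textures of 400 bytes each while the
// drawing loop uploads at most 1000 bytes per simulated frame.
DemoResult run_demo() {
    const int kTextureCount = 5;
    const std::size_t kTextureBytes = 400;
    const std::size_t kFrameBudget = 1000;

    UploadQueue queue;
    std::thread loader([&] {
        for (int i = 0; i < kTextureCount; ++i)
            queue.push(PendingTexture{i, std::vector<std::uint8_t>(kTextureBytes)});
    });

    DemoResult result{0, 0, {}};
    auto upload = [&result](const PendingTexture& tex) {
        // A real renderer would issue its GL upload call here
        // (for example glTexSubImage2D); this stub only records the order.
        result.upload_order.push_back(tex.id);
    };

    while (result.upload_order.size() < static_cast<std::size_t>(kTextureCount)) {
        const std::size_t used = drain_uploads(queue, kFrameBudget, upload);
        result.total_bytes += used;
        result.max_frame_bytes = std::max(result.max_frame_bytes, used);
        std::this_thread::yield();  // stands in for drawing the frame
    }
    loader.join();
    return result;
}
```

Calling run_demo() from main() runs the loader and the simulated frame loop together. Only the drawing thread ever touches the upload callback, so no GL-side synchronization between contexts is needed; the only shared state is the queue.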

It seems that this “dual copy engine” has been disabled in the OpenGL driver for GeForce cards, but it is available for CUDA. I will try to use CUDA to transfer data. This feature is useful when rendering massive data.

Thanks to all of you!

May I ask you how?

Sorry, Aleksandar, actually, I do not know how.

I read this idea here: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=285231

“In CUDA, parallel transfer and kernel execution is possible http://outerra.com/images/cuda_transfers.png (red is for kernel and green/grey for download). The transfer can be upload or download, it doesn’t matter.” This is from l_hrabcak.

I have not tried CUDA yet, and I am kind of busy these days. I will try later.

Thanks for your help.