How to work efficiently with the GPU in OpenGL?

I have learned that for a CPU it is best to pack data into fewer, bigger blocks of memory, which are then operated on by continuous, repetitive routines that change as rarely as possible. Ideally, you should put all the data which has to be processed by the same routine into one contiguous buffer, work through it, and then start on another buffer which requires a different routine. This works well because the CPU is designed with a few powerful cores, which can tackle any task quickly but usually do not have to deal with many different kinds of data at once.
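To make this concrete, here is roughly what I mean on the CPU side (a toy sketch; Particle, integrate, and fade are just made-up names for illustration):

```cpp
#include <vector>

struct Particle { float x, y, vx, vy; };

// One routine works through one contiguous buffer: the CPU streams through
// memory sequentially and the branch predictor sees the same code path for
// the whole batch.
void integrate(std::vector<Particle>& particles, float dt)
{
    for (Particle& p : particles) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
}

// Only after the first buffer is done does a different routine start on a
// different contiguous buffer.
void fade(std::vector<float>& alphas, float rate)
{
    for (float& a : alphas)
        a *= rate;
}
```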

But since a GPU is designed with hundreds of small cores, should data rather be split up as much as possible? What paths could I take to get the best out of the graphics card?

Generally true. Memory fetches which miss the cache and conditional branch mispredictions have a cost, and this helps to minimize that cost.

This works well because the CPU is designed with a few powerful cores, which can tackle any task quickly but usually do not have to deal with many different kinds of data at once.

Not sure where you got that, but this statement is so general as to be pretty much useless.

But since a GPU is designed with hundreds of small cores, should data rather be split up as much as possible? What paths could I take to get the best out of the graphics card?

If you're talking about GPU use via OpenGL (since you're posting on the GL forums), the GPU driver generally handles the splitting and multitasking fairly well internally on its own. From your perspective, it's a good idea to provide data to the GPU in a manner not unlike the way you'd provide it to the CPU (e.g. "pack data into fewer, bigger blocks of memory", "put all the data which has to be processed by the same routine into one contiguous buffer, work through it, and start on another buffer", etc.). And when you're doing memory fetches inside shaders, follow some of the same strategies you'd follow on the CPU: prefer sequential accesses to random accesses all over memory, and if you're going to be doing random fetches across a region of memory, use a cache and techniques that reduce the amount of data fetched (e.g. MIPmaps, texture cache, …).
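For example, here is a minimal sketch of "fewer, bigger blocks" applied to OpenGL buffer objects. The MeshRange struct, the two meshes, and totalBytes are placeholders, and vertex-format setup is omitted:

```cpp
// Placeholder description of where one mesh lives inside the shared buffer.
struct MeshRange {
    GLintptr    offsetBytes;   // byte offset of this mesh's vertex data
    GLsizeiptr  sizeBytes;     // byte size of this mesh's vertex data
    GLint       firstVertex;   // first vertex for glDrawArrays
    GLsizei     vertexCount;   // number of vertices to draw
    const void* vertices;      // CPU-side source data
};

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);

// Allocate one big block up front...
glBufferData(GL_ARRAY_BUFFER, totalBytes, nullptr, GL_STATIC_DRAW);

// ...then fill it with each mesh's vertices at its own offset.
glBufferSubData(GL_ARRAY_BUFFER, meshA.offsetBytes, meshA.sizeBytes, meshA.vertices);
glBufferSubData(GL_ARRAY_BUFFER, meshB.offsetBytes, meshB.sizeBytes, meshB.vertices);

// Draw each mesh as a range of the shared buffer; the driver decides how to
// distribute the work across the GPU's cores.
glDrawArrays(GL_TRIANGLES, meshA.firstVertex, meshA.vertexCount);
glDrawArrays(GL_TRIANGLES, meshB.firstVertex, meshB.vertexCount);
```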

The exception, where you do need to start thinking more specifically about how your problem maps onto all those GPU compute cores, is when you're writing compute shaders or very specialized GPU shaders.
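If you do go down that road, here's a minimal sketch (assuming GL 4.3+; dataBuffer and elementCount are placeholders, and compile/link error checking is omitted) of deciding yourself how elements map onto invocations:

```cpp
// Each invocation handles exactly one element of the buffer.
const char* src = R"(
    #version 430
    layout(local_size_x = 256) in;   // 256 invocations per work group
    layout(std430, binding = 0) buffer Data { float values[]; };

    void main() {
        uint i = gl_GlobalInvocationID.x;   // this invocation's element
        values[i] = values[i] * 2.0;        // one small task per core
    }
)";

GLuint shader = glCreateShader(GL_COMPUTE_SHADER);
glShaderSource(shader, 1, &src, nullptr);
glCompileShader(shader);

GLuint program = glCreateProgram();
glAttachShader(program, shader);
glLinkProgram(program);

glUseProgram(program);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, dataBuffer);  // existing SSBO
glDispatchCompute(elementCount / 256, 1, 1);                // assumes elementCount % 256 == 0
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);             // make writes visible to later reads
```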

Haha, I can imagine that. I watched a YouTube video on GPUs, and that's how they described CPUs in order to compare them to GPUs.

Thanks for the information. So the best way to provide the GPU with data is to pile it up into as few big batches as possible and flush it all at once, so that the driver can immediately distribute the data accordingly?

Pretty much. And ensure all your communication is one-way (CPU->GPU, not GPU->CPU). In fact, in some cases it’s a really good idea to tell the GPU how to generate work for itself (GPU->GPU), just to keep the CPU out of it.
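For the GPU->GPU case, indirect drawing is one way to do it: a compute shader (not shown here) writes the draw parameters into a buffer, and the draw call reads them straight from GPU memory, so the CPU never touches the counts. A rough sketch, assuming GL 4.x:

```cpp
// This layout is defined by the GL spec for glDrawArraysIndirect.
struct DrawArraysIndirectCommand {
    GLuint count, instanceCount, first, baseInstance;
};

GLuint indirectBuffer;
glGenBuffers(1, &indirectBuffer);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(DrawArraysIndirectCommand),
             nullptr, GL_DYNAMIC_DRAW);

// ... dispatch a compute shader that fills the command buffer here ...
glMemoryBarrier(GL_COMMAND_BARRIER_BIT);   // make its writes visible to the draw

// The draw call reads its parameters from GPU memory; no CPU readback.
glDrawArraysIndirect(GL_TRIANGLES, nullptr);   // offset 0 into the bound buffer
```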

…and flush it all at once…

Generally, let the API flush it through the driver to the GPU for you, except in cases where you know you need to give it a push.

There are exceptions to this, but they are few.
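When you do need to give it a push, a fence sync is usually the right tool; a sketch (glFlush() being the blunter alternative):

```cpp
// Insert a fence after the commands you care about.
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// GL_SYNC_FLUSH_COMMANDS_BIT flushes the command stream so the fence can
// actually be reached; a timeout of 0 just checks, it doesn't block.
GLenum state = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 0);

if (state == GL_ALREADY_SIGNALED || state == GL_CONDITION_SATISFIED) {
    // The GPU has finished everything up to the fence.
}
glDeleteSync(fence);

// The heavy-handed options: glFlush() hands everything queued so far to the
// GPU without waiting; glFinish() blocks until it has all completed.
glFlush();
```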