I have learned that for a CPU it is best to pack data into fewer, bigger blocks of memory, which are then operated on by continuous, repetitive routines that change as rarely as possible. Ideally, you put all the data that has to be processed by the same routine into one contiguous buffer, work through it, and then start on another buffer that requires a different routine. This works well because the CPU is designed with a few powerful cores, each capable of tackling any task quickly, but they usually do not have to deal with that many different pieces of data at once.
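
To illustrate what I mean, here is a minimal C++ sketch (the names `Particles` and `integrate` are just made up for the example): one contiguous buffer, swept by a single routine from start to finish before any other routine runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: one "system" owns one contiguous buffer.
struct Particles {
    std::vector<float> x, y;    // positions, one contiguous array per component
    std::vector<float> vx, vy;  // velocities
};

// One routine sweeps the whole buffer before anything else runs,
// so the instruction stream and the memory access pattern stay stable.
void integrate(Particles& p, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
    }
}

int main() {
    Particles p;
    p.x.assign(10000, 0.0f);  p.y.assign(10000, 0.0f);
    p.vx.assign(10000, 1.0f); p.vy.assign(10000, 0.5f);

    integrate(p, 0.016f);  // finish this buffer entirely...
    // ...then move on to the next buffer / routine (collision, rendering prep, etc.)
    return 0;
}
```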

But since a GPU is designed with hundreds of small cores, should the data instead be split up as much as possible? What paths could I take to get the best out of the graphics card?