Saving vertex data on the GPU and returning it to the CPU when needed.

My idea is this: when a model is created, its vertices are uploaded to the GPU in a VBO, the CPU-side copies are deleted, and only an index to each vertex is remembered. When you need to change the model, you read the specific value back from the GPU, edit it, and write it again. Is this system efficient? Instead of keeping duplicates on both the CPU and the GPU, I would keep the data only on the GPU and edit it from the CPU whenever needed; and if the communication between them became constant, I would cache the data on the CPU so I don't have to fetch it back on every update. If this is a good idea, which functions return vertices from a VBO? BTW, I'm using OpenGL ES 2.0 on Android.
[ATTACH=CONFIG]1853[/ATTACH]

Can someone help me? I'm pretty sure this isn't something new, but I can't find anything on the internet related to this question. The attachment above shows my system visually.
The basic question is: is my system efficient, or is the data flow between CPU and GPU more expensive than keeping duplicates on both? If it is more expensive, would optimizing the flow by caching vertices on the CPU whenever they're going to be updated multiple times in a row fix the problem?

glGetBufferSubData() or glMapBuffer() (etc)

You're out of luck. ES 2.0 buffer objects are effectively write-only: glGetBufferSubData doesn't exist in ES, and glMapBuffer is only available via the OES_mapbuffer extension, which maps for writing only (GL_WRITE_ONLY_OES).

No. Especially not on mobile GPUs, such as the ones in Android devices.

So my idea is when a model is created its vertices are uploaded to the GPU in a VBO, the CPU-side vertices are deleted, and an index to every vertex is remembered, …
when you need to make changes to a model you read a specific value back from the GPU and edit it.
Is my system efficient? …
I'm using OpenGL ES 2.0 on Android.

Your results are going to vary based on the GL driver and your use case, but in general no, this is not efficient.

The GPU pipeline is optimized for data to flow one direction: from the CPU to the GPU. Feeding data back from the GPU/driver to the CPU is likely to cause stalls.

However, you can get these stalls uploading from the CPU to the GPU too, especially on mobile GPUs! You need to be extremely careful about when the GPU or driver may still be using buffer objects or textures that you've provided it. If, for instance, you attempt to change the contents of a buffer object from the CPU (e.g. via glMapBuffer or glBufferSubData) while some previously issued GPU command that reads from that buffer object is still in the pipeline (has not fully completed), it is not uncommon for the OpenGL ES driver to block/suspend your CPU thread right there (aka stall) until the GPU and driver are finished with every pending command that references that buffer object. Performance-wise, that hurts! Using buffer objects efficiently on mobile for dynamic data updated from the CPU is tricky.

I would recommend you first implement your batch rendering using client arrays only. Client arrays are much easier to use efficiently on mobile. Benchmark any migration to buffer objects against this client-arrays implementation; if it isn't faster, stay with client arrays until you figure out how to make the buffer-object implementation faster.
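For reference, a client-arrays draw keeps the vertex data in ordinary CPU memory and hands GL the pointer at draw time, so editing a vertex is just a plain memory write. A minimal sketch of the layout follows; the GL calls themselves appear only as comments, since they need a live GLES2 context, and the attribute locations 0 and 1 are assumptions for illustration:

```c
#include <stddef.h>

/* Interleaved vertex: position (x,y,z) + texcoord (u,v), all floats.
 * With client arrays this buffer simply lives on the CPU; no VBO is bound. */
typedef struct { float x, y, z, u, v; } Vertex;

enum { VERTEX_COUNT = 3 };

static Vertex tri[VERTEX_COUNT] = {
    {  0.0f,  1.0f, 0.0f, 0.5f, 1.0f },
    { -1.0f, -1.0f, 0.0f, 0.0f, 0.0f },
    {  1.0f, -1.0f, 0.0f, 1.0f, 0.0f },
};

/* Per-frame draw with client arrays (real GLES2 calls shown as comments,
 * with assumed attribute locations 0 = position, 1 = texcoord):
 *
 *   glBindBuffer(GL_ARRAY_BUFFER, 0);          // no VBO bound: client memory
 *   glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE,
 *                         sizeof(Vertex), &tri[0].x);
 *   glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE,
 *                         sizeof(Vertex), &tri[0].u);
 *   glDrawArrays(GL_TRIANGLES, 0, VERTEX_COUNT);
 *
 * Editing a vertex is just a CPU write; the next draw picks it up. */
static void move_vertex(int i, float dx, float dy) {
    tri[i].x += dx;
    tri[i].y += dy;
}
```

Note that the driver copies the client data at draw time, which is exactly why there is no synchronization problem to manage on your side.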

A logical first content type to transition to buffer objects is static content: content that you upload once from the CPU and then never change again for the rest of the run of your application process. This avoids the problems I'm talking about, and is somewhat of a no-brainer on mobile unless you have "a lot" of static data (relative to the amount of available GPU memory, that is). Still, verify the performance of this against just using client arrays for the static content.
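A sketch of that upload-once pattern: after glBufferData with the GL_STATIC_DRAW hint, the caller can free its CPU copy and keep only the buffer name and vertex count needed for later draws. The actual GL calls are left as comments because they require a context, and StaticMesh / upload_static are illustrative names, not GL API:

```c
/* Handle kept after a one-time static upload; the CPU vertex copy can be
 * freed once upload_static() returns. */
typedef struct {
    unsigned int vbo;    /* GL buffer object name (GLuint in real code) */
    int          count;  /* vertex count, retained for glDrawArrays */
} StaticMesh;

static StaticMesh upload_static(const float *verts, int count,
                                int floats_per_vertex) {
    StaticMesh m;
    m.count = count;
    m.vbo   = 0;  /* placeholder; with a live context this would be:
        glGenBuffers(1, &m.vbo);
        glBindBuffer(GL_ARRAY_BUFFER, m.vbo);
        glBufferData(GL_ARRAY_BUFFER,
                     (size_t)count * floats_per_vertex * sizeof(float),
                     verts, GL_STATIC_DRAW);   // hint: upload once, draw many
    */
    (void)verts; (void)floats_per_vertex;
    return m;
}
```

After this returns, free(verts) on the caller's side realizes the "no CPU duplicate" goal from the original question, but only because the data never changes again.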

Then if you decide to try your hand at using buffer objects for dynamic content (the content that changes at runtime, like you’re talking about), I would recommend you avoid updating the contents of any buffer object from the CPU until at least 2-3 frames after you issued the last GL command that instructed the GPU to read from that buffer object. And don’t read back the contents of buffer objects or textures if you’re trying to maximize performance! This is a pipelined system, and you want to keep the CPU working ahead of the GPU and the driver.
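One way to enforce that 2-3 frame rule is a little bookkeeping per buffer object: record the frame on which you last issued a draw reading the buffer, and only allow a CPU-side update once enough frames have passed. A sketch, assuming a 3-frame gap covers your driver's pipeline depth (that exact number is an assumption; measure it):

```c
/* Minimum frames that must elapse after the last GPU read of a buffer
 * before the CPU touches it again (assumed pipeline depth; tune this). */
enum { SAFE_FRAME_GAP = 3 };

typedef struct {
    long last_draw_frame;  /* frame of the most recent draw using this VBO */
} BufferUse;

/* Call when issuing a draw that reads the buffer. */
static void mark_drawn(BufferUse *b, long current_frame) {
    b->last_draw_frame = current_frame;
}

/* True once the GPU has (almost certainly) finished reading the buffer,
 * so glBufferSubData/glMapBuffer on it should not stall. */
static int safe_to_update(const BufferUse *b, long current_frame) {
    return current_frame - b->last_draw_frame >= SAFE_FRAME_GAP;
}
```

If safe_to_update() says no, either skip the update this frame or write into a different buffer in a ring (see below in the thread).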

Again, compare the performance of using buffer objects here against client arrays, paying particular attention to frames where you update buffer-object content from the CPU using, e.g., glBufferSubData or glMapBuffer. Use a profiler to ensure that you don't trigger pipeline stalls or flushes. Using buffer objects efficiently for dynamic content on mobile is definitely tricky. That said, you can do it by round-robining across a set of 2-3 buffer objects, or by using an unsynchronized glMapBufferRange (which you may not have on ES 2.0) with some tricky fence magic to block only on completion of the vertex-transform work rather than the entire pipeline (cf. GL_SYNC_GPU_COMMANDS_COMPLETE in glFenceSync). But this requires some careful thought, and in the latter case some extension support, since you're on GLES2.
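The round-robin idea can be sketched as a small ring of buffer names: each frame's CPU upload goes into the buffer least recently handed to the GPU, so a CPU write never touches a buffer still in flight. The ring size of 3 is an assumption (it covers a pipeline roughly two frames deep); size it to your measured pipeline depth:

```c
/* Ring of pre-created buffer objects for per-frame dynamic uploads. */
enum { RING_SIZE = 3 };

typedef struct {
    unsigned int vbo[RING_SIZE]; /* GL buffer names, glGenBuffers'd once */
    int          next;           /* index to write into this frame */
} VboRing;

/* Returns the ring slot to use this frame and advances the cursor.
 * Caller then uploads into vbo[slot] (glBufferSubData) and draws with it;
 * by the time the same slot comes around again, RING_SIZE frames have
 * passed and the GPU should be done reading it. */
static int ring_acquire(VboRing *r) {
    int i = r->next;
    r->next = (r->next + 1) % RING_SIZE;
    return i;
}
```

This trades 3x the buffer memory for never having the CPU and GPU contend over one buffer, which is usually the right trade on mobile.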

As always, check the OpenGL ES manual from your GPU vendor for specific recommendations on efficiency. And get to know your GPU profiler well so that you can visually see these bottlenecks clearly.

Thanks :).