[QUOTE=twippe;1289178]I just tried the orphaning method and it is worse now
…
I use opengl ES 2.0[/QUOTE]
I’m sorry. I missed seeing the ES2 mention the first time through. Which GPU(s) and GLES drivers?
My first response was primarily geared toward feeding data via buffer objects to desktop GL, where the most common vendor GL drivers tend to support more buffer streaming capabilities than GLES.
On mobile it’s different. Buffer streaming capability is more limited (especially in ES2 drivers/GPUs, which are getting old), and the drivers tend to be less compliant (partly due to the poor match of tile-based GPUs to GLES). The consequences of feeding vertex data to GLES poorly via buffer objects can also be much more severe than on desktop GL due to GPU architecture differences: “getting it wrong” can block the draw thread for as much as 1-2 full frames.
Your best bet here is to get very familiar with the OpenGL ES Programming Guide for your GPU and GLES driver. It should provide recommendations on how to get the best performance when updating buffer objects on that GLES implementation. If not, contact your GPU vendor or check their developer support forums for this information.
In the absence of this valuable vendor GLES driver info, just start out using client arrays to stream vertex and index data to the GPU. Particularly in ES2 drivers, the vendor has probably spent some time ensuring that streaming vertex data through the API to the GPU with client arrays is efficient. The only batch data I wouldn’t stream via client arrays is vertex data that is defined on app startup and doesn’t need to change at runtime. For that, create and populate buffer objects on startup and then just use them at draw time, which shouldn’t result in any draw thread blocking (aka implicit synchronization).
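To make the split concrete, here is a rough sketch of both paths in GLES2. It assumes a current GLES2 context, a shader program already in use, and a 2-component position attribute; the function names and the `pos_attrib` parameter are made up for illustration and are not from your code.

```c
#include <GLES2/gl2.h>

/* Static geometry: create and fill the buffer object once at startup. */
GLuint create_static_vbo(const GLfloat *verts, GLsizeiptr bytes)
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, bytes, verts, GL_STATIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    return vbo;
}

/* Static geometry at draw time: just bind and draw, never modify. */
void draw_static(GLuint vbo, GLuint pos_attrib, GLsizei vertex_count)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableVertexAttribArray(pos_attrib);
    /* With a buffer bound, the last argument is a byte offset into it. */
    glVertexAttribPointer(pos_attrib, 2, GL_FLOAT, GL_FALSE, 0, (const void *)0);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
}

/* Dynamic geometry: client arrays, no buffer objects involved. */
void draw_streamed(GLuint pos_attrib, const GLfloat *xy,
                   const GLushort *indices, GLsizei index_count)
{
    /* Unbind both targets so the pointers below are read as CPU memory. */
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
    glEnableVertexAttribArray(pos_attrib);
    glVertexAttribPointer(pos_attrib, 2, GL_FLOAT, GL_FALSE, 0, xy);
    glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, indices);
}
```

With client arrays the driver reads the vertex and index data from your pointers during the draw call, so there is no buffer object for a later update to stall on.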
If you do want to try your hand at some buffer object streaming without vendor driver guidance, here are some recommendations. First, the issue: mobile pipelines are very deep (frames deep). This is necessary to minimize the RAM bandwidth needed for rasterization, so that slow CPU DRAM can be used instead of the fast VRAM common on discrete desktop GPUs.
Consequently, the time between (1) when a draw call referencing a buffer object is submitted to the driver and (2) when the driver/GPU has actually finished reading from that buffer object can be fairly long compared to desktop GPUs. If you try to change a buffer object within this period, the driver may hard-block your draw thread until the GPU reaches (2), depending on driver architecture. Wait until after (2) to change the buffer object, however, and you’re usually OK.
So the generally recommended strategy for avoiding these draw thread blocks is to not change a buffer object until ~3 frames after you last submitted a draw call to the driver reading from it.
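One way to follow the ~3 frame rule is to rotate through a small ring of buffer objects and track the frame in which each one was last drawn from. The sketch below shows only that bookkeeping in plain C, with the GL calls left as comments. `StreamRing`, `ring_init`, `ring_acquire` and the constant 3 are my own illustrative choices, and the right frame count for your driver may well differ.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a buffer is safe to rewrite 3 frames after its last draw. */
#define FRAMES_BEFORE_REUSE 3
#define RING_SLOTS FRAMES_BEFORE_REUSE

typedef struct {
    unsigned int buffer_ids[RING_SLOTS]; /* would be filled by glGenBuffers() */
    long last_draw_frame[RING_SLOTS];    /* -1 means never drawn from */
} StreamRing;

void ring_init(StreamRing *ring)
{
    assert(ring != NULL);
    for (int i = 0; i < RING_SLOTS; ++i) {
        ring->buffer_ids[i] = 0;
        ring->last_draw_frame[i] = -1;
    }
}

/* Returns the index of a slot that has not been drawn from for at least
 * FRAMES_BEFORE_REUSE frames and marks it as used in `frame` (the caller
 * is expected to update it and draw from it in this same frame).
 * Returns -1 if every slot may still be in flight. */
int ring_acquire(StreamRing *ring, long frame)
{
    assert(ring != NULL);
    for (int i = 0; i < RING_SLOTS; ++i) {
        long last = ring->last_draw_frame[i];
        if (last < 0 || frame - last >= FRAMES_BEFORE_REUSE) {
            ring->last_draw_frame[i] = frame;
            return i;
        }
    }
    return -1;
}

/* Per-frame use, roughly:
 *   int slot = ring_acquire(&ring, frame_number);
 *   if (slot >= 0) {
 *       glBindBuffer(GL_ARRAY_BUFFER, ring.buffer_ids[slot]);
 *       glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, data);
 *       ... set attrib pointers and draw ...
 *   } else {
 *       ... fall back to client arrays for this batch ...
 *   }
 */
```

With one update per frame this simply cycles through slots 0, 1, 2, 0, and so on. A second update in the same frame gets -1 once the ring is in steady state, so you would need more slots or a client-array fallback for that case.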
To really see how your GL command stream is being executed on the hardware, you want to use the GPU vendor’s profiling tool. That will tell you a lot, and clearly point out the places where you are doing something inefficient like blocking in the driver or not keeping the GPU units busy. It’ll also show you when you successfully clear a bottleneck, which you can’t always tell from profiling the draw thread.
[QUOTE]If I understand correctly this is how I implemented :
glBindBuffer(…);
glBufferData(…, NULL, …);
…[/QUOTE]
I’ve definitely seen GLES drivers that don’t support orphaning, choosing to block (aka “implicitly synchronize”) in this case instead of orphaning the buffer. It could be that your driver does this. Check your GPU vendor’s OpenGL ES Programming Guide for details.