Performance issues reusing a large vertex buffer

I have been experimenting with point clouds, have hit a performance issue, and would like some comments.

I created a 5-million-vertex buffer and loaded it with points by locking (mapping) the buffer and copying the points in, then rendered the frame.
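
Roughly what the load looks like (a minimal sketch; the Point struct, names, and sizes are mine for illustration, and I'm assuming a GLEW-style loader):

```c
#include <GL/glew.h>
#include <string.h>

typedef struct { float x, y, z; } Point;   /* hypothetical point layout */

#define NUM_POINTS 5000000

GLuint create_and_fill_vbo(const Point *points)
{
    GLuint vbo = 0;
    GLsizeiptr size = (GLsizeiptr)NUM_POINTS * sizeof(Point);

    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STATIC_DRAW); /* allocate storage */

    /* "Lock" (map) the buffer, copy the points in, unmap. */
    void *dst = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (dst) {
        memcpy(dst, points, (size_t)size);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
    return vbo;
}
```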

Nsight reports 150+ fps on my NVIDIA card.

I then loaded a new set of data using the same VAO/VBO and my frame rate dropped to 40 fps.

If I delete and recreate the buffer instead of reusing the existing buffer, I don’t see the performance hit.
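
For reference, the two update paths look roughly like this (a sketch reusing the helper above; exactly which refill call I use is an assumption here, the result is the same whether I map again or use glBufferSubData):

```c
/* Path 1: refill the existing buffer in place -- this is the slow path. */
void reload_in_place(GLuint vbo, const Point *points)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER, 0,
                    (GLsizeiptr)NUM_POINTS * sizeof(Point), points);
}

/* Path 2: delete and recreate -- no performance hit. */
GLuint reload_recreate(GLuint vbo, const Point *points)
{
    glDeleteBuffers(1, &vbo);
    return create_and_fill_vbo(points);   /* from the sketch above */
}
```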

I have tried allocating the buffer both as dynamic and static (both usage hints, as in the fragment below). I have not used the stream hint, since a data set can stay loaded for some time before the user swaps data sets.
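
In code terms, that's just the usage hint at allocation time (fragment; size as in the sketch above):

```c
glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STATIC_DRAW);  /* static: load once, draw many  */
glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_DYNAMIC_DRAW); /* dynamic: modified repeatedly  */
```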

IIRC, NVIDIA has a policy that if you update an existing VBO, the driver downgrades the VBO from GPU memory to driver (system) memory. That's the default behavior. You have to play some tricks if you know you want a different behavior.

> You have to play some tricks if you know you want a different behavior.

Thanks, that would explain the drop.

Would these tricks end up costing as much as just deleting and recreating the buffer? If not, can you give more details? And do you know if AMD has similar logic?

No idea on AMD. The tricks are actually very cheap to apply. Before updating the buffer, IIRC, just make the buffer resident and/or query the buffer's GPU address (glMakeBufferResidentNV or glGetBufferParameterui64vNV(…, GL_BUFFER_GPU_ADDRESS_NV, …)).
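
Something like this, before refilling the buffer (a sketch based on the NV_shader_buffer_load extension; assumes the extension is available, error checking omitted):

```c
#include <GL/glew.h>

/* Hint to the driver that the VBO should stay in GPU memory
 * (NV_shader_buffer_load). Either call below should do it. */
void pin_buffer_to_gpu(GLuint vbo)
{
    GLuint64EXT gpu_addr = 0;

    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* Option 1: make the buffer resident in GPU memory. */
    glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);

    /* Option 2: query the buffer's GPU address. */
    glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV,
                                &gpu_addr);
}
```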

Note that while these are cheap and simple to apply, they place your buffer further from the CPU, so if your buffer updates from the CPU don't stream well (e.g. you are using blocking buffer update calls), then doing this could cost you performance. Try both ways and see. Read up on Buffer Object Streaming (and streaming VBOs) in the OpenGL wiki and forums for tips on making your buffer updates stream well. If you are reusing the buffer contents many times and seldom updating, this could be a win.
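
As one concrete example of a streaming idiom from that wiki page (a sketch, not something from the posts above): orphan the buffer when you map it, so the driver doesn't have to synchronize with draws still in flight.

```c
#include <GL/glew.h>
#include <string.h>

/* Refill a VBO without stalling the pipeline: GL_MAP_INVALIDATE_BUFFER_BIT
 * tells the driver the old contents are disposable, so it can "orphan" the
 * old storage instead of waiting for the GPU to finish reading it. */
void stream_refill(GLuint vbo, const void *data, GLsizeiptr size)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    void *dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
                                 GL_MAP_WRITE_BIT |
                                 GL_MAP_INVALIDATE_BUFFER_BIT);
    if (dst) {
        memcpy(dst, data, (size_t)size);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
}
```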