Need help with VBO updating performance

Hi there, I am new here and just started my OpenGL journey last month. I ran into a performance problem while making a simple voxel editor and need some help.

Here is my implementation:
[ul]
[li]all blocks are stored in a 3D array and accessed as blocks[x][y][z], for example[/li]
[li]one block contains six faces[/li]
[li]one face contains four vertices and their attributes (coordinate, color and normal)[/li]
[li]a vector keeps track of all blocks that are currently active[/li]
[li]three vectors hold the vertex attributes of all active blocks and are the data source for the VBOs[/li]
[/ul]
First, when initializing, I create 32*32*32 block instances, set them active and also make three STL vectors for the vertex attributes (coordinate, color and normal respectively). Second, I go over all the active blocks and append their attributes to these vectors with push_back. So now I have all the attributes stored in vectors. Last, I generate three VBOs, pass the data pointers of these vectors to glBufferData and draw with glDrawArrays.

My problem is that it takes around 10 seconds to draw the first frame for 32*32*32 blocks, and 5 seconds for 20*20*20. This means that when I have that many blocks in my editor, it takes a long time to compute the next frame even if I delete just one block, because I need to set that block disabled, go over the current active blocks vector to rebuild my attribute vectors and upload them again. I found that the attribute updating part took the longest time.

Here is my code fragment:


// data
Block blocks[LENGTH_X][LENGTH_Y][LENGTH_Z];
vector<Block> activeBlocks;

GLuint *vboIds = NULL;
vector<GLfloat> vtxCoords;
vector<GLfloat> vtxColors;
vector<GLfloat> vtxNormals;

// initialize blocks
for (GLuint x = 0; x < LENGTH_X; x++)
{
	for (GLuint y = 0; y < LENGTH_Y; y++)
	{
		for (GLuint z = 0; z < LENGTH_Z; z++)
		{
			/*
			The code below translates between an array index (0 and up) and a position (negative or positive).
			It is needed for mirror editing (along the x axis or the z axis).
			For example, for a 9*9*9 grid:
			index:  0   1   2   3   4   5   6   7   8
			coord: -4  -3  -2  -1   0   1   2   3   4
			*/
			GLfloat coord_x = x - (GLfloat)(LENGTH_X - 1) / 2.0f;
			GLfloat coord_y = y + 0.5f;
			GLfloat coord_z = z - (GLfloat)(LENGTH_Z - 1) / 2.0f;

			blocks[x][y][z] = Block(coord_x, coord_y, coord_z);
			
			blocks[x][y][z].SetActive(true);

			activeBlocks.push_back(blocks[x][y][z]);
		}
	}
}

// set vertex attributes data
vtxColors.clear();
vtxCoords.clear();
vtxNormals.clear();

for (size_t i = 0; i < activeBlocks.size(); i++)
{
	// 6 faces
	for (int j = 0; j < FACE_PER_BLOCK; j++)
	{
		// 4 vertices
		for (int k = 0; k < VERTEX_PER_FACE; k++)
		{
			vtxCoords.push_back(activeBlocks.at(i).GetFace(j).GetVertex(k).x);
			vtxCoords.push_back(activeBlocks.at(i).GetFace(j).GetVertex(k).y);
			vtxCoords.push_back(activeBlocks.at(i).GetFace(j).GetVertex(k).z);

			vtxColors.push_back(activeBlocks.at(i).GetFace(j).GetColor().x);
			vtxColors.push_back(activeBlocks.at(i).GetFace(j).GetColor().y);
			vtxColors.push_back(activeBlocks.at(i).GetFace(j).GetColor().z);

			vtxNormals.push_back(activeBlocks.at(i).GetFace(j).GetNormal().x);
			vtxNormals.push_back(activeBlocks.at(i).GetFace(j).GetNormal().y);
			vtxNormals.push_back(activeBlocks.at(i).GetFace(j).GetNormal().z);
		}
	}
}


// update VBO
vboIds = new GLuint[3];
glGenBuffers(3, vboIds);

glBindBuffer(GL_ARRAY_BUFFER, vboIds[0]);
glBufferData(GL_ARRAY_BUFFER, 
             sizeof(GLfloat) * 3 * VERTEX_PER_FACE * FACE_PER_BLOCK * activeBlocks.size(),
             vtxCoords.data(), 
             GL_DYNAMIC_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), 0);

glBindBuffer(GL_ARRAY_BUFFER, vboIds[1]);
glBufferData(GL_ARRAY_BUFFER, 
             sizeof(GLfloat) * 3 * VERTEX_PER_FACE * FACE_PER_BLOCK * activeBlocks.size(),
             vtxColors.data(),
             GL_DYNAMIC_DRAW);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), 0);
	
glBindBuffer(GL_ARRAY_BUFFER, vboIds[2]);
glBufferData(GL_ARRAY_BUFFER,
             sizeof(GLfloat) * 3 * VERTEX_PER_FACE * FACE_PER_BLOCK * activeBlocks.size(),
             vtxNormals.data(),
             GL_DYNAMIC_DRAW);
glEnableVertexAttribArray(2);
glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), 0);

glBindBuffer(GL_ARRAY_BUFFER, 0);

//draw
glDrawArrays(GL_QUADS,
             0,
             VERTEX_PER_FACE * FACE_PER_BLOCK * activeBlocks.size());


I think this implementation could be better, so I was wondering if there is something I could change to make it run faster. I was looking for a more efficient way to update the vertex attribute vectors but could not figure one out. It is very likely that I am not using VBOs correctly, which is why I ran into this trouble. I would appreciate any suggestions. I hope my English is understandable, and I am happy to explain anything in as much detail as needed.

Thanks,
Boris

you don't need to update the whole vertex buffer, just use “instanced rendering” and update the instance buffer dynamically
don't render stuff that is hidden behind other voxels, or behind the camera (or otherwise non-visible)
you don't need separate buffers for all your (non-instanced) attributes, i’d put all of them into the same buffer
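
To illustrate the single-buffer suggestion, here is a minimal sketch of an interleaved vertex layout. The `Vertex` struct and the `appendFace` helper are hypothetical names, not part of the original code, and the GL calls appear only as comments because they need a live context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interleaved layout: one struct per vertex instead of
// three parallel float vectors.
struct Vertex {
    float position[3];
    float color[3];
    float normal[3];
};

// The stride and offsets below assume the struct has no padding.
static_assert(sizeof(Vertex) == 9 * sizeof(float), "Vertex must be tightly packed");

// Appends the four corners of one face; color and normal are shared by
// all four vertices, as in the original post.
inline void appendFace(std::vector<Vertex>& out,
                       const float (&corners)[4][3],
                       const float (&color)[3],
                       const float (&normal)[3]) {
    assert(out.size() % 4 == 0);  // the vector always holds whole quads
    for (int k = 0; k < 4; ++k) {
        Vertex v;
        for (int c = 0; c < 3; ++c) {
            v.position[c] = corners[k][c];
            v.color[c] = color[c];
            v.normal[c] = normal[c];
        }
        out.push_back(v);
    }
}

// With a GL context and one bound VBO, the layout could be described as:
//   glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(Vertex),
//                vertices.data(), GL_DYNAMIC_DRAW);
//   glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
//                         (void*)offsetof(Vertex, position));
//   glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
//                         (void*)offsetof(Vertex, color));
//   glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
//                         (void*)offsetof(Vertex, normal));
```

The stride is `sizeof(Vertex)` for every attribute and only the offset differs, so one glBufferData call replaces the three separate uploads.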

try to measure the “frame time”, you’ve got many c++ function calls each frame
https://sites.google.com/site/john87connor/home/tutorial-13-performance


edit:

don't generate new buffer objects each frame!!!
just create them once, allocate enough space for each of them and use a “vertex array object”
to update a buffer’s content, use glMapBuffer() or glBufferSubData()
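
As a rough sketch of the "allocate once, update partially" idea: if every block owns a fixed-size slot in the buffer, changing one block only touches that slot's byte range. The constants and the `blockRange` helper are assumptions made for illustration (three floats per vertex, one attribute per buffer, as in the original post); the GL calls are shown as comments only.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: 3 floats per vertex, 4 vertices per face, 6 faces per block.
constexpr std::size_t kFloatsPerVertex = 3;
constexpr std::size_t kVerticesPerFace = 4;
constexpr std::size_t kFacesPerBlock = 6;
constexpr std::size_t kBytesPerBlock =
    sizeof(float) * kFloatsPerVertex * kVerticesPerFace * kFacesPerBlock;

struct ByteRange {
    std::size_t offset;  // where the block's data starts in the buffer
    std::size_t size;    // how many bytes belong to the block
};

// Returns the byte range owned by the block stored in `slot`.
inline ByteRange blockRange(std::size_t slot, std::size_t capacityInBlocks) {
    assert(slot < capacityInBlocks);
    return ByteRange{slot * kBytesPerBlock, kBytesPerBlock};
}

// Once, at startup (with a GL context):
//   glGenBuffers(1, &vbo);
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferData(GL_ARRAY_BUFFER, capacityInBlocks * kBytesPerBlock,
//                nullptr, GL_DYNAMIC_DRAW);   // allocate, upload nothing yet
//
// Later, when the block in `slot` changes (cpuCopy is a float array
// mirroring the buffer contents):
//   ByteRange r = blockRange(slot, capacityInBlocks);
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferSubData(GL_ARRAY_BUFFER, r.offset, r.size,
//                   cpuCopy + r.offset / sizeof(float));
```

With these assumptions one block occupies 288 bytes per attribute buffer, so editing a single block uploads 288 bytes instead of the whole array.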

As john_connor said, drawing 32*32*32 cubes means drawing about 200k quads. Only very few of them are actually visible. Resolving this will give you a very high framerate. For examples, check the results of this query on Google.
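
A minimal sketch of the hidden-face test, assuming a dense grid of on/off cells: a face is only worth emitting when the neighbor on that side is empty or outside the grid. `VoxelGrid` and `countVisibleFaces` are hypothetical names; a real editor would emit vertices where this sketch just counts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense grid of active flags, all cells filled initially.
struct VoxelGrid {
    int nx, ny, nz;
    std::vector<bool> active;

    VoxelGrid(int x, int y, int z)
        : nx(x), ny(y), nz(z),
          active(static_cast<std::size_t>(x) * y * z, true) {}

    bool inside(int x, int y, int z) const {
        return x >= 0 && y >= 0 && z >= 0 && x < nx && y < ny && z < nz;
    }
    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(x) * ny + y) * nz + z;
    }
    // Cells outside the grid count as empty, so border faces stay visible.
    bool isActive(int x, int y, int z) const {
        return inside(x, y, z) && active[index(x, y, z)];
    }
    void set(int x, int y, int z, bool value) {
        assert(inside(x, y, z));
        active[index(x, y, z)] = value;
    }
};

// Counts faces of active cells whose neighbor on that side is empty.
inline std::size_t countVisibleFaces(const VoxelGrid& g) {
    static const int dirs[6][3] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                                   {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    std::size_t count = 0;
    for (int x = 0; x < g.nx; ++x)
        for (int y = 0; y < g.ny; ++y)
            for (int z = 0; z < g.nz; ++z) {
                if (!g.isActive(x, y, z)) continue;
                for (const auto& d : dirs)
                    if (!g.isActive(x + d[0], y + d[1], z + d[2])) ++count;
            }
    return count;
}
```

For a fully filled 32x32x32 grid this counts 6 * 32 * 32 = 6144 faces, compared with 196608 when every face of every cube is drawn.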

Also, consider drawing triangles instead of quads.

But do that only after you have fixed the issue of creating the VBOs/VAOs each frame.
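
On the triangles suggestion: GL_QUADS is not available in core-profile contexts, and one way to keep four vertices per face is an index buffer with six indices per quad. The sketch below assumes each face's four vertices are stored consecutively, in order around the face; `buildQuadIndices` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds indices that split each quad (v0, v1, v2, v3) into the triangles
// (v0, v1, v2) and (v0, v2, v3), preserving the quad's winding order.
inline std::vector<std::uint32_t> buildQuadIndices(std::size_t quadCount) {
    assert(quadCount <= 0xFFFFFFFFu / 4);  // indices must fit in 32 bits
    std::vector<std::uint32_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::uint32_t base = static_cast<std::uint32_t>(q * 4);
        indices.push_back(base);
        indices.push_back(base + 1);
        indices.push_back(base + 2);
        indices.push_back(base);
        indices.push_back(base + 2);
        indices.push_back(base + 3);
    }
    return indices;
}

// With a GL context, the indices go into a GL_ELEMENT_ARRAY_BUFFER and the
// draw call becomes:
//   glDrawElements(GL_TRIANGLES, (GLsizei)indices.size(), GL_UNSIGNED_INT, 0);
```

The index pattern does not depend on the vertex data, so it can be built once for the maximum number of quads and reused.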

Thanks for the suggestions!

After testing my block delete function, I found what made my program slow was this part:


// number of active blocks
for (size_t i = 0; i < activeBlocks.size(); i++)
{
	// 6 faces
	for (int j = 0; j < FACE_PER_BLOCK; j++)
	{
		// 4 vertices
		for (int k = 0; k < VERTEX_PER_FACE; k++)
		{
			vtxCoords.push_back(activeBlocks.at(i).GetFace(j).GetVertex(k).x);
			vtxCoords.push_back(activeBlocks.at(i).GetFace(j).GetVertex(k).y);
			vtxCoords.push_back(activeBlocks.at(i).GetFace(j).GetVertex(k).z);
 
			vtxColors.push_back(activeBlocks.at(i).GetFace(j).GetColor().x);
			vtxColors.push_back(activeBlocks.at(i).GetFace(j).GetColor().y);
			vtxColors.push_back(activeBlocks.at(i).GetFace(j).GetColor().z);
 
			vtxNormals.push_back(activeBlocks.at(i).GetFace(j).GetNormal().x);
			vtxNormals.push_back(activeBlocks.at(i).GetFace(j).GetNormal().y);
			vtxNormals.push_back(activeBlocks.at(i).GetFace(j).GetNormal().z);
		}
	}
}

I tried to make it a single buffer by putting the data into structs which store all three attributes, but that did not improve things either. My implementation of deletion is that every time I click and delete a block, I set that block disabled and rebuild my active blocks vector (go over all blocks and store the active ones), then use it to rebuild all attribute vectors.

The reason I do it this way is that I could not figure out how to remove specific elements from a vector. Even if I record the index of every block when doing push_back, the indices change as soon as I erase any element, so I just clear the attribute vectors and store the new data from scratch. These frequent push_back operations are what make the program slow.

I know I need to update the vectors partially, for example by erasing the x, y, z float values of a vertex, but how can I do that with a vector? Or should I go for an STL map? That does not seem like an intuitive way… I just want to remove all attributes of the deleted block without touching anything else. Are there any better solutions?

I guess I should solve this problem and find a better implementation or data structure before optimizing any further, although I am excited to do the hidden face culling part.
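
One common technique for removing a block's data without rebuilding the vector is "swap and pop": copy the last block's chunk over the removed one and shrink the vector by one chunk, while a map remembers which slot each block lives in. The sketch below is only an illustration under the assumption that every block contributes a fixed 72 floats per attribute; `BlockVertexStore` and the integer block ids are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Assumed: 3 floats * 4 vertices * 6 faces per block, per attribute.
constexpr std::size_t kFloatsPerBlock = 3 * 4 * 6;

class BlockVertexStore {
public:
    // Appends one block's floats. blockId is any unique id,
    // for example x + LENGTH_X * (y + LENGTH_Y * z).
    void add(int blockId, const std::vector<float>& floats) {
        assert(floats.size() == kFloatsPerBlock);
        assert(slotOf_.count(blockId) == 0);
        slotOf_[blockId] = owner_.size();
        owner_.push_back(blockId);
        data_.insert(data_.end(), floats.begin(), floats.end());
    }

    // Removes a block in constant time and returns the slot that was
    // overwritten, which is the only range that needs re-uploading.
    std::size_t remove(int blockId) {
        auto it = slotOf_.find(blockId);
        assert(it != slotOf_.end());
        const std::size_t slot = it->second;
        const std::size_t last = owner_.size() - 1;
        if (slot != last) {
            // Move the last block's chunk into the freed slot.
            std::copy(data_.begin() + last * kFloatsPerBlock, data_.end(),
                      data_.begin() + slot * kFloatsPerBlock);
            owner_[slot] = owner_[last];
            slotOf_[owner_[slot]] = slot;
        }
        data_.resize(last * kFloatsPerBlock);
        owner_.pop_back();
        slotOf_.erase(it);
        return slot;
    }

    const std::vector<float>& data() const { return data_; }
    std::size_t blockCount() const { return owner_.size(); }

private:
    std::vector<float> data_;                      // tightly packed chunks
    std::vector<int> owner_;                       // slot -> block id
    std::unordered_map<int, std::size_t> slotOf_;  // block id -> slot
};
```

Deleting a block then costs one 72-float copy instead of a full rebuild, and only the returned slot has to be sent to the GPU again, for instance with glBufferSubData. The draw order of the blocks changes, which should not matter for opaque cubes rendered with depth testing.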