Johan Seland

10-13-2005, 07:55 AM

Hello.

We are currently developing a GPU-friendly method for adaptive subdivision of triangular meshes.

In one step of the algorithm, we wish to render pre-tessellated, dyadically refined triangles and evaluate the surface in the vertex shader. This approach works well, and we get a nicely rendered surface. During prototyping we simply used immediate mode, passing triangle strips with integer coordinates like this:

for(size_t j=0; j<level; j++) {
    glBegin(GL_TRIANGLE_STRIP);
    for(size_t i=0; i<level-j; i++) {
        glVertex3i(i, j, level-i-j);
        glVertex3i(i, j+1, level-i-j-1);
    }
    glVertex3i(level-j, j, 0);
    glEnd();
}

We are now finishing the method and planning to migrate to VBOs for a little speed bump. My approach has been to store each level of refinement in its own VBO, using one degenerate triangle strip and an element array, and to draw using glDrawElements.

This approach also works, but I notice a speed decrease of about 20% compared to immediate mode. My initial thought was that there was too much overhead involved in binding buffers etc. for the lower levels of refinement, so I have resorted to immediate mode for the lowest levels. However, even when there are hundreds of indices in the VBOs, it seems to be faster to use immediate mode.

Is there a lower bound on the number of vertices/indices at which VBOs become efficient? Would anyone like to comment on this observation?

We are seeing this behavior on GF6600, GF6800 and GF7800 series of GPUs, all on Linux (also when using the 81.63 series of drivers).
