Using VBOs (and PBOs)

I have a mesh that is generated off a B-Spline surface (say, 30x30), and I store the mesh vertices (and associated normals) in 2D arrays. I currently draw the mesh to the screen using glBegin and glEnd (GL_TRIANGLE_STRIP). I need to be able to draw this mesh in two different “modes”: one mode is normal, where I specify one color at the start using glColor3f and specify vertices and normals as usual. The other mode I will call “color mode”, where I turn off lighting, do not specify any vertex normals, and I shade each mesh vertex a different color to match a particular (u,v) parametrization (in particular, I color the R component with the u value, the G component with the v value, and set the blue component to 0.5… though the details aren’t too important). This results in a surface with gradient color, since each vertex is a different color.

Now, at certain times during the execution of my program, I need to render the surface into the back buffer in “color mode”, set the camera up at a certain angle, and use glReadPixels to examine the color of a certain pixel. At these times, I need to do this several hundred times in a row (as many times as there are data points in my set, e.g. 500). With my current setup, this is (obviously) incredibly slow; for each of the 500 iterations, I draw the entire mesh using glBegin and glEnd (about 900 quads, i.e. roughly 1800 triangles). Additionally, I would like the mesh to eventually be finer (say, 50x50 or higher), which would make this even slower.

I had thought about using a display list to render the mesh in color mode, however every so often I will need to deform the mesh and, therefore, change what is in the display list, which can’t be done without deleting the display list and recompiling.

After doing some research, I came across VBOs. However, at the current stage of my program (in addition to other work I need to do), it would require a fairly time-consuming overhaul of the rendering functions to change everything to VBOs. I would like to know: would changing all the rendering to VBOs significantly increase the speed of this part of my program? If the loop executes about 500 times, with a 30x30 mesh, it can take as long as 5 or 6 seconds, and I would like to get this step to 0.5 seconds or less. Also, I’ve heard that glReadPixels is a blocking function, which is probably causing me some significant overhead as well. Would PBOs be the solution I need here?

Below is some example code that captures the essence of this part of my program. I would like to know whether it is worth my time to implement VBOs/PBOs (will the time drop significantly, or only a little?), or whether anyone has general comments on how I can speed this part of my program up. Thanks!

EDIT - I should note that the surface will never change during this loop, but the camera angle will. (The surface/mesh might change after this loop, but never during).


for(int i=0; i<500; i++)
{
	// set up camera according to data point i
	// ...
	// ...

	surface->render(COLOR_MODE); // render 1000+ triangles using glBegin(GL_TRIANGLE_STRIP)
	glReadPixels(..., GL_RGBA, ...);
	// do something with information retrieved from glReadPixels
}

There is a lot of optimization potential in what you described.
If the data is static during the 500 renderings and can change afterwards, VBOs are just perfect for that.
Immediate mode is the slowest option you have in that case.
A display list should render optimally fast, but if you’re concerned about display list compile time, stick with VBOs.

Use indexed rendering (glDrawElements or glDrawRangeElements), because a regular mesh makes for excellent vertex reuse.
(Advanced: You can maximize that vertex reuse further by rendering the mesh in swaths instead of in strips running along a whole edge. The goal is to have as many indices as possible repeat within the last 16 to 24 vertices sent.)
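A minimal sketch of the swath ordering, assuming a row-major grid of `rows` x `cols` vertices stored bottom-left first (as in the 3x3 example further down) and quads emitted counterclockwise. `gen_quad_indices_swath` and its `swath_width` parameter (quads per swath, must be at least 1) are hypothetical names for illustration, not part of OpenGL.

```c
#include <stddef.h>

/* Fill `out` with GL_QUADS indices for a grid of rows x cols vertices,
   stored row-major starting at the bottom-left corner.
   Quads are emitted in vertical swaths of at most `swath_width` quads,
   so the row of vertices shared by two quad rows is more likely to still
   be in the post-transform cache when the next row of the swath is drawn.
   Returns the number of indices written: 4 * (rows-1) * (cols-1). */
size_t gen_quad_indices_swath(int rows, int cols, int swath_width,
                              unsigned short *out)
{
    size_t n = 0;
    int quad_cols = cols - 1;
    for (int sx = 0; sx < quad_cols; sx += swath_width) {
        int ex = sx + swath_width < quad_cols ? sx + swath_width : quad_cols;
        for (int qy = 0; qy < rows - 1; qy++) {
            for (int qx = sx; qx < ex; qx++) {
                unsigned short bl = (unsigned short)(qy * cols + qx);
                out[n++] = bl;                              /* bottom-left  */
                out[n++] = (unsigned short)(bl + 1);        /* bottom-right */
                out[n++] = (unsigned short)(bl + 1 + cols); /* top-right    */
                out[n++] = (unsigned short)(bl + cols);     /* top-left     */
            }
        }
    }
    return n;
}
```

With `swath_width` at least the number of quad columns this reproduces plain row order; the result goes to glDrawElements with GL_QUADS and GL_UNSIGNED_SHORT.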

BTW, changing that rendering code from immediate mode to VBOs is not a big change. Make it work with standard vertex arrays first to get the rendering calls right, then add the VBOs on top.
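As a first step toward plain vertex arrays, the 2D arrays have to end up as one contiguous block. A sketch assuming, hypothetically, that each row is a separately allocated array of a small `Vec3` struct (the name and layout are illustrative). If the data already lives in one contiguous `float[rows][cols][3]` array, it can be handed to glVertexPointer directly and this copy is unnecessary.

```c
#include <stddef.h>

/* Hypothetical vertex/normal type; substitute whatever the mesh code uses. */
typedef struct { float x, y, z; } Vec3;

/* Copy a grid stored as an array of row pointers (grid[row][col]) into a
   single tightly packed float array (3 floats per vertex, row-major),
   the layout glVertexPointer(3, GL_FLOAT, 0, dst) expects.
   `dst` must hold rows * cols * 3 floats. */
void flatten_grid(Vec3 **grid, int rows, int cols, float *dst)
{
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            size_t base = ((size_t)r * cols + c) * 3;
            dst[base + 0] = grid[r][c].x;
            dst[base + 1] = grid[r][c].y;
            dst[base + 2] = grid[r][c].z;
        }
    }
}
```

The same copy works for the normals array.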

glReadPixels blocks in the sense that all rendering has to finish before it can fetch the pixels. You said

Now, at certain times during the execution of my program, I need to render the surface into the back buffer in “color mode”, set the camera up at a certain angle, and use glReadPixels to examine the color of a certain pixel.

Does that mean you’re interested in only one pixel of every rendering?!
Then you should only render that one pixel, which will obviously be pretty fast. :wink:
You can experiment with glScissor to limit the rendering or, better, set up the projection matrix and viewport so that only that single pixel is rendered.
With the per-pixel viewport trick you wouldn’t even need to read back the data after every frame: render the pixels side by side and read them all in one sweep.

Be aware that rendering into a back buffer doesn’t guarantee that the pixels are owned by the context. If another window overlaps that area, the read-back data is undefined. Read the OpenGL spec on pixel ownership.
For truly unclipped rendering you should use framebuffer objects (or the legacy pbuffer extension on (crap!) implementations that don’t support FBOs).

50 * 50 * 2 triangles * 500 frames = 2.5 MTriangles.
Current hardware can handle 100 to 200 million vertices per second. Depending on the fill-rate requirements (which are ridiculously low if you actually render only one pixel), you should easily be able to do that in under a second.
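A quick sanity check of that estimate, assuming the worst case of no vertex reuse at all (three transformed vertices per triangle) and the lower throughput figure quoted above:

```latex
2.5\times10^{6}\,\text{triangles}\times 3\,\tfrac{\text{vertices}}{\text{triangle}} = 7.5\times10^{6}\,\text{vertices},
\qquad
\frac{7.5\times10^{6}\,\text{vertices}}{10^{8}\,\text{vertices/s}} = 0.075\,\text{s}
```

Indexed rendering with vertex reuse only lowers the vertex count, so vertex processing alone should fit comfortably in the 0.5 s budget. Driver overhead, state changes and the readback are not part of this estimate.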

Thanks so much for your help, Relic! After reading your post, I’m fairly embarrassed by how inefficient my code is. Allow me to explain and then ask a few additional questions. :slight_smile:

  • First of all, you recommend I use glDraw(Range)Elements. I’m obviously inexperienced with vertex arrays (and VBOs), so (temporarily ignoring your advanced “swath” tip), is the best way to do this to put all my vertices (in, say, row-major order) into an array, compute a set of indices for one of the triangle strips, call glDrawElements with GL_TRIANGLE_STRIP, then simply add some constant to each index to “bump it up” to the next row and call glDrawElements again? Additionally, if all vertices share a single color, as is the case when not drawing in “color mode”, do I need an enormous color array with one entry per vertex, where every color is the same? Or can I still just call glColor3f once before doing any drawing in this case? (Clearly I will need such an array for drawing in color mode, however, since each vertex is a different color, correct?)

  • I’m most embarrassed to reveal that, yes, I am only interested in one pixel per drawing, and yet I draw the entire 800 by 800 (or whatever) viewport. :slight_smile: In particular, my algorithm works like this: I have an array of 3D data points (of size, say, 500). At each iteration in the above loop, I move my camera to location data[i], and then I use gluLookAt to look from that point along a given direction (specified by the 3D vector “normal”, which is fixed over all 500 points). I also call: gluPerspective(45, 1, 1, 100) to get an extremely narrow viewing angle (I’m trying to look at one specific pixel on the surface). I then draw the surface in color mode, and examine the pixel in the middle of the screen (width/2, height/2) using glReadPixels. I then store the color information in an array and move on to my next data point. So simply by changing the viewport to be 1 pixel by 1 pixel for all these “color mode” drawings, I can increase the speed of every drawing by a ridiculous factor? How can I render these 500 individual pixels “side by side” as you mention?

Again, thanks a lot for your help, Relic. I knew I was doing this inefficiently, but now I realize just by how much. :slight_smile:

No problem, knowledge comes with the Guru status. :smiley:

For the vertex array question, let’s look at some simple ASCII art.
Let’s take a 3x3 grid of vertices (type Vertex3f) describing a 2x2 mesh of quads.
Put the 9 vertices in a vertex array using 3 floats per vertex, starting with the bottom left and going right, then up.

#define NUM_VERTICES 9
float vertices[NUM_VERTICES * 3]; // Fill that with your vertices. For bigger meshes use a malloc() for runtime allocation.

The mesh indices would look like this


6-7-8
| | |
3-4-5
| | |
0-1-2

To render that thing in a single call without using tricks to wrap strips around the corners, let’s start with quads.
(If the mesh is not planar you should use triangles and split each quad along its shorter diagonal.)

The indices for the four quads would be (counterclockwise!):


unsigned short indices[] =  // Use unsigned short if the number of indexed vertices is smaller than 65536 for better performance.
{
0, 1, 4, 3,
1, 2, 5, 4,
3, 4, 7, 6,
4, 5, 8, 7
};

Simple as that. Notice that indices of adjacent quads repeat in this list. That’s the vertex reuse: the hardware doesn’t transform such a vertex again, because it was seen recently and its transformed data is still in an on-chip cache, which is small (around 24 vertices). This only works with indexed rendering, obviously. (The swathing trick would be to render no more than, e.g., 8 quads in the horizontal direction before going upwards, so that the vertices of the inner row (here indices 3, 4, 5) are reused while they are still in the cache.)
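To see why the index order matters, here is a toy model of that post-transform cache: a FIFO with a fixed number of entries. This is a simplification for illustration; real hardware caches differ in size and replacement policy, and `count_cache_misses` is a hypothetical helper, not an API.

```c
#include <stddef.h>

#define MAX_CACHE 64

/* Count how many vertices would have to be transformed for an index list,
   modelling the post-transform vertex cache as a simple FIFO of
   `cache_size` entries (1 <= cache_size <= MAX_CACHE). A hit means the
   vertex is among the last `cache_size` vertices inserted. */
size_t count_cache_misses(const unsigned short *indices, size_t count,
                          int cache_size)
{
    unsigned short fifo[MAX_CACHE];
    int filled = 0, next = 0; /* next: slot that will be overwritten */
    size_t misses = 0;
    for (size_t i = 0; i < count; i++) {
        int hit = 0;
        for (int k = 0; k < filled; k++) {
            if (fifo[k] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            misses++;
            fifo[next] = indices[i];          /* evict the oldest entry */
            next = (next + 1) % cache_size;
            if (filled < cache_size) filled++;
        }
    }
    return misses;
}
```

For the 16 indices above and a 24-entry cache, every one of the 9 vertices is transformed exactly once. Comparing row-order and swath-order index lists for a wide mesh (say 50 columns) with a cache of 16 to 24 entries should show the swath order re-transforming far fewer vertices.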

If you have a color array (let’s assume 3 floats per color), it is almost identical to the vertex array:

float colors[NUM_VERTICES * 3];

The rendering calls with that would be


glColorPointer(3, GL_FLOAT, 12, colors);
glEnableClientState(GL_COLOR_ARRAY);
glVertexPointer(3, GL_FLOAT, 12, vertices);
glEnableClientState(GL_VERTEX_ARRAY); // If you always use vertex arrays you can leave this on forever, i.e. call it once during init.
glDrawElements(GL_QUADS, 16, GL_UNSIGNED_SHORT, indices);
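For “color mode”, the color array only needs refilling when the mesh changes. A sketch of the R = u, G = v, B = 0.5 encoding from the question, assuming (hypothetically) that u and v run uniformly from 0 to 1 across the grid columns and rows. If the B-spline parametrization is not uniform in the grid indices, store each sample’s actual (u, v) instead. `fill_uv_colors` is an illustrative name.

```c
/* Fill `colors` (3 floats per vertex, same row-major order as the vertex
   array) with the color-mode encoding: R = u, G = v, B = 0.5.
   Assumes rows, cols >= 2. */
void fill_uv_colors(int rows, int cols, float *colors)
{
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            float *rgb = colors + 3 * (r * cols + c);
            rgb[0] = (float)c / (float)(cols - 1); /* u along a row    */
            rgb[1] = (float)r / (float)(rows - 1); /* v along a column */
            rgb[2] = 0.5f;
        }
    }
}
```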

Yes, if you don’t have a color array, OpenGL uses the current color to render (except when lighting is on and GL_COLOR_MATERIAL is off).


glDisableClientState(GL_COLOR_ARRAY);
glColor3f(r,g,b); // Assumes lighting off.
glVertexPointer(3, GL_FLOAT, 12, vertices);
// Still on: glEnableClientState(GL_VERTEX_ARRAY);
glDrawElements(GL_QUADS, 16, GL_UNSIGNED_SHORT, indices);

Done! Easy, eh?
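The per-row GL_TRIANGLE_STRIP scheme from the question (one strip per row, indices bumped up by the row length) also works with glDrawElements. A sketch assuming the same row-major, bottom-left-first layout as the 3x3 grid; `gen_row_strip` is a hypothetical helper. Starting each strip with the upper vertex makes the triangles counterclockwise when the grid is viewed with x to the right and y up.

```c
#include <stddef.h>

/* Indices for one GL_TRIANGLE_STRIP covering the band of quads between
   vertex row `row` and `row + 1` of a row-major grid with `cols` vertices
   per row. Row r is row 0 with every index increased by r * cols.
   Writes 2 * cols indices and returns that count. */
size_t gen_row_strip(int row, int cols, unsigned short *out)
{
    size_t n = 0;
    for (int c = 0; c < cols; c++) {
        out[n++] = (unsigned short)((row + 1) * cols + c); /* upper vertex */
        out[n++] = (unsigned short)(row * cols + c);       /* lower vertex */
    }
    return n;
}
```

Each row is then one glDrawElements(GL_TRIANGLE_STRIP, 2 * cols, GL_UNSIGNED_SHORT, ...) call. If all rows are concatenated into one index array, row r starts at element offset r * 2 * cols (a byte offset once an element array buffer is bound).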

The viewport trick is to set the viewport to a single pixel, e.g. glViewport(0, 0, 1, 1); for the lower-left pixel of the window, and then increase the x parameter for each rendering.
You need to set up a scissor rectangle as well, because glViewport does not clip everything: glClear ignores the viewport, and wide points and lines can extend past it. The scissor test limits both.
Assuming your render area is 500 pixels wide and at least one pixel high:


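GLubyte pixels[500 * 4]; // 4 bytes (RGBA) per result pixel.
glEnable(GL_SCISSOR_TEST);
for (int i = 0; i < 500; i++)
{
  glViewport(i, 0, 1, 1);
  glScissor(i, 0, 1, 1);
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // With the scissor test on, this clears only pixel i.
  // Set up the camera for data point i and render the mesh here.
}
glDisable(GL_SCISSOR_TEST);
glReadPixels(0, 0, 500, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixels); // Read the 500 results. On NVIDIA boards GL_BGRA is faster for GL_UNSIGNED_BYTE.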
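Once the row of results is read back, each pixel can be turned back into (u, v). A sketch assuming the GL_RGBA / GL_UNSIGNED_BYTE read shown above (with GL_BGRA, red and blue swap places) and the R = u, G = v encoding from the question. Two further assumptions, labeled in the code: the color buffer has alpha bits, and it was cleared with alpha 0, so a pixel the mesh missed can be told apart from a hit. With 8-bit channels the recovered u and v are quantized to steps of 1/255; finer resolution needs a color buffer with more bits per channel. `UVResult` and `decode_uv` are hypothetical names.

```c
/* Decoded result for one data point. */
typedef struct { float u, v; int hit; } UVResult;

/* Decode `count` RGBA8 pixels read back with glReadPixels(..., GL_RGBA,
   GL_UNSIGNED_BYTE, pixels). R encodes u, G encodes v.
   Assumption: the color buffer has alpha bits and was cleared with
   alpha 0; glColor3f and 3-component color arrays write alpha = 1, so
   a nonzero alpha means the mesh covered that pixel. */
void decode_uv(const unsigned char *pixels, int count, UVResult *out)
{
    for (int i = 0; i < count; i++) {
        const unsigned char *p = pixels + 4 * i;
        out[i].u = p[0] / 255.0f;
        out[i].v = p[1] / 255.0f;
        out[i].hit = p[3] != 0;
    }
}
```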

gluPerspective(45, 1, 1, 100) is NOT an extremely narrow viewing angle: fovy is 45 degrees, which is pretty normal.

Calculating the viewing frustum for the center pixel is simple linear algebra: describe the viewing frustum with the coordinates of the four corners of the pixel you’re actually interested in.
glFrustum expresses that more directly than gluPerspective, which just calculates the glFrustum parameters for you.

Thanks again, Relic. Sorry for the late reply, I have been busy the past few days. I would like to clarify a few more things before I begin coding, just to save myself some time. :slight_smile:

  • You say lighting needs to be OFF to specify a color with glColor3f before drawing my mesh with glDrawElements? How can I get lighting, then? When not drawing in color mode, I will want to draw the mesh fully lighted, by also specifying vertex normals.

  • As I mentioned, my mesh will be changing as frequently as the user specifies (it starts out 30x30, but will change when the user indicates in the interface), so I will have to be "new"ing and "delete"ing arrays all the time. Not a big issue, but just one I need to state.

  • My mesh is not planar, so I will have to use GL_TRIANGLES. Specifying my own indices like this (for the 2 triangles per “patch” on the surface) is faster than using multiple calls and GL_TRIANGLE_STRIP? I guess triangle strips just minimize the number of function calls as it is, which I’m already doing with these vertex arrays, but I would like clarification on that.

  • I see what you are doing with the glScissor calls there. However, it is conceivable that I may have, say… 2000 data points (or more), and my original viewport will obviously never be 2000 pixels wide. It doesn’t look like this is a problem with your code above, but you said “assuming your render area is 500 pixels wide…” which confuses me a bit. So some clarification there would be helpful. :slight_smile:

  • Yeah, I noticed my mistake with gluPerspective looking over my code. What I meant, I suppose, was gluPerspective(1, 1, 1, 100)… actually, I probably want that near clipping plane to be really close to the camera, because it’s possible my data point is extremely close to the surface, and I still want to be able to view the surface if it is less than 1 unit away. This might be the cause of some other bugs in my software, to be honest… the documentation is so clear, I don’t know what I was thinking when writing the code. Perhaps I need to go gluPerspective(1, 1, EPSILON, 100), for some small epsilon, eh? So, perhaps the final code should be:


gluPerspective(1, 1, EPSILON, 100); // #define EPSILON 0.0001
glEnable(GL_SCISSOR_TEST);
for (i = 0; i < 500; i++)
{
  glViewport(i, 0, 1, 1);
  glScissor(i, 0, 1, 1);
  // Render your mesh i here.
  // this involves gluLookAt with data point i, along some
  // constant "normal" direction (ie, data[i] + EPSILON * normal)
  // then rendering surface in color mode
}
glReadPixels(0, 0, 500, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixels); // Read the 500 results. On NVIDIA boards GL_BGRA is faster for GL_UNSIGNED_BYTE.

To clarify, I want this code to return to me the color value (in array pixels) when I render the mesh (in color mode) from 500 different angles. Conceptually, I want to “draw a line” from the camera at data point i to the surface along direction “normal”, which is constant for all i, and then find the color of the surface at that point. In other words, I want to find the projection of each data point along a vector named “normal” onto the surface, then find the color of the surface at the projected point. I intend to do this by positioning the camera at data[i], looking along data[i] + EPSILON*normal (ie, in the direction of the normal), and then shrinking the viewing frustum so that the pixel viewable in the middle of the screen is undoubtedly the pixel I am looking for. The code above will achieve my desired effect, correct? This is really the easiest/best way to do it… I’m not missing something stupidly obvious? :slight_smile:

  • You say lighting needs to be OFF to specify a color with glColor3f before drawing my mesh with glDrawElements? How can I get lighting, then? When not drawing in color mode, I will want to draw the mesh fully lighted, by also specifying vertex normals.

Read the OpenGL specification on lighting.
If lighting is on, the color of the primitives is defined by the current material.
(OpenGL actually allows calling glMaterial between glBegin and glEnd, which is one of the worst ideas in this API. Don’t do that! Never!)
Since you’re using vertex array data, that doesn’t work anyway, but there is the glColorMaterial function together with glEnable(GL_COLOR_MATERIAL), which reroutes glColor calls to parts of the current material. You might want to leave it at its default GL_AMBIENT_AND_DIFFUSE setting; then each vertex color in your array, or the global glColor, sets the material’s ambient and diffuse color, which is picked up by the lighting calculations.

  • As I mentioned, my mesh will be changing as frequently as the user specifies (it starts out 30x30, but will change when the user indicates in the interface), so I will have to be "new"ing and "delete"ing arrays all the time. Not a big issue, but just one I need to state.

Don’t new and delete the memory if the size doesn’t change; simply change the data inside. You could also keep the allocation if the mesh shrinks. new and delete should not be your bottleneck here.
Also, don’t delete the vertex buffer object you’re going to use again. Simply use glBufferData to update the data (or glBufferSubData when the size is unchanged).
(You have to think about performance all the time.)
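A sketch of that allocation policy on the client side: reallocate only when the mesh grows, and reuse the existing block when it shrinks or keeps its size. `FloatBuffer` and `ensure_capacity` are hypothetical names; in C++ a `std::vector` with `resize` behaves similarly.

```c
#include <stdlib.h>

/* Hypothetical holder for a client-side copy of the mesh data. */
typedef struct {
    float *data;
    size_t capacity; /* in floats */
} FloatBuffer;

/* Make sure `buf` can hold at least `needed` floats. Memory is only
   reallocated when the mesh grows. Returns 0 on success, -1 if out of
   memory (the old block is then left untouched). */
int ensure_capacity(FloatBuffer *buf, size_t needed)
{
    if (needed <= buf->capacity)
        return 0;
    float *p = realloc(buf->data, needed * sizeof *p);
    if (p == NULL)
        return -1;
    buf->data = p;
    buf->capacity = needed;
    return 0;
}
```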

  • My mesh is not planar, so I will have to use GL_TRIANGLES. Specifying my own indices like this (for the 2 triangles per “patch” on the surface) is faster than using multiple calls and GL_TRIANGLE_STRIP? I guess triangle strips just minimize the number of function calls as it is, which I’m already doing with these vertex arrays, but I would like clarification on that.

A triangle strip is a connected primitive, so unless you wrap it around the corners (with degenerate triangles), you need more drawing calls to render the mesh, e.g. one per row. Using independent triangles sends a lot more indices (six per quad) but needs only one drawing call. (Drawing calls are not too expensive under OpenGL; under D3D that’s a totally different matter.)
The vertex cache should take care of the duplicate indices, so independent triangles shouldn’t be much slower than triangle strips. And you have the option to split each quad along whichever diagonal you want. Just try both if you like; none of this is complicated.
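A sketch of the GL_TRIANGLES variant for a non-planar mesh, splitting each quad along its shorter diagonal as suggested earlier. It assumes a row-major, bottom-left-first `Vec3` vertex array; `Vec3` and `gen_triangle_indices` are illustrative names. Both triangles keep the counterclockwise winding of the quad.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

static float dist2(Vec3 a, Vec3 b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

/* GL_TRIANGLES indices for a row-major grid of rows x cols vertices.
   Each quad (bl, br, tr, tl) is split along its shorter diagonal.
   Writes 6 * (rows-1) * (cols-1) indices and returns that count. */
size_t gen_triangle_indices(const Vec3 *verts, int rows, int cols,
                            unsigned short *out)
{
    size_t n = 0;
    for (int qy = 0; qy < rows - 1; qy++) {
        for (int qx = 0; qx < cols - 1; qx++) {
            unsigned short bl = (unsigned short)(qy * cols + qx);
            unsigned short br = (unsigned short)(bl + 1);
            unsigned short tl = (unsigned short)(bl + cols);
            unsigned short tr = (unsigned short)(tl + 1);
            if (dist2(verts[bl], verts[tr]) <= dist2(verts[br], verts[tl])) {
                /* split along bl-tr */
                out[n++] = bl; out[n++] = br; out[n++] = tr;
                out[n++] = bl; out[n++] = tr; out[n++] = tl;
            } else {
                /* split along br-tl */
                out[n++] = bl; out[n++] = br; out[n++] = tl;
                out[n++] = br; out[n++] = tr; out[n++] = tl;
            }
        }
    }
    return n;
}
```

The whole mesh is then a single glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, indices) call.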

  • I see what you are doing with the glScissor calls there. However, it is conceivable that I may have, say… 2000 data points (or more), and my original viewport will obviously never be 2000 pixels wide. It doesn’t look like this is a problem with your code above, but you said “assuming your render area is 500 pixels wide…” which confuses me a bit. So some clarification there would be helpful. :slight_smile:

That was just an example, because you said you want 500 pixels and render an 800 * 800 image. The comment just says that this won’t work in a window narrower than 500 pixels, because of the pixel ownership test.
Just arrange the pixels in any 2D order you like. You don’t need to render 500 * 1, you could also render 100 * 5 or 25*20 pixels, whatever fits.
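A sketch of one such 2D arrangement: pack result i row by row into a block at most `max_width` pixels wide (which must fit inside the window or FBO), then read the whole block with one glReadPixels(0, 0, width, height, ...). With RGBA bytes every row is a multiple of 4 bytes, so the default GL_PACK_ALIGNMENT of 4 keeps the read-back rows tightly packed and result i ends up at pixel index i. `PixelPos`, `result_pixel` and `result_block_size` are hypothetical helpers.

```c
/* Window position of result pixel i when results are packed row by row
   into a block at most `max_width` pixels wide, starting at the
   lower-left corner. Use it for glViewport(p.x, p.y, 1, 1) and
   glScissor(p.x, p.y, 1, 1). */
typedef struct { int x, y; } PixelPos;

PixelPos result_pixel(int i, int max_width)
{
    PixelPos p = { i % max_width, i / max_width };
    return p;
}

/* Size of the block to read back for `count` results. The last row may
   be only partially used. */
void result_block_size(int count, int max_width, int *width, int *height)
{
    *width = count < max_width ? count : max_width;
    *height = (count + max_width - 1) / max_width;
}
```

For 2000 data points in an 800-pixel-wide window this gives an 800 x 3 block.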

  • Yeah, I noticed my mistake with gluPerspective looking over my code. What I meant, I suppose, was gluPerspective(1, 1, 1, 100)… actually, I probably want that near clipping plane to be really close to the camera, because it’s possible my data point is extremely close to the surface, and I still want to be able to view the surface if it is less than 1 unit away. This might be the cause of some other bugs in my software, to be honest… the documentation is so clear, I don’t know what I was thinking when writing the code. Perhaps I need to go gluPerspective(1, 1, EPSILON, 100), for some small epsilon, eh? So, perhaps the final code should be:

The closer zNear gets to zero, the larger the ratio zFar/zNear. Keep that ratio as small as possible or your depth buffer precision will go down the drain!
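To make the ratio concrete: for a standard perspective projection the window depth is d(z) = f(z − n) / (z(f − n)), so one step of a b-bit depth buffer corresponds to an eye-space distance of roughly z²(f − n) / (f·n·(2^b − 1)). A sketch assuming a 24-bit depth buffer (check what your pixel format actually provides); `depth_resolution` is an illustrative name. At distance 1 with zFar = 100, zNear = 1 resolves about 6e-8 units, while zNear = 0.0001 resolves only about 6e-4, roughly ten thousand times coarser.

```c
#include <math.h>

/* Approximate smallest eye-space depth difference a depth buffer with
   `bits` bits can resolve at distance z, for near plane n and far plane f.
   Derived from dd/dz = f n / ((f - n) z^2) and a depth step of
   1 / (2^bits - 1). */
double depth_resolution(double z, double n, double f, int bits)
{
    double steps = ldexp(1.0, bits) - 1.0; /* 2^bits - 1 */
    return z * z * (f - n) / (f * n * steps);
}
```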


gluPerspective(1, 1, EPSILON, 100); // #define EPSILON 0.0001
glEnable(GL_SCISSOR_TEST);
for (i = 0; i < 500; i++)
{
  glViewport(i, 0, 1, 1);
  glScissor(i, 0, 1, 1);
  // Render your mesh i here.
  // this involves gluLookAt with data point i, along some
  // constant "normal" direction (ie, data[i] + EPSILON * normal)
  // then rendering surface in color mode
}
glReadPixels(0, 0, 500, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixels); // Read the 500 results. On NVIDIA boards GL_BGRA is faster for GL_UNSIGNED_BYTE.

No! The projection matrix really needs to be calculated for that one single pixel you’re interested in. The code above uses an arbitrarily chosen angle of 1 degree, so it could map a whole rectangle of your rendered mesh onto the single pixel, and you’d never know which part was picked; but you’re interested in the center pixel.

To clarify, I want this code to return to me the color value (in array pixels) when I render the mesh (in color mode) from 500 different angles. Conceptually, I want to “draw a line” from the camera at data point i to the surface along direction “normal”, which is constant for all i, and then find the color of the surface at that point. In other words, I want to find the projection of each data point along a vector named “normal” onto the surface, then find the color of the surface at the projected point. I intend to do this by positioning the camera at data[i], looking along data[i] + EPSILON*normal (ie, in the direction of the normal), and then shrinking the viewing frustum so that the pixel viewable in the middle of the screen is undoubtedly the pixel I am looking for. The code above will achieve my desired effect, correct? This is really the easiest/best way to do it… I’m not missing something stupidly obvious? :slight_smile:

Yes, that should work.

Depending on the nature of the data[i]'s position and normal distribution there could exist other ways.

You mentioned in your other post that I should use glFrustum to compute the frustum using “basic linear algebra” to pinpoint the middle pixel in the scene. I do not understand how this is done; glFrustum requires me to specify the 6 clipping planes which… what, surround the center pixel of my screen? This is in object space, not in image space, so how can I know where my six planes will be? I don’t know enough about where the middle pixel/surface is to specify these (if I knew a small bounding cube around the center pixel, I could probably find the u,v coordinate via other less computationally-expensive means). I think I am misunderstanding you here. :slight_smile: This also comes back to knowing how far my near and far clipping planes should be… my best “guess” is EPSILON to 100 because it really could be anywhere in there. I suppose I could shrink that 100 to 10 or so, but is that still going to create problems?

Though, on everything else you’ve helped me with, I believe I understand the salient points and can begin my implementation. Thanks for taking the time to help me out!

Previously you set up an 800*800 view and a zNear value to render your scene. That means you already picked a glFrustum. Taking that and limiting it to the corners of the center pixel you actually read out later is simple linear algebra. You just didn’t know what gluPerspective did for you.

Look at its implementation and you’ll see how it calculates the parameters for the final glFrustum call it makes.

Dang, I can’t find an implementation right now. There was one in the open source SGI OpenGL sample implementation at http://oss.sgi.com/projects/ogl-sample/

Now change that to use the corners of the pixel you’re interested in as the left, right, bottom and top parameters.
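A sketch of that change: start from the left/right/bottom/top your full-window projection used (for a gluPerspective call these are ±zNear·tan(fovy/2)·aspect and ±zNear·tan(fovy/2)) and interpolate them linearly to the corners of one pixel, keeping zNear and zFar. `Frustum` and `pixel_frustum` are illustrative names.

```c
typedef struct { double left, right, bottom, top, znear, zfar; } Frustum;

/* Shrink `full` (the glFrustum parameters for a vp_width x vp_height
   viewport) to the frustum of the single pixel whose lower-left corner is
   at (px, py) relative to the viewport origin. Only the side planes move;
   near and far stay the same. */
Frustum pixel_frustum(Frustum full, int vp_width, int vp_height,
                      int px, int py)
{
    double w = full.right - full.left;
    double h = full.top - full.bottom;
    Frustum p = full;
    p.left   = full.left + w * px / vp_width;
    p.right  = full.left + w * (px + 1) / vp_width;
    p.bottom = full.bottom + h * py / vp_height;
    p.top    = full.bottom + h * (py + 1) / vp_height;
    return p;
}
```

Call glFrustum with the returned values and render into a 1x1 viewport. For the 800x800 view from the question, px = py = 400; with an even size no pixel is exactly centered, and this picks the one just above and to the right of the center.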

Thinking about it, there are two simple and elegant ways to restrict the projection to a single pixel:
1.) gluPickMatrix with a 1x1 pick region and the viewport of the original rendering you wanted to shrink (applied right after glLoadIdentity on the projection matrix, before gluPerspective), or, even simpler,
2.) your original gluPerspective parameters with the field of view shrunk by your previous viewport height in pixels: strictly, tan(fovy/2) is divided by 800, which for such small angles is roughly fovy/800 (45/800).
Might save you some headaches. :slight_smile:

Infil,

Let another newbie chime in here.

I just completed some code that manipulates the view frustum in a very similar way. Let me explain:

First, you pick appropriate near and far values for your frustum and then leave them alone.

The idea is this: With a given viewport and view frustum, a pixel lands at a given spot on the viewport and takes a given proportion of the height and width of the viewport.

Compute how far the pixel is from the left and bottom of the viewport (if the viewport is 100x100 pixels and the pixel you’re interested in is at 13, 23, then the pixel is 13% of the way from left to right across your viewport and 23% of the way up from the bottom), and what percentage of the height/width of the viewport it covers (one pixel takes up 1/viewport width horizontally and 1/viewport height vertically).

Now, create a pixel frustum where the left of the pixel frustum is moved over by the same percentage of the original frustum width, and where the bottom of the pixel frustum is moved up by the same percentage. So for our example, the left side of the pixel frustum would be 13% of the way from left to right on the original frustum, and the bottom of the pixel frustum would be 23% of the bottom-to-top distance of the original viewport frustum.

So your new left value would be the original left plus 13% of the original frustum’s left-to-right distance, and the right value would be at 14%, since one pixel takes up 1% of the original viewport.

See this open source library for some example code:
TR - OpenGL Tile Rendering Library

This code takes a viewport and cuts it into tiles that it then renders in pieces, then stitches together into a larger image. Your application is similar: You’re just creating a 1 pixel sized tile.

The key code you need is at the end of the routine trBeginTile.

The equivalent code I came up with might be a little more readable:


new_frustum.left = starting_frustum.left + (frustum_width * this_column * tileSize.width) / save_width;
new_frustum.right = new_frustum.left + frustum_width * this_tileSize.width / save_width;
			
new_frustum.bottom = starting_frustum.bottom + (frustum_height * this_row * tileSize.height) / save_height;
new_frustum.top = new_frustum.bottom + (frustum_height *  this_tileSize.height) / save_height;
			
glFrustum (new_frustum.left, new_frustum.right, new_frustum.bottom, new_frustum.top, new_frustum.near, new_frustum.far);

In my code, frustum_height and frustum_width are the size of the frustum for the original full-sized viewport, and save_height and save_width are the height and width of the large, multi-tiled image I’m generating.