Efficient VBO usage or VBO merging!

Hi all,

Sorry for this long post, but I’m trying to describe my problem in a way that you can understand it really well :slight_smile: (by the way, my mother tongue is not English).

Well, here we go. Suppose I have the following triangle vertices:

float VerticesT1[] = {
-0.5f,0.0f,0.5f, //V0
0.5f,0.0f,0.5f, //V1
0.0f,0.5f,0.5f //V2
};

and indices:

uint IdxT1[] = {0, 1, 2};

and I use VBO to draw a triangle like this:

//buffering vertices
GLuint vboVertices1;
glGenBuffers(1, &vboVertices1);
glBindBuffer(GL_ARRAY_BUFFER, vboVertices1);
glBufferData(GL_ARRAY_BUFFER, sizeof(float) * 3 * 3, VerticesT1, GL_STATIC_DRAW);
glVertexPointer(3, GL_FLOAT, 0, (char *) NULL);

//buffering indices
GLuint vboIndices1;
glGenBuffers(1, &vboIndices1);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, vboIndices1);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(uint) * 3 * 1, IdxT1, GL_STATIC_DRAW);

//drawing
glEnableClientState(GL_VERTEX_ARRAY);
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
glDrawElements(GL_TRIANGLES, 1 * 3, GL_UNSIGNED_INT, (char *) NULL);
glDisableClientState(GL_VERTEX_ARRAY);

So far, no problem; this is not really a difficult task to accomplish. But this is just the beginning.

Imagine now that I add 3 more points (vertices) to the scene:

float VerticesT2[] = {
0.25f, 0.25f, 0.75f,//V3
-0.25f, 0.25f, 0.75f,//V4
0.00f, 0.00f, 0.75f //V5
};

and want to display 4 triangles with the following indices:

uint IdxT2[] = { 3, 4, 5,
0, 5, 4,
1, 3, 5,
4, 3, 2};

where indices 0, 1 and 2 correspond to the vertices of the first triangle (V0, V1, V2) and indices 3, 4 and 5 correspond to the 3 new vertices (V3, V4, V5) in VerticesT2[]. The easiest way to draw the 4 new triangles listed in IdxT2[] would be to send all 6 vertices like this:

float verticesT1andT2[] = {
-0.5f,0.0f,0.5f, //V0
0.5f,0.0f,0.5f, //V1
0.0f,0.5f,0.5f, //V2
0.25f, 0.25f, 0.75f,//V3
-0.25f, 0.25f, 0.75f,//V4
0.00f, 0.00f, 0.75f //V5
};

into a new VBO:

//buffering vertices
GLuint vboVertices2;
glGenBuffers(1, &vboVertices2);
glBindBuffer(GL_ARRAY_BUFFER, vboVertices2);
glBufferData(GL_ARRAY_BUFFER, sizeof(float) * 3 * 6, verticesT1andT2, GL_STATIC_DRAW);
glVertexPointer(3, GL_FLOAT, 0, (char *) NULL);

given that the new index list will also be buffered:

//buffering indices
GLuint vboIndices2;
glGenBuffers(1, &vboIndices2);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, vboIndices2);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(uint) * 3 * 4, IdxT2, GL_STATIC_DRAW);

Well, it surely does the trick, but the big problem is that it wastes time, CPU, and RAM->VRAM bandwidth, among other consequences, since half of the vertices (3 out of the 6) are already in VRAM from the previous upload.

Then, the goal here is to draw these 4 triangles by sending only the 3 new points in VerticesT2[] and the new indices IdxT2[] list. I tried to think about some solutions. One would be this one: create a big buffer like:

//buffering vertices
GLuint vboVerticesBigBuffer;
glGenBuffers(1, &vboVerticesBigBuffer);
glBindBuffer(GL_ARRAY_BUFFER, vboVerticesBigBuffer);
glBufferData(GL_ARRAY_BUFFER, sizeof(float) * (3 * 3 + 3 * 3), NULL, GL_STATIC_DRAW);
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(float) * 3 * 3, VerticesT1);

to draw the first triangle like:

//…
glDrawElements(GL_TRIANGLES, 1 * 3, GL_UNSIGNED_INT, (char *) NULL);
//…

then draw the 4 triangles by passing only VerticesT2[] with:

//…
glBufferSubData(GL_ARRAY_BUFFER, sizeof(float) * 3 * 3, sizeof(float) * 3 * 3, VerticesT2);
//…

and draw like this (I assume we have also sent the IdxT2[] index table):

//…
glDrawElements(GL_TRIANGLES, 4 * 3, GL_UNSIGNED_INT, (char *) NULL);
//…

The problem with this approach is that I use twice the memory when I create the big buffer just to draw the first triangle. Another problem is that once I have drawn the 4 new triangles and want to go back to drawing only the first one, I can’t free just the part of the buffer that holds the 3 new vertices I no longer use (deleting the big buffer and creating a new one for VerticesT1 is not an option).
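To make the byte offsets in the big-buffer idea concrete, here is a CPU-only sketch that mimics glBufferData with a NULL pointer followed by two glBufferSubData calls. It uses a std::vector as a stand-in for the buffer store, so it runs without a GL context. FakeBuffer, allocate, subData, floatAt and buildBigBuffer are hypothetical names for illustration, not OpenGL calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// CPU stand-in for a buffer object's data store (hypothetical, not an OpenGL type).
struct FakeBuffer {
    std::vector<unsigned char> store;

    // Mimics glBufferData(target, size, NULL, usage): reserve storage, upload nothing.
    void allocate(std::size_t sizeBytes) { store.assign(sizeBytes, 0); }

    // Mimics glBufferSubData(target, offset, size, data): overwrite a byte range.
    void subData(std::size_t offsetBytes, std::size_t sizeBytes, const void* data) {
        assert(offsetBytes + sizeBytes <= store.size());
        std::memcpy(store.data() + offsetBytes, data, sizeBytes);
    }

    // Read back the i-th float, to check the resulting layout.
    float floatAt(std::size_t i) const {
        float f;
        std::memcpy(&f, store.data() + i * sizeof(float), sizeof(float));
        return f;
    }
};

const float VerticesT1[] = { -0.5f, 0.0f, 0.5f,   0.5f, 0.0f, 0.5f,   0.0f, 0.5f, 0.5f };
const float VerticesT2[] = { 0.25f, 0.25f, 0.75f,  -0.25f, 0.25f, 0.75f,  0.0f, 0.0f, 0.75f };

// Build the six-vertex buffer in two uploads; only VerticesT2 travels in the second one.
FakeBuffer buildBigBuffer() {
    FakeBuffer big;
    big.allocate(sizeof(VerticesT1) + sizeof(VerticesT2));
    big.subData(0, sizeof(VerticesT1), VerticesT1);                  // first upload: V0..V2
    big.subData(sizeof(VerticesT1), sizeof(VerticesT2), VerticesT2); // later upload: V3..V5
    return big;
}
```

The second upload starts at byte offset sizeof(VerticesT1), so the buffer ends up identical to verticesT1andT2[] even though V0..V2 were sent only once. The sketch does not address the memory concern above: the full size is still reserved up front.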

You may ask yourself “why does this dude want to complicate his life so much if he is handling only 6 vertices???”. Well, the problem is that I don’t have only 6 vertices, but actually tens or hundreds of millions of vertices to handle (very big 3D models sharing vertices), and I can’t afford to waste VRAM space or time resending data that is already in VRAM.

A second solution I thought of would be to create two independent buffers with 3 vertices each (VerticesT1[] and VerticesT2[]) and temporarily merge them into a big one (functionally equivalent to verticesT1andT2[]). Then I could draw the 4 triangles and, after that, split them apart again and delete one of them (VerticesT2[]) when I want to draw only the first triangle. But the problem is that I don’t know if it is possible to do that xD The cool thing would be to merge them at least virtually, so that the 4th element of the big buffer would correspond to the 1st element of the second buffer, which would avoid moving data around in VRAM. By the way, this is what happens in system RAM when we use malloc/new to create big arrays in C/C++: the array elements are not necessarily physically consecutive in memory, but for the user they are “virtually” consecutive (we can do ptr++ to access the next element).

So I’m really in need of a solution for this, and I hope that I’m not asking for something that is actually impossible with OpenGL 4.0!

Thank you for your time,

Leo

Interesting problem. And you communicate very well. Thanks for taking the time/words to explain your problem clearly. It doesn’t show that English is not your first language. And BTW, this is not a “Beginner’s” question :wink:

So ideally, what you’d like is to have 2 (or more) VBOs containing vertex attributes, and have a way to render a bunch of triangles with them, some of which would be comprised of vertices from “both” buffers. Same thing for IBOs (index lists) too.

Well, the normal way of pushing vertices down the pipe doesn’t really support this. There is one “position” vertex attribute that gates the whole process, and it is pulled from a single bound VBO (or client array). So clearly we need something … unconventional…

Well, we need positions fed in to force the vertex shader to fire. What if we pulled in some “dummy” positions, just so we could pull in the ones we “really” wanted from texture buffers (or buffer pointers) in the vertex shader?

So maybe what we do is use glDrawElementsInstanced to draw N instances (100,000, 200,000, …whatever you need) of a single triangle (GL_TRIANGLES) with basically some dummy positions. Ok, so now we’ve got the vertex shader firing for each vertex of each of those 100,000 triangles, with gl_InstanceID and gl_VertexID set to tell us which triangle and which vertex in that triangle we are currently processing.

Well, now we can use gl_InstanceID and gl_VertexID to look up the “real” positions/attributes in the vertex shader, and push them down the pipe.

Let’s suppose you bind 4 texture buffers to your vertex shader: 2 containing index lists in different buffers, and 2 containing vertex attributes in different buffers. Use gl_InstanceID/gl_VertexID to grab the right indices, use them to lookup the right vertex attributes, and the rest is shader processing like normal.

Instead of using texture buffers, if you’re on NVidia, you can use buffer pointers to access the various buffers directly in the shader using Bindless. But consider carefully, as this will only work on NVidia cards, though it gives you the capability to build more complex (and potentially linked) data structures in GPU memory.
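To illustrate the lookup described above, here is a CPU-only sketch of what each vertex shader invocation would do, applied to the six vertices and five triangles from the first post. It is an emulation under stated assumptions, not shader code: std::vector stands in for the texture buffers, and PulledMesh, pullPosition and makeExample are hypothetical names. It also assumes the second index list simply continues the first, and that indices are global (0..2 in the first position buffer, 3..5 in the second).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;

// Stand-ins for the four texture buffers: two index lists and two position lists.
struct PulledMesh {
    std::vector<unsigned> indices[2];   // global vertex indices; list 1 continues list 0
    std::vector<Vec3>     positions[2]; // global vertex i lives in list 0 or in list 1
};

// One vertex shader invocation: instanceId plays the role of gl_InstanceID
// (triangle number) and vertexId the role of gl_VertexID (0, 1 or 2).
Vec3 pullPosition(const PulledMesh& m, int instanceId, int vertexId) {
    std::size_t slot = static_cast<std::size_t>(instanceId) * 3
                     + static_cast<std::size_t>(vertexId);

    // Step 1: read the vertex index from whichever index list holds this slot.
    unsigned index;
    if (slot < m.indices[0].size()) {
        index = m.indices[0][slot];
    } else {
        slot -= m.indices[0].size();
        assert(slot < m.indices[1].size());
        index = m.indices[1][slot];
    }

    // Step 2: resolve the global index to a position buffer and a local offset.
    if (index < m.positions[0].size()) {
        return m.positions[0][index];
    }
    index -= static_cast<unsigned>(m.positions[0].size());
    assert(index < m.positions[1].size());
    return m.positions[1][index];
}

// The data from the first post: V0..V2 and V3..V5 kept in separate buffers.
PulledMesh makeExample() {
    PulledMesh m;
    m.positions[0] = { Vec3{-0.5f, 0.0f, 0.5f}, Vec3{0.5f, 0.0f, 0.5f}, Vec3{0.0f, 0.5f, 0.5f} };
    m.positions[1] = { Vec3{0.25f, 0.25f, 0.75f}, Vec3{-0.25f, 0.25f, 0.75f}, Vec3{0.0f, 0.0f, 0.75f} };
    m.indices[0] = { 0, 1, 2 };
    m.indices[1] = { 3, 4, 5,  0, 5, 4,  1, 3, 5,  4, 3, 2 };
    return m;
}
```

With this layout, instance 2 is the triangle (0, 5, 4): its first vertex comes from the first position buffer and the other two from the second, which is exactly the mixing that a normal glDrawElements batch cannot express.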

Well, the problem is that I don’t have only 6 vertices, but actually tens or hundreds of millions of vertices to handle (very big 3D models sharing vertices), and I can’t afford to waste VRAM space or time resending data that is already in VRAM.

That’s the part I have trouble believing. People have been able to run fast-performing applications where they do skinning transformations on the CPU. Every character has basic skinning done, per vertex, on the CPU and is uploaded every frame. And these applications work just fine. PCI-E bandwidth is not particularly large, but it isn’t exactly a scarce resource either.

What you’re saying is that you have model A that has some vertices. Then you want to upload model B that has some vertices. But because they share some number of vertices, you want to save memory not duplicating the particular vertices that they share.

The first question I would want to ask is this: is this really a significant number of vertices? Enough to make the effort of what you’re trying to do worthwhile? That is, are you actually running out of video memory if you don’t do this optimization, or is it simply a concern?

I’m a bit curious as to what kind of dataset we’re talking about here. For this to work, positions, texture coordinates, colors, normals, and all other vertex attributes must be the same for each vertex that is shared across models. I could see how this might work if you’re only looking in terms of positions, but with all other attributes, it seems very unlikely that two models would share a lot of vertex data.

If this is some kind of data where there are distinct sub-sections of models that are used in combination with different sections, then it seems to me the best way to go is to break the common model pieces out into their own discrete models. You could even write a tool to generate these automatically by analyzing your data.

What if we pulled in some “dummy” positions, just so we could pull in the ones we “really” wanted from texture buffers

The problem is that buffer textures are limited to, on current hardware, 65536 elements. If he’s dealing in the millions of vertices, even filling up all 16 or 32 texture units with buffer textures might not be enough.

He would have to get into non-standard extensions to make this work.

Nothing to add except that you can make your post even clearer by putting your code inside ‘code’ tags. Something like this:


float VerticesT1[] = {
                -0.5f,0.0f,0.5f, //V0
                 0.5f,0.0f,0.5f, //V1
                 0.0f,0.5f,0.5f  //V2
              };

Code tags are [ code] and [ / code ] without the spaces.

oh, ok, good to know. Btw, is it possible to edit my post??? I can’t find the option to do so…

EDIT: lol, sry, it seems that I can edit only if my post has no replies! Is it so?

EDIT2: btw, i’m preparing a response post to Dark Photon and Alfonse Reinheart.

The problem is that buffer textures are limited to, on current hardware, 65536 elements.

Are you sure about that?:

That’s 128 Mtexels, max. And here on an NVidia GTX285, GL_MAX_TEXTURE_BUFFER_SIZE reports 134217728, which is exactly 128 Mtexels.
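As a rough sanity check on what that limit means for this thread, the sketch below computes how many buffer textures a data set would need if each texture holds at most the reported number of texels. It assumes one texel per vertex position and one texel per index, which is a storage choice made here for illustration; kMaxTexels and buffersNeeded are hypothetical names, and on real hardware the limit should be queried rather than hard-coded.

```cpp
#include <cassert>
#include <cstdint>

// Value reported for GL_MAX_TEXTURE_BUFFER_SIZE on the GTX285 mentioned above.
constexpr std::uint64_t kMaxTexels = 134217728ull; // 128 * 1024 * 1024

// Number of buffer textures needed to hold elementCount texels
// when a single buffer texture holds at most maxTexels.
std::uint64_t buffersNeeded(std::uint64_t elementCount,
                            std::uint64_t maxTexels = kMaxTexels) {
    assert(maxTexels > 0);
    return (elementCount + maxTexels - 1) / maxTexels; // ceiling division
}
```

Under these assumptions, 100 million vertex positions fit in a single buffer texture, while the 900 million indices of a 300M-triangle model would have to be spread over 7 of them.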

He would have to get into non-standard extensions to make this work.

In other words, buffer pointers, via NVidia bindless.

oh, thank you, you are a gent :slight_smile: I try my best to waste as little of people’s time as possible :slight_smile:

Well, I hesitated between posting it here or in the advanced section. But now I don’t know if it is too late (I don’t know if moderators would appreciate seeing the same thread in two different sections. Or is it possible to move the thread to the advanced section?)

Given my not so advanced expertise in openGL, this is indeed what I call “ideally”. But maybe there is another way of doing so that really is ideal xD

So you are saying that the positions work individually for each VBO. So I can’t really do something like telling the index buffer that index number 3 actually refers to position 0 in the second buffer, can I?

Actually, I didn’t know the glDrawElementsInstanced() function, because it seems that it was introduced in version 3.1 of OpenGL, but the reference pages I have are for version 2.1 only (http://www.opengl.org/sdk/docs/man/). Where can I find the reference pages for the more recent versions (3.0, 3.1, 3.2, 3.3, 4.0)?

I did some research to figure out what this function does and what the variables gl_InstanceID and gl_VertexID tell us about the rendering pipeline. But I still don’t get one thing. Suppose the only vertex attribute I care about is the vertex coordinates (for example, I’m drawing a wireframe version of my mesh). So I create two VBOs, one with vertices V0, V1 and V2 and the other with V3, V4 and V5, and two IBOs, one with the index list of the first triangle and the other with the index list of the 4 new triangles. Now, what am I supposed to do to draw the 4 new triangles from these 2 different VBOs using the second index list? Are you telling me to draw 1 instance of the first VBO and 4 instances of the second VBO and use the gl_InstanceID and gl_VertexID values to mix up the vertex attribs and index values? Now my brain collapsed xD Please clarify it for me :slight_smile:

yeah, that’s the gotcha: I would like to use something cross-platform and not constrain the end user to this or that brand. If ATI plans to offer it too, I could try to use it.

Thx for your time, very kind of you.

Best,

Leo

Are you sure about that?

Well, I was…

Where can I find the reference pages for the more recent versions (3.0, 3.1, 3.2, 3.3, 4.0)?

They don’t exist. The webmasters say that they’re working on them for 4.0, but they had been working on them for 3.2 for months before that. For the time being, you have to work off of the specifications.

Suppose the only vertex attribute I care about is the vertex coordinates (for example, I’m drawing a wireframe version of my mesh). So I create two VBOs, one with vertices V0, V1 and V2 and the other with V3, V4 and V5, and two IBOs, one with the index list of the first triangle and the other with the index list of the 4 new triangles. Now, what am I supposed to do to draw the 4 new triangles from these 2 different VBOs using the second index list?

Your vertex shader code would look something like this:


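#version 140

// Entries in indexBuffer[0] and vertices in positionBuffer[0].
// (The single sizeArray of the first draft is split in two here;
// both uniform names are illustrative.)
uniform int indexCount0;
uniform int vertexCount0;
uniform samplerBuffer positionBuffer[2];
uniform isamplerBuffer indexBuffer[2];  // integer texels, e.g. GL_R32I

void main()
{
  // Three vertices per instanced dummy triangle, drawn with indices 0, 1, 2:
  // find this vertex's slot in the combined index list.
  int texelIndex = gl_InstanceID * 3 + gl_VertexID;

  // Pick the index buffer that holds that slot.
  int index;
  if(texelIndex < indexCount0)
    index = texelFetch(indexBuffer[0], texelIndex).x;
  else
    index = texelFetch(indexBuffer[1], texelIndex - indexCount0).x;

  // Pick the position buffer from the index value itself, so that one
  // triangle can mix vertices from both buffers.
  vec4 position;
  if(index < vertexCount0)
    position = texelFetch(positionBuffer[0], index);
  else
    position = texelFetch(positionBuffer[1], index - vertexCount0);

  //position has your position; apply your own transform here.
  gl_Position = vec4(position.xyz, 1.0);
}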

Well, try to think of it as a tessellation-like feature. Suppose I have a mesh with a given level of detail and I want to increase the detail by creating new triangles. For me, each triangle is subdivided into n subtriangles. Clearly, the index list of these n new triangles is completely different, so the index list changes, but the old vertex coordinates will still be used to draw the new triangles. Now suppose you have huge meshes that you need to subdivide (with a particular algorithm, not the OpenGL tessellation). This reuses a lot of vertices, which loads the PCI-E bus very badly if we don’t handle it properly.

Then, what I actually have is a model A and a model B that actually is model A with new points to increase detail.
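The relationship between model A and model B can be sketched with a plain 1-to-4 midpoint subdivision. This is only an illustration of the "old vertices stay, new ones are appended, the index list is rebuilt" pattern; it is not the particular algorithm mentioned above, which is not described in this thread, and its vertex numbering differs from the V3/V4/V5 example in the first post. subdivide and Vec3 are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Vec3 = std::array<float, 3>;

// One level of 1-to-4 subdivision. New midpoints are appended to vertices;
// existing entries are never moved, so their indices stay valid.
// Returns the index list of the refined mesh.
std::vector<unsigned> subdivide(std::vector<Vec3>& vertices,
                                const std::vector<unsigned>& indices) {
    assert(indices.size() % 3 == 0);
    std::map<std::pair<unsigned, unsigned>, unsigned> midpointOf; // edge -> new vertex

    // Return the midpoint vertex of edge (a, b), creating it on first use
    // so that neighbouring triangles share it.
    auto midpoint = [&](unsigned a, unsigned b) -> unsigned {
        const std::pair<unsigned, unsigned> edge(std::min(a, b), std::max(a, b));
        const auto it = midpointOf.find(edge);
        if (it != midpointOf.end()) return it->second;
        const Vec3 pa = vertices[a]; // copies: push_back may reallocate
        const Vec3 pb = vertices[b];
        vertices.push_back(Vec3{ (pa[0] + pb[0]) / 2, (pa[1] + pb[1]) / 2, (pa[2] + pb[2]) / 2 });
        const unsigned created = static_cast<unsigned>(vertices.size() - 1);
        midpointOf[edge] = created;
        return created;
    };

    std::vector<unsigned> refined;
    for (std::size_t t = 0; t < indices.size(); t += 3) {
        const unsigned a = indices[t], b = indices[t + 1], c = indices[t + 2];
        const unsigned ab = midpoint(a, b);
        const unsigned bc = midpoint(b, c);
        const unsigned ca = midpoint(c, a);
        const unsigned tris[12] = { a, ab, ca,  b, bc, ab,  c, ca, bc,  ab, bc, ca };
        refined.insert(refined.end(), tris, tris + 12);
    }
    return refined;
}
```

Because the coarse vertices keep their indices, only the appended tail of the vertex array and the new index list differ between the two levels of detail; the coarse vertex data is the part one would like to avoid uploading a second time.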

Yes, I have models of hundreds of thousands, tens of millions and hundreds of millions of triangles, which means gigabytes of data. So, besides optimizing the data structure in system RAM (where I have to load the entire object to avoid reading the disk multiple times), I have to optimize VRAM usage, because the long-term goal is to display several of those objects and make several changes to the modelview matrix. So I can’t afford to display all the objects at the highest resolution all the time; otherwise the VRAM would be saturated and the driver could start swapping to system memory (whenever the card’s driver allows it), which would badly hurt the smoothness of the scene.
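For a sense of scale, here is a back-of-the-envelope size estimate for an indexed mesh holding only 32-bit indices and float3 positions. The vertex count used in the example (half the triangle count, which is typical for closed triangle meshes) is an assumption, not a figure from this thread; MeshBytes and estimateMeshBytes are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBytesPerIndex = 4;     // one 32-bit index
constexpr std::uint64_t kBytesPerPosition = 12; // three 32-bit floats

struct MeshBytes {
    std::uint64_t indexBytes;
    std::uint64_t positionBytes;
};

// Raw storage for the index list and the position array, without normals.
MeshBytes estimateMeshBytes(std::uint64_t triangleCount, std::uint64_t vertexCount) {
    assert(triangleCount <= UINT64_MAX / (3 * kBytesPerIndex)); // overflow guard
    assert(vertexCount <= UINT64_MAX / kBytesPerPosition);
    return { triangleCount * 3 * kBytesPerIndex, vertexCount * kBytesPerPosition };
}
```

For 300 million triangles and an assumed 150 million vertices this gives 3.6 GB of indices plus 1.8 GB of positions, before any normals are added.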

For now I’m only concerned with the geometry, topology and normal vectors. No textures, no colors. Since a given vertex can belong to triangles at different levels of detail, I have one normal per vertex and per level of detail. So if you see how it could work, please, I’m all ears :stuck_out_tongue:

Some of the huge models (e.g., 300M triangles) already come to me as separate parts (but each part still has some millions of triangles). I can’t split them further than this, or at least I can’t do it automatically, since the topology does not describe what the objects are. The only way to split them would be manually, but that would be a really unpleasant task since the models can be of anything (cars, airplanes, animals, humans, skeletons…)

Unfortunately I can’t use stuff of only one brand…

Alfonse, thx too for your time and for the concern on this. I’ll try to understand the code you put in your last post. Thanks very much :wink:

Cheers

Leo

It’s no big deal. Definitely don’t start another thread. This one’s fine.

[quote]
Well, the normal way of pushing vertices down the pipe doesn’t really support this. There is one “position” vertex attribute that gates the whole process, and it is pulled from a single bound VBO (or client array). So clearly we need something … unconventional…

So you are saying that the positions work individually for each VBO. So I can’t really do something like: inform to index buffer that index number 3 actually refers to position 0 in the second buffer, isn’t it?[/QUOTE]
With the normal way of submitting batches, right. For a single batch, each vertex attribute resides within one buffer.

But as mentioned, we can use some tricks to grab “what would be vertex attribute data” from GPU memory via other means than vertex attributes – namely texture buffers or buffer pointers.

Actually, I didn’t know the glDrawElementsInstanced() function, because it seems that it has been introduced into the version 3.1 of openGL

Yeah, well this has been out there for the last 2 years and 5 generations of GPUs as EXT_draw_instanced and ARB_draw_instanced. Even though it only went core early last year. So lots of hardware out there supports it (GeForce 8+ on NVidia).

Where can I find the reference pages for the more recent versions (3.0, 3.1, 3.2, 3.3, 4.0)?

Not sure. The latest Red Book probably discusses it.

But I still don’t get one thing. Suppose the only vertex attribute I care about is the vertex coordinates (for example, I’m drawing a wireframe version of my mesh). So I create two VBOs, one with vertices V0, V1 and V2 and the other with V3, V4 and V5, and two IBOs, one with the index list of the first triangle and the other with the index list of the 4 new triangles. Now, what am I supposed to do to draw the 4 new triangles from these 2 different VBOs using the second index list? Are you telling me to draw 1 instance of the first VBO and 4 instances of the second VBO and use the gl_InstanceID and gl_VertexID values to mix up the vertex attribs and index values? Now my brain collapsed xD Please clarify it for me :slight_smile:

Oh, I think I see your confusion. Let me clarify:

Your “big VBOs/IBOs” – the ones that contain your “real” vertex attribute and index data – are not fed in as vertex attributes and index lists for the glDrawElementsInstanced batches (because of the one buffer per vertex attribute or index list restriction). They would be fed into the vertex shader for these batches as texture buffers (TBOs).

So what goes in for the vertex attribute and index lists for the glDrawElementsInstanced call? Dummy data for a “single” triangle (3 vertices), which you’re not even gonna use. You’re just gonna throw it away in the vertex shader.

In the vertex shader, you go poking into your TBOs to grab the “real” vertex indices (and buffer index) and then vertex attributes that you want for that vertex shader run.

One other point of clarification. With normal glDrawElements batches as you know, there’s one buffer per vertex attribute (VBO), and there’s one buffer for the index list (IBO). The elements of the IBO index into the elements of the VBO. There’s no need for any “buffer handle/pointer/index/etc.” in the IBO in this scheme because for each vertex attribute, there is only one buffer object. …

… HOWEVER, in your case you want the ability of having a single triangle pull vertex attributes from “multiple” buffer objects (e.g. vtx 0 from buffer 0 offset 2, vtx 1 from buffer 0 offset 3, vtx 2 from buffer 1 offset 99). So different from plain glDrawElements indices, your index buffers (accessed through TBOs) need to have both a buffer index as well as an offset within the buffer to support that – at least for the triangles that cross batches. The specific encoding is up to you of course since you’ll be deciding how you want to store and lookup data from your TBOs.
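One possible encoding, shown only as a sketch: pack the buffer index into the top bits of a 32-bit value and the offset into the remaining bits. The 4/28 bit split and the names VertexRef, packRef and unpackRef are assumptions made for illustration; any split that covers your number of buffers and your largest buffer would do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packing: top 4 bits select the buffer, low 28 bits hold the offset.
constexpr unsigned kOffsetBits = 28;
constexpr std::uint32_t kOffsetMask = (1u << kOffsetBits) - 1u;

struct VertexRef {
    std::uint32_t buffer; // which position buffer
    std::uint32_t offset; // element offset inside that buffer
};

// Encode a (buffer, offset) pair into one 32-bit index entry.
std::uint32_t packRef(VertexRef r) {
    assert(r.buffer < (1u << (32 - kOffsetBits)));
    assert(r.offset <= kOffsetMask);
    return (r.buffer << kOffsetBits) | r.offset;
}

// Decode an index entry back into its (buffer, offset) pair.
VertexRef unpackRef(std::uint32_t packed) {
    return { packed >> kOffsetBits, packed & kOffsetMask };
}
```

With this split, "buffer 1 offset 99" is stored as 0x10000063 and "buffer 0 offset 2" is simply 2. The same shift and mask can be applied in the vertex shader to an unsigned integer fetched from the index TBO, and 28 offset bits cover the 128 Mtexel limit quoted earlier.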

hi DP, I’m frying my brain trying to write code that does what you propose, but I haven’t really been successful. Could I ask you to give me a little example of how to draw these 4 triangles, given that I have two VBOs (with three vertices each) and one IBO (with the 4 sets of 3 indices, from 0 to 5) as posted in the original post (i.e., VerticesT1, VerticesT2 and IdxT2 respectively)? With an example it would be much easier for me to extend it to my models. If you agree, it would be really nice :slight_smile: If not, I would understand.

The code that Alfonse posted seems to be GLSL (I think?), which I really don’t know.

Thanks for your time guys,

leo

The code that Alfonse posted seems to be GLSL (I think?), which I really don’t know.

You have to use GLSL or some other form of shader. OpenGL cannot normally do what you’re asking, so you have to do an end-run around its normal vertex attribute logic and pull the attributes yourself in the vertex shader.

What exactly does tessellation do regarding memory management? Is the geometry of the base mesh we pass kept in a separate buffer and duplicated to form the new, bigger geometry, or is it moved into new buffers? Or is it more complicated than this? Maybe I could try to copy what is done :smiley: Dunno :stuck_out_tongue:

If shader programming is the only way to go, I’m afraid I’ll accept a less ideal solution; I don’t really want to brave shader programming for now.

leo

Maybe I could try to copy what is done

What you’re doing is pretty rarely necessary, and incurs a non-trivial performance penalty in your vertex processing. So there isn’t a body of other programs that do something like this which you can copy from.

It is certainly doable with GL 3.0 and above. But it is not a switch you just turn on.

this is what I want to find out :smiley: If you know someone who you think could know how to do it, I would appreciate it if you sent him the link to this thread :slight_smile:

If you know someone who you think could know how to do it, I would appreciate it if you sent him the link to this thread

We have told you how to do it. I even gave you a bit of GLSL pseudo-code for how to do it. But like I said, it’s not simple. You will have to use shaders to do it.

You said, “If shader programming is the only way to go, I’m afraid I’ll accept a less ideal solution; I don’t really want to brave shader programming for now.” There is no way to do what it is you want without shaders. So there is nothing you can do, nobody that I could send to the thread.

What you want is a non-shader-based way to do this. It doesn’t exist.

quite what I was afraid to admit, but well, reality may be cruel sometimes :slight_smile:

I thank you for your time. I’ll keep your code in mind and will try to implement it in the not too distant future. For now I have to tune my tessellation-like algorithm.

thanks again for your support,

leo

We finally found a good solution for all this. Well, we did it quite a long time ago now, but it only just occurred to me that I never gave a final word on it here, for completeness and to honor the time that DP and Alfonse invested in this thread (and this may help someone else someday, who knows!). I would like to reference this post http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=279766&page=1 where we also discuss this same problem. As Alfonse said above, the solution was indeed to use GLSL. We started from DP’s code and, with a few modifications, we could address our requirements.

thx again DP and Alfonse,

leo