Element Buffer Objects?

Hi Folks:

Element Buffer Objects sound like great things.

I don’t see an EBO option when exporting an OBJ model in Blender. Perhaps it has a different name.

It’s nice to see sample code that shows 4 vertices and an index array representing what took 6 vertices.

Where does the magic happen that turns 600,000 vertices into 400,000 vertices and an index array?

Are there downsides to using EBOs? I’m wondering how normals are represented, and about the overhead of having every access to a vertex be through a reference in an index array.

Thanks
Larry

there aren't any

the “key word” is “indexed drawing” / “indexed rendering”

there is no magic, consider the info you give GL:
glDrawElements(mode, count, type, indices);
https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glDrawElements.xhtml

– mode: like in "usual" rendering, tells GL how to assemble vertices into primitives
– count: like in "usual" rendering, tells GL how many vertices to process
– type: unlike in "usual" rendering, tells GL the data type of the indices in the element buffer
(either GL_UNSIGNED_BYTE (1 byte), GL_UNSIGNED_SHORT (2 bytes) or GL_UNSIGNED_INT (4 bytes))
– indices: unlike in "usual" rendering, tells GL the offset into the element buffer in bytes; divide that number by the size of 1 index (either 1, 2 or 4) and you get the offset in indices

instead of giving GL a range of vertices to process, you give GL an element buffer and a range within that buffer; GL then reads each index from it, looks up the vertex data at that array index, and processes that vertex
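as a plain-C++ sketch of what that lookup amounts to (names like Vec3 and fetchVertex are illustrative, not GL API): a quad stored as 4 unique vertices plus 6 indices, where the driver conceptually fetches vertices[indices[i]] for each i

```cpp
#include <array>
#include <cstddef>

struct Vec3 { float x, y, z; };

// 4 unique vertices for a unit quad (two triangles share the diagonal)
const std::array<Vec3, 4> vertices = {{
    {0.0f, 0.0f, 0.0f},   // 0: bottom-left
    {1.0f, 0.0f, 0.0f},   // 1: bottom-right
    {1.0f, 1.0f, 0.0f},   // 2: top-right
    {0.0f, 1.0f, 0.0f},   // 3: top-left
}};

// 6 indices: two triangles; vertices 0 and 2 are referenced twice
const std::array<unsigned, 6> indices = {0, 1, 2, 2, 3, 0};

// What "indexed drawing" boils down to: for each slot in the element
// buffer, look up the vertex named by that index and feed it onward.
Vec3 fetchVertex(std::size_t i) {
    return vertices[indices[i]];
}
```

without the index array, the same quad would need 6 vertex records, with vertices 0 and 2 duplicated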

downside: maybe the additional buffer you have to manage
upside: a (possibly small) performance boost, and maybe (a little) less memory

https://www.khronos.org/opengl/wiki/Post_Transform_Cache

the post-transform cache stores processed vertices and prevents processing the same vertex twice. to identify already-processed vertices it compares only gl_VertexID and gl_InstanceID; if these match, it skips that vertex and looks up the result in the cache
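a toy simulation of that idea in plain C++ (real caches are small and evict old entries; this sketch assumes an unbounded cache, so it counts the best-case savings only):

```cpp
#include <unordered_set>
#include <vector>

struct CacheStats { int shaderRuns = 0; int cacheHits = 0; };

// Toy post-transform cache keyed by vertex index (a stand-in for
// gl_VertexID in a non-instanced draw). For each index: if the vertex
// was already processed, reuse the cached result; otherwise "run the
// vertex shader" once and remember it.
CacheStats simulateDraw(const std::vector<unsigned>& indices) {
    CacheStats stats;
    std::unordered_set<unsigned> cached;
    for (unsigned id : indices) {
        if (cached.count(id)) { ++stats.cacheHits; continue; }  // skip re-processing
        cached.insert(id);   // process the vertex and store the result
        ++stats.shaderRuns;
    }
    return stats;
}
```

for the 6-index quad {0,1,2, 2,3,0} this gives 4 shader runs and 2 cache hits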

regarding .obj file format:
you’ll get different indices for position / texcoord / normal, but GL only lets you use 1 element buffer, not 3, so you have to rebuild that obj vertex data anyway

you have:
vec3 obj_positions[MAX1];
vec2 obj_texcoords[MAX2];
vec3 obj_normals[MAX3];

your goal is to have 2 arrays, 1 of type “Vertex” and another of type unsigned int (the face indices):
struct Vertex {
    vec3 Position;
    vec2 TexCoord;
    vec3 Normal;
};

you can make use of a std::map to re-sort the vertex data efficiently
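a minimal sketch of that re-sort (the names here are illustrative, not from any particular loader): each unique v/vt/vn index triple from the .obj face records becomes one entry in the combined vertex array, and a std::map finds triples we have already seen

```cpp
#include <array>
#include <map>
#include <vector>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

struct Vertex { Vec3 Position; Vec2 TexCoord; Vec3 Normal; };

// One corner of an .obj face: a "v/vt/vn" index triple.
using ObjIndex = std::array<int, 3>;

// Build a single vertex array + element array from the .obj-style
// separate attribute arrays. The map deduplicates identical triples,
// so a corner shared by several faces becomes one vertex, referenced
// several times from outIndices.
void buildIndexedMesh(const std::vector<Vec3>& objPositions,
                      const std::vector<Vec2>& objTexcoords,
                      const std::vector<Vec3>& objNormals,
                      const std::vector<ObjIndex>& faceCorners,
                      std::vector<Vertex>& outVertices,
                      std::vector<unsigned>& outIndices) {
    std::map<ObjIndex, unsigned> seen;
    for (const ObjIndex& c : faceCorners) {
        auto it = seen.find(c);
        if (it == seen.end()) {
            // First time this v/vt/vn combination appears: emit a new vertex.
            Vertex v{objPositions[c[0]], objTexcoords[c[1]], objNormals[c[2]]};
            unsigned newIndex = static_cast<unsigned>(outVertices.size());
            outVertices.push_back(v);
            it = seen.emplace(c, newIndex).first;
        }
        outIndices.push_back(it->second);
    }
}
```

outVertices then goes into the vertex buffer and outIndices into the element buffer, ready for glDrawElements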

Thanks John:

I’m just curious about this feature.

It appears that the magic happens inside of Assimp, and the indices are delivered to the application in aiMesh’s mFaces. I’m satisfied with that.

As always, I appreciate your input.

  Thanks
  Larry

Yep. All attributes must be the same. If one component of a normal is different, then you’ll have another index. If one component of the texture coordinate is different, then the same. Same for colors and any other attributes.
So, this does not go well with cubes, pyramids and so on.
It goes well with any model with no continuity break for all of the attributes.

Indexed rendering is actually the optimal path on desktop hardware and has been for almost 20 years: the items you’re concerned about are not problems.

Some historical context. In 1999 the game Quake III Arena was released, and its lead developer wrote the following document providing advice for driver writers: Optimizing OpenGL drivers for Quake3

Quake3’s rendering architecture has been defined with the primary goal of minimizing API calls and focusing as much work as possible in a single place to make optimization more productive.

During gameplay, 99.9% of all primitives go through a single API point:

glDrawElements( GL_TRIANGLES, numIndexes, GL_UNSIGNED_INT, indexes );

In addition to removing duplicate vertices, which is typically the first thing people notice, using indices also confers two other advantages.

Your 3D card’s post-transform vertex cache can only work if you use indices.

A model that is represented by multiple strips and/or fans requiring multiple draw calls can be easily converted to a triangle soup that only requires a single draw call.
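For example, one triangle strip can be rewritten into triangle-list indices like this (a sketch; stripToTriangles is an illustrative name, and the odd-triangle swap assumes the standard GL strip winding rule):

```cpp
#include <cstddef>
#include <vector>

// Convert one triangle strip (given as vertex indices) into plain
// triangle-list indices. Every strip vertex after the first two
// completes a triangle; the winding alternates, so odd-numbered
// triangles swap their first two corners to keep a consistent
// front face.
std::vector<unsigned> stripToTriangles(const std::vector<unsigned>& strip) {
    std::vector<unsigned> tris;
    for (std::size_t i = 2; i < strip.size(); ++i) {
        if (i % 2 == 0)
            tris.insert(tris.end(), {strip[i - 2], strip[i - 1], strip[i]});
        else
            tris.insert(tris.end(), {strip[i - 1], strip[i - 2], strip[i]});
    }
    return tris;
}
```

Concatenate the output for every strip and fan in the model and the whole thing can be drawn with a single glDrawElements(GL_TRIANGLES, …) call.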

Thanks Again Folks:

Bypassing redundant vertex shader work. That’s a powerful advantage of using an EBO, even on an irregular mesh where indexing might leave almost as many (or just as many) vertices, plus the added list of corresponding indices.

It seems like any camera or object movement would invalidate the vertex cache, at least for the moving object.

I appreciate the response to my question, and I know a lot more than I did before asking. But this conversation is getting to be quite a bit above my head.

The sample code in this tutorial includes an interface to Assimp, which I’ve now taken the time to study.

Larry

The vertex cache only persists for a single draw call. It just means that vertices which are common to multiple primitives aren’t recalculated. For a smooth surface which is tessellated into triangles, each vertex will typically be used by between five and six triangles.

This can’t be emphasised enough.

Typically when people consider the benefits of adding indices, they will focus on memory saving from reducing the number of vertices, and challenge the theoretical saving by providing examples such as cubes or pyramids, where the saving doesn’t actually exist. However, in real-world usage there does tend to be significantly higher vertex reuse.

Take the Stanford Bunny as an example: The Stanford 3D Scanning Repository - that clocks in at 35947 vertices and 69451 triangles, or in other words 35947 vertices and 208353 indices. So each vertex is reused on average about 5.8 times.
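That figure is easy to check: 69451 triangles × 3 corners per triangle = 208353 indices, and 208353 / 35947 ≈ 5.8.

```cpp
// Average number of times each vertex is referenced in an indexed
// triangle mesh: every triangle contributes 3 indices, each index
// names one of the unique vertices.
double averageReuse(long triangles, long vertices) {
    return triangles * 3.0 / vertices;
}
```

For the bunny, averageReuse(69451, 35947) comes out just under 5.8.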

In that example, and if vertex ordering is set up for optimal cache coherency, we can cut our vertex processing overhead by a factor of almost 6.

That doesn’t always translate to a 6x overall performance increase, and it shouldn’t be expected to. Your program may be bottlenecked elsewhere (typically on the CPU or by fillrate). It does however ably illustrate that the typical cube-based challenges to indexing are quite bogus for the general case.

[QUOTE=mhagain;1287446]This can’t be emphasised enough.

Take the Stanford Bunny as an example: The Stanford 3D Scanning Repository - that clocks in at 35947 vertices and 69451 triangles, or in other words 35947 vertices and 208353 indices. So each vertex is reused on average about 5.8 times.

In that example, and if vertex ordering is set up for optimal cache coherency, we can cut our vertex processing overhead by a factor of almost 6.[/QUOTE]

how big are these caches “nowadays”? for example, for an NVIDIA GT 640, where can I find its cache size? I couldn’t find it anywhere in the card’s specs

Years ago (early 2000s), these post-transform caches had a fixed size in numbers of vertices.

For the last 10 years or so (since GPUs have gone the unified shader processor route), these post-transform caches are now sized dynamically based on the amount of shared memory that the Compute Units (Streaming Multiprocessors) have access to (AFAIK). The more “interpolator” data (aka varyings) you pass between the vertex and fragment pipes, the smaller your post-transform vertex cache is in terms of number of vertices.

This is something really useful to know. I thought the post-T&L cache still had a fixed size these days.