Indexing per vertex attribute

Hello all, this is my first post on these forums.

I come from a console platform background (most recently, GameCube) where we were able to access each vertex attribute (position, normal, texcoord, etc.) with its own unique index. Recently I became interested in adding OpenGL support to our engine, and found out that a single index represents an entire vertex across all attributes. In my opinion, this leads to unnecessary vertex duplication for shared vertices, because not all vertices at the same position have exactly the same attributes (for example, different texture coordinates) for every polygon attached to them. Having an index for each attribute would make vertex data more compact and eliminate repetitive data, resulting in better overall performance. Of course, how much performance would improve is entirely dependent on the model itself (and how well the indices are reused).
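To make this concrete, here is a minimal sketch (in Python, with a hypothetical data layout) of the “welding” step an engine must perform to go from per-attribute indices to OpenGL’s single-index form; a new unified vertex is created for each distinct combination of attribute indices:

```python
def weld(corners, positions, normals, uvs):
    """Collapse per-attribute index tuples into a single-index layout.

    corners: list of (pos_idx, normal_idx, uv_idx) tuples, one per
    face corner, as a GameCube-style multi-index mesh would store them.
    Returns (vertices, indices) in the unified form OpenGL expects.
    """
    remap = {}      # attribute-index tuple -> unified vertex index
    vertices = []   # unified vertex stream: (position, normal, uv)
    indices = []    # one index per corner
    for corner in corners:
        if corner not in remap:
            # First time this exact attribute combination appears:
            # emit a new unified vertex.
            remap[corner] = len(vertices)
            p, n, t = corner
            vertices.append((positions[p], normals[n], uvs[t]))
        indices.append(remap[corner])
    return vertices, indices
```

Note that duplication happens exactly where the index tuples differ (hard edges, UV seams), which is where all the memory arguments in this thread come from.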

I originally started a thread related to this on gamedev.net where I describe an example case. At the time I was curious if there was an extension I had overlooked, but it seems apparent that current hardware does not support this functionality.

I would like to propose that this functionality be added to OpenGL as a new extension. If necessary, I am willing to write up a complete specification, with documentation and the function calls needed to implement it from a developer’s standpoint. However, I’m not sure what other people’s opinions on this are. If a large number of developers would find something like this interesting, then I would like to know how to proceed (sending mail to nVidia or ATI, perhaps?). If there isn’t much of a response, I’ll just stick with what we have now. :) However, this specific functionality was very useful on the GameCube for keeping memory requirements small, and it most definitely helped performance by making the data more cache-coherent. I personally think this would be a very good benefit for the entire 3D community.

Thoughts/comments, anyone?

I don’t think it’s worth it.

  1. The memory savings in typical cases are small. The only places where I can see a saving are at hard edges (no shared normal) or at seams between different materials (different texture coordinates). For low-poly models, perhaps the ratio of edge vertices to smooth vertices is large enough, but otherwise I don’t think you’ll see a worthwhile saving, or any saving at all (all the extra indices take space too).

  2. It’s bad for memory access to grab vertices from all over the place.

  3. Hardware complexity increases a lot. For the post-shader vertex cache you currently only need to look at a single index to determine if it’s in the cache. With separate indices the cache would have to look at a large set of indices to see if they all match an item that’s in the cache.

i’d go along with humus on the basic points, but this would be a nice thing for sure if it fit into the current technology and didn’t cost anything. in that sense, the current implausibility is really more about how technology has been driven thus far by the typical usage model in games. with that in mind, i think the hardware details are largely irrelevant when considering the merits of a feature like this to begin with. obviously hardware can be changed/improved if the demand is there. but while there has been some noise in this area, no eardrums are bleeding.

i can live without a feature like this, but i wouldn’t be offended by its existence either… not if it was free, anyways ;-).

season’s greetings.

bpoint :
I would be very interested to read the basic specification needed for this. If you could write a short description, I think it would do my brain a lot of good.
As we do a lot of research on non-photorealistic rendering, we need a lot of different data per vertex (several color channels, several mapping channels for custom continuous Z screen mapping, curvature for silhouette width, etc.) in addition to the standard attributes.

We are often limited by vertex bandwidth due to the heavy replication. Doubtless, new techniques will need this kind of data too.

SeskaPeel.

This is a very common question, and here’s a common answer :)

If I remember rightly, the GameCube uses an ATI chipset for its graphics. As we know, ATI don’t think it’s a worthwhile feature (Humus speaks for ATI), so the OpenGL-based GameCube API must be transforming the vertex data on the CPU before submitting it to the ATI pipeline!
So, you shouldn’t really be using vertex data like this, bpoint.

Hello all.

How many attributes are there? In the GLSL spec it is declared as const int gl_MaxVertexAttribs = 16. What on earth shall we do with only one index? :)

[EDIT: may the Lord help you, bpoint…]

http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/005059.html :

Originally posted by jwatte:
We actually did measurements on real, live meshes, and it turned out that saving them using two (or three, for colors) index arrays would result in LARGER files than just normalizing the vertices, mainly because of the increased size of the extra index arrays.
This is enough for me.

Originally posted by bpoint

However, this specific functionality was very useful on the GameCube in keeping memory requirements small, and most definitely helped improve performance with data being more cache-coherent

I fail to see how having data in random (for the GPU) locations would lead to better cache coherence?

If you are talking specifically about a BSP mesh (or any other mesh where textures change, so texture coordinates cannot be shared and are therefore duplicated), then I agree that it can lead to quite a few redundant vertices, sometimes as many as in the base mesh (i.e. the number of vertices doubles). But considering that even large BSP maps don’t contain more than a few thousand vertices, I think such redundancy hardly affects performance, in terms of either speed or memory usage. Modern GPUs can easily pump millions of vertices per frame, and memory has gotten cheaper too, with even low-end consumer cards featuring 256MB of RAM. Realistically speaking, if you are not loading many BSP meshes (or whatever format you use) at any given time, you won’t be wasting much.
I think such a feature “might” lead to more and more people developing applications with poor performance.

If I remember rightly, the GameCube uses an ATI chipset for its graphics. As we know, ATI don’t think it’s a worthwhile feature (Humus speaks for ATI), so the OpenGL-based GameCube API must be transforming the vertex data on the CPU before submitting it to the ATI pipeline!
No, the GameCube used an ArtX chipset; ArtX was purchased by ATI about a year before the GC’s release. The GC’s GPU is specifically designed to use multiple indices for multiple vertex attributes, and it does so natively, without any CPU involvement.

I fail to see how having data in random (for the GPU) locations would lead to better cache coherence?
If you use the same position index more than once, the second time you use it that position may still be in the pre-T&L cache. So you save memory bandwidth, compared to uploading the same duplicated vertex position data with the traditional method.
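A toy model of that bandwidth argument (illustrative assumptions only: 12-byte positions, an ideal pre-T&L cache that never evicts, everything besides positions ignored):

```python
POSITION_BYTES = 12  # 3 floats per position

def fetched_bytes_single_index(num_unified_vertices):
    # Traditional layout: every unified vertex carries its own copy of
    # the position, so duplicates are fetched as separate data.
    return num_unified_vertices * POSITION_BYTES

def fetched_bytes_multi_index(position_indices):
    # Per-attribute indexing: a position is fetched from memory only the
    # first time its index appears; repeats hit the (idealized) cache.
    return len(set(position_indices)) * POSITION_BYTES
```

With six corners referencing only four distinct positions, the multi-index model fetches 48 bytes instead of 72, which is the saving being described above.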

Originally posted by Zulfiqar Malik:

But considering that even large BSP maps don’t contain more than a few thousand vertices, I think such redundancy hardly affects performance, in terms of either speed or memory usage.

Heh, if you’re talking about the days of Quake or maybe Quake 2, yes, you had 1000-2000 vertices PER FRAME. But if we’re talking about recent titles, say Doom 3, HL2 or whatever, I wouldn’t be surprised if the total vertex count per level is over a million or so. A simple example:

Total number of vertices = 1,000,000
Average vertex/polygon sharing = say, 3

That means we’d have to keep three copies of each vertex position, assuming all other attributes vary per face. The extra copies cost 12 bytes (3 floats) * 1,000,000 * 2 = 24,000,000 bytes of additional memory. Now subtract the memory used by the extra index arrays: 2 * 4 bytes (sizeof(int)) * 1,000,000 = 8,000,000 bytes. So the actual savings would be ~16MB.

16MB is an acceptable additional memory footprint, if we take into account that we already have 512MB cards, and the 1GB models are soon to follow.

In one year we might have 2 million vertices per level, and then we’d have to sacrifice 32MB. As already said, I think that’s acceptable; I mean, a 512x512 RGB16F cubemap already takes up 24MB…
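HellKnight’s back-of-envelope numbers above, reproduced as a script (the vertex count, the sharing factor of 3, and the assumption of one million entries per extra index array are all the post’s own):

```python
V = 1_000_000        # unique vertex positions in the level
SHARING = 3          # average copies of each position (post's assumption)
POS_BYTES = 3 * 4    # 12 bytes per position (3 floats)
INDEX_BYTES = 4      # sizeof(int)

# Extra memory from keeping (SHARING - 1) duplicate copies of each position.
extra_copies = (SHARING - 1) * V * POS_BYTES   # 24,000,000 bytes

# Memory consumed by the two additional index arrays.
extra_indices = 2 * INDEX_BYTES * V            # 8,000,000 bytes

savings = extra_copies - extra_indices         # ~16 MB saved by multi-index
```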

the problem with your sums is that you are assuming there are many vertices which need to be shared and duplicated because of a slight change in details. based on previous discussions on the subject, I’m pretty sure that in real-world data the amount of sharing to that degree is pretty low, so you’ll never see those savings.

Well, consider an average Quake level; the attributes (texture & lightmap coords) vary per face, so in fact these savings aren’t that far off. You wouldn’t get such savings on models with a single texture or just a few, say items, humans, weapons etc., though…

Originally posted by HellKnight:
Well, consider an average Quake level; the attributes (texture & lightmap coords) vary per face, so in fact these savings aren’t that far off. You wouldn’t get such savings on models with a single texture or just a few, say items, humans, weapons etc., though…
In Quake 1, texture/lightmap coordinates were computed by projecting the vertex position in a texture basis. There were a few duplicated vertices because of that.

But in Quake 3 there are no duplicated vertices at all.

Let’s do the math:
V = number of vertices
F = number of faces (triangles)
V2 = number of vertices we would have to duplicate
Each vertex has a position and 2 sets of texture coordinates (texture + lightmap), i.e. (3+2+2) floats.
Indices are shorts.

case A) 1 index per vertex in each face:
memA = V*(3+2+2)*4 + F*3*2 + V2*(3+2+2)*4

case B) 3 indices per vertex in each face:
memB = V*(3+2+2)*4 + F*3*3*2

diff = memB - memA = F*3*2*2 - V2*(3+2+2)*4
diff = 12*F - 28*V2
diff == 0 if V2 = F*12/28
diff == 0 if V2 ≈ 42.8% of F

We know that most of the time F >= V. Let’s conservatively assume F == V: more than 43% of the vertices would have to be duplicated before we save any memory! And if F = 2*V (typical for closed triangle meshes), the break-even point rises to roughly 86% of the vertices.
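tfpsly’s break-even calculation, in runnable form (28-byte vertices: position plus two UV sets as floats; 2-byte short indices; triangular faces):

```python
VERT_BYTES = (3 + 2 + 2) * 4   # pos + texture UV + lightmap UV, as floats
INDEX_BYTES = 2                # indices are shorts

def mem_single_index(V, F, V2):
    # Case A: one short index per corner, plus V2 duplicated vertices.
    return V * VERT_BYTES + F * 3 * INDEX_BYTES + V2 * VERT_BYTES

def mem_multi_index(V, F):
    # Case B: three short indices per corner, no duplicated vertices.
    return V * VERT_BYTES + F * 3 * 3 * INDEX_BYTES

def break_even_duplicates(F):
    # Case B only wins when more than 12*F/28 (~42.8% of F) vertices
    # would otherwise have to be duplicated.
    return 12 * F / 28
```

For example, with V = 1000 and F = 700, the break-even point is exactly 300 duplicated vertices, at which both layouts cost the same 40,600 bytes.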

Recently I was doing some research on reducing the number of draw calls in the penumbra-wedge soft shadow rendering algorithm. Until then a single index per vertex hadn’t seemed like a problem to me, although “normalizing” the data into a GL-acceptable form has always felt somewhat clumsy. But during this research I reached a point where I had to replicate the same data up to 28 (!!!) times for each vertex.

As an alternative implementation I tried fragment programs in a pseudo render-to-vertex-array scheme using PBOs, with texture sampling emulating the flexibility offered by independent vertex attribute indexing. But since the original algorithm is strongly bound by fragment processing rate, that solution slowed things down rather than speeding them up.

In my case, the simple ability to specify a per-attribute frequency would suffice, something like D3D instancing. Some could argue that this functionality is available on SM3.0 hardware (the same hardware that supports vertex texturing), and that we can emulate the flexible data fetching offered by per-attribute indices using vertex texture sampling. But IMHO, explicit representation of such functionality in the API would be beneficial, both for ease and clarity of use and for the ability to emulate it on architectures that don’t support it directly in hardware, for example by in-place expansion of the data (whereas vertex texturing cannot be emulated so easily). Of course my case could be totally isolated, but I think that as graphics algorithms grow more complicated, such demands will surface more frequently (as we have been witnessing on this forum for quite a while). And the arguments about the memory used by additional indices don’t really convince me, because the solution can potentially provide a 1:4 memory conservation ratio (assuming 4-component attribute vectors).

Measurements taken on “usual” scenes with “conventional” attribute usage don’t reflect all the potential usage schemes for such functionality. As for the cache-coherence argument and the implementation complexity: hardware vendors could start with a very basic implementation, skipping caching where it is too difficult, and let developers decide whether to use the feature based on the results they get in their particular case. Needless to say, implementing it “the right way” ;) could lead to healthy competition between hardware vendors.
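The per-attribute frequency idea mentioned above can at least be emulated on the CPU by in-place expansion; a trivial sketch (names are illustrative, not any real API):

```python
def expand_attribute(values, freq):
    """Repeat each attribute value `freq` times, turning a
    frequency-divided stream into a plain per-vertex stream that a
    single-index pipeline can consume."""
    out = []
    for v in values:
        out.extend([v] * freq)
    return out
```

For example, a per-instance color stream with a frequency of 3 expands to one copy per vertex of each 3-vertex instance.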

Originally posted by tfpsly:

But in Quake 3 there are no duplicated vertices at all.

What do you mean, in Q3 there are no duplicated vertices at all?!? Whenever two or more faces share some vertices, and that’s ALL the time or there wouldn’t be closed geometry at all, the vertex position (=12 bytes) is duplicated. Yeah, sure, the attributes are stored directly in the vertices instead of using texture planes in order to save disk space, but some of the vertex positions are still duplicated.

I must admit I don’t get your calculations. Why do you need the number of faces actually?

Originally posted by HellKnight:
What do you mean, in Q3 there are no duplicated vertices at all?!? Whenever two or more faces share some vertices, and that’s ALL the time or there wouldn’t be closed geometry at all, the vertex position (=12 bytes) is duplicated. Yeah, sure, the attributes are stored directly in the vertices instead of using texture planes in order to save disk space, but some of the vertex positions are still duplicated.
Hmm, sorry: the Q3 files already contain the duplicate vertices. I was just looking at my loader code :(

I must admit I don’t get your calculations. Why do you need the number of faces actually?
To compute the number of indices :P
1 face = 3 vertex indices, or = 3 position indices + 3 texture UV indices + 3 lightmap UV indices.

I remember that at work, in the Collada files we get after exporting from the 3D editor, we do get indexed positions & indexed UVs, and I had to rebuild our own vertices when writing the Collada importer. But the amount of duplication is very low in our case.

Originally posted by HellKnight:
What do you mean, in Q3 there are no duplicated vertices at all?!? Whenever two or more faces share some vertices, and that’s ALL the time or there wouldn’t be closed geometry at all, the vertex position (=12 bytes) is duplicated. Yeah, sure, the attributes are stored directly in the vertices instead of using texture planes in order to save disk space, but some of the vertex positions are still duplicated.
Oh, wait! In Q1 and Q3 maps you actually don’t need the texture coordinates at all! You can generate them in the vertex program with 1 dot3 and 1 sub, as long as you set one vector and one float (together just a vec4) VP constant each time you change the current texture!

And to display the other models (characters, weapons, bonus items, …) you would just proceed as usual, as there are very few duplicate vertices.
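The texgen tfpsly describes, sketched on the CPU (the sign convention and the exact constant packing are assumptions; the idea is just projecting the position onto a texture-basis vector and adding an offset):

```python
def quake_texgen(pos, s_axis, s_offset, t_axis, t_offset):
    # One dot product plus one add per coordinate, the same per-coordinate
    # work a vertex program would do with a packed (axis, offset) constant.
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return (dot(pos, s_axis) + s_offset,
            dot(pos, t_axis) + t_offset)
```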

Originally posted by tfpsly:
Oh, wait! In Q1 and Q3 maps you actually don’t need the texture coordinates at all! You can generate them in the vertex program with 1 dot3 and 1 sub, as long as you set one vector and one float (together just a vec4) VP constant each time you change the current texture!

Oh yes :) , you mean every time you draw a face you’d set up some uniform vertex program parameters, change texture states, etc… In that case I’d just do all the computations on the CPU and use immediate mode, hehe :)


To compute the number of indices :P
1 face = 3 vertex indices or = 3 vertex pos indices + 3 texture uv indices + 3 lightmap texture uv indices.

Ah, now it seems logical :) I just didn’t assume you meant triangles by faces, because most of the faces in Quake have more than just 3 vertices. Never mind…