
Indexing per vertex attribute



Falken42
11-30-2005, 03:38 PM
Hello all, this is my first post on these forums.

I come from a console platform background (most recently, GameCube) where we were able to access each vertex attribute (position, normal, texcoord, etc) with its own unique index. Recently I became interested in attempting to add OpenGL support to our engine, and found out that a single index represents an entire vertex for all attributes. In my opinion, this leads to unnecessary vertex duplication for shared vertices, because vertices at the same position do not always have exactly the same attributes (for example, different texture coordinates) for all of the polygons attached to them. Having an index for each attribute would allow vertex data to be more compact, and repetitive data would be eliminated, resulting in overall better performance. Of course, how much performance would increase would be entirely dependent on the model itself (and how well the indices are reused).
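
For readers who haven't worked with the console-style layout, here is a minimal sketch of the two representations being compared; the struct and field names below are illustrative only, not taken from any particular engine or API:

```cpp
// Minimal sketch of the two layouts (names are illustrative only).
#include <cstdint>
#include <vector>

// Single-index layout (what OpenGL expects): one stream of "fat" vertices,
// one index per corner. Any corner that differs in *any* attribute needs a
// whole duplicated vertex.
struct FatVertex {
    float position[3];
    float normal[3];
    float texcoord[2];
};
struct SingleIndexMesh {
    std::vector<FatVertex> vertices;   // duplicates at hard edges / UV seams
    std::vector<uint16_t>  indices;    // 3 per triangle
};

// Per-attribute-index layout (GameCube-style): each attribute has its own
// pool and its own index stream, so positions, normals and texcoords are
// shared independently of each other.
struct MultiIndexMesh {
    std::vector<float>    positions;   // 3 floats per entry, no duplicates
    std::vector<float>    normals;     // 3 floats per entry, no duplicates
    std::vector<float>    texcoords;   // 2 floats per entry, no duplicates
    std::vector<uint16_t> posIndices;  // 3 per triangle
    std::vector<uint16_t> nrmIndices;  // 3 per triangle
    std::vector<uint16_t> uvIndices;   // 3 per triangle
};
```

With the first layout, a UV seam forces the whole 32-byte vertex to be duplicated; with the second, only an extra 2-byte index per attribute stream is paid.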

I originally started a thread related to this on gamedev.net (http://www.gamedev.net/community/forums/topic.asp?topic_id=360924) where I describe an example case. At the time I was curious if there was an extension I had overlooked, but it seems apparent that current hardware does not support this functionality.

I would like to propose that this functionality be added as a new extension into OpenGL. If necessary, I am willing to write up a complete specification with documentation and function calls necessary to implement it from a developer standpoint. However, I'm not sure on what other people's opinions on this are. If there are a large number of developers who would find something like this interesting, then I would like to know how to proceed (sending mail to nVidia or ATI, perhaps?). If there isn't much of a response, I'll just stick with what we have now. :) However, this specific functionality was very useful on the GameCube in keeping memory requirements small, and most definitely helped improve performance with data being more cache-coherent. I personally think this would be a very good benefit for the entire 3D community.

Thoughts/comments, anyone?

Humus
11-30-2005, 08:10 PM
I don't think it's worth it.

1) The memory save in typical cases is small. The only places where I can see a save is at hard edges (no shared normal) or seams between different materials (different texture coordinates). For low-poly models perhaps the ratio between edge vertices and smooth vertices is large enough, but otherwise I don't think you'll see a good enough save, or a save at all (all the extra indices take space too).

2) It's bad for memory access to grab vertices from all over the place.

3) Hardware complexity increases a lot. For the post-shader vertex cache you currently only need to look at a single index to determine if it's in the cache. With separate indices the cache would have to look at a large set of indices to see if they all match an item that's in the cache.
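
As a toy illustration of point 3 (this is not how real hardware is built; the 16 is just the GLSL minimum attribute count), the tag comparison a post-transform cache would have to do might look like this:

```cpp
// Sketch of why the post-transform cache check gets more expensive
// (purely illustrative; real hardware caches are not written in C++).
#include <array>
#include <cstdint>

// Today: a cache entry is tagged by one number.
bool hitSingle(uint32_t tag, uint32_t vertexIndex) {
    return tag == vertexIndex;              // one compare per cache slot
}

// With per-attribute indices: the tag is the whole tuple of indices, and
// every component has to match before the shaded result can be reused.
constexpr int kNumAttribs = 16;             // assumed attribute count
using IndexTuple = std::array<uint32_t, kNumAttribs>;

bool hitMulti(const IndexTuple& tag, const IndexTuple& indices) {
    return tag == indices;                  // up to 16 compares per cache slot
}
```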

Brolingstanz
11-30-2005, 08:42 PM
i'd go along with humus on the basic points, but this would be a nice thing for sure if it fit into the current technology and didn't cost anything. in that sense, the current implausibility is really more about how technology has been driven thus far by the typical usage model in games. with that in mind, i think the hardware details are largely irrelevant when considering the merits of a feature like this to begin with. obviously hardware can be changed/improved if the demand is there. but while there has been some noise in this area, no eardrums are bleeding.

i can live without a feature like this, but i wouldn't be offended by its existence either... not if it was free, anyways ;-).

season's greetings.

SeskaPeel
11-30-2005, 10:05 PM
bpoint :
I would be very interested in reading the basic specification needed for this. If you could write a short description, I think it would do my brain a lot of good.
As we are doing a lot of research on non-photorealistic rendering, we need a lot of different data per vertex (several color channels, several mapping channels for custom continuous Z screen mapping, curvature for silhouette width, etc.) in addition to the standard attributes.

We are often limited by vertex bandwidth due to the heavy replication. Doubtless, new techniques will need to use this data too.

SeskaPeel.

Tom Nuydens
12-01-2005, 01:04 AM
This is a very common question, and here's a common answer (http://www.mindcontrol.org/~hplus/graphics/vertex-arrays.html) :)

knackered
12-01-2005, 03:29 AM
If I remember rightly, the GameCube uses an ATI chipset for its graphics - so, as we know, ATI don't think it's a worthwhile feature (Humus speaks for ATI), which must mean that the OpenGL-based GameCube API must be transforming the vertex data on the CPU before submitting it to the ATI pipeline!
So, you shouldn't really be using vertex data like this, bpoint.

Racoon
12-01-2005, 04:52 AM
Hello all.

How many attributes exist? In the GLSL spec it is declared as const int gl_MaxVertexAttribs = 16. What on earth are we supposed to do with only one index? :)

[EDIT: may the Lord help you, bpoint...]

tfpsly
12-01-2005, 07:40 AM
http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/005059.html :

Originally posted by jwatte:
We actually did measurements on real, live meshes, and it turned out that saving them using two (or three, for colors) index arrays would result in LARGER files than just normalizing the vertices; mainly because of the increased size of the extra index arrays.

This is enough for me.

Zulfiqar Malik
12-01-2005, 08:05 AM
Originally posted by bpoint

However, this specific functionality was very useful on the GameCube in keeping memory requirements small, and most definitely helped improve performance with data being more cache-coherent

I fail to see how having data in random (from the GPU's point of view) locations would lead to better cache coherence.

If you are talking specifically about a BSP mesh (or any other mesh where textures change, and hence texture coordinates cannot be shared and are duplicated), then I agree that it can lead to quite a few redundant vertices, sometimes as many as in the base mesh (i.e. the number of vertices doubles). But considering that even large BSP maps don't contain more than a few thousand vertices, I think that such redundancy hardly affects performance, in terms of either speed or memory usage. Modern-day GPUs are capable of pumping millions of vertices per frame. Memory has gotten cheaper too, with even low-end consumer cards featuring 256MB of RAM. Realistically speaking, if you are not loading many BSP meshes (or whatever format you use) at any given time, you won't be wasting much.
I think such a feature "might" lead to more and more people developing applications with poor performance.

Korval
12-01-2005, 10:34 AM
If I remember rightly, the GameCube uses an ATI chipset for its graphics - so, as we know, ATI don't think it's a worthwhile feature (Humus speaks for ATI), which must mean that the OpenGL-based GameCube API must be transforming the vertex data on the CPU before submitting it to the ATI pipeline!

No, the GameCube used an ArtX chipset. ArtX was purchased by ATi about a year before the GC's release. The GC's GPU is specifically designed for the ability to use multiple indices for multiple vertex attributes, and it does so natively without CPU intervention.


I fail to see how having data in random (from the GPU's point of view) locations would lead to better cache coherence.

If you use the same position index more than once, the second time you use it that position may still be in the pre-T&L cache. So you save memory bandwidth, compared to uploading the same duplicated vertex position data with the traditional method.

HellKnight
12-01-2005, 10:39 AM
Originally posted by Zulfiqar Malik:


But considering that even large BSP maps don't contain more than a few thousand vertices i think that such redundancy hardly effects the performance both in terms of speed and memory usage.


Heh, if you're talking about the days of Quake or maybe Quake 2, yes, you had 1000 - 2000 vertices PER FRAME. But if we're talking about more recent games, say Doom3, HL2 or whatever, I wouldn't be surprised if the total vertex count per level is over a million or so. A simple example.

Total number of vertices = 1.000.000
Average vertex polygon sharing = say 3

That'd mean we'd have to have three copies of the vertex positions, assuming that all other attributes vary per face. So 12 bytes (3 floats) * 1.000.000 * 2 extra copies = 24.000.000 bytes of ADDITIONAL memory usage. Now subtract the memory used up by the extra indices, 2 * 4 (sizeof(int)) * 1.000.000 = 8.000.000 bytes. So the actual savings would be ~16MB.

16MB is an acceptable additional memory footprint, if we take into account that we already have 512MB cards, and the 1GB models are soon to follow.

In one year we'll have 2 million vertices per level; well, then we'd have to sacrifice 32MB. As already said, I think that's acceptable; I mean, a 512x512 RGB16F cubemap already takes up 24MB...

bobvodka
12-01-2005, 11:13 AM
The problem with your sums is that you are assuming there are many vertices which need to be shared and duplicated because of a slight change in details, but based on previous discussions on the subject I'm pretty sure that in real-world data the amount of sharing to that degree is pretty low, so you'll never see those savings.

HellKnight
12-01-2005, 11:27 AM
Well, consider an average Quake level; the attributes (texture & lightmap coords) vary per face, so in fact these savings aren't that far off. You wouldn't get such savings on models with a single or a few textures, say items, humans, weapons etc. though...

tfpsly
12-01-2005, 11:41 AM
Originally posted by HellKnight:
Well, consider an average Quake level; the attributes (texture & lightmap coords) vary per face, so in fact these savings aren't that far off. You wouldn't get such savings on models with a single or a few textures, say items, humans, weapons etc. though...

In Quake 1, texture/lightmap coordinates were computed by projecting the vertex position onto a texture basis. There were a few duplicated vertices because of that.

But in Quake 3 there are no duplicated vertices at all.

tfpsly
12-01-2005, 11:54 AM
Let's do the maths:
V = number of vertices
F = number of faces (triangles)
V2 = number of vertices we would have to duplicate
Each vertex has a position and two texture coordinate pairs (texture + lightmap), i.e. (3+2+2) floats = 28 bytes.
Indices are shorts (2 bytes).

Case A) 1 index per vertex in each face:
mem = V*(3+2+2)*4 + F*3*2 + V2*(3+2+2)*4

Case B) 3 indices per vertex in each face:
mem = V*(3+2+2)*4 + F*3*3*2

diff = B - A = F*3*2*2 - V2*(3+2+2)*4
diff = 12*F - 28*V2
diff == 0 if V2 = F*12/28
diff == 0 if V2 = 42.8% of F
We know that most of the time F > V. Let's assume F == V: we have to have more than 43% of the vertices duplicated before we save any memory!
If F = 2*V, the break-even point rises to roughly 86% of the vertices being duplicated.
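
For anyone who wants to play with the numbers, here's the same comparison as throwaway C++; the vertex layout (position plus two UV sets), the 16-bit indices and the F = 2*V case are the assumptions above, everything else is arbitrary:

```cpp
// Plugging the formulas above into code (illustrative only).
#include <cstdio>

// Bytes used with a single index per corner, given V2 duplicated vertices.
long long memSingleIndex(long long V, long long F, long long V2) {
    return (V + V2) * (3 + 2 + 2) * 4   // shared + duplicated vertices
         + F * 3 * 2;                   // one 16-bit index per corner
}

// Bytes used with three indices per corner (pos, uv, lightmap uv).
long long memMultiIndex(long long V, long long F) {
    return V * (3 + 2 + 2) * 4          // no duplicates needed
         + F * 3 * 3 * 2;               // three 16-bit indices per corner
}

int main() {
    long long V = 100000, F = 2 * V;
    // Sweep the duplication ratio to find the break-even point (~86% here).
    for (double dup = 0.0; dup <= 1.0; dup += 0.1) {
        long long V2 = static_cast<long long>(dup * V);
        std::printf("dup=%.0f%%  single=%lld  multi=%lld\n",
                    dup * 100, memSingleIndex(V, F, V2), memMultiIndex(V, F));
    }
}
```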

Dez
12-01-2005, 12:11 PM
Recently I was doing some research in the field of reducing the number of draw calls in the penumbra-wedge soft shadow rendering algorithm. Up until then a single index per vertex didn't seem to be a problem for me, although "normalizing" the data into a GL-acceptable form has always felt somewhat clumsy. But during the recent research I came to a point where I had to replicate the same data up to as many as 28 (!!!) times for each vertex.

As an alternative implementation I tried to use fragment programs in a pseudo render-to-vertex-array scheme using PBOs. In the fragment program implementation I used texture sampling as a way of emulating the flexibility offered by independent vertex attribute indexing. But as the original algorithm is strongly bound by fragment processing rate, the aforementioned solution slowed things down rather than speeding them up. In my case the simple ability to specify a per-attribute frequency would suffice, something like D3D instancing.

Some could argue that this functionality is available on SM3.0 hardware, that is, the same hardware that supports vertex texturing, and that we can emulate the flexible data fetching offered by per-attribute indices using vertex texture sampling. But IMHO explicit representation of such functionality in the API would be beneficial both for ease and clarity of use by developers and for the ability to emulate it on architectures which don't directly support it in hardware, for example by in-place expansion of data (whereas vertex texturing cannot be emulated so easily).

Of course my case could be totally isolated, but I think that as graphics algorithms grow more and more complicated such demands will surface more frequently (as we have been witnessing on this forum for quite a while). And the arguments about the increased amount of memory used by additional indices don't really convince me, because the solution can potentially provide a 1:4 memory ratio (assuming 4-component attribute vectors). Measurements taken for "usual" scenes and "conventional" attribute usage do not always reflect all the potential usage schemes for such functionality.

As for the cache coherence argument and the complexity of implementation, I would say that hardware vendors could use a very basic implementation, skipping caching where it is too difficult, and let developers decide whether to use this functionality based on the results they get in their unique case. Needless to say, implementing this "the right way" ;) could lead to healthy competition between hardware vendors.
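
A minimal sketch of what such a per-attribute fetch frequency looks like in an API; the sketch below happens to match what OpenGL later standardized as instanced arrays (glVertexAttribDivisor in ARB_instanced_arrays / GL 3.3), and the buffer handles and attribute locations are placeholders:

```cpp
// Sketch of per-attribute frequency via instanced arrays.
#include <GL/glew.h>

void setupInstancedAttribs(GLuint perVertexVbo, GLuint perInstanceVbo) {
    // Attribute 0: position, advances once per vertex as usual.
    glBindBuffer(GL_ARRAY_BUFFER, perVertexVbo);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);

    // Attribute 1: e.g. a per-object color, advances once per *instance*,
    // so it is stored exactly once instead of being replicated per vertex.
    glBindBuffer(GL_ARRAY_BUFFER, perInstanceVbo);
    glEnableVertexAttribArray(1);
    glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 0, nullptr);
    glVertexAttribDivisor(1, 1);   // fetch frequency: once per instance
}
```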

HellKnight
12-01-2005, 12:19 PM
Originally posted by tfpsly:

But in Quake 3 there are no duplicated vertices at all.

What do you mean, in Q3 there are no duplicated vertices at all?!? Whenever two or more faces share some vertices, and that's ALL the time or there wouldn't be closed geometry at all, the vertex position (= 12 bytes) is duplicated. Yeah, sure, the attributes are stored directly in the vertices instead of using texture planes in order to save disk space, but some of the vertex positions are still duplicated.

I must admit I don't get your calculations. Why do you need the number of faces actually?

tfpsly
12-01-2005, 12:31 PM
Originally posted by HellKnight:

Originally posted by tfpsly:

But in Quake 3 there are no duplicated vertices at all.

What do you mean, in Q3 there are no duplicated vertices at all?!? Whenever two or more faces share some vertices, and that's ALL the time or there wouldn't be closed geometry at all, the vertex position (= 12 bytes) is duplicated. Yeah, sure, the attributes are stored directly in the vertices instead of using texture planes in order to save disk space, but some of the vertex positions are still duplicated.

Hum, sorry, the Q3 files already have the duplicate vertices; I was just looking at my loader code :(


I must admit I don't get your calculations. Why do you need the number of faces actually?

To compute the number of indices :p
1 face = 3 vertex indices or = 3 vertex pos indices + 3 texture uv indices + 3 lightmap texture uv indices.

I remember that at work, in the Collada file format we get after exporting from the 3d editor, we do get indexed pos & indexed uv and I had to rebuild our own vertices when writing the Collada importer. But the amount of duplicates is very low in our case.

tfpsly
12-01-2005, 12:43 PM
Originally posted by HellKnight:
What do you mean, in Q3 there are no duplicated vertices at all?!? Always when two or more faces share some vertices, and that's ALL the time or there wouldn't be closed geometry at all, the vertex position (=12 bytes) is duplicated. Yeah, sure, the attributes are stored directly in the vertices instead of using texture planes in order to save disk space, but some of the vertex positions are still duplicated.Oh, wait! In Q1 and Q3 maps, you actually do not need at all the texture coordinates! You may generate them in the vertex program with 1 dot3 and 1 sub as long as you set one vector and one float (which is just a vec4) VP constant each time you change the current texture!

And to display the other models (characters, weapons, bonus, ...), you would just proceed as usual, as there are very few duplicate vertices.
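
For anyone unfamiliar with the planar texgen I'm referring to: Quake-style maps store two texture axes per surface, so UVs can be derived from the vertex position. A CPU-side sketch of the same math (struct and field names are illustrative, not the actual Quake structures):

```cpp
// Planar texgen: each surface carries two texture axes plus offsets.
struct TexInfo {
    float sAxis[3], sOffset;   // u = dot(pos, sAxis) + sOffset
    float tAxis[3], tOffset;   // v = dot(pos, tAxis) + tOffset
};

static float dot3(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// In a vertex program the same thing is a couple of DP3/ADD instructions,
// with the axes uploaded as program constants once per material.
void computeUV(const float pos[3], const TexInfo& ti,
               float texW, float texH, float uvOut[2]) {
    uvOut[0] = (dot3(pos, ti.sAxis) + ti.sOffset) / texW;
    uvOut[1] = (dot3(pos, ti.tAxis) + ti.tOffset) / texH;
}
```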

HellKnight
12-01-2005, 12:56 PM
Originally posted by tfpsly:
Oh, wait! In Q1 and Q3 maps, you actually do not need the texture coordinates at all! You may generate them in the vertex program with one DP3 and one SUB, as long as you set one vector and one float (which together are just a vec4) VP constant each time you change the current texture!
Oh yes :) , you mean every time you draw a face you'd set up some uniform vertex program parameters, change texture states etc... In that case I'd just make all the computations on the CPU and use immediate mode hehe :)




To compute the number of indices [Razz]
1 face = 3 vertex indices or = 3 vertex pos indices + 3 texture uv indices + 3 lightmap texture uv indices.

Ah, now it seems logical :) I just didn't assume you meant triangles by faces, because most of the faces in Quake have more than just 3 vertices. Never mind...

tfpsly
12-01-2005, 01:04 PM
Originally posted by HellKnight:

Originally posted by tfpsly:
Oh, wait! In Q1 and Q3 maps, you actually do not need the texture coordinates at all! You may generate them in the vertex program with one DP3 and one SUB, as long as you set one vector and one float (which together are just a vec4) VP constant each time you change the current texture!

Oh yes :) , you mean every time you draw a face you'd set up some uniform vertex program parameters, change texture states etc... In that case I'd just make all the computations on the CPU and use immediate mode hehe :)

?? You really do not understand how the UVs are computed in the Quake games then...
I was speaking about setting up one vector constant each time you change the current texture. Which means once per material. At most as many times as you call glDrawElements, but it can be less.

HellKnight
12-01-2005, 01:19 PM
Well, could you please explain then how the texture coords are expected to be generated - per texture? You have, say, 100 faces each one with a different orientation etc., but with the same texture... How do you compute the tex coords then? With a SINGLE vp uniform parameter... ??

Your approach would work if all the faces with a given texture lay in the same plane and had no texture transformations applied (rotation/scaling/translation), and that's rarely the case. Yeah, there's this special case in Q1/2 where the BSP compiler splits up faces > 256x256 units, but that's only because of lightmap issues and is not done in modern engines anymore, AFAIK...

Stephen_H
12-01-2005, 04:38 PM
There was an old post on these forums (I can't remember by whom, perhaps Jon Watte), where someone said they took a few game models and stored them as index per attribute and index per entire vertex.

The index per vertex was actually smaller because you didn't have to store duplicate index information. Except at hard edges (e.g. where you'd want two normals per position), the indices are the same.

I can still see how having an index per attribute would be quite useful in certain situations, but to convince me that it uses less RAM you will need to provide more evidence.

Zulfiqar Malik
12-01-2005, 08:57 PM
Originally posted by Korval

If you use the same position index more than once, the second time you use it that position may still be in the pre-T&L cache. So you save memory bandwidth, compared to uploading the same duplicated vertex position data with the traditional method.

For that to work effectively, so that it actually gives performance benefits, the mesh would have to be extremely well behaved. Given a sufficiently large mesh, it would be non-trivial to produce such a well-behaved mesh. A tri-strip would probably be more well behaved "out of the box", with minimal effort.



Originally posted by Hellknight

Heh, if you're talking about the days of Quake or maybe Quake 2, yes, you had 1000 - 2000 vertices PER FRAME. But if we're talking about more recent games, say Doom3, HL2 or whatever, I wouldn't be surprised if the total vertex count per level is over a million or so. A simple example.

I highly doubt that, although I confess that I do not have first-hand information. But considering that most of the objects in games are dynamic these days (due to physics), they are not part of the actual BSP mesh, but are added as separate entities, which are rendered in another pass that does not require any vertex duplication. That leaves very few vertices in the BSP mesh alone. I think that even Doom3 and HL2 wouldn't have more than 200,000 tris even for a large BSP mesh.



Originally posted by Dez

Some could argue that this functionality is available on SM3.0 hardware, that is the same hardware that supports vertex texturing and we can emulate the flexible data fetching offered by per attribute indexes using vertex texture sampling.

Instancing was part of SM2.0b, which was supported by the R3xx series.



Originally posted by Dez

As for the cache coherence argument and complexity of implementation, I would say that hardware vendors could use very basic way of implementation, skipping caching where it is to difficult to implement and let developers decide to use this functionality or not based on results that they get in their unique case. Needles to say that implementing this “the right way” ;) could lead to healthy competition between hardware vendors.

I disagree with that because too many options can result in more confusion and hence bad code in many applications. Couple that with different implementations from different vendors, and you are adding more complexity (and confusion) in code optimizations.
Keeping in mind the bloated index buffers that would result from using such a technique (as mentioned by bpoint), and the results of jwatte's analysis, I think that this technique would probably not provide many benefits.
But that's just my opinion :) .

Jan
12-02-2005, 12:08 AM
I highly doubt that, although I confess that I do not have first-hand information. But considering that most of the objects in games are dynamic these days (due to physics), they are not part of the actual BSP mesh, but are added as separate entities, which are rendered in another pass that does not require any vertex duplication. That leaves very few vertices in the BSP mesh alone. I think that even Doom3 and HL2 wouldn't have more than 200,000 tris even for a large BSP mesh.
I don't know about Half-Life, but Doom 3 certainly does not use BSP-trees in that old-fashioned way anymore, at all.

There was some .plan file from John Carmack where he said that the D3 engine is based on a sector-portal system (well, we all know that anyway). He did say that there are still BSP-trees used for certain stuff (most certainly for some sorting, or to find out in what sector you are), but that they are not used as in Q1-Q3.

I did use BSP-trees for quite a while, and everyone who did knows that they are quite useless for the high-res levels you see in modern games. The problem is that, because of splitting, they generate a multiple of the triangle count of the original mesh.

Of course, you can simply not split triangles and use BSP-trees only for rough front-to-back sorting (or back-to-front for glass, or so), but then it is more efficient to simply do rough front-to-back sorting with your sectors, because the next problem with BSP-trees is that they really don't care about materials and therefore state changes.

So, in general, BSP-trees are not very useful for modern games, at least not for rendering. They can, of course, still be useful in some other situations.

Therefore: Why do you use BSP-Trees as an example to explain how several vertex-indices could be useful???
And, yes, most BSP-Trees won't hold that many faces, simply because it would be mad, not because it is not necessary.

To the topic: Actually, I am quite happy not even to have the option to use several indices. This makes my life a lot easier, because with that feature I would need to test and optimize for another, quite complicated, case. And since it is doubtful that I could really speed up anything using the feature, I am happy not even to have the option to waste my time on it.

Certainly, there are special cases where it is obvious that the feature would be useful, but those are rare and not common in games. And that is why the hw vendors won't care to think about it.

Jan.

Zulfiqar Malik
12-02-2005, 12:46 AM
Originally posted by Jan

Therefore: Why do you use BSP-Trees as an example to explain how several vertex-indices could be useful???
And, yes, most BSP-Trees won't hold that many faces, simply because it would be mad, not because it is not necessary.

First of all, never once did I imply that several vertex indices are useful. In fact I implied quite the contrary :) . Secondly, I quoted BSP trees because that's the one example I know of that leads to horrible duplication, and the format, as we all know, is quite popular.

HellKnight
12-02-2005, 01:03 AM
Ok that's something I found on the net. It's a FarCry benchmark summary:





The results of my run are dumped in a file called regulator.log in the directory Far Cry\Levels\Regulator.

As you run each benchmark multiple times, the results just keep appending to the same log file. They look something like this:

TimeDemo Play Started , (Total Frames: 3044, Recorded Time: 55.50s)
!TimeDemo Run 0 Finished. Play Time: 51.59s, Average FPS: 59.00 Min FPS: 39.39 at frame 2388, Max FPS: 88.46 at frame 866 Average Tri/Sec: 5386234, Tri/Frame: 91292 Recorded/Played Tris ratio: 1.11
!TimeDemo Run 1 Finished. Play Time: 49.08s, Average FPS: 62.03 Min FPS: 39.39 at frame 2388, Max FPS: 96.84 at frame 1450 Average Tri/Sec: 5420742, Tri/Frame: 87393 Recorded/Played Tris ratio: 1.16
TimeDemo Play Ended, (2 Runs Performed)


As you can easily see, we're talking about ~100.000 triangles PER FRAME. Ok, there are these large outdoor scenes in FarCry, but it's just an example. Now guess the total polygon count per map.

Oh yes, and FarCry isn't the newest game after all... Judging from the screenshots of the next Unreal game, for example, I could imagine having such a high poly count even in indoor scenes...




I think that even Doom3 and HL2 wouldn't have more than 200,000 tris even for a large BSP mesh.

200.000 * 3 = 600.000... not that far away from 1.000.000...

EDIT: You appended while I was typing. Well, I'm not pushing for that feature either, and if you read my first post carefully, you'll see that IMO we can live without it. I just wanted to give an example and stated that the savings wouldn't exceed a couple of megabytes, and that's negligible nowadays...

Falken42
12-02-2005, 01:25 AM
Wow, I go away for a day and this thread gets lots of replies! A bit unexpected... I'll try to check it more often from here on. I noticed that there are a few posts which are straying a bit off-topic. Please try to stay on-topic as best as possible. :)

I'll reiterate a few points again, because it seems a few people might have missed them. First of all, I speak from experience. I've developed and shipped two games on the GameCube, and have worked on the DS a bit too. I've also done a few games on PSX, N64, and PS2 as well, but those are a different story.

For both of our GameCube titles, we utilized multiple indices for each vertex attribute because it saved us memory. Even with the extra indices, the models were overall smaller than if we used a single index to represent a vertex. This was not with simple test model data -- but the actual models used in the game. Overall, I'd say we probably saved on an average of 25% per model.

It seems that no one mentioned (or noticed?) that one of the big gains for using multiple indices is not just to eliminate duplicated position coordinates, but to eliminate duplicated normals and duplicated color values.

Normals are, IMHO, probably the biggest gain from using multiple indices. You can design a big map with lots of vertices for lighting, but only need a small set of unique normals to represent it. Colors are often duplicated as well, but since they're four bytes and an index to represent one requires two bytes, you certainly haven't saved much.

I believe the cache-coherency issue was summarized by Korval rather well:

If you use the same position index more than once, the second time you use it that position may still be in the pre-T&L cache. So you save memory bandwidth, compared to uploading the same duplicated vertex position data with the traditional method.

Remember that this applies to _any_ attribute that you index, not just positional data.

If it takes some solid facts to get some heads turned, I'll code up a tool which reads some kind of model data and splits it out into two formats: one where each vertex is represented by a single index and data is duplicated, and the other using multiple indices to reference unique vertex data. It might take me a few days to get it done, but hopefully it'll show that this functionality _is_ useful.
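
To give an idea of what such a tool boils down to, here's a sketch of the core step (it assumes attributes compare bit-exactly, and the names are made up):

```cpp
// De-duplicate one attribute stream and hand back indices into its pool.
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Returns the index of 'value' in 'pool', appending it if not seen before.
template <std::size_t N>
uint16_t dedupe(std::vector<std::array<float, N>>& pool,
                std::map<std::array<float, N>, uint16_t>& lookup,
                const std::array<float, N>& value) {
    auto it = lookup.find(value);
    if (it == lookup.end()) {
        it = lookup.emplace(value, static_cast<uint16_t>(pool.size())).first;
        pool.push_back(value);
    }
    return it->second;
}
// Run once per attribute (positions, normals, texcoords) over every triangle
// corner to get the multi-index format; run it once over whole corners to get
// the single-index format, then compare the two totals.
```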

Finally, I'd like to add that there are at least six different ways (glBegin/glEnd, glArrayIndex, glDrawElements, etc) to draw vertex data in OpenGL. Why not one more? :)

Thanks for all of the replies, and keep the comments coming.

Edit: Another post was added while I was writing mine, and I'd like to comment on it.

EDIT: You appended while I was typing. Well, I'm not pushing for that feature either, and if you read my first post carefully, you'll see that IMO we can live without it. I just wanted to give an example and stated that the savings wouldn't exceed a couple of megabytes, and that's negligible nowadays...

I can certainly live without this feature too. I just have to manage multiple sets of data for multiple platforms. If this feature did exist, it would make OpenGL more programmer-friendly. Not just for me, but I'm sure for others as well. Also, a "couple of megabytes" might not make much of a difference on Windows, but if you're using an embedded system (OpenGL ES?), a megabyte is fairly large.

Even in the worst case, where rendering was actually slower than if a single index was used, the user would have a choice as to which format better suits his needs. Finally, I also don't think anyone can honestly say whether it would be slower or faster until it actually gets implemented in hardware. Even after it gets implemented, the GPU designers will probably be able to come up with some extra tricks to make it work better -- who knows. The only facts we have now are that it does save memory, and that there are others who would like to have such a feature. IMHO, because of this alone it should be considered for inclusion into OpenGL.

HellKnight
12-02-2005, 02:25 AM
Also, a "couple of megabytes" might not make much of a difference on Windows, but if you're using an embedded system (OpenGL ES?), a megabyte is fairly large.
But you wouldn't have maps with 1.000.000 vertices either, so the savings would at best be a couple of KILObytes - and then again on recent embedded systems that's quite acceptable.



The only facts we have now are that it does save memory, and that there are others who would like to have such a feature. IMHO, because of this alone it should be considered for inclusion into OpenGL.

Unfortunately, speaking from my experience, the industry doesn't work that way. Just because you and two other guys want this implemented, it's very unlikely that someone at nVidia or ATI would say: "Hey guys, let's do that!" In recent years, it has pretty much gone the other way round. Now, if you could persuade Carmack to push the hardware vendors for that feature, chances are that we'd have this on next-gen hardware :)

EDIT: @ Zulfiqar Malik

I did some more tests; I just wanted first-hand information. I opened some Quake3 levels with a hex editor and looked up the size of the vertex lump. An average-sized Q3 map had ~70-80.000 vertices, but I even found one used for benchmarking (chartres) that had ~175.000. That's the vertex data BEFORE the curved surfaces get tessellated, so we can safely assume that most of them belong to the low-detail BSP model = high vertex duplication ratio.

Oh yes, and we're talking about a game that's 5 years old...

Zulfiqar Malik
12-02-2005, 03:08 AM
Originally posted by HellKnight

As you can easily see, we're talking about ~100.000 triangles PER FRAME. Ok, there are these large outdoor scenes in FarCry, but it's just an example. Now guess the total polygon count per map.

100,000 Tris/Frame is nothing for an outdoor scene (considering next-gen engines). The engine that I am currently writing is capable of rendering 2 MTris/frame at 60 FPS with texturing and interpolated per-vertex lighting (using 1 directional light)! That's a good 110 MTris/s, and I didn't have to duplicate even a single vertex! Indoor maps are a different story altogether. The whole idea of per-pixel lighting and other detail-preservation techniques was to give sufficiently good detail at lower polygon counts. If you are going with such high poly counts, you might as well do displacement mapping, but even then you can utilize the hardware for that!
After bpoint's last post I assume that he is talking about something completely different. So no point in speculating. Let him come up with a good enough example for review.

Jan
12-02-2005, 05:09 AM
But:

Yes, several indices might save memory. However, I think the pre/post-T&L cache usage becomes much worse, because now the GPU has to check every single index to see whether it is in the cache, and only if ALL of them are in the cache can the complete result be reused. In these days, where a shader is nearly always in use, you can't just say, "hey, the position index is the same, so the resulting position will be the same, too".

So, speed-wise, I think this will only be another burden.

Zulfiqar Malik: Sorry, I didn't want to address you specifically; with "you" I meant everybody who was talking about BSP-trees, because I think this is a really bad example these days.

Jan.

andras
12-02-2005, 07:26 AM
I think that hardware will soon become so generic and flexible that there will be no real difference between texture data, vertex data, etc. And at that point, you could just create a buffer storing indices to whatever you want, and then fetch the values in the vertex shader (you could already do this with SM3.0). Of course, if you told me to do this today, I'd say ahh, that's gonna be too slow. But in the future? Who knows :)

jm2c,

Andras

Stephen_H
12-02-2005, 09:43 AM
Doesn't the DX10 specification that MS is trying to push on vendors for Vista/Longhorn allow for indexing separate attributes?

santyhamer
12-05-2005, 04:03 PM
Oh I posted something about all this in

http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=013930

some time ago.

Humus
12-05-2005, 08:26 PM
Originally posted by knackered:
(Humus speaks for ATI)

To clarify, I do work for ATI, but not everything I say here necessarily represents an official opinion of the company. Especially if we're talking about console platforms; I'm not involved in that at all. In this topic I'm only stating my own personal opinion, which may differ from what our hardware guys think.

Humus
12-05-2005, 08:53 PM
Originally posted by bpoint:
For both of our GameCube titles, we utilized multiple indices for each vertex attribute because it saved us memory.

But you're working on a console. Those platforms tend to be tight on memory, but on the other hand have extremely fast buses and/or very fast embedded memory. It may make sense there if memory footprint is more important than shading speed. The typical console artwork also isn't comparable to the typical PC artwork. The more high-poly your model is, the fewer saving opportunities you get from multiple indices, since vertex sharing will be greater and the relative ratio of edge vertices vs smooth vertices gets smaller. I did a quick test in the little engine I'm working on, and sure, I got a decent save (30%) on the map, but it's mostly low-poly and per-face data anyway (BSP-style a la UnrealEd). On the other hand, the gun I modelled in Maya turned out twice as big.


Originally posted by bpoint:
Finally, I'd like to add that there are at least six different ways (glBegin/glEnd, glArrayIndex, glDrawElements, etc) to draw vertex data in OpenGL. Why not one more? :)

If anything we need fewer ways to draw. glArrayIndex should go away, glBegin/glEnd should move into GLU or something. The only functions needed for the core API would be glDrawArrays and glDrawElements.


Originally posted by bpoint:
I just have to manage multiple sets of data for multiple platforms.

You only need one set of data. Generating a single-index stream from multiple index streams is a piece of cake with a Kd-tree, and it can be done in O(n*lg(n)) time, which should be quick enough to do at load time. That's what I do myself actually, since multiple indices are more convenient for preprocessing model data.
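
For completeness, a sketch of that normalization step (this one keys on the tuple of indices rather than a Kd-tree over attribute values, which is simpler and equivalent as long as the source attribute pools contain no duplicates; type and function names are made up):

```cpp
// Collapse per-attribute index tuples into a single index stream.
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

using Corner = std::tuple<uint16_t, uint16_t, uint16_t>;  // pos, normal, uv

void buildSingleIndexStream(const std::vector<Corner>& corners,
                            std::vector<Corner>& uniqueVertices,
                            std::vector<uint16_t>& indices) {
    std::map<Corner, uint16_t> seen;
    for (const Corner& c : corners) {
        auto it = seen.find(c);
        if (it == seen.end()) {
            it = seen.emplace(c, static_cast<uint16_t>(uniqueVertices.size())).first;
            uniqueVertices.push_back(c);   // later expanded into a fat vertex
        }
        indices.push_back(it->second);
    }
}
```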

andras
12-06-2005, 07:30 AM
Originally posted by Humus:

Originally posted by bpoint:
Finally, I'd like to add that there are at least six different ways (glBegin/glEnd, glArrayIndex, glDrawElements, etc) to draw vertex data in OpenGL. Why not one more? :)

If anything we need fewer ways to draw. glArrayIndex should go away, glBegin/glEnd should move into GLU or something. The only functions needed for the core API would be glDrawArrays and glDrawElements.
Generally, I agree with reducing the number of drawing functions in the API (there was supposed to be a "pure" version of GL2.0, without all the 1.x legacy, no?), but we still need immediate mode for pseudo instancing, so we can change certain attributes between draw calls.

HellKnight
12-06-2005, 09:59 AM
Originally posted by andras:

[...] but we still need immediate mode for pseudo instancing, so we can change certain attributes between draw calls.

Oh yes, I wouldn't like to rewrite my font rendering lib either (which, for simplicity, uses immediate mode). If I used VBOs instead, the whole procedure of rebinding the vertex buffer objects would probably take more time - just for the couple of chars I'm drawing every frame.

Korval
12-06-2005, 10:34 AM
You only need one set of data. Generating a single-index stream from multiple index streams is a piece of cake with a Kd-tree, and it can be done in O(n*lg(n)) time, which should be quick enough to do at load time. That's what I do myself actually, since multiple indices are more convenient for preprocessing model data.

Yes but, to be blunt, you don't make games. You make demos.

Games, particularly cross-platform console games, don't have the luxury of doing O(n*lg(n)) work even at load time. More and more games (and this will be far more true on next-gen platforms) will be based on streaming, so load time is gameplay time. As such, the closer you can get the on-disc data to match the in-memory representation of that data, the better off you are.

Then again, it is better to have each platform have its own final version of the data anyway, just to make loading work faster.

Humus
12-06-2005, 08:31 PM
Originally posted by Korval:
Yes but, to be blunt, you don't make games. You make demos.

And I work with several of the world's top game developers. Game requirements are not an unknown field for me.


Originally posted by Korval:
Games, particularly cross-platform console games, don't have the luxury of doing O(n*lg(n)) work even at load time.

O(n*lg(n)) time grows close to O(n) when you reach a fair amount of data, since lg(n) barely grows. It's quite realistic to do this work at load time. My implementation, which is still quite generic, indexes up a 1.4MB vertex array in 0.0026s. At that speed you can process in excess of 100MB in a fraction of a second.


Originally posted by Korval:
More and more games (and this will be far more true on next-gen platforms) will be based on streaming, so load time is gameplay time. As such, the closer you can get the on-disc data to match the in-memory representation of that data, the better off you are.

That's a false assumption in the general case. Far more commonly, the smaller the data set on disc, the faster the load time. It's not the CPU processing that's normally the limitation, but the read speed from HD, or even worse, from CD. A simple compression scheme will likely improve your load time.

Fastian
12-13-2005, 05:06 AM
AFAIK most people who participated in this discussion are programmers using OpenGL. None are actual hardware designers. It actually seems a little harsh to reject bpoint's idea altogether before actually giving it a thought. He is speaking from personal experience, after all, and his concern is really valid in the console domain. If that isn't a convincing reason, think about OpenGL ES. It's only a matter of time before we start seeing cell phones playing OpenGL-capable games, and with the limited amount of memory they host, this will certainly start to look like an interesting idea.

Let's consider a game like NFS Most Wanted. It has a large city model where I can see a lot of vertex duplication. Try that and you have a real-world scenario to consider in the console world.

No one is asking to scrap the existing functionality, but the benefit this suggestion provides certainly deserves careful thought. I'm sure that with all the tricks hardware designers have, they'll come up with a solution to the cache coherency/memory access issues as well. It certainly has weight in the embedded domain.

V-man
12-13-2005, 06:20 AM
Yes, none of us are hw designers, but the IHVs have not offered this feature to us. It's safe to assume they have performed case studies and the benefits are slim to none.

The whole point of interleaved vertices is to boost performance. You don't keep vertices, texcoords, normals in separate buffers, do you?

If you need it for GL ES, why not suggest this in the GL ES forum?

dorbie
12-13-2005, 10:30 AM
ES has already done away with similar legacy features due to implementation complexity. The bottom line is that these kinds of things can complicate a design and cause multiple memory accesses as you try to get something like a DSP on a phone to repackage & massage your data into a format digestible by your hardware. Memory access patterns can also get hosed, and simple HW implementations will and should have an optimal format that requires the fewest memory reads & writes for the hardware or pipeline to draw.

This is not just about vertex space; it's about bug-free, complete, functional and fast ubiquitous implementations.

Too much complexity on the front end is very much a bad thing, no matter how good it seems. I'd rather have driver developers all around the world working on VBOs for phones (yes, they are, as we speak) than on some seemingly more optimal corner case that loses on the swings what it gained on the roundabouts anyway.

Fastian
12-14-2005, 12:55 AM
Originally posted by dorbie:
ES has already done away with similar legacy features due to implementation complexity.

So it has already been considered. Fair enough. I think this should be a good enough answer to everyone.