
View Full Version : multiple indices for different array (vertex, color,...)



Rhett
11-08-2004, 07:49 AM
Hi,

Maybe my question is really stupid, but after a full day reading all the vertex_xxx extension specifications, I dare ask it.

In my program I have one array containing the vertices and another containing the colors. But the color of the i-th vertex in the vertex array is not the i-th entry of the color array (I share the vertices but not the colors).
Is there a way I can render my scene using those two arrays with two index arrays (one for the vertices and another for the colors)?

Thanks a lot for your help

Rhett

Bob
11-08-2004, 09:15 AM
Immediate mode is the only way. If you use vertex arrays, the same index is used for all arrays. So if a vertex shares some attributes but not all, all of its attributes must be duplicated.

jwatte
11-08-2004, 05:13 PM
You need to normalize your arrays (http://www.mindcontrol.org/~hplus/graphics/vertex-arrays.html) .

execom_rt
11-09-2004, 03:53 AM
Multiple indices per mesh will be a feature in... WGF (DirectX 10).

Humus
11-10-2004, 02:01 PM
If you need something faster than what jwatte suggested (doing it that way gets extremely slow if your model is large), you can use a simple hash table to quickly check whether an identical pair of indices has been passed in before. It takes the time down to near linear if the hash table is large enough.

knackered
11-10-2004, 02:13 PM
Originally posted by execom_rt:
The multiple indices per mesh will be a feature in .. WGF (DirectX 10)
Err... why?
How would that affect the vertex cache?

Korval
11-10-2004, 10:10 PM
How would that affect the vertex cache?
Which one? Pre-T&L or post-T&L?

For pre-T&L, it is meaningless. If it can fetch from multiple locations of memory using one index, then it isn't so difficult to see it fetch from multiple locations using multiple indices.

As for the post-T&L cache, well, it would only use the post-T&L entries if all the indices are the same. Simple.

knackered
11-11-2004, 10:41 AM
It only adds to the cache if all indices are the same, and only searches the cache if all indices are the same?
Right....so who in their right mind would use multiple index arrays?
If nobody's going to use them because of the performance impact (ie zero cache usage) then why bother introducing them into the API?

V-man
11-11-2004, 12:50 PM
It's useful if you don't want to waste memory.
You might be able to save on a 32-bit color per vertex or 64-bit tex coords (u, v) by replacing them with unsigned short indices.

>>>For pre-T&L, it is meaningless. If it can fetch from multiple locations of memory using one index, then it isn't so difficult to see it fetch from multiple locations using multiple indices.<<<

I think it will affect cache performance.

Korval
11-11-2004, 01:06 PM
It only adds to the cache if all indices are the same, and only searches the cache if all indices are the same?
No, I didn't say that. Or I mis-wrote.

The way a post-T&L cache now works is that it stores some number of entries. The key for these entries is the index that was used to provoke them. As such, the only time post-T&L cache data is used (rather than T&L'ing the vertex) is if the incoming vertex's index matches one in the post-T&L cache.

Now, if you have multiple indices, then the sequence of indices becomes the key. The only time the post-T&L cache data is used in this case is if the sequence of indices for the incoming vertex matches the sequence of indices for the cache entries in the post-T&L cache.

It would seem, without putting any thought into it, that the likelihood of these matching is more remote than in the single-index case. However, that's if you assume a random indexing scheme.

Take the typical "box" case, where you have 8 positions, but 24 texture coordinates. If you use a single-index system, you have 24 vertices; that's 24 separate indices. When you render these as individual triangles (a list), you get some matching, but only for the 2 triangles that make up a face.

If you use 2 sets of indices, you match at the same times as you did for the single indexed case. You repeat sets of indices at the same places as the single index case. All the multiple index case does is slightly complicate the post-T&L logic and potentially buy you back some memory.
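To illustrate the argument, here is a toy post-T&L cache model (FIFO replacement is an assumption; real hardware details vary). The cache key is whatever identifies a provoked vertex: a single index, or an index tuple in the multi-index case:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Toy post-T&L cache model: a small FIFO of recently provoked keys.
// A hit means the transformed vertex is reused instead of re-run
// through the vertex pipeline.
template <typename Key>
size_t countCacheHits(const std::vector<Key>& stream, size_t cacheSize)
{
    std::deque<Key> cache;
    size_t hits = 0;
    for (const Key& k : stream) {
        if (std::find(cache.begin(), cache.end(), k) != cache.end()) {
            ++hits;                         // reuse the transformed vertex
        } else {
            cache.push_back(k);             // transform and insert
            if (cache.size() > cacheSize) cache.pop_front();
        }
    }
    return hits;
}
```

For a quad drawn as two triangles, the single-index stream {0,1,2, 2,1,3} gets 2 hits; the corresponding stream of (position, UV) index pairs repeats at exactly the same places and also gets 2 hits, which is the point being made here.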


I think it will affect cache performance.
In what way? The pre-T&L cache must already be able to fetch from disparate locations in memory, since we are allowed to have different vertex attributes in different VBOs. All the multiple indexing would add is the ability to use a different index for each base pointer. It means that you have to dereference multiple indices into the memory pointers, but the accessing logic is no different than before.

jwatte
11-11-2004, 05:50 PM
It's useful if you don't want to waste memory.
We did extensive measurements on real-world meshes, and for all but the lowest-poly real-world meshes, a single index list and normalized vertices was smaller than multiple indices.

I.e., while separate indices are a win for things with lots of seams, like a cube (where every vertex is a seam), most real geometry has a low seam-to-continuous-vertex ratio, and thus the single vertex array plus single index array is smaller than the multiple attribute arrays plus multiple index arrays. It's the size of the multiple index arrays that kills you!

knackered
11-12-2004, 03:17 AM
So in the general case most indices in multiple index arrays will be identical....so what's the point in allowing them? Who's requested this feature? Just seems like another layer of complexity to what should be a simple mechanism in a graphics api. If it ain't broke don't fix it.

idr
11-12-2004, 08:32 AM
We did extensive measurements on real-world meshes, and for all but the lowest-poly real-world meshes, a single index list and normalized vertices was smaller than multiple indices.
That's very interesting. Is that work published anywhere? I'd love to have some hard data to point to whenever this issue comes up. :D

knackered
11-12-2004, 12:40 PM
Why? Don't you think it's a reasonable assertion that most meshes have very few seams?
IDR, you say 'whenever this issue comes up' - this suggests it comes up frequently - who brings it up at ARB meetings and what reasoning do they give?
I'm very curious...is it anything to do with internet-based 3d file formats?

Korval
11-12-2004, 01:33 PM
So in the general case most indices in multiple index arrays will be identical....so what's the point in allowing them? Who's requested this feature? Just seems like another layer of complexity to what should be a simple mechanism in a graphics api. If it ain't broke don't fix it.
Actually, I would say that it should get more complicated, not less. Ideally, vertex fetching (and some processing perhaps) should be programmatic. Some kind of program should be able to walk the vertex data as it sees fit and generate vertex attribute data as needed to feed the vertex program.

As for the point of having this, the purpose was not to make the postT&L cache more efficient. The purpose is purely memory saving (and therefore, potentially bandwidth saving).


We did extensive measurements on real-world meshes, and for all but the lowest-poly real-world meshes, a single index list and normalized vertices was smaller than multiple indices.
How many multiple indices were you using? There are, of course, tradeoffs to multiple indexing, but you can decide to index a number of parameters together (like single indexing). In general, I wouldn't suggest having more than 2-3 sets of indices, as the size of the new index data starts competing with the size of the saved data. The idea is that your normals are more likely to cause a crease at the same place where you change texture coordinates (and, if they don't, then you know where to have your modellers start putting creases). You can group sets of attributes together to achieve the best possible memory savings.

And what kinds of "real world" meshes did you use?


Why? Don't you think it's a reasonable assertion that most meshes have very few seams?
Not really. There are quite a few texture coordinate seams for modern high-poly characters, and the bigger textures for character meshes get, the more seams there will be.

V-man
11-13-2004, 05:27 AM
Originally posted by Korval:
In what way? The pre-T&L cache must already be able to fetch from disparate locations in memory, since we are allowed to have different vertex attributes in different VBOs. All the multiple indexing would add is the ability to use a different index for each base pointer. It means that you have to dereference multiple indices into the memory pointers, but the accessing logic is no different than before.
With single indices, the vertex, color, and texcoords will be in the same neighborhood if you use interleaving, but with multiple indices they may be a little too far apart and might affect the cache performance negatively.

Jwatte says he did some performance tests, but there is a problem here. On what GPU?
You would need a GPU emulator on a specific GPU design.

I'm not convinced that this feature would be useful. The last time someone needed this feature, it was indeed someone who wanted to render a million cubes.

jwatte
11-13-2004, 11:03 AM
Jwatte says he did some performance tests, but there is a problem here. On what GPU?
You would need a GPU emulator on a specific GPU design.
That's not what I said. I said I measured the total size of vertex data + index data when storing meshes as normalized arrays, and when storing meshes as separate arrays with separate index streams. Further, I said we measured that the overall size of the meshes was smaller with the normalized arrays. This measurement can easily be made without any specific GPU target.

When it comes to rendering, I don't understand how you could even compare the performance. The only way to render a mesh with separate indices, using the current API, is to decode it to immediate mode. I've measured that that's slower than proper VAR or VBO geometry management, many times.

Regarding what the meshes were, it was everything from a 100-poly vegetation prop to a 2500-poly soft-skinned character. Note that the character had an artificially high seam count, because we swap textures (and geometry pieces) with quite some frequency (i.e., change what shoes you're wearing separately from your T-shirt).

If you're doing a modern poly-reduction normal-mapped character, it still shouldn't have more seams, assuming your mesh parameterization (UV mapping) is well done. If you use crappy automated tools, and separate texture coordinate channels for base color versus normal mapping, then it's going to have lots of seams. But you really shouldn't be doing that if you care about performance, nor storage size, so I'm assuming you don't do that.

Korval
11-14-2004, 12:30 AM
With single indices, the vertex, color, and texcoords will be in the same neighborhood if you use interleaving, but with multiple indices they may be a little too far apart and might affect the cache performance negatively.
Alternatively, with multiple indexing, it is possible that, when one of the indexes is repeated, the pre-T&L cache may still already have those values loaded, and therefore doesn't need to take up the bandwidth/time to read them. The only way for the single index case to hit on the pre-T&L cache is to access the same full index twice.


I said I measured the total size of vertex data + index data when storing meshes as normalized arrays, and when storing meshes as separate arrays with separate index streams.
You never mentioned how many separate index streams you had. Or how many attributes you're talking about.


If you're doing a modern poly-reduction normal-mapped character
As far as I'm concerned, a "modern" character should be no less than 5000 polys, with the main character reaching 10000. Modern graphics cards can handle this many polys, so I would see this as a reasonable number.


separate texture coordinate channels for base color versus normal mapping, then it's going to have lots of seams. But you really shouldn't be doing that if you care about performance, nor storage size, so I'm assuming you don't do that.
I disagree that having separate texture coordinate channels for the bump map is inappropriate. There are a number of reasons for wanting the bump map to be on a separate parameterization than the main color map.

And, since the technique we're talking about is specifically designed to mitigate the storage cost of vertex data, such a technique becomes even more important. Considering that, as the number of texture coordinate sets increases, the number of seams increases, it is likely to become more relevant in the future, not less (as we will be wanting to have lots of texture coordinate sets with different mappings).

knackered
11-14-2004, 11:23 AM
Originally posted by Korval:
as far as I'm concerned, a "modern" character should be no less than 5000 polys, with the main character reaching 10000. Modern graphics cards can handle this many polys, so I would see this as a reasonable number.
Even if you're using shadow volumes, Korval?

jwatte
11-14-2004, 03:06 PM
we will be wanting to have lots of texture coordinate sets with different mappings
I don't see this at all. I see a single, unique mapping (a.k.a. "parameterization"). Then I see a large number of maps mapped over this unique parameterization. The maps can be different resolutions, but I think they all share the same unique texture coordinate set.

Edit: the claim that multiple index streams are there to "save storage space" is pulled out of thin air. The reason there are multiple index sets is that it's easier to build a modeling application that way. It is, however, definitely NOT the smallest way to store a mesh in any modern art pipe I've worked with or looked at (and I've looked at several).

V-man
11-14-2004, 06:09 PM
Originally posted by jwatte:
That's not what I said. I said I measured the total size of vertex data + index data when storing meshes as normalized arrays, and when storing meshes as separate arrays with separate index streams.
Sorry about that.

On an unrelated note, let's do some calculations :
assume we have texture coordinates (S, T), 32 bit float for a certain mesh, and it has 1000 of these.

You decide to use the multi index features, 16 bit unsigned.

1000 * 2 byte = 2000 bytes

You need to remove 2000 bytes/8 bytes = 250 tex coords from the previous 1000 to break even.

250/1000 = 25%

You need to remove at least 25% of the tex coords from any model to break even (as far as tex coords are concerned).

For color, as the OP mentioned, assuming each color is 32 bit, you need to remove 50% to break even.
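The break-even arithmetic above generalizes to any attribute size; as a one-liner sketch (16-bit indices assumed, matching the post):

```cpp
#include <cassert>

// Fraction of attribute entries you must eliminate through sharing for a
// separate index stream to pay for itself: one index of indexBytes is
// added per element, against attribBytes saved per removed entry.
double breakEvenFraction(double attribBytes, double indexBytes = 2.0)
{
    return indexBytes / attribBytes;
}
```

breakEvenFraction(8) gives 0.25 for 2x 32-bit-float tex coords, and breakEvenFraction(4) gives 0.5 for a 32-bit color, matching the numbers above.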

Korval
11-14-2004, 08:13 PM
Even if you're using shadow volumes, Korval?Yes.

Assuming modern hardware (something that can run HL2 or Doom3 with a good level of effects reasonably well), we can therefore assume 100M polys per second, theoretically. Now, we drop this to 50M right off the bat (turning theory into fact). At 75fps, that gives you 666 thousand polys per frame. 10,000-poly characters means that you could have 66 of them, or 33 with one shadow, or 22 with 2. Plenty, as long as you refrain from multipassing too much.
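That budget arithmetic as a sketch (the 50M polys/sec throughput, 75 fps target, and 10,000-poly character are the assumptions from the paragraph above, not measurements):

```cpp
#include <cassert>

// Characters renderable per frame under a fixed polygon budget; each
// shadow is modeled as one extra pass over the character's geometry.
int charactersPerFrame(double polysPerSec, double fps,
                       int polysPerCharacter, int shadowPasses)
{
    double polysPerFrame = polysPerSec / fps;   // ~666,666 at 50M polys/sec, 75 fps
    return int(polysPerFrame / polysPerCharacter) / (shadowPasses + 1);
}
```

charactersPerFrame(50e6, 75, 10000, 0) gives 66; one shadow pass gives 33, two give 22, as in the post.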


I don't see this at all. I see a single, unique, mapping (a k a "parameterization"). Then I see a large number of maps mapped over this unique parameterization. The maps can be different resolution, but I think they all share the same unique texture coordinate set.
Why? Why would you ever do that? Why would you limit your texture artists and modellers in this fashion?


It is, however, definitely NOT the smallest way to store a mesh in any modern art pipe I've worked with or looked at (and I've looked at several).
Have all of those art pipes not interfaced directly with some form of hardware that required single indexing? Certainly, I wouldn't bother with such a representation if I knew I was just going to have to unpack it later.

As to the veracity of the claim that it is not the smallest way to store a mesh, I disagree.

Let's say you have the following vertex format:

position
color
Normal
UV1
UV2 (a bump map)
Tangent for UV2
Binormal for UV2

Position and color are almost never going to cause a crease. Positions and colors are virtually always 1:1. The Normal will crease when the bump texture coordinates do, usually. The UV2, Tangent, and Binormal only crease simultaneously.

This leaves the following sets of creasing elements:

1: Position/Color (16 bytes)
2: Normal/UV2/Tangent/Binormal (44 bytes approx.)
3: UV1 (8 bytes)

The cost of a crease (using the same other indices except for this element) due to #2 is 16 bytes + 8 bytes. The cost of a crease due to #3 is 44 bytes + 16 bytes.

Let's assume that, in the single index case, the total number of elements is 12,000. This makes the memory cost a total of 840,000 bytes.

Now, the multi-index case is difficult to compute. It is:

(12,000 * X) * 18-bytes +
(12,000 * Y) * 46-bytes +
(12,000 * Z) * 10-bytes

Where X, Y, Z are a scale of the single-index case to match the new index count. So, if the position/color was repeated 10% of the time, X would be 0.9.

To determine which one is smaller, we need to set the equations equal:

(X * 18) + (Y * 46) + (Z * 10) = 70.

Since we don't have specific data for this mesh, we can't really go any further. However, we do know the following. If Z isn't smaller than 0.8 (8/10), then we know that Z isn't going to make up the cost of its own indices. However, Y makes up the costs of its indices easily enough, at 44/46 or 0.95. That means that only 1 in 20 indices in the single index case have to duplicate set 2 in order for this to be a win (purely for set 2).

We can see what happens with some specific values. We'll assume that X is the smallest, since a position defines when a crease happens and the color is almost always 1:1 with position, so color doesn't induce creases.

If X=0.6, Y=0.9, and Z=1.0 (effectively, these numbers mean that UV1 dominates the creasing behavior), then we get 62.2, which is a win over 70.

If, however, Y dominates the creasing (more creases due to normals and normal-like things like bump maps. This may be more likely), then these seem reasonable: X=0.6, Y=1.0, Z=0.9. Neither Y nor Z makes up the cost of its indices. That still leaves us with 65.8.

The real question seems to be how low X is. If X can cover the cost of Y and Z's indices, then you win with multiple indexing. X being small means that set 1 (position & color) is frequently repeated. If X is less than 14/18 or 0.77, then you always win. 0.77 means that 1 out of every 4 positions/colors in the single index case is a repetition.

The key seems to be either a lot of position-based creasing (a small X), or a modest amount of creasing of something large (a Y of relatively small size).

Reasonably, I would say that the creasing in X being 0.77 is not reasonable with small vertex formats (1 set of UVs, one color, one normal/binormal,tangent, etc). However, if you have many mesh parameterizations for several textures (diffuse/specular, bump, detail, bump-detail), the number of possible creases in position/color shoots up dramatically, and X decreases. 0.77 is not unreasonable for cases of multiple changing attributes.
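The comparison above can be written down directly; the per-element byte costs use the set sizes from this post (16/44/8-byte attribute sets, 2-byte indices):

```cpp
#include <cassert>
#include <cmath>

// Per-element cost, in bytes, of the single-index layout: all three
// attribute sets plus one 16-bit index.
double singleIndexCost() { return 16 + 44 + 8 + 2; }   // 70

// Multi-index layout: X, Y, Z scale each set's element count relative to
// the single-index element count; each set carries its own 16-bit index.
double multiIndexCost(double X, double Y, double Z)
{
    return X * (16 + 2) + Y * (44 + 2) + Z * (8 + 2);
}
```

With X=0.6, Y=0.9, Z=1.0 this gives 62.2 bytes per element versus 70 for single indexing, the win quoted above; X=0.6, Y=1.0, Z=0.9 gives 65.8.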

Of course, since Jwatte doesn't believe in having multiple parameterizations, he won't see the need for this, but those who don't force their texture and mesh artists to conform to such stringent requirements may find a memory reduction. Granted, the memory reduction is not necessarily dramatic, but as the number of parameterizations increases, it will become increasingly significant. Especially since increased parameterizations mean a basic increase in memory cost.

Also, consider this. It is easier to get a net gain if you have fewer sets of indices. Granted, it is harder to make that gain significant, since having 2 sets of indexed data effectively means that you'll have more replicated data due to creasing. This analysis for the proper format (# of indices) sounds like something that should be programmed into a tool to optimize meshes for rendering.

More importantly, if hardware is already going to give us this in the future (D3D 10 requiring it will force the issue), then GL may as well support it. It'd be silly not to.

The problem with V-Man's logic is that he only takes each individual attribute in turn, rather than looking at the whole set of attributes. The savings due to X and Z can total up to an overall savings, even if Y is 1.0 (never repeated).

Ysaneya
11-14-2004, 11:28 PM
Yes.

Assuming modern hardware (something that can run HL2 or Doom3 with a good level of effects reasonably well), we can therefore assume 100M polys per second, theoretically. Now, we drop this to 50M right off the bat (turning theory into fact). At 75fps, that gives you 666 thousand polys per frame. 10,000 poly characters means that you could have 66 of them, or 33 with one shadow, or 22 with 2. Plenty, as long as you refrain from multipassing too much.
No offense, but this quote sounds very naive to me.

Your numbers are only valid in the "perfect case", i.e. an infinitely powerful CPU and no shaders or textures, because switching states will decrease your numbers by a lot, especially in a complex scene, even if you sort your meshes by material.

In addition, shadow volumes require multi-passing. They consume an enormous amount of CPU (to compute the silhouettes, and then to fill some dynamic vertex buffers), as well as an enormous amount of fillrate (even with 2-sided stencil). In summary, if you get 30 fps with 100k polys and only one light, you're already lucky.

Y.

Korval
11-15-2004, 07:55 AM
Your numbers are only valid in the "perfect case", ie. an infinitely powerful CPU, and no shader or texture, because switching states will decrease your numbers by a lot, especially in a complex scene, even if you sort your meshes by material.
No, I accounted for that. That's part of the cutting down from 100M to 50M. It accounts for time lost due to state changes, vertex programs running, and so forth.


In addition shadow volumes requires multi-passing. They consume an enormous amount of CPU (to compute the silhouettes, and then to fill some dynamic vertex buffers), as well as some enormous amount of fillrate (even with 2-sided stencil). In summary, if you get 30 fps with 100k polys and only one light, you're already lucky.
Then stop using crappy rendering techniques.

Shadow maps don't require any CPU computation of anything. No silhouette edges, nothing. You just render the mesh from the light's point of view. They need fillrate, but far less than what stencil shadow volumes need.

In short: if you use a technique that is known for leeching the performance out of a GPU as a matter of course, you can't blame the GPU for it.

V-man
11-15-2004, 08:31 AM
Let's assume that, in the single index case, the total number of elements is 12,000. This makes the memory cost a total of 840,000 bytes.
There are two ways to implement multi index :
- one index array per attribute
- grouping attributes, giving each an index array

Now for your calculations :
840,000 bytes?

Should be
(16 byte + 44 byte + 8 byte) *12000 vertices=
816000 bytes

The index array is most likely larger than 12,000 because your models should have vertex sharing.
It's a good idea to throw in a made-up factor, like 12,000 * 1.12 indices.



Now, the multi-index case is difficult to compute. It is:

(12,000 * X) * 18-bytes +
(12,000 * Y) * 46-bytes +
(12,000 * Z) * 10-bytes
Here as well. Why is the index count tied to the vertex count?
Can you explain your reasoning?

knackered
11-15-2004, 09:39 AM
Originally posted by Korval:
Then stop using crappy rendering techniques.

Shadow maps don't require any CPU computation of anything. No silhouette edges, nothing. You just render the mesh from the light's point of view. They need fillrate, but far less than what stencil shadow volumes need.

In short: if you use a technique that is known for leeching the performance out of a GPU as a matter of course, you can't blame the GPU for it.
Huh?!
Can you think of no advantages that shadow volumes have over shadow maps, korval?
Oooo, beautiful character models, but what the hell is that lego-land slab of darkness hanging off 'em?

Korval
11-15-2004, 09:58 AM
Here as well. Why is the index count tied to the vertex count?
Can you explain your reasoning?
Hmmm... Not really. I must have gotten confused somewhere.

I had intended the 12,000 to be the number of indices, but I forgot that, even in the single index case, the number of indices is not the number of vertices.

Sorry about that.


Can you think of no advantages that shadow volumes have over shadow maps, korval?
I can think of some, but the massive performance disadvantages of shadow volumes seem to outweigh the advantages. At least to me.

knackered
11-15-2004, 11:15 AM
I agree.

Ysaneya
11-15-2004, 01:35 PM
Shadow maps don't require any CPU computation of anything. No silhouette edges, nothing. You just render the mesh from the light's point of view. They need fillrate, but far less than what stencil shadow volumes need.
Oops, sorry, you are obviously right. For some reason I (incorrectly) assumed that you were speaking of stencil shadows, not shadow maps, probably because I saw Doom 3 mentioned in your post.

Although honestly 50M tris/second, even with shadow maps and a recent graphics card, still looks quite high to me. 15 to 25 MTris maybe, but 50M? Especially if you perform multisampling and implement per-pixel lighting; your framerate is gonna decrease pretty quickly..

Y.

V-man
11-15-2004, 04:14 PM
Originally posted by Ysaneya:
Although honestly 50M tris/second, even with shadow maps and a recent graphics card, still looks quite high to me. 15 to 25 MTris maybe, but 50M? Especially if you perform multisampling and implement per-pixel lighting; your framerate is gonna decrease pretty quickly..

Y.
Having more polys should be affordable.
Multisampling and fragment shaders are another part of the pipe. You have to balance the pipe.

The performance problems with both shadow methods are fillrate, among other things. Shadow maps add the issues that come with RTT.

idr
11-15-2004, 08:04 PM
IDR, you say 'whenever this issue comes up' - this suggests it comes up frequently - who brings it up at ARB meetings and what reasoning do they give?
To the best of my knowledge, and I've only been going to ARB meetings since September 2002, this has never come up at an ARB meeting. However, it comes up all the time on newsgroups and message boards around the net.