Primitive Restart with a twist

As of OpenGL 3.x, primitive restart was added where when enabled if a particular user defined index was encountered, then the primitive creation was restarted (the biggest use case being triangle strips). I would like to make augment it as follows:

Primitive Restart Index 2 is enabled/disabled GL_PRIMITIVE_RESTART_INDEX_SOME_NICE_NAME when passed to glEnable/glDisable.
When enabled, any one of the draw commands which transfers a set of generic attribute array elements to the GL the following will occur when index of the vertex is equal to the primitive restart index
[ul]
[li]will restart the primitive [/li][li]the index following is NOT an index for an attribute. Rather it specifies the value gl_SomeGoodSoundingName used by any of the shading stages[/li][/ul]

The initial value of gl_SomeGoodSoundingName is 0 at the start of each draw command. For instancing, the value if reset to zero at the start of each instance.

The idea behind this is that there are a number of cases where an attribute is used as an index into an array. Very often, that index does not change a great deal. With this extension, one does not need to occupy an entire attribute for that index.

Is there some reason we need a new enable for this? The only thing it changes is a variable that doesn’t currently exist, so I don’t see why it couldn’t just use the current restart enable. Old code won’t use a non-existent variable.

I don’t see why it couldn’t just piggy back off of regular restarting.

It is a new feature, so only new code would use it :stuck_out_tongue: The idea is for the following shader code:



MyType MyArray[N];
/* also applies to samplerBuffer with texelFetch, UBO's, etc */

.
.
.
MyType node_value;
node_value=MyArray[gl_SomeNiceName];


Currently one needs to do this:



MyType MyArray[N];
in int MyIndex;
/* also applies to samplerBuffer with texelFetch, UBO's, etc */

.
.
.
MyType node_value;
node_value=MyArray[MyIndex];


The idea behind this is that for many vertices, the value of MyIndex is the same.

I don’t see why it couldn’t just piggy back off of regular restarting.

Exactly how would one piggy back this onto current primitive restarting?

You misunderstand.

I’m arguing with the need for a special enum for the functionality, not the functionality itself. I’m saying that it should always be active when restarting, and if the hardware has to do something special based on whether you’re going to use the count, it could just as easily check the vertex shader to see if you’re using that value.

[QUOTE=Alfonse Reinheart;1250404]You misunderstand.

I’m arguing with the need for a special enum for the functionality, not the functionality itself. I’m saying that it should always be active when restarting, and if the hardware has to do something special based on whether you’re going to use the count, it could just as easily check the vertex shader to see if you’re using that value.[/QUOTE]

I don’t see how to make the interface work with piggybacking it onto the existing primitive restart, because it changes the meaning of an index stream. The current primitive restart implies that the next index specifies a vertex. The jazz I am advocating has the next index does not specify a vertex but specifies a value used by any of the shader stages… at the very least, what I am advocating changes the interpretation of an index stream compared to the current primitive restart… but to spell it out…

Lets say the primitive restart index is P. Lets say the index stream is given by say as {0,3,4,P,5,6,8,12}

Current primitive restart then produces the index stream groups G0={0,3,4}, G1={5,6,8,12}

What I am advocating produces the index stream groups G0={0,3,4}, G1={6,8,12}
with gl_SomeNiceName begin 0 for those primitives from G0 and having value 5 for those of G1

If you want to provide the shader a value after the restart index, you should (for the sake of symmetry) make the first index such a value, so that the user can set this value as normal.

More to the point, why do you need to do it this way? It would make far more sense if you just bumped an index after each restart, then let the vertex shader do whatever it needs to do to fetch the value of interest.

After all, what will people do with the value? It’s an integer, so it would probably be used to fetch data from some buffer in the VS, right? So why put it in the index list, when you can just make it an implicit counter? Why do users need to be able to set the value?

More to the point, why do you need to do it this way? It would make far more sense if you just bumped an index after each restart, then let the vertex shader do whatever it needs to do to fetch the value of interest.

This would effectively make this suggestion, “make gl_PrimitiveID accessible in the vertex shader”, which means that the vertex shader would not be able to cache vertices, and instead run on every vertex element.

I did a test a while back with a 72K point, 288K vertex model with per-face attributes. I compared the performance of promoting the 72K points and 77K per-face attributes to 288K-element vertex arrays via duplication and drew them with glDrawArrays, vs. 72K points drawn with glDrawElements() w/288K elements and a geometry shader to lookup the 77K per-face attributes via gl_PrimitiveID. The performance was roughly the same, though the memory use was a bit lower in the second case (tested on a Nvidia GEForce 670). So this would be more of a convenience or memory saving feature, than a performance improvement.

This would effectively make this suggestion, “make gl_PrimitiveID accessible in the vertex shader”, which means that the vertex shader would not be able to cache vertices, and instead run on every vertex element.

How not? Part of the per-vertex input would be the restart count. Thus, changing the restart count changes the input values, and thus you have a different vertex. Between restarts, the vertices could be cached.

Also, the original suggestion would have this same issue, since you’re changing a per-vertex input based on an index value.

This would effectively make this suggestion, “make gl_PrimitiveID accessible in the vertex shader”, which means that the vertex shader would not be able to cache vertices, and instead run on every vertex element.

How not? Part of the per-vertex input would be the restart count. Thus, changing the restart count changes the input values, and thus you have a different vertex. Between restarts, the vertices could be cached.

Also, the original suggestion would have this same issue, since you’re changing a per-vertex input based on an index value.

I wasn’t rebutting, just making the point that this can be done now using a geometry shader with similar performance characteristics. This suggestion just becomes a convenience.

This suggestion is much more (for me) than a convenience. This is essentially providing a (simple) means to specify a value that changes rarely. Alfhoneses suggestion that rather than making the index follow, but have another value running around that increments whenever primitive restart happens has legs in it and not a bad idea; it just means that one gets the value through another lookup into an array, buffer object or something else. My suggestion with or without Alfhonses modification to it means that the entries in the vertex cache should be discarded at primitive restart. For my use case, that will have no impact on performance because a fixed primitive uses the same value. In general, this idea is just to avoid using up the extra memory and attribute slot needed to get the same functionality. Doing it the way I have to now, the vertex cache is a miss anyways.

Not that I’m adverse to any of the suggestions above, I can see cases where they would be useful. But in the interests of discussion, if it changes rarely, would drawing them as separate batches with a uniform change between them suffice? Or perhaps a GLSL “gl_ArrayID” extension for glMultiDrawArrays/Elements(), with geometry with the same attribute grouped together?

Rarely in my case, and to what wish to apply this, means that the indices named between the primitive restart are disjoint, so for my use case the vertex cache issue is mute. For record, my use case is user interfaces, where the index is an index for “what node” to use… UI drawing has a tendency to have an enormous number of nodes where each node does not have a huge number of triangles to it… so breaking each node into a separate draw call is performance suicide.

Another way to get similar to what I want is to add glMultiDrawElements the ability to specify (or have it given) for each index range,

glMultiDrawElements(mode, count, primitivetype, indices, primcount);

is functionally equivalent to


for(unsigned int i=0; i<primcount; ++i)
{
   "gl_SomeNiceName=i";
   glDrawElements(mode, count[i], primitivetype, indices[i]);
}


this would also give me what I am after, but I prefer the primitive restart index because glMultiDrawElements method forces the bookkeeping updating of the arrays to use more than the primitive restart way in my use case.

This would potentially break with GL_UNSIGNED_BYTE or GL_UNSIGNED_SHORT indices; how do you specify a value of gl_SomeNiceName greater than 255 or 65535 in those cases?

If you want to have an input attribute that changes rarely, you can use glVertexAttrib calls to set it, with much greater flexibility but admittedly at the cost of breaking a batch.

If you want to have an input attribute that changes rarely, you can use glVertexAttrib calls to set it, with much greater flexibility but admittedly at the cost of breaking a batch.

Has anyone done any kind of comparative performance analysis between setting a uniform value and setting a non-array attribute?

Nvidia called this “pseudo instancing”, and showed some significant performance gains on old GPUs by using vertex attributes. Performance on DX10+ GPUs may be different.

This would potentially break with GL_UNSIGNED_BYTE or GL_UNSIGNED_SHORT indices; how do you specify a value of gl_SomeNiceName greater than 255 or 65535 in those cases?

Break is not the correct word, the correct word is that it is much more limited. The value taken from the index stream has the range by the index stream type, thus if drawing with GL_UNSIGNED_BYTE, then the value is an integer in the range [0,255] and for GL_UNSIGNED_SHORT is it in integer in the rage [0, 65535].

Alfonse’s suggestion of having the value start at 0 and increment whenever restart is encountered is nice though, and though using that would be harder, it is growing on me.

Additionally, looking at the way of tweaking glMultiDrawElements, one can see that there are roads open for other tweaks. Another way forward is a draw_indirect2 idea as follows by tweaking GL_ARB_draw_indirect



typedef struct {   GLuint count;   GLuint primCount;   GLuint firstIndex;   GLint  baseVertex;   GLuint reservedMustBeZero;   GLuint someNiceName;} DrawElements2IndirectCommand

DrawElementsIndirect2(enum mode, enum type, const void *indirect);means:
DrawElements2IndirectCommand *cmd = (DrawElementsIndirectCommand *)indirect;
"gl_SomeNiceName=someNiceName";
DrawElementsInstancedBaseVertex(mode, cmd->count, type, cmd->firstIndex * size-of-type, cmd->primCount, cmd->baseVertex);

which by itself is not so interesting, but then augment GL_ARB_multi_draw_indirect to a GL_ARB_multi_draw_indirect2. Going further, the data for someNiceName does NOT then really need to be a 32-bit unsigned integer. Instead it can be like a “pseudo-attribute”, a munge of bytes and the DrawElementsIndirect2 (and MultiDrawElementsIndirect2) would then have further arguments specifying how to interpret that munge of bytes as follows:


typedef struct {   
   GLuint count;  
   GLuint primCount;  
   GLuint firstIndex;  
   GLint  baseVertex;  
   GLuint reservedMustBeZero;   
   GLubyte someNiceName[]; //variable length
} DrawElements2IndirectCommand;

DrawElementsIndirect2(enum mode, enum type, const void *indirect);means:
DrawElements2IndirectCommand *cmd = (DrawElementsIndirectCommand *)indirect;
"gl_SomeNiceName=someNiceName";
DrawElementsInstancedBaseVertex(mode, cmd->count, type, cmd->firstIndex * size-of-type, cmd->primCount, cmd->baseVertex, someNiceNameSize, someNiceNameType, someNiceNameNormalized);

where someNiceNameSize, someNiceNameType, someNiceNameNormalized specify, analogous to VertexAttribPointerhow to interpret the bytes at the end of the structure.