Should geometry shader preserve primitive order or am I just lucky?

06-08-2012, 06:19 AM

For some GPGPU application, I'm throwing the content of a VBO as GL_POINTS at a geometry shader with transform feedback enabled in order to filter out data in the captured VBO. To discard a vertex, I just have to not call EmitVertex() in the geometry shader.

My algorithm requires the order of elements of the input vertex buffer to be preserved in the output, e.g. if the input vertex buffer contains
vtx1, vtx2, vtx3, vtx4 and the shader discards vtx2, the output vertex buffer content must be vtx1, vtx3, vtx4 in that order.
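The required behaviour can be written down as a small CPU-side reference model. This is only a sketch of what the captured buffer is expected to contain, not GPU code; the function name `filter_in_order` and the `keep` predicate are hypothetical names introduced for illustration.

```python
def filter_in_order(vertices, keep):
    """CPU model of the filtering pass: each input point emits one vertex or none."""
    captured = []
    for v in vertices:            # points are visited in submission order
        if keep(v):               # stands in for calling EmitVertex()
            captured.append(v)    # stands in for transform feedback appending the vertex
    return captured


# The example from the post: vtx2 is discarded, the others keep their relative order.
captured = filter_in_order(["vtx1", "vtx2", "vtx3", "vtx4"], lambda v: v != "vtx2")
print(captured)
```

If the GPU path is correct, reading back the captured VBO should match this model element for element.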

Now, I could successfully implement the algorithm, and it works great on an nVidia GF GTX280 with current drivers, but I could find no place in the specs where it says that such order is preserved (only that the order of vertices within a primitive is preserved).

So I'm wondering if I'm just being lucky or if it's actually supposed to be working that way.
Could someone please point me to the chapter in the specs where it says that primitive order is kept?

Thanks in advance,

06-08-2012, 07:08 AM
This is not a coincidence.

In fact, geometry shaders preserve primitive order, thus the output of the geometry shader executed for the ith primitive will be always written before the output of the geometry shader executed for the (i+1)st primitive.

06-08-2012, 09:13 AM
Thanks for your reply.

Is that part of the specifications of ARB_geometry_shader4?
I couldn't find anything regarding the order of primitives in EXT_transform_feedback*.

By the way, the application is a radix sort of data generated on the GPU through rasterization, which is easy to implement with a fragment shader and a software loop on key bit index, provided the pipelines preserve the order of primitives, which is the case.
I know there are better sort algorithms out there, but 16 million 32-bit keys sorted in 3-4 ms is good enough for me.

For anybody interested in the shader:

#version 130
#extension GL_EXT_geometry_shader4 : enable

uniform uint uMask;
uniform uint uDiscardIf;

flat in uint iKey[];
flat out uint oKey;

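void main() {
    // With GL_POINTS input, gl_VerticesIn is 1, so the loop body runs once per point.
    for( int i=0 ; i<gl_VerticesIn ; ++i ) {
        oKey = iKey[i];
        // Emit the key only if its masked bit differs from uDiscardIf.
        if( (oKey&uMask) != uDiscardIf ) EmitVertex();
    }
}

with uMask = 1 << bit_index
and uDiscardIf = uMask, then 0
(the VBO is drawn twice per iteration)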

The data actually bound to vertex attrib aKey can be some GLfloats as long as there are no INFs or NaNs.
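To see why the sort depends on order preservation, here is a CPU-side model of the loop described above. It is a sketch under assumptions: `filter_pass` and `radix_sort` are hypothetical names, and it assumes the output of the second draw is captured directly after the output of the first, which the post implies but does not spell out.

```python
def filter_pass(keys, mask, discard_if):
    """Mirrors the shader test: emit a key only if (key & uMask) != uDiscardIf."""
    return [k for k in keys if (k & mask) != discard_if]


def radix_sort(keys, bits=32):
    """LSD radix sort built from two order-preserving filter passes per bit."""
    for bit_index in range(bits):
        mask = 1 << bit_index
        zeros = filter_pass(keys, mask, mask)  # first draw: keeps keys with the bit clear
        ones = filter_pass(keys, mask, 0)      # second draw: keeps keys with the bit set
        keys = zeros + ones                    # assumed: second capture follows the first
    return keys


print(radix_sort([5, 3, 0xFFFFFFFF, 0, 3, 8]))
```

Each pass must keep keys in their incoming order. Otherwise the ordering established by the lower bits would be lost when a higher bit is processed, and the final result would not be sorted.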

Alfonse Reinheart
06-08-2012, 09:41 AM
You're not going to find a "geometry shaders do not modify the order of primitives" statement because it's unnecessary. Everything that happens in OpenGL happens as though it were in the order provided by the user. Therefore, unless the spec has language that states that something does modify the order, then it does not.

Geometry shaders get primitives; therefore, they must get primitives in the order provided. Primitives emitted are emitted in the order in which they are emitted. Transform feedback writes primitives that pass the shader stages; therefore, it writes them in the order emitted from the geometry shader.

However, if you want a spec quote, it is the third paragraph of chapter 2:

A vertex defines a point, an endpoint of an edge, or a corner of a polygon where two edges meet. Data such as positional coordinates, colors, normals, texture coordinates, etc. are associated with a vertex and each vertex is processed independently, in order, and in the same way. The only exception to this rule is if the group of vertices must be clipped so that the indicated primitive fits within a specified region; in this case vertex data may be modified and new vertices created.

The next paragraph expands on this for primitives.

06-11-2012, 02:33 AM
Hello again and thank you very much for your reply.

That makes sense. I forgot that the order in which primitives are rasterized in the later stages does matter in many cases (blending, etc.).

Now, on a different subject, it seems to me that actually allowing primitives to be delivered out of order in cases where order is not relevant might permit some optimizations (better parallelization or maybe automatic partial depth sort to benefit early depth test, etc.). Is there some means of controlling that (like glHint)?
Maybe OpenGL implementations already perform that kind of optimization without needing a hint.
(and sorry for asking what might be stupid questions).


06-11-2012, 07:07 AM
No, there is no such control. The only alternative is to use an append/consume buffer implemented with image load/store and atomic counters. Those work out of order; however, that might require you to move the generation step to the fragment shader on some hardware, as the first implementations of image load/store don't allow image operations inside the vertex or geometry shader.
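For contrast with the ordered transform feedback path, the append-buffer idea can be modelled on the CPU with threads standing in for shader invocations. This is only an illustrative sketch: `AtomicCounter` and `append_filtered` are hypothetical names, and a lock-protected integer plays the role of the GPU atomic counter. Every kept key lands in a unique slot, but which slot it gets depends on scheduling.

```python
import threading


class AtomicCounter:
    """Lock-protected integer standing in for a GPU atomic counter."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        """Return the current value, then add one (the slot index for an append)."""
        with self._lock:
            slot = self._value
            self._value += 1
            return slot

    def value(self):
        with self._lock:
            return self._value


def append_filtered(keys, keep, workers=4):
    """Append every kept key to a shared buffer; output order is not guaranteed."""
    counter = AtomicCounter()
    out = [None] * len(keys)

    def run(chunk):
        for k in chunk:
            if keep(k):
                out[counter.increment()] = k  # unique slot, arbitrary position

    threads = [threading.Thread(target=run, args=(keys[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[:counter.value()]


result = append_filtered(list(range(100)), lambda k: k % 3 == 0)
print(len(result))
```

The result contains the same elements as an ordered filter would, but not necessarily in input order, so on its own it would not be suitable for the radix sort passes discussed earlier in the thread.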