Primitive restart extended functionality



Yandersen
06-12-2014, 04:36 PM
Let's start from the problem: it is common to have a mesh built with different types of primitives, but the gl*Draw* functions let you draw only one type of primitive per call. So you are forced either to convert all the geometry to one type of primitive (in most cases GL_TRIANGLE_STRIP is the best choice) or to call gl*Draw* multiple times, sorting the primitives by type. Primitive restart indices help to separate primitives "on the fly" without calling gl*Draw* for each individual primitive or inserting multiple index clones, but the mesh still has to be optimized to combine individual triangles into primitives of the same type. However, merging the triangles of arbitrarily shaped geometry into strips only, with blind machine algorithms (even advanced ones), results in a considerable number of short strips. Some of that "garbage" may fit better into GL_TRIANGLE_FAN than GL_TRIANGLE_STRIP, or may even be better rendered as individual triangles. But as we are forced to use strips for all primitives, we still have to use separation indices for each of those small chunks.

So here is my solution for that mess: let's extend the functionality of the primitive restart index(es) to let it (them) change the primitive mode. To do that, more than one index value needs a special meaning rather than referencing the corresponding set of vertex attributes. I think the most elegant way is to state that once the extended functionality of the Primitive Restart Index (PRI for short) is enabled, any index whose value is equal to or higher than the value specified by glPrimitiveRestartIndex has the special meaning. If the index is equal to the specified PRI, the primitive type stays the same (the primitive is just restarted); if the index is higher than the PRI, the primitive type changes, so all subsequent indices are used to construct primitives of the new type.

So what are the rules for converting the special index value into the desired primitive type? Let's have a look at the defined values of all currently known primitives (let's keep the legacy types in the table just for consistency):
GL_POINTS 0x00
GL_LINES 0x01
GL_LINE_LOOP 0x02
GL_LINE_STRIP 0x03
GL_TRIANGLES 0x04
GL_TRIANGLE_STRIP 0x05
GL_TRIANGLE_FAN 0x06
GL_QUADS 0x07
GL_QUAD_STRIP 0x08
GL_POLYGON 0x09
GL_LINES_ADJACENCY 0x0A
GL_LINE_STRIP_ADJACENCY 0x0B
GL_TRIANGLES_ADJACENCY 0x0C
GL_TRIANGLE_STRIP_ADJACENCY 0x0D
GL_PATCHES 0x0E
So far there are only 15 values, defined as consecutive numbers. The straightforward approach is to treat the index just above the PRI value as a base offset for those numbers. So when an index equal to PRI is encountered, the primitive is just restarted and its type is left unchanged; if the index value is equal to (PRI+1), the primitive type switches to GL_POINTS; if the index is (PRI+2), the new primitive type becomes GL_LINES, and so on.
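
The mapping rule above can be sketched in plain C. This is only an illustration of the proposal, not an existing OpenGL feature: the `classify_index` function, the `IndexKind` enum and the `MODE_*` names are hypothetical, and only the numeric mode values come from the table above.

```c
#include <assert.h>
#include <stdint.h>

/* A few primitive mode values from the table above. */
enum {
    MODE_POINTS = 0x00,
    MODE_LINES = 0x01,
    MODE_TRIANGLES = 0x04,
    MODE_TRIANGLE_STRIP = 0x05,
    MODE_TRIANGLE_FAN = 0x06,
    MODE_PATCHES = 0x0E          /* last value in the table */
};

/* Hypothetical classification of one index under the proposal. */
typedef enum { IDX_VERTEX, IDX_RESTART, IDX_SWITCH, IDX_INVALID } IndexKind;

/* Classify `index` relative to the restart index `pri`.
   On IDX_SWITCH, *new_mode receives the primitive mode (index - pri - 1). */
IndexKind classify_index(uint32_t index, uint32_t pri, uint32_t *new_mode)
{
    assert(new_mode);
    if (index < pri)
        return IDX_VERTEX;                 /* ordinary vertex reference */
    if (index == pri)
        return IDX_RESTART;                /* restart, mode unchanged */
    uint32_t mode = index - pri - 1u;      /* PRI+1 maps to 0x00 (GL_POINTS) */
    if (mode > MODE_PATCHES)
        return IDX_INVALID;                /* beyond the known table */
    *new_mode = mode;
    return IDX_SWITCH;
}
```

With 16-bit indices and PRI set to 0xFFF0, for example, the index 0xFFF7 would select GL_TRIANGLE_FAN and 0xFFFF would select GL_PATCHES.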

The way I described this may be raw and messy, I know, but I hope the idea will find support.
The extended primitive restart functionality will let meshes be constructed from primitives of different types and will greatly simplify drawing them and storing them in files. The number of gl*DrawElements* calls required to draw a single mesh will be reduced to one, as all the required information may be stored right in the GL_ELEMENT_ARRAY_BUFFER, making the "mode" parameter of the gl*DrawElements* family of functions obsolete.

mhagain
06-13-2014, 04:27 AM
Actually strips have not been the most optimal primitive type for a long time. glDrawElements with GL_TRIANGLES is preferred (you can arrange the data in strip order if you wish, but you don't have to). id software pushed this as the optimal path for Quake III (in 1999), they give better vertex reuse, and this has been what vendors optimize around. See also http://tomsdxfaq.blogspot.ie/2005_12_01_archive.html


Let me say this just once, because academics are still spending time and money researching this subject. You're wasting your time. Strips are obsolete - they are optimising for hardware that no longer exists. Indexed vertex caches are much better at this than non-indexed strips, and the hardware is totally ubiquitous. Please go and spend your time and money researching something more interesting.

The exception is if you're talking about a mobile device where you know that strips are still preferred, but that's OpenGL ES, not OpenGL

Yandersen
06-13-2014, 07:24 AM
"...Indexed vertex caches are much better at this than non-indexed strips..." - what I see here is a comparison between indexed drawing of the least optimal primitives (GL_TRIANGLES) and non-indexed GL_TRIANGLE_STRIP drawing. That emphasizes the performance gain from the vertex post-processing cache; it has nothing to do with the actual type of the primitive. If the comparison were made between indexed drawing of both types, no doubt strips would win. Well, at least the index buffer would be smaller.

Drawing 2 connected triangles independently requires 6 invocations of the vertex shader, but if they are drawn using GL_TRIANGLE_FAN or GL_TRIANGLE_STRIP, there are only 4 invocations. Well, with the help of the vertex post-processing cache the number of vertex shader invocations may be equal to the number of processed vertices, yes, but there is still a gain from the smaller index buffer: triangles take 3*n indices, while strips (or fans) take 2+n+1 indices (including the PRI), so if there are 2 or more connected triangles in a mesh, it is better to use a strip or fan primitive to draw them. And in most cases any triangle in a mesh has at least 3 neighbors.
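
The index counts quoted above are easy to check with a small C sketch. The function names are made up for illustration; the formulas are the ones from the post (3*n for triangles, 2+n+1 for one strip or fan followed by one PRI).

```c
#include <assert.h>
#include <stddef.h>

/* Indices needed for n connected triangles drawn as GL_TRIANGLES. */
size_t indices_as_triangles(size_t n)
{
    return 3 * n;
}

/* Indices needed for the same n triangles drawn as one strip or fan,
   plus one trailing restart index (PRI), as counted in the post. */
size_t indices_as_strip_with_restart(size_t n)
{
    assert(n >= 1);
    return 2 + n + 1;
}
```

For a single triangle the strip form is one index longer (4 against 3); from n = 2 onward it is shorter (5 against 6, then 6 against 9, and so on).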

mbentrup
06-13-2014, 11:05 AM
Drawing 2 connected triangles independently requires 6 invocations of the vertex shader, but if they are drawn using GL_TRIANGLE_FAN or GL_TRIANGLE_STRIP, there are only 4 invocations. Well, with the help of the vertex post-processing cache the number of vertex shader invocations may be equal to the number of processed vertices, yes, but there is still a gain from the smaller index buffer: triangles take 3*n indices, while strips (or fans) take 2+n+1 indices (including the PRI), so if there are 2 or more connected triangles in a mesh, it is better to use a strip or fan primitive to draw them. And in most cases any triangle in a mesh has at least 3 neighbors.

For a vertex cache of size 2 or more there will be only 4 invocations for indexed triangles as well.

The gain in index buffer size is only relevant if you actually build long strips, but in that case you can't make effective use of the vertex cache:

In an optimal triangle mesh, every vertex is part of 6 triangles, but in a strip it is only part of at most 3 triangles, so (on average) every vertex has to be part of two strips. If you optimize for long strips to reduce memory consumption, you'll no longer have that vertex cached the second time it is needed, so you need twice the number of vertex shader invocations for strips.
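
The claim about a cache of size 2 can be checked with a toy model. The sketch below assumes a simple FIFO cache that stores only vertex indices; this is an idealization for illustration, real hardware caches may behave differently, and `count_invocations` and `MAX_CACHE` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CACHE 64

/* Count vertex shader invocations for an index stream, assuming a FIFO
   post-transform cache that holds `cache_size` vertices. */
size_t count_invocations(const uint32_t *indices, size_t count, size_t cache_size)
{
    uint32_t cache[MAX_CACHE];
    size_t used = 0, next = 0, misses = 0;
    assert(cache_size <= MAX_CACHE);

    for (size_t i = 0; i < count; ++i) {
        int hit = 0;
        for (size_t j = 0; j < used; ++j) {
            if (cache[j] == indices[i]) { hit = 1; break; }
        }
        if (hit)
            continue;                     /* reuse the cached result */
        ++misses;                         /* vertex has to be shaded */
        if (cache_size == 0)
            continue;                     /* no cache at all */
        cache[next] = indices[i];         /* overwrite the oldest entry */
        next = (next + 1) % cache_size;
        if (used < cache_size)
            ++used;
    }
    return misses;
}
```

Feeding it two connected triangles as the GL_TRIANGLES list { 0, 1, 2, 2, 1, 3 } gives 4 invocations with a cache of 2 entries and 6 with no cache, which matches the numbers discussed above.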

Yandersen
06-13-2014, 03:46 PM
So am I just outdated? Are you guys saying that everybody nowadays uses triangles only and nobody bothers with strips and fans? So there will be no use for the proposed extension at all? :confused:


In an optimal triangle mesh, every vertex is part of 6 triangles, but in a strip it is only part of at most 3 triangles, so (on average) every vertex has to be part of two strips. If you optimize for long strips to reduce memory consumption, you'll no longer have that vertex cached the second time it is needed, so you need twice the number of vertex shader invocations for strips.
If I have a mesh that can be tessellated into long strips - long enough to exceed the capacity of the cache - I can't imagine how I could draw it with triangles to make better use of the cache. Drawing it area by area will still push the border vertices out of the cache. Perhaps it may be even worse than with strips. Still, strips can be trimmed to smaller sizes, and even then the number of indices will be smaller than what individual triangles would take.

mhagain
06-13-2014, 05:42 PM
In an optimal triangle mesh, every vertex is part of 6 triangles, but in a strip it is only part of at most 3 triangles, so (on average) every vertex has to be part of two strips. If you optimize for long strips to reduce memory consumption, you'll no longer have that vertex cached the second time it is needed, so you need twice the number of vertex shader invocations for strips.

And vertices are much bigger than indices too, so the saving on vertices more than offsets the increased index count.


So am I just outdated? Are you guys saying that everybody nowadays uses triangles only and nobody bothers with strips and fans? So there will be no use for the proposed extension at all? :confused:

Strips are still relevant in the mobile world and this would probably be useful for GL ES. There may be some cost from both restarting a primitive and switching the primitive type, but the saving from fewer draw calls may offset that (draw calls are a much bigger deal with ES than they are on the desktop).

Yandersen
06-13-2014, 09:54 PM
Let's skip arguing about which primitive type is *the best* and focus on the actual idea of the proposed extension. The most important question, as I see it, is performance. Is it possible to predict whether switching the primitive type along with restarting the primitive will be considerably slower than simple primitive restart? Is there a way to implement the proposed primitive switching functionality and keep the rendering speed the same?

Maybe the way of primitive switching I described earlier is not the best one for implementation. Maybe the switching indices have to be set independently using some special function:


//glPrimitiveSwitchingIndexMode(GLenum index, GLenum mode)
glEnable(GL_PRIMITIVE_RESTART);
glPrimitiveRestartIndex(252);
glPrimitiveSwitchingIndexMode(GL_PRIMITIVE_SWITCHING_INDEX0, GL_TRIANGLE_FAN); //The actual index value is 253
glPrimitiveSwitchingIndexMode(GL_PRIMITIVE_SWITCHING_INDEX1, GL_TRIANGLES); //The actual index value is 254
glEnable(GL_PRIMITIVE_SWITCHING_INDEX0); //Now the index 253 will switch the primitive type to GL_TRIANGLE_FAN
glEnable(GL_PRIMITIVE_SWITCHING_INDEX1); //Now the index 254 will switch the primitive type to GL_TRIANGLES

In other words, the implementation will have a set of supported primitive switching indices (PSI) ranging from GL_PRIMITIVE_SWITCHING_INDEX0 to GL_PRIMITIVE_SWITCHING_INDEXi, where i is equal to GL_MAX_PRIMITIVE_SWITCHING_INDICES-1. Each of those index binding points has an index value implicitly associated with it. That value is based on the primitive restart index, so for the target GL_PRIMITIVE_SWITCHING_INDEX0 the actual index value is PRI+1; for GL_PRIMITIVE_SWITCHING_INDEX1 it is PRI+2, and so on. The actual number of supported PSIs is implementation dependent and may be smaller than the number of supported primitive types. Maybe. I don't know, I am not a developer. :)
So we associate a primitive type with each PSI, then enable that PSI.

It is just another way to implement index-based primitive switching.
It can also be extended. E.g. if some index value is set to mode GL_NONE, let that index be a so-called termination index, acting just like 0 in character strings and aborting the processing of the index array:

glPrimitiveSwitchingIndexMode(GL_PRIMITIVE_SWITCHING_INDEX2, GL_NONE); //Once the index equal to PRI+3 encountered, all subsequent indices will be ignored
glEnable(GL_PRIMITIVE_SWITCHING_INDEX2); //Now the index 255 will stop the index array execution
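
To make the intended semantics concrete, here is a CPU-side sketch in C that walks an 8-bit index stream and splits it into per-mode batches, using the values from the snippets above (PRI = 252, 253 = fan, 254 = triangles, 255 = terminate). Nothing here is a real OpenGL API: `split_batches`, `SwitchTable` and `Batch` are hypothetical names, and treating an unbound special index as a plain restart is an assumption. A separate `MODE_STOP` sentinel stands in for the proposed GL_NONE because in real OpenGL GL_NONE and GL_POINTS both have the value 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    MODE_STOP = -1,              /* sentinel standing in for the proposed GL_NONE */
    MODE_TRIANGLES = 0x04,
    MODE_TRIANGLE_STRIP = 0x05,
    MODE_TRIANGLE_FAN = 0x06
};

/* One run of ordinary indices sharing a primitive mode. */
typedef struct { int mode; size_t first; size_t count; } Batch;

/* Hypothetical switching table: modes[k] is bound to index pri + 1 + k. */
typedef struct { uint8_t pri; int modes[3]; size_t mode_count; } SwitchTable;

/* Split the stream into batches; returns the number of batches written. */
size_t split_batches(const uint8_t *idx, size_t n, int initial_mode,
                     const SwitchTable *t, Batch *out, size_t max_out)
{
    size_t nb = 0, start = 0;
    int mode = initial_mode;

    for (size_t i = 0; i <= n; ++i) {
        if (i < n && idx[i] < t->pri)
            continue;                          /* ordinary vertex index */
        if (i > start) {                       /* close the pending run */
            assert(nb < max_out);
            out[nb].mode = mode;
            out[nb].first = start;
            out[nb].count = i - start;
            ++nb;
        }
        if (i == n)
            break;                             /* end of the index array */
        size_t k = (size_t)(idx[i] - t->pri);  /* 0 means plain restart */
        if (k >= 1 && k <= t->mode_count) {
            if (t->modes[k - 1] == MODE_STOP)
                break;                         /* termination index */
            mode = t->modes[k - 1];            /* switch the primitive type */
        }
        start = i + 1;
    }
    return nb;
}
```

For the stream { 254, 0, 1, 2, 253, 3, 4, 5, 6, 252, 7, 8, 9, 255, 10, 11, 12 } this yields three batches: triangles over 3 indices, a fan over 4 indices, and a second fan over 3 indices; everything after 255 is ignored.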

kRogue
06-17-2014, 04:26 AM
I don't want to be too harsh, but what is the purpose of extending the primitive restart to change the primitive type? Is it expected that the index buffer will be made by a feedback mechanism? Is it to just avoid an extra draw call?

Let's review why primitive restart was/is a good idea. A long time ago, in a galaxy far away, folks used triangle strips. They used them because one index per triangle was great. It was really great in immediate mode. Then came indexed mode and TnL hardware, where vertex processing was on the GPU (instead of the CPU). Once there, a post vertex transform cache was christened and folks realized that one can get better than 1 triangle per processed vertex by using GL_TRIANGLES and priming the cache. Finally, we get to primitive restart. With primitive restart one can get one triangle per index (often) and better than 1 triangle per processed vertex. From an API point of view we always had glMultiDraw*, but those are really done in software.

What is the gain for being able to change the primitive type in the index stream? What I see are:

1. avoid issuing another draw call
2. potential to reuse more vertices from the post-vertex cache
3. if index stream is made by feedback process (i.e. by GPU)


Here are my thoughts for each one of the above:

1. Avoiding issuing a draw call is not really a big deal. Most drivers batch many draw calls together before sending them down to the kernel to send to the GPU, so what matters for many draw calls is whether there are state changes between them. In this case the state change is the primitive mode, which is not really a big deal. Also, one can argue that the index buffer offset is one as well.
2. Likely we are talking about saving maybe 3 or 4 vertices at most between mode changes, which is not a big deal UNLESS one has very few vertices between primitive mode changes; this also applies to 1.
3. One can rig a feedback process to also output when the primitive mode changes in an array of streams, indexed by primitive mode.


So what I see, the main benefit is if the batches are tiny between changing primitive modes and the draw order MUST be in that order [for otherwise one can organize the stream by primitive mode].

From the hardware point of view, it makes life harder because now the primitive type, rather than being set only at draw call, would need to be propagated down the pipeline in addition to the logic at vertex fetch units to recognize that the primitive type changed. That is going to cost some sand.

Yandersen
06-17-2014, 04:02 PM
Well, my main intention was to make the index buffer pretty much a self-descriptive item, requiring only a single draw call to render an entire model. An array of indices technically represents an array of batches, but the primitive type of each batch and its bounds are not stored in the index array - that "mapping" is made externally by a draw call that provides the missing arguments. The same arguments every time, which is senseless. It is logical for the client to store the range information that defines where inside the index buffer each individual level-of-detail submodel is located, but if we also have to store the mapping of every primitive batch of that submodel, it becomes messy and doesn't make sense anymore. We may draw one submodel or another depending on the conditions (distance from the viewer), but we never draw parts of the model separately just because they are built with different primitive types - the batches of a single submodel are either all drawn or not drawn at all. A particular primitive is not a unit the user needs to manipulate independently unless the whole submodel is built with the same type of primitive. We also can't choose which primitive type to use when drawing a given batch of indices - a given set of indices requires a predefined primitive type to be assembled properly. This is not information the client needs to manipulate at all - the primitive type is at the same atomic level as the indices themselves. These two kinds of data are naturally bound; together they represent a single logical item, and changing either of them independently will not produce anything meaningful. So in my opinion the primitive type should not be separated from the indices, and the drawing routines should provide only the starting index and the number of indices to render. IMO.

There is another argument of glDrawElements that the client should not have to bother with, IMO - the index type. Again, it is information that is bound to the contents of an IBO. However, unlike the primitive type, the index type does not change dynamically, and in most cases an entire IBO shares the same index type (especially if all its indices refer to the same VBO). I think it would make perfect sense to store the index type as a state attribute of the GL_ELEMENT_ARRAY_BUFFER target or, better, of the IBO itself. The primitive switching index values, the PRI and the index type could all be set there, as all of those parameters are hard-bound to the array of indices. E.g. the commands setting those values may look like this:


//glBufferParameteri(GLenum target, GLenum pname, const GLint param)
//glBufferParameteriv(GLenum target, GLenum pname, const GLint * params)
glBufferParameteri(GL_ELEMENT_ARRAY_BUFFER, GL_ELEMENT_ARRAY_INDEX_TYPE, GL_UNSIGNED_SHORT); //Set index type
glBufferParameteri(GL_ELEMENT_ARRAY_BUFFER, GL_PRIMITIVE_RESTART_INDEX, 0xfffc); //Set PRI
glBufferParameteri(GL_ELEMENT_ARRAY_BUFFER, GL_PRIMITIVE_RESTART_INDEX_RANGE, 4); //4 Indices from 0xfffc to 0xffff inclusive are PRI by default
GLint PRI1_PSI3[] = { GL_PRIMITIVE_RESTART, GL_TRIANGLES, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN };
glBufferParameteriv(GL_ELEMENT_ARRAY_BUFFER, GL_PRIMITIVE_SWITCHING_INDEX, PRI1_PSI3); //Redefine 3 out of 4 restarting indices as switching indices

So, to keep backward compatibility, if a draw call is made with any of the currently known functions, the index type and primitive mode are taken from the function arguments rather than from the state of the GL_ELEMENT_ARRAY_BUFFER, and the single PRI value is the one set by glPrimitiveRestartIndex. To make use of the IBO state, I recommend introducing a set of functions for indexed drawing:


glDrawIndices(GLint indexFirst, GLsizei indexCount)
glDrawIndicesInstanced(GLint indexFirst, GLsizei indexCount, GLsizei instanceCount)
...
The indexFirst parameter is the "index of the first index" in the array stored in the IBO; the byte offset therefore depends on the index type, which is taken from the parameters of the given IBO. That means that from the client's point of view the IBO represents an array of indices, and all the client needs to know when choosing which part of it to draw is the range where the desired model's data is located. The index type does not have to be respecified, and the primitive mapping does not have to be messed with.
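
How the byte offset would follow from indexFirst and the stored index type can be sketched as below. The helper names are hypothetical; only the three numeric constants are the real values of GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT and GL_UNSIGNED_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Index types; the values match the real OpenGL type enums. */
enum {
    IDX_UNSIGNED_BYTE = 0x1401,
    IDX_UNSIGNED_SHORT = 0x1403,
    IDX_UNSIGNED_INT = 0x1405
};

/* Size in bytes of one index of the given type, or 0 if unknown. */
size_t index_size(int type)
{
    switch (type) {
    case IDX_UNSIGNED_BYTE:  return 1;
    case IDX_UNSIGNED_SHORT: return 2;
    case IDX_UNSIGNED_INT:   return 4;
    default:                 return 0;
    }
}

/* Byte offset into the IBO for the proposed indexFirst argument,
   given the index type stored as IBO state. */
size_t index_byte_offset(size_t index_first, int type)
{
    size_t size = index_size(type);
    assert(size != 0);
    return index_first * size;
}
```

So indexFirst = 10 would mean byte 20 for 16-bit indices and byte 40 for 32-bit indices, without the caller having to know which one the IBO holds.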

Well, taken that IBO is a part of VAO state anyway, all those parameters may alternatively be a part of VAO state rather than the GL_ELEMENT_ARRAY_BUFFER state. I am not sure about that...

mhagain
06-17-2014, 05:33 PM
It's valid for the index type to be allowed change if you're streaming indices to a dynamic buffer. One type of object may need 32-bit, another type may be fine with 16-bit. Allowing this means that you only need create a single such buffer and only bind it once rather than having an extra buffer bind (or even full VAO change) each time the type needs to change. This also works well with a persistent mapping setup.

Yandersen
06-17-2014, 06:45 PM
Simply change the index type state of the IBO in between the calls to glDrawIndices. I am sure all of a model's primitives will share the same index type, and even different models will have the same index type in most cases, so reconfiguring the IBO's index type state will not be frequent. Besides, the same mess happened with the primitive restart index until the fixed restart index feature was introduced recently - the PRI had to be respecified according to the index type, as the intention was always to use the maximum possible value, but that required a function call separate from the draw call.
But yeah, the "GLint indexFirst" argument should be changed to something able to specify an offset that may not be aligned to the index type:

glDrawIndices(GLsizei indexCount, const GLvoid* indices)
glDrawIndicesInstanced(GLsizei indexCount, const GLvoid* indices, GLsizei instanceCount)
...
Respecifying the IBO state (index type, PRI, PSIs) in between model draw calls does make sense: different models may be built from different sets of primitive types; they may come from different vendors that use different PSI binding conventions; and as the PSI lets the whole model be drawn with a single draw call, setting up the IBO state according to the conventions of the model's data seems a logical way of drawing.
However, the lack of a "mode" argument implies that the primitive type becomes something like an internal state, and it is undefined until the first PSI is encountered. Presumably it keeps the value set by the last PSI encountered, so either the IBO must start with one of the PSIs, or the default primitive type must be set with some function just before calling glDrawIndices (e.g. the model's convention is to use GL_TRIANGLES by default unless the IBO starts with an explicit PSI). The normal drawing functions set the mode explicitly, but the proposed glDrawIndices functions avoid disturbing the primitive rasterization mode state as well as respecifying the index type (as I am just an OpenGL user, I can predict any advantage here only intuitively). :)

The way I see it, the implementation of PSI is more like exception handling, so respecifying the mode for every new primitive is not the desired way of using this feature - if two consecutive primitives have the same type, the restart index should be preferred over the switching index between them. But using the PSI, even if it is implemented as an exception for every switching index encountered, should obviously work faster than calling a glDrawElements* function, right?

kRogue
06-18-2014, 05:44 AM
I think it would be a really good idea to decide how often that primitive mode is going to change. Really important. If it changes rarely, then having the hardware handle primitive mode changes dictated in the index stream is silly; it significantly complicates the front end, to put it mildly. Emulating it in the driver is a really bad idea, since the driver would then need to walk the index stream to break it up, which is really bad for non-unified memory architectures (and in truth still bad for unified ones).

I admit that this is a nice idea in terms of making the API a touch nicer, but for drivers it is not the number of draw calls that is so bad, but rather the state changes between draw calls; in that regard, from the point of view of the API, changing the primitive mode is not even a state change anyway.


What are some use cases that make this feature, at the cost of complicating the hardware, worth while? This, if ever done, is going to cost sand so it had better make some rendering faster in such a way that application breaking draws indexed by primitive type is not reasonable.

Yandersen
06-18-2014, 02:30 PM
Well, let's compare it with the PRI, which is already implemented by now. We have an index value, a single value, which can be set through the API and which is treated in a specific manner, unlike a real index value. So every time a new index is read, it is compared to that special value. At this stage, the upgrade to PSI requires changing the check: instead of testing whether an index is equal to a single predefined value, we now check it against bounds, because now we have a range of special index values. So far there is not much difference in performance, as checking against bounds can be done as fast as checking for equality.
The next step is taken when the index is recognized as belonging to the set of special indices. With PRI we already have some sort of reinitialization when the primitive is restarted. With PSI that reinitialization branches conditionally, depending on the specific value of the special index encountered. As we bind specific index values to the desired primitive modes before the draw call is made, we can expect that some preparations are performed/precomputed/precompiled at that time, so that during rendering the switching mechanism can work without the help of the driver.
I do not know how exactly the indexed rendering is implemented in hardware and how the PRI is implemented currently, but I have a "feeling" that advancing it to PSI could be done without too much "sand". This is question for the actual videodriver developers.


What are some use cases that make this feature, at the cost of complicating the hardware, worth while? This, if ever done, is going to cost sand so it had better make some rendering faster in such a way that application breaking draws indexed by primitive type is not reasonable.

I didn't claim that the PSI should make drawing faster. The focus is on the user. In most cases models originally built with different types of primitives are converted to GL_TRIANGLES just because no one wants to bother storing separate parameter sets for each primitive and calling glDraw* multiple times.
OpenGL does not exist in isolation - the user has to store a lot of info about what he has to draw. Multiple objects and structures. If a model is built with different primitives, then an array of primitive parameters (mode, index count, offset and so on) has to be allocated for each model. A dynamic array, which requires a memory allocation. If an application has hundreds of models and each of them has level-of-detail submodels, then we have a horrible number of small memory allocation calls, which fragment the main memory. And every memory allocation is a potential source of error. The more of them there are, the higher the risk of an error occurring. And if one decides to optimize the usage of vertex and index buffers by merging some models which share the same vertex type, all those dynamic arrays have to be recalculated (indices shifted, base offsets added or whatever). With the help of PSI we have just one set of parameters for a single draw call for each model, even if it is built with multiple types of primitives. It is a huge simplification on the user side, and even if it does not make rendering faster by itself, it takes an overwhelming burden off the user, so other optimization techniques can be taken advantage of more easily. It also saves main memory, which is otherwise polluted by all that drawing data.
It is really illogical to store part of the primitive definition in an index array (indices) and the other part - in a main memory (primitive type) as those are two parts of the same thing. The VBO stores the information about the actual points of the model, which are pretty much an independent items in terms of vertex transformations, but what kind of a data does the IBO represent without a primitive type mapping?! Specifying the primitive type in a draw call is the same senseless thing like specifying an internal format of a texture every time it is bound - nonsense. Every set of indices can correctly work only with the primitive type they were generated for, just like a texture data can be fetched correctly only if it is viewed through the right internalformat. So from this point of view, the GL_DRAW_INDIRECT_BUFFER should store the primitive types and references to the index ranges, linking the primitives with their indices, but it doesn't. The user still has to manage the individual primitives (grouped by types at best case) just like that is the data he may want to manipulate. Really?
But the way the index buffer is used is more like the reading of a text string - indices are picked consecutively, so inserting the PSI into proper positions is a very straightforward solution for marking the borders at which the primitives of a given type start.
Representation of the idea in a graphical way may look like this:

- The IBO before PRI:
{ 1 2 3 4 5 6 7 8 9 }

- The IBO with the PRI:
{ 1 2 3 ';' 4 5 6 ';' 7 8 9 }

- The IBO with the PSI and PRI:
{ 'GL_TRIANGLES' 1 2 3 'GL_TRIANGLE_FAN' 4 5 6 ';' 7 8 9 }

In other words, if the IBO stores the PSI along with the normal indices, that IBO fully defines the geometry, and the only thing the user has to store is the bounds of the area of the IBO that belongs to the object the user wants to draw. This approach is very consistent with the general usage of OpenGL: we upload the data, define its structure, configure the pipeline, and make use of the data by telling OpenGL when to draw it and where. But if the user has to store a part of the model's technical definition on his side, then both the application and OpenGL are involved in the low-level drawing mess, so OpenGL serves just half of its purpose, doesn't it?

So the advantage of the PSI extension is not only the minor shrink of the IBO due to packing indices into the type of primitive they fit best (instead of using one universal primitive type for all), and not only the drawing time gained from glDraw* calls, which can be reduced to one per model instead of one per primitive of that model. The major advantage is less pollution of the main memory by the many memory allocation calls the application must make during initialization (or model reloading) to allocate arrays of primitive type definitions for each model. Removing the headache of the primitive-by-primitive drawing sequence the user must follow to get the model drawn will also make the application code lighter - this is an advantage the user will appreciate. And here I do not mean the newbie user writing his first app! The time (and so the money) that debugging messy code takes (code dealing with multiple dereferences and dynamically allocated objects) should also be considered.

I do think that the technical experts are to be consulted about the possible ways of PSI implementation.

Nikki_k
06-19-2014, 02:43 PM
I didn't claim that the PSI should make drawing faster. The focus is on the user. In most cases models originally built with different types of primitives are converted to GL_TRIANGLES just because no one wants to bother storing separate parameter sets for each primitive and calling glDraw* multiple times.


No, No, No!!!
This sounds like the most pointless reasoning imaginable.
The focus should be to guide the user to provide the data in a manner that allows the most efficient execution. In case you haven't noticed, the most recent talk about 3D has not been about making a more user friendly API but to push it closer to the hardware in order to reduce driver overhead. Driver overhead has become one of the most important issues with graphics performance. And you ask for more of it.

What you want has absolutely no place in the driver, it's merely a convenience feature - but one that puts a huge burden on the hardware because it requires specific implementation.

I was faced with a similar setup recently - creating a buffer from data that contained triangles, strips, fans, and even quads and quad strips making up a single model object
My solution was to turn everything into triangles but I didn't want to change all the code that generates the data, so what I did was to generate the buffer data as it was but inserted some fake primitive restart markers into it and then passed it to a conversion function that made a list of triangles out of it. Effectively the data generation still can assume it creates all kinds of primitives but the 3D hardware never will see any of it, it will only see the final triangles that can be rendered with a single glDrawElements call. You should do the same.
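
A conversion step of the kind described above might look like the following C sketch. It is not the poster's actual code: `to_triangles` is a hypothetical helper that expands one strip, fan or triangle run into a GL_TRIANGLES index list, following the usual OpenGL vertex ordering so that winding is preserved. Quads and quad strips are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MODE_TRIANGLES = 0x04, MODE_TRIANGLE_STRIP = 0x05, MODE_TRIANGLE_FAN = 0x06 };

/* Expand one primitive run of n indices into triangle-list indices.
   Returns the number of indices written to out (0 for unsupported modes). */
size_t to_triangles(int mode, const uint32_t *in, size_t n,
                    uint32_t *out, size_t max_out)
{
    size_t w = 0;

    if (mode == MODE_TRIANGLES) {
        for (size_t i = 0; i + 2 < n; i += 3) {   /* copy complete triangles */
            assert(w + 3 <= max_out);
            out[w++] = in[i];
            out[w++] = in[i + 1];
            out[w++] = in[i + 2];
        }
    } else if (mode == MODE_TRIANGLE_STRIP || mode == MODE_TRIANGLE_FAN) {
        for (size_t i = 2; i < n; ++i) {          /* i is the newest vertex */
            uint32_t a, b;
            if (mode == MODE_TRIANGLE_FAN) {
                a = in[0];                        /* fan: shared first vertex */
                b = in[i - 1];
            } else if (i % 2 == 0) {
                a = in[i - 2];                    /* strip, even triangle */
                b = in[i - 1];
            } else {
                a = in[i - 1];                    /* strip, odd triangle: */
                b = in[i - 2];                    /* swap to keep winding */
            }
            assert(w + 3 <= max_out);
            out[w++] = a;
            out[w++] = b;
            out[w++] = in[i];
        }
    }
    return w;
}
```

Applied to the indices { 0, 1, 2, 3 }, a strip expands to { 0, 1, 2, 2, 1, 3 } and a fan to { 0, 1, 2, 0, 2, 3 }. Calling it once per run between restart markers and concatenating the results gives a list that a single glDrawElements call with GL_TRIANGLES can draw.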

And the number of saved indices can be considered irrelevant; the maintenance overhead will easily negate any of it.

Yandersen
06-19-2014, 04:44 PM
That is exactly the common "solution" that makes me so sad! :sorrow: What is the point of having different primitive types supported if no one is using them?! Why introduce the PRI? Triangles do not require that, right?
Besides, what was the difference in the quantity of indices between the original model's version and the triangle-only version? Something like more than two times bigger, I guess? Well, the internal texture format GL_UNSIGNED_SHORT_4_4_4_4 is also just 2 times smaller than {GL_RGBA&GL_UNSIGNED_BYTE}, but I doubt the GPU can operate with 4-byte values directly, so the unpacking must be performed all the time. But still, that complicated format (as well as many others of the same kind) was introduced into the core version of OpenGL just to halve the size a texture could take. And again: how many users actually use it, and how many more complications did it bring to the drivers? ;)

"Driver overhead has become one of the most important issues with graphics performance. And you ask for more of it."

If so, then let's deprecate all flat primitive types except for triangles and patches - good? Let's also deprecate PRI then, because fetching groups of indices of a fixed size is so much easier - just iteratively stride by n indices, pick the next batch of n indices and rasterize a new primitive - this way there will be no need to check indices for special values, which misalign the stride and prevent such easy going! :whistle:

IMO, if the primitive types with undefined index quantities (strips, fans) are not scheduled for deprecation, then they need to be given "full support" so users would be encouraged to use them. PRI is just halfway toward that: the primitive is restarted, but the mode stays the same. I do not think that taking a second step to solve the problem completely is such an unaffordable complication.

"The focus should be to guide the user to provide the data in a manner that allows the most efficient execution."

If the same number of triangles could be rasterized faster in GL_TRIANGLES mode than in strips, quads, fans or any other mode, then there is nothing to argue about - downconverting other types of primitives into a unified array of triangles is easy. But if there is no difference in rendering speed, then small benefits like IBO size savings and user-side simplifications start to count in favor of the PSI extension.

malexander
06-19-2014, 06:16 PM
Do you have a usage scenario for this extension, such as an algorithm that would be accelerated by it? The reason I ask is because over the years I have run into very few cases where points, lines, and triangles would share the same shader or GL state.

I think such an extension could switch between prim types that generate the same basic GL primitive type (points, lines, triangles, adjacency-types). However, I wonder if hardware can switch rasterization modes efficiently, from triangle rendering to lines, and back again. Certainly with a geometry shader active you'd be restricted to a single GL primitive type.

For some background on my GL experience: I've occasionally found myself wanting to switch between lines and line-strips in the same draw. I don't use triangle strips or fans except in extremely specialized cases that would be their own draw batch (circle drawing, for example). If tristrips or trifans were deprecated I wouldn't shed a tear :) I'm not at all interested in the old fixed-function pipeline.

Nikki_k
06-20-2014, 02:42 AM
That is exactly the common "solution" that makes me so sad! :sorrow: What is the point of having different primitive types supported if no one is using them?! Why introduce the PRI? Triangles do not require that, right?


Baggage from older times? Remember, quads have already been deprecated because hardware support is poor.
Also, why is this solution so bad? It does precisely what you want, it only requires a bit of groundwork on the CPU side - just like it is with matrices in the core profile. There's absolutely no need for the driver to handle them, all it really needs to do is upload the data.


Besides, what was the difference in the quantity of indices between the original model's version and the triangle-only version?


Depends on the model. On average 1.5 times larger. But let's be clear about one thing: You need HUGE models for this to have an impact. Model data is static: you upload it once to the GPU and forget about it. Who cares if it takes 1 MB or 1.5 MB of index storage ALTOGETHER?
If you are this concerned about space, you can still check to see if you can convert everything to strips, then all you need to take apart is the fans and quad strips but I really didn't bother with that because common wisdom currently says that single triangles are better for the hardware.



Something like more than two times bigger, I guess? Well, the internal texture format GL_UNSIGNED_SHORT_4_4_4_4 is also just 2 times smaller than {GL_RGBA&GL_UNSIGNED_BYTE}, but I doubt the GPU can operate with 4-byte values directly, so the unpacking must be performed all the time. But still, that complicated format (as well as many others of the same kind) was introduced into the core version of OpenGL just to halve the size a texture could take. And again: how many users actually use it, and how many more complications did it bring to the drivers? ;)


What do I care if the index buffer gets a bit larger? The memory it takes is still a fraction of the textures required for drawing all this stuff, so all things considered, we are talking about less than 5% space savings. That's nothing! That's simply not worth adding new logic to the hardware.
And that's where these texture formats come in: Let's take a highly complex model with 100000 triangles. That's 300000 vs maybe 200000 indices, i.e. a difference of 100000 indices or 400000 bytes with 4-byte indices. Now let's take a skin texture. For a model of this detail it'd have to be at least 1024x1024, if not larger. But let's stick to 1024. With all mipmaps generated, such a texture is about 5.5 MB in RGB32 format. Halving that amounts to about 2.75 MB of space savings, even more if you use other compression formats. See the relation between texture and index buffer? It's almost 7:1 - if you have a second skin for the same model you are at 14:1. That's why formats with a smaller memory footprint exist. As a whole, the texture to index ratio will be far higher than in this contrived example. If you want to save space, save where you can get huge savings with small investment; do not try to get small savings with costly investments.
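
To check the arithmetic, here is a small C sketch. The function names are invented for the example, and it assumes uncompressed texels, a full mipmap chain down to 1x1, and 4-byte indices.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes of a full mipmap chain, from w x h down to 1 x 1,
   assuming an uncompressed format with a fixed texel size. */
uint64_t mip_chain_bytes(uint32_t w, uint32_t h, uint32_t bytes_per_texel)
{
    uint64_t total = 0;
    assert(w >= 1 && h >= 1);
    for (;;) {
        total += (uint64_t)w * h * bytes_per_texel;
        if (w == 1 && h == 1)
            break;
        if (w > 1) w /= 2;   /* each level halves both dimensions, */
        if (h > 1) h /= 2;   /* but never goes below one texel     */
    }
    return total;
}

/* Bytes taken by an index buffer with the given number of indices. */
uint64_t index_buffer_bytes(uint64_t index_count, uint32_t bytes_per_index)
{
    return index_count * bytes_per_index;
}
```

With the numbers from the example, mip_chain_bytes(1024, 1024, 4) gives 5592404 bytes and the 2-byte version gives 2796202 bytes, so the texture saving is 2796202 bytes. The index saving is index_buffer_bytes(300000, 4) - index_buffer_bytes(200000, 4) = 400000 bytes, which puts the ratio just under 7:1.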



If so, then let's deprecate all flat primitive types except for triangles and patches - good?


No. Let's deprecate everything that's not commonly supported across existing hardware. The API should mirror what can be done efficiently by the driver, not what allows the most convenience to the programmer. It only gets bad if the reduced feature set puts some severe limitation on what can be done and how it can be done.

And that's still the crux here: In order to implement this feature in the spec you need hardware supporting it! But hardware currently does not support it, meaning it has to resort to expensive emulation steps to support it. You simply do not want that in the most time critical part of the entire driver, namely the draw calls. You want those to be as efficient as they possibly can be.



Let's also deprecate PRI then, because fetching groups of indices of a fixed size is so much easier - just iteratively stride by n indices, pick the next batch of n indices and rasterize a new primitive - this way there will be no need to check indices for special values, which misalign the stride and prevent such easy going! :whistle:


No, strips and fans still have their use and I'd still use them if it makes sense. But it gets very hard to justify the effort to mix both, aside from poor data design. To be honest, as of this writing the only sources of such a mix of primitives I know of are the GLU tessellator and models as old as Quake 2's MD2 format. Anything newer has already been optimized for better hardware use. So, sorry, I really have no clue what this would be there for.



IMO, if the primitive types with undefined index quantities (strips, fans) are not scheduled for deprecation, then they need to be given "full support" so users would be encouraged to use them. PRI is just halfway toward that: the primitive is restarted, but the mode stays the same. I do not think that taking a second step to solve the problem completely is such an unaffordable complication.


No, they are not. They do not need it, because they still have universal support on all existing hardware, meaning that issuing one draw call results in the driver starting one GPU operation. But if the hardware can't switch on its own to a different primitive type, your suggestion would mean that it has to resort to emulation to satisfy your request, meaning it has to analyze the buffer on the CPU, see where some primitive type change occurs and then dispatch single draw calls. I can't stress this enough: YOU DO NOT WANT THAT!!!
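
To make the emulation cost concrete, here is a rough C sketch of the CPU-side pass described above. Everything in it is an assumption for illustration: the DrawRange struct, the function name, and the encoding in which index PRI + 1 + k selects mode number k. No real driver is known to work this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run of indices that shares a primitive mode. Under emulation,
   each run would have to become its own draw call. */
typedef struct {
    uint32_t mode;  /* hypothetical mode number selected by the switch index */
    size_t   first; /* offset of the first index of the run */
    size_t   count; /* number of indices in the run */
} DrawRange;

/* Scans the whole index buffer and splits it wherever an index greater
   than pri appears. Indices equal to pri stay inside a run, since plain
   primitive restart keeps the mode. Returns the number of ranges written,
   or SIZE_MAX if cap is too small. */
size_t split_by_mode(const uint32_t *idx, size_t n, uint32_t pri,
                     uint32_t initial_mode, DrawRange *out, size_t cap)
{
    size_t ranges = 0;
    size_t start = 0;
    uint32_t mode = initial_mode;
    assert(idx != NULL && out != NULL);

    for (size_t i = 0; i <= n; i++) {
        int at_end = (i == n);
        if (!at_end && idx[i] <= pri)
            continue;                    /* ordinary index or plain restart */

        if (i > start) {                 /* close the current non-empty run */
            if (ranges == cap)
                return SIZE_MAX;
            out[ranges].mode = mode;
            out[ranges].first = start;
            out[ranges].count = i - start;
            ranges++;
        }
        if (!at_end) {                   /* switch index: change the mode */
            mode = idx[i] - pri - 1;
            start = i + 1;
        }
    }
    return ranges;
}
```

The point of the sketch is the cost, not the code: every index has to be read on the CPU before anything is drawn, and one draw call turns into one per range.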



If the same number of triangles could be rasterized faster in GL_TRIANGLES mode than in strips, quads, fans or any other mode, then there is nothing to argue about - downconverting other types of primitives into a unified array of triangles is easy. But if there is no difference in rendering speed, then small benefits like IBO size savings and user-side simplifications start to count in favor of the PSI extension.

No, they don't. You want a small benefit that'd require a significant investment in hardware complexity. That game will never pay off. If you have an index buffer on the GPU, it has to be in a form that the GPU can efficiently consume, not in a format that's as small as possible and certainly not in a format that allows you to take shortcuts in your CPU code.

Have you ever asked yourself why even the most newfangled indirect draw calls allow no switch of primitive types? Right, that's because the hardware is not designed to do it. And ultimately that's the only thing a new feature should be measured against: If it has universal hardware support, yes, put it in; if it'd require emulation, leave it out. And that's the end of the story. You can argue as much as you like about your buffer size savings - they don't mean anything if they cause inefficiencies - especially if it's for a problem that can already be solved with existing features.

And trust me, this one's extremely low on the hardware makers' radar. They have absolutely no motivation to add features that provide no quantifiable benefit for some measly memory savings.

Yandersen
06-20-2014, 05:35 AM
Alright then, I am convinced now, thanks everybody for contributing! I see that a primitive switching index is more of a PITA than an advantage.
The topic may be closed, I think...