Vertex Rendering

Revision as of 22:43, 16 June 2014 by Racarate (it seems wrong to teach legal-but-un-useful-behavior)

This page is about the drawing functions for vertices. For information on how to define where this vertex data comes from, see Vertex Specification.

Vertex Rendering is the process of taking vertex data specified in arrays and rendering one or more Primitives with this vertex data.


In order to successfully issue a rendering command, the currently bound Vertex Array Object must have been properly set up with vertex attribute arrays (see Vertex Specification). If indexed rendering is to be used, the GL_ELEMENT_ARRAY_BUFFER binding in the VAO must have a Buffer Object bound to it as well.

Causes of rendering failure


The GL_INVALID_OPERATION error can happen when issuing any drawing command for many reasons, most of which have little to do with the actual drawing command itself. The following represent conditions you must ensure are valid when issuing a drawing command.

This list is not comprehensive. If you know of more, please add them here.


Rendering can take place as non-indexed rendering or indexed rendering. Indexed rendering uses an element buffer to decide which elements of the vertex arrays the vertex data is pulled from. This is explained in more detail in the Vertex Specification article.

All non-indexed rendering commands are of the form gl*Draw*Arrays*, where the * values can be filled in with different words. All indexed rendering commands are of the form gl*Draw*Elements*.

Primitive Restart

Primitive restart functionality allows you to tell OpenGL that a particular index value does not refer to a vertex. Instead, it means "begin a new primitive of the same type with the next vertex." In essence, it is an alternative to glMultiDrawElements (see below). This allows you to have an element buffer that contains multiple triangle strips or fans (or similar primitives where the start of a primitive has special behavior).

The restart index is set with the function glPrimitiveRestartIndex, which takes an index value. If this index is found in the index array, the system will start primitive processing again, as though a second rendering command had been issued. If you use a BaseVertex drawing function, this test is done before the base vertex is added to the index. Using this feature also requires calling glEnable(GL_PRIMITIVE_RESTART) to activate it; the corresponding glDisable turns it off.

Here is an example. Let's say you have an index array as follows:

 { 0 1 2 3 65535 2 3 4 5 }

If you render this as a triangle strip normally, you get 7 triangles. If you render it with glPrimitiveRestartIndex(65535) and GL_PRIMITIVE_RESTART enabled, then you will get 4 triangles:

 {0 1 2}, {1 2 3}, {2 3 4}, {3 4 5}

Primitive restart works with any indexed rendering function, including the indirect ones.
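To make the example concrete, here is a small CPU-side sketch that splits a triangle-strip index list into triangles and optionally honors a restart index. It is not how an implementation actually does this. The function and type names are hypothetical, and the alternating winding order of real strips is deliberately ignored: only the vertices of each triangle are reported.

```c
#include <stddef.h>
#include <stdint.h>

/* One triangle, as three vertex indices. */
typedef struct { uint32_t a, b, c; } Triangle;

/* CPU emulation of how a GL_TRIANGLE_STRIP index list is split into
 * triangles. When restart_enabled is nonzero, restart_index ends the
 * current strip and the next index begins a new one. The winding-order
 * swap that a real strip applies to every other triangle is ignored.
 * Returns the number of triangles generated; at most max_out are stored. */
size_t strip_triangles(const uint32_t *indices, size_t count,
                       int restart_enabled, uint32_t restart_index,
                       Triangle *out, size_t max_out)
{
    size_t n = 0;        /* triangles generated so far */
    size_t in_strip = 0; /* vertices consumed by the current strip */
    uint32_t prev2 = 0, prev1 = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t idx = indices[i];
        if (restart_enabled && idx == restart_index) {
            in_strip = 0; /* forget history: a new strip starts next */
            continue;
        }
        if (in_strip >= 2) {
            if (n < max_out) {
                out[n].a = prev2;
                out[n].b = prev1;
                out[n].c = idx;
            }
            n++;
        }
        prev2 = prev1;
        prev1 = idx;
        in_strip++;
    }
    return n;
}
```

Applied to the index array above with a restart index of 65535, it reports 7 triangles with restart disabled. With restart enabled, it reports the 4 triangles listed above.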

Fixed index restart

Primitive Restart Fixed Index
Core in version: 4.5
Core since version: 4.3
Core ARB extension: ARB_ES3_compatibility

For compatibility with OpenGL ES 3.0, OpenGL 4.3 allows the use of GL_PRIMITIVE_RESTART_FIXED_INDEX. This enumerator can be enabled and disabled, just like GL_PRIMITIVE_RESTART. The new enumerator takes priority over GL_PRIMITIVE_RESTART if they are both enabled.

Unlike regular restarting, the fixed-index version does not use a user-specified index. Instead, the restart index is the largest index possible for the type of the indexed rendering command. So if the type is GL_UNSIGNED_SHORT, then the restart index will be 65535 (0xFFFF).
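The rule amounts to "all bits set" for the index type. Here is a minimal sketch. It assumes the index type is described by its size in bytes, which is a hypothetical convention for this example and not a GL parameter.

```c
#include <stdint.h>

/* Fixed restart index used with GL_PRIMITIVE_RESTART_FIXED_INDEX: the
 * largest value representable by the index type. The type is given by
 * its size in bytes (1 = GL_UNSIGNED_BYTE, 2 = GL_UNSIGNED_SHORT,
 * 4 = GL_UNSIGNED_INT); only these three sizes are meaningful here. */
uint32_t fixed_restart_index(unsigned type_size_bytes)
{
    /* 2^(8*size) - 1, computed in 64 bits so that size 4 does not overflow. */
    return (uint32_t)((UINT64_C(1) << (8u * type_size_bytes)) - 1u);
}
```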

Note: If GL_PRIMITIVE_RESTART_FIXED_INDEX is enabled and you use one of the array rendering commands, then primitive restarting will not be used. This differs from the behavior when GL_PRIMITIVE_RESTART is enabled. And remember: GL_PRIMITIVE_RESTART_FIXED_INDEX takes priority.
Compatibility Note: When using this in a compatibility profile, the restart index is only used if you are using one of the indexed rendering commands that actually takes a type. So if you're using immediate mode with arrays, through glArrayElement, then there will be no primitive restarting at all. Again, fixed-index restart behavior takes priority.

Direct rendering

These vertex rendering commands provide the various rendering parameters directly as parameters passed to the functions. This contrasts with other rendering commands (see later sections), where some parameters are pulled from OpenGL object sources.

Basic Drawing

The basic drawing functions are these:

 void glDrawArrays( GLenum mode, GLint first, GLsizei count );
 void glDrawElements( GLenum mode, GLsizei count, GLenum type, const void *indices );

where, for glDrawArrays:

  • mode parameter is the Primitive type.
  • first and count define the range of vertices to be pulled from the vertex arrays: first is the index of the first vertex, and count is the number of vertices.

as for glDrawElements:

  • count and indices parameters define the range of indices.
    • count defines how many indices to use.
    • indices defines the byte offset into the index buffer object (bound to GL_ELEMENT_ARRAY_BUFFER, stored in the VAO) at which to begin reading data.
  • type describes the type of the indices:
    • GL_UNSIGNED_BYTE: index range: [0, 255]
    • GL_UNSIGNED_SHORT: index range: [0, 65535]
    • GL_UNSIGNED_INT: index range: [0, 2^32 - 1]
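When an element buffer is bound, indices is a byte offset. Starting a draw at the k-th index therefore means passing k times the size of the index type. The following CPU-side sketch shows how the offset, type size and count select indices from the buffer's bytes. The fetch_indices helper is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CPU emulation of how glDrawElements reads its indices. `offset` is a
 * byte offset into the element buffer (what the `indices` parameter
 * carries when a buffer is bound to GL_ELEMENT_ARRAY_BUFFER), and
 * type_size is 1, 2 or 4 bytes (any other size is treated as 4).
 * Reads `count` indices into out. */
void fetch_indices(const unsigned char *element_buffer, size_t offset,
                   unsigned type_size, size_t count, uint32_t *out)
{
    const unsigned char *p = element_buffer + offset;
    for (size_t i = 0; i < count; i++, p += type_size) {
        switch (type_size) {
        case 1: out[i] = p[0]; break;
        case 2: { uint16_t v; memcpy(&v, p, 2); out[i] = v; break; }
        default: { uint32_t v; memcpy(&v, p, 4); out[i] = v; break; }
        }
    }
}
```

In actual GL code, the same offset is passed as a pointer value. For example, (const void *)(3 * sizeof(GLushort)) starts at the fourth GL_UNSIGNED_SHORT index.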


The basic drawing functions are all you really need in order to send vertices for rendering. However, there are a number of ways to draw that optimize certain rendering cases.

Rendering with a different VAO from the last rendering command is usually a relatively expensive operation. So many of the optimization mechanisms are based on you storing the data for several meshes in the same buffer objects with the same vertex formats and other VAO data.

Multi-Draw

As mentioned above, binding a VAO or modifying VAO state is often an expensive operation, and there are many cases where you want to render a number of distinct meshes with a single draw call. All of the meshes must be in the same VAO (and therefore use the same buffer objects and index buffer). Of course, they must also use the same shader program with the same uniform values.

To render multiple primitives from a VAO at once, use this:

 void glMultiDrawArrays( GLenum mode, const GLint *first, const GLsizei *count, GLsizei primcount );

This function is conceptually implemented as:

void glMultiDrawArrays( GLenum mode, const GLint *first, const GLsizei *count, GLsizei primcount )
{
	/* Conceptual only: each non-empty range becomes one glDrawArrays call. */
	for (GLsizei i = 0; i < primcount; i++)
		if (count[i] > 0)
			glDrawArrays(mode, first[i], count[i]);
}

Of course, you could write this function yourself. However, because it all happens in a single OpenGL call, the implementation has the opportunity to optimize this beyond what you could write.

There is an indexed form as well:

 void glMultiDrawElements( GLenum mode, const GLsizei *count, GLenum type, const void * const *indices, GLsizei primcount );

Similarly, this is implemented conceptually as:

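void glMultiDrawElements( GLenum mode, const GLsizei *count, GLenum type, const void * const *indices, GLsizei primcount )
{
	/* Conceptual only: each non-empty index range becomes one glDrawElements call. */
	for (GLsizei i = 0; i < primcount; i++)
		if (count[i] > 0)
			glDrawElements(mode, count[i], type, indices[i]);
}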

Multi-draw is useful for circumstances where you know that you are going to draw a lot of separate primitives of the same kind that all use the same shader. Typically, this would be a single conceptual object that you would always draw together in the same way. You simply pack all of the vertex data into the same VAO and buffer objects, using the various offsets to pick and choose between them.
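The packing step can be sketched as a running sum that builds the first and count arrays for meshes stored back to back. The helper name is hypothetical. Plain int is used so the code compiles without GL headers; GLint and GLsizei are 32-bit signed integer types.

```c
#include <stddef.h>

/* Given the vertex counts of several meshes stored back to back in the
 * same vertex buffers, fill the first[] and count[] arrays that a
 * glMultiDrawArrays call would take. */
void packed_ranges(const int *mesh_vertex_counts, size_t mesh_count,
                   int *first, int *count)
{
    int offset = 0; /* running start of the next mesh */
    for (size_t i = 0; i < mesh_count; i++) {
        first[i] = offset;
        count[i] = mesh_vertex_counts[i];
        offset += mesh_vertex_counts[i];
    }
}
```

For meshes of 3, 6 and 4 vertices, this yields first = {0, 3, 9} and count = {3, 6, 4}. These could then be passed as glMultiDrawArrays(GL_TRIANGLES, first, count, 3).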

Base Index

All of the glVertexAttribPointer calls define the format of the vertices. That is, the way the vertex data is stored in the buffer objects. Changing this format is somewhat expensive in terms of performance.

If you have a number of meshes that all share the same vertex format, it would be useful to be able to put them all in a single set of buffer objects, one after the other. If we have two meshes, A with nn vertices and B with mm vertices, then their data would look like this:

 [A00 A01 A02 A03 A04 ... A(nn-1) B00 B01 B02 ... B(mm-1)]

B's mesh data immediately follows A's mesh data, with no breaks in between.

The glDrawArrays call takes a start index. If we are using non-indexed rendering, then this is all we need. We call glDrawArrays once with 0 as the start index and nn as the count. Then we call it again with nn as the start index and mm as the count.

Indexed rendering is often very useful, both for saving memory and for performance. So it would be good to preserve this buffer-sharing optimization when using indexed rendering.

In indexed rendering, each mesh also has its own list of indices, which can be stored one after the other in a single index buffer. glDrawElements takes an offset into the index buffer, so we can use the same mechanism to select which set of indices to use.

The problem is the contents of these indices. From mesh B's point of view, its third vertex is index 02. However, the index OpenGL actually uses is relative to where the vertex format was defined. And since we're trying to avoid redefining the format, the format still points to the start of the buffer. So the third vertex of mesh B is actually at index 02 + nn.

We could in fact store these indices in the index buffer that way. We could go through all of mesh B's indices and add nn to them. But we don't have to.

Instead, we can use this function:

 void glDrawElementsBaseVertex( GLenum mode, GLsizei count,
   GLenum type, const void *indices, GLint basevertex );

This works as glDrawElements does, except that basevertex is added to each index before pulling from the vertex data. So for mesh A, we pass a base vertex of 0 (or just use glDrawElements), and for mesh B, we pass a base vertex of nn.

Note: When combining with primitive restart, the restart test happens before the base vertex is added to the index.
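The ordering in that note matters when a stored index plus the offset could equal the restart index. Here is a CPU-side sketch of how a BaseVertex draw with primitive restart resolves its indices. The function name and RESTART_MARKER are hypothetical stand-ins used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Value written to the output to mean "restart here". This is an
 * arbitrary choice for the sketch, not a GL constant. */
#define RESTART_MARKER ((int64_t)-1)

/* CPU emulation of glDrawElementsBaseVertex index resolution with
 * primitive restart: each stored index is first compared with the
 * restart index, and only non-restart indices get basevertex added. */
void resolve_indices(const uint32_t *indices, size_t count, int32_t basevertex,
                     int restart_enabled, uint32_t restart_index, int64_t *out)
{
    for (size_t i = 0; i < count; i++) {
        if (restart_enabled && indices[i] == restart_index)
            out[i] = RESTART_MARKER;                   /* tested before the offset */
        else
            out[i] = (int64_t)indices[i] + basevertex; /* vertex actually fetched */
    }
}
```

Take a base vertex of 100 and a restart index of 65535. The stored index 65535 produces a restart. The stored index 65435 is fetched as vertex 65535 without restarting, because the comparison uses the stored value.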

Instancing

It is often useful to be able to render multiple copies of the same mesh in different locations. If you're doing this with small numbers, like 5-20 or so, issuing multiple draw commands with shader uniform changes between them (to tell each copy where it goes) is reasonably fast. However, if you're doing this with large numbers of meshes, like 5,000+ or so, then it can be a performance problem.

Instancing is a way to get around this. The idea is that your vertex shader has some internal mechanism for deciding where each instance of the rendered mesh goes based on a single number. Perhaps it has a table (stored in a Buffer Texture or Uniform Buffer Object) that it indexes with the instance number to get the per-instance data it needs. Perhaps it uses an attribute divisor for certain attributes, which increments for each instance. Or perhaps it has a simple algorithm for computing the location of an instance based on its number.

Regardless of the mechanism, if you want to do instanced rendering, you call:

 void glDrawArraysInstanced( GLenum mode, GLint first,
   GLsizei count, GLsizei instancecount );
 void glDrawElementsInstanced( GLenum mode, GLsizei count,
   GLenum type, const void *indices, GLsizei instancecount );

It will send the same vertices instancecount times, as though you called glDrawArrays/glDrawElements in a loop of instancecount iterations. However, the vertex shader is given a special input value: gl_InstanceID. It receives a value on the half-open range [0, instancecount) based on which instance of the mesh is being rendered. gl_InstanceID and instanced attribute arrays are the only mechanisms for differentiating between instances.

In OpenGL 4.2 or with ARB_base_instance, the starting instance can be specified with "BaseInstance" commands, as follows:

 void glDrawArraysInstancedBaseInstance( GLenum mode, GLint first,
   GLsizei count, GLsizei instancecount, GLuint baseinstance );
 void glDrawElementsInstancedBaseInstance( GLenum mode, GLsizei count,
   GLenum type, const void *indices, GLsizei instancecount, GLuint baseinstance );

The baseinstance specifies the first instance; instancecount still represents the number of instances. Instanced attribute arrays (those with a non-zero divisor) are offset by the base instance. With a divisor of 1, the element fetched starts at baseinstance and is incremented by 1 for each instance. So the base instance affects attributes that use a divisor.

Warning: The input gl_InstanceID does not follow baseinstance. gl_InstanceID always falls on the half-open range [0, instancecount).
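Here is a small sketch of the fetch rule for instanced attributes with an arbitrary divisor. It assumes the formula floor(instance / divisor) + baseinstance, as given in the ARB_base_instance specification. The function name is hypothetical, and instance is the 0-based value that gl_InstanceID would hold.

```c
#include <stdint.h>

/* Element of an instanced vertex attribute array read by a given
 * instance when drawing with a base instance:
 *   floor(instance / divisor) + baseinstance
 * `instance` is the 0-based instance number (the gl_InstanceID value,
 * which is NOT offset by baseinstance). divisor must be non-zero. */
uint32_t instanced_attrib_element(uint32_t instance, uint32_t divisor,
                                  uint32_t baseinstance)
{
    return instance / divisor + baseinstance; /* unsigned division floors */
}
```

For example, with a divisor of 2 and a base instance of 10, instance 5 reads element 12, while gl_InstanceID for that instance is still 5.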

Range

Implementations of OpenGL can often find it useful to know how much vertex data is being used in a buffer object. For non-indexed rendering, this is easy to determine: the first and count parameters of the Arrays functions give you that information. For indexed rendering, this is more difficult, as the index buffer can potentially contain any index value its type can represent.

Still, for optimization purposes, it is useful for implementations to know the range of indexed rendering data. Implementations may even read index data manually to determine this.

The "Range" series of glDrawElements commands allows the user to specify that this indexed rendering call will never cause indices outside of the given range of values to be sourced. The call works as follows:

 void glDrawRangeElements( GLenum mode, GLuint start,
   GLuint end, GLsizei count, GLenum type, const void *indices );

Unlike the "Arrays" functions, the start and end parameters specify the minimum and maximum index values (from the element buffer) that this draw call will use, rather than a first-and-count pair. If you violate this restriction, you will get implementation-dependent behavior (i.e., rendering may work fine or you may get garbage).

There is one index that is allowed outside of the range bounded by start and end: the primitive restart index. If primitive restart is set and enabled, the restart index does not have to be within the given range.

Implementations may have a specific "sweet spot" for the range of indices, such that using indices within this range will have better performance. They expose these values through a pair of glGetIntegerv enumerators. To get the best performance, end - start + 1 (the number of vertices spanned) should be less than or equal to GL_MAX_ELEMENTS_VERTICES, and count (the number of indices to be rendered) should be less than or equal to GL_MAX_ELEMENTS_INDICES.
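When the index data is available on the CPU, start and end can simply be computed from it. The sketch below uses hypothetical helper names. The limits are passed in as parameters and stand in for the values an application would query with glGetIntegerv.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes the [start, end] pair for glDrawRangeElements from an index
 * array, skipping the primitive restart index (which may lie outside
 * the range). Returns 0 if the array holds no non-restart index. */
int index_range(const uint32_t *indices, size_t count,
                int restart_enabled, uint32_t restart_index,
                uint32_t *start, uint32_t *end)
{
    int found = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t v = indices[i];
        if (restart_enabled && v == restart_index)
            continue; /* restart index is exempt from the range */
        if (!found || v < *start) *start = v;
        if (!found || v > *end) *end = v;
        found = 1;
    }
    return found;
}

/* Performance-hint check: max_vertices and max_indices stand for the
 * values of GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES. */
int within_sweet_spot(uint32_t start, uint32_t end, int64_t count,
                      int64_t max_vertices, int64_t max_indices)
{
    return (int64_t)end - (int64_t)start + 1 <= max_vertices
        && count <= max_indices;
}
```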

Combinations

It is often useful to combine these optimization techniques. Primitive restart can be combined with any of them, so long as they use indexed rendering. In the case of BaseVertex calls, the primitive restart comparison is done before the base vertex is added to the index from the mesh.

Base vertex can be combined with any one of MultiDraw, Range, or Instancing. These functions are:

 void glMultiDrawElementsBaseVertex( GLenum mode,
   const GLsizei *count, GLenum type, const void * const *indices,
   GLsizei primcount, const GLint *basevertex );
 void glDrawRangeElementsBaseVertex( GLenum mode,
   GLuint start, GLuint end, GLsizei count, GLenum type,
   const void *indices, GLint basevertex );
 void glDrawElementsInstancedBaseVertex( GLenum mode,
   GLsizei count, GLenum type, const void *indices,
   GLsizei instancecount, GLint basevertex );

In the case of MultiDraw, the basevertex parameter is an array, so each sub-draw can have its own base vertex.

BaseVertex and instancing can also be combined with BaseInstance in GL 4.2 or ARB_base_instance, yielding the massively named:

 void glDrawElementsInstancedBaseVertexBaseInstance( GLenum mode,
   GLsizei count, GLenum type, const void *indices,
   GLsizei instancecount, GLint basevertex, GLuint baseinstance );

None of the other features can be combined with one another. So Range does not combine with MultiDraw.

Transform feedback rendering

Core in version: 4.5
Core since version: 4.0
Core ARB extensions: ARB_transform_feedback2, ARB_transform_feedback3, ARB_transform_feedback_instanced

When using Transform Feedback to generate vertices for rendering, you often use an asynchronous query object to get the number of primitives written. You then use this number to compute the number of vertices for a glDrawArrays or glDrawArraysInstanced call, as appropriate.

However, using a query object for this requires a GPU->CPU->GPU transfer of information. You have to read from the query object on the CPU, then transfer that information to your draw call.

These functions provide a way to bypass this. They allow the user to draw everything that was captured during a transform feedback operation, without the CPU having to explicitly read the value back.

Note: The only thing these functions do is issue the rendering call. They do not bind the transform feedback buffers. They do not modify any VAO state. The only thing pulled from the transform feedback object is the number of vertices that were captured into that stream. It is your responsibility to set up the vertex arrays for actually rendering before making these calls.
Note: On the first transform feedback pass, a regular glDraw* function (not one of the glDrawTransformFeedback* functions) must be called to write the vertex data to the transform feedback buffer. This is because the transform feedback object does not yet have any vertex count information. Once this is done, glDrawTransformFeedback* can be used both during transform feedback and when rendering to the screen.
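For comparison, here is a sketch of the CPU-side step that these functions make unnecessary: turning a primitive count read back from a query into a vertex count for glDrawArrays. The enum and function are hypothetical. The sketch relies only on transform feedback capturing points, lines or triangles.

```c
/* Hypothetical stand-ins for the transform feedback primitive modes
 * (GL_POINTS, GL_LINES, GL_TRIANGLES); each value is the number of
 * vertices one captured primitive contributes. */
typedef enum {
    CAPTURE_POINTS = 1,
    CAPTURE_LINES = 2,
    CAPTURE_TRIANGLES = 3
} CaptureMode;

/* Vertex count to pass to glDrawArrays after reading back the number
 * of primitives written from a query object. */
unsigned long captured_vertex_count(unsigned long primitives_written,
                                    CaptureMode mode)
{
    return primitives_written * (unsigned long)mode;
}
```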

To perform non-instanced rendering from a transform feedback object, these functions are used:

 void glDrawTransformFeedback( GLenum mode, GLuint id );
 void glDrawTransformFeedbackStream( GLenum mode, GLuint id, GLuint stream );

mode is the usual Primitive type. The id is the transform feedback object to draw from. The stream is the stream in the feedback object to get the vertex count from. Note that glDrawTransformFeedback is equivalent to calling glDrawTransformFeedbackStream with a stream of zero.

If GL 4.2 or ARB_transform_feedback_instanced is available, then the instanced versions of these functions can be used:

 void glDrawTransformFeedbackInstanced( GLenum mode, GLuint id, GLsizei instancecount );
 void glDrawTransformFeedbackStreamInstanced( GLenum mode, GLuint id, GLuint stream, GLsizei instancecount );

These behave like glDrawArraysInstanced. There are no BaseInstance versions of these.

Indirect rendering

Core in version: 4.5
Core since version: 4.0
Core ARB extensions: ARB_draw_indirect, ARB_multi_draw_indirect, ARB_base_instance

Indirect rendering is the process of issuing a rendering command to OpenGL, except that most of the parameters to that command come from GPU storage provided by a Buffer Object. For example, glDrawArrays takes a primitive type, the number of vertices, and the starting vertex. When using the indirect rendering command glDrawArraysIndirect, the starting vertex and number of vertices to render would instead be stored in a buffer object.

The purpose of this is to allow GPU processes to fill these values in. This could be a compute shader, a specially designed geometry shader coupled with transform feedback, or an OpenCL/CUDA process. The idea is to avoid the GPU->CPU->GPU round-trip; the GPU decides what range of vertices to render with. All the CPU does is decide when to issue the rendering command, as well as which Primitive is used with that command.

The indirect rendering functions take their data from the buffer currently bound to the GL_DRAW_INDIRECT_BUFFER binding. Thus, any of these functions will fail if no buffer is bound to that binding.

All of the indirect rendering functions allow the following features:

  • Indexed rendering
    • Base vertex (for indexed rendering)
  • Instanced rendering
  • Base instance (if GL 4.2 or ARB_base_instance is available)

Thus, they behave like the most fully featured form of draw call that the implementation supports.

For non-indexed rendering, the indirect equivalent to glDrawArraysInstancedBaseInstance is this:

 void glDrawArraysIndirect( GLenum mode, const void *indirect );

The mode is the usual primitive type. indirect is the byte offset into the GL_DRAW_INDIRECT_BUFFER at which the data begins.

The data is provided as if in a C struct of the following definition:

typedef  struct {
   GLuint  count;
   GLuint  instanceCount;
   GLuint  first;
   GLuint  baseInstance;
} DrawArraysIndirectCommand;

This represents a draw call equivalent to:

glDrawArraysInstancedBaseInstance(mode, cmd->first, cmd->count, cmd->instanceCount, cmd->baseInstance);
Note: if GL 4.2 or ARB_base_instance is not available, then the baseInstance field must be 0 or undefined behavior results.

If GL 4.3 or ARB_multi_draw_indirect is available, then multiple indirect array rendering commands can be issued in one call with this:

 void glMultiDrawArraysIndirect( GLenum mode, const void *indirect, GLsizei drawcount, GLsizei stride );

The drawcount is the number of indirect rendering commands to issue; the stride is the byte offset from one rendering command to the next. It can be set to zero, in which case the array of indirect commands is assumed to be tightly packed (i.e., a 16-byte stride). The stride must be a multiple of 4.
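The command structure and its tightly packed stride can be reproduced on the CPU. This sketch mirrors the DrawArraysIndirectCommand layout, using uint32_t in place of GLuint. The fill_array_commands helper is hypothetical. It shows how per-mesh commands might be laid out before being written to the buffer bound to GL_DRAW_INDIRECT_BUFFER.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the DrawArraysIndirectCommand layout, with uint32_t in
 * place of GLuint (both are 32-bit unsigned integers). */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t first;
    uint32_t baseInstance;
} DrawArraysIndirectCommand;

/* Four 32-bit fields with no padding: tightly packed commands are 16
 * bytes apart, which is the stride assumed when stride is 0. */
_Static_assert(sizeof(DrawArraysIndirectCommand) == 16, "unexpected padding");

/* Fills one command per mesh for meshes packed back to back. */
void fill_array_commands(const uint32_t *vertex_counts, size_t n,
                         uint32_t instances, DrawArraysIndirectCommand *cmds)
{
    uint32_t first = 0;
    for (size_t i = 0; i < n; i++) {
        cmds[i].count = vertex_counts[i];
        cmds[i].instanceCount = instances;
        cmds[i].first = first;
        cmds[i].baseInstance = 0; /* must be 0 without GL 4.2 / ARB_base_instance */
        first += vertex_counts[i];
    }
}
```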

For indexed rendering, the indirect equivalent to glDrawElementsInstancedBaseVertexBaseInstance is this:

 void glDrawElementsIndirect( GLenum mode, GLenum type, const void *indirect );

The mode and type parameters work as they do in regular glDrawElements-style functions. As with other indirect functions, indirect is the byte offset into the GL_DRAW_INDIRECT_BUFFER at which the indirect data structure begins.

In indexed rendering, the structure is defined as follows:

typedef  struct {
    GLuint  count;
    GLuint  instanceCount;
    GLuint  firstIndex;
    GLint   baseVertex;
    GLuint  baseInstance;
} DrawElementsIndirectCommand;

This represents a draw call equivalent to:

glDrawElementsInstancedBaseVertexBaseInstance(mode, cmd->count, type,
  (const void *)(cmd->firstIndex * size_of_type), cmd->instanceCount, cmd->baseVertex, cmd->baseInstance);

Where size_of_type is the size in bytes of type (1, 2 or 4 for GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT and GL_UNSIGNED_INT, respectively).

Note: if GL 4.2 or ARB_base_instance is not available, then the baseInstance field must be 0 or undefined behavior results.

If GL 4.3 or ARB_multi_draw_indirect is available, then multiple indirect indexed rendering commands can be issued in one call with this:

 void glMultiDrawElementsIndirect( GLenum mode, GLenum type, const void *indirect, GLsizei drawcount, GLsizei stride );

The drawcount is the number of indirect rendering commands to issue; the stride is the byte offset from one rendering command to the next. It can be set to zero, in which case the array of indirect commands is assumed to be tightly packed (i.e., a 20-byte stride). The stride must be a multiple of 4.
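Similarly, a CPU-side mirror of DrawElementsIndirectCommand shows where the 20-byte stride comes from and how firstIndex becomes a byte offset. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of DrawElementsIndirectCommand; baseVertex is signed, like the
 * basevertex parameter of the BaseVertex draw functions. */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t baseInstance;
} DrawElementsIndirectCommand;

/* Five 32-bit fields: tightly packed commands are 20 bytes apart. */
_Static_assert(sizeof(DrawElementsIndirectCommand) == 20, "unexpected padding");

/* Byte offset into the element buffer implied by a command, i.e. the
 * `indices` argument of the equivalent direct draw call. */
size_t element_byte_offset(const DrawElementsIndirectCommand *cmd,
                           size_t size_of_type)
{
    return (size_t)cmd->firstIndex * size_of_type;
}

/* Byte offset of command i within the indirect buffer; a stride of 0
 * means tightly packed, as with glMultiDrawElementsIndirect. */
size_t command_offset(size_t i, size_t stride)
{
    return i * (stride ? stride : sizeof(DrawElementsIndirectCommand));
}
```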

Conditional rendering

Core in version: 3.0
Vendor extension: NV_conditional_render

Conditional rendering is a mechanism for making the execution of one or more rendering commands conditional on the result of an Occlusion Query operation. This feature allows you to render some cheap object, then use an occlusion query to see if any of it is visible. If it is, you can render the expensive object; if it isn't, you can skip the expensive object and save its rendering cost.

This is done with the following functions:

 void glBeginConditionalRender( GLuint id, GLenum mode );
 void glEndConditionalRender( void );

All rendering commands issued between these two functions will only execute if the occlusion condition specified by id is tested to be true. For GL_SAMPLES_PASSED queries, it is considered true (and thus rendering commands are executed) if the number of samples is not zero.

The commands that can be conditioned are:

The mode parameter determines how the discarding of the rendering functions is performed. It can be one of the following:

  • GL_QUERY_WAIT: OpenGL will wait until the query result is returned, then decide whether to execute the rendering commands. This ensures that the rendering commands will only be executed if the query condition is true. Note that it is OpenGL that's waiting, not (necessarily) the CPU.
  • GL_QUERY_NO_WAIT: OpenGL may choose not to wait for the query result and execute the rendering commands anyway. This is used to prevent pipeline stalls if the time between the query test and the execution of the rendering commands is too short.
  • GL_QUERY_BY_REGION_WAIT: OpenGL will wait until the query result is returned, then decide whether to execute the rendering commands. In addition, OpenGL may discard the results of the rendering commands in any region of the framebuffer that did not contribute samples to the occlusion query. So rendered results outside of the occlusion query area may not appear.
  • GL_QUERY_BY_REGION_NO_WAIT: As above, except that OpenGL may choose not to wait until the occlusion query is finished. The region-based discarding still applies.

Note that "wait" in this case does not mean that glEndConditionalRender itself will stall on the CPU. It means that the first command within the conditional rendering scope will not be executed by the GPU until the query has returned. So the CPU will continue processing, but the GPU itself may have a pipeline stall.
