Vertex Rendering

Revision as of 10:23, 5 September 2012 by Alfonse (talk | contribs) (categorization)

Vertex Rendering is the process of taking vertex data specified in arrays and rendering one or more Primitives with this vertex data.


In order to successfully issue a rendering command, the currently bound Vertex Array Object must have been properly set up with vertex attribute arrays, as described in the Vertex Specification article. If indexed rendering is to be used, the GL_ELEMENT_ARRAY_BUFFER binding in the VAO must have a Buffer Object bound to it as well.

Causes of rendering errors

The GL_INVALID_OPERATION error can happen when issuing any rendering command for many reasons, most of which have little to do with the actual rendering command itself. The following represent conditions you must ensure are valid when issuing a rendering command.

  • A non-zero Vertex Array Object must be bound.
  • The current framebuffer must be complete. The Default Framebuffer (if present) is always complete, so this usually happens with Framebuffer Objects.
  • The current program must be successfully linked, or if a program pipeline is used, it must be valid.
  • The current program or program pipeline must have a vertex shader and a fragment shader.
  • Textures used by the current programs' sampler and/or image objects must be complete.
  • If a Geometry Shader is present, the mode primitive type must be compatible with the primitive input that the GS uses.
  • If the mode is GL_PATCHES, a Tessellation Shader must be active.

This list is not comprehensive.


Rendering can take place as non-indexed rendering or indexed rendering. Indexed rendering uses an element buffer to decide which index in the vertex arrays values are pulled from. This is explained in more detail in the Vertex Specification article.

All non-indexed rendering commands are of the form gl*Draw*Arrays*, where each * can be filled in with different words. All indexed rendering commands are of the form gl*Draw*Elements*.

Primitive Restart

Primitive restart functionality allows you to tell OpenGL that a particular index value means, not to source a vertex at that index, but to begin a new primitive of the same type with the next vertex. In essence, it is an alternative to glMultiDrawElements (see below). This allows you to have an element buffer that contains multiple triangle strips or fans (or similar primitives where the start of a primitive has special behavior).

The way it works is with the function glPrimitiveRestartIndex. This function takes an index value. If this index is found in the index array, the system will start the primitive processing again as though a second rendering command had been issued. If you use a BaseVertex drawing function, this test is done before the base vertex is added to the index. Using this feature also requires calling glEnable(GL_PRIMITIVE_RESTART); to activate it, and the corresponding glDisable to turn it off.

Here is an example. Let's say you have an index array as follows:

 { 0 1 2 3 65535 2 3 4 5 }

If you render this as a triangle strip normally, you get 7 triangles. If you render it with glPrimitiveRestartIndex(65535) and with primitive restart enabled, you will get 4 triangles:

 {0 1 2}, {1 2 3}, {2 3 4}, {3 4 5}

Primitive restart works with any rendering function, even the indirect ones.

Warning: It is technically legal to use this with non-indexed rendering. You should not do this, as it will not give you a useful result.
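The counts in the example above can be checked with a small CPU-side model of triangle strip assembly. This is only a sketch in plain C: count_strip_triangles and example_indices are hypothetical names, not OpenGL functions, and the model assumes that every vertex after the second one in a strip completes one triangle.

```c
#include <assert.h>
#include <stddef.h>

/* The index array from the example above. */
static const unsigned example_indices[] = {0, 1, 2, 3, 65535, 2, 3, 4, 5};
#define EXAMPLE_INDEX_COUNT (sizeof example_indices / sizeof example_indices[0])

/* Counts the triangles a triangle strip would produce from an index array.
 * If restart_enabled is nonzero, an index equal to restart_index is not
 * treated as a vertex: it ends the current strip and starts a new one.
 * Hypothetical helper for illustration, not part of OpenGL. */
static size_t count_strip_triangles(const unsigned *indices, size_t n,
                                    int restart_enabled, unsigned restart_index)
{
    size_t triangles = 0;
    size_t run = 0; /* vertices seen so far in the current strip */

    assert(indices != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (restart_enabled && indices[i] == restart_index) {
            run = 0; /* begin a new strip with the next index */
            continue;
        }
        ++run;
        if (run >= 3)
            ++triangles; /* each vertex after the second closes a triangle */
    }
    return triangles;
}
```

Calling count_strip_triangles on example_indices with restart disabled treats 65535 as an ordinary vertex index, while enabling restart splits the array into two strips of four vertices each.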

Direct rendering

These vertex rendering commands provide the various rendering parameters directly as parameters passed to the functions. This contrasts with other rendering commands (see later sections), where some parameters are pulled from OpenGL object sources.

Basic Drawing

The basic drawing functions are these:

 void glDrawArrays( GLenum mode​, GLint first​, GLsizei count​ );
 void glDrawElements( GLenum mode​, GLsizei count​, GLenum type​, void * indices​ );

where, for glDrawArrays:

  • mode is the Primitive type.
  • first and count define the range of elements to be pulled from the buffer.

and for glDrawElements:

  • count and indices define the range of indices.
    • count defines how many indices to use.
    • indices defines the byte offset into the index buffer object (bound to GL_ELEMENT_ARRAY_BUFFER, stored in the VAO) at which to begin reading data.
  • type describes the type of the indices:
    • GL_UNSIGNED_BYTE: index range [0, 255]
    • GL_UNSIGNED_SHORT: index range [0, 65535]
    • GL_UNSIGNED_INT: index range [0, 2^32 - 1]
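The relationship between the index type, its value range, and the offset passed as indices can be sketched as follows. The enum index_type and the three helper functions are hypothetical stand-ins, chosen so the example compiles without OpenGL headers; in real code the type would be one of the GL_UNSIGNED_* enumerators listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT and
 * GL_UNSIGNED_INT, so the sketch builds without OpenGL headers. */
enum index_type { INDEX_U8, INDEX_U16, INDEX_U32 };

/* Size in bytes of one index of the given type. */
static size_t index_size(enum index_type type)
{
    switch (type) {
    case INDEX_U8:  return sizeof(uint8_t);
    case INDEX_U16: return sizeof(uint16_t);
    case INDEX_U32: return sizeof(uint32_t);
    }
    assert(!"unknown index type");
    return 0;
}

/* Largest index value the given type can represent. */
static uint32_t index_max(enum index_type type)
{
    switch (type) {
    case INDEX_U8:  return UINT8_MAX;
    case INDEX_U16: return UINT16_MAX;
    case INDEX_U32: return UINT32_MAX;
    }
    assert(!"unknown index type");
    return 0;
}

/* Byte offset to pass as the indices parameter in order to start reading
 * at the first_index-th index stored in the element buffer. */
static size_t index_byte_offset(enum index_type type, size_t first_index)
{
    return first_index * index_size(type);
}
```

For instance, starting at the seventh index (first_index of 6) of a buffer of 32-bit indices corresponds to a byte offset of 24.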


The basic drawing functions are all you really need in order to send vertices for rendering. However, there are a number of ways to draw that optimize certain rendering cases.

Rendering with a different VAO from the last rendering command is usually a relatively expensive operation. So many of the optimization mechanisms are based on you storing the data for several meshes in the same buffer objects with the same vertex formats and other VAO data.


Multi-Draw

Binding a VAO or modifying VAO state is often an expensive operation. And there are many cases where you want to render a number of distinct meshes with a single draw call. All of the meshes must be in the same VAO, as must all of the index arrays if you are doing indexed rendering. Also, of course, they must use the same shader program with the same uniform values.

To render multiple primitives from a VAO at once, use this:

 void glMultiDrawArrays( GLenum mode​, GLint *first​, GLsizei *count​, GLsizei primcount​);

This function is conceptually implemented as:

void glMultiDrawArrays( GLenum mode, GLint *first, GLsizei *count, GLsizei primcount )
{
	/* Issue one glDrawArrays call per non-empty range. */
	for (int i = 0; i < primcount; i++)
		if (count[i] > 0)
			glDrawArrays(mode, first[i], count[i]);
}

Of course, you could write this function yourself. However, because it all happens in a single OpenGL call, the implementation has the opportunity to optimize this beyond what you could write.

There is an indexed form as well:

 void glMultiDrawElements( GLenum mode​, GLsizei *count​, GLenum type​, void **indices​, GLsizei primcount​ );

Similarly, this is implemented conceptually as:

void glMultiDrawElements( GLenum mode, GLsizei *count, GLenum type, void **indices, GLsizei primcount )
{
	/* Issue one glDrawElements call per non-empty range. */
	for (int i = 0; i < primcount; i++)
		if (count[i] > 0)
			glDrawElements(mode, count[i], type, indices[i]);
}

Multi-draw is useful for circumstances where you know that you are going to draw a lot of separate primitives of the same kind that all use the same shader. Typically, this would be a single conceptual object that you would always draw together in the same way. You simply pack all of the vertex data into the same VAO and buffer objects, using the various offsets to pick and choose between them.
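One way to build the first and count arrays is to pack the meshes back to back and take a running sum of their vertex counts. The sketch below assumes exactly that layout; pack_mesh_ranges and the example arrays are hypothetical names, and plain int is used in place of GLint and GLsizei so that it compiles without OpenGL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Example data: three meshes with 12, 24 and 24 vertices, stored one
 * after the other in the same buffer objects. */
static const int multi_example_count[3] = {12, 24, 24};
static int multi_example_first[3];

/* Fills first[] so that mesh i starts where mesh i-1 ended, given the
 * per-mesh vertex counts in count[]. Returns the total vertex count.
 * The two arrays play the roles of the first and count parameters of
 * glMultiDrawArrays. Hypothetical helper, not part of OpenGL. */
static int pack_mesh_ranges(const int *count, int *first, size_t mesh_count)
{
    int next = 0; /* first vertex of the mesh being placed */

    for (size_t i = 0; i < mesh_count; ++i) {
        assert(count[i] >= 0);
        first[i] = next;
        next += count[i];
    }
    return next;
}
```

With a current OpenGL context and a suitable VAO bound, the filled arrays could then be passed along the lines of glMultiDrawArrays(GL_TRIANGLES, multi_example_first, multi_example_count, 3).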

Base Index

All of the glVertexAttribPointer calls define the format of the vertices. That is, the way the vertex data is stored in the buffer objects. Changing this format is somewhat expensive in terms of performance.

If you have a number of meshes that all share the same vertex format, it would be useful to be able to put them all in a single set of buffer objects, one after the other. If we have two meshes, A and B, then their data would look like this:

 [A00 A01 A02 A03 A04... Ann B00 B01 B02... Bmm]

B's mesh data immediately follows A's mesh data, with no breaks in between.

The glDrawArrays call takes a start index. If we are using unindexed rendering, then this is all we need. We call glDrawArrays once with 0 as the start index and nn as the array count. Then we call it again with nn as the start index and mm as the array count.

Indexed rendering is often very useful, both for saving memory and for performance. So it would be great if we could keep this shared-buffer optimization when using indexed rendering.

In indexed rendering, each mesh also has its own index data. glDrawElements takes an offset into the index buffer, so we can use the same mechanism to select which set of indices to use.

The problem is the contents of these indices. The third vertex of mesh B is technically index 02. However, the actual index is determined by the location of that vertex relative to where the format was defined. And since we're trying to avoid redefining the format, the format still points to the start of the buffer. So the third vertex of mesh B is actually at index 02 + nn.

We could in fact store these indices in the index buffer that way. We could go through all of mesh B's indices and add nn to them. But we don't have to.

Instead, we can use this function:

 void glDrawElementsBaseVertex( GLenum mode​, GLsizei count​,
   GLenum type​, void *indices​, GLint basevertex​);

This works as glDrawElements does, except that basevertex​ is added to each index before pulling from the vertex data. So for mesh A, we pass a base vertex of 0 (or just use glDrawElements), and for mesh B, we pass a base vertex of nn.

Note: When combining with primitive restart, the restart test happens before the base index is added to the index.
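The order of operations described above (restart test first, base vertex second) can be modeled on the CPU. This is a sketch under stated assumptions: resolve_index and NO_VERTEX are hypothetical names, and fixed-width integer types stand in for the GL types.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel returned when the raw index is the restart index. Hypothetical. */
#define NO_VERTEX (-1)

/* Models how one index read from the element buffer is turned into a
 * position in the vertex arrays. The restart test is applied to the raw
 * index; basevertex is added only afterwards. */
static int64_t resolve_index(uint32_t raw_index, int32_t basevertex,
                             int restart_enabled, uint32_t restart_index)
{
    if (restart_enabled && raw_index == restart_index)
        return NO_VERTEX; /* starts a new primitive, sources no vertex */

    int64_t vertex = (int64_t)raw_index + basevertex;
    assert(vertex >= 0); /* this sketch does not model negative results */
    return vertex;
}
```

So with a base vertex of nn = 100, the third vertex of mesh B (raw index 2) resolves to position 102, and a raw index of 65435 resolves to 65535 without being mistaken for a restart index of 65535.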


Instancing

It is often useful to be able to render multiple copies of the same mesh in different locations. If you're doing this with small numbers, like 5-20 or so, multiple draw commands with shader uniform changes between them (to tell which is in which location) is reasonably fast in performance. However, if you're doing this with large numbers of meshes, like 5,000+ or so, then it can be a performance problem.

Instancing is a way to get around this. The idea is that your vertex shader has some internal mechanism for deciding where each instance of the rendered mesh goes based on a single number. Perhaps it has a table (stored in a Buffer Texture or Uniform Buffer Object) that it indexes with the instance number to get the per-instance data it needs. Or perhaps it has a simple algorithm for computing the location of an instance based on its number.

Regardless of the mechanism, it is based on the shader getting an instance number that changes only when it is rendering a new instance. If you want to do instanced rendering, you call:

 void glDrawArraysInstanced( GLenum mode​, GLint first​,
   GLsizei count​, GLsizei primcount​ );
 void glDrawElementsInstanced( GLenum mode​, GLsizei count​, 
   GLenum type​, const void *indices​, GLsizei primcount​ );

It will send the same vertices primcount​ number of times, as though you called glDrawArrays/Elements​ in a loop of primcount​ length. However, the vertex shader is given a special input value: gl_InstanceID​. It will receive a value from 0 to primcount​-1 based on which instance of the mesh is being rendered. This is the only mechanism the vertex shader has for differentiating between instances; it is up to the shader itself to decide how to use this information.
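As an illustration of "a simple algorithm for computing the location of an instance based on its number", the following C function mirrors what a vertex shader might compute from gl_InstanceID. It is only a sketch: the grid layout, the vec2 struct and the function name are assumptions made for this example, and a real implementation would live in shader code rather than in C.

```c
#include <assert.h>

/* Minimal 2D vector type for the example. Hypothetical. */
struct vec2 { float x, y; };

/* Places instances on a regular grid: instance 0 at the origin, then
 * left to right, wrapping to the next row after `columns` instances.
 * instance_id plays the role of gl_InstanceID. */
static struct vec2 instance_grid_offset(int instance_id, int columns,
                                        float spacing)
{
    struct vec2 offset;

    assert(instance_id >= 0 && columns > 0);
    offset.x = (float)(instance_id % columns) * spacing;
    offset.y = (float)(instance_id / columns) * spacing;
    return offset;
}
```

With 4 columns and a spacing of 2.0, instance 7 lands in column 3 of row 1, that is, at an offset of (6.0, 2.0).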


Range

Implementations of OpenGL can often find it useful to know how much vertex data is being used in a buffer object. For non-indexed rendering, this is pretty easy to determine: the first and count parameters of the Arrays functions give you the appropriate information. For indexed rendering, this is more difficult, as the index buffer can use potentially any index up to its size.

Still, for optimization purposes, it is useful for implementations to know the range of indexed rendering data. Implementations may even read index data manually to determine this.

The "Range" series of glDrawElements commands allows the user to specify that this indexed rendering call will never cause indices outside of the given range of values to be sourced. The call works as follows:

 void glDrawRangeElements( GLenum mode​, GLuint start​, 
   GLuint end​, GLsizei count​, GLenum type​, void *indices​ );

Unlike the "Arrays" functions, the start and end parameters specify the minimum and maximum index values (from the element buffer) that this draw call will use, rather than a first-and-count pair. If you violate this restriction, you will get implementation-dependent behavior (i.e., rendering may work fine or you may get garbage).

There is one index that is allowed outside of the area bound by start​ and end​: the primitive restart index. If primitive restart is set and enabled, it does not have to be within the given boundary.

Implementations may have a specific "sweet spot" for the range of indices, such that using indices within this range will have better performance. They expose such values with a pair of glGetIntegerv enumerators. To get the best performance, end​ - start​ should be less than or equal to GL_MAX_ELEMENTS_VERTICES, and count​ (the number of indices to be rendered) should be less than or equal to GL_MAX_ELEMENTS_INDICES.
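If the index data is available on the CPU, for example at load time, the start and end values can be computed with a single pass over it. The sketch below assumes 32-bit indices and skips the primitive restart index, which is allowed to lie outside the range; find_index_range, struct index_range and the example array are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example index data containing a restart index of 65535. */
static const uint32_t range_example[] = {4, 5, 6, 65535, 6, 5, 9};
#define RANGE_EXAMPLE_COUNT (sizeof range_example / sizeof range_example[0])

/* Result of the scan; valid is 0 if no ordinary index was found. */
struct index_range { uint32_t start, end; int valid; };

/* Finds the smallest and largest index values in the array, ignoring
 * the restart index when restart_enabled is nonzero. The result can
 * serve as the start and end arguments of glDrawRangeElements. */
static struct index_range find_index_range(const uint32_t *indices, size_t n,
                                           int restart_enabled,
                                           uint32_t restart_index)
{
    struct index_range r = { UINT32_MAX, 0, 0 };

    assert(indices != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (restart_enabled && indices[i] == restart_index)
            continue; /* the restart index may lie outside the range */
        if (indices[i] < r.start) r.start = indices[i];
        if (indices[i] > r.end)   r.end = indices[i];
        r.valid = 1;
    }
    return r;
}
```

The difference end - start and the index count can then be compared against the values queried for GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES.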


It is often useful to combine these optimization techniques. Primitive restart can be combined with any of them, so long as they are using indexed rendering. The primitive restart comparison test, in the case of BaseVertex calls, is done before the base index is added to the index from the mesh.

Base vertex can be combined with any one of MultiDraw, Range, or Instancing. These functions are:

 void glMultiDrawElementsBaseVertex( GLenum mode​, 
   GLsizei *count​, GLenum type​, void **indices​, 
   GLsizei primcount​, GLint *basevertex​ );
 void glDrawRangeElementsBaseVertex( GLenum mode​, 
   GLuint start​, GLuint end​, GLsizei count​, GLenum type​, 
   void *indices​, GLint basevertex​ );
 void glDrawElementsInstancedBaseVertex( GLenum mode​, 
   GLsizei count​, GLenum type​, const void *indices​, 
   GLsizei primcount​, GLint basevertex​ );

In the case of MultiDraw, the basevertex​ parameter is an array, so each primitive can have its own base index.

None of the other features can be combined with one another. So Range does not combine with MultiDraw.

Transform feedback rendering

Core in version: 4.5
Core since version: 4.0
Core ARB extensions: ARB_transform_feedback2, ARB_transform_feedback3, ARB_transform_feedback_instanced

Transform Feedback objects can be used to render the results of a feedback operation. These are the effective equivalent of a glDrawArrays or glDrawArraysInstanced call, where appropriate.

The most important thing to note is that the only thing these functions do is issue the rendering call. They do not bind the transform feedback buffers. They do not modify any VAO state. The only thing pulled from the transform feedback object is the number of vertices that were written to that stream. It is your responsibility to set up the vertex arrays for actually rendering before making these calls.

The purpose of this feature is to avoid the GPU->CPU->GPU round-trip for the number of vertices written to a transform feedback stream. This is better even than using a Query Object, as any waiting will happen on the GPU rather than the CPU.

To perform non-instanced rendering from a transform feedback object, these functions are used:

void glDrawTransformFeedback(GLenum mode​, GLuint id​);
void glDrawTransformFeedbackStream(GLenum mode​, GLuint id​, GLuint stream​);

mode​ is the usual Primitive type. The id​ is the transform feedback object to draw from. The stream​ is the stream in the feedback object to get the vertex count from. Note that glDrawTransformFeedback is equivalent to calling glDrawTransformFeedbackStream with a stream​ of zero.

If GL 4.2 or ARB_transform_feedback_instanced is available, then the instanced version of these functions can be used:

void glDrawTransformFeedbackInstanced(GLenum mode​, GLuint id​, GLsizei instancecount​);
void glDrawTransformFeedbackStreamInstanced(GLenum mode​, GLuint id​, GLuint stream​, GLsizei instancecount​);

These functions behave like glDrawArraysInstanced. There are no BaseInstance versions of these.

Indirect rendering

Core in version: 4.5
Core since version: 4.0
Core ARB extensions: ARB_draw_indirect, ARB_multi_draw_indirect, ARB_base_instance

Indirect rendering is the process of issuing a rendering command to OpenGL, except that most of the parameters to that command come from GPU storage provided by a Buffer Object. For example, glDrawArrays takes a primitive type, the number of vertices, and the starting vertex. When using the indirect rendering command glDrawArraysIndirect, the starting vertex and number of vertices to render would instead be stored in a buffer object.

The purpose of this is to allow GPU processes to fill these values in. This could be a compute shader, a specially designed geometry shader coupled with transform feedback, or an OpenCL/CUDA process. The idea is to avoid the GPU->CPU->GPU round-trip; the GPU decides what range of vertices to render with. All the CPU does is decide when to issue the rendering command, as well as which Primitive is used with that command.

The indirect rendering functions take their data from the buffer currently bound to the GL_DRAW_INDIRECT_BUFFER binding. Thus, any of these functions will fail if no buffer is bound to that binding.

All of the indirect rendering functions allow the following features:

  • Indexed rendering
    • Base vertex (for indexed rendering)
  • Instanced rendering
  • Base instance (if GL 4.2 or ARB_base_instance is available)

Thus, each of them acts as the largest combination of features that the implementation supports.

For non-indexed rendering, the indirect equivalent to glDrawArraysInstancedBaseInstance is this:

void glDrawArraysIndirect(GLenum mode​, const void *indirect​);

The mode​ is the usual primitive type. indirect​ is the offset into the GL_DRAW_INDIRECT_BUFFER to find the beginning of the data.

The data is provided as if in a C struct of the following definition:

typedef  struct {
   GLuint  count;
   GLuint  instanceCount;
   GLuint  first;
   GLuint  baseInstance;
} DrawArraysIndirectCommand;

This represents a draw call equivalent to:

glDrawArraysInstancedBaseInstance(mode, cmd->first, cmd->count, cmd->instanceCount, cmd->baseInstance);

Note: if neither GL 4.2 nor ARB_base_instance is available, then the baseInstance field must be 0 or undefined behavior results.

If GL 4.3 or ARB_multi_draw_indirect is available, then multiple indirect array rendering commands can be issued in one call with this:

void glMultiDrawArraysIndirect(GLenum mode, const void *indirect, GLsizei drawcount, GLsizei stride);

The drawcount is the number of indirect rendering commands to issue; the stride is the byte offset from one rendering command to the next. It can be set to zero; if so, then the array of indirect commands is assumed to be tightly packed (i.e., a 16-byte stride). The stride must be a multiple of 4.

For indexed rendering, the indirect equivalent to glDrawElementsInstancedBaseVertexBaseInstance is this:

void glDrawElementsIndirect(GLenum mode, GLenum type, const void *indirect);

The mode​ and type​ parameters work as they do in regular glDrawElements-style functions. As with other indirect functions, the indirect​ is the byte-offset into the GL_DRAW_INDIRECT_BUFFER to find the indirect data structure.

In indexed rendering, the structure is defined as follows:

typedef  struct {
    GLuint  count;
    GLuint  instanceCount;
    GLuint  firstIndex;
    GLuint  baseVertex;
    GLuint  baseInstance;
} DrawElementsIndirectCommand;

This represents a draw call equivalent to:

glDrawElementsInstancedBaseVertexBaseInstance(mode, cmd->count, type,
  cmd->firstIndex * size-of-type, cmd->instanceCount, cmd->baseVertex, cmd->baseInstance);

Where size-of-type​ is the size in bytes of type​.

Note: if neither GL 4.2 nor ARB_base_instance is available, then the baseInstance field must be 0 or undefined behavior results.

If GL 4.3 or ARB_multi_draw_indirect is available, then multiple indirect indexed rendering commands can be issued in one call with this:

 void glMultiDrawElementsIndirect(GLenum mode, GLenum type, const void *indirect, GLsizei drawcount, GLsizei stride);

The drawcount is the number of indirect rendering commands to issue; the stride is the byte offset from one rendering command to the next. It can be set to zero; if so, then the array of indirect commands is assumed to be tightly packed (i.e., a 20-byte stride). The stride must be a multiple of 4.
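The 16-byte and 20-byte figures follow directly from the two command structures, and the stride rule can be expressed as a small helper. In this sketch uint32_t stands in for GLuint so that it compiles without OpenGL headers, the field types mirror the definitions given above, and effective_stride and command_offset are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layouts as above, with uint32_t standing in for GLuint. */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t first;
    uint32_t baseInstance;
} DrawArraysIndirectCommand;

typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t firstIndex;
    uint32_t baseVertex;
    uint32_t baseInstance;
} DrawElementsIndirectCommand;

/* Byte distance between consecutive commands. A stride of 0 means
 * "tightly packed", so the size of the command structure is used.
 * Returns 0 if the stride is not a multiple of 4 (treated as invalid). */
static size_t effective_stride(size_t stride, size_t command_size)
{
    if (stride == 0)
        return command_size;
    if (stride % 4 != 0)
        return 0;
    return stride;
}

/* Byte offset of the i-th command inside the indirect buffer, where
 * `indirect` is the offset of the first command. */
static size_t command_offset(size_t indirect, size_t stride,
                             size_t command_size, size_t i)
{
    size_t step = effective_stride(stride, command_size);

    assert(step != 0); /* the stride must be valid */
    return indirect + i * step;
}
```

A non-zero stride larger than the structure size leaves room for extra per-draw data between commands, which the indirect draw call itself skips over.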

Conditional rendering

Core in version: 3.0
Vendor extension: NV_conditional_render

Conditional rendering is a mechanism for making the execution of one or more rendering commands conditional on the result of an Occlusion Query operation. This is done with the following functions:

void glBeginConditionalRender(GLuint id, GLenum mode);
void glEndConditionalRender(void);

All rendering commands issued within the boundaries of these two functions will only execute if the occlusion condition specified by id​ is tested to be true. For GL_SAMPLES_PASSED queries, it is considered true (and thus rendering commands are executed) if the number of samples is not zero.

The commands that can be conditioned are:

The mode​ parameter determines how the discarding of the rendering functions is performed. It can be one of the following:

  • GL_QUERY_WAIT: OpenGL will wait until the query result is returned, then decide whether to execute the rendering command. This ensures that the rendering commands will only be executed if the query passes.
  • GL_QUERY_NO_WAIT: OpenGL may execute the rendering commands anyway. It will not wait to see if the query test is true or not. This is used to prevent pipeline stalls if the time between the query test and the execution of the rendering commands is too short.
  • GL_QUERY_BY_REGION_WAIT: OpenGL will wait until the query result is returned, then decide whether to execute the rendering command. However, the rendered results will be clipped to the samples that were actually rasterized in the occlusion query. Thus, the rendered result can never appear outside of the occlusion query area.
  • GL_QUERY_BY_REGION_NO_WAIT: As above, except that it may not wait until the occlusion query is finished. The region clipping still holds.

Note that "wait" in this case does not mean that glEndConditionalRender itself will stall on the CPU. It means that the first command within the conditional rendering scope will not be executed by the GPU until the query has returned. So the CPU will continue processing, but the GPU itself may have a pipeline stall.

See Also