The OpenGL Pipeline Newsletter - Volume 003

Table of Contents
Previous article: OpenGL and Windows Vista™
Next article: Windows Vista and OpenGL-the Facts

Optimize Your Application Performance

In the previous article, “Clean your OpenGL usage using gDEBugger,” we demonstrated how gDEBugger can help you verify that your application uses OpenGL correctly and calls the OpenGL API commands you expect it to call. This article will discuss the use of ATI and NVIDIA performance counters together with gDEBugger's Performance Views to locate graphics pipeline performance bottlenecks.

Graphics Pipeline Bottlenecks

The graphics system generates images through a pipelined sequence of operations. A pipeline runs only as fast as its slowest stage. The slowest stage is often called the pipeline bottleneck. A single graphics primitive (for example, a triangle) has a single graphic pipeline bottleneck. However, the bottleneck may change when rendering a graphics frame that contains multiple primitives. For example, if the application first renders a group of lines and afterwards a group of lit and shaded triangles, we can expect the bottleneck to change.

The OpenGL Pipeline

The OpenGL pipeline is an abstraction of the graphics system pipeline. It contains stages, executed one after the other. Such stages are:

Application: the graphical application, executed on the CPU, calls OpenGL API functions.
Driver: the graphics system driver runs on the CPU and translates OpenGL API calls into actions executed on either the CPU or the GPU.
Geometric operations: the operations required to calculate vertex attributes and position within the rendered 2D image space. This includes: multiplying vertices by the model-view and projection matrices, calculating vertex lighting values, executing vertex shaders, etc.
Raster operations: operations operating on fragments / screen pixels: reading and writing color components, reading and writing depth and stencil buffers, performing alpha blending, using textures, executing fragment shaders, etc.
Frame buffer: a memory area holding the rendered 2D image.

Some of the pipeline stages are executed on the CPU; other stages are executed on the GPU. Most operations that are executed on top of the GPU are executed in parallel.

Remove Performance Bottlenecks

As mentioned in the “Graphics Pipeline Bottlenecks” section, the graphics system runs only as fast as its slowest pipeline stage, which is often called the pipeline bottleneck. The process for removing performance bottlenecks usually involves the following stages:

Identify the bottleneck: Locate the pipeline stage that is the current graphic pipeline bottleneck.
Optimize: Reduce the workload done in that pipeline stage until performance stops improving or until you have achieved the desired performance level.
Repeat: Go back to stage 1.

Notice that after your performance optimizations are done, or after you have reached a bottleneck that you cannot optimize anymore, you can start adding workload to pipeline stages that are not fully utilized without affecting render performance. For example, use more accurate textures, perform more complicated vertex shader operations, etc.

gDEBugger Performance Graph View

gDEBugger Performance Graph view helps you locate your application's graphics pipeline performance bottlenecks; it displays, in real time, graphics system performance metrics. Viewing metrics that measure the workload done in each pipeline stage enables you to estimate the current performance pipeline bottleneck.

gDEBugger Performance Graph View helps you locate your application’s graphic pipeline performance bottlenecks. View Closeup

Such metrics are: CPU user mode and privilege mode utilizations, graphics driver idle, GPU idle, vertex shader utilization, fragment shader utilization, video memory usage, culled primitives counters, frames per seconds (per render context), number of OpenGL function calls per frame, total size of all loaded textures (in texels) and many other counters.

There is no need to make any changes to your source code or recompile your application. The performance counters will be displayed inside the Performance Graph view.
gDEBugger supports operating system performance counters (Windows and Linux), NVIDIA's performance counters via NVPerfKit, ATI's performance metrics and gDEBugger's internal performance counters. Other IHVs' counters will be supported in the future.

gDEBugger Performance Analysis Toolbar

The Performance Analysis toolbar offers commands that enable you to pinpoint application performance bottlenecks by “turning off” graphics pipeline stages. If the performance metrics improve while “turning off” a certain stage, you have found a graphics pipeline bottleneck!

These commands include:

- Eliminate Draw Commands: Identify CPU and bus performance bottlenecks by ignoring all OpenGL commands that push vertices or texture data into OpenGL. When ignoring these commands, the CPU and bus workloads remain unchanged, but the GPU workload is almost totally removed, since most GPU activities are triggered by input primitives (triangles, lines, etc).

- Eliminate Raster operations: Identify raster operation bottlenecks by forcing OpenGL to use a 1x1 pixels view port. Raster operations operate per fragment or pixel. By setting a 1x1 pixel view port most raster operations will be eliminated.

- Eliminate Fixed Pipeline Lighting operations: Identify “fixed pipeline lighting” related calculation bottlenecks. This is done by turning off all OpenGL fixed pipeline lights. Notice that this command does not affect shaders that do not use the fixed pipeline lights.

- Eliminate Textures Data Fetch operations: Identify texture memory performance bottlenecks by forcing OpenGL to use 2x2 pixel stub textures instead of the application defined textures. By using such small stub textures, the texture data fetch operation workload will be almost completely removed.

- Eliminate Fragment Shader Operations: Identify fragment shader related bottlenecks by forcing OpenGL to use a very simple stub fragment shader instead of the application defined fragment shaders.

The “Combined” Approach

Combining the Performance Analysis toolbar with the Performance Graph view gives an even stronger ability to locate performance bottlenecks. Viewing the way performance metrics vary when disabling graphics pipeline stages can give excellent hints for locating the graphics pipeline performance bottleneck.

For example, an application runs at 20 F/S and has 100% fragment shader utilization and 30% vertex shader utilization. When disabling fragment shader operations, the metrics change to 50 F/S, 2% fragment shader utilization and 90% vertex shader utilization. The “combined” approach tells us that the current bottleneck is probably the fragment shader operations. It also tells us that if we optimize and reduce the fragment shader operation workload, the next bottleneck that we will come across will probably be the vertex shader operations.

We hope this article will help you optimize the performance of your OpenGL based applications. In our next article we will talk about OpenGL's debugging model and show how gDEBugger can help you find those “hard to catch” OpenGL-related bugs.

Yaki Tebeka, Graphic Remedy
CTO & Cofounder

Editor's Note: You'll remember from our first edition that Graphic Remedy and the ARB have teamed up to make gDEBugger available free to non-commercial users for a limited time.

The Industry's Foundation for High Performance Graphics

from games to virtual reality, mobile phones to supercomputers