Extensions for Managing Textures (And Other Objects) in OpenGL

Allen Akin <akin@valinux.com>
Version 1 - August 28, 2000

Motivation

Most current graphics hardware includes onboard high-performance memory for storing textures, geometric primitives, and other data used for rendering.

OpenGL provides a few simple mechanisms for controlling use of this memory: creation/deletion/clobber of offscreen rendering targets (pbuffers), prioritization of textures, prioritization of display lists (on some implementations), and proxy texture queries (to determine if a single texture can be loaded into high-speed texture memory).

OpenGL drivers are expected to handle most memory management automatically, using only the clues provided by those mechanisms, and without exposing the internal organization of graphics memory to the application. The justification for this design is the large variation in hardware: differing numbers of physically separate memories; single-banked vs. multi-banked memory; differing allocation granularities; differing allocation alignment constraints; differing internal formats for textures and pixel arrays; etc. Exposing all these differences would lead inevitably to fragile, nonportable application code.

However, both applications and hardware have evolved since the current memory management mechanisms were added to OpenGL. Those mechanisms now exhibit practical shortcomings, including:

  1. The proxy texture query mechanism can gather information for only one texture at a time. There is no way to determine whether a set of textures can be made coresident. Technically this makes it impossible to determine whether a particular configuration of textures can be used for multitextured rendering. We have been able to live with this situation only because dual-textured hardware is the norm at the moment, and texture memories are large enough to contain any two textures of interest to current rendering algorithms. This is changing, as higher-order multitexturing hardware becomes available and as OpenGL moves into embedded hardware with more severe memory constraints.
  2. Texture priorities are intended to allow apps to guide memory management by the driver. However, there is little consistency in the way current drivers interpret the priorities, and as a result application developers make little use of them. Even when texture priorities are implemented as intended, only priorities 0.0 and 1.0 see heavy use. Apps generally want to force textures to be resident (to avoid reloading them), lock textures in place (to minimize fragmentation), mark textures for unloading if space is needed, or even mark textures for automatic deletion (to avoid unloading them). It appears that there's a disconnect between what the apps need and what the priority mechanism provides.
  3. OpenGL's memory model implies there are just two memory pools in which textures may reside: Host memory and texture memory. Textures must be "made resident" in texture memory before they may be used for rendering. However, current PC hardware offers more options; in particular, effective use of AGP memory may be critical on some systems, and AGP memory has characteristics that differ from OpenGL's notions of both host and texture memory.
  4. Furthermore, in many current systems there is no clear distinction between the memory used for textures and the memory used for other objects (display lists, back buffers, pbuffers, color tables, etc.). This is very often the case for AGP memory. Applications must trade off the storage requirements of various classes of objects, but the OpenGL mechanisms fall short of providing the functionality needed to do this, because they focus on managing within a given object class rather than across all classes. This has also given rise to redundancy in the API, for example, the separate prioritization systems for display lists and textures.
  5. An OpenGL driver's memory defragmentation and object-cache replacement policies aren't visible to applications. While this insulates apps from nonportable aspects of the driver, it also leads to unpredictable performance. This is especially troublesome for interactive applications that need to maintain a constant rendering rate.
  6. To work around the previous problem, apps sometimes resort to direct management of textures by creating dummy texture objects (which are always resident) and modifying their contents with TexSubImage2D (or equivalent). This has several disadvantages: apps can't readily determine how many dummy textures of which sizes and internal formats will fit; multiple mipmapped subtextures can't be packed into a single dummy texture (without other workarounds that are too complex to discuss here, and have disadvantages of their own); and extra overhead is incurred on some machines that must reformat textures as they're loaded.

We can provide new mechanisms to mitigate those problems. In addition, there are new opportunities we might wish to address:

  1. Current high-level APIs ("engines" or "scene graphs") can do some optimization to reduce the cost of state changes, especially sorting to minimize texture loads and binds. More sophisticated algorithms for this are feasible, but they require information about the cost of each memory management operation, rather than simply whether or not it's possible. This information is not now available because texture memory defragmentation and cache-flushing operations take place at unpredictable times and have costs that can't be estimated accurately.
  2. New objects that are likely to stress the memory management capabilities of drivers are under development; chief among them are state objects and vertex array objects. Ideally, any new mechanism we provide would make consistent the treatment of these objects as well as textures, display lists, rendering targets, and possibly color tables.
  3. "Embedded" devices with 3D graphics capability are becoming more significant commercially, and OpenGL has already been ported to many of them. These devices may have new memory allocation semantics, or at least memory management tradeoffs that are very different from those of workstations, image generators, and PCs. (Mobile devices, for example, may page managed objects across a wireless connection.) Better application control over memory management behavior is essential for environments in which memory management operations may cost several orders of magnitude more than in traditional OpenGL environments.

Proposal

General Design Principles

We'll provide resource virtualization for a single rendering context (and all the contexts in a share group), but not for multiple independent contexts. This is OpenGL's current behavior.

The implication is that we're optimizing for the case of a single application. This simplifies the application's view of the memory-management model (e.g., memory isn't consumed by any party other than the application), and is the most appropriate choice for real-time or interactive applications.

We'll continue to leave the lowest-level details of memory management to the driver, so that we can accommodate unusual hardware (and leave designers free to create unusual hardware!). However, we will provide the application with much more control over memory-management behavior than it has today.

Memory Pools

Until now we have operated under a very simple assumption about memory allocation pools: There is host memory, and there is an object-specific memory for each object type (e.g. texture memory and display list memory). The object-specific memories are decoupled; for example, creating millions of display lists must not affect the amount of memory available for storing texture images. We must generalize this in three ways.

  1. We must allow for the existence of new pools of memory for the new objects (e.g. vertex array objects) that are currently under development.
  2. We must introduce new levels in the memory hierarchy. In today's PCs, for example, textures might reside in onboard memory, in AGP memory (typically at some cost in performance), or in host memory. It's reasonable to move textures from host memory to AGP memory to onboard memory either manually or automatically in order to improve onboard memory utilization or increase rendering performance.
    Note that only the driver has enough information to implement the proper hierarchy for each type of object. For example, there are some PC systems in which onboard memory is the preferred location for textures and AGP memory is available for overflow; other PC systems in which AGP memory is the primary location for textures; and still other PC systems in which AGP memory is completely unavailable.
  3. We must allow objects of different types to coreside in some memory pools, and consequently to compete for space in them. For example, both texture objects and vertex array objects might compete for space in AGP memory. Note that this requires us to relax a constraint that currently exists in OpenGL: If a proxy texture query indicates that a texture can be loaded, then it will always be true that the texture can be loaded. (Once texture memory is "promised" to an app, it can't be taken away.) In this proposal, apps are free to trade off textures with other objects that reside in the same memory pool. Apps might also pin textures in memory, with the side-effect that fragmentation might prevent previously-loadable textures from fitting.

As a start, consider the following list of memory pools. (XXX represents an extension suffix yet to be chosen.)


#define GL_MEMORY_ONBOARD_XXX		0x00000001
#define GL_MEMORY_AGP_XXX		0x00000002
#define GL_MEMORY_HOST_XXX		0x00000004
    

The MEMORY_ONBOARD_XXX pool represents high-performance local graphics memory; it may be logically or physically partitioned so that some portions are reserved for objects of a specific type, or it may be a single shared pool. The MEMORY_AGP_XXX pool is reserved for hardware that supports the AGP standard. Finally, there is the MEMORY_HOST_XXX pool which represents main memory.

We have expressed these enumerants as bits in a bitmask, in case some future memory management policy is capable of supporting residence in more than one pool simultaneously.
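Because the pool enumerants are single bits, a set of pools can be built with bitwise OR and tested with bitwise AND. The sketch below is a minimal illustration that reuses the values proposed above. The helpers `pool_mask_contains` and `pool_mask_count` are hypothetical and are not part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>

/* Pool enumerants as proposed above (XXX suffix still to be chosen). */
#define GL_MEMORY_ONBOARD_XXX 0x00000001u
#define GL_MEMORY_AGP_XXX     0x00000002u
#define GL_MEMORY_HOST_XXX    0x00000004u

/* Hypothetical helper: does a pool mask include the given pool bit? */
static bool pool_mask_contains(unsigned mask, unsigned pool)
{
    return (mask & pool) != 0;
}

/* Hypothetical helper: how many pools does a mask name at once?
   Today's policies use exactly one; a future policy might use more. */
static int pool_mask_count(unsigned mask)
{
    int count = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```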

Future extensions may add to this list.

Object Types

Currently we have multiple namespaces for OpenGL objects: a given integer value might be used as the name for both a texture and a display list, and would be interpreted according to the context in which it's used. Since we are now considering an API in which objects of different types appear in a single context, we must use object types to disambiguate identification numbers.

Textures introduce one further complication. Some systems require multitextures to reside in distinct banks of memory. These banks are not hierarchical in the sense of the memory pools discussed above; but for purposes of memory management they must somehow be taken into account. For this proposal we have chosen to encode information about the use of the texture into the object type.


#define GL_OBJECT_TEXTURE_XXX		...	// Currently active texture
#define GL_OBJECT_TEXTURE0_XXX		...	// Texture on unit 0
#define GL_OBJECT_TEXTURE1_XXX		...	// Texture on unit 1, and so on
...
#define GL_OBJECT_TEXTURE31_XXX		...
#define GL_OBJECT_DISPLAY_LIST_XXX	...
    

Extensions for features such as vertex array objects would add to this list in the obvious way.

Offscreen rendering targets are also candidates for memory management. Currently they are not visible to the OpenGL core; they have no identification numbers analogous to texture object IDs or display list IDs. This requires further thought, not only for memory management, but also for extensions such as render-to-texture.

Groups of Objects

Existing OpenGL memory management mechanisms apply to single objects -- loading a texture, querying a proxy texture, etc. In order to handle cases where several objects need to be coresident to complete a single drawing operation (e.g. multitexturing), we must be able to handle groups of objects. In general, a group must consist of an array of object ID numbers and a corresponding array of object types.
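One way an application might keep the two arrays aligned is a small container that appends a (type, id) pair in one step. This is a hypothetical sketch: `ObjectGroup`, `group_add`, and the capacity are illustrative, and the GL typedefs are local stand-ins. Note that the same id may appear twice with different types, since each object type has its own namespace.

```c
#include <assert.h>

typedef unsigned int GLenum;   /* local stand-ins for the GL typedefs */
typedef unsigned int GLuint;

#define GROUP_MAX 32           /* arbitrary capacity for this sketch */

/* Hypothetical container: type[] and id[] always have n valid entries,
   so they can be passed directly as the type and id arguments of a
   command such as glObjectResidenceXXX. */
typedef struct {
    int    n;
    GLenum type[GROUP_MAX];
    GLuint id[GROUP_MAX];
} ObjectGroup;

/* Append one object; returns 0 on success, -1 if the group is full. */
static int group_add(ObjectGroup *g, GLenum type, GLuint id)
{
    if (g->n >= GROUP_MAX)
        return -1;
    g->type[g->n] = type;
    g->id[g->n] = id;
    g->n++;
    return 0;
}
```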

Object Residence Policies

Texture priorities are not quite sufficient to express the texture residence policies most applications need. They fall short because the priority semantics aren't ironclad: they are neither consistent across implementations nor guaranteed for a single implementation.

Rather than attempt to enforce new semantics for texture priorities, it makes more sense to enumerate object residence policies that cover most application needs, and then allow them to be applied to any managed object. For example, it should be possible to pin objects (preventing them from being moved), cache them, etc. And it should be possible to force them to move from one memory pool to another in a reliable way.

We propose the following set of policies:


#define GL_POLICY_PINNED_XXX		...
#define GL_POLICY_LOADED_XXX		...
#define GL_POLICY_CACHED_XXX		...
    

POLICY_PINNED_XXX means that the object, once loaded into the desired memory pool, will be neither unloaded nor moved by the memory manager. (It may be unloaded, moved, or deleted explicitly by the application.) This is intended to give the application enough control over memory layout to prevent fragmentation (long-lived objects can be loaded first) and to prevent unpredictable performance hits due to defragmentation by the memory manager.

POLICY_LOADED_XXX means that the object, once loaded, will not be unloaded by the memory manager. It may, however, be moved (usually during defragmentation). This gives apps the ability to trade off some defragmentation delays for improved memory utilization.

POLICY_CACHED_XXX means that the app prefers that the object be loaded into the desired memory pool, but the object may be shifted down the memory hierarchy to make room for an object that uses the POLICY_LOADED_XXX or POLICY_PINNED_XXX policies. This is much like the standard OpenGL texture management mechanism.
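The three policies differ only in what the memory manager may do on its own initiative. The sketch below restates that as two predicates. The enum values are placeholders (the proposal leaves them as "..."), and allowing cached objects to be moved is an assumption drawn from their resemblance to standard texture management; the proposal does not say how cached objects are chosen for eviction among themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values; the real enumerants are not yet assigned. */
enum Policy { POLICY_PINNED, POLICY_LOADED, POLICY_CACHED };

/* May the memory manager relocate the object within its pool,
   e.g. while defragmenting? Pinned objects never move. */
static bool manager_may_move(enum Policy p)
{
    return p == POLICY_LOADED || p == POLICY_CACHED;
}

/* May the memory manager push 'victim' down the hierarchy to make room
   for an object with policy 'requester'? Only cached objects can be
   displaced, and only by LOADED or PINNED requests. */
static bool manager_may_evict(enum Policy victim, enum Policy requester)
{
    return victim == POLICY_CACHED &&
           (requester == POLICY_LOADED || requester == POLICY_PINNED);
}
```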

Setting Object Residence

The following command changes the residence or memory management policy associated with each of a set of objects:


GLsizei
glObjectResidenceXXX(
    GLsizei n,
    const GLenum *type,
    const GLuint *id,
    const GLenum *policy,
    const GLenum *pool
    )
    

Each of the arrays type, id, policy, and pool must contain n elements. The corresponding elements of type and id specify an object. The corresponding element of policy specifies a memory management policy for that object. The corresponding element of pool specifies a memory pool for that object.

ObjectResidenceXXX forces the objects to be managed according to the specified policy, and forces them to be placed in the specified memory pool (or another in the appropriate hierarchy, if the policy is POLICY_CACHED_XXX).

If ObjectResidenceXXX can set all the residencies and policies as requested, its return value is equal to n and it does not set a GL error. Otherwise, the return value is equal to the number of objects that were managed as requested, and a GL error is set as follows:

Error            Failure Conditions
INVALID_ENUM     An element of type or policy was unrecognized or impermissible in context.
INVALID_VALUE    An element of id is not currently the identifier of an object of the appropriate type.
OUT_OF_MEMORY    One or more objects could not be placed as specified by the app. This could be because an individual object is too large for its chosen memory pool, or because too much space in the pool is already occupied by objects with the POLICY_PINNED_XXX or POLICY_LOADED_XXX policies.

See Memory Management Algorithm for further discussion of the memory management behavior on which apps can rely.

Testing Object Residence


GLsizei
glTestObjectResidenceXXX(
    GLsizei n,
    const GLenum *type,
    const GLuint *id,
    const GLenum *policy,
    const GLenum *pool,
    GLenum *assignedPool,
    GLfloat *estimatedTime
    )
    

TestObjectResidenceXXX performs a trial run of the memory management operations needed to honor an object residence-setting command.

The arguments n, type, id, policy, and pool have the same meanings as the identically-named arguments for ObjectResidenceXXX.

The elements of the argument assignedPool will be set to the memory pool assigned to the corresponding objects. This is most obviously relevant for objects with the POLICY_CACHED_XXX policy, which might be assigned to any of several pools in the hierarchy for the appropriate object type. However, it is also relevant in more subtle circumstances as discussed below (Memory Management Algorithm).

The elements of the argument estimatedTime will be set to an estimate of the number of seconds required to honor the residence-setting command for the corresponding object. If the command for a given object cannot be honored, then the corresponding element of estimatedTime will be set to -1.0. Thus, if all the values returned in estimatedTime are nonnegative, their sum is an estimate of the time required to perform the entire collection of residence-setting operations.
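An application will typically reduce the estimatedTime array to a single total while also locating any request that cannot be honored. The following is a minimal sketch of that bookkeeping; `total_estimated_time` is a hypothetical helper, and plain `float` stands in for GLfloat.

```c
#include <assert.h>

/* Hypothetical helper for the estimatedTime array filled in by
   TestObjectResidenceXXX. Returns the summed estimate in seconds when
   every request can be honored. If any element is negative (-1.0 means
   "cannot be honored"), returns -1.0 and stores that element's index in
   *firstFailed; otherwise *firstFailed is set to -1. */
static float total_estimated_time(int n, const float *estimatedTime,
                                  int *firstFailed)
{
    float total = 0.0f;
    assert(n >= 0);
    *firstFailed = -1;
    for (int i = 0; i < n; i++) {
        if (estimatedTime[i] < 0.0f) {
            *firstFailed = i;
            return -1.0f;
        }
        total += estimatedTime[i];
    }
    return total;
}
```

Returning the failing index directly is what the per-object array makes possible; a single aggregate estimate would not identify which request failed.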

TestObjectResidenceXXX returns the number of objects placed successfully and generates errors in the same manner as ObjectResidenceXXX.

Note that if TestObjectResidenceXXX returns n (that is, every object can be placed as requested), a subsequent call to ObjectResidenceXXX using the same n, type, id, policy, and pool arguments is guaranteed to succeed, provided that none of the attributes of the specified objects have been changed in the meantime. (For example, changing a texture's minification filter from LINEAR to LINEAR_MIPMAP_LINEAR would require more mipmap levels to be resident.)


GLsizei
glTestWorstCaseObjectResidenceXXX(
    GLsizei n,
    const GLenum *type,
    const GLuint *id,
    const GLenum *policy,
    const GLenum *pool,
    GLenum *assignedPool,
    GLfloat *estimatedTime
    )
    

Whereas the time estimates returned by TestObjectResidenceXXX are based on the amount of data transfer required to convert the current state of all memory pools into the desired state, the time estimates returned by TestWorstCaseObjectResidenceXXX are based on the amount of data transfer that would be required in the worst case. Worst-case time estimates are appropriate for applications that need to:

  1. Guarantee a minimum frame rate. The best-guess estimates (from TestObjectResidenceXXX) are intended to be reasonably accurate, but may be too optimistic in some cases, which can lead to dropped frames.
  2. Look ahead, through a series of memory management operations, in order to schedule rendering for one or more full frames. Basing the time estimates on the current state of the memory pools would be inaccurate, since that state will almost certainly not obtain when the memory management commands are actually executed. Furthermore, more efficient scheduling heuristics may be used when the time estimates are constant, rather than dependent on the sequence of previous memory management operations. Using the worst-case time estimates is one way to ensure that they are constant.
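Because worst-case estimates do not depend on the memory state left behind by earlier frames, a look-ahead check can examine each planned frame independently. The sketch below assumes the application has already obtained a total worst-case residence estimate per frame (for instance, derived from TestWorstCaseObjectResidenceXXX) and has its own estimate of drawing time; the function name and the frame-budget model are hypothetical.

```c
#include <assert.h>

/* Hypothetical look-ahead check. worstCase[i] is a total worst-case
   residence estimate for frame i's working set (negative if some object
   cannot be placed); render[i] is the app's own drawing-time estimate.
   Returns the index of the first frame that would exceed 'budget'
   seconds, or -1 if every frame fits. */
static int first_frame_over_budget(int frames, const float *worstCase,
                                   const float *render, float budget)
{
    assert(frames >= 0);
    for (int i = 0; i < frames; i++) {
        if (worstCase[i] < 0.0f)            /* working set cannot be placed */
            return i;
        if (worstCase[i] + render[i] > budget)
            return i;
    }
    return -1;
}
```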

Memory Management Algorithm

Obviously the results returned by TestObjectResidenceXXX would be worthless if we didn't guarantee that ObjectResidenceXXX and TestObjectResidenceXXX use the same memory management algorithm. So this proposal requires driver implementors to make that guarantee.

Another tricky area concerns memory pools. It's not guaranteed that all implementations will offer the same pools; there's too much variation in hardware and in drivers. So we must specify the behavior carefully. We propose the following constraints:

  1. All drivers must provide the MEMORY_ONBOARD_XXX and MEMORY_HOST_XXX pools. These should map to the highest-performance and highest-capacity physical memory pools, respectively, even if there is no actual onboard memory. (For example, on systems that support only texturing from AGP memory, MEMORY_ONBOARD_XXX will refer to AGP memory.) The host memory pool must support all object types. The onboard memory pool must support at least textures (for upward compatibility with existing OpenGL semantics). It may also support other types, and in fact this is encouraged wherever a performance advantage can be gained.
  2. For other memory pools, the driver decides which object types will be supported in which pools. If the app requests a pool/object combination that isn't supported, the behavior depends on the requested memory management policy. POLICY_LOADED_XXX and POLICY_PINNED_XXX policies should cause the request to fail (OUT_OF_MEMORY). POLICY_CACHED_XXX should cause the driver to treat the request as if MEMORY_ONBOARD_XXX had been specified. This ensures that the application's wishes are respected as much as is possible under the circumstances.
  3. To smooth the way for future extensions that add new memory pools, if the app issues a request for a memory pool that the driver doesn't recognize, the driver should treat the request as if it had been for a memory pool that it recognizes but doesn't support. (As detailed immediately above.)
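Rules 2 and 3 together amount to a small resolution step applied before placement. The sketch below models it under stated assumptions: `supported` is a hypothetical per-object-type mask of the pools the driver supports, `POOL_REQUEST_FAILS` is a hypothetical marker for the OUT_OF_MEMORY outcome, and the policy enum uses placeholder values. An unrecognized pool bit is never in `supported`, so it falls into the same path as a recognized but unsupported pool, as rule 3 requires.

```c
#include <assert.h>

#define GL_MEMORY_ONBOARD_XXX 0x00000001u
#define GL_MEMORY_AGP_XXX     0x00000002u
#define GL_MEMORY_HOST_XXX    0x00000004u

/* Placeholder policy values; the real enumerants are not yet assigned. */
enum Policy { POLICY_PINNED, POLICY_LOADED, POLICY_CACHED };

#define POOL_REQUEST_FAILS 0u   /* hypothetical marker: OUT_OF_MEMORY */

/* Returns the pool the request should be treated as naming. */
static unsigned resolve_pool(unsigned requested, unsigned supported,
                             enum Policy policy)
{
    /* Supported pool: honor it as-is. */
    if (requested != 0 && (requested & ~supported) == 0)
        return requested;
    /* Unsupported or unrecognized pool (rules 2 and 3). */
    if (policy == POLICY_CACHED)
        return GL_MEMORY_ONBOARD_XXX;
    return POOL_REQUEST_FAILS;          /* LOADED or PINNED */
}
```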

We have chosen not to apply the priority concept (as used for texture priorities) to all other objects. Instead, we require that drivers use the order in which objects are specified (in the arguments to ObjectResidenceXXX) to determine priority. The first object in the id array is first priority; the next object is second priority; and so on. Drivers are free to examine the entire list of objects before allocating memory, and in fact we urge driver developers to do so (because that can prevent cached objects from being flushed out of a memory pool just before the driver discovers that they're still needed). However, we require that no lower-priority object can prevent a higher-priority object from being loaded into the desired pool. Note that it is still possible for a lower-priority object to be loaded even after a higher-priority object failed to load (e.g., because of fragmentation); this can be detected by examining the estimated load times returned by TestObjectResidenceXXX.
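The simplest placement that satisfies the ordering rule is greedy: walk the objects in argument order and place each one that still fits. The sketch below assumes a deliberately simplified pool model (a byte budget with no fragmentation), so failures here come from capacity rather than fragmentation; `place_in_priority_order` is hypothetical. It also shows the case noted above, where a smaller lower-priority object is placed after a larger higher-priority one failed.

```c
#include <assert.h>

/* Hypothetical model: the pool is a byte budget, and objects are given
   in priority order (index 0 first). Placing greedily in that order
   means no lower-priority object can take space a higher-priority object
   needed. placed[i] is set to 1 or 0; the return value is the number
   placed, mirroring ObjectResidenceXXX's return value. */
static int place_in_priority_order(int n, const unsigned long *size,
                                   unsigned long capacity, int *placed)
{
    unsigned long used = 0;
    int count = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (size[i] <= capacity - used) {   /* used never exceeds capacity */
            used += size[i];
            placed[i] = 1;
            count++;
        } else {
            placed[i] = 0;                  /* later, smaller objects may still fit */
        }
    }
    return count;
}
```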

With regard to time estimates, drivers should observe the following guidelines:

  1. If an object is already resident in the requested pool, the estimated time to load the object is zero. There is no need to account for overhead.
  2. If an object is not resident in the requested pool, but enough free space can be created to place the object in the pool, then the estimated time is the sum of three components: The time required to move objects in the pool to coalesce free space (defragmentation), the time required to move cached objects to a different pool in order to create new free space (unloading), and the time required to move the object from its current residence to the new residence (loading).
  3. Time estimates should be computed by dividing the size of the data to be moved by the transfer rate between the memory pools involved. More accuracy is better, of course, but a rough conservative approximation is good enough.
  4. For TestWorstCaseObjectResidenceXXX, the estimated time is as described above, plus the time required to copy the entire target memory pool to host memory. This is added only once per TestWorstCaseObjectResidenceXXX command, not once per object!
  5. If there is not enough space in the requested pool to fit an object with policy POLICY_LOADED_XXX or POLICY_PINNED_XXX, then the estimated time is -1.0.
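The guidelines above reduce to a few divisions per object. The sketch below restates them under simplifying assumptions: the `PlacementCost` fields are hypothetical inputs a driver would derive internally, `fits == false` stands for guideline 5 (a LOADED or PINNED object that cannot be placed), and cached objects falling down the hierarchy are not modeled. For the worst-case variant, the caller is assumed to supply worst-case inputs, and this sketch returns -1.0 for the whole command if any object fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-object inputs: sizes in bytes, rates in bytes/second. */
typedef struct {
    bool   resident;     /* already in the requested pool */
    bool   fits;         /* enough free space can be created */
    double defragBytes;  /* data moved to coalesce free space */
    double defragRate;
    double unloadBytes;  /* cached objects moved to another pool */
    double unloadRate;
    double loadBytes;    /* the object itself */
    double loadRate;
} PlacementCost;

/* Guidelines 1, 2, 3, and 5: estimated seconds for one object. */
static double estimate_seconds(const PlacementCost *c)
{
    if (c->resident)
        return 0.0;
    if (!c->fits)
        return -1.0;
    return c->defragBytes / c->defragRate
         + c->unloadBytes / c->unloadRate
         + c->loadBytes / c->loadRate;
}

/* Guideline 4: add one copy of the whole target pool to host memory,
   once per command rather than once per object. */
static double worst_case_command_seconds(int n, const PlacementCost *c,
                                         double poolBytes, double poolToHostRate)
{
    double total = poolBytes / poolToHostRate;
    for (int i = 0; i < n; i++) {
        double t = estimate_seconds(&c[i]);
        if (t < 0.0)
            return -1.0;
        total += t;
    }
    return total;
}
```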

Just to emphasize: TestObjectResidenceXXX is intended to estimate the time required to change the current memory state to the one desired by the application. TestWorstCaseObjectResidenceXXX is intended to estimate the maximum time required to change any memory state to the one desired by the application.

Examples

A basic example of do-it-yourself texture management -- forcing two textures to be loaded and pinned for multitextured rendering, and later forcing one of them to return to host memory. (If the texture isn't needed any longer, it could be deleted rather than being forced back to the host.)


// Flush any pending errors, if necessary:
while (glGetError() != GL_NO_ERROR)
	;

// Create a couple of textures and set parameters:
GLuint id[2];
glGenTextures(2, id);
glBindTexture(GL_TEXTURE_2D, id[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB,
	GL_UNSIGNED_BYTE, texels);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
// ...and so on for id[1]

// Load the textures and pin them down:
{
GLenum typeTex[] = {GL_OBJECT_TEXTURE0_XXX, GL_OBJECT_TEXTURE1_XXX};
GLenum policyPin[] = {GL_POLICY_PINNED_XXX, GL_POLICY_PINNED_XXX};
GLenum poolOnboard[] = {GL_MEMORY_ONBOARD_XXX, GL_MEMORY_ONBOARD_XXX};
if (glObjectResidenceXXX(2, typeTex, id, policyPin, poolOnboard) != 2) {
	// Didn't fit...choose a fallback strategy or give up.
	// (glGetError() reports the reason, e.g. GL_OUT_OF_MEMORY.)
}
}

// Do some rendering now...

// Unpin a texture and move it back to the host:
{
GLenum typeTex[] = {GL_OBJECT_TEXTURE1_XXX};
GLenum policyLoad[] = {GL_POLICY_LOADED_XXX};
GLenum poolHost[] = {GL_MEMORY_HOST_XXX};
glObjectResidenceXXX(1, typeTex, &id[1], policyLoad, poolHost);
}
    

A wrapper function to force a texture to be resident, and then bind it. (This is convenient for applications that want a simpler usage model, in which long-lived textures are preloaded to minimize fragmentation and short-lived textures are created and deleted in the remaining texture memory.)


bool
BindTextureResident(GLenum target, GLuint texID) {
	GLenum type = GL_OBJECT_TEXTURE_XXX;	// currently-active tex unit
	GLenum policy = GL_POLICY_PINNED_XXX;
	GLenum pool = GL_MEMORY_ONBOARD_XXX;
	if (glObjectResidenceXXX(1, &type, &texID, &policy, &pool) != 1)
		return false;	// use glGetError to fetch error code
	glBindTexture(target, texID);
	return true;
}
    

Issues

Volatile Objects

A shortcoming in older versions of the Win9x product line occasionally causes the contents of onboard memory to be destroyed without notifying the driver in time for the contents to be saved. Since OpenGL textures must persist until explicitly deleted by the app, this forces drivers to make a safe copy of each texture as it's loaded for the first time (or modified by TexSubImage2D, among other things). This degrades performance and increases memory footprint substantially, especially for dynamic textures that are generated every frame.

Recent versions of Win98 have a driver-notification callback that eliminates the problem. However, if it's deemed critical to address the older versions of Windows, or if we find it advantageous in other situations (e.g. video textures, large volume textures), we might choose to add one more memory management policy:


#define GL_POLICY_VOLATILE_XXX		...
    

POLICY_VOLATILE_XXX means that the object may be destroyed at any time. There is no explicit notification; the app can discover that the object has been destroyed only by attempting to change its residence policy, which generates a GL error because the ID is no longer valid.

We have chosen not to include this in the current proposal because it needs further thought. There are several applications in which volatile textures might be used, but it's possible that they would be better-served by slightly different policies.

Worst-Case Time Estimates May Be Too Extreme in Practice

This proposal mentions using time estimates for scheduling rendering operations. A full consideration of this subject would distract us from the memory management discussion. However, it's worth mentioning that the worst-case time estimates proposed here might turn out to be too pessimistic to be useful in practice.

If this turns out to be true, then a more complex design may be needed. One approach is to encapsulate memory-management state in memory-management objects. One such object would be passed to a time-estimation command to represent the "initial" state; another such object would be produced by the command and would represent the "final" state after the memory-management operation is complete. This would allow more accurate estimation and would support a variety of search strategies. It would not, however, permit the use of heuristics that assume constant state-transition costs. The current formulation permits them.

Aside from the problem with scheduling heuristics, we have decided not to propose such a design here because it may be better to consider it as part of an extension for OpenGL state objects.

Other Managed Objects Need IDs Accessible from the OpenGL Core

We know of systems in which rendering targets (back buffers, pbuffers, etc.) compete with textures and other objects for the same memory pool. The extension discussed in this proposal is intended to be powerful enough to deal with such systems, but at the moment there is no way to refer to the rendering targets by identification number from within the OpenGL core. (They are typically referenced only by a window-system-specific handle of some kind in the OpenGL window-system binding.) This needs to be addressed in some way, not only for memory management, but also for new features that might render to textures or to buffers containing geometry.

Concurrency Risks

Suppose a multithreaded app performs a residency test and concludes that the working set of textures needed for the next frame will not fit in high-speed texture memory. It might reset the minimum LOD parameter for a mipmapped texture in order to reduce the amount of memory required by that texture, and perform a new residency test.

Unfortunately, if the texture is being used for rendering by another thread, a race condition arises.

Currently we believe this is a case of "Doctor, it hurts when I do this." Multithreaded apps should interlock changes to shared objects such as textures to avoid the problem.

Selecting Memory Pools using Access Characteristics

Eventually we will face new hardware with memory pools that differ greatly from those we currently regard as standard. Although the memory management algorithm described in this proposal will substitute pools when necessary, it may be desirable to help applications choose memory pools more directly.

One solution might be to have the app characterize the accesses that will be made to the object (e.g. read, modify, and write by the host CPU and by the graphics accelerator), and provide a query function to map that access information into the enumerant for an appropriate memory pool. The NV_vertex_array_range extension uses a similar technique for the wglAllocateMemoryNV function.

Texture (and Other) Priorities

This proposal hasn't specified the role of priorities in the cache victim-selection process, but it might be wise to do so.

With little effort texture priorities could still be used for textures, but a new priority mechanism would be needed to cover the other types of objects. Perhaps we should add a general priority-setting function for all object types, and allow texture priority to be set through that function as well as the current mechanism.

We have avoided the issue for the moment because it isn't clear that apps really want to use a priority mechanism anyway.

Minor API Design Issues

For convenience, it may be desirable to have both scalar and vector forms of the residence-setting and residence-testing commands.

An earlier proposal suggested that TestObjectResidenceXXX simply return a single time estimate for performing the entire array of object residence changes. We have chosen to return an array of estimates so that the app can determine which residence-change commands could not be honored. With a single return value, the app would have to binary-search to determine which commands failed.

We have elected to create a single interface that applies to all objects, hopefully making it clear that memory may be traded off between objects of various types. However, some object-specific characteristics that are not controlled by this new API nevertheless affect memory management; examples include texture filtering methods and the min/max LOD of mipmap pyramids.

Notes to Hardware Designers

Much of this proposal deals with issues that were addressed decades ago in the design of memory management for general-purpose CPUs. Borrowing concepts from those main-memory subsystems might make graphics programming simpler and more efficient.

Probably the first priority is to build systems with fewer memory fragmentation problems. Allocating memory in fixed-size pages (even if they're not automatically swapped between levels in the memory hierarchy) would be a big help.

Note that if you choose to support automatic paging, it would still be necessary to allow objects to be pinned in memory, so that performance is predictable in real-time applications.

With respect to optimizing rendering order, making state-change costs independent of past history is a huge win. Most scheduling problems are NP-complete, but there are good heuristics for many cases in which state-change costs are constant and predictable. There are few good heuristics when state-change cost depends on all previous state changes (as is currently the case when managing textures). Paged memories help this as well. There may be some waste of memory due to internal fragmentation (unused space within a page), but paging eliminates the need for objects to be physically contiguous, thus avoiding external fragmentation and making the time to load an object independent of the arrangement of the other objects in the memory pool.
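The paging tradeoff can be put in numbers: with fixed-size pages an object needs a page count that depends only on its own size, wherever the free pages happen to lie, and the cost is the unused tail of its last page. The helpers below are a hypothetical illustration of that arithmetic, not a description of any particular hardware.

```c
#include <assert.h>

/* Pages needed for an object: ceil(size / page). */
static unsigned long pages_needed(unsigned long size, unsigned long page)
{
    assert(page > 0);
    return (size + page - 1) / page;
}

/* Internal fragmentation: bytes allocated but unused in the last page. */
static unsigned long internal_waste(unsigned long size, unsigned long page)
{
    return pages_needed(size, page) * page - size;
}
```

For example, a 10000-byte object with 4096-byte pages occupies 3 pages and wastes 2288 bytes, but no defragmentation is ever needed to find room for it.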

Followup

We expect this proposal will generate debate, and modifications will be required before reaching consensus.

We suggest that the next round of comment take place on the opengl-participants@corp.sgi.com mailing list.

Afterwards, a complete extension specification can be developed and reviewed by the ARB or a subgroup of vendors if there is sufficient interest.

Acknowledgements

Brian Paul proposed the current form of the API. Gareth Hughes independently developed many of the concepts discussed in this document during his investigation of texture priorities.

Remi Arnaud, Avi Bar-Zeev, Michael Jones, Craig Phillips, and Hansong Zhang greatly influenced the author's thinking about state-change optimization techniques.

Chris Hecker, Dave Moore, and Tim Sweeney contributed requirements from a (possibly the) key application area, game development.

Change Log

Version     Changes

Draft 2     Made ObjectResidenceXXX, TestObjectResidenceXXX, and TestWorstCaseObjectResidenceXXX return the number of objects successfully placed as requested by the app. This makes it unnecessary to call GetError after each command, and simplifies the application's logic a little.

            Clarified that memory may be defragmented and cached objects may be flushed as needed to make space for a new object. This affects the description of time estimates.

            Emphasized that TestObjectResidenceXXX makes time estimates with respect to the current memory state, while TestWorstCaseObjectResidenceXXX makes time estimates that shouldn't be exceeded for any memory state.

            Noted that if TestObjectResidenceXXX succeeds, then ObjectResidenceXXX with the same arguments will also succeed, unless some attributes of the objects have changed between the two commands.

            Added OBJECT_TEXTURE_XXX object type to make object management simpler when multitexturing. Added an example using this to show how Tim Sweeney's proposed BindTextureResident command could be implemented.

Version 1   Changed release date and version for general distribution; otherwise same as Draft 2.