Most current graphics hardware includes onboard high-performance memory for storing textures, geometric primitives, and other data used for rendering.
OpenGL provides a few simple mechanisms for controlling use of this memory: creation/deletion/clobber of offscreen rendering targets (pbuffers), prioritization of textures, prioritization of display lists (on some implementations), and proxy texture queries (to determine if a single texture can be loaded into high-speed texture memory).
OpenGL drivers are expected to handle most memory management automatically, using only the clues provided by those mechanisms, and without exposing the internal organization of graphics memory to the application. The justification for this design is the large variation in hardware: differing numbers of physically separate memories; single-banked vs. multi-banked memory; differing allocation granularities; differing allocation alignment constraints; differing internal formats for textures and pixel arrays; etc. Exposing all these differences would lead inevitably to fragile, nonportable application code.
However, both applications and hardware have evolved since the current memory management mechanisms were added to OpenGL. Those mechanisms now exhibit practical shortcomings, including:
TexSubImage2D (or equivalent).
This has several disadvantages:
- apps can't readily determine how many dummy textures of which sizes and internal formats will fit;
- multiple mipmapped subtextures can't be packed into a single dummy texture (without other workarounds that are too complex to discuss here, and have disadvantages of their own);
- extra overhead is incurred on some machines that must reformat textures as they're loaded.
We can provide new mechanisms to mitigate those problems. In addition, there are new opportunities we might wish to address:
We'll provide resource virtualization for a single rendering context (and all the contexts in a share group), but not for multiple independent contexts. This is OpenGL's current behavior.
The implication is that we're optimizing for the case of a single application. This simplifies the application's view of the memory-management model (e.g., memory isn't consumed by any party other than the application), and is the most appropriate choice for real-time or interactive applications.
We'll continue to leave the lowest-level details of memory management to the driver, so that we can accommodate unusual hardware (and leave designers free to create unusual hardware!). However, we will provide the application with much more control over memory-management behavior than it has today.
Until now we have operated under a very simple assumption about memory allocation pools: There is host memory, and there is an object-specific memory for each object type (e.g. texture memory and display list memory). The object-specific memories are decoupled; for example, creating millions of display lists must not affect the amount of memory available for storing texture images. We must generalize this in three ways.
Note that only the driver has enough information to implement the proper hierarchy for each type of object. For example, there are some PC systems in which onboard memory is the preferred location for textures and AGP memory is available for overflow; other PC systems in which AGP memory is the primary location for textures; and still other PC systems in which AGP memory is completely unavailable.
As a start, consider the following list of memory pools. (XXX represents an extension suffix yet to be chosen.)

```c
#define GL_MEMORY_ONBOARD_XXX 0x00000001
#define GL_MEMORY_AGP_XXX     0x00000002
#define GL_MEMORY_HOST_XXX    0x00000004
```
The MEMORY_ONBOARD_XXX pool represents high-performance
local graphics memory; it may be logically or physically
partitioned so that some portions are reserved for objects
of a specific type, or it may be a single shared pool.
The MEMORY_AGP_XXX pool is reserved for hardware that
supports the AGP standard.
Finally, there is the MEMORY_HOST_XXX pool
which represents main memory.
We have expressed these enumerants as bits in a bitmask, in case some future memory management policy is capable of supporting residence in more than one pool simultaneously.
Future extensions may add to this list.
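Because each pool is a distinct bit, a set of pools fits in one mask, which is what would let a future policy express residence in several pools at once. The following is a hypothetical sketch: the helpers `pool_mask_contains` and `pool_count` are invented for illustration, and plain `unsigned int` stands in for a GL bitfield type so the fragment compiles without OpenGL headers.

```c
#include <assert.h>

/* Pool bits as proposed above (the XXX suffix is still to be chosen). */
#define GL_MEMORY_ONBOARD_XXX 0x00000001u
#define GL_MEMORY_AGP_XXX     0x00000002u
#define GL_MEMORY_HOST_XXX    0x00000004u

/* Hypothetical helper: nonzero if every pool bit in `pools` is set in `mask`. */
int pool_mask_contains(unsigned int mask, unsigned int pools)
{
    assert(pools != 0);  /* asking about the empty set is a caller error */
    return (mask & pools) == pools;
}

/* Hypothetical helper: number of distinct pools named by a mask. */
int pool_count(unsigned int mask)
{
    int count = 0;
    while (mask != 0) {
        count += (int)(mask & 1u);
        mask >>= 1;
    }
    return count;
}
```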
Currently we have multiple namespaces for OpenGL objects: a given integer value might be used as the name for both a texture and a display list, and would be interpreted according to the context in which it's used. Since we are now considering an API in which objects of different types can appear in a single command, we must use object types to disambiguate identification numbers.
Textures introduce one further complication. Some systems require multitextures to reside in distinct banks of memory. These banks are not hierarchical in the sense of the memory pools discussed above; but for purposes of memory management they must somehow be taken into account. For this proposal we have chosen to encode information about the use of the texture into the object type.
```c
#define GL_OBJECT_TEXTURE_XXX      ...  // Currently active texture
#define GL_OBJECT_TEXTURE0_XXX     ...  // Texture on unit 0
#define GL_OBJECT_TEXTURE1_XXX     ...  // Texture on unit 1, and so on ...
#define GL_OBJECT_TEXTURE31_XXX    ...
#define GL_OBJECT_DISPLAY_LIST_XXX ...
```
Extensions for features such as vertex array objects would add to this list in the obvious way.
Offscreen rendering targets are also candidates for memory management. Currently they are not visible to the OpenGL core; they have no identification numbers analogous to texture object IDs or display list IDs. This requires further thought, not only for memory management, but also for extensions such as render-to-texture.
Existing OpenGL memory management mechanisms apply to single objects -- loading a texture, querying a proxy texture, etc. In order to handle cases where several objects need to be coresident to complete a single drawing operation (e.g. multitexturing), we must be able to handle groups of objects. In general, a group must consist of an array of object ID numbers and a corresponding array of object types.
Texture priorities are not quite sufficient to express the texture residence policies most applications need. They fall short because their semantics aren't ironclad: they are neither consistent across implementations nor guaranteed for a single implementation.
Rather than attempt to enforce new semantics for texture priorities, it makes more sense to enumerate object residence policies that cover most application needs, and then allow them to be applied to any managed object. For example, it should be possible to pin objects (preventing them from being moved), cache them, etc. And it should be possible to force them to move from one memory pool to another in a reliable way.
We propose the following set of policies:
```c
#define GL_POLICY_PINNED_XXX ...
#define GL_POLICY_LOADED_XXX ...
#define GL_POLICY_CACHED_XXX ...
```
POLICY_PINNED_XXX means that the object,
once loaded into the desired memory pool, will be neither
unloaded nor moved by the memory manager.
(It may be unloaded, moved, or deleted explicitly by the
application.)
This is intended to give the application enough control over
memory layout to prevent fragmentation (long-lived objects can
be loaded first) and to prevent unpredictable performance hits
due to defragmentation by the memory manager.
POLICY_LOADED_XXX means that the object, once loaded,
will not be unloaded by the memory manager. It may, however, be
moved (usually during defragmentation). This gives apps the
ability to trade off some defragmentation delays for improved
memory utilization.
POLICY_CACHED_XXX means that the app prefers that the
object be loaded into the desired memory pool, but the object may be
shifted down the memory hierarchy to make room for an object
that uses the POLICY_LOADED_XXX or
POLICY_PINNED_XXX policies.
This is much like the standard OpenGL texture management
mechanism.
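One way to summarize the three policies is by what the memory manager may do to an object on its own initiative. This is a hypothetical sketch: the enum values stand in for the `GL_POLICY_*_XXX` enumerants (whose values the proposal leaves open), and the helper names are invented. It encodes only the rules stated above; explicit unloading, moving, or deletion by the application is always allowed and is not modeled.

```c
#include <assert.h>

/* Stand-ins for the proposed enumerants, whose values are left open. */
enum residence_policy {
    POLICY_PINNED,   /* GL_POLICY_PINNED_XXX */
    POLICY_LOADED,   /* GL_POLICY_LOADED_XXX */
    POLICY_CACHED    /* GL_POLICY_CACHED_XXX */
};

static void check_policy(enum residence_policy p)
{
    assert(p == POLICY_PINNED || p == POLICY_LOADED || p == POLICY_CACHED);
    (void)p;  /* avoid an unused-parameter warning when NDEBUG is defined */
}

/* May the manager unload the object (or shift it down the hierarchy) by itself? */
int manager_may_unload(enum residence_policy p)
{
    check_policy(p);
    return p == POLICY_CACHED;
}

/* May the manager move the object, e.g. during defragmentation? */
int manager_may_move(enum residence_policy p)
{
    check_policy(p);
    return p != POLICY_PINNED;
}
```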
The following command changes the residence or memory management policy associated with each of a set of objects:
```c
GLsizei glObjectResidenceXXX(GLsizei n,
                             const GLenum *type,
                             const GLuint *id,
                             const GLenum *policy,
                             const GLenum *pool);
```
Each of the arrays type, id,
policy, and pool must contain
n elements.
The corresponding elements of type and id
specify an object. The corresponding element of policy
specifies a memory management policy for that object. The
corresponding element of pool specifies a memory
pool for that object.
ObjectResidenceXXX forces the objects to be
managed according to the specified policy, and forces them
to be placed in the specified memory pool (or another in the
appropriate hierarchy, if the policy is
POLICY_CACHED_XXX).
If ObjectResidenceXXX can set all the residencies
and policies as requested, its return value is equal to n
and it does not set a GL error.
Otherwise, the return value is equal to the number of objects
that were managed as requested, and a GL error is set as follows:
| Error | Failure Conditions |
|---|---|
| INVALID_ENUM | An element of type or policy was unrecognized or impermissible in context. |
| INVALID_VALUE | An element of id is not currently the identifier of an object of the appropriate type. |
| OUT_OF_MEMORY | One or more objects could not be placed as specified by the app. This could be because an individual object is too large for its chosen memory pool, or because too much space in the pool is already occupied by objects with the POLICY_PINNED_XXX or POLICY_LOADED_XXX policies. |
See Memory Management Algorithm for further discussion of the memory management behavior on which apps can rely.
```c
GLsizei glTestObjectResidenceXXX(GLsizei n,
                                 const GLenum *type,
                                 const GLuint *id,
                                 const GLenum *policy,
                                 const GLenum *pool,
                                 GLenum *assignedPool,
                                 GLfloat *estimatedTime);
```
TestObjectResidenceXXX performs a trial run of
the memory management operations needed to honor an
object residence-setting command.
The arguments n, type, id, policy,
and pool have the same meanings as the identically-named
arguments for ObjectResidenceXXX.
The elements of the argument assignedPool will be
set to the memory pool assigned to the corresponding objects.
This is most obviously relevant for objects with the
POLICY_CACHED_XXX policy, which might be assigned to any of
several pools in the hierarchy for the appropriate object
type.
However, it is also relevant in more subtle circumstances as
discussed below (Memory Management Algorithm).
The elements of the argument estimatedTime will
be set to an estimate of the number of seconds required to
honor the residence-setting command for the corresponding
object. If the command for a given object cannot be honored,
then the corresponding element of estimatedTime
will be set to -1.0. Thus, if all the values returned in
estimatedTime are nonnegative, their sum is
an estimate of the time required to perform the entire collection
of residence-setting operations.
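The rule for combining the per-object estimates can be written out directly. A minimal sketch, assuming `float` in place of `GLfloat`; the helper `total_residence_time` is hypothetical and not part of the proposed API. It also reports the first request that cannot be honored, which the per-object array makes possible without any searching.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum the per-object estimates returned in estimatedTime.
 * Returns the total in seconds if every request can be honored,
 * or -1.0f if any element is negative (that request cannot be honored).
 * If first_failed is non-NULL it receives the index of the first
 * failing request, or -1 if none failed.
 */
float total_residence_time(int n, const float *estimatedTime, int *first_failed)
{
    float total = 0.0f;
    int failed = -1;

    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        if (estimatedTime[i] < 0.0f) {
            if (failed < 0)
                failed = i;
        } else {
            total += estimatedTime[i];
        }
    }
    if (first_failed != NULL)
        *first_failed = failed;
    return failed < 0 ? total : -1.0f;
}
```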
TestObjectResidenceXXX returns the number of
objects placed successfully and generates errors in
the same manner as ObjectResidenceXXX.
Note that if TestObjectResidenceXXX returns n (that is, every request can be honored),
a subsequent call to ObjectResidenceXXX using the
same n, type, id,
policy, and pool arguments is guaranteed
to succeed, provided that none of the attributes of the
specified objects have been changed in the meantime.
(For example, changing a texture's minification filter from
LINEAR to LINEAR_MIPMAP_LINEAR would
require more mipmap levels to be resident.)
```c
GLsizei glTestWorstCaseObjectResidenceXXX(GLsizei n,
                                          const GLenum *type,
                                          const GLuint *id,
                                          const GLenum *policy,
                                          const GLenum *pool,
                                          GLenum *assignedPool,
                                          GLfloat *estimatedTime);
```
Whereas the time estimates returned by
TestObjectResidenceXXX are based on the amount
of data transfer required to convert the current state of all
memory pools into the desired state,
the time estimates returned by
TestWorstCaseObjectResidenceXXX
are based on the amount of data transfer that would be required
in the worst case.
Worst-case time estimates are appropriate for applications that
need to:
TestObjectResidenceXXX)
are intended to be reasonably accurate, but may be
too optimistic in some cases, which can lead to dropped
frames.
Obviously the results returned by TestObjectResidenceXXX
would be worthless if we didn't guarantee that
ObjectResidenceXXX and
TestObjectResidenceXXX use the same memory management
algorithm.
So this proposal requires driver implementors to make that
guarantee.
Another tricky area concerns memory pools. It's not guaranteed that all implementations will offer the same pools; there's too much variation in hardware and in drivers. So we must specify the behavior carefully. We propose the following constraints:
MEMORY_ONBOARD_XXX
and MEMORY_HOST_XXX pools.
These should map to the highest-performance
and highest-capacity physical memory pools, respectively,
even if there is no actual onboard memory.
(For example, on systems that can texture only from
AGP memory, MEMORY_ONBOARD_XXX will refer to
AGP memory.)
The host memory pool must support all object types.
The onboard memory pool must support at least textures
(for upward compatibility with existing OpenGL semantics).
It may also support other types, and in fact this
is encouraged wherever a performance advantage can be
gained.
POLICY_LOADED_XXX
and POLICY_PINNED_XXX policies should cause
the request to fail (OUT_OF_MEMORY).
POLICY_CACHED_XXX should cause the driver to treat the
request as if MEMORY_ONBOARD_XXX had been specified.
This ensures that the application's wishes are respected
as much as possible under the circumstances.
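The substitution rules above can be modeled as a small function. This is a hypothetical sketch under stated assumptions: the beginning of the rule for LOADED and PINNED requests is missing from this copy, so the sketch assumes the rules apply when the requested pool is not offered by the implementation. The enum and the `effective_pool` helper are invented, and a return value of 0 stands for failure with OUT_OF_MEMORY.

```c
#include <assert.h>

#define GL_MEMORY_ONBOARD_XXX 0x00000001u
#define GL_MEMORY_AGP_XXX     0x00000002u
#define GL_MEMORY_HOST_XXX    0x00000004u

/* Stand-ins for the proposed policy enumerants. */
enum residence_policy { POLICY_PINNED, POLICY_LOADED, POLICY_CACHED };

/*
 * Hypothetical model of pool substitution. `supported` is the mask of
 * pools this implementation offers; ONBOARD and HOST are always present.
 * Returns the pool the request should target, or 0 if the request must
 * fail with OUT_OF_MEMORY.
 */
unsigned int effective_pool(unsigned int requested, enum residence_policy policy,
                            unsigned int supported)
{
    /* Exactly one pool bit must be requested. */
    assert(requested != 0 && (requested & (requested - 1)) == 0);

    supported |= GL_MEMORY_ONBOARD_XXX | GL_MEMORY_HOST_XXX;
    if (requested & supported)
        return requested;               /* pool exists: use it as asked */
    if (policy == POLICY_CACHED)
        return GL_MEMORY_ONBOARD_XXX;   /* best effort for cached objects */
    return 0;                           /* LOADED or PINNED: request fails */
}
```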
We have chosen not to apply the priority concept
(as used for texture priorities) to all other objects.
Instead, we require that drivers use the order in which objects
are specified (in the arguments to ObjectResidenceXXX)
to determine priority.
The first object in the id array is first priority;
the next object is second priority; and so on.
Drivers are free to examine the entire list of objects before
allocating memory, and in fact we urge driver developers to do so
(because that can prevent cached objects from being flushed out of
a memory pool just before the driver discovers that they're still
needed).
However, we require that no lower-priority object can prevent a
higher-priority object from being loaded into the desired pool.
Note that it is still possible for a lower-priority object to be loaded
even after a higher-priority object failed to load (e.g., because of
fragmentation); this can be detected by examining the estimated
load times returned by TestObjectResidenceXXX.
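The ordering rule can be illustrated with a deliberately simplified model: one pool, free space measured only in bytes, and no fragmentation or pool hierarchy. The function `place_in_priority_order` is hypothetical and is not how any particular driver allocates memory. Because requests are handled strictly in array order, a later request never takes space an earlier one needed, yet a small low-priority object can still be placed after a larger high-priority object fails (in this model because of size rather than fragmentation).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, greatly simplified placement model: a single pool with
 * `capacity` free bytes and no fragmentation. Requests are honored
 * strictly in array order (index 0 is first priority). placed[i] is set
 * to 1 if request i fits, else 0. Returns the number of requests
 * honored, mirroring the return value of ObjectResidenceXXX.
 */
int place_in_priority_order(int n, const size_t *size, size_t capacity, int *placed)
{
    int count = 0;

    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        if (size[i] <= capacity) {
            capacity -= size[i];
            placed[i] = 1;
            ++count;
        } else {
            placed[i] = 0;  /* would be reported as estimatedTime[i] == -1.0 */
        }
    }
    return count;
}
```

For example, with 1000 free bytes and requests of 600, 500, and 300 bytes, the first and third requests are placed and the function returns 2.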
With regard to time estimates, drivers should observe the following guidelines:
TestWorstCaseObjectResidenceXXX, the
estimated time is as described above, plus the time required
to copy the entire target memory pool to host memory.
This is added only once per
TestWorstCaseObjectResidenceXXX command,
not once per object!
POLICY_LOADED_XXX or
POLICY_PINNED_XXX, then the estimated time
is -1.0.
Just to emphasize: TestObjectResidenceXXX is intended
to estimate the time required to change the current
memory state to the one desired by the application.
TestWorstCaseObjectResidenceXXX is intended to
estimate the maximum time required to change any memory
state to the one desired by the application.
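The once-per-command rule for worst-case estimates can be sketched as follows. Assumptions: the per-object estimates (however the driver computes them) are already available, the cost of copying the target pool to host memory is modeled as its size divided by a single transfer rate, and the function name `worst_case_estimate` is hypothetical. The pool-copy term is added exactly once, however many objects the command names.

```c
#include <assert.h>

/*
 * Hypothetical total worst-case estimate for one command: the sum of the
 * per-object estimates plus, once, the time to copy the whole target pool
 * to host memory. Returns -1.0 if any object cannot be placed.
 */
double worst_case_estimate(int n, const double *per_object_seconds,
                           double pool_bytes, double bytes_per_second_to_host)
{
    double total = 0.0;

    assert(n >= 0 && bytes_per_second_to_host > 0.0);
    for (int i = 0; i < n; ++i) {
        if (per_object_seconds[i] < 0.0)
            return -1.0;
        total += per_object_seconds[i];
    }
    /* Added once per command, not once per object. */
    return total + pool_bytes / bytes_per_second_to_host;
}
```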
A basic example of do-it-yourself texture management -- forcing two textures to be loaded and pinned for multitextured rendering, and later forcing one of them to return to host memory. (If the texture isn't needed any longer, it could be deleted rather than being forced back to the host.)
```c
// Flush any pending errors, if necessary:
while (glGetError() != GL_NO_ERROR)
    ;

// Create a couple of textures and set parameters:
GLuint id[2];
glGenTextures(2, id);
glBindTexture(GL_TEXTURE_2D, id[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
             GL_RGB, GL_UNSIGNED_BYTE, texels);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
// ...and so on for id[1]

// Load the textures and pin them down:
{
    GLenum typeTex[]     = {GL_OBJECT_TEXTURE0_XXX, GL_OBJECT_TEXTURE1_XXX};
    GLenum policyPin[]   = {GL_POLICY_PINNED_XXX, GL_POLICY_PINNED_XXX};
    GLenum poolOnboard[] = {GL_MEMORY_ONBOARD_XXX, GL_MEMORY_ONBOARD_XXX};
    glObjectResidenceXXX(2, typeTex, id, policyPin, poolOnboard);
}
if (glGetError() != GL_NO_ERROR) {
    // Didn't fit...choose a fallback strategy or give up
}

// Do some rendering now...

// Unpin a texture and move it back to the host:
{
    GLenum typeTex[]    = {GL_OBJECT_TEXTURE1_XXX};
    GLenum policyLoad[] = {GL_POLICY_LOADED_XXX};
    GLenum poolHost[]   = {GL_MEMORY_HOST_XXX};
    glObjectResidenceXXX(1, typeTex, &id[1], policyLoad, poolHost);
}
```
A wrapper function to force a texture to be resident, and then bind it. (This is convenient for applications that want a simpler usage model, in which long-lived textures are preloaded to minimize fragmentation and short-lived textures are created and deleted in the remaining texture memory.)
```c
bool BindTextureResident(GLenum target, GLuint texID)
{
    GLenum type   = GL_OBJECT_TEXTURE_XXX;  // currently-active tex unit
    GLenum policy = GL_POLICY_PINNED_XXX;
    GLenum pool   = GL_MEMORY_ONBOARD_XXX;

    if (glObjectResidenceXXX(1, &type, &texID, &policy, &pool) != 1)
        return false;  // use glGetError to fetch error code
    glBindTexture(target, texID);
    return true;
}
```
A shortcoming in older versions of the Win9x product line occasionally
causes the contents of onboard memory to be destroyed without
notifying the driver in time for the contents to be saved.
Since OpenGL textures must persist until explicitly deleted
by the app, this forces drivers to make a safe copy of each
texture as it's loaded for the first time (or modified
by TexSubImage2D, among other things).
This degrades performance and increases memory footprint
substantially, especially for dynamic textures that are
generated every frame.
Recent versions of Win98 have a driver-notification callback that eliminates the problem. However, if it's deemed critical to address the older versions of Windows, or if we find it advantageous in other situations (e.g. video textures, large volume textures), we might choose to add one more memory management policy:
```c
#define GL_POLICY_VOLATILE_XXX ...
```
POLICY_VOLATILE_XXX means that the object may be destroyed at
any time. There is no explicit notification; the app may discover
that the object has been destroyed only by attempting to change
its storage policy and generating a GL error because the ID is no
longer valid.
We have chosen not to include this in the current proposal because it needs further thought. There are several applications in which volatile textures might be used, but it's possible that they would be better-served by slightly different policies.
This proposal mentions using time estimates for scheduling rendering operations. A full consideration of this subject would distract us from the memory management discussion. However, it's worth mentioning that the worst-case time estimates proposed here might turn out to be too pessimistic to be useful in practice.
If this turns out to be true, then a more complex design may be needed. One approach is to encapsulate memory-management state in memory-management objects. One such object would be passed to a time-estimation command to represent the "initial" state; another such object would be produced by the command and would represent the "final" state after the memory-management operation is complete. This would allow more accurate estimation and would support a variety of search strategies. It would not, however, permit the use of heuristics that assume constant state-transition costs. The current formulation permits them.
Aside from the problem with scheduling heuristics, we have decided not to propose such a design here because it may be better to consider it as part of an extension for OpenGL state objects.
We know of systems in which rendering targets (back buffers, pbuffers, etc.) compete with textures and other objects for the same memory pool. The extension discussed in this proposal is intended to be powerful enough to deal with such systems, but at the moment there is no way to refer to the rendering targets by identification number from within the OpenGL core. (They are typically referenced only by a window-system-specific handle of some kind in the OpenGL window-system binding.) This needs to be addressed in some way, not only for memory management, but also for new features that might render to textures or to buffers containing geometry.
Suppose a multithreaded app performs a residency test and concludes that the working set of textures needed for the next frame will not fit in high-speed texture memory. It might reset the minimum LOD parameter for a mipmapped texture in order to reduce the amount of memory required by that texture, and perform a new residency test.
Unfortunately, if the texture is being used for rendering by another thread, a race condition arises.
Currently we believe this is a case of "Doctor, it hurts when I do this." Multithreaded apps should interlock changes to shared objects such as textures to avoid the problem.
Eventually we will face new hardware with memory pools that differ greatly from those we currently regard as standard. Although the memory management algorithm described in this proposal will substitute pools when necessary, it may be desirable to help applications choose memory pools more directly.
One solution might be to have the app characterize the accesses
that will be made to the object (e.g. read, modify, and write by
the host CPU and by the graphics accelerator), and provide
a query function to map that access information into the
enumerant for an appropriate memory pool.
The NV_vertex_array_range extension
uses a similar technique for the wglAllocateMemoryNV
function.
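The query suggested above might look something like this. Everything in the sketch is hypothetical: the `ACCESS_*` bits, the `choose_pool` function, and especially the mapping rules, which only illustrate the kind of decision a driver would make using its own knowledge of the hardware.

```c
#include <assert.h>

#define GL_MEMORY_ONBOARD_XXX 0x00000001u
#define GL_MEMORY_AGP_XXX     0x00000002u
#define GL_MEMORY_HOST_XXX    0x00000004u

/* Hypothetical access-characterization bits (not part of the proposal). */
#define ACCESS_HOST_READ  0x1u
#define ACCESS_HOST_WRITE 0x2u
#define ACCESS_GPU_READ   0x4u
#define ACCESS_GPU_WRITE  0x8u

/*
 * Hypothetical query: map an access pattern to a pool enumerant, given
 * the mask of pools the implementation offers. Illustrative rules only.
 */
unsigned int choose_pool(unsigned int access, unsigned int available)
{
    assert((available & GL_MEMORY_HOST_XXX) != 0);  /* host pool always exists */

    if (access & ACCESS_HOST_READ)
        return GL_MEMORY_HOST_XXX;     /* assume CPU reads are cheapest from host memory */
    if ((access & ACCESS_HOST_WRITE) && (access & ACCESS_GPU_READ)
        && (available & GL_MEMORY_AGP_XXX))
        return GL_MEMORY_AGP_XXX;      /* CPU streams data, GPU reads it */
    if (access & (ACCESS_GPU_READ | ACCESS_GPU_WRITE))
        return GL_MEMORY_ONBOARD_XXX;  /* GPU-side traffic */
    return GL_MEMORY_HOST_XXX;
}
```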
This proposal hasn't specified the role of priorities in the cache victim-selection process, but it might be wise to do so.
With little effort texture priorities could still be used for textures, but a new priority mechanism would be needed to cover the other types of objects. Perhaps we should add a general priority-setting function for all object types, and allow texture priority to be set through that function as well as the current mechanism.
We have avoided the issue for the moment because it isn't clear that apps really want to use a priority mechanism anyway.
For convenience, it may be desirable to have both scalar and vector forms of the residence-setting and residence-testing commands.
An earlier proposal suggested that TestObjectResidenceXXX
simply return a single time estimate for performing the entire
array of object residence changes.
We have chosen to return an array of estimates
so that the app can determine which
residence-change commands could not be honored.
With a single return value, the app would have to binary-search
to determine which commands failed.
We have elected to create a single interface that applies to all objects, hopefully making it clear that memory may be traded off between objects of various types. However, it should be noted that some object-specific characteristics not controlled by this new API nevertheless affect memory management; texture filtering methods and min/max LOD for mipmap pyramids are examples.
Much of this proposal deals with issues that were addressed decades ago in the design of memory management for general-purpose CPUs. Borrowing concepts from those main-memory subsystems might make graphics programming simpler and more efficient.
Probably the first priority is to build systems with fewer memory fragmentation problems. Allocating memory in fixed-size pages (even if they're not automatically swapped between levels in the memory hierarchy) would be a big help.
Note that if you choose to support automatic paging, it would still be necessary to allow objects to be pinned in memory, so that performance is predictable in real-time applications.
With respect to optimizing rendering order, making state-change costs independent of past history is a huge win. Most scheduling problems are NP-complete, but there are good heuristics for many cases in which state-change costs are constant and predictable. There are few good heuristics when state-change cost depends on all previous state changes (as is currently the case when managing textures). Paged memories help this as well. There may be some waste of memory due to internal fragmentation (unused space within a page), but paging eliminates the need for objects to be physically contiguous, thus avoiding external fragmentation and making the time to load an object independent of the arrangement of the other objects in the memory pool.
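The arithmetic of fixed-size pages is simple, which is part of their appeal: an object's load cost depends only on how many pages it needs, and the only waste is the unused tail of its last page. A minimal sketch, assuming a 64 KB page size chosen purely for illustration; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fixed-size pages needed to hold `bytes` (rounded up). */
size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Internal fragmentation: unused bytes left in the object's last page. */
size_t internal_waste(size_t bytes, size_t page_size)
{
    return pages_needed(bytes, page_size) * page_size - bytes;
}
```

With 64 KB pages, a 256x256 RGBA texture with 8 bits per component (262,144 bytes) needs exactly 4 pages with no waste, while a 200x200 texture in the same format (160,000 bytes) needs 3 pages and wastes 36,608 bytes.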
We expect this proposal will generate debate, and modifications will be required before reaching consensus.
We suggest that the next round of comment take place on the opengl-participants@corp.sgi.com mailing list.
Afterwards, a complete extension specification can be developed and reviewed by the ARB or a subgroup of vendors if there is sufficient interest.
Brian Paul proposed the current form of the API. Gareth Hughes independently developed many of the concepts discussed in this document during his investigation of texture priorities.
Remi Arnaud, Avi Bar-Zeev, Michael Jones, Craig Phillips, and Hansong Zhang greatly influenced the author's thinking about state-change optimization techniques.
Chris Hecker, Dave Moore, and Tim Sweeney contributed requirements from a (possibly the) key application area, game development.
| Version | Changes |
|---|---|
| Draft 2 | Made ObjectResidenceXXX, TestObjectResidenceXXX, and TestWorstCaseObjectResidenceXXX return the number of objects successfully placed as requested by the app. This makes it unnecessary to call GetError after each command, and simplifies the application's logic a little. |
| | Clarified that memory may be defragmented and cached objects may be flushed as needed to make space for a new object. This affects the description of time estimates. |
| | Emphasized that TestObjectResidenceXXX makes time estimates with respect to the current memory state, while TestWorstCaseObjectResidenceXXX makes time estimates that shouldn't be exceeded for any memory state. |
| | Noted that if TestObjectResidenceXXX succeeds, then ObjectResidenceXXX with the same arguments will also succeed, unless some attributes of the objects have changed between the two commands. |
| | Added OBJECT_TEXTURE_XXX object type to make object management simpler when multitexturing. Added an example using this to show how Tim Sweeney's proposed BindTextureResident command could be implemented. |
| Version 1 | Changed release date and version for general distribution; otherwise same as Draft 2. |