User:ElFarto/OpenGLBM

From OpenGL.org

This is my idea for a new OpenGL API. It's not real, nor is it ever likely to be.
OpenGL Bare Metal is designed to be the thinnest possible layer that still lets you fill a GPU's memory and command buffer in a platform-independent way. There is no hand-holding: it will not go out of its way to stop you from making a mistake or shooting yourself in the foot.
The PlayStation 3's libGCM, NVIDIA's bindless extension and the ATI/AMD GPU reference documents have all served as inspiration for this (but mostly libGCM).
Not everything is described on this page. The reader is assumed to have some knowledge of OpenGL and GPUs, and to be able to fill in the blanks.

Design Philosophy

The API is designed to be as thin as possible. It minimizes the amount of information the GPU's driver has to keep around, so there are few objects (mainly for sync purposes) and the user is expected to keep track of all this information. This simplifies driver development, and helps performance by attempting to avoid cache misses when binding objects: since the application is in control, it can prefetch upcoming information, whereas the driver can't.

The API is only designed for OpenGL 3.x+/DirectX 10 hardware.

The API gives direct access to the GPU's memory, albeit in an indirect way. Memory addresses are treated as regular 64-bit integers, starting at 0 and spanning BM_MEMORY_SIZE bytes. The API was originally designed to map the entire GPU memory into the application, but there are two issues with this:

  1. 32-bit applications on Windows are limited to 2GiB of usable address space by default. With GPUs now shipping with more than 1GiB of RAM, this would leave little for the application itself.
  2. The GPU doesn't know where its memory is mapped in the application, so the application would have to constantly convert between the mapped address and the 'local' address (that is, local to the GPU).

The current design sidesteps both of these problems, and still allows 64-bit applications to map the whole address space if they wish.
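Because the application owns the address space, it also has to do its own allocation. A minimal sketch, assuming the application carves up [0, BM_MEMORY_SIZE) with a simple bump allocator; `GpuArena`, `alignUp` and `arenaAlloc` are hypothetical names, not part of the proposed API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical application-side allocator for the flat GPU address space.
   Addresses are plain 64-bit integers in [0, memorySize); memorySize would
   come from bmGetInteger64(BM_MEMORY_SIZE). */
typedef struct {
    int64_t memorySize; /* total bytes reported by the driver */
    int64_t next;       /* first unallocated byte */
} GpuArena;

/* Rounds value up to a power-of-two alignment. */
static int64_t alignUp(int64_t value, int64_t alignment)
{
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Returns a GPU address, or -1 if the request does not fit. */
static int64_t arenaAlloc(GpuArena *arena, int64_t size, int64_t alignment)
{
    int64_t addr = alignUp(arena->next, alignment);
    if (size < 0 || addr > arena->memorySize - size)
        return -1;
    arena->next = addr + size;
    return addr;
}
```

A bump allocator never frees anything, so a real application would more likely use a free list, or per-frame arenas that are reset once the GPU has finished with them.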

There is still the question of where in memory things can be stored. The GL_ATI_meminfo extension hints that memory is broken up into different regions (VBO, texture, renderbuffer). I'm not sure whether this is a limitation of the hardware or just how the driver manages the memory. For now I'm assuming that any data can be placed in any part of GPU memory without problems.

Contexts are designed to be per-process, rather than per-thread as in OpenGL. However, some concept of multiple command buffers, or optional per-thread command buffers, is likely to be included.

System Properties

 int64 memorySize = bmGetInteger64(BM_MEMORY_SIZE);

This method will return the total amount of RAM you may use, in bytes.

Textures

 int maxTextureUnits = bmGetInteger(BM_TEXTURE_UNITS);

Returns the number of texture units. Valid texture units range from 0 to BM_TEXTURE_UNITS-1.

 //textures
 bmTextureParameters(unit, format, layout, remap, mipmap, dimensions, width, height, depth, pitch, address);
 bmTextureAddressParameters(unit, xwrap, ywrap, zwrap, depthCompare);
 bmTextureFilterParameters(unit, minFilter, maxFilter, maxAnisotropy);

These 3 functions are used to configure a texture unit. Textures are assumed to be contiguous in memory, with the mipmaps directly following the base layer (possibly with padding for alignment purposes).
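To make the contiguous layout concrete, here is a sketch of how an application might compute where each mipmap level starts. It assumes a LINEAR 2D texture with tightly packed rows and each level padded to a fixed alignment; the helper names and the alignment value are hypothetical, and a real driver would report its own requirements:

```c
#include <assert.h>
#include <stdint.h>

/* Rounds v up to a power-of-two alignment a. */
static uint64_t alignUp(uint64_t v, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Hypothetical helper: fills offsets[0..levels-1] with each mip level's
   byte offset from the texture's base address and returns the total size.
   Assumes a LINEAR 2D texture with tightly packed rows and each level
   padded to levelAlignment bytes. */
static uint64_t mipChainLayout(uint32_t width, uint32_t height,
                               uint32_t bytesPerTexel, uint32_t levels,
                               uint64_t levelAlignment, uint64_t *offsets)
{
    uint64_t offset = 0;
    for (uint32_t level = 0; level < levels; ++level) {
        offset = alignUp(offset, levelAlignment);
        offsets[level] = offset;
        offset += (uint64_t)width * height * bytesPerTexel;
        /* Each dimension halves, but never drops below 1. */
        width  = width  > 1 ? width  / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return offset;
}
```

Under these assumptions, a 4x4 RGBA8 texture with three levels and 64-byte level alignment has levels at offsets 0, 64 and 128 and occupies 132 bytes, with the address passed to bmTextureParameters pointing at level 0.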

Most of the parameters should be self-explanatory, except possibly for layout. There are currently two possible values for it:

  • LINEAR - The texture is laid out as RGBARGBARGBA... from left to right, top to bottom.
  • SWIZZLED - For a description of this, see here. I'm not sure it will be called swizzled, since that name clashes with the other kind of texture swizzling (component remapping).
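Since the linked description of the swizzled layout isn't reproduced here, the following assumes one common scheme, Morton (Z-order) layout, in which the bits of the x and y coordinates are interleaved so that nearby texels land close together in memory. This is an assumption about what SWIZZLED means, not a specification; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Spreads the low 16 bits of v so that bit i moves to bit 2*i. */
static uint32_t spreadBits(uint32_t v)
{
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

/* Texel index of (x, y) in a Morton (Z-order) layout: x supplies the even
   bits and y the odd bits. Assumes a square, power-of-two texture. */
static uint32_t mortonIndex(uint32_t x, uint32_t y)
{
    return spreadBits(x) | (spreadBits(y) << 1);
}

/* Texel index in the LINEAR layout, for comparison. */
static uint32_t linearIndex(uint32_t x, uint32_t y, uint32_t width)
{
    assert(x < width);
    return y * width + x;
}
```

In a 4x4 texture, texel (2, 0) is at index 2 in LINEAR layout but index 4 in Morton layout, because each 2x2 block is stored contiguously.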

Dimensions can be 1D, 1D_ARRAY, 2D, 2D_ARRAY, 3D, CUBEMAP, CUBEMAP_ARRAY or RECTANGLE.

TODO: anti-aliased textures

Surfaces

 bmColourSurface(index, type, format, width, height, antialiasing, address, pitch);
 bmDepthSurface(type, format, width, height, antialiasing, address, pitch);

These functions are used to configure colour and depth surfaces. They replace both FBOs and the platform's framebuffer.

 bmBlendMode(index, enabled, rgbEquation, srcRGB, dstRGB, alphaEquation, srcAlpha, dstAlpha);
 bmStencilOp(frontStencilFail, frontDepthFail, frontDepthPass, backStencilFail, backDepthFail, backDepthPass);
 bmStencilFunc(frontFunc, frontRef, frontMask, backFunc, backRef, backMask);
 
 bmDepthFunc(enabled, func);
 bmDepthRange(viewportIndex, near, far);
 
 bmAlphaFunc(enabled, func, ref);//does hardware still have this?

Blend, Stencil, Depth and Alpha functions. I've tried to condense all of the existing variants into a single call.

 //utility functions
 bmCalculateSize(format, layout, width, height, depth, mipmaps, antialias, *size, *pitch, *alignment); //do we need a separate function for depth?

This function allows the driver to help the application calculate the size, pitch and alignment of a specific surface or texture format.
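A sketch of the kind of arithmetic bmCalculateSize could hide, for a single level of a LINEAR 2D surface. The pitch and surface alignments are made-up placeholders, `calculateSizeSketch` is a hypothetical name, and storing all samples of a pixel side by side in a row is an assumption; real hardware often arranges samples differently, which is exactly why the driver should own this calculation:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative alignment guesses, not real hardware requirements. */
enum { PITCH_ALIGNMENT = 64, SURFACE_ALIGNMENT = 4096 };

/* Rounds v up to a power-of-two alignment a. */
static uint64_t alignUp(uint64_t v, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Hypothetical stand-in for bmCalculateSize on one LINEAR 2D level. */
static void calculateSizeSketch(uint32_t bytesPerPixel, uint32_t width,
                                uint32_t height, uint32_t samples,
                                uint64_t *size, uint64_t *pitch,
                                uint64_t *alignment)
{
    /* Assumed: each row holds every sample of every pixel, padded to the pitch alignment. */
    *pitch = alignUp((uint64_t)width * bytesPerPixel * samples, PITCH_ALIGNMENT);
    /* The whole surface is padded so the next allocation stays aligned. */
    *size = alignUp(*pitch * height, SURFACE_ALIGNMENT);
    *alignment = SURFACE_ALIGNMENT;
}
```

With 4 bytes per pixel, a 100x10 surface and one sample, the pitch comes out at 448 bytes and the size at 8192 bytes.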

 anti-aliasing levels (0 = 1 sample, 1 = 2 samples, 2 = 4 samples, 3 = 32x CSAA, etc...)

The idea here is to replace the request for a number of samples with a list of the sampling modes the GPU supports. This makes it easy to represent any multisampling algorithm, including ones that need more than one parameter, such as NVIDIA's CSAA.
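A sketch of how such a mode list might be represented and searched. `BMSampleMode`, the table contents and `pickSampleMode` are hypothetical; the point is that each entry can carry more than one parameter, such as separate colour and coverage sample counts for CSAA-like modes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one sampling mode reported by the driver.
   Two fields can describe CSAA-like modes, where the number of stored
   colour samples differs from the number of coverage samples. */
typedef struct {
    int colourSamples;
    int coverageSamples;
} BMSampleMode;

/* Illustrative list; a real driver would report its own. Index 0 means
   no anti-aliasing, matching the level numbering above. */
static const BMSampleMode kModes[] = {
    {1, 1}, {2, 2}, {4, 4}, {4, 16},
};

/* Returns the index of the first mode with at least the requested
   coverage, or -1 if none qualifies. Assumes the list is ordered by cost. */
static int pickSampleMode(int minCoverage)
{
    for (size_t i = 0; i < sizeof kModes / sizeof kModes[0]; ++i)
        if (kModes[i].coverageSamples >= minCoverage)
            return (int)i;
    return -1;
}
```

The returned index is what the application would pass as the antialiasing parameter to bmColourSurface, bmDepthSurface and bmCalculateSize.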


Vertex Arrays

 bmVertexAttrib(index, components, stride, type, normalise, divisor);
 bmVertexAttribI(index, components, stride, type, divisor);
 bmVertexAttribAddress(index, offset);
 bmElementArray(type, offset);
 bmEnableAttribs(bitmap); //0b101 enables attribute 0 and 2 only

These functions are based on NVIDIA's bindless extension.
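The bitmap passed to bmEnableAttribs has one bit per attribute index. A small hypothetical helper for building it from a list of indices, assuming no more than 32 attributes:

```c
#include <assert.h>
#include <stdint.h>

/* Builds the bitmap for bmEnableAttribs: bit i enables attribute i.
   Assumes at most 32 attributes, so the mask fits in a uint32_t. */
static uint32_t attribMask(const unsigned *indices, unsigned count)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < count; ++i) {
        assert(indices[i] < 32);
        mask |= UINT32_C(1) << indices[i];
    }
    return mask;
}
```

For indices {0, 2} it returns 0x5, the same value as the 0b101 in the comment above (binary literals are only standard from C23, so hex is the portable spelling).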


Drawing

 bmDrawElements(primtype, count, instances, baseVertex, baseInstance);
 bmDrawArrays(primtype, count, instances, baseInstance);
 //indirect functions
 //transform feedback functions
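The draw calls don't spell out how baseVertex, baseInstance and the attribute divisor interact. A sketch assuming OpenGL's rules are kept (as in glDrawElementsInstancedBaseVertexBaseInstance): per-vertex attributes read element index + baseVertex, and per-instance attributes read element instance / divisor + baseInstance. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Which element of an attribute array a vertex fetch reads, assuming
   bmDrawElements keeps OpenGL's baseVertex / baseInstance / divisor rules. */
static uint32_t fetchElement(uint32_t index, int32_t baseVertex,
                             uint32_t instanceId, uint32_t baseInstance,
                             uint32_t divisor)
{
    if (divisor == 0)                            /* per-vertex attribute */
        return (uint32_t)((int64_t)index + baseVertex);
    return instanceId / divisor + baseInstance;  /* per-instance attribute */
}

/* GPU address of that element, given the attribute's base address (from
   bmVertexAttribAddress) and its stride (from bmVertexAttrib). */
static int64_t fetchAddress(int64_t attribAddress, uint32_t stride,
                            uint32_t element)
{
    assert(attribAddress >= 0);
    return attribAddress + (int64_t)element * stride;
}
```

For example, with a stride of 16 bytes and an attribute address of 4096, index 5 with baseVertex 100 is fetched from 4096 + 105 * 16 = 5776.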

Shaders

 BMShader sh = bmCompileShader(shaderType, sourceType, size, *string);
 bmGetShaderUCode(sh, &size, &ucodePtr);
 bmSetShader(type, offset);
 bmSetShaderData(type, index, offset);
 //functions for getting and setting uniforms
 bmGetUniformBlockIndexSize(sh, name, *index, *size);
 bmGetUniformIndexOffsetSize(sh, name, *index, *offset, *size);
 //functions for shader subroutines, if needed

Shaders are one of the very few object types in the API. A shader object allows the compiled shader code, the offsets and sizes of uniforms, and the attribute indexes to be retrieved.

Shaders access an array of slabs of memory for their uniforms, which can be set via the bmSetShaderData method.
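A sketch of how an application might fill a uniform slab on the CPU before uploading it. It assumes bmGetUniformIndexOffsetSize reports a slab index, a byte offset within that slab and a size, and that the slab is then copied with bmCopyToGPU and attached with bmSetShaderData; `UniformSlab`, its 256-byte size and `writeUniform` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* CPU-side staging copy of one uniform slab (illustrative size). */
typedef struct {
    unsigned char bytes[256];
} UniformSlab;

/* Writes one uniform using the offset and size that
   bmGetUniformIndexOffsetSize would report. Returns 0 on success,
   -1 if the sizes disagree or the write would overflow the slab. */
static int writeUniform(UniformSlab *slab, size_t offset, size_t size,
                        const void *value, size_t valueSize)
{
    if (valueSize != size || size > sizeof slab->bytes ||
        offset > sizeof slab->bytes - size)
        return -1;
    memcpy(slab->bytes + offset, value, size);
    return 0;
}
```

Batching all of a shader's uniforms into one staging slab means a single copy per slab rather than one driver call per uniform.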

Memory

 bmCopyToGPU(void *src, size, int64 dst); //+async version
 bmCopyFromGPU(int64 src, size, void *dst); //+async version
 bmCopy(int64 src, size, int64 dst); //gpu-to-gpu version
 void* bmMapMemory(int64 mem, size); //+flags
 bmUnmapMemory(void* addr);
 //flush commands

The copy functions are the main way data is transferred to the GPU. It's hoped that the async versions of these functions will allow the driver to set up a DMA copy. The async versions would also return a sync object that allows the application to wait for completion.
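To illustrate that GPU addresses are plain integer offsets, here is a toy software model of the three copy functions in which "GPU memory" is an ordinary byte array. It is purely illustrative: the `toy*` names are hypothetical, they return an error code where the proposed functions return nothing, and a real driver would queue DMA transfers and hand back sync objects for the async variants:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy "GPU memory": addresses are just offsets into this array. */
#define TOY_MEMORY_SIZE 4096
static unsigned char toyGpuMemory[TOY_MEMORY_SIZE];

/* Checks that [addr, addr + size) lies inside the toy memory. */
static int inRange(int64_t addr, int64_t size)
{
    return addr >= 0 && size >= 0 && addr <= TOY_MEMORY_SIZE - size;
}

static int toyCopyToGPU(const void *src, int64_t size, int64_t dst)
{
    if (!inRange(dst, size)) return -1;
    memcpy(toyGpuMemory + dst, src, (size_t)size);
    return 0;
}

static int toyCopyFromGPU(int64_t src, int64_t size, void *dst)
{
    if (!inRange(src, size)) return -1;
    memcpy(dst, toyGpuMemory + src, (size_t)size);
    return 0;
}

/* GPU-to-GPU copy; memmove tolerates overlapping ranges. */
static int toyCopy(int64_t src, int64_t size, int64_t dst)
{
    if (!inRange(src, size) || !inRange(dst, size)) return -1;
    memmove(toyGpuMemory + dst, toyGpuMemory + src, (size_t)size);
    return 0;
}
```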

Misc

 clear colour/depth/stencil functions
 viewport functions //+indexed versions
 scissor functions
 
 provoking vertex
 
 flip page function
 alpha-to-coverage functions
 
 point-sprites?

Nothing particularly interesting here; these are mostly copied from OpenGL. The flip page function would allow the application to flip the buffers without waiting for a vsync.

Sync

 gpu wait command
 cpu wait command
 vsync wait
 flush
 
 vertex/texture cache invalidation

Mostly the same as OpenGL, but with the ability to wait for a vsync.

Timers

Functions for timing things on the GPU.

Multisampling

Functions to control multisampling: resolving, enabling and disabling, and possibly sample locations.

Transform feedback

 bmSetTransformFeedbackBuffer(index, address, length);
 bmSetTransformFeedbackVaryings(index, count, *varyings, mode);
 bmBeginTransformFeedback(primMode);
 bmEndTransformFeedback();

Functions to control transform feedback.

Tessellation

Functions to control tessellation.

Queries

 conditional rendering

Command Buffers

 BMCommandBuffer* bmCreateCommandBuffer();
 bmDeleteCommandBuffer(cmdBuf); //cmdBuf can't be NULL
 bmSetThreadCommandBuffer(cmdBuf); //if cmdBuf == NULL, the default command buffer is used
 bmExecuteCommandBuffer(cmdBuf); //places the contents of cmdBuf into the current command buffer, cmdBuf can't be NULL
 bmClearCommandBuffer(cmdBuf); //removes all the commands from the command buffer, cmdBuf can't be NULL

Command buffers allow commands to be built up before they are sent to the GPU. They can be used to pre-record frequently used command sequences and insert them into the current command buffer, which helps performance.

Any function that would normally send a command to the GPU will instead append that command to the command buffer currently bound to the thread.

The driver is free to choose how much information is stored in the command buffer: either high-level ("this function was called with these params") or low-level ("send this command to the GPU").
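A sketch of a command buffer that records at the high level, as a growable array of "function plus arguments" records. The opcode and argument encoding and all names are hypothetical, and thread binding (bmSetThreadCommandBuffer) is left out:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical high-level record: "this function was called with these
   params". A driver could instead store raw GPU packets. */
typedef struct {
    int opcode;        /* which bm* function was recorded */
    long long args[4]; /* its arguments, packed as integers */
} BMCommand;

typedef struct {
    BMCommand *cmds;
    size_t count, capacity;
} BMCommandBuffer;

/* Appends one command, growing the array geometrically. Returns 0 on success. */
static int cmdAppend(BMCommandBuffer *cb, const BMCommand *cmd)
{
    if (cb->count == cb->capacity) {
        size_t newCap = cb->capacity ? cb->capacity * 2 : 16;
        BMCommand *grown = realloc(cb->cmds, newCap * sizeof *grown);
        if (!grown) return -1;
        cb->cmds = grown;
        cb->capacity = newCap;
    }
    cb->cmds[cb->count++] = *cmd;
    return 0;
}

/* Mirrors bmExecuteCommandBuffer: copies src's commands into dst.
   dst and src must differ, since appending could reallocate src. */
static int cmdExecute(BMCommandBuffer *dst, const BMCommandBuffer *src)
{
    assert(src != NULL && dst != src);
    for (size_t i = 0; i < src->count; ++i)
        if (cmdAppend(dst, &src->cmds[i]) != 0) return -1;
    return 0;
}

/* Mirrors bmClearCommandBuffer: drops commands but keeps the storage. */
static void cmdClear(BMCommandBuffer *cb) { cb->count = 0; }
```

A renderer could record a frequently drawn object's commands once into a secondary buffer, then execute it into the current buffer each frame instead of re-recording them.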