This is an idea I’ve been batting about for a while; can’t decide if it’s nifty or horrible. See what you think.
PROBLEM
Particles are being used more and more as hardware triangle throughput increases. Lots of things (clouds, smoke, etc.) can’t feasibly be done any other way.
Particle systems are usually rendered in a standard way: as textured quads aligned with the screen.
Under current OpenGL this is horribly inefficient. For each particle you’re sending 4 vertices (3 floats each) plus 4 texcoords (2 floats each), or 20 floats per particle. On top of this, the application generally has to transform each particle’s position vector into eye space itself so that the quads come out screen-aligned. With the CPU rapidly becoming the bottleneck in 3D, this is a Bad Thing ™.
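For comparison, here is roughly what the application has to do per particle today. This is a minimal sketch, assuming a column-major 4x4 modelview matrix (OpenGL’s storage order) with an affine bottom row, and assuming the application then draws the eye-space quads with an identity modelview, for example via glInterleavedArrays with the GL_T2F_V3F layout. The names eye_transform and emit_particle_quad are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply a column-major 4x4 matrix by (x, y, z, 1) and keep xyz.
   Assumes an affine modelview, so w stays 1. */
static void eye_transform(const float m[16], const float p[3], float out[3])
{
    for (int r = 0; r < 3; ++r)
        out[r] = m[r] * p[0] + m[4 + r] * p[1] + m[8 + r] * p[2] + m[12 + r];
}

/* Build one screen-aligned quad in GL_T2F_V3F order (s, t, x, y, z per corner).
   Writes 4 * 5 = 20 floats and returns that count. */
static size_t emit_particle_quad(const float modelview[16], const float center[3],
                                 float size, float out[20])
{
    static const float corner[4][2] = {
        {-0.5f, -0.5f}, {0.5f, -0.5f}, {0.5f, 0.5f}, {-0.5f, 0.5f}
    };
    float eye[3];

    assert(size > 0.0f);
    eye_transform(modelview, center, eye);      /* the per-particle CPU cost */
    for (int i = 0; i < 4; ++i) {
        float *v = out + 5 * i;
        v[0] = corner[i][0] + 0.5f;             /* s: 0 or 1 */
        v[1] = corner[i][1] + 0.5f;             /* t: 0 or 1 */
        v[2] = eye[0] + corner[i][0] * size;    /* offset in eye space keeps it screen-aligned */
        v[3] = eye[1] + corner[i][1] * size;
        v[4] = eye[2];
    }
    return 20;
}
```

Every particle costs one full matrix-vector product on the CPU, plus 20 floats written and pushed over the bus.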
SOLUTION
Introduce a new primitive type, GL_PARTICLES, alongside the existing ones. Each particle is specified by only 3 floats: its untransformed position vector.
OpenGL takes this vector and transforms it to eye space (with the possibility of GPU assist). It then generates the quad verts by just adding +/- 0.5 to the x and y coords of the transformed vert. (Alternatively, it could combine these translations with the eyespace transform, generating the four matrices at the start of the glBegin(GL_PARTICLES) block.) The size of each particle wouldn’t have to be 1x1; it could be set as global state (NOT per particle).
If texturing is enabled, each corner of the quad is automatically assigned a standard texcoord at the corresponding corner of the texture.
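A rough software model of the proposed expansion, using the parenthetical variant where the corner offsets are folded into four matrices once per glBegin(GL_PARTICLES), plus the automatic corner texcoords. Everything here is hypothetical: CornerXform, build_corner_xforms and expand_particle are illustrative names, not part of any GL API, and the modelview is assumed column-major with an affine bottom row. Each folded matrix is T(offset) * MV, so the offset is applied after the eye-space transform, which is what keeps the quad screen-aligned.

```c
#include <assert.h>
#include <string.h>

typedef struct { float m[16]; float s, t; } CornerXform;

static const float kCorner[4][2] = {
    {-0.5f, -0.5f}, {0.5f, -0.5f}, {0.5f, 0.5f}, {-0.5f, 0.5f}
};

/* Once per glBegin(GL_PARTICLES): M_i = T(dx * size, dy * size, 0) * MV.
   Left-multiplying by a translation adds dx * row3 to row0 and dy * row3 to row1. */
static void build_corner_xforms(const float mv[16], float size, CornerXform out[4])
{
    assert(size > 0.0f);
    for (int i = 0; i < 4; ++i) {
        float dx = kCorner[i][0] * size, dy = kCorner[i][1] * size;
        memcpy(out[i].m, mv, sizeof out[i].m);
        for (int c = 0; c < 4; ++c) {
            out[i].m[4 * c + 0] += dx * mv[4 * c + 3];
            out[i].m[4 * c + 1] += dy * mv[4 * c + 3];
        }
        out[i].s = kCorner[i][0] + 0.5f;    /* standard texcoord for this corner */
        out[i].t = kCorner[i][1] + 0.5f;
    }
}

/* Per particle: 3 floats in; each corner is a single matrix-vector product.
   Output per corner is (s, t, x, y, z); w is assumed to stay 1. */
static void expand_particle(const CornerXform xf[4], const float p[3], float out[4][5])
{
    for (int i = 0; i < 4; ++i) {
        const float *m = xf[i].m;
        out[i][0] = xf[i].s;
        out[i][1] = xf[i].t;
        for (int r = 0; r < 3; ++r)
            out[i][2 + r] = m[r] * p[0] + m[4 + r] * p[1] + m[8 + r] * p[2] + m[12 + r];
    }
}
```

With an identity modelview and size 2, corner 0 of a particle at (1, 2, 3) lands at (0, 1, 3) with texcoord (0, 0).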
Overall you could save 17 floats of bus bandwidth per particle (20 down to 3), a whole bunch of CPU calculations, and a fair bit of application complexity.
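The bandwidth arithmetic, spelled out as a tiny helper (a sketch; bytes_saved_per_frame is a made-up name). Sending 3 floats instead of 20 saves 17 floats per particle; on a platform with 4-byte floats, 10,000 particles works out to 680,000 bytes per frame.

```c
#include <assert.h>
#include <stddef.h>

enum {
    QUAD_FLOATS     = 4 * 3 + 4 * 2,  /* 4 positions (xyz) + 4 texcoords (st) = 20 */
    PARTICLE_FLOATS = 3               /* one untransformed xyz position */
};

/* Bytes per frame saved by sending PARTICLE_FLOATS instead of QUAD_FLOATS. */
static size_t bytes_saved_per_frame(size_t particles)
{
    return particles * (size_t)(QUAD_FLOATS - PARTICLE_FLOATS) * sizeof(float);
}
```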
CEL-ANIMATION TEXTURING
If we ever get 3D texturing in hardware, it would be fun to have a “glTexCel(GLfloat)” function which selects a “layer” of a 3D texture to which subsequent glTexCoord2 calls will refer. This would be nice for the above particle scheme because you could select a different image for each particle at a cost of only one extra float of bandwidth. As well as allowing systems of heterogeneous particles, you could do animated particles and get frame interpolation for free from the texture filter.
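One way the hypothetical glTexCel could map a cel index onto the r coordinate of a 3D texture: aim at the centre of layer k, i.e. r = (k + 0.5) / depth. With GL_LINEAR filtering, a fractional cel then lands between two layer centres and the filter blends them, which is where the free frame interpolation would come from. The second function models that blend along r only; both names are illustrative, and what happens past the last layer is left to the wrap mode.

```c
#include <assert.h>

/* Hypothetical glTexCel mapping: r coordinate of the centre of layer `cel`. */
static double tex_cel_to_r(double cel, int depth)
{
    assert(depth > 0);
    return (cel + 0.5) / depth;
}

/* Model of GL_LINEAR filtering along r: the two nearest layers and the
   blend weight of the upper one. Wrap mode handling is not modelled. */
static void linear_layers(double r, int depth, int *lower, int *upper, double *weight)
{
    assert(depth > 0);
    double u = r * depth - 0.5;   /* position in layer units; 0 = centre of layer 0 */
    int i = (int)u;               /* truncates toward zero... */
    if ((double)i > u)            /* ...so step down for negative u to get floor */
        --i;
    *lower = i;
    *upper = i + 1;               /* may be -1 or depth: the wrap mode decides */
    *weight = u - i;              /* weight of the upper layer */
}
```

For an 8-layer texture, cel 2.25 gives r = 0.34375, which the filter treats as 75% of layer 2 and 25% of layer 3. For a looping animation, GL_REPEAT on r would blend the last layer back into the first; GL_CLAMP_TO_EDGE would hold on the last one.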
Note that with a couple of stock textures (square and round) you could implement most of the existing GL_POINTS primitive type in terms of this one, and avoid the annoying problem of large points being culled as soon as their centres go offscreen. (Textured points would be a bit of a pain though.)
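A sketch of the round stock texture (make_round_alpha is a made-up name): an n x n alpha map that is opaque inside the inscribed disc and transparent outside, tested at texel centres. Drawn with alpha testing or blending on a GL_PARTICLES quad it would stand in for a round point; the square variant is simply all 255. Because the particle is a real quad, it would be clipped like any polygon instead of vanishing when its centre leaves the view volume.

```c
#include <assert.h>

/* Fill an n x n alpha texture: 255 inside the inscribed disc, 0 outside. */
static void make_round_alpha(unsigned char *texels, int n)
{
    assert(texels != 0 && n > 0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            double x = (i + 0.5) / n - 0.5;     /* texel centre relative to texture centre */
            double y = (j + 0.5) / n - 0.5;
            texels[j * n + i] = (x * x + y * y <= 0.25) ? 255 : 0;
        }
}
```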
Also, note that the texture cel concept could give you very fast texfont rendering, with only one glVertex2 call and one glTexCel call per character.
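A sketch of the per-character data such a texfont renderer would need, assuming a monospaced font stored one glyph per 3D-texture layer and indexed by character code. GlyphParticle and layout_text are illustrative names; x and y would go to glVertex2 (as the glyph’s centre, since particles are centred quads) and cel would go to the proposed glTexCel.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, cel; } GlyphParticle;

/* One particle per visible character. Spaces advance the pen but emit
   nothing. Returns the number of particles written (at most max_out). */
static size_t layout_text(const char *s, float x0, float y0, float advance,
                          GlyphParticle *out, size_t max_out)
{
    size_t n = 0;
    float x = x0;

    assert(s != NULL && advance > 0.0f);
    for (; *s != '\0' && n < max_out; ++s, x += advance) {
        if (*s == ' ')
            continue;                       /* still advances x via the loop step */
        out[n].x = x;
        out[n].y = y0;
        out[n].cel = (float)(unsigned char)*s;  /* layer index = character code */
        ++n;
    }
    return n;
}
```

Drawing would then be a loop inside glBegin(GL_PARTICLES) issuing glTexCel(g.cel) and glVertex2f(g.x, g.y) per glyph, where GL_PARTICLES and glTexCel are the hypothetical pieces from this post.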
Thoughts? Worth doing, or do we need another primitive type like a hole in the head?