first let me say that i realize there are a lot of topics covering this domain already available in these forums… but i think in this case it would be better to start fresh.
i do have a specific hardware and VBO api issue, but i would also like very much to discuss general strategies for this unique situation.
before i try to offer some brief context, here is an illustrative aid to refer to:
http://arcadia.angeltowns.com/share/genesis-mosaics-lores.jpg
a new screen hot off the presses. next i will try to describe what is going on in this image.
basically, this is a ROAM (real-time optimally adapting mesh) system. however i believe it is quite unique in contrast with its predecessors, and i believe it is, in spirit, perhaps as efficient as is possible with contemporary hardware…
there are essentially two subdivision tiers to this system. the first tier is dynamic, whereas the second is relatively static. splitting takes place in both tiers, but merging only in the first.
the first tier is an equilateral sierpinski tessellation… think of taking a triangle and fitting an upside down copy of itself in its center, with vertices at the midpoints of the parent triangle’s edges. a discontinuity of plus or minus one level is allowed along the edges of the first tier, as these will be mended in the second tier. this ROAM system is designed with arbitrary (non-planar) manifolds in mind, so sierpinski is an ideal fit for the first tier.
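a minimal sketch of that split in C++, for illustration only. the names (Vec3, Tri, sierpinskiSplit) are hypothetical, and the midpoints here are plain linear midpoints; on a non-planar manifold each midpoint would presumably be displaced onto the surface afterwards.

```cpp
#include <array>
#include <cassert>

struct Vec3 { double x, y, z; };
struct Tri  { std::array<Vec3, 3> v; };

// Plain linear midpoint of an edge (assumption: no surface displacement).
static Vec3 midpoint(const Vec3& a, const Vec3& b) {
    return { (a.x + b.x) * 0.5, (a.y + b.y) * 0.5, (a.z + b.z) * 0.5 };
}

// Split one triangle into three corner children plus the inverted
// centre child whose vertices are the parent's edge midpoints.
std::array<Tri, 4> sierpinskiSplit(const Tri& t) {
    const Vec3 m01 = midpoint(t.v[0], t.v[1]);
    const Vec3 m12 = midpoint(t.v[1], t.v[2]);
    const Vec3 m20 = midpoint(t.v[2], t.v[0]);
    return {{
        Tri{{ t.v[0], m01,    m20    }},  // corner at vertex 0
        Tri{{ m01,    t.v[1], m12    }},  // corner at vertex 1
        Tri{{ m20,    m12,    t.v[2] }},  // corner at vertex 2
        Tri{{ m01,    m12,    m20    }},  // upside down centre child
    }};
}
```

all four children keep the winding order of the parent.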
managing the first tier is a complex memory problem, as with any such system, comparable perhaps only to a complex real-time dynamic compiler (interpreter), in my opinion. the second tier however, where most of the resolution exists, is designed to offload as much of this memory management as possible.
each triangle, or node, in the first tier forms a discrete component of the second tier. i will refer to this as a ‘mesh’ or maybe a ‘mosaic’ at some point. all of these meshes share the exact same connectivity data and require the exact same amount of system and video memory. just about everything about them can be computed only once on boot. all that is left for each ‘instance’ is essentially per vertex data (position,texcoords,normals,colour,etc), per vertex ‘variance’ weights, and a single flag byte for each face in the mesh. multiple cameras viewing the same mesh only require that the face flags be duplicated so that unique views can be produced for each camera.
for many reasons it turns out that 8x8 is the optimal resolution for the second tier meshes, which means that each edge of the triangular mesh can be subdivided into 8 edges. in the end the only real task arises from mending the borders between second tier meshes, but a lot of data can be precomputed to aid in this mapping process, which is essentially limited to only 6 cases (xx yy zz xy xz yz) as i recall. there are also many compromises which could speed up the border mending process while sacrificing the accuracy of the tessellation slightly.
finally, the connectivity of the base mesh is more complex than normal connectivity, as essentially it is a multi-resolution connectivity containing all of the information of any view/variance based tessellation of the mesh. though the fully tessellated 8x8 mesh contains 64 faces, the total multi-resolution mesh contains 127 (64+32+16+8+4+2+1) faces, as does each instance, which requires 127 bytes per frustum to store its state.
with this in mind, it is possible to compile a 128bit signature for each possible permutation… it is this signature which i more accurately will refer to as a ‘mosaic’. the signature is a series of bits which are set off or on depending upon whether or not the corresponding face is a leaf (visible) or not.
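a rough sketch of how such a signature could be packed from the 127 per-face flag bytes. the face numbering, the Signature struct, and the choice of 0x01 as the "leaf" bit are all assumptions made for the example, not the actual layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t  kFaceCount = 127;   // 64+32+16+8+4+2+1
constexpr std::uint8_t kLeafBit   = 0x01;  // assumed flag bit for "leaf"

// 128-bit key held as two 64-bit halves; the top bit of hi stays unused.
struct Signature {
    std::uint64_t lo = 0;
    std::uint64_t hi = 0;
    bool operator==(const Signature& o) const { return lo == o.lo && hi == o.hi; }
};

// Set bit i of the signature when face i is a leaf for this camera.
Signature makeSignature(const std::array<std::uint8_t, kFaceCount>& faceFlags) {
    Signature s;
    for (std::size_t i = 0; i < kFaceCount; ++i) {
        if (faceFlags[i] & kLeafBit) {
            if (i < 64) s.lo |= std::uint64_t{1} << i;
            else        s.hi |= std::uint64_t{1} << (i - 64);
        }
    }
    return s;
}
```

since each camera keeps its own copy of the face flags, each camera would also get its own signature for the same mesh.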
i’ve built an empirical database of all possible mosaics. ( through a process which basically involves me setting up a gerbil wheel simulation and strapping a rubber band around my joystick thruster… which ran for about 3 days that way with various ever finer splitting constants )
the final result is about 200,000 mosaics, technically around 193,300… that number is still growing now and then as extremely rare mosaics are found, but i don’t expect it to grow too much further.
finally, with that in mind, offline i have used the nvidia triangle stripper utility to compute strips for each mosaic… a database which on disk requires about 20MB, about 4 of which are 128bit keys and 32bit offsets.
finally, the basic task is to solve the signature of each second tier mesh given its per-vertex weights and the camera’s position, use that signature as a key to quickly look up the appropriate mosaic, and assign it to that mesh for the later rendering routine.
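one way the key lookup could be sketched: a table of (128-bit key, 32-bit offset) entries sorted by key and searched with a binary search. this is only an illustration; MosaicEntry, sortTable and findMosaic are hypothetical names, and a hash table would work just as well. the static_assert just restates the arithmetic: roughly 200,000 entries at 16 + 4 bytes each is about 4 MB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct MosaicEntry {
    std::uint64_t keyHi;   // upper half of the 128-bit signature
    std::uint64_t keyLo;   // lower half of the 128-bit signature
    std::uint32_t offset;  // where this mosaic's strip starts in the strip data
};

// On disk each entry is a 16 byte key plus a 4 byte offset.
constexpr std::uint64_t kDiskBytesPerEntry = 16 + 4;
static_assert(200000 * kDiskBytesPerEntry == 4000000, "about 4 MB of keys and offsets");

static bool keyLess(const MosaicEntry& a, const MosaicEntry& b) {
    return a.keyHi != b.keyHi ? a.keyHi < b.keyHi : a.keyLo < b.keyLo;
}

// Sort once after loading so lookups can use binary search.
void sortTable(std::vector<MosaicEntry>& table) {
    std::sort(table.begin(), table.end(), keyLess);
}

// Returns the matching entry, or nullptr for a mosaic not yet in the database.
const MosaicEntry* findMosaic(const std::vector<MosaicEntry>& sorted,
                              std::uint64_t hi, std::uint64_t lo) {
    const MosaicEntry probe{hi, lo, 0};
    auto it = std::lower_bound(sorted.begin(), sorted.end(), probe, keyLess);
    if (it != sorted.end() && it->keyHi == hi && it->keyLo == lo) return &*it;
    return nullptr;
}
```

a nullptr result corresponds to the "new mosaic discovered" case, where the strip has to be built and added to the database.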
as far as VBO is concerned, each mesh upon creation uploads its per-vertex data to video memory set to DYNAMIC_DRAW. when a mesh dies, it is recycled and its video memory is handed over to an incoming mesh, which then simply writes its per-vertex data over the previous owner’s data… that is to say that the handle is not deleted.
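the recycling amounts to a simple free list of buffer handles. a sketch without any real GL calls follows; BufferHandle stands in for a GLuint buffer name and VertexBufferPool is a hypothetical class. in real code the "new handle" branch would be where glGenBuffers and glBufferData get called, and a recycled handle would be overwritten with something like glBufferSubData.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using BufferHandle = std::uint32_t;  // stand-in for a GLuint buffer name

class VertexBufferPool {
public:
    // Hand out a recycled handle if one is waiting, otherwise mint a new one.
    BufferHandle acquire() {
        if (!free_.empty()) {
            BufferHandle h = free_.back();
            free_.pop_back();
            return h;    // the incoming mesh overwrites the old contents
        }
        return next_++;  // real code would create and size the buffer here
    }

    // A dying mesh gives its handle back; the buffer itself is never deleted.
    void release(BufferHandle h) {
        assert(h != 0 && h < next_);
        free_.push_back(h);
    }

    // Number of distinct buffers ever created.
    std::size_t created() const { return next_ - 1; }

private:
    std::vector<BufferHandle> free_;
    BufferHandle next_ = 1;  // 0 is reserved, like GL's "no buffer" name
};
```

because every mesh needs the exact same amount of video memory, any released buffer fits any incoming mesh.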
as for the mosaics, as soon as a new mosaic is discovered, its index data is uploaded to video memory in STATIC_DRAW mode, as it will never be overwritten. the mosaic handles are just passed around to meshes as their signature changes. that is to say as well that it is possible for multiple meshes to share the same mosaic handles.
in all there are 45 vertices in each mesh. unlike faces, vertices are shared at every level of subdivision. as well, the mean length of the mosaics (tristrip indices) was ~130 before primitive_restart builds, now slightly less. armed with these facts it is possible for the mosaics to be byte encoded, because values greater than 0xFF are never required. 0xFF is the primitive_restart index. this saves considerable memory, but if there is a performance hit in using byte indices i would like to know.
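to illustrate the byte encoding: 8 segments per edge gives (8+1)(8+2)/2 = 45 vertices, so every index fits in a byte with 0xFF left free as the restart marker. the sketch below joins several strips into one byte index list; packStrips is a hypothetical helper, and the list would then be drawn with GL_UNSIGNED_BYTE as the index type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kRestartIndex = 0xFF;
constexpr int kEdgeSegments = 8;
constexpr int kVertexCount  = (kEdgeSegments + 1) * (kEdgeSegments + 2) / 2;

static_assert(kVertexCount == 45, "8 segments per edge gives 45 vertices");
static_assert(kVertexCount <= kRestartIndex, "indices must not collide with the restart index");

// Concatenate triangle strips into one byte index list, separated by 0xFF.
std::vector<std::uint8_t> packStrips(const std::vector<std::vector<int>>& strips) {
    std::vector<std::uint8_t> out;
    for (std::size_t s = 0; s < strips.size(); ++s) {
        if (s != 0) out.push_back(kRestartIndex);  // restart between strips
        for (int idx : strips[s]) {
            assert(idx >= 0 && idx < kVertexCount);
            out.push_back(static_cast<std::uint8_t>(idx));
        }
    }
    return out;
}
```

at ~130 indices per mosaic this is about 130 bytes each, versus about 260 with 16-bit indices.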
mostly i would simply like to know how best to satisfy hardware constraints. in the future it would probably be useful to aggressively track VBO references and delete them as is appropriate, so knowing how the driver behaves here would be useful to me. i wonder if i should set the per-vertex uploads to STATIC_DRAW rather than DYNAMIC_DRAW to ensure they stay in video memory. the life span of a first tier node is pretty long in computational terms, but depends mostly on circumstance, though the minimum life span is also regulated.
EDIT: -to: new readers- BUG SOLVED: STILL OPEN TO DISCUSS OPTIMIZATION HOWEVER
my major concern though, without which i probably would not at this time be sharing this information here, is a severe performance anomaly. the ‘mosaic’ component of this system was a relatively new idea which has caused me to revisit the system and devote a fair amount of attention to it. since implementing it the performance has been as good as i had hoped, but occasionally the system appears to fail in hardware. i’m fairly certain the source of the matter exists in the graphics hardware.
essentially, i work with the card synched at 60 frames per second. if i push the resolution up so that normal performance is right around 100%, i am best able to gauge slowdown. presently, i occasionally experience a sharp 50% drop in performance. i believe this drop occurs at all resolutions, but i have yet to turn off the card’s vsync (i believe it is called) to see. but in any case this behavior is pretty much unexplainable. it is not fill limited, nor geometry limited; it occurs when neither fill nor geometry can be an issue. also it is the case that this effect occurs when the frustum is aimed into particular regions, the boundaries of which could be said to be no more than a single pixel (skirting perspective division). that is to say, i can move the camera only the very slightest, and suddenly i will see the 50% hit, move it back, and frames go back to normal.
i can do this with virtually no cpu restrictions whatsoever. once the view is set, i can drop the camera, which basically reduces the simulation to a pure rendering loop. which leads me to believe this is NOT* happening on the cpu. (at least not in my code) also it happens simply by changing the gpu modelview matrix, whereas my numbers don’t change a bit. so it seems like some kind of hardware culling is slowing things down.
it doesn’t come from calling DrawElements more or fewer times
i’m tempted to think it is some kind of driver bug… i might see if i can find alternative drivers. but it seems more like something that would happen purely on hardware. or maybe the driver is hitting a bug while trying to cull an entire vertex buffer for me or something… which would be much more than i would ask of a driver… especially as i already do this for myself.
maybe something at that point is causing the driver to offload my VBOs to agp memory. ( assuming they aren’t there in the first place ) … but even still, i don’t see how it could make that decision when the only thing changing for hardware is the modelview matrix. i can’t stress that enough… the only factor which causes the hit is the modelview matrix. so it must be some form of hardware/driver culling, or culling based memory management, causing this, best i can figure.
i don’t know what else to say, except that this hit is AWESOME out of the blue, and i can’t live with it, and it is totally inexplicable.
for the record, i’m using nvidia drivers from nvidia.com, which i downloaded for glsl support not too long ago.
sincerely,
michael
PS: i’m not here to discuss ROAM vs. static geometry… this work isn’t about seeing raw performance gains, it’s about managing massive streaming geometry with infinite scalability… yes one day all top class simulations will utilize ROAM systems, because that is the only way it is realistically possible to seamlessly study the volumetric texture of say… ‘tree bark’, from 10 centimeters to 10 kilometers. or more practically perhaps, drive a simulation of planet earth from an extremely high resolution elevation map. (which i have done with this system with the ETOPO2 2-minute earth topographic database)
*edit: added missing negative (NOT)