Part of the Khronos Group
OpenGL.org

The Industry's Foundation for High Performance Graphics

from games to virtual reality, mobile phones to supercomputers


Thread: unique ROAM VBO issues and a clincher

  1. #1
    Member Regular Contributor
    Join Date: Jan 2005
    Location: USA
    Posts: 433

    unique ROAM VBO issues and a clincher

    First, let me say that I realize there are already a lot of topics covering this domain in these forums... but I think in this case it would be better to start fresh.

    I do have a specific hardware and VBO API issue, but I would also very much like to discuss general strategies for this unique situation.

    Before I try to offer some brief context, here is an illustrative aid to refer to:

    http://arcadia.angeltowns.com/share/...aics-lores.jpg

    A new screenshot, hot off the presses. Next, I will try to describe what is going on in this image.

    Basically, this is a ROAM (real-time optimally adapting mesh) system. However, I believe it is quite unique in contrast with its predecessors, and I believe that, in spirit, it is perhaps as optimally efficient as is possible with contemporary hardware...

    There are essentially two subdivision tiers in this system. The first tier is dynamic, whereas the second is relatively static. Splitting takes place in both tiers, but merging only in the first.

    The first tier is an equilateral Sierpinski tessellation... think of taking a triangle and fitting an upside-down copy of itself in its center, with vertices at the midpoints of the parent triangle's edges. A discontinuity of plus or minus one is allowed along the edges of the first tier, as these will be mended in the second tier. This ROAM system is designed with arbitrary (non-planar) manifolds in mind, so Sierpinski is an ideal fit for the first tier.
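    A minimal sketch of one first-tier split as described: an upside-down copy fitted inside the parent, with its vertices at the parent's edge midpoints. It assumes flat 3D positions; `Vec3`, `Triangle` and `sierpinskiSplit` are illustrative names, not the poster's code, and on a curved manifold the midpoints would presumably also be projected back onto the surface.

```cpp
#include <array>

struct Vec3 {
    double x, y, z;
};
using Triangle = std::array<Vec3, 3>;

Vec3 midpoint(const Vec3& a, const Vec3& b) {
    return {(a.x + b.x) / 2, (a.y + b.y) / 2, (a.z + b.z) / 2};
}

// One first-tier split: three corner children plus the upside-down centre child
// whose vertices are the parent's edge midpoints. Winding order is preserved.
std::array<Triangle, 4> sierpinskiSplit(const Triangle& t) {
    const Vec3 m01 = midpoint(t[0], t[1]);
    const Vec3 m12 = midpoint(t[1], t[2]);
    const Vec3 m20 = midpoint(t[2], t[0]);
    std::array<Triangle, 4> children;
    children[0] = Triangle{t[0], m01, m20};
    children[1] = Triangle{m01, t[1], m12};
    children[2] = Triangle{m20, m12, t[2]};
    children[3] = Triangle{m12, m20, m01};  // the inverted centre copy
    return children;
}
```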

    Managing the first tier is, in my opinion, a complex memory problem, as with any such system, comparable perhaps only to a complex real-time dynamic compiler (interpreter). The second tier, however, where most of the resolution exists, is designed to offload as much of this memory management as possible.

    Each triangle, or node, in the first tier forms a discrete component of the second tier. I will refer to this as a 'mesh', or at some point maybe a 'mosaic'. All of these meshes share exactly the same connectivity data and require exactly the same amount of system and video memory. Just about everything about them can be computed only once, at boot. All that is left for each 'instance' is essentially per-vertex data (position, texcoords, normals, colour, etc.), per-vertex 'variance' weights, and a single flag byte for each face in the mesh. Multiple cameras viewing the same mesh only require that the face flags be duplicated, so that unique views can be produced for each camera.

    For many reasons, it turns out that 8x8 is the optimal resolution for the second-tier meshes, which means that each edge of the triangular mesh can be subdivided into 8 edges. In the end, the only real task arises from mending the borders between second-tier meshes, but a lot of data can be precomputed to aid in this mapping process, which, as I recall, is essentially limited to 6 cases (xx, yy, zz, xy, xz, yz). There are also many compromises which could speed up the border-mending process while slightly sacrificing the accuracy of the tessellation.
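    The counts used throughout the thread follow directly from the 8x8 layout: an edge split into n = 8 segments gives 45 vertices (the figure quoted later in the post) and 64 finest faces. The helper names and the row-major vertex numbering below are assumptions for illustration.

```cpp
#include <cstddef>

// A triangular patch whose edges are each split into n segments has rows 0..n,
// with row r holding r + 1 vertices: (n + 1)(n + 2) / 2 vertices in total.
constexpr std::size_t patchVertexCount(std::size_t n) { return (n + 1) * (n + 2) / 2; }

// Uniform refinement of such a patch yields n * n triangles.
constexpr std::size_t patchFaceCount(std::size_t n) { return n * n; }

// Hypothetical row-major numbering of vertex (row, col), with 0 <= col <= row <= n.
constexpr std::size_t patchVertexIndex(std::size_t row, std::size_t col) {
    return row * (row + 1) / 2 + col;
}

static_assert(patchVertexCount(8) == 45, "an 8x8 patch has 45 vertices");
static_assert(patchFaceCount(8) == 64, "an 8x8 patch has 64 finest faces");
```

    Because every instance shares the same numbering, precomputed index data such as the mosaic strips can refer to vertices 0..44 without any per-instance remapping.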

    Finally, the connectivity of the base mesh is more complex than normal connectivity, as it is essentially a multi-resolution connectivity containing all of the information of any view/variance-based tessellation of the mesh. Though the fully tessellated 8x8 mesh contains 64 faces, the total multi-resolution mesh contains 127 (64+32+16+8+4+2+1) faces, as does each instance, which requires 127 bytes per frustum to store its state.
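    One possible way to store the 127-face hierarchy and its per-camera state, assuming an implicit binary-heap numbering of the faces. The post gives the per-level counts (64+32+...+1) but not the actual numbering, so the layout and all names here are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLevels = 7;                                     // 1, 2, 4, ..., 64 faces
constexpr std::size_t kTotalFaces = (std::size_t{1} << kLevels) - 1;   // 127
constexpr std::size_t kFinestFaces = std::size_t{1} << (kLevels - 1);  // 64

// Hypothetical heap numbering: face 0 is the root, face i has children
// 2i + 1 and 2i + 2, and faces 63..126 are the 64 finest faces.
constexpr std::size_t firstChild(std::size_t face) { return 2 * face + 1; }
constexpr std::size_t parentOf(std::size_t face) { return (face - 1) / 2; }
constexpr bool isFinest(std::size_t face) { return face >= kTotalFaces - kFinestFaces; }

// One flag byte per face: 127 bytes of view state per mesh per camera.
using FaceFlags = std::array<std::uint8_t, kTotalFaces>;

// Per-instance view state; each extra camera only adds another FaceFlags array.
struct MeshViewState {
    std::vector<FaceFlags> perCamera;
};

static_assert(kTotalFaces == 127, "64 + 32 + 16 + 8 + 4 + 2 + 1");
```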

    With this in mind, it is possible to compile a 128-bit signature for each possible permutation... It is this signature which I will more accurately refer to as a 'mosaic'. The signature is a series of bits, each set on or off depending on whether or not the corresponding face is a leaf (visible).
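    A sketch of packing the per-face leaf flags into the 128-bit signature: 127 faces need 127 bits, so two 64-bit words suffice with one bit to spare. The `MosaicKey` layout and the assumption that a nonzero flag byte means "leaf" are illustrative, not taken from the poster's code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFaces = 127;

// 127 faces fit in a 128-bit signature: bit i is set when face i is a leaf (visible).
struct MosaicKey {
    std::uint64_t lo = 0;  // faces 0..63
    std::uint64_t hi = 0;  // faces 64..126 (the top bit stays unused)
    bool operator==(const MosaicKey& o) const { return lo == o.lo && hi == o.hi; }
};

// Assumes a face's flag byte is nonzero exactly when that face is a leaf.
MosaicKey makeKey(const std::array<std::uint8_t, kFaces>& leafFlags) {
    MosaicKey key;
    for (std::size_t i = 0; i < kFaces; ++i) {
        if (leafFlags[i] == 0) continue;
        if (i < 64) {
            key.lo |= std::uint64_t{1} << i;
        } else {
            key.hi |= std::uint64_t{1} << (i - 64);
        }
    }
    return key;
}
```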

    I've built an empirical database of all possible mosaics (through a process which basically involves me setting up a gerbil-wheel simulation and strapping a rubber band around my joystick thruster... which ran that way for about 3 days with various, ever finer, splitting constants).

    The final result is about 200,000 mosaics, technically around 193,300... That number still grows now and then as extremely rare mosaics are found, but I don't expect it to grow much further.

    With that in mind, I used the NVIDIA triangle stripper utility offline to compute strips for each mosaic... a database which requires about 20MB on disk, about 4MB of which are 128-bit keys and 32-bit offsets.

    Finally, the basic task is to solve for the signature of each second-tier mesh given its per-vertex weights and the camera's position, use that signature as a key to quickly look up the appropriate mosaic, and assign the mosaic to that mesh for the later rendering routine.
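    A minimal sketch of the lookup step, assuming the signature is held as two 64-bit words and the database is loaded into a hash map from signature to the 32-bit offset of that mosaic's strip indices (`MosaicIndex` and `MosaicKeyHash` are hypothetical names). As a sanity check on the sizes quoted above: each entry is a 16-byte key plus a 4-byte offset, so ~193,300 entries come to about 193,300 × 20 ≈ 3.9 MB, in line with the ~4MB figure.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_map>

struct MosaicKey {
    std::uint64_t lo = 0, hi = 0;  // the 128-bit leaf signature
    bool operator==(const MosaicKey& o) const { return lo == o.lo && hi == o.hi; }
};

// Folds the two 64-bit halves into one hash value (the multiplier is an arbitrary odd constant).
struct MosaicKeyHash {
    std::size_t operator()(const MosaicKey& k) const {
        return std::hash<std::uint64_t>{}(k.lo ^ (k.hi * 0x9E3779B97F4A7C15ull));
    }
};

// Hypothetical in-memory index over the precomputed strip database:
// signature -> 32-bit offset of that mosaic's strip indices.
class MosaicIndex {
public:
    void add(const MosaicKey& key, std::uint32_t stripOffset) { offsets_[key] = stripOffset; }

    // Returns nullopt for a signature that has never been seen (a new, rare mosaic).
    std::optional<std::uint32_t> find(const MosaicKey& key) const {
        const auto it = offsets_.find(key);
        if (it == offsets_.end()) return std::nullopt;
        return it->second;
    }

    std::size_t size() const { return offsets_.size(); }

private:
    std::unordered_map<MosaicKey, std::uint32_t, MosaicKeyHash> offsets_;
};
```

    A signature that misses in `find` corresponds to a newly discovered mosaic; following the post, its strip would presumably be built once, uploaded as STATIC_DRAW index data, and registered with `add`.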

    As far as VBOs are concerned: each mesh, upon creation, uploads its per-vertex data to video memory with the DYNAMIC_DRAW usage hint. When a mesh dies, it is recycled and its video memory is handed over to an incoming mesh, which then simply writes its per-vertex data over the previous owner's data... that is to say, the handle is not deleted.
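    The recycling scheme above amounts to a free list of buffer names. The sketch below models only the bookkeeping, with plain integers standing in for GL buffer objects (`VertexBufferPool` and `BufferName` are hypothetical). In real GL code the recycled buffer would presumably be bound and refilled with `glBufferSubData`, which works without reallocating because every mesh needs the same amount of video memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a GL buffer name (a real renderer would get these from glGenBuffers).
using BufferName = std::uint32_t;

// Recycles vertex-buffer names instead of deleting them: a dying mesh returns its
// buffer, and the next new mesh overwrites that buffer's contents in place.
class VertexBufferPool {
public:
    BufferName acquire() {
        if (!free_.empty()) {
            const BufferName name = free_.back();
            free_.pop_back();
            return name;  // caller overwrites the previous owner's vertex data
        }
        ++created_;
        return nextName_++;
    }

    void release(BufferName name) { free_.push_back(name); }

    std::size_t createdCount() const { return created_; }
    std::size_t idleCount() const { return free_.size(); }

private:
    std::vector<BufferName> free_;
    BufferName nextName_ = 1;  // 0 is never a valid buffer name in GL
    std::size_t created_ = 0;
};
```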

    As for the mosaics, as soon as a new mosaic is discovered, its index data is uploaded to video memory with the STATIC_DRAW usage hint, as it will never be overwritten. The mosaic handles are simply passed around to meshes as their signatures change, which also means it is possible for multiple meshes to share the same mosaic handle.

    In all, there are 45 vertices in each mesh. Unlike faces, vertices are shared at every level of subdivision. Also, the mean length of the mosaics (tristrip indices) was ~130 before the primitive_restart builds, and is now slightly less. Armed with these facts, it is possible for the mosaics to be byte-encoded, because values greater than 0xFF are never required; 0xFF is the primitive_restart index. This saves considerable memory, but if there is a performance hit in using byte indices, I would like to know.
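    A sketch of building such a byte-encoded index buffer: strips are concatenated with 0xFF between them, and any index that would collide with the restart value is rejected. `packStrips` is a hypothetical helper; on the GL side this data would be drawn with `GL_UNSIGNED_BYTE` indices and 0xFF set as the restart index (via the NV_primitive_restart extension at the time, or `glPrimitiveRestartIndex` in OpenGL 3.1 and later).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::uint8_t kRestartIndex = 0xFF;  // reserved: never used as a vertex index

// Concatenates triangle strips into a single byte-index buffer, separating strips
// with the restart index so one draw call can render them all.
std::vector<std::uint8_t> packStrips(const std::vector<std::vector<unsigned>>& strips) {
    std::vector<std::uint8_t> packed;
    for (std::size_t s = 0; s < strips.size(); ++s) {
        if (s > 0) packed.push_back(kRestartIndex);
        for (const unsigned index : strips[s]) {
            if (index >= kRestartIndex) {
                throw std::out_of_range("vertex index does not fit below the restart index");
            }
            packed.push_back(static_cast<std::uint8_t>(index));
        }
    }
    return packed;
}
```

    With ~130 indices per mosaic, this comes to about 130 bytes per mosaic, versus 260 bytes with 16-bit indices.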

    Mostly, I would simply like to know how best to satisfy hardware constraints. In the future it would probably be useful to aggressively track VBO references and delete them as appropriate, so knowing this sort of driver behavior would be useful to me. I wonder if I should set the per-vertex uploads to STATIC_DRAW rather than DYNAMIC_DRAW to ensure they reside in video memory. The life span of a first-tier node is pretty long in computational terms, but it mostly depends on circumstances, though the minimum life span is also regulated.

    EDIT (to new readers): BUG SOLVED. STILL OPEN TO DISCUSSING OPTIMIZATION, HOWEVER.

    My major concern, though, without which I probably would not be sharing this information here at this time, is a severe performance anomaly. The 'mosaic' component of this system was a relatively new idea, which has caused me to revisit the system and devote a fair amount of attention to it. Since implementing it, performance has been as good as I had hoped, but occasionally the system appears to fail in hardware. I'm fairly certain the source of the matter lies in the graphics hardware.

    Essentially, I work with the card synced at 60 frames per second. If I push the resolution up so that normal performance is right around 100%, I am best able to gauge slowdown. Presently, I occasionally experience a sharp 50% drop in performance. I believe this drop occurs at all resolutions, but I have yet to turn off the card's vsync (I believe that is what it is called) to see. In any case, this behavior is pretty much unexplainable. It is not fill-limited, nor geometry-limited; it occurs when neither fill nor geometry can be an issue. Also, this effect occurs when the frustum is aimed into particular regions, the boundaries of which could be said to be no more than a single pixel wide (skirting perspective division). That is to say, I can move the camera only the very slightest bit and suddenly see the 50% hit; move it back, and the frame rate goes back to normal.

    I can do this with virtually no CPU restrictions whatsoever. Once the view is set, I can drop the camera, which basically reduces the simulation to a pure rendering loop. This leads me to believe this is NOT* happening on the CPU (at least not in my code). Also, it happens simply by changing the GPU modelview matrix, whereas my numbers don't change a bit. So it seems like some kind of hardware culling is slowing things down.

    It doesn't come from calling DrawElements more or less often.

    I'm tempted to think it is some kind of driver bug... I might see if I can find alternative drivers. But it seems more like something that would happen purely in hardware. Or maybe the driver is hitting a bug while trying to cull an entire vertex buffer for me or something... which would be much more than I would ask of a driver, especially as I already do this myself.

    Maybe something at that point is causing the driver to offload my VBOs to AGP memory (assuming they aren't there in the first place)... but even so, I don't see how it could make that decision when the only thing changing for the hardware is the modelview matrix. I can't stress that enough: the only factor which causes the hit is the modelview matrix. So, as best I can figure, it must be some form of hardware/driver culling, or culling-based memory management, causing this.

    I don't know what else to say, except that this hit is AWESOME, it comes out of the blue, I can't live with it, and it is totally inexplicable.

    For the record, I'm using NVIDIA drivers from nvidia.com, which I downloaded for GLSL support not too long ago.

    Sincerely,

    Michael

    PS: I'm not here to discuss ROAM vs. static geometry... This work isn't about seeing raw performance gains; it's about managing massive streaming geometry with infinite scalability... Yes, one day all top-class simulations will utilize ROAM systems, because that is the only realistic way to seamlessly study the volumetric texture of, say, 'tree bark' from 10 centimeters to 10 kilometers. Or, more practically perhaps, to drive a simulation of planet Earth from an extremely high-resolution elevation map (which I have done with this system, using the ETOPO2 2-minute earth topography database).

    *edit: added missing negative (NOT)
    God have mercy on the soul that wanted hard decimal points and pure ctor conversion in GLSL.

  2. #2
    knackered (Senior Member, OpenGL Guru)
    Join Date: Aug 2001
    Location: UK
    Posts: 3,032

    Re: unique ROAM VBO issues and a clincher

    Have you looked into geometry clipmaps?
    As far as I can see, they offer the best compromise between performance and memory, and pretty much leave the CPU to just upload small amounts of vertex data spread over many frames. CPU/bandwidth usage can be throttled on a per-frame basis, etc., etc. There are loads of good things about them.
    Vertex and texture detail can be sampled from compressed data, or generated using any algorithm you like, such as Perlin noise.
    The active regions (the regions which decide what data you want to 'view' essentially) can be dynamically changed depending on whether you want to zoom into minute detail or into the stratosphere to get an overview of the whole planet....leaving the clip regions to catch up using whatever number of cpu cycles you want to allocate to the task. The only penalty to allocating less cpu time is the rendering of less detail, while the rendering speed actually goes up!
    ROAM, even a modified version such as yours, is pretty much redundant as a concept. Streaming into static vertex buffers is where it's at!
    Knackered

  3. #3
    Member Regular Contributor
    Join Date: Jan 2005
    Location: USA
    Posts: 433

    Re: unique ROAM VBO issues and a clincher

    Originally posted by knackered:
    Have you looked into geometry clipmaps?
    As far as I can see, they offer the best compromise between performance and memory, and pretty much leave the CPU to just upload small amounts of vertex data spread over many frames. CPU/bandwidth usage can be throttled on a per-frame basis, etc., etc. There are loads of good things about them.
    Vertex and texture detail can be sampled from compressed data, or generated using any algorithm you like, such as Perlin noise.
    The active regions (the regions which decide what data you want to 'view' essentially) can be dynamically changed depending on whether you want to zoom into minute detail or into the stratosphere to get an overview of the whole planet....leaving the clip regions to catch up using whatever number of cpu cycles you want to allocate to the task. The only penalty to allocating less cpu time is the rendering of less detail, while the rendering speed actually goes up!

    The algorithm in the screenshot linked above accommodates all of these constraints. I could've said a whole lot more about its operation, and actually intended to say a bit more that I forgot... but anyhow, I admit I've never heard of 'geometry clipmaps' as a conventional term... It sounds like space-partition rendering, though, which would not begin to facilitate LOD realistically, or at least not smoothly.


    Originally posted by knackered:
    ROAM, even a modified version such as yours, is pretty much redundant as a concept. Streaming into static vertex buffers is where it's at!

    If you are streaming into a buffer, it is no longer static, as far as I know. Even from the scant description above, if you pay attention you will see that the algorithm I described approaches rendering static geometry very closely, and probably performs about on par with the algorithm you've described, only with many cleaner features.

    Anyhow, I would really like to discuss hardware, primarily VBOs as they relate to the algorithm described.

    My highest priority here, though, is to find the source of the crazy driver/hardware performance hit described above.

    I would also eventually like to discuss CPU/GPU parallelism and whatever options might exist there.

    I meant to say a lot more in the introductory post, but I will save it for if and when it draws attention.

  4. #4
    knackered (Senior Member, OpenGL Guru)
    Join Date: Aug 2001
    Location: UK
    Posts: 3,032

    Re: unique ROAM VBO issues and a clincher

    I must admit, I just didn't read most of your original post...looked to be a hell of a lot of cpu work, which isn't acceptable....still haven't read it all, it's way too long. Take a look at geometry clipmaps, that's my advice, and if your method works out more efficient then write it up and publish it, then I'll read it...until then, life's too short to read unqualified essays on newsgroups.
    Oh, yes obviously if you change the content of a vertex buffer it ceases to be *literally* static, but if you only update small portions of it every 10 or 20 frames, then, in all but wording, it is static.
    http://research.microsoft.com/~hoppe/#geomclipmap
    Knackered

  5. #5
    Advanced Member Frequent Contributor
    Join Date: May 2000
    Location: London, UK
    Posts: 548

    Re: unique ROAM VBO issues and a clincher

    You need to turn off vsync before doing any kind of performance testing. It's in the performance and quality settings. Click on vertical sync, then the 'application controlled' checkbox, and move the slider to off.

    btw the image link is broken.

    Originally posted by michagl:
    it's about managing massive streaming geometry with infinite scalability...

    You don't have to use ROAM to achieve that.

  6. #6
    Member Regular Contributor
    Join Date: Jan 2005
    Location: USA
    Posts: 433

    Re: unique ROAM VBO issues and a clincher

    Originally posted by knackered:
    I must admit, I just didn't read most of your original post...looked to be a hell of a lot of cpu work, which isn't acceptable....still haven't read it all, it's way too long. Take a look at geometry clipmaps, that's my advice, and if your method works out more efficient then write it up and publish it, then I'll read it...until then, life's too short to read unqualified essays on newsgroups.
    Oh, yes obviously if you change the content of a vertex buffer it ceases to be *literally* static, but if you only update small portions of it every 10 or 20 frames, then, in all but wording, it is static.
    http://research.microsoft.com/~hoppe/#geomclipmap

    The post takes about a minute to read if you are comfortable with English; I kept it very succinct. The system as described is GPU-limited rather than CPU-limited. The CPU is basically just responsible for calculating distances for LOD testing... but as I understand PCI Express technology, that could probably be offloaded to the GPU in time. As for reading in small portions every 10 or 20 frames, that is exactly what I'm doing, only less often and in quite small portions (8x8 blocks). Finally, I don't keep up with pop terminology, but if Hoppe has published it, I've read it, unless it is very new.

    Edit: OK, in fairness, 3 or 4 minutes. But for what it's worth, the first half is a brief description and the second half is troubleshooting.

  7. #7
    Member Regular Contributor
    Join Date: Jan 2005
    Location: USA
    Posts: 433

    Re: unique ROAM VBO issues and a clincher

    Originally posted by Adrian:
    You need to turn off vsync before doing any kind of performance testing. It's in performance and quality settings. Click on vertical sync then the 'application controlled' checkbox and move the slider to off.

    btw the image link is broken.

    Originally posted by michagl:
    it's about managing massive streaming geometry with infinite scalability...
    You don't have to use ROAM to achieve that.

    Yes, I'm aware of vsync. I welcome the concern nonetheless, though.

    As for the image, the original URL had 'www.' in it... I have no idea why I stuck that in there; habit, I guess. Anyhow, it should work now. Keep in mind the image is very low resolution, for illustrative purposes. The highlighted triangle is a fully tessellated second-tier mesh.

    As for your final comment: I use the term ROAM very literally. I'm not associating it with any past project(s); I simply mean 'real-time optimally adapting mesh'... which, as far as I'm concerned, applies to any algorithm which dynamically and smoothly manages a mesh with respect to the frustum and topological turbulence, and perhaps other similar features. The 'smoothly' qualifier means that the mesh must make seamless transitions, meaning that using 'blending' geometry doesn't count, in my opinion.
    I must admit I'm partial to elegant solutions.

  8. #8
    Member Regular Contributor
    Join Date: Jan 2005
    Location: USA
    Posts: 433

    Re: unique ROAM VBO issues and a clincher

    Before I sign off, I had a pretty good idea last night. I was thinking about the ROAM critique*. The only 'flaw' I could find in the system is having to calculate frustum distances for every updated face. I came up with some ways to approximate this process in one fell swoop per mesh.

    But I had another idea which would not preclude any others: to up the resolution of the mesh without sacrificing any of the great qualities of 8x8. It is possible to add another vertex in the center of each face and break the faces up into 3 self-contained triangles.

    I will spare the details, but this would triple the number of triangles in a mesh for free, without sacrificing any existing qualities of the system.
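    A minimal sketch of the centroid split, assuming "center" means the centroid and faces are index triples into a shared vertex array (`splitAtCentroids`, `Vec3` and `Tri` are hypothetical). For an 8x8 mesh this turns 64 faces into 192 and adds 64 vertices (45 + 64 = 109). Splitting an equilateral triangle this way yields three triangles with angles of 30, 30 and 120 degrees, which bears on the shape concern raised next.

```cpp
#include <array>
#include <vector>

struct Vec3 {
    double x, y, z;
};
using Tri = std::array<int, 3>;  // indices into the vertex array

// Adds a vertex at each face's centroid and replaces the face with three
// triangles fanned around it, keeping the original winding order.
std::vector<Tri> splitAtCentroids(std::vector<Vec3>& verts, const std::vector<Tri>& faces) {
    std::vector<Tri> out;
    out.reserve(faces.size() * 3);
    for (const Tri& f : faces) {
        Vec3 c{0, 0, 0};
        for (const int i : f) {
            c.x += verts[i].x / 3;
            c.y += verts[i].y / 3;
            c.z += verts[i].z / 3;
        }
        const int ci = static_cast<int>(verts.size());
        verts.push_back(c);  // c is a copy, so growing verts invalidates nothing we hold
        out.push_back({f[0], f[1], ci});
        out.push_back({f[1], f[2], ci});
        out.push_back({f[2], f[0], ci});
    }
    return out;
}
```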

    The only remaining issue is that the triangles would be somewhat sharp-angled, but I figure they will look OK... If not, I don't plan to make the change cold turkey; it will have to be optional.

    *edit: brainflop - replace 'technique' with 'critique'

    -FOLLOW UP------------------------

    I found a beautiful solution in this vein... As it turns out, it is possible to flip the resulting scalene triangles along their bases (each base being the diagonal of the quad the two triangles form). This flipping operation can be performed before building the preprocessed mosaics. The result is a mesh with much better-fitting triangles across the board, fairly closely approaching equilateral triangles.

    The resulting mesh shares exactly the vertices of the mesh fed into the subdivision algorithm, but with opposite edges. As far as the GPU is concerned, the edges exist only in the offline-preprocessed strip indices.
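    The flip described above is the classic edge flip: two triangles sharing a base form a quad, and the shared diagonal is swapped for the other one, keeping all four vertices. A sketch, assuming counter-clockwise index triples (`flipSharedEdge` and `Tri` are hypothetical names). Since it only rewrites indices, it can run entirely offline before the mosaic strips are built; it does not check that the quad is convex, which a real implementation would need to do before flipping.

```cpp
#include <array>
#include <optional>

using Tri = std::array<int, 3>;  // counter-clockwise vertex indices

// If t0 and t1 share an edge (traversed in opposite directions, as consistent winding
// implies), returns the two triangles that use the quad's other diagonal instead.
std::optional<std::array<Tri, 2>> flipSharedEdge(const Tri& t0, const Tri& t1) {
    for (int i = 0; i < 3; ++i) {
        const int a = t0[i];
        const int b = t0[(i + 1) % 3];
        for (int j = 0; j < 3; ++j) {
            if (t1[j] == b && t1[(j + 1) % 3] == a) {
                const int c = t0[(i + 2) % 3];  // apex of t0, opposite the shared edge
                const int d = t1[(j + 2) % 3];  // apex of t1
                // Same four vertices, new diagonal c-d, winding preserved.
                return std::array<Tri, 2>{Tri{c, a, d}, Tri{d, b, c}};
            }
        }
    }
    return std::nullopt;  // no shared edge: nothing to flip
}
```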

    The result is that a much smaller mesh can drive the LOD-based tessellation of a much finer mesh, with optimal triangle coverage and zero online performance hit.

  9. #9
    Advanced Member Frequent Contributor
    Join Date: May 2000
    Location: London, UK
    Posts: 548

    Re: unique ROAM VBO issues and a clincher

    i work with the card synched at 60 frames per second
    but i have yet to break the cards vsync (i believe it is called) status

    From your original post I read it that you had vsync on and hadn't figured out how to turn it off.

    Your screenshot shows an FPS of 102%. I've never seen an fps as a percentage. What is it a percentage of?

  10. #10
    Member Regular Contributor
    Join Date: Jan 2005
    Location: USA
    Posts: 433

    Re: unique ROAM VBO issues and a clincher

    Originally posted by Adrian:
    From your original post I read it that you had vsync on and hadnt figured out how to turn it off.

    Your screenshot shows an FPS of 102%. I've never seen an fps as a percentage. What is it a percentage of?

    Yes, I have vsync on, but I never said I don't know how to turn it off... I just don't feel like turning it off generally. I can generally gauge the efficiency of my code by its organization... If people want to ask for hard numbers I might fool with it, but there is no reason I can see to do so now.

    As for 102%, that means the frames are running at 102% of 60 frames per second. The vsync is limiting at 102%, though vsync is set for 60. If I turned it off, that percentage would go up outrageously, depending on render modes. If I want to test performance, I just turn up the subdivision attenuation coefficients until the frames drop below 100%; then I know I'm in the ballpark for realistic testing. I tend to animate my machines... that is, I attribute animistic qualities to them... so I feel bad about making them do excessive work unnecessarily. Also, turning off vsync is a good way to make Windows scheduling even more difficult to work with if things get out of hand.

    As for FPS, in short, that number is derived from the seconds per frame averaged over 60 frames, expressed as a percentage of 60 frames per second, and updated once a second so that it doesn't jump around too erratically to keep up with. Below 50% it is displayed in red, up to 90% in yellow, then green up to 100%, and white at 100% and above.
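    A sketch of the readout as described: a rolling average over the last 60 frame times, reported relative to a 60 Hz target, plus the colour bands. `FrameRateMeter` and `bandFor` are hypothetical names, and treating each band's upper bound as exclusive (so exactly 90% counts as green) is an assumption.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Rolling average of the last `window` frame times, reported as a percentage of a
// target refresh rate: 100% means the average frame took exactly 1 / targetHz seconds.
class FrameRateMeter {
public:
    explicit FrameRateMeter(double targetHz = 60.0, std::size_t window = 60)
        : targetHz_(targetHz), window_(window) {}

    void addFrame(double seconds) {
        samples_.push_back(seconds);
        sum_ += seconds;
        if (samples_.size() > window_) {
            sum_ -= samples_.front();
            samples_.pop_front();
        }
    }

    double percent() const {
        if (samples_.empty() || sum_ <= 0.0) return 0.0;
        const double averageSeconds = sum_ / static_cast<double>(samples_.size());
        return 100.0 / (averageSeconds * targetHz_);
    }

private:
    double targetHz_;
    std::size_t window_;
    std::deque<double> samples_;
    double sum_ = 0.0;
};

// Colour bands from the post; each upper bound is treated as exclusive (an assumption).
std::string bandFor(double percent) {
    if (percent < 50.0) return "red";
    if (percent < 90.0) return "yellow";
    if (percent < 100.0) return "green";
    return "white";
}
```

    The once-a-second display refresh is left to the caller, for example by reading `percent()` only when a second has elapsed since the last update.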
