unique ROAM VBO issues and a clincher

michagl · March 2, 2005, 12:45pm

first let me say that i realize there are a lot of topics covering this domain already available in these forums… but i think in this case it would be better to start fresh.

i do have a specific hardware and VBO api issue, but i would also like very much to discuss general strategies for this unique situation.

before i try to offer some brief context, here is an illustrative aid to refer to:

http://arcadia.angeltowns.com/share/genesis-mosaics-lores.jpg

a new screen hot off the presses. next i will try to describe what is going on in this image.

basically, this is a ROAM (real-time optimally adapting mesh) system. however i believe it is quite unique in contrast with its predecessors, and i believe, in spirit, perhaps as optimally efficient as is possible with contemporary hardware…

there are essentially two subdivision tiers to this system. the first tier is dynamic, whereas the second is relatively static. splitting takes place in both tiers, but merging only in the first.

the first tier is an equilateral sierpinski tessellation… think taking a triangle, and fitting an upside down copy of itself in its center with vertices at the midpoints of the parent triangle’s edges. a discontinuity of plus or minus one is allowed along the edges of the first tier, as they will be mended in the second tier. this ROAM system is designed with arbitrary manifolds (non-planar) in mind, so sierpinski is an ideal fit for the first tier.

managing the first tier is a complex memory problem, as with any such system, comparable perhaps only to a complex real-time dynamic compiler (interpreter), in my opinion. the second tier however, where most of the resolution exists, is designed to offload as much of this memory management as possible.

each triangle, or node, in the first tier forms a discrete component of the second tier. i will refer to this as a ‘mesh’ or maybe a ‘mosaic’ at some point. all of these meshes share the exact same connectivity data and require the exact same amount of system and video memory. just about everything about them can be computed only once on boot. all that is left for each ‘instance’ is essentially per vertex data (position,texcoords,normals,colour,etc), per vertex ‘variance’ weights, and a single flag byte for each face in the mesh. multiple cameras viewing the same mesh only require that the face flags be duplicated so that unique views can be produced for each camera.

for many reasons it turns out that 8x8 is the optimal resolution for the second tier meshes, which means that each edge of the triangular mesh is subdivided into 8 edges. in the end the only real task arises from mending the borders between second tier meshes, but a lot of data can be precomputed to aid in this mapping process, which is essentially limited to only 6 cases (xx yy zz xy xz yz) as i recall. there are also many compromises which could speed up the border mending process while sacrificing the accuracy of the tessellation slightly.
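The six cases listed above are what one gets by pairing up three edge labels when an edge is also allowed to meet its own kind. A minimal sketch, assuming the labels x, y and z simply name the three edges of a mesh (that reading of the post is an assumption, as is the function name):

```python
from itertools import combinations_with_replacement

def border_cases(edge_labels="xyz"):
    """Unordered pairs of edge labels, where an edge may pair with itself."""
    return ["".join(pair) for pair in combinations_with_replacement(edge_labels, 2)]

# Lists the pairings; there are six of them for three labels.
print(border_cases())
```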

finally, the connectivity of the base mesh is more complex than normal connectivity, as essentially it is a multi-resolution connectivity containing all of the information of any view/variance based tessellation of the mesh. though the fully tessellated 8x8 mesh contains 64 faces, the total multi-resolution mesh contains 127 (64+32+16+8+4+2+1) faces, as does each instance, which requires 127 bytes per frustum to store its state.
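The counts quoted in this thread (64 leaf faces, 127 faces in total, 45 vertices) can be checked with a few lines. This is only a sketch of the arithmetic, assuming each level of the hierarchy holds half as many faces as the level below it, as the sum in the post suggests; the function name is made up for illustration.

```python
def mesh_counts(edge_divisions=8):
    """Vertex and face counts for a triangle whose edges are split into n parts."""
    n = edge_divisions
    vertices = (n + 1) * (n + 2) // 2      # triangular number of grid points
    leaf_faces = n * n                     # faces at full tessellation
    total_faces = 0
    level = leaf_faces
    while level >= 1:                      # 64 + 32 + 16 + 8 + 4 + 2 + 1
        total_faces += level
        level //= 2
    return vertices, leaf_faces, total_faces

print(mesh_counts(8))  # (45, 64, 127)
```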

with this in mind, it is possible to compile a 128bit signature for each possible permutation… it is this signature which i more accurately will refer to as a ‘mosaic’. the signature is a series of bits which are set off or on depending upon whether or not the corresponding face is a leaf (visible) or not.
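One way the signature could be built and used as a lookup key is sketched below. The bit order (face i maps to bit i), the little-endian byte layout and the plain dictionary are all assumptions made for illustration; the post does not say how the real system stores its keys.

```python
FACE_COUNT = 127  # faces in the multi-resolution mesh, per the post

def pack_signature(leaf_flags):
    """Pack one leaf/non-leaf flag per face into a 16-byte (128-bit) key."""
    if len(leaf_flags) != FACE_COUNT:
        raise ValueError("expected one flag per face")
    signature = 0
    for face, is_leaf in enumerate(leaf_flags):
        if is_leaf:
            signature |= 1 << face         # assumed order: face i -> bit i
    return signature.to_bytes(16, "little")

# Hypothetical table: signature -> handle of the index buffer for that mosaic.
mosaics = {}

def lookup_mosaic(leaf_flags):
    """Return the stored handle, or None if this mosaic has not been seen yet."""
    return mosaics.get(pack_signature(leaf_flags))
```

A mesh whose signature is not yet in the table would be the "new mosaic discovered" case described later in the post.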

i’ve built an empirical database of all possible mosaics (through a process which basically involves me setting up a gerbil wheel simulation and strapping a rubber band around my joystick thruster… which ran for about 3 days that way with various ever finer splitting constants).

the final result is about 200,000 mosaics, technically around 193,300… that number is still growing now and then as extremely rare mosaics are found, but i don’t expect it to grow too much further.

finally, with that in mind, offline i have used the nvidia triangle stripper utility to compute strips for each mosaic… a database which on disk requires about 20MB, about 4 of which are 128bit keys and 32bit offsets.
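The "about 4" MB figure for keys and offsets follows directly from the sizes given. A quick check, assuming one 128-bit key and one 32-bit offset per mosaic and nothing else in the table:

```python
KEY_BYTES = 128 // 8     # one signature
OFFSET_BYTES = 32 // 8   # one offset into the strip data

def index_table_bytes(mosaic_count):
    """Bytes needed for the key/offset table alone."""
    return mosaic_count * (KEY_BYTES + OFFSET_BYTES)

print(index_table_bytes(200_000))  # 4000000
print(index_table_bytes(193_300))  # 3866000
```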

finally the basic task is to solve the signature of each second tier mesh given its per-vertex weights and the camera’s position, use that signature as a key to quickly look up the appropriate mosaic, and assign it to that mesh for the later rendering routine.

as far as VBO is concerned, each mesh upon creation uploads its per vertex data to video memory set to DYNAMIC_DRAW. when a mesh dies, it is recycled and its video memory is handed over to an incoming mesh, which then simply writes its per-vertex data over the previous owner’s data… that is to say that the handle is not deleted.
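The recycling scheme described above amounts to a free list of buffer handles. The sketch below shows only the bookkeeping; `create_buffer` is a stand-in for whatever call actually allocates a buffer object, and the class name is hypothetical.

```python
import itertools

class BufferPool:
    """Hands out buffer handles and reuses them instead of deleting them."""

    def __init__(self, create_buffer):
        self._create = create_buffer   # stand-in for the real allocation call
        self._free = []                # handles whose mesh has died

    def acquire(self):
        # Prefer a recycled handle; the new mesh overwrites the old vertex data.
        return self._free.pop() if self._free else self._create()

    def release(self, handle):
        # The handle is kept alive so its video memory can be handed over.
        self._free.append(handle)

# Usage with fake integer handles in place of real buffer objects.
pool = BufferPool(itertools.count(1).__next__)
first = pool.acquire()
pool.release(first)
print(pool.acquire() == first)  # True: the handle was reused, not recreated
```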

as for the mosaics, as soon as a new mosaic is discovered, its index data is uploaded to video memory in STATIC_DRAW mode, as it will never be overwritten. the mosaic handles are just passed around to meshes as their signature changes. that is to say as well that it is possible for multiple meshes to share the same mosaic handles.

in all there are 45 vertices in each mesh. unlike faces, vertices are shared at every level of subdivision. as well, the mean length of the mosaics (tristrip indices) was ~130 before primitive_restart builds, now slightly less. armed with these facts it is possible for the mosaics to be byte encoded, because values greater than 0xFF are never required. 0xFF is the primitive_restart index. this saves considerable memory, but if there is a performance hit in using byte indices i would like to know.
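A sketch of the byte encoding described above: several strips are joined into one index stream, with 0xFF between them as the restart marker. The function name and the input layout (a list of strips, each a list of vertex indices) are assumptions for illustration.

```python
RESTART = 0xFF  # restart index named in the post

def encode_strips(strips, vertex_count=45):
    """Join triangle strips into one byte string separated by the restart index."""
    if vertex_count > RESTART:
        raise ValueError("vertex indices would collide with the restart index")
    out = bytearray()
    for number, strip in enumerate(strips):
        if number:
            out.append(RESTART)            # start a new strip
        for index in strip:
            if not 0 <= index < vertex_count:
                raise ValueError("index out of range for this mesh")
            out.append(index)
    return bytes(out)

print(list(encode_strips([[0, 1, 2], [3, 4, 5, 6]])))
# [0, 1, 2, 255, 3, 4, 5, 6]
```

With 45 vertices per mesh every real index stays well below 0xFF, which is why a single byte per index is enough.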

mostly i would simply like to know how best to satisfy hardware constraints. in the future it would probably be useful to aggressively track VBO references and delete them as is appropriate, so knowing how the driver behaves here would be useful to me. i wonder if i should set the per-vertex uploads to STATIC_DRAW rather than DYNAMIC_DRAW to ensure they stay in video memory. the life span of a first tier node is pretty long in computational terms, but depends mostly on circumstance, though the minimum life span is also regulated.

EDIT: -to: new readers- BUG SOLVED: STILL OPEN TO DISCUSS OPTIMIZATION HOWEVER

my major concern though, without which i probably would not at this time be sharing this information here, is a severe performance anomaly. the ‘mosaic’ component of this system was a relatively new idea which has caused me to revisit the system and devote a fair amount of attention to it. since implementing it the performance has been as good as i had hoped, but occasionally the system appears to fail in hardware. i’m fairly certain the source of the matter exists in the graphics hardware.

essentially, i work with the card synched at 60 frames per second. if i push the resolution up so that normal performance is right around 100%, i am best able to gauge slow down. presently, i occasionally experience a sharp 50% drop in performance. i believe this drop occurs at all resolutions, but i have yet to break the card’s vsync (i believe it is called) status to see. but in any case this behavior is pretty much unexplainable. it is not fill limited, nor geometry limited, it occurs when both fill and geometry can not be an issue. also it is the case that this effect occurs when the frustum is aimed into particular regions, the boundaries of which could be said to be no more than a single pixel (skirting perspective division). that is to say, i can move the camera only the very slightest, and suddenly i will see the 50% hit, move it back, and frames go back to normal.

i can do this with virtually no cpu restrictions whatsoever. once the view is set, i can drop the camera, which basically reduces the simulation to a pure rendering loop. which leads me to believe this is NOT* happening on the cpu. (at least not in my code) also it happens simply by changing the gpu modelview matrix, whereas my numbers don’t change a bit. so it seems like some kind of hardware culling is slowing things down.

it doesn’t occur from calling more or fewer DrawElements

i’m tempted to think it is some kind of driver bug… i might see if i can find alternative drivers. but it seems more like something that would happen purely on hardware. or maybe the driver is hitting a bug while trying to cull an entire vertex buffer for me or something… which would be much more than i would ask of a driver… especially as i already do this for myself.

maybe something at that point is causing the driver to offload my VBOs to agp memory. ( assuming it isn’t in the first place ) … but even still, i don’t see how it could make that decision when the only thing changing for hardware is the modelview matrix. i can’t stress that more than anything… the only factor which causes the hit is the modelview matrix. so it must be some form of hardware/driver culling, or culling based memory management, causing this best i can figure.

i don’t know what else to say, except that this hit is AWESOME out of the blue, and i can’t live with it, and it is totally inexplicable.

for the record, i’m using nvidia drivers from nvidia.com, which i downloaded for glsl support not too long ago.

sincerely,

michael

PS: i’m not here to discuss ROAM vs. static geometry… this work isn’t about seeing raw performance gains, it’s about managing massive streaming geometry with infinite scalability… yes one day all top class simulations will utilize ROAM systems, because that is the only way it is realistically possible to seamlessly study the volumetric texture of say… ‘tree bark’, from 10 centimeters to 10 kilometers. or more practically perhaps, drive a simulation of planet earth from an extremely high resolution elevation map. (which i have done with this system with the ETOPO2 2minute earth topological database)

*edit: added missing negative (NOT)

knackered · March 2, 2005, 1:26pm

Have you looked into geometry clipmaps?
As far as I can see, they offer the best compromise between performance and memory, and pretty much leave the cpu to just upload small amounts of vertex data spread over many frames. CPU/bandwidth usage can be throttled on a per-frame basis…etc.etc. there’s loads of good things about them.
Vertex and texture detail can be sampled from compressed data, or generated using any algorithm you like, such as perlin noise.
The active regions (the regions which decide what data you want to ‘view’ essentially) can be dynamically changed depending on whether you want to zoom into minute detail or into the stratosphere to get an overview of the whole planet…leaving the clip regions to catch up using whatever number of cpu cycles you want to allocate to the task. The only penalty to allocating less cpu time is the rendering of less detail, while the rendering speed actually goes up!
ROAM, even a modified version such as yours, is pretty much redundant as a concept. Streaming into static vertex buffers is where it’s at!

michagl · March 2, 2005, 2:45pm

Originally posted by knackered:
Have you looked into geometry clipmaps?
As far as I can see, they offer the best compromise between performance and memory, and pretty much leave the cpu to just upload small amounts of vertex data spread over many frames. CPU/bandwidth usage can be throttled on a per-frame basis…etc.etc. there’s loads of good things about them.
Vertex and texture detail can be sampled from compressed data, or generated using any algorithm you like, such as perlin noise.
The active regions (the regions which decide what data you want to ‘view’ essentially) can be dynamically changed depending on whether you want to zoom into minute detail or into the stratosphere to get an overview of the whole planet…leaving the clip regions to catch up using whatever number of cpu cycles you want to allocate to the task. The only penalty to allocating less cpu time is the rendering of less detail, while the rendering speed actually goes up!

the algorithm in the screen linked above satisfies all of these constraints. i could’ve said a whole lot more about the operation, and actually intended to say a little bit more that i forgot… but anyhow, i admit though that i’ve never heard of ‘geometry clipmaps’ as a conventional terminology… sounds like space partition rendering though, which would not begin to facilitate LOD realistically, or at least smoothly.

ROAM, even a modified version such as yours, is pretty much redundant as a concept. Streaming into static vertex buffers is where it’s at!

if you are streaming into a buffer it is no longer static as far as i know. the algorithm i described, even as scantily clad as above, if you pay attention, you will see that it approaches rendering static geometry very closely, and probably hits just about at par with the algorithm you’ve described, only with many much cleaner features.

anyhow, i would really like to discuss hardware, primarily VBOs as they relate to the algorithm described.

my highest priority here though is to find the source of this crazy driver/hardware performance hit described above.

i would also eventually like to discuss cpu/gpu parallelism and whatever options might exist there.

i meant to say a lot more in the introductory post, but i will save it for if and when it is able to draw attention.

knackered · March 2, 2005, 10:14pm

I must admit, I just didn’t read most of your original post…looked to be a hell of a lot of cpu work, which isn’t acceptable…still haven’t read it all, it’s way too long. Take a look at geometry clipmaps, that’s my advice, and if your method works out more efficient then write it up and publish it, then I’ll read it…until then, life’s too short to read unqualified essays on newsgroups.
Oh, yes obviously if you change the content of a vertex buffer it ceases to be literally static, but if you only update small portions of it every 10 or 20 frames, then, in all but wording, it is static.
http://research.microsoft.com/~hoppe/#geomclipmap

imported_Adrian1 · March 2, 2005, 11:24pm

You need to turn off vsync before doing any kind of performance testing. It’s in performance and quality settings. Click on vertical sync then the ‘application controlled’ checkbox and move the slider to off.

btw the image link is broken.

Originally posted by michagl:
its about managing massive streaming geometry with infinite scalability…

You don’t have to use ROAM to achieve that.

michagl · March 3, 2005, 4:49am

Originally posted by knackered:
I must admit, I just didn’t read most of your original post…looked to be a hell of a lot of cpu work, which isn’t acceptable…still haven’t read it all, it’s way too long. Take a look at geometry clipmaps, that’s my advice, and if your method works out more efficient then write it up and publish it, then I’ll read it…until then, life’s too short to read unqualified essays on newsgroups.
Oh, yes obviously if you change the content of a vertex buffer it ceases to be literally static, but if you only update small portions of it every 10 or 20 frames, then, in all but wording, it is static.
http://research.microsoft.com/~hoppe/#geomclipmap

the post requires about a minute to read if you are comfortable with english. i kept it very succinct. the system as described is gpu limited rather than cpu limited. the cpu is basically just responsible for calculating distances for lod testing… but as i understand PCI Express technology that could probably be offloaded to the gpu in time. as for reading in small portions every 10 or 20 frames that is exactly what i’m doing. only less often and quite small portions – 8x8 blocks. finally i don’t keep up with pop terminology, but if hoppe has published it i’ve read it, unless it is very new.

edit: ok, in fairness, 3 or 4 minutes. but for what it’s worth, the first half is a brief description, the second half trouble shooting.

michagl · March 3, 2005, 5:05am

Originally posted by Adrian:
You need to turn off vsync before doing any kind of performance testing. It’s in performance and quality settings. Click on vertical sync then the ‘application controlled’ checkbox and move the slider to off.

btw the image link is broken.

Originally posted by michagl:
its about managing massive streaming geometry with infinite scalability…

You don’t have to use ROAM to achieve that.

yes, i’m aware of vsync. i welcome the concern none the less though.

as for the image, the original url had ‘www.’ in it… i have no idea why i stuck that in there, habit i guess. anyhow, it should work now. keep in mind the image is very low resolution for illustrative purposes. the highlighted triangle is a fully tessellated second tier mesh.

as for your final comment. i use the term ROAM very literally. i’m not associating with any past project(s), i simply mean ‘real-time optimally adapting mesh’… which as far as i’m concerned applies to any algorithm which dynamically and smoothly manages a mesh with respect to the frustum and topological turbulence, and perhaps other similar features. the ‘smoothly’ qualifier means that the mesh must make seamless transitions, meaning using ‘blending’ geometry doesn’t count in my opinion.
i must admit i’m partial to elegant solutions.

michagl · March 3, 2005, 5:21am

before i sign off, i had a pretty good idea last night. i was thinking about the ROAM critique*. the only ‘flaw’ i could find in the system is having to calculate frustum distances for every updated face. i came up with some ways to approximate this process in one fell swoop per mesh.

but i had another idea which would not preclude any others: to up the resolution of the mesh without sacrificing any of the great qualities of 8x8, it is possible to add another vertex in the center of each face, and break the faces up into 3 self contained triangles.

i will spare the details, but this would up the number of triangles in a mesh 3 fold for free, without sacrificing any existing qualities of the system.

the only remaining issue is that the triangles would be slightly acute, but i figure they will look ok… if not, i don’t plan to make the change cold turkey… it will have to be optional.
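The threefold increase can be illustrated by splitting every face at a new center vertex. This is only a sketch of the counting and the index bookkeeping, not the poster's implementation; the function name and the face representation (index triples) are assumptions.

```python
def split_faces_at_center(vertex_count, faces):
    """Replace each triangle (a, b, c) with three triangles around a new vertex."""
    new_faces = []
    next_vertex = vertex_count
    for a, b, c in faces:
        center = next_vertex               # index of the vertex added to this face
        next_vertex += 1
        # Same winding order as the parent triangle.
        new_faces += [(a, b, center), (b, c, center), (c, a, center)]
    return next_vertex, new_faces

# Counting only: 64 placeholder faces on a 45-vertex mesh.
total_vertices, faces = split_faces_at_center(45, [(0, 1, 2)] * 64)
print(total_vertices, len(faces))  # 109 192
```

At 109 vertices the indices would still fit in a single byte alongside the 0xFF restart value.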

*edit: brainflop - replace ‘technique’ with ‘critique’

-FOLLOW UP------------------------

i found a beautiful solution in this vein… as it turns out, it is possible to flip the resulting scalene triangles along their bases (which is the hypotenuse of a quad). this flipping operation can be performed before building the preprocessed mosaics. the result is a mesh with much better fit triangles across the board (fairly closely approaching equilateral triangles).

the resulting mesh shares exactly the vertices of the mesh fed into the subdivision algorithm, but opposite edges. the edges exist only in the offline preprocessed strip indices as far as the gpu is concerned.

the result is that a much smaller mesh can drive the lod based tessellation of a much finer mesh with optimal triangle cover with zero online performance hit.

imported_Adrian1 · March 3, 2005, 5:38am

i work with the card synched at 60 frames per second

but i have yet to break the cards vsync (i believe it is called) status

From your original post I read it that you had vsync on and hadn’t figured out how to turn it off.

Your screenshot shows an FPS of 102%. I’ve never seen an fps as a percentage. What is it a percentage of?

michagl · March 3, 2005, 7:43am

Originally posted by Adrian:
From your original post I read it that you had vsync on and hadnt figured out how to turn it off.

Your screenshot shows an FPS of 102%. I’ve never seen an fps as a percentage. What is it a percentage of?

yes i have vsync on, but i never said i don’t know how to turn it off… i just don’t feel like turning it off generally. i can generally gauge the efficiency of my code by its organization… if people want to ask for hard numbers i might fool with it. but there is no reason i can see to do so now.

as for 102%, that is the frames are running at 102% of 60 frames a second. the vsync is limiting at 102%, though vsync is set for 60. if i turned it off, that percentage would go up outrageously depending on render modes. if i want to test performance, i just turn up the subdivision attenuation coefficients until the frames drop below 100%, then i know i’m in the ball park for realistic testing. i tend to animate my machines… that is i attribute animistic qualities to them… so i feel bad about making them do excessive work unnecessarily. also turning off vsync is a good way to make windows scheduling even more difficult to work with if things get out of hand.

as for FPS, in short that number is the number of seconds per frame averaged over 60 frames, updated once a second so that it doesn’t jump around too erratically to keep up with. below 50% it is displayed in red, up to 90% in yellow, then green up to 100% and white 100% and above.
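The readout described above could be computed roughly as follows. This is a sketch under assumptions: the percentage is taken relative to 60 frames per second, and the exact colour boundaries (which side 50%, 90% and 100% fall on) are guesses, since the post does not say.

```python
TARGET_FPS = 60.0

def fps_percent(frame_times):
    """Average seconds per frame, expressed as a percentage of the 60 fps target."""
    if not frame_times:
        raise ValueError("need at least one frame time")
    mean_seconds = sum(frame_times) / len(frame_times)
    return 100.0 * (1.0 / mean_seconds) / TARGET_FPS

def display_colour(percent):
    """Colour bands from the post; the boundary handling is an assumption."""
    if percent < 50:
        return "red"
    if percent < 90:
        return "yellow"
    if percent < 100:
        return "green"
    return "white"

window = [1.0 / 45.0] * 60                 # sixty frames at 45 fps
print(display_colour(fps_percent(window)))  # yellow
```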

3B1 · March 3, 2005, 8:18am

Do you actually get values between 30 and 60 FPS with vsync on? I would have expected you to be limited to factors of 60 (60,30,20,15, etc), unless you are right on the edge and not all frames are taking the same amount of time, or frame times are varying a lot within the 1 second sample…

michagl · March 3, 2005, 8:26am

Originally posted by 3B:
Do you actually get values between 30 and 60 FPS with vsync on? I would have expected you to be limited to factors of 60 (60,30,20,15, etc), unless you are right on the edge and not all frames are taking the same amount of time, or frame times are varying a lot within the 1 second sample…

that is interesting. if vsync really works as i believe you describe it, that might actually explain the crazy 50% hit i’m seeing.

i don’t understand why vsync would work like that. but assuming it does, i will look into it asap. if you can point me to technical docs regarding the nature of vsync, i would appreciate it.

yes i do get numbers varying across the board, but like you notice, i am averaging over a second for readability, so the average value could be misleading if that is indeed what is happening.

in the past i’ve noticed better performance across the board when programming various systems with vsync disabled. i realize there is an extension for manually managing vsync at run-time which i’ve considered using when the frames go over a certain threshold.

in any case, i appreciate the insight, and i would like very much to pursue this line of reasoning.

3B1 · March 3, 2005, 9:01am

Basically what happens with vsync on, is that you can only swap buffers when the monitor finishes displaying the current frame (aka at the beginning of the vertical retrace interval, which is triggered by the vertical sync signal, thus the name ‘vsync’), in this case every 1/60 of a second. If you take too long rendering, you miss a chance to swap buffers, and have to wait for the next retrace, so it takes a total of 1/30sec since the previous frame before the new one is displayed, leading to the sudden drop to 30 FPS.
In other words, vsync effectively rounds your frame time up to the next integer multiple of 1 monitor frame, so you get 60/1=60Hz,60/2=30Hz,60/3=20Hz, etc.

Korval · March 3, 2005, 9:53am

What’s probably happening is that most of the time, you’re pushing the hardware so that rendering takes about 16.5 milliseconds. Enough for 60fps, but just barely. However, sometimes, the hardware has to do some extra clipping or something (camera-dependent) that makes it take 16.7 milliseconds or so, which is just past the 16.6667 ms threshold for maintaining 60fps. Since you’re v-sync’d, you’re going to drop to 30fps.

To verify this, turn off v-sync. If you find a mild framerate drop (60 to 58, for example), then this is probably what is happening.
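The rounding effect described in the last two replies can be put into a few lines. A minimal sketch, assuming a simple double-buffered setup where a frame that misses one retrace waits for the next one; the function name is made up for illustration.

```python
import math

def vsynced_fps(render_ms, refresh_hz=60.0):
    """Displayed frame rate when each frame must wait for the next retrace."""
    period_ms = 1000.0 / refresh_hz
    # A frame occupies a whole number of refresh intervals, at least one.
    intervals = max(1, math.ceil(render_ms / period_ms))
    return refresh_hz / intervals

print(vsynced_fps(16.5))  # 60.0
print(vsynced_fps(16.7))  # 30.0
print(vsynced_fps(34.0))  # 20.0
```

A 0.2 ms difference in render time is enough to halve the displayed rate, which fits a hit that appears and disappears with the smallest camera movement.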

michagl · March 3, 2005, 10:16am

Originally posted by 3B:
Basically what happens with vsync on, is that you can only swap buffers when the monitor finishes displaying the current frame (aka at the beginning of the vertical retrace interval, which is triggered by the vertical sync signal, thus the name ‘vsync’), in this case every 1/60 of a second. If you take too long rendering, you miss a chance to swap buffers, and have to wait for the next retrace, so it takes a total of 1/30sec since the previous frame before the new one is displayed, leading to the sudden drop to 30 FPS.
In other words, vsync effectively rounds your frame time up to the next integer multiple of 1 monitor frame, so you get 60/1=60Hz,60/2=30Hz,60/3=20Hz, etc.

seems like you are correct… i disabled the vsync, which unfortunately is not as easy as it should be on a win2k and linux machine with 3 monitors using nview and a pci card. i’m hoping nvidia will eventually fix this bug, in case anyone from nvidia is reading.

anyhow, that seems to explain the sharpness of the hit… it would just cross the tiniest threshold due to hardware culling and then round off to the next edge.

i worry though about the visual effects which might occur from not adhering to vsync… honestly i just mostly use it as a cheap built in performance limiter, and have never really thought of what visual artifacts might occur from being out of sync with the monitor. care to explain anyone?

and finally, as a major bonus to me… i noticed some funny behavior when i had the system set to not render any primitives. i only recently picked this system back up for a couple reasons, but the major reason i’ve stuck with it lately is this really great ‘mosaics’ idea i had in the process.

anyhow, turns out last time i dropped the system… i believe i had to run off to visit portland/seattle out of the blue… anyhow, i was trying out the hardware occlusion solutions out there, which i found completely inadequate, but there was a line of code i had missed to comment out, which was effectively rendering everything twice, once to the occlusion system… and to make matters worse, it was not retrofitted for mosaics, so it was passing an 8bit index buffer as 16bits which meant A) it was overflowing, and B) there was no telling what kind of crazy index values were coming out of combining the bytes in that buffer.

i could’ve sworn the system had a lot more kick when i left it than lately… now it is running incomprehensibly fast… with the new mosaic system and removal of this major bug… so i will have to spend a little time playing with it before i can really comment too much.

however i’m still very interested in pursuing the hardware aspects of this system, as i feel it is fairly special, and could possibly very well be a cornerstone of future graphics systems. it is an ideal candidate for hardware mapping as well i believe… if and when progressive meshing is integrated at the hardware level – much better than the hardware nurbs tesselators out there, though i admit i’m really not familiar with them as a developer.
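What happens when a byte index buffer is read as 16-bit indices can be shown without any graphics code. The sketch below assumes little-endian byte order and a draw call that asks for as many indices as there are bytes; both are assumptions about the buggy path, which the post does not spell out.

```python
import struct

def reinterpret_as_u16(byte_indices, index_count):
    """Read `index_count` 16-bit indices from a buffer that holds 8-bit ones.

    Returns the values that still lie inside the buffer and the number of
    bytes the read would run past its end.
    """
    needed = 2 * index_count
    in_bounds = min(len(byte_indices), needed) // 2
    values = struct.unpack_from("<%dH" % in_bounds, byte_indices)
    overrun = max(0, needed - len(byte_indices))
    return list(values), overrun

mosaic = bytes([0, 1, 2, 0xFF, 3, 4])      # six valid byte indices
print(reinterpret_as_u16(mosaic, len(mosaic)))
# ([256, 65282, 1027], 6)
```

None of the combined values is a valid index into a 45-vertex mesh, and half of the requested indices come from whatever follows the buffer in memory.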

michagl · March 3, 2005, 10:21am

Originally posted by Korval:
What’s probably happening is that most of the time, you’re pushing the hardware so that rendering takes about 16.5 milliseconds. Enough for 60fps, but just barely. However, sometimes, the hardware has to do some extra clipping or something (camera-dependent) that makes it take 16.7 milliseconds or so, which is just past the 16.6667 ms threshold for maintaining 60fps. Since you’re v-sync’d, you’re going to drop to 30fps.

To verify this, turn off v-sync. If you find a mild framerate drop (60 to 58, for example), then this is probably what is happening.

yeah, i think your diagnosis is spot on. i missed your post before posting my last for what it’s worth, or i would’ve given you credit.

michagl · March 3, 2005, 10:43am

ok, first… i was happy because vsync definitely seems to be the reason for the sudden sharpness of the performance hit… but i was still a bit worried, because even with vsync disabled, i still got the gradual performance hit.

i haven’t tested anything, but i’m pretty sure i know where the hit was coming from. it seemed to occur most when the final mosaic was on screen… which happens to be the full mosaic. the reason it would occur, is because the mosaics all share the same buffer, so when the occlusion system would pass crazy index values to hardware, they would still be inside the buffer, and get relatively reasonable values. but when it got near the end of the buffer, the values would jump outside the consolidated buffer, meaning anything could get passed to the hardware for rendering… meaning crazy fill was likely going on, because it was not rendered to the screen buffer, because it was going to offscreen occlusion ‘buffer’ and probably simultaneously to the depth buffer rather.

edit: ok that reasoning really isn’t accurate, and i knew that really when i was writing it… but something similar to that is going on, though more likely the real nature would depend more on the organization of the video memory than system memory. the final factor is, that some mosaics would create weird fill rates, while others might not, because the vertices are all normalized in a local space for optimal precision, and the whole system itself takes place in a normalized space around 1.0… so likely all of the vertices on the card would not have caused serious fill issues, until an index picked outside a vector region.

so that is my theory for the performance lulls… not really a theory though because i’m sure its correct. there probably aren’t any other performance issues out there.

so finally, if anyone feels like they understand what i’m doing, see promise in it, or would just like to be helpful. i’m very open to hardware and opengl api related advice anyone feels like they can contribute. i will try to take names and give credit if you like.

i’m mostly interested in what i can do to optimize parallelism. how most effectively to utilize VBOs. and i would very much like to know if there is any way to make a pact with the driver when uploading VBOs… for instance to tell it, “here is a buffer, it needs to be uploaded, but it won’t change until the buffer needs to be rendered, so take your time uploading if you need to”… i figure there could be some parallelism in DMA (direct memory access) or something, if you can promise not to mess with some system memory, so that the driver can get around to copying it when it feels like it. i don’t see any room for that in the VBO api… does parallelism stop when you need to upload a buffer? is it immediately copied into AGP memory for upload to video memory? or can DMA come around and catch it later… i take it everything must go through AGP memory to get to video memory, correct?

i don’t pretend to understand any of this stuff… so if anyone feels like weighing in. i would appreciate it greatly.

knackered · March 3, 2005, 10:44am

Sorry mate, you started waffling on about gerbils at which point I lost my optimism and gave up reading.
I still haven’t read it. You need diagrams and such like to best describe your algorithm. Oh, and you need to properly analyse it under stress tests and a good profiler…unfortunately, this also requires knowledge of such fundamentals as vsyncing, thread throttling, load analysis, cache coherency…before declaring your method optimal.
Have you read Hoppe’s paper yet? If you consider siggraph’04 as being new, then yes it’s new.

michagl · March 3, 2005, 12:03pm

Originally posted by knackered:
Sorry mate, you started waffling on about gerbils at which point I lost my optimism and gave up reading.

the gerbil is an analogy. solving every possible permutation of a base to apex subdivision is a lot more complex than it sounds. so it’s reasonable to build an empirical database, before attempting an automated algorithm… so you have something to compare your results to. the ‘gerbil’ simulation facilitates building the empirical database. the camera is the gerbil, the environment is actually a model in the works, of the alien/god/satellite known as gaea from the classic scifi novels penned by john varley, Titan, Wizard, and Demon… part of a promotional demo, essentially though it’s a gerbil wheel, or a world with simulated centripetal ‘gravity’, such as the space module from 2001. a ‘gerbil wheel’. anyhow, could’ve been done on a model of earth the same way… but the gerbil wheel seemed more appropriate for the task.

Originally posted by knackered:
I still haven’t read it. You need diagrams and suchlike to best describe your algorithm. Oh, and you need to properly analyse it under stress tests and with a good profiler…unfortunately, this also requires knowledge of such fundamentals as vsyncing, thread throttling, load analysis, cache coherency…before declaring your method optimal.
Have you read Hoppe’s paper yet? If you consider SIGGRAPH ’04 as being new, then yes it’s new.
sorry mate, but i’m not an academic, so all that will have to wait until a complete system is online, and continued development has become an afterthought.

as far as vsyncing, thread throttling, load analysis, and whatnot, that is all quite secondary to the theoretical algorithm… little more than an implementation concern. and i’m not claiming optimal, and certainly not claiming perfect, but if you know your stuff, then you ought to be able to look at the algorithm and apply your knowledge of such hardware and api level concerns.

as far as thread throttling, i’m not sure if you mean managing a multithreaded app, or fooling around with os scheduling, but i’ve stayed away from the windows multi-threading api, because i don’t like it, and haven’t felt like wrapping it… though the thought does cross my mind on occasion.

as for load analysis, i know where my loads are… but if you know of a good free app for analysis, my ears are open.

as for cache coherency, i don’t know if you are referring to cpu, or gpu, but i assume hardware manufacturers don’t expect application programmers to keep up with every little hardware development, and that is why nvidia freely provides tools such as their stripper, to manage such matters.

besides, don’t get me wrong, because i’m nowhere near those stages with this system. a lot more work needs to be done, and there is a sister system as well even further behind in development, which is responsible for stream writing and reading massive multiresolution images to disk with an opengl type interface. it does everything but triangles right now, which i intend to pick up very soon.

end of the story, chill out, i’m not trying to step on anyone’s toes.

edit: sorry about not saying anything about hoppe’s paper… i’ve been meaning to look at it, but a lot has been happening. i’m fairly certain i’ve read it though… i’m assuming it’s his streaming simulation of a washington area mountain region. i didn’t find anything personally interesting in that paper… i could offer impressions, but it was a long time ago that i read it, so my memories probably would not be accurate. of course the paper may be new, but either way i will give it a look.

-- FOLLOW UP --------

yeah, that’s the one i read… though the abstract says it’s the entire USA rather than just a washington area… i just made a generalization on that account. anyhow i don’t remember the details, but as i recall, it is really more of a demo than a real simulation environment… and the fact that it is done on a regular grid renders it fairly useless for non-planar geometry. i recall reading it all before, and finding nothing useful… i probably remember it particularly because i usually do come away with something useful from hoppe’s papers… but not that time.

if you insist i will give it a look… but microsoft’s pdfs are very hard to get at, because they don’t allow downloads, and i’m on a 56k modem in the middle of nowhere, a long way away from the nearest hub… so it’s really more like 28k at best.

knackered · March 3, 2005, 2:27pm

You’re actually saying nothing about your own algorithm. You’re just pompously rambling on to yourself about essentially nothing, self editing your own text using some kind of third persona. Who are you really talking to?
You’ve read Hoppe’s paper? Good, you found nothing interesting in it? Because it concerned itself specifically with planar regular grids? That’s because it’s an optimisation technique for planar regular grids, it wasn’t intended to be another progressive mesh technique. It’s intended for extremely large planar datasets, and it’s the most efficient technique I’ve come across so far for these datasets.
Your technique is for general meshes then? Yet you say you think it’s up to nvidia to provide you with geometry optimisation tools, which suggests you precompute your tessellations offline, which suggests it’s limited to relatively small datasets…either that or there’s a hell of a lot of cpu work going on at runtime.
I have actually read what you’ve said now, and I’m none the wiser, all I see is gerbils playing chess inside “equilateral sierpinski’s”.
For me, try to explain your algorithm in no more than, say, 10 sentences using words of no more than, say, two syllables (avoiding all gerbil analogies, no matter how tempting).
Or, if you prefer, dismiss me with a patronising wave of your keyboard.