OpenGL needs real curved surfaces

OpenGL really needs direct scan conversion of biparametric rational curved surfaces of at least order 4 instead of approximating them with polygon meshes. Here’s why:

  • A good approximation of even simple curves sometimes requires more polygons than current systems can render in real time.

  • The increased expense of scan converting curves compared to scan converting lines is frequently used as a justification for not displaying them directly but instead approximating them with lines. A similar argument is used for approximating curved surfaces with polygon meshes. However, these arguments ignore the fact that the curve or curved surface is scan converted just once, whereas many lines or polygons are often needed for even a reasonable approximation of a curve, and each of those lines is scan converted individually. So even though scan converting a curve is a harder problem than scan converting a line, directly scan converting a curve can turn out to be cheaper than scan converting the lines that approximate it (see below for an analysis), and the same goes for curved surfaces and polygon meshes.

  • It eliminates the need for recursive subdivision, flatness tests, level-of-detail handling, crack removal, and other potentially complex code, all of which might be necessary for good performance and reasonable accuracy with a large number of curved surfaces.

  • Order 3 rational curves and surfaces can exactly represent any conic or quadric. This is important for modelling common objects like spheres, ellipsoids, cones and cylinders, as well as objects less common but important to some applications, such as molecular modelling and astronomy.

  • Order 4 rational surfaces can represent cubic NURBS and other bicubic surfaces, useful for approximating general curved surfaces and many natural objects (e.g. the human body).

  • Control points could be represented as just another sort of vertex, subject to the same projective transforms as other vertices. Correct scan conversion is possible in this case if the curves are rational.

  • If curves and surfaces with the convex hull property were chosen for the internal representation, trivial accept/reject clipping would be easy, and would consist of accepting/rejecting the convex hull of the control points (see the sketch after this list). Nontrivial clipping could be accomplished analytically using parametric subdivision, or it could be carried out after rasterization using simple scissor tests.
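
A minimal sketch of the trivial accept/reject test, assuming clip-space control points; the vec4 type, the classify_patch name and the result enum are illustrative only, not part of OpenGL:

```c
typedef struct { float x, y, z, w; } vec4;      /* clip-space control point */

typedef enum { REJECT, ACCEPT, NEEDS_CLIP } clip_result;

/* Convex hull property: if every control point is outside the same clip
   plane the whole patch is; if every control point is inside all planes
   the whole patch is. n would be 16 for a bicubic patch. */
clip_result classify_patch(const vec4 cp[], int n)
{
    int all_inside = 1;
    int outside_all[6] = { 1, 1, 1, 1, 1, 1 };  /* -x, +x, -y, +y, -z, +z */

    for (int i = 0; i < n; i++) {
        int out[6] = {
            cp[i].x < -cp[i].w, cp[i].x > cp[i].w,
            cp[i].y < -cp[i].w, cp[i].y > cp[i].w,
            cp[i].z < -cp[i].w, cp[i].z > cp[i].w
        };
        for (int p = 0; p < 6; p++) {
            outside_all[p] &= out[p];
            all_inside    &= !out[p];
        }
    }
    for (int p = 0; p < 6; p++)
        if (outside_all[p]) return REJECT;      /* hull entirely outside one plane */
    return all_inside ? ACCEPT : NEEDS_CLIP;    /* otherwise accept, or clip further */
}
```

A bounding box of the control points would work just as well for the trivial tests; the convex hull property only guarantees that the curve or surface stays inside whatever volume encloses its control points.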

I will show that direct rasterization of a parametric cubic curve in 2-space using forward differences is frequently as efficient as scan converting the large number of lines needed for a good approximation. The result should generalize to non-rational bicubic surfaces in 4-space, which, after the division by w carried out by OpenGL, are equivalent to rational bicubic surfaces in 3-space.

Here are some numbers:

Scan converting a line using the efficient midpoint algorithm needs 4 integer subtractions and 3 integer multiplies by 2 (shifts?) for setup per line, and a further 2 comparisons (integer subtractions), 1 integer addition and 1 or 2 increments of an integer by 1 per fragment that intersects the line (Foley et al., 1997).
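
For reference, here is a minimal sketch of that midpoint algorithm for the first octant (0 <= slope <= 1); write_fragment is a hypothetical framebuffer write, and the setup and per-fragment operations correspond to the counts above:

```c
void write_fragment(int x, int y);   /* hypothetical framebuffer write */

void midpoint_line(int x0, int y0, int x1, int y1)
{
    int dx = x1 - x0;                /* setup: 4 subtractions ...        */
    int dy = y1 - y0;
    int d      = 2 * dy - dx;        /* ... and 3 multiplies by 2        */
    int incrE  = 2 * dy;             /* decision increment for an E step */
    int incrNE = 2 * (dy - dx);      /* decision increment for a NE step */
    int x = x0, y = y0;

    write_fragment(x, y);
    while (x < x1) {                 /* per fragment: 2 comparisons,     */
        if (d <= 0) {                /* 1 addition, 1 or 2 increments    */
            d += incrE;              /* midpoint above the line: go E    */
        } else {
            d += incrNE;             /* midpoint below the line: go NE   */
            y++;
        }
        x++;
        write_fragment(x, y);
    }
}
```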

If we assume that on average each curve is approximated by 8 line segments and that each line covers on average 8 fragments, the cost of scan converting the approximated curve will be 32 integer subtractions and 24 integer multiplies by 2 for setup, plus a further 192 integer subtractions/additions and on average 96 increments of an integer by 1. This gives an average total of 224 integer additions/subtractions, 24 integer multiplies by 2 and 96 increments of an integer by 1 for scan converting the lines.

Before the lines are scan converted, however, we must subdivide the curve to produce the line segments. We can use the forward difference subdivision algorithm from Foley et al. I will ignore the per-curve setup cost of this algorithm because it is small (a few adds and multiplies), and because I will use the same algorithm for direct scan conversion, so it is constant for both approaches. The per-point cost of this algorithm is given as 9 additions for a curve in 3-space, which reduces to 6 integer additions for our curve in 2-space; with 8 new points needed beyond the first endpoint that adds 48 additions, for a grand total of 272 integer additions/subtractions, 24 integer multiplies by 2 and 96 increments of an integer by 1, plus a small setup cost which it shares with the direct scan conversion method.

And the approximation is far from perfect. Even if we assume that there are 4 fragments per pixel (unlikely?), the curve will still be visibly made up of short line segments. Even with 8 fragments per pixel there will still be sampling errors, because the sampling rate is below the Nyquist rate of the underlying curve (Foley et al., 1997).

The same curve can be directly scan converted using the same curve subdivision algorithm if, instead of line segments, we draw fragments. The cost per curve is then the number of fragments intersected by the curve (approximately 8*8 = 64) times the number of operations per fragment (6 integer additions, as above), which is 384 integer additions, plus the small constant setup. This is comparable in complexity to scan converting the linear approximation, but the curve is drawn exactly and is completely smooth!
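
Here is a minimal sketch of that direct approach, assuming the cubic is already in power-basis form x(t) = a·t³ + b·t² + c·t + d (and similarly for y); it uses floating point for clarity where a real rasterizer would use fixed-point integer arithmetic, and write_fragment again stands in for the framebuffer write. The same fd_setup/fd_step machinery, stepped more coarsely, also produces the points for the line-segment approximation discussed above:

```c
#include <math.h>

void write_fragment(int x, int y);   /* hypothetical framebuffer write */

typedef struct { double f, df, ddf, dddf; } fwd_diff;

/* Initial forward differences for f(t) = a*t^3 + b*t^2 + c*t + d,
   sampled at t = 0, h, 2h, ... with h = 1/n (Foley et al.). */
static fwd_diff fd_setup(double a, double b, double c, double d, int n)
{
    double h = 1.0 / n;
    fwd_diff s;
    s.f    = d;
    s.df   = a*h*h*h + b*h*h + c*h;
    s.ddf  = 6*a*h*h*h + 2*b*h*h;
    s.dddf = 6*a*h*h*h;
    return s;
}

/* Three additions advance one coordinate to the next sample. */
static void fd_step(fwd_diff *s)
{
    s->f   += s->df;
    s->df  += s->ddf;
    s->ddf += s->dddf;
}

/* Draw the curve (x(t), y(t)) one fragment per step: 6 additions per
   fragment for a curve in 2-space, as argued above. */
void draw_cubic(double ax, double bx, double cx, double dx,
                double ay, double by, double cy, double dy, int n)
{
    fwd_diff x = fd_setup(ax, bx, cx, dx, n);
    fwd_diff y = fd_setup(ay, by, cy, dy, n);
    for (int i = 0; i <= n; i++) {
        write_fragment((int)lround(x.f), (int)lround(y.f));
        fd_step(&x);
        fd_step(&y);
    }
}
```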

For the underlying representation I would suggest a curve with a uniform basis as well as the convex hull property, for ease of computation and clipping. As suggested in Foley et al., curves and surfaces with a non-uniform basis like NURBS could be subdivided into Bézier curves using the Böhm or Oslo algorithms and then the control points multiplied by a basis conversion matrix before being handed to the implementation to display. In OpenGL the most appropriate place for this conversion of forms would be the utility library.
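
As an illustration of the kind of form conversion the utility library could perform, here is a sketch of the standard uniform cubic B-spline to Bézier conversion for one segment (the vec4 type and function names are just for the example):

```c
typedef struct { float x, y, z, w; } vec4;

/* Weighted sum of three control points. */
static vec4 combine(float a, vec4 p, float b, vec4 q, float c, vec4 r)
{
    vec4 out = { a*p.x + b*q.x + c*r.x,
                 a*p.y + b*q.y + c*r.y,
                 a*p.z + b*q.z + c*r.z,
                 a*p.w + b*q.w + c*r.w };
    return out;
}

/* One uniform cubic B-spline segment (p[0..3]) expressed as the
   equivalent Bezier control points (b[0..3]); this is the usual
   basis conversion matrix applied row by row. */
void bspline_to_bezier(const vec4 p[4], vec4 b[4])
{
    b[0] = combine(1.0f/6, p[0], 4.0f/6, p[1], 1.0f/6, p[2]);
    b[1] = combine(0.0f,   p[0], 2.0f/3, p[1], 1.0f/3, p[2]);
    b[2] = combine(0.0f,   p[0], 1.0f/3, p[1], 2.0f/3, p[2]);
    b[3] = combine(1.0f/6, p[1], 4.0f/6, p[2], 1.0f/6, p[3]);
}
```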

Bibliography:

James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes, “Computer Graphics: Principles and Practice”, 2nd ed. in C, Addison-Wesley, 1997.

I do agree!

I don’t.

Before the lines are scan converted, however, we must subdivide the curve to produce the line segments.

The time you as a user spend setting something up is irrelevant; it’s on the CPU, so it happens concurrently with the rendering of other stuff.

comparable in complexity to scan converting the linear approximation

So, how does this prove that curved surfaces are reasonable to do this way? You may have demonstrated it for lines (but you haven’t shown why it’s worthwhile to waste the transistors to make it work), but surfaces are another matter altogether.

Most importantly, you haven’t shown how such a feature would alter basic shader logic. Linear interpolation of parameters across a bezier surface is not reasonable. So at the very least, you’re going to have to interpolate each varying in a different way.

Also, since you’re dealing with an “exact” surface, you’ll want the “exact” normals for it, which will require some computation that the shaders normally wouldn’t have access to.

So no, this idea is not very well thought out or justified.

hm,

Graphics adapters are fast; today’s systems can even render pixel-sized triangles filling the whole screen at a high frame rate.

Because curved surfaces are not planar, you lose backface culling, which makes them 50% slower.
You also lose a lot of rendering performance with algorithms that rely on early z-culling, since you can’t partially draw your “super”-primitive, draw another, and then return to the old one -> a lot of wasted fragment-shader cycles.

I would rather spend the transistors pushing the maximum-triangles-per-second limit instead of implementing exact curve rendering,
because I have a lot of complex objects that couldn’t be expressed through curves but can be expressed much more easily and efficiently through many triangles.
You should look at current graphics applications and notice that primitives (or at least bump-mapped detail) approach pixel size without approximating a curved surface.

This is because nature is not so curved (ahem, except girls :stuck_out_tongue: ) that you need a special implementation for it.

Isn’t the tessellation unit in next-generation hardware (DX11) supposed to do something like this?

If they really do create this “tessellation unit,” I am curious to see how it would interact with the other stages in the pipeline: the vertex shader, geometry shader, fragment shader, etc… it seems like it would be better if they were to improve upon the geometry shader instead of adding another stage, but I’m not sure. :frowning:

I don’t think the tessellation unit has a future (ATI TruForm, anyone?).
Am I wrong in thinking that one of the uses of the geometry shader can be curve tessellation?

AFAIK what can be accomplished with the GS is quite limited with regard to subdivision (I think most of the limitations are related to performance problems with the feedback to the VS). There are some hints at what is being considered for DX11 at:
http://download.microsoft.com/download/e…f%20DirectX.zip

Correct me if I’m wrong, but wouldn’t the tessellator have to feed back into GPU RAM between each pass, using much the same mechanism transform feedback does now?

The way I understand subdivision with GS is, the output from the GS can be stored in some (vertex) buffer and fed back into the pipeline in another pass. The first subdivision pass would not actually render anything. This feedback mechanism is necessary to overcome the current limitations on the number of vertices (which seems to be a hardware limitation). Later passes would determine when enough-is-enough and let the vertices proceed to clipping, rasterizing etc. In any case, the product of subdivision can be made available to an application (using the transform_feedback extension). The tessellator unit seems to run in a single pass before the input assembler and it looks like there’s no need for a separate feedback mechanism (other than the existing). No idea what the hardware limits will be like and how it will interact with the pre/post vertex cache. I guess if the TU is being introduced to reduce the storage/bandwidth required for rendering detailed meshes by performing subdivision in hardware, it wouldn’t make much sense having to store and submit temporary results on the client side?!
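
As a rough sketch of what one such GS subdivision pass might look like with the existing feedback mechanism (GL 3.0-style calls; subdiv_prog and the buffers are hypothetical, an extension loader is assumed, and glTransformFeedbackVaryings would have to be set before linking the program):

```c
#include <GL/gl.h>   /* assumes a GL 3.0 context and an extension loader */

/* One subdivision pass: run the geometry shader with rasterization
   discarded and capture its output into a buffer, which can be used
   as the vertex source for the next pass (or for final rendering). */
void subdivision_pass(GLuint subdiv_prog, GLuint src_vbo, GLuint dst_vbo,
                      GLsizei in_vertices)
{
    glUseProgram(subdiv_prog);
    glBindBuffer(GL_ARRAY_BUFFER, src_vbo);
    /* ... set up vertex attribute pointers for the source buffer ... */

    glEnable(GL_RASTERIZER_DISCARD);            /* nothing drawn this pass */
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, dst_vbo);
    glBeginTransformFeedback(GL_TRIANGLES);
    glDrawArrays(GL_TRIANGLES, 0, in_vertices);
    glEndTransformFeedback();
    glDisable(GL_RASTERIZER_DISCARD);
}
```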

The feedback mechanism has no choice but to output to a VBO.
The tesselator will likely run on the GPU and use some internal cache memory.

I don’t understand why they will add the tessellator. Perhaps there is a PDF that explains it. Well, I guess it is not out yet, so no PDF yet?

it wouldn’t make much sense having to store and submit temporary results on the client side?!

Transform feedback outputs into a VBO, and so the client side is not sent anything. But, as V-man suggested, the tessellator unit would probably use the faster shared memory instead. Hopefully transform feedback is improved enough to where it can replace this specific “tessellator” pipeline stage.


Transform feedback outputs into a VBO, and so the client side is not sent anything.

I don’t know why I assumed the driver had to keep a local copy :slight_smile:

Those slides contain some information about how future tessellation hardware will work.

http://developer.download.nvidia.com/presentations/2008/GDC/Inst_Tess_Compatible.pdf

Interesting, thanks for the link.

The pipeline description in the slides posted by ScottManDeath looks different from the one I posted a link to. It seems a bit odd to place the evaluation shader after the vertex shader, doesn’t it? E.g. the view-space transformation, usually done in the vertex shader, has to be postponed to the evaluation shader. Is the VS idle in this case? I guess we’ll know in time exactly how things fit together…

So they are using a coarse mesh and a displacement map as input for the evaluator, meaning the “real” evaluation of the underlying mathematical representation of the surface is left to some preprocessing on the CPU, when generating the displacement map.

Did I get that right?

CatDog

If the view-space transform is applied to the control points in the VS, then the tessellator is working directly in screen space, so the triangles it generates don’t need any further transformation.
The screen depth is needed in order to calculate distance-based level of detail, and the transformed normal is needed to find silhouette edges, so you would have no way to control the tessellation LOD if it were done before the VS.
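
For example, a distance-based LOD could be as simple as deriving a tessellation level from the projected length of each control-cage edge; this is just a sketch of the idea (the names and the target pixel size are made up), not any actual hardware interface:

```c
#include <math.h>

/* Pick a tessellation level so that each generated edge covers roughly
   target_px pixels of the projected control-cage edge. */
int edge_tess_level(float edge_len_pixels, float target_px, int max_level)
{
    int level = (int)ceilf(edge_len_pixels / target_px);
    if (level < 1) level = 1;
    if (level > max_level) level = max_level;
    return level;
}
```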

The evaluator uses the patch as control points to generate a tessellated representation of a curved surface.
The displacement map just adds fine detail; if you just want a curved surface then you don’t need to have one.

But an extra two shaders? Just so you can have hardware to subdivide a bezier curve? This seems overly complicated to me.
The geometry shader can already subdivide complex curves using triangles with adjacency so there is no need for a separate stage.
All of the problems mentioned as reasons not to use the GS are easier to fix by adding to the existing GS rather than creating three new stages that just duplicate what the GS is supposed to be able to do already.

The concept of a ‘patch’ to represent a NURBS surface also seems to complicate model generation.
I usually make objects from primitives like cylinders and spheres that form a continuous surface that I add detail to, without ever having an edge like a patch has.
I would have to break this into patches for rendering, but this would not be compatible with my animation software, which assumes a continuous mesh, and which could transform a patch into a shape that the tessellator would not work with.

A continuous triangular mesh with normals to specify the curvature is much simpler to work with and would actually require less buffering of intermediate results on the GPU, as you are feeding a smaller area of the model into the pipeline at a time, so it would be generating fewer triangles.

What i would like to see is no limit at all on the number of triangles that a GS can emit directly to the pipeline.
If memory space on the GPU is insufficient to cache all of the generated triangles then the GS program could simply be paused at the emit instruction and its state saved until the cached triangles have been processed, then it can be resumed.

I can understand the evaluation shader, and also the patch shader, but less so; I guess it’s there to provide optimal flexibility, but in all practicality my guess is that people will just cut and paste a standard shader there.
Sure, the GS can handle tessellation, but the GS was never built to handle this amount of tessellation, and it’s also not neighbor-aware, as the patch shader and the tessellator need to be.
The GS also has other things to do, so it’s better if the GS can stay the same whether or not there is tessellation.
Really, if you’re crass about it, we would only ever need two shaders, vertex and fragment; these two could theoretically fill the need for all the vertex, geometry, tessellation, fragment, blend, texture shaders and so on.

But dividing things into lots of pipeline stages does make sense, as it allows the GPU to throttle rendering better by having small, fast shaders output their results to a buffer and then exit, leaving ample room for the fragment shader to work in (the GS would probably hog a few processors and never release them). It also reduces the rendering buffers (the ones used while rendering to store temporary vertex and fragment data) and makes things easier for the programmers. I mean, did you ever think that there would be a specific tessellation circuit in the GPU? No, it’s all software, or at least will be all software in the end.

I look at these stages as the first in a new flexible pipeline with multiple stages to pick and choose from, all run in software.

Primitives can easily be broken up into patches; all shapes can, it’s just a matter of how smart the converter is. Besides, in the example from the GDC PDF they convert polygons to patches.
In the end it’s all just triangles and little else.