Native Bezier curve & surface patch rasterization

My wish/suggestion for OpenGL 5.0 is to enhance the rasterizer to also natively rasterize rational cubic Bezier curve patches and rational bicubic Bezier surface patches.

Please read OpenGL 4.0 tessellation and then let us know what stops you from implementing it.

Please read OpenGL 4.0 tessellation and then let us know what stops you from implementing it.

To be fair, he asked for direct rasterization of patches. Turning them into triangles isn’t rasterizing them.

That being said, I’m generally of the opinion that if triangles are good enough for Pixar, they’re good enough for everyone else too.

I guess this feature request is more about hardware changes than OpenGL. The rasterizer is still one of the fixed-function blocks.

This feature request is about both hardware changes and OpenGL changes (though more about the hardware). That’s why I suggested it for OpenGL 5.0 rather than OpenGL 4.2. The OpenGL ARB members communicate with the hardware folks; neither works in isolation.

Triangles may have been good enough for Pixar, but that doesn’t mean that there isn’t something better. Clearly, natively curved surfaces are better for modeling curved surfaces than triangles are, and for many reasons. If you want to use Pixar as the standard, then you could also say that if several hours of CPU time spent rendering each frame is good enough for Pixar, then several hours spent rendering each frame is good enough for everyone else, too. And if needing a render farm of thousands of computers and terabytes of memory is good enough for Pixar, then needing a render farm of thousands of computers and terabytes of memory is good enough for everyone else, too.

Triangles have been the basis of computer graphics since before the integrated circuit was invented, back when CPU clock rates were measured not in gigahertz, nor megahertz, but in kilohertz.

Today, single GPUs have as many as 3 BILLION transistors operating at multi-GIGAHERTZ clock rates. Why remain shackled to the limitations of the distant infancy of computer graphics, when only triangles were feasible?

mfort, I have more than read about OpenGL 4.0 tessellation, I have been programming it for over two months. It is GREAT! It is a true advance for the OpenGL pipeline. But it still results in making significantly sized triangles and from that point forward the hacks always used with triangles to make them appear to be curved, like interpolating a curved surface’s normals between a planar triangle’s vertices, still have to be used.

Each generation of integrated circuit technology (about 18 months) roughly doubles the number of transistors available, so Nvidia and AMD need to come up with something to do in their GPUs with 3 BILLION more new transistors. They ought to be able to come up with some dramatic advances to GPUs with that kind of transistor budget.

If you want to use Pixar as the standard, then you could also say that if several hours of CPU time spent rendering each frame is good enough for Pixar, then several hours spent rendering each frame is good enough for everyone else, too. And if needing a render farm of thousands of computers and terabytes of memory is good enough for Pixar, then needing a render farm of thousands of computers and terabytes of memory is good enough for everyone else, too.

By this statement, you’re essentially suggesting that Pixar uses triangles, not because they’re demonstrably faster at rasterization than any possible bezier surface rasterization, but because they don’t know any better. As you point out, they spend hours rendering a single frame. If there were techniques available to speed this up, why wouldn’t they make them?

I’m afraid I’m going to need to see some evidence before I take your word over a multiple-award-winning pioneer in computer graphics technology, whose technology powers most of the movie industry’s special effects, and who has a CG research budget in the millions focused on finding techniques to make their stuff go faster.

Today, single GPUs have as many as 3 BILLION transistors operating at multi-GIGAHERTZ clock rates. Why remain shackled to the limitations of the distant infancy of computer graphics, when only triangles were feasible?

Because, even for “3 BILLION transistors,” directly rasterizing a Bezier surface is slower than tessellation and triangle rendering. The algorithm itself is fundamentally slower. It would take up precious transistors on their cards, which could have been used for more shaders, all for a feature most people wouldn’t use because tessellation and triangles is faster.

Also, I would point out that uniform, rational cubic surfaces (the ones you asked for direct rasterization of) are not exactly the most user-friendly modelling tool. Modelers would prefer NURBS or subdivision surfaces.

Lastly, GPUs generally do not have multi-Gigahertz clock speeds. Indeed, most of AMD’s GPUs don’t even crack 1 GHz.

They ought to be able to come up with some dramatic advances to GPUs with that kind of transistor budget.

Can you back this statement up with any particular facts, or is it just your general impression that there are a lot of transistors, so they should be doing things that you think are significant?

FYI:

Take a look for GL_NV_draw_path… for Tegra… so we are talking much less capable hardware… it is drawing paths (so not surfaces) but you get the idea…

Underneath, chances are it uses both shaders and triangulation to get the job done.

One thing to keep in mind, or at least I keep in mind: since lighting can be done per pixel (and usually is on desktop nowadays), the real issues are silhouettes and texture mapping when one does not want to use “just triangles”.

As for my thoughts: I can so see the temptation for, say, “draw bezier curves”… I can also see how it might be “faster” for the hardware to rasterize that instead of a bunch of triangles… but there are stickies:

  1. making the rasterization per-pixel accurate?
  2. interaction with the rest of the pipeline? The GL_NV_draw_path extension does NOT allow for vertex shaders, and the fragment shader for it is something different
  3. “interpolation” of values across the bezier primitive
  4. clipping

To be honest, in GL4 hardware, I am not too convinced of the potential wonderfulness of having dedicated beans to draw bezier surfaces over using GL4’s tessellation API. There is a case for per-pixel perfection of the surface (this is for the silhouette) but given all the non-trivial things listed above, I am wondering how it would integrate into the GL4 pipeline… GL_NV_draw_path is for GLES2 and it cuts out the vertex shader entirely and is really for 2D things.

Here we go again, Alfonse…

By this statement, you’re essentially suggesting that Pixar uses triangles, not because they’re demonstrably faster at rasterization than any possible bezier surface rasterization, but because they don’t know any better. As you point out, they spend hours rendering a single frame. If there were techniques available to speed this up, why wouldn’t they make them?
I suggested nothing of the kind.

There are techniques to speed up the hours that it takes them to render each frame. Part of the reason it takes them so long is that they use software renderers, not something like OpenGL on GPUs. To speed their renders up, all they need to do is invest years and billions of dollars to implement their renderers in hardware. But after investing those years and billions, they would no longer be able to modify, enhance, or update their rendering algorithms, without investing years and billions more to do it all over again. So, for them, they have decided it is better to leave their renderers in software running on computers and tolerate the hours it takes them to render each frame.

To try to keep this discussion from getting sidetracked (again), I’ll refresh your memory that you brought up Pixar in this discussion as the standard bearer, not me. I merely was trying to show the logical and necessary conclusions of making Pixar the standard by which OpenGL and real-time computer graphics hardware should be judged.

If you really want to go down that road, Pixar uses ray traced rendering algorithms. Show me where that feature in OpenGL is.

I’m afraid I’m going to need to see some evidence before I take your word over a multiple-award-winning pioneer in computer graphics technology, whose technology powers most of the movie industry’s special effects, and who has a CG research budget in the millions focused on finding techniques to make their stuff go faster.
You want evidence from me that curved surfaces are better representations of curved surfaces than triangles are? I know you’ve got a lot of experience programming OpenGL, so I can’t believe that you are actually blind. Computer graphics is an odd field for a blind man to get into. Come on already. It shouldn’t take a genius to recognize that a curve better represents a curve than a straight line does.

And how do you know that Pixar’s “research budget is in the millions focused on finding techniques to make their stuff go faster?” I seriously doubt that statement.

Because, even for “3 BILLION transistors,” directly rasterizing a Bezier surface is slower than tessellation and triangle rendering. The algorithm itself is fundamentally slower. It would take up precious transistors on their cards, which could have been used for more shaders, all for a feature most people wouldn’t use because tessellation and triangles is faster.
Well, I really don’t care how it is implemented, just so that it occurs at the rasterization step of the pipeline. It might very well be done by tessellation, but at pixel resolutions rather than at gross triangle resolutions.

When you have a budget of six billion transistors, and sell products made with the integrated circuits containing them for a hundred dollars or so, I think the word “precious” is a bit of a stretch. Here, let me do the math for you:

$100 / 6,000,000,000 transistors = $0.000000017 / transistor

If all Nvidia and AMD can think to do with 3 BILLION more new transistors is to add more shaders, then I don’t think they’ll be selling many more new GPUs. Nearly all consumers are perfectly content with something less than the top-of-the-line GPUs, and the principal differentiating characteristic between the top-of-the-line GPUs and their lower cost same-generation cohorts is the number of shaders.

Besides, if just adding more shaders and memory bandwidth is the only goal, then there’s little need to design new GPUs, just design new graphics boards with multiple GPUs on them. That would save Nvidia and AMD hundreds of millions of dollars of GPU design costs.

I’m afraid I’m going to need to see some evidence before I take your word that most people won’t use a feature if there are faster but lower quality alternatives. Here’s my thinking: many people seem to like quality graphics more than they like fast graphics. If they didn’t, most people wouldn’t use things like 3D, shadowing, multisample antialiasing, high resolutions, textures, per-pixel shading, really, just about anything that makes computer graphics interesting. I mean, there’s nothing faster than flat shading a really tiny number of lines in 2D. If people were really more interested in speed than in quality, then we’d all still be playing Asteroids or Pac-Man today, but with frame rates in the tens of thousands per second, rather than tolerating the measly frame rates of current visually stunning 3D games.

Also, I would point out that uniform, rational cubic surfaces (the ones you asked for direct rasterization of) are not exactly the most user-friendly modelling tool. Modelers would prefer NURBS or subdivision surfaces.
Really? I’m afraid I’m going to need to see evidence of that rather than just take your word for it. How many modelers are using subdivision surfaces? NURBS are just a compressed form of a sequence of rational Beziers. They can easily be exactly converted to rational Beziers. In any case, rational Beziers are far easier and far faster to evaluate and far simpler to represent than either subdivision surfaces or NURBS surfaces. In other words, if modelers want to use NURBS, let them. It’s easy enough to directly convert them to the equivalent rational Beziers on the CPU before sending them to the GPU. It doesn’t make sense to do a one-time conversion in hardware that can very efficiently be handled in software. Think of all your “precious” transistors, man!
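To give a sense of how little work that conversion is, here is a minimal sketch for one span of a uniform cubic B-spline. The function name is mine, it is written in GLSL syntax only because that’s the language of this thread (in practice it would be a few lines of C on the CPU), control points are assumed to be in homogeneous (weight-multiplied) form so the same arithmetic covers the rational case, and a general NURBS would first need knot insertion to reduce it to this uniform case:

```glsl
// Convert one span of a uniform cubic B-spline (control points p0..p3,
// already weight-multiplied for the rational case) into the equivalent
// cubic Bezier control points b[0..3].
void bsplineSpanToBezier(in vec4 p0, in vec4 p1, in vec4 p2, in vec4 p3,
                         out vec4 b[4])
{
    b[0] = (p0 + 4.0 * p1 + p2) / 6.0;
    b[1] = (2.0 * p1 + p2) / 3.0;
    b[2] = (p1 + 2.0 * p2) / 3.0;
    b[3] = (p1 + 4.0 * p2 + p3) / 6.0;
}
```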

Lastly, GPUs generally do not have multi-Gigahertz clock speeds. Indeed, most of AMD’s GPUs don’t even crack 1 GHz.
Perhaps that’s so. I was comparing current GPUs to the kilohertz clock rates of computers in use during the infancy of computer graphics when triangles became the standard. So, the factor of improvement is only on the order of 1,000,000 rather than 2,000,000. My bad.

Can you back this statement up with any particular facts, or is it just your general impression that there are a lot of transistors, so they should be doing things that you think are significant?
I think it’s pretty obvious. Look what GPUs can do today with 3 billion transistors. I think it’s pretty amazing. Maybe you disagree. So, you want me to prove that something significant can be done with 3 billion more transistors? Not that the best we can hope for is just more shaders that all do exactly the same things that today’s current shaders can do? Rather than take your word that nothing better can be done to improve the quality of real-time computer graphics, and all that can be or should be achieved is only doing what’s already possible today, but faster, I’m going to put my faith in the continued ability of Nvidia’s and AMD’s engineers to make more advances, and for the many computer graphics researchers working around the world to continue making more advances.

Lastly, I have to wonder what on earth you’re doing on this forum? This forum is titled “Suggestions for the next release of OpenGL.” Apparently the only suggestion that interests you is: make it faster. Consider your suggestion made. Now, please let me make mine.

If you really want to go down that road, Pixar uses ray traced rendering algorithms. Show me where that feature in OpenGL is.

I never said that Pixar and OpenGL should be the same. I said that if something wasn’t good enough for Pixar, then you need to justify why it’s important enough to be in OpenGL. If Pixar is deliberately not using this when they could be, you need to explain why OpenGL needs it, when Photorealistic Renderman easily trounces the output of any OpenGL program.

Also, Photorealistic Renderman only uses ray tracing where necessary. Fundamentally, it is a REYES-based triangle rasterizer.

You want evidence from me that curved surfaces are better representations of curved surfaces than triangles are?

On the one hand, you have direct bezier rasterization. For a given set of spline patches, at a given resolution, you get a particular image. Let’s call this case A.

For a given angle of view and for a given resolution, you can tessellate this into pixel-sized triangles. This would give you the exact same image as direct bezier rasterization. Let’s call this case B.

Cases A and B produce the same image. What you are suggesting is that the transistors necessary to make case A possible cannot be used to make case B run faster than case A runs. Because if case B runs faster than case A, there’s no need for direct spline rasterization at all. The tessellator and triangle rasterizer would be fast enough to generate and consume pixel-sized triangles, thus giving equivalent image quality.

For a given resolution, of course.

What I want is evidence that the transistors necessary to make case A work and run fast cannot be used to make case B run faster than case A.

Furthermore, I want evidence that the visual gain from modelling curves exactly is worth the performance loss. Remember: you can only see the triangular approximation at silhouette edges. And for a decent triangular approximation, you can only see those artifacts when the camera is very close to the object. This is something that proper use of tessellation could fix.

Besides, if just adding more shaders and memory bandwidth is the only goal, then there’s little need to design new GPUs, just design new graphics boards with multiple GPUs on them. That would save Nvidia and AMD hundreds of millions of dollars of GPU design costs.

A gross oversimplification of reality. Multiple GPUs don’t scale that much better than multiple CPUs. Also, graphics boards are power-limited: the PCIe specification maxes out at 300 Watts. So you can’t just stuff more GPUs onto a board and expect them to get better.

Plus, new architectures can increase performance per unit area, which decreases the GPU’s cost per unit of performance. Lastly, performance per watt matters, more so today than at any time in the past. Multiple GPUs are generally not particularly good at delivering performance per watt.

So even in the world where GPUs no longer get new features, there would still need to be R&D. It’s not like CPUs have been getting lots of new features.

Most important of all, I never said that IHVs should just be focused on making cards faster. But I don’t buy that the tradeoffs for spline rasterization are there. Your argument is based solely on your belief that 3 billion transistors are a lot (by some measure), and that spline rasterization wouldn’t take very much of this.

Really? I’m afraid I’m going to need to see evidence of that rather than just take your word for it. How many modelers are using subdivision surfaces?

The modelers at Pixar pioneered using subdivision surfaces for modeling.

Lastly, I have to wonder what on earth you’re doing on this forum? This forum is titled “Suggestions for the next release of OpenGL.” Apparently the only suggestion that interests you is: make it faster. Consider your suggestion made. Now, please let me make mine.

I was offering commentary on your suggestion.

Most suggestions in this forum are poorly thought out “I want X,” given without even the slightest thought to the consequences of implementing X.

Yes, it’d be great if beziers could be directly rasterized without impacting performance to a degree that would make the visual quality gains worthless. And wouldn’t it be great if none of us had to go to the bathroom or could survive without having to stop to eat and sleep?

We live in the real world. Wishing for stuff does not make it happen. Wishing for stuff that can’t happen is simply being unrealistic and wasting the reader’s time, further diluting any of the actually well-thought-out suggestions on this forum. And believing that the stuff in question can happen simply because you have the notion that 3 billion transistors is sufficient to implement this at the required performance (despite not demonstrating any knowledge of how transistor counts correlate to hardware features) is simply founding this unrealistic expectation in ignorance.

What I want is evidence that the transistors necessary to make case A [direct Bezier rasterization] work and run fast cannot be used to make case B [tessellate into pixel-sized triangles] run faster than case A.
You’re right, I don’t know how many transistors it takes to implement case A or case B, or whether for a given number of triangles case B will always be faster than case A. I don’t even know if there’s been suitable research into direct rasterization of Beziers. As long as it’s not possible to implement case A so that it performs faster than case B, then implementing case B is preferable. However, case B has not yet been implemented, either… at least not in OpenGL 4 hardware. There is a limited number of divisions that the tessellation primitive generator can make (currently, 64). Furthermore, those 64 divisions are across parametric space, not window space, so if a curve is larger than the viewport, there will be fewer than 64 divisions across the entire viewport.

A gross oversimplification of reality. Multiple GPUs don’t scale that much better than multiple CPUs. Also, graphics boards are power-limited: the PCIe specification maxes out at 300 Watts. So you can’t just stuff more GPUs onto a board and expect them to get better.
Not much to argue with there. I’ll point out, though, that many power hungry graphics boards have external power supply connections, direct from the power supply rather than through the PCIe bus (or in addition to the PCIe bus), to get around the PCIe bus power limitations.

So even in the world where GPUs no longer get new features, there would still need to be R&D. It’s not like CPUs have been getting lots of new features.
CPUs aren’t getting many new features anymore because nobody can think of many new features they need. But until GPUs can render photorealistically at real time frame rates, there’ll still be more features to add.

I don’t buy that the tradeoffs for spline rasterization are there. Your argument is based solely on your belief that 3 billion transistors are a lot (by some measure), and that spline rasterization wouldn’t take very much of this.
Actually, my argument is based on the belief that it would be a beneficial feature to add direct rasterization of curved surfaces. However, you are correct that I regard 3 billion transistors as a lot, and by any measure. I never hinted at how many transistors it might take to implement direct rasterization of Bezier surfaces. I have no idea how many it would take. Do you? I’ll leave that to the engineers at AMD and Nvidia to determine. Obviously, my suggestion only makes sense if the number of transistors required to implement it is no more than a reasonable fraction of the 3 billion new transistors that can be expected to become available with the next generation of integrated circuit technology.

I was offering commentary on your suggestion.
And then some.

Most suggestions in this forum are poorly thought out “I want X,” given without even the slightest thought to the consequences of implementing X.
I guess that’s the point of having a public forum. I don’t have a problem if people make suggestions, even if they don’t understand all the consequences. I don’t think there should be a requirement that only integrated circuit designers with extensive knowledge of the internals of OpenGL be allowed to make suggestions here. Those folks probably work for Nvidia or AMD already, anyway, so they’ve got the ear of their bosses.

Yes, it’d be great if beziers could be directly rasterized without impacting performance to a degree that would make the visual quality gains worthless. And wouldn’t it be great if none of us had to go to the bathroom or could survive without having to stop to eat and sleep?
Alfonse, are you an integrated circuit designer? Do you know how many transistors it takes to implement the direct rasterization of bicubic Beziers? Because, it seems your critique of my suggestion really comes down to your belief that it takes too many transistors to implement it.

We live in the real world. Wishing for stuff does not make it happen. Wishing for stuff that can’t happen is simply being unrealistic and wasting the reader’s time, further diluting any of the actually well-thought-out suggestions on this forum. And believing that the stuff in question can happen simply because you have the notion that 3 billion transistors is sufficient to implement this at the required performance (despite not demonstrating any knowledge of how transistor counts correlate to hardware features) is simply founding this unrealistic expectation in ignorance.
That’s just more commentary on my suggestion? It seems a little more like commentary on me. This is a forum for suggestions. If you feel mine is wasting your time, I have another suggestion: don’t read it. But, thanks for the warm welcome.

Regarding transistors, I am a little confused. At some points, you referred to them as precious, and at other points you indicated that 3,000,000,000 of them is not many. (I have the silly notion that 3,000,000,000 transistors on a single integrated circuit is a lot. What an ignorant, unrealistic fool I must be.) I think it’s interesting to note that the first microprocessor, the Intel 4004, had just 2,300 transistors. That’s less than one millionth of just the additional new transistors that will be available with the next generation of integrated circuits.

I hope Nvidia’s and AMD’s engineers have more imagination and vision than you.

kRogue, I googled GL_NV_draw_path a bit and didn’t get many hits, and I didn’t find anything that gave me a solid idea of its details. Probably Nvidia has better info on their website, I should go search around there.

For me, I think the expected performance improvement and ease of use are the main benefits. In OpenGL 4, if you tessellate to the maximum level (64 divisions) the silhouette and texture mapping can actually be pretty good for bicubic Beziers, and if the maximum number of tessellation levels were increased to, say, 128 divisions, it would be even better. But, at that level, that would be 32,768 triangles per quad patch (128 × 128 quads, two triangles each). With per pixel shading, the shading quality depends very much on the tessellation levels used. (For quad patches, there are six tessellation levels to set, and they need to be carefully balanced against each other, but also in a way that matches the adjoining patches so there will be no gaps between patches.)

What I’ve found to be the hardest thing to program really effectively is the tessellation control shader (for setting the tessellation levels). The simple solution is to always tessellate at the maximum (finest) level, but that causes a huge and unnecessary performance hit if there are lots of small (in screen size) patches. Equally simple is to always use a fixed smaller tessellation level. But that causes large (in screen size) patches to have the obvious quality problems. Getting a smart, effective tessellation control shader to always pick the ideal tessellation levels is really pretty tricky.
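For what it’s worth, here is a minimal sketch of the kind of tessellation control shader I mean, sizing the levels from the screen-space lengths of the patch edges. The uniform names (mvp, viewportSize) and the target of roughly eight pixels per tessellated segment are my assumptions, the corner indices assume a row-major 4×4 control grid, and the pairing of outer levels to edges should be double-checked against the quad-domain convention; it’s a sketch, not a recipe.

```glsl
#version 400 core
// Sketch: screen-space adaptive tessellation levels for 16-point bicubic patches.
layout(vertices = 16) out;

uniform mat4  mvp;                      // model-view-projection (assumed name)
uniform vec2  viewportSize;             // in pixels (assumed name)
uniform float pixelsPerSegment = 8.0;   // arbitrary quality target

// Approximate on-screen distance, in pixels, between two control points.
// (Ignores points behind the eye; a real version has to handle that.)
float screenLength(vec4 a, vec4 b)
{
    vec4 ca = mvp * a;
    vec4 cb = mvp * b;
    vec2 sa = (ca.xy / ca.w) * 0.5 * viewportSize;
    vec2 sb = (cb.xy / cb.w) * 0.5 * viewportSize;
    return distance(sa, sb);
}

float levelFor(vec4 a, vec4 b)
{
    return clamp(screenLength(a, b) / pixelsPerSegment,
                 1.0, float(gl_MaxTessGenLevel));
}

void main()
{
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;

    if (gl_InvocationID == 0) {
        // Corners of the 4x4 control grid, assuming row-major layout.
        vec4 p00 = gl_in[ 0].gl_Position;
        vec4 p03 = gl_in[ 3].gl_Position;
        vec4 p30 = gl_in[12].gl_Position;
        vec4 p33 = gl_in[15].gl_Position;

        // Each outer level depends only on that edge's endpoints, so two
        // patches sharing an edge compute the same level and leave no gap.
        gl_TessLevelOuter[0] = levelFor(p00, p30);
        gl_TessLevelOuter[1] = levelFor(p00, p03);
        gl_TessLevelOuter[2] = levelFor(p03, p33);
        gl_TessLevelOuter[3] = levelFor(p30, p33);

        gl_TessLevelInner[0] = max(gl_TessLevelOuter[1], gl_TessLevelOuter[3]);
        gl_TessLevelInner[1] = max(gl_TessLevelOuter[0], gl_TessLevelOuter[2]);
    }
}
```

Even this only looks at the corner control points, so a patch whose interior bulges toward the camera can still end up under-tessellated, which is exactly the part I find tricky to get right.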

Re: your thoughts:

  1. I think the rasterization could be made per pixel accurate, but that may be a hard problem to solve.

  2. I think the interaction with the rest of the pipeline could be very similar to the way the GL_PATCH primitive used with the tessellation shaders is handled in OpenGL 4. The vertex shader already is capable of processing points in GL_PATCH primitives.

The tessellation shaders, if present, would consume the native Bezier patch just as they currently consume all GL_PATCH primitives. Only if there is no tessellation shader stage active would the Bezier patch go through to the geometry shader.

The geometry shader could be extended for native Bezier patches in the same way that it already deals with other primitives: the patch control points could be referenced as elements of the gl_in[].gl_Position array.

The fragment shader would never see Bezier patches because they’d be consumed in the rasterization step. However, there would probably be need for some new builtin GLSL fragment shader variables, to provide certain information such as the surface normal and parametric location of the surface (or curve) fragment.

  3. I think the interpolation of values across the Bezier primitive only needs to be done at the end of the rasterization step, and that could be accomplished by interpolating between variables associated with the corners of each patch, using the parametric location of the surface fragment to perform the interpolation (see the sketch after this list).

  4. Clipping could probably be implemented in a couple of stages. First stage, cull the Bezier’s convex hull against the clip volume. Second stage, just discard fragments outside the clip volume rather than finding the exact intersection (there could be several) of the surface analytically with the clip volume.
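To make point 3 concrete, here is a purely hypothetical fragment-shader sketch. None of these inputs exist today; I am pretending the Bezier rasterizer hands each fragment its parametric location (spelled here as an ordinary input, patchUV, so the sketch compiles as ordinary GLSL) along with attributes attached to the four patch corners, and then the interpolation is just a bilinear blend:

```glsl
#version 400 core
// Hypothetical: patchUV and cornerColor[] would be produced by the Bezier
// rasterizer; they are declared as plain inputs only so this compiles today.
in vec2 patchUV;          // (u,v) of this fragment on the patch
in vec4 cornerColor[4];   // attributes at corners (0,0), (1,0), (0,1), (1,1)
out vec4 fragColor;

void main()
{
    // Bilinear blend of the corner attributes by the parametric location.
    vec4 bottom = mix(cornerColor[0], cornerColor[1], patchUV.x);
    vec4 top    = mix(cornerColor[2], cornerColor[3], patchUV.x);
    fragColor   = mix(bottom, top, patchUV.y);
}
```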

But, I see your concerns. I think your first and fourth points are especially sticky.

I have mixed feelings about this suggested feature. On the one hand, if Bezier surfaces are natively rasterized, then the geometry shader can’t really be used to do very much interesting stuff with the patches. So, if you wanted to do fine scale deformation to a Bezier surface, for example, that wouldn’t be possible with a Bezier rasterizer. Then again, how often do you need fine scale deformation of a Bezier surface? In those cases, one would just use the standard tessellation stages to take over the processing from the Bezier rasterizer.

Thanks for your thoughts about this. They’ve provided good insights.

Take a look here for the beans on GL_NV_draw_path:

http://developer.download.nvidia.com/tegra/docs/tegra_gles2_development.pdf (PDF warning)

The extension looks like it has not been submitted.
Note that this extension is for drawing paths, and it only allows segments up to quadratic in a path.

As for getting bezier curves to be per pixel accurate: the challenges are the following:

  1. Projection: I am not convinced that projecting the control points is going to do the trick.
  2. Let’s suppose that one could project “something” so that one gets some kind of description of the surface to rasterize; then one needs a rasterization algorithm for something that is not a triangle… There is a section on rendering paths with segments up to degree 3 in GPU Gems 3: http://developer.nvidia.com/GPUGems3/gpugems3_ch25.html

It uses a fragment shader to discard bits not in the region… The way it works is that it makes two kinds of triangles:
a) triangles completely within the path
b) triangles which poke out of the path

For triangles of type a), just usual rasterization; for triangles of type b), it creates a description of the path of the form { (x,y) | P(x,y) < 0 }, where P is a polynomial (of degree 2 or 3), and it discards the fragment if the inequality is not satisfied. Handling degree 3 is a royal pain, whereas degree 2 is much easier.
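For the quadratic case the per-fragment test really is tiny. A minimal GLSL sketch along the lines of that chapter (the names uv and fillColor are mine; the (u,v) coordinates are assigned per vertex as (0,0), (1/2,0), (1,1) on the curve’s control triangle and interpolated by the rasterizer):

```glsl
#version 150
// Fragment shader for the "b" triangles, the ones that poke out of the path.
// The curve is u^2 - v = 0 in the interpolated (u,v) coordinates; fragments
// on the wrong side of it are discarded, the rest are filled normally.
in vec2 uv;
out vec4 fragColor;
uniform vec4 fillColor;

void main()
{
    if (uv.x * uv.x - uv.y > 0.0)
        discard;
    fragColor = fillColor;
}
```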

The main hurdles for getting “pixel perfect” rasterization of bezier surfaces are figuring out how to project, how to decide if a pixel is within the surface after projection, and finally a “fast” way to walk those pixels. Right now I do not see a good way to project bezier surface parameters onto the screen so that one can easily figure out what pixels to rasterize… My knee-jerk thought is to try to adapt the ideas from the GPU Gems article, by computing “something” to project that will in turn create a path… but I am thinking it won’t work…

edit:

I’ve thought some more on this… at this point I am convinced that the idea of taking the bezier surface parameters and then projecting “something” onto the screen and then filling pixels will not work. The reason why is that a bezier surface can have, for lack of a better word, a self-silhouette: imagine a bump on a surface; the bump creates a silhouette which, on the screen, lands on the same pixels as other bits of the surface…

Though that is just following the lines of “Bezier surface parameters” -> “project something onto screen” -> “rasterize”.

…I tell you what. Some of the suggestions on this forum (like some of mine) may not have a thought about how they are to be implemented… but they are a riveting read.

Funniest thing I’ve read in years! LOL :smiley:

(who says computers are boring?)

Thanks for the links, kRogue. Interesting stuff. They are strictly 2D, though, so I think there will be problems generalizing them to 3D. But then again, some of the techniques might be helpful anyway.

I did a little googling and as it turns out, there already has been some research into direct rasterization of rational bicubic Bezier surfaces. Here’s a pretty good example:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.4946&rep=rep1&type=pdf
(The link downloads a PDF file directly rather than opening it in a browser window.)

The research I’ve reviewed takes a ray-tracing approach.

I’ve been thinking another approach, one that exploits localized coherence, might be better. I’ll need to play around with it for a while to see if I can get something to work, and I’ve got some ideas about how to implement it. But here’s the gist of my idea. Given the (U,V) parametric coordinates of a surface at a point (a point in the center of a pixel fragment), the (U,V) parametric coordinates of the surface at an adjoining fragment will be very nearly the same. Furthermore, the change in U and V will be nearly constant from one pixel to the next, and to the next after that, so that predicting the (U,V) coordinates of a fragment next to an already known fragment’s (U,V) coordinates should be easily accomplished by a simple scaled addition to U, and another simple scaled addition to V. The error of each estimate can be tracked to provide a constantly updated estimate of the change in U and in V from one fragment to the next.
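Here is a rough GLSL-syntax sketch of the gist, not a working rasterizer: all of the names are mine, the patch evaluation is plain Bernstein evaluation of homogeneous control points, and I am ignoring the hard parts (finding the first pixel of a span, silhouettes, and patches seen edge-on, where the little 2x2 system becomes degenerate).

```glsl
// Cubic Bernstein basis functions at parameter t.
vec4 bez3(float t)
{
    float s = 1.0 - t;
    return vec4(s * s * s, 3.0 * s * s * t, 3.0 * s * t * t, t * t * t);
}

// Evaluate a rational bicubic patch (homogeneous control points, row-major,
// cp[row * 4 + col], rows along V, columns along U) and project to pixels.
vec2 surfaceToScreen(vec4 cp[16], vec2 uv, mat4 mvp, vec2 viewport)
{
    vec4 bu = bez3(uv.x);
    vec4 bv = bez3(uv.y);
    vec4 p = vec4(0.0);
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            p += bv[r] * bu[c] * cp[r * 4 + c];
    vec4 clip = mvp * p;
    return (clip.xy / clip.w) * 0.5 * viewport;
}

// Predict-and-correct step: start from the guessed (u,v) for this pixel and
// take one Newton step toward the pixel centre `target`, using a
// finite-difference Jacobian of the screen position with respect to (u,v).
vec2 refineUV(vec4 cp[16], vec2 uvGuess, vec2 target, mat4 mvp, vec2 viewport)
{
    const float h = 1e-3;
    vec2 s  = surfaceToScreen(cp, uvGuess, mvp, viewport);
    vec2 su = (surfaceToScreen(cp, uvGuess + vec2(h, 0.0), mvp, viewport) - s) / h;
    vec2 sv = (surfaceToScreen(cp, uvGuess + vec2(0.0, h), mvp, viewport) - s) / h;
    vec2 e  = target - s;
    float det = su.x * sv.y - sv.x * su.y;          // degenerate at silhouettes
    vec2 d = vec2(e.x * sv.y - sv.x * e.y,
                  su.x * e.y - e.x * su.y) / det;   // Cramer's rule on the 2x2
    return uvGuess + d;
}

// Walking one scanline would then look something like:
//   vec2 uv = uvPrev + duvdx;                               // predict by coherence
//   uv      = refineUV(cp, uv, pixelCentre, mvp, viewport); // correct
//   duvdx   = uv - uvPrev;                                  // update the estimate
//   uvPrev  = uv;
```

The point of the coherence is that the predicted guess should already be so close that one correction step is usually enough, rather than having to solve for (U,V) from scratch at every pixel.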

Here are some examples demonstrating the pixel to pixel coherence of U and V. First, an image of one rational bicubic surface patch (the coloration is not a texture map; each vertex of each triangle is colored in the geometry shader so the tessellation can be observed):
http://s961.photobucket.com/albums/ae98/…onTriangles.png

And here’s a pair of images of the same patch from the same view and the same projection representing the parametric U and V values in red and green, respectively:
http://s961.photobucket.com/albums/ae98/…sellationUV.png

It can be seen that the rate of change of U and V is fairly uniform across the patch, which makes the change in U and V predictable from one pixel to the next.