Double Precision

Hello World! :slight_smile:

This is my first post. I have a question whose answer I couldn't find anywhere on the net. I hope it's not a stupid question and that I can get a useful answer.

For my master thesis I have to render primitives to the framebuffer and calculate their distances from each other.
I use a 64 bit Ubuntu OS with a NVIDIA GTX460 graphics card.
This card should support double precision floating point numbers.
So my question(s) is/are:
How do I use this feature with OpenGL? I want to send double-precision primitive vertices to be rendered into the framebuffer. Does the framebuffer then also contain data in double precision (depth buffer, etc.), because I want to operate on this data, or is the framebuffer limited to 32-bit precision? If the answer is "NO DOUBLE-PRECISION SUPPORT", what possibilities do I have to use double precision? It would be great if someone could help me out.

Best regards,
Marko

I know shaders can use doubles, but I am not sure there are any 64-bit render targets. Double precision is pretty much overkill …

It's only recently that hardware has supported doubles at all.

In terms of 64-bit floats (doubles):
http://www.opengl.org/registry/specs/ARB/gpu_shader_fp64.txt

and http://www.opengl.org/registry/specs/ARB/vertex_attrib_64bit.txt

So you can send 64-bit float vertices and normals and such.
You can send 64-bit float uniforms to the shader, but I didn't see anything in terms of outputting 64-bit floats.
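
For what it's worth, here is a minimal sketch of what the vertex side of that looks like (assuming a GL 4.x context with ARB_vertex_attrib_64bit, a bound VAO, and attribute index 0; the triangle data is just a placeholder):

```c
/* Sketch: uploading double-precision positions and binding them as a true
 * 64-bit vertex attribute (GL 4.x / ARB_vertex_attrib_64bit). The matching
 * shader input would be declared as dvec3. A VAO is assumed to be bound. */
GLdouble positions[] = {
    0.0, 0.0, 0.0,
    1.0, 0.0, 0.0,
    0.0, 1.0, 0.0,
};

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(positions), positions, GL_STATIC_DRAW);

/* Note the "L" variant: glVertexAttribLPointer keeps the data as doubles,
 * whereas glVertexAttribPointer would convert them to 32-bit floats. */
glVertexAttribLPointer(0, 3, GL_DOUBLE, 0, (const void *)0);
glEnableVertexAttribArray(0);
```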

Thanks for the information and your answers.

So there is no way to use the 64-bit capability of graphics cards without using the shading language …
what a pity! :frowning:

Does that mean that if I write GLdouble vertices to be drawn on the screen, the precision will be cropped to 32 bits in the framebuffer (and also in the depth buffer)? Can't I use Buffer Objects in double precision?

I hope I'm not being bothersome with these questions, but I think for scientific research it is very important to be able to use 64-bit precision.

So there is no way to use the 64-bit capability of graphics cards without using the shading language …
what a pity!

I’m not sure what kinds of computations you’re interested in doing that the fixed-function pipeline would be able to compute, but would also need 64-bits of precision. Aren’t shaders kind of a prerequisite for this sort of thing, just to put the algorithm together?

Does that mean that if I write GLdouble vertices to be drawn on the screen, the precision will be cropped to 32 bits in the framebuffer (and also in the depth buffer)?

First, gl_Position is a vec4, not a dvec4. Therefore, it is 32-bits. So it isn’t cropped by the framebuffer; it’s cropped by the vertex shader upon output.

Second, I’m pretty sure that the maximum viewport size would restrict you from being able to render an image so large that the precision provided by a 32-bit float would be insufficient. At least in the X and Y.

Depth precision might be a problem, but there are ways around that.

Can’t I use Buffer Objects in double precision?

You could use transform feedback. On platforms that provide double support, feedback outputs can be double-precision.

But if that actually works for you, you should probably just consider using OpenCL. It’ll be a lot less hassle for you.
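
A rough sketch of the transform-feedback route, in case it helps (prog, vertexCount and results are assumed to exist elsewhere, and "outValue" is a hypothetical varying written by the vertex shader; with ARB_gpu_shader_fp64 it could be a double/dvec type):

```c
/* Sketch: GPU computation via transform feedback, with rasterization
 * disabled entirely so only the captured values matter. */
const GLchar *varyings[] = { "outValue" };
glTransformFeedbackVaryings(prog, 1, varyings, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(prog);                        /* must (re)link after declaring varyings */

GLsizeiptr bufSize = vertexCount * sizeof(GLfloat);  /* or sizeof(GLdouble) for fp64 outputs */
GLuint tfbo;
glGenBuffers(1, &tfbo);
glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, tfbo);
glBufferData(GL_TRANSFORM_FEEDBACK_BUFFER, bufSize, NULL, GL_DYNAMIC_READ);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfbo);

glEnable(GL_RASTERIZER_DISCARD);            /* we only want the captured values */
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, vertexCount);
glEndTransformFeedback();
glDisable(GL_RASTERIZER_DISCARD);

/* Read the results back; they stay in whatever format the shader wrote. */
glGetBufferSubData(GL_TRANSFORM_FEEDBACK_BUFFER, 0, bufSize, results);
```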

So there is no way to use the 64-bit capability of graphics cards without using the shading language …
what a pity!

Sorry, but that made me laugh… "You mean with this xDSL line there is no telegram included?"

Indeed, for GPU computing, OpenCL is better suited to the task.
Maybe OpenGL can work too, but can you describe roughly the kind of inputs, calculations, and outputs you want to work with? That will make it easier to give you advice.

The depth buffer is normally a 24-bit integer depth buffer with an 8-bit integer stencil, also called D24S8. There are also D16 and D32.
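
For reference, explicitly requesting one of those formats for an off-screen renderbuffer looks roughly like this (the 512x512 size is just an example):

```c
/* Sketch: explicitly choosing a depth/stencil format for a renderbuffer.
 * GL_DEPTH24_STENCIL8 is the common D24S8 case; GL_DEPTH_COMPONENT32F
 * gives a 32-bit float depth buffer. There is no 64-bit double format. */
GLuint depthRb;
glGenRenderbuffers(1, &depthRb);
glBindRenderbuffer(GL_RENDERBUFFER, depthRb);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH24_STENCIL8, 512, 512);
/* or: glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT32F, 512, 512); */
```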

Hi, I will try to describe my problem once again.

I have, let's say, 100,000 primitives rendered. If I render them on, e.g., a 512x512 buffer, it is impossible to see all of them, so I have to zoom into the framebuffer and re-render regions. What I then have to do is calculate their intersection points and output them as coordinate points. My problem is that the primitive points are in double-precision coordinates, and if I calculate the distances in single precision there will be errors. So they have to be rendered to somewhere where I can use double-precision floating-point values, and I don't know whether there is a way to do that in OpenGL. I hope with this description you can help me.

There will always be errors with double precision too; they’ll just be smaller errors!

Really though, 3D graphics (GPGPU is a whole different story) are not mathematically precise. They're a visual medium and their function is to put images on screen (optionally as fast as possible), so absolute mathematical precision was never a design goal. What 3D graphics are is consistent, so that errors, roundoffs, precision loss, etc. are documented, accepted and reproducible within certain tolerances. Otherwise, "if it looks good enough then it probably is good enough" is the way things are.

OK, but why do new-generation graphics cards have double-precision registers if they are useless for rendering? Is it just a marketing gimmick? I know that if I use CUDA or OpenCL I get double-precision support for calculations, but the problem is: how can I render to a "buffer" that I can then use with CUDA/OpenCL/…?
I can fetch the rendered framebuffer and transfer it to CUDA/…, but the data will then be only single-precision, or not?

I have, let's say, 100,000 primitives rendered. If I render them on, e.g., a 512x512 buffer, it is impossible to see all of them, so I have to zoom into the framebuffer and re-render regions.

Most videogames do this just fine, rendering even more primitives without needing double-precision anywhere in the pipeline. Let alone rendering to double-precision framebuffers.

What I then have to do is calculate their intersection points and output them as coordinate points.

What does that mean exactly? What do the color values you’re putting in the framebuffer mean?

I don’t understand what the data is that you’re trying to draw and read back.

My problem is that the primitive points are in double-precision coordinates, and if I calculate the distances in single precision there will be errors.

And? Floating-point error is primarily based on the order-of-magnitude difference between the numbers involved. If the numbers are not far from one another, relative to the number of mantissa and exponent bits, then any error will be negligible.

Most empirical measurements are well within single-precision floating-point error. So unless you're dealing with very small numbers (most graphics hardware doesn't handle denormalized floats, though any graphics hardware with double precision will handle them just fine), the error from the calculations will likely be well below any error introduced by empirical measurements.

OK, but why do new-generation graphics cards have double-precision registers if they are useless for rendering? Is it just a marketing gimmick? I know that if I use CUDA or OpenCL I get double-precision support for calculations

And you’ve just stated why there is double-precision math in shaders. Hardware makers added it specifically for GPGPU applications, which tend to use CUDA or OpenCL.

For graphics work, which is based on putting pictures on a screen, they’re not particularly useful.

but the problem is: how can I render to a "buffer" that I can then use with CUDA/OpenCL/…?
I can fetch the rendered framebuffer and transfer it to CUDA/…, but the data will then be only single-precision, or not?

Perhaps you’re not aware that “buffer” and “framebuffer” are not the same thing. There are framebuffers and buffer objects. These don’t really have anything to do with one another.

If you just want to run OpenGL to do some calculations on some data and output some data in return, you can use transform feedback like I said.

Thank you for the answer. As I see it, I can't use the framebuffer.

What I need is a “buffer” where I can render triangles at double-precision depths. That means I need a double-precision depth-buffer???

Are there any OpenGL "objects", "buffers", etc. that I can render into? They don't have to be visualized.

E.g. you have two triangles intersecting at a certain z-value. If I could use double-precision depth buffers, my problem would be solved.

What I need is a “buffer” where I can render triangles at double-precision depths. That means I need a double-precision depth-buffer???

No, it doesn’t. You think you need one, but thus far, you have not provided a reason why you think this is so besides the fact that you think it will cause “error.” Please clarify why you think that you need a double-precision depth buffer.

Are there any OpenGL "objects", "buffers", etc. that I can render into? They don't have to be visualized.

Yes there are. None that are double precision, mind you, but there are off-screen buffers you can render to.
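
For example, a framebuffer object with a 32-bit float depth attachment is about the best precision you can ask for; a rough sketch (sizes and names are placeholders):

```c
/* Sketch: an off-screen FBO with a float color attachment and a 32-bit
 * float depth attachment. Nothing here is double precision; it is simply
 * the highest-precision depth target core OpenGL offers. */
GLuint fbo, colorTex, depthRb;

glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D, colorTex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, 1024, 1024, 0, GL_RGBA, GL_FLOAT, NULL);

glGenRenderbuffers(1, &depthRb);
glBindRenderbuffer(GL_RENDERBUFFER, depthRb);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT32F, 1024, 1024);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, depthRb);

if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
    /* handle the error */
}
```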

I need it because I want to improve a scientific algorithm for the computation of special diagrams. The work I am referencing was done 15 years ago, and the problem back then was the single precision of depth buffers. They said an improvement would be the use of 64-bit depth buffers. This is the reason why I "bother" you about this. Is this possible with modern GPUs or not? As I render triangles, the depth test automatically fills a depth buffer, and on this depth buffer I have to calculate the z-distances of the triangles at high precision. If this is not possible with modern cards, then I have a problem :smiley:

You have posted too few facts about the algorithm, so we cannot give any particular advice.
I'll give you some general advice based on the facts we do know:

That was the time of the first 3D graphics accelerators for the masses. Since then the market has changed a lot, as have the technologies built into GPUs. It is likely that "your" algorithm can be made significantly more efficient if it is implemented with a totally different technology. If you need massive parallelism, try the CUDA or OpenCL APIs. If it is a master thesis, it will require some research anyway; in fact, at least a whole chapter should be devoted to related work.

You could calculate the distances analytically and gain precision even higher than double. There are a lot of papers about representing and calculating with extremely high precision without direct hardware support for it (double-precision calculations using single precision, or quad precision using double). The algorithm should be developed for the CPU first, and then transferred to the GPU using CUDA or OpenCL. If you still want an easier (but less precise) approach, you can use OpenGL for drawing the primitives, but then locate where the intersections are and tighten the near and far clipping planes and the depth calculation function around them (this would require using shaders). You should also keep track of all the parameters used for the depth calculation (and output them), in order to be able to reconstruct where in world space those intersections are.
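
As a flavour of the "higher precision from lower precision" trick those papers describe, here is a minimal sketch of a two-float ("double-single") sum based on Knuth's two-sum; it assumes strict IEEE single-precision arithmetic (no fast-math):

```c
/* Sketch: "double-single" arithmetic, one way of emulating higher precision.
 * A value is stored as an unevaluated sum of two floats (hi + lo). */
typedef struct { float hi, lo; } dsfloat;

/* Knuth's two-sum: the returned hi + lo equals x + y exactly. */
static dsfloat two_sum(float x, float y)
{
    float s  = x + y;
    float yv = s - x;                      /* part of y actually absorbed */
    float e  = (x - (s - yv)) + (y - yv);  /* rounding error of the sum   */
    dsfloat r = { s, e };
    return r;
}

/* Add two double-single numbers (simple variant, not the most accurate). */
static dsfloat ds_add(dsfloat a, dsfloat b)
{
    dsfloat s = two_sum(a.hi, b.hi);
    s.lo += a.lo + b.lo;                   /* fold in the low-order parts */
    return two_sum(s.hi, s.lo);            /* renormalize                 */
}
```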

From the depth buffer you can calculate the distance from the viewer. Calculating distances between primitives would require additional math. I don't know enough about the algorithm to give any useful hint.

100,000 primitives is not a huge number, even for the CPU to handle analytically or numerically. A GPU implementation could be beneficial, but not in all cases.

To gain high precision at the intersections, you can narrow the view frustum and move the front and back clipping planes as close to the intersection as possible. This approach is much like taking a picture of a far-away galaxy with an extremely strong telescope.

In any case, if you want to use the GPU for solving your problem, you need to spend some time learning OpenGL and CUDA (since you are using a GTX460) before deciding which path to follow. Also, consult your mentor for an opinion. Whoever gave you the topic probably has some idea of how to solve the problem.

Yeah, and a link to the original 15-year-old paper would help too.

Thanks so far for all your answers!
I can’t tell you all the details, because some secrets must be kept. I’m sorry for that.

So this is one last try to get some good advice:

My problem is that I need all of these features at the same time:

  1. rasterization of interpenetrating shapes
  2. high resolution rendering
  3. depth precision

Is this somehow possible with OpenGL, or with OpenGL and CUDA?

I can’t tell you all the details, because some secrets must be kept.

So you have a problem. But you can’t explain what it is. And you want us to help you solve that problem. Which you won’t explain.

Also, if it’s based on an academic paper, it’s not exactly “secret,” is it?

The point we’re trying to make is that when you claim that you need double-precision rendering, we don’t believe you. Or more to the point, we believe that you can get the answers you need to reasonable precision without double-precision rendering. But we can’t tell you how to do that because you won’t tell us what it is you’re trying to do.

Is this somehow possible with OpenGL, or with OpenGL and CUDA?

I can’t tell you all the details, because some secrets must be kept. :smiley:

Of course, we already told you the answer to what you’re asking for, so you don’t need us to tell you again. Here are some relevant quotes:

gl_Position is a vec4, not a dvec4. Therefore, it is 32-bits. So it isn’t cropped by the framebuffer; it’s cropped by the vertex shader upon output.

I’m pretty sure that the maximum viewport size would restrict you from being able to render an image so large that the precision provided by a 32-bit float would be insufficient. At least in the X and Y.

The depth buffer is normally a 24-bit integer depth buffer with an 8-bit integer stencil, also called D24S8. There are also D16 and D32.

You could use OpenCL/CUDA to do it, but that would basically mean writing a rasterizer. That's probably not going to be fast, even if we ignore the fact that using double precision cuts performance by half at a minimum.

Here's a link to the paper:

http://www2.iwr.uni-heidelberg.de/groups/ngg/People/winckler/CGpaper/hoff99fast.pdf

That explains a lot.

So you have a field of sites, defined in double-precision math. And you want to construct a Voronoi diagram from them. At an arbitrary resolution.

I will assume that your distance mesh computations are already algorithmic. That is, for a specific resolution, you can generate a distance mesh whose meshing error is one pixel in size.

Numerical precision problems can theoretically appear in one of two ways. They can appear in the clip-space X,Y positions or they can appear in the depth buffer Z positions. Remember: up until clip-space, you can work in whatever precision you want.

There is no possible way that you can have precision problems due to the clip-space X and Y positions. This is simply because of the maximum rendering resolution. I think most GL 3.x class hardware has a maximum viewport size of 8192x8192. 4.x class hardware may go up to 16384x16384, but I’m not sure of that.

What I am sure of is that even at 16k x 16k, you have plenty of numerical precision in the clip-space values on the range [-1, 1]. That gives you 24 bits of precision, 14 of which are used to find the pixel. That gives you 10 bits of sub-pixel precision, which ought to be plenty. Since you can't actually see sub-pixels (unless you turned on multisampling, which I'm pretty sure is a bad idea for your needs), any imprecision there will make little difference overall.
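
Spelled out as a quick back-of-the-envelope check (assuming the usual 24-bit single-precision significand and a 16384-pixel viewport):

```latex
% Sub-pixel precision of 32-bit float clip-space coordinates on a
% 16384-pixel-wide viewport (24-bit significand, including the implicit bit):
\[
  16384 = 2^{14}
  \qquad\Rightarrow\qquad
  24 - 14 = 10 \ \text{bits of sub-pixel precision},
\]
\[
  \text{i.e. positions resolve to } 2^{-10} = \tfrac{1}{1024} \ \text{of a pixel.}
\]
```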

Remember: your algorithm will only ever be as precise (for a single rendering) as the resolution you use to render with, plus or minus one pixel.

Then there’s depth precision. Now that I understand your problem set, I can say that there are many, many games you can play to get an accurate image of an arbitrary solution. The most powerful of which I will describe below.

Now, we will assume that the computations for the 32-bit clip-space positions are all done using double-precision math. So what matters now is getting the most out of the available depth precision.

The whole thing comes down to depth range. For any particular view of the image, there is a minimum and maximum distance that matters. To get the most out of your 24-bit depth buffer, you need the depth range in your glOrtho call (or whatever other call you use to construct your orthographic projection matrix) to be as close as possible to these minimum and maximum distances. Because it is the values within this range that could potentially conflict.
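
In code, that tightening is just a matter of feeding the measured bounds into the projection. A sketch, where zMin/zMax are assumed to come from a previous coarse pass and left/right/bottom/top describe the chunk being zoomed into:

```c
/* Sketch: squeeze the available depth precision onto only the depth interval
 * that matters for this view. zMin/zMax come from a previous coarse pass;
 * the small bias guards against clipping something by accident. */
double bias = (zMax - zMin) * 0.01;
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(left, right, bottom, top, zMin - bias, zMax + bias);
glMatrixMode(GL_MODELVIEW);
```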

What you want to do is adaptive rendering.

First, render the entire field at a relatively small resolution. Read that back to the CPU.

Then, break the field up into chunks and render each of those chunks. When rendering each chunk, you use your low resolution depth map to find the minimum and maximum depth values. Bias them a bit, just to make sure you don’t clip anything by accident.

You can repeat this process ad nauseam until you achieve the desired level of precision. You can even analyze the depth buffer values to determine where there may be depth precision problems and investigate only those areas.
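
One refinement step of that loop might look roughly like this (LOW_RES is an example coarse resolution, and nearCoarse/farCoarse are the clip planes used for the coarse pass; the mapping back to eye space assumes an orthographic projection):

```c
#define LOW_RES 256   /* example resolution of the coarse pass */

/* Sketch: read back the depth of the coarse pass and find its min/max,
 * which (after a small bias) become the near/far planes used to re-render
 * a chunk with much better effective depth precision. */
float depth[LOW_RES * LOW_RES];
glReadPixels(0, 0, LOW_RES, LOW_RES, GL_DEPTH_COMPONENT, GL_FLOAT, depth);

float dMin = 1.0f, dMax = 0.0f;
for (int i = 0; i < LOW_RES * LOW_RES; ++i) {
    if (depth[i] < dMin) dMin = depth[i];
    if (depth[i] > dMax) dMax = depth[i];
}

/* For an orthographic projection, window-space depth maps linearly back to
 * eye-space distance: z = nearCoarse + d * (farCoarse - nearCoarse).
 * Bias this range a little and use it as the near/far planes of the
 * glOrtho call for the next, tighter pass over this chunk. */
double zMin = nearCoarse + dMin * (farCoarse - nearCoarse);
double zMax = nearCoarse + dMax * (farCoarse - nearCoarse);
```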