OpenGL Internals.

When OpenGL was invented it used floating-point calculations at its core because of computational constraints. Nowadays, when transistor counts are no longer the limit for arithmetic, does anybody know if OpenGL still uses single-precision floating point at its core, or has the core been enhanced to double-precision values?
I need proof of whatever answer is given, either as links or by sending me an article (preferably a recent one).
Thank you.

GL doesn't actually mandate floats. It only requires a certain minimum precision (as the GL spec states; the link is on the front page).
The minimum is roughly 1 part in 10^5.

Don't let glVertex3f, glVertex3d and the others fool you as to what's happening under the hood.
Some Radeons are said to use 24-bit floats in some calculations.

If you need doubles, then modify Mesa or something. Most of the gaming world doesn't need that much precision.

When OpenGL was invented it used floating-point calculations at its core because of computational constraints.

  1. What makes you think OpenGL uses floats at its core?
  2. What are those alleged computational problems, and how are they solved by floats?
  3. Did you miss the integer and double-precision variants of just about every function?

I need proof of whatever answer is given,

You’re not getting access to our driver code, no matter how much you beg for it.

Edit: Spelling


You’re not getting access to our driver code, no matter how much you beg for it.

Do you work for an IHV? Which one, NVIDIA? It's nice to know where the driver people are when you need help with issues on a certain brand of hardware.

Proof? Hope you're good with SoftICE.

It all depends, like they said, on the driver writers. Some like 16 or 32 bits, others like 24 and pretend it is 32.

Originally posted by harsman:
Do you work for an IHV? Which one, NVIDIA? It's nice to know where the driver people are when you need help with issues on a certain brand of hardware.

Why? That's why they have developer relations web sites and e-mails…

Originally posted by V-man:
Some Radeons are said to use 24-bit floats in some calculations.

It uses 24 bits per channel in fragment programs. Vertex programs use full 32 bit precision per vector component. Vertex colour is interpolated at 12 bit per channel precision (though one can get around this by sending colour from the vertex program to the fragment program as a texture coordinate).
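
For the curious, here is a minimal sketch of that workaround (the program text and loader calls are illustrative only, not anyone's shipping code, and assume the ARB_vertex_program entry points have already been resolved): a vertex program that copies the per-vertex colour into a texture coordinate, so it is interpolated by the texture-coordinate interpolators rather than the 12-bit colour path. The matching fragment program would simply read fragment.texcoord[1] and write it to result.color.

#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Hypothetical helper: load a pass-through vertex program into 'prog'
   (a name previously obtained from glGenProgramsARB). */
void load_colour_via_texcoord_vp(GLuint prog)
{
    static const char *vp_src =
        "!!ARBvp1.0\n"
        "PARAM mvp[4] = { state.matrix.mvp };\n"
        "TEMP clip;\n"
        "DP4 clip.x, mvp[0], vertex.position;\n"
        "DP4 clip.y, mvp[1], vertex.position;\n"
        "DP4 clip.z, mvp[2], vertex.position;\n"
        "DP4 clip.w, mvp[3], vertex.position;\n"
        "MOV result.position, clip;\n"
        "MOV result.texcoord[1], vertex.color;  # colour rides in texcoord 1\n"
        "END\n";

    glBindProgramARB(GL_VERTEX_PROGRAM_ARB, prog);
    glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(vp_src), vp_src);
    glEnable(GL_VERTEX_PROGRAM_ARB);
}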

To all you paranoids: now you've got me pissed.

I am someone who uses those NVIDIA/ATI/3Dlabs cards at work to program simulators.

I don't want to hack any driver, nor do I want to use SoftICE or whatever ice or fire to crack basic stuff out. What is the fricking problem with saying, "Hello, I work at NVIDIA; the hardware designers say they work with doubles"? I mean, what harm can it do? What I need the answer for is the latest simulator I am writing: it tends to produce large errors far from the origin, that's it. No hacker, no cracker and no shmacker. Boys and girls, grow up and be more professional. I did not post this because I need it for fun. I just need to know whether my graphics accelerators (vendors listed above) are precise, and over what floating-point range.

To all you paranoids: now you've got me pissed.

Well, someone left their sense of humour in bed this morning.

I am someone who uses those NVIDIA/ATI/3Dlabs cards at work to program simulators.

That’s nice.

What is the fricking problem with saying, "Hello, I work at NVIDIA; the hardware designers say they work with doubles"?

Ever heard of “IP” and “NDA”? There are things that just can’t be revealed. So having “written proof” (that you require) isn’t really possible.

what I need the answer for is the latest simulator I am writing

Perhaps you should post some code or some more details on your simulator, and why and where you think you need 64-bit floats, instead of bashing the people who are trying to help.

Notice you still haven’t answered the questions I posted above.

it tends to produce large errors far from the origin, that's it

Perhaps you’re running into z-buffer issues? It’s hard to say without more details, or a screen shot, or code.

Boys and girls, grow up and be more professional.

Why thank you Mr Knowledgeable and Mature.

Originally posted by phoenix_wrath:
To all you paranoids: now you've got me pissed.

I am someone who uses those NVIDIA/ATI/3Dlabs cards at work to program simulators.

I don't want to hack any driver, nor do I want to use SoftICE or whatever ice or fire to crack basic stuff out. What is the fricking problem with saying, "Hello, I work at NVIDIA; the hardware designers say they work with doubles"? I mean, what harm can it do? What I need the answer for is the latest simulator I am writing: it tends to produce large errors far from the origin, that's it. No hacker, no cracker and no shmacker. Boys and girls, grow up and be more professional. I did not post this because I need it for fun. I just need to know whether my graphics accelerators (vendors listed above) are precise, and over what floating-point range.

I agree. What's the big deal in saying what the actual internal precision is… unless, of course, they're converting down to 8-bit floats. That might be a bit embarrassing for some operations.

Sometimes I think these hardware manufacturers think their products are the most important objects on the planet… For instance, more important than a cure for cancer.

It’s just a video card for crying out loud. Tell us if the internal precision breaks the published interface or not.

I think a large part of the problem here is the way in which the original question was phrased.

I believe the answer desired is that most modern accelerators perform transformation operations with approximately single precision floating point accuracy.

As a follow-up to another response on this thread, recent ATI graphics products will use 24-bit precision at the fragment level, not the vertex level.

-Evan

Originally posted by phoenix_wrath:
Does anybody know if OpenGL still uses single-precision floating point at its core, or has the core been enhanced to double-precision values?
I need proof of whatever answer is given, either as links or by sending me an article (preferably a recent one).

I have been coding OpenGL since Windows NT 3.1, and the software reference port of OpenGL from SGI originally contained an option for compiling with single or double precision. I have yet to run across any derivative hardware implementation that provides anything higher than single precision, and many implementations dropped bits here and there to shave precious cycles from their pipelines, so the precision was even lower (considerably lower in some areas). Higher precisions are in the works for certain aspects of the pipeline (e.g., the depth buffer, color channels, etc.). However, the basic hardware vertex pipeline is almost assuredly single precision or less for the cards you listed. Of course I could be wrong, but I doubt it.

The distortions you describe are most likely a result of exceeding the precision of a single-precision value (roughly 23 bits of mantissa). As someone else has noted, you need to keep your depth buffer range as tight to your dataset as possible, and cull objects beyond a certain distance from the eye if possible.
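
To make that concrete, here is a tiny stand-alone C example (mine, not from the thread) that prints the spacing between adjacent single-precision values near the origin and a long way out; at a million units out the representable steps are already a few centimetres if your unit is the metre, which is exactly the kind of jitter simulators see:

#include <stdio.h>
#include <math.h>   /* nextafterf; link with -lm on some systems */

int main(void)
{
    float near_origin = 1.0f;      /* ~1 unit from the origin   */
    float far_away    = 1.0e6f;    /* ~1,000,000 units out      */

    /* distance to the next representable float grows with magnitude */
    printf("step near origin: %g\n", nextafterf(near_origin, 2.0f) - near_origin);
    printf("step far away:    %g\n", nextafterf(far_away, 2.0e6f) - far_away);
    return 0;
}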

FWIW, I can offer no proof other than my own experience. I do not work for a graphics card vendor but am reasonably familiar with most.

Originally posted by ehart:
As a follow-up to another response on this thread, recent ATI graphics products will use 24-bit precision at the fragment level, not the vertex level.

So at the vertex level, it is 32 bit?

Not that it's ultra-important to me. I'm just curious as to why things are the way they are.

I'm assuming it is quite important for the vertex stage, and you don't want to risk it there. For the fixed-function pipeline, I don't think it is a big deal; 24 bits should do the job, no?

> Hello, I work at NVIDIA; the hardware designers say they work with doubles

The “Hello, I work at NVIDIA” part is accurate for me. No, there’s no access to double precision math in current NVIDIA GPUs.

The full answer to the original question is somewhat complicated because it depends on the domain of computation (vertex or fragment). It also requires knowing a little more about how numbers are represented than just saying 8 bits, 16 bits, 24 bits, 32 bits or whatever.

First, I’ll explain some notation. Floating-point numbers are represented as a number of magnitude bits, a number of exponent bits, and a sign bit. The IEEE 754 standard establishes a 32-bit IEEE float to have 23 bits of magnitude, 8 bits of exponent, and a sign bit. That’s 32 bits total.

To make this easier to digest, you can abbreviate this floating-point format as s23e8.
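
If you want to see those fields directly, here is a short C snippet (my own illustration, not Mark's) that pulls apart the s23e8 layout of a float:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = -6.25f;                /* -1.5625 * 2^2 */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* grab the raw s23e8 bit pattern */

    unsigned sign     = (unsigned)(bits >> 31);           /* 1 sign bit               */
    unsigned exponent = (unsigned)((bits >> 23) & 0xFF);  /* 8 exponent bits, bias 127 */
    unsigned mantissa = (unsigned)(bits & 0x7FFFFF);      /* 23 magnitude bits        */

    /* prints: sign=1 exponent=129 mantissa=0x480000 */
    printf("sign=%u exponent=%u mantissa=0x%06X\n", sign, exponent, mantissa);
    return 0;
}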

The other way to represent numbers is with fixed-point numbers. A fixed-point number is just an integer (magnitude) with the decimal point (or binary point) shifted around. Normally, you'd think of an 8-bit integer as ranging from 0 to 255. But 3D programmers often think of these values as lying in a [0,1] range, so that 0 is really 0/255=0 and 255 is really 255/255=1.

We’ll abbreviate this fixed-point format as u0.8 (unsigned with 0 bits of integer magnitude and 8 bits of fractional magnitude).

On the other hand, the conventional [0,255] encoding of an 8-bit integer (with no fractional magnitude) would be u8.0 (unsigned with 8 bits of integer magnitude and no fractional bits).

You could also have fixed-point numbers that have some integer bits and some fractional bits. For example, s1.10 would have 12 bits: a sign bit, one integer bit, and 10 fractional bits.

With these abbreviations, if you see an “e” that means floating-point (because there’s an exponent); if you see a “.” that means fixed-point.
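
As a quick illustration of the fixed-point side (a sketch with made-up helper names, not driver code), u0.8 conversion is nothing more than scaling by 255:

/* u0.8: byte b represents b / 255.0, so 0 -> 0.0 and 255 -> 1.0 exactly */
unsigned char float_to_u0_8(float x)   /* x assumed already clamped to [0,1] */
{
    return (unsigned char)(x * 255.0f + 0.5f);   /* round to the nearest 1/255 step */
}

float u0_8_to_float(unsigned char b)
{
    return (float)b / 255.0f;
}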

(There are a lot of numeric details that these abbreviations don't convey. Floating-point representations have details such as NaN (not a number) semantics, Infinity semantics, negative-zero semantics, denorm semantics (for very small numbers), and lots of other details. Fixed-point representations also have details such as whether particular values such as zero, one, one-half, etc. are exactly representable, as well as how rounding works and what the actual divisor really is. These details are probably beyond the scope of what the original question is asking about.)

Now we can talk about the numerics available in various NVIDIA GPUs.

TNT & TNT2 (NV4):

The per-vertex transformation & lighting operations are performed on the CPU, so the vertex precision is s23e8 (standard 32-bit floating-point). The per-fragment texture environment operations are u0.8.

GeForce 256, GeForce2 GTS, GeForce2 MX, GeForce4 MX, nForce integrated graphics:

The per-vertex transformation & lighting operations are performed by the GPU with s23e8 numerics. The per-fragment texture environment & register combiners (NV_register_combiners) operations are s0.8 (9-bit).

GeForce3, GeForce4 Ti:

The per-vertex transformation & lighting operations as well as hardware vertex programs (NV_vertex_program, ARB_vertex_program) are performed by the GPU with s23e8 numerics. The per-fragment texture environment & register combiners operations are s0.8. The per-fragment texture shader operations (NV_texture_shader) are s23e8 operations.

GeForce FX:

The per-vertex transformation & lighting operations as well as hardware vertex programs are performed by the GPU with s23e8 numerics. The per-fragment texture environment & register combiners operations are s0.8. The per-fragment texture shader operations (NV_texture_shader) are s23e8 operations. The per-fragment fragment program operations (NV_fragment_program, ARB_fragment_program) are performed with s23e8, s10e5, or s1.10 numerics depending on the instruction & register formats.

For what it is worth, you are typically safe to assume that vertex processing is always performed with s23e8 numerics, whether implemented on the GPU or the CPU. It is the per-fragment computations that vary a lot. For example, my understanding is that ATI resorts to s15e8 numerics for their ARB_fragment_program support. Rather than supporting multiple numeric formats, they make s15e8 the only option you have.

As for double precision, you've got to ask yourself whether doing all your rendering math in double precision would make your resulting images look better. The simple answer is NO. Would there be some corner cases that might look better with double precision? Maybe. Certainly you could create such situations if you tried pretty hard. On the other hand, there are certainly cases where u0.8, s1.10, s10e5, and s15e8 are demonstrably not sufficient, but plenty of situations where these numerics are sufficient. I'm sure you could create situations where s23e8 was not quite sufficient.

My advice is to understand your application and what numerics you really require. If a more compact numeric form meets your needs and excess precision isn't going to improve your picture much or at all, you are encouraged to use a numeric format requiring fewer bits. The reason is that compact forms are often more efficient, particularly for fragment processing. This is one reason the GeForce FX provides several numeric formats for fragment processing rather than taking a one-size-fits-all compromise approach.

This isn’t any big secret. A careful reading of the NV_vertex_program, NV_register_combiners, NV_texture_shader, and NV_fragment_program OpenGL extension specifications could allow you to arrive at these same observations for yourself. And you could always write OpenGL programs to verify the available numerics.

I hope this helps.

  • Mark

Thank you, Mark Kilgard!

Finally, a serious OpenGL poster.
Thank you very much for the explanations, the detailed data, and their accuracy.
Thank you very much for the honesty; I do appreciate it.

I now know that NVIDIA cards calculate with better precision than ATI.

P.S.
What is preventing NVIDIA from moving to double precision? I mean, you guys beat the transistor count of Intel's Pentium 4, and everybody knows that floating-point arithmetic on Intel chips is done with doubles. Are you thinking about it, or do you think plain s23e8 will be enough for now?

Once again I thank you for bringing this discussion back onto a professional course (my apologies to all the others who answered seriously as well).

What is preventing NVIDIA from moving to double precision? I mean, you guys beat the transistor count of Intel's Pentium 4, and everybody knows that floating-point arithmetic on Intel chips is done with doubles. Are you thinking about it, or do you think plain s23e8 will be enough for now?

Well, for one thing, you'd need about ~4x the transistors to implement fp64 operations compared to fp32 at the same speed, or ~2x for half speed. You'd also need wider datapaths, more (or wider) registers, etc. This means you end up with a chip that's about 4x larger (by a rule of thumb, that also means it costs ~16x more to build) and isn't any faster than the fp32 chip.

If more people would buy $500 × 16 = $8000 video cards, then it might be something you'd have a chance of seeing.

In any case, wouldn't it be kinda pointless to move to double precision in GPUs until 64-bit CPUs become the norm (not to mention 64-bit OSes)? It'll definitely be quite a long while until such precision is implemented in GPUs.

Most CPUs have handled 64 bit precision floating point numbers for quite some time now.

j

Originally posted by j:
Most CPUs have handled 64 bit precision floating point numbers for quite some time now.

My point was a matter of optimization, but looking at some of AMD's tech documents, it appears that both are supported equally well in the FPU. It appears that 3DNow! is meant for single-precision floats, though. Not sure about SSE or SSE2…

phoenix_wrath, you need to calm down; all I see are a few well-intentioned, if off-the-mark, posts :-).

Simulators have traditionally applied a double-precision offset from the origin to handle large coordinates. Basically, you subtract the DP eye position from the DP model matrix (or at least a large, equal amount from both) to produce a smaller single-precision model (or modelview) matrix. In reality it's often only the translate component that matters here.

Doubling up the precision of all your transform hardware and all your transmitted or stored vertex locations seems like a heck of a price to pay when it isn’t needed.

Looking at the problem as it exists in real simulators with real databases, limited support for DP modelview matrix operations in hardware is probably the next best thing. All databases I've seen store some sort of structure with positional offsets, explicit or implied, and single-precision vertex values within each 'node'. Any competent simulation engineer could easily arrange this even if it weren't the case from the outset.

So what you really want supported in hardware to make life easier is the ability to eliminate the software modelview manipulation of:

// pseudocode: subtract the DP eye position on the CPU, then render with the eye at the origin
float_modelpos = (float)(double_modelpos - double_viewpos);
single_viewpos = (0, 0, 0);

or the equivalent scheme where a fixed number is subtracted from each.
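
For anyone who hasn't seen it, here is a minimal C sketch of that scheme (a hypothetical helper, fixed-function GL assumed): the big subtraction happens in doubles on the CPU, and GL only ever sees a small single-precision translation with the eye at the origin.

#include <GL/gl.h>

typedef struct { double x, y, z; } dvec3;   /* double-precision world position */

/* view_rot: 4x4 column-major view rotation with no translation component */
void load_eye_relative_modelview(dvec3 obj_pos, dvec3 eye_pos,
                                 const float view_rot[16])
{
    /* subtract in double precision; the result is small enough for floats */
    float tx = (float)(obj_pos.x - eye_pos.x);
    float ty = (float)(obj_pos.y - eye_pos.y);
    float tz = (float)(obj_pos.z - eye_pos.z);

    glMatrixMode(GL_MODELVIEW);
    glLoadMatrixf(view_rot);   /* eye sits at the origin                   */
    glTranslatef(tx, ty, tz);  /* small, accurate single-precision offset  */
}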

It's that subtraction that's the pain to support in large-coordinate simulation, along with the attendant sudden switch of matrix locations if the offset changes at a tile boundary and the need for all your dynamic effects etc. to be adjusted frame-synchronously.

In its simplest form, an implementation would allow you to just use doubles in your modelview matrix calls (and texture matrix calls, dare one hope) and it'd all happen under the covers. You wouldn't need 64 bits for most applications; maybe only 48 bits, with particular attention to the translate terms under matrix multiplication, would solve many issues.

Even if you think you want DP everywhere, you don't; you want the above or some variation on it, and that is a far more likely prospect if you ask for the right thing and implementors understand how easy it is to support.

With the speed of systems these days, single-precision arithmetic isn't an issue for competent implementors, but double precision is a definite nice-to-have: it makes software much simpler while moving some software transformation to hardware, and that has benefits w.r.t. database structure, application structure and state changes that I won't go into here. In addition, there would be spin-off benefits for other applications that used it.

So, consider the real support required here: the homogeneous divide doesn't have to be supported at DP, not even the matrix*vertex transform; it's only the modelview matrix multiplications (and probably only for some fields initially), to get an accurate translation into the modelview matrix, that are required, so it SHOULD be a breeze to implement.

You multiply the matrices in DP (or even 48-bit FP), then cast to 32 bits after the multiply, transform all vertices in 32 bits as you currently do, and you're in simulation Nirvana. Anything more is gravy.
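
In code, that boils down to something like the following sketch (my own helper, column-major 4x4 matrices as GL expects): concatenate in doubles, downcast once.

/* out = a * b, computed in double precision, then cast to float for
   glLoadMatrixf; the translate terms keep the combined accuracy */
void mul_dp_then_cast(const double a[16], const double b[16], float out[16])
{
    double tmp[16];
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) {
            double s = 0.0;
            for (int k = 0; k < 4; ++k)
                s += a[k * 4 + r] * b[c * 4 + k];
            tmp[c * 4 + r] = s;
        }
    for (int i = 0; i < 16; ++i)
        out[i] = (float)tmp[i];
}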

Finally, and respectfully, anyone asking for DP vertices and the whole nine yards (instead of the matrix support I've outlined) should be viewed with extreme skepticism. It's just another example of asking hardware to solve the wrong problem, expensively, when a simple software approach is known to solve it.