No accumulation buffer !!!!!!

I have just bought a GeForce2 and now I find out that it does not support something as simple as an accumulation buffer! The ATI Radeon and Voodoo4/5 do. 3D texturing is also not supported, though it is also a really simple technique.

Why does nvidia not implement the accumulation buffer? It should just be a driver issue.

We do support the accumulation buffer, and we do support 3D texturing. Both are implemented in SW only, though. (If we didn’t implement them, we would have a non-conformant driver and we would not be allowed to call it OpenGL.)

On the specific topic of the accumulation buffer, there is no consumer HW that supports accumulation buffering in HW, so we’re hardly unique in this.

  • Matt

3dfx only has their “T-buffer”, which is exposed by their OpenGL driver in the form of a “3dfx_multisample” extension. I don’t know why they didn’t handle it via OpenGL’s accumulation stuff, though. Does anyone know more about this?

It’s only a small subset of the full accumulation buffer functionality.

  • Matt

Yes, I know you have an OpenGL-compliant driver, but when buying this expensive card, which is great in other areas, I kind of expected it to go well beyond software fallbacks on the important features.

The accumulation buffer is often used like this (a minimal code sketch follows the list):

  • change the scene a bit and draw it
  • call e.g. glAccum(GL_ACCUM, 0.5f)
  • repeat the above lines a couple of times
  • call e.g. glAccum(GL_RETURN, 1.0f)
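
For concreteness, here is a minimal sketch of that pattern doing jittered antialiasing. The draw_scene() callback and the jitter table are hypothetical, and the pixel format is assumed to have been created with accumulation bits:

```c
#include <GL/gl.h>

/* Hypothetical sub-pixel jitter offsets for 4 passes. */
static const float jitter[4][2] = {
    { 0.25f, 0.25f }, { 0.75f, 0.25f }, { 0.25f, 0.75f }, { 0.75f, 0.75f }
};

void accum_antialias(void (*draw_scene)(float dx, float dy))
{
    glClear(GL_ACCUM_BUFFER_BIT);
    for (int i = 0; i < 4; ++i) {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        draw_scene(jitter[i][0], jitter[i][1]); /* change the scene a bit */
        glAccum(GL_ACCUM, 0.25f);               /* add 1/4 of this pass */
    }
    glAccum(GL_RETURN, 1.0f);                   /* write the average back */
}
```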

I can't see why the accumulation buffer can't be implemented in hardware. It is just like doing triple buffering, where an extra memory buffer is in use.

As I see it, it is as simple as doing a Blt() operation with blending, and the GeForce can do this for sure. So why not just extend the next version of the driver with this functionality?

Originally posted by mfugl:
I can't see why the accumulation buffer can't be implemented in hardware. It is just like doing triple buffering, where an extra memory buffer is in use.

If only it were that simple. If you take a look at your pixel formats, you’ll notice that the accumulation buffer is usually 64 bpp. It needs this extra precision to work. You can figure it out: if nothing else, the memory requirements alone would be a good reason not to implement it in hardware right now (on 32 MB cards, at least).

Correct, it is a lot more than we can do with our 2D engine. Accumulation buffering calls for 16-bit-per-channel precision (the spec doesn’t strictly require that, but it is generally expected) and a signed buffer.

It is definitely an ultra-high-end OpenGL feature. It consumes huge amounts of memory, it uses tons of bandwidth and fill (you have to draw many times, and the accumulation operations may require a read from a 32-bit buffer and a read-modify-write on a 64-bit buffer), and you need, at the bare minimum, a pipelined 16x16 multiplier and a barrel shifter to implement it. Even then you’ll have an accumulation rate of 1 pixel per 4 clocks because you need to accumulate R, G, B, and A. 4 multipliers and 4 shifters would help, but you’re starting to talk about a fairly big unit then, more than anyone can afford to build into a consumer 3D chip.

When you start adding up what it would take to implement it in HW, it’s pretty obvious why hacks like the T-buffer exist. It’s much cheaper to have a 4-sample buffer and render 4 times into it, then do a single filter operation, than to have a high-precision buffer, render 4 times, do 4 fancy blends, and then still have to copy back to the main framebuffer.

If you want a cheap approximation of the same kinds of effects, you also have the option of using blending. For example, you can draw an object using ONE/ONE blending four times at 1/4 the usual brightness to get a motion blur effect on a dark background. The effectiveness of this technique depends on your scene, and it does cost precision, but it’s often a good way of getting a “good enough” approximation.
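
A minimal sketch of that trick, assuming a hypothetical draw_scene() callback whose output is modulated by the current color (unlit geometry or GL_MODULATE texturing):

```c
#include <GL/gl.h>

/* Four additive passes at 1/4 brightness approximate motion blur on a
   dark background, as described above. */
void blended_motion_blur(void (*draw_scene)(float t))
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);      /* passes simply add up */
    glDepthMask(GL_FALSE);            /* early passes mustn't occlude later ones */
    for (int i = 0; i < 4; ++i) {
        glColor4f(0.25f, 0.25f, 0.25f, 1.0f);   /* 1/4 brightness */
        draw_scene(i / 3.0f);         /* t sweeps the blur interval */
    }
    glDepthMask(GL_TRUE);
    glDisable(GL_BLEND);
}
```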

  • Matt

I still believe you could prevent your driver from switching to software mode. A 32-bit buffer could be used, and regarding signedness I suppose it is a matter of changing the bias.

In general I think you should implement ‘less’ important features in hardware and just use fewer transistors on those areas. It does not matter that it’s not the fastest implementation on earth. It would still be so much better and faster than this switching to software mode, and it would prevent customers from getting these disappointments with their new expensive gear.

Hi Mfugl!

I don’t want to offend you; I just have some silly questions: you seem really interested in the accumulation buffer and I suppose you have very good reasons for that.

The questions are:

  1. Why didn’t you get some more information before buying you GeForce2 ? You would have been told that it does not have an HW accumulation buffer.
  2. If you had known it, would you have gone for what they call a professional graphics card that has it ?
  3. Why do you need the accumulation buffer so badly ???
  4. How much are you ready to pay for an HW accumulation buffer ?

Hope you will answer!

Best regards.

Eric

The decisions about what we put in HW are complicated, and even when I do have influence, there is a 12-month lead time before anything I suggest shows up in a shipping product, sometimes much longer.

In this particular case, we made the same decision everyone else did. The justification is pretty clear: supporting accumulation buffering would take quite a bit of effort, the technique would still be too slow for many real-time applications, there are no significant apps shipping now or any time in the next year that use the accumulation buffer, and no major developers are requesting it.

And further, remember that the GeForce architecture has been shipping for well over a year. If we had wanted to put accumulation buffering in it, we would have needed to decide that more than 2 years ago!

Maybe it’s a feature you want, but you have to have realistic expectations about our product design cycle, and we can’t satisfy everyone.

If we had spared a bunch of transistors for accumulation buffering, there would be a tradeoff. For example, we might have had 10% slower 3D performance, or we might have had to remove a 3D feature to make up for it. Overall, I think people would have been less happy about the product.

  • Matt

Hi Eric!

  1. I have looked rather closely at nvidia’s site, but I found it very difficult to find an exact specification of what it can and can’t do. They just say something like: it’s the best, it’s the fastest and it’s what you have been waiting for, so go and buy it!
  2. No, with a pro card you often pay 5 times more and get almost nothing extra (in terms of production costs).
  3. I’ve just been looking forward to doing some fun programming with it. There are so many possibilities with it.
  4. If the accumulation buffer were implemented using microcode (the slow way), like in CPUs, there would be almost no extra production cost, and therefore I should not pay more. They would just sell more units.

Best regards
mfugl

Hi Matt, I guess it’s also a lot about politics and marketing. So when you come up with this next next next generation GPU within the next year, I will just throw my card away and go buy another one.

I’d never really had an urge to use the accumulation buffer (yeah, motion blur and depth of field are neat, but having to render the whole scene 2 or more times and halve your frame rate never seemed worth it). The one time that I thought the accumulation buffer would be useful, I found out that it’s way too limited anyway.

Imagine that you are drawing surfaces with multipass effects (or perhaps you use multipass as a fallback if the HW doesn’t do multitexture). Further, suppose you have been told that the LOD algo will simply fade models out once they reach a certain distance, or that as part of a death animation models must fade out to avoid having tons of corpses around. How, then, do you apply the multipass shader to this transparent object? Multipass depends on the framebuffer containing only the intermediate results of the shader you are trying to render. Trying to just do multipass, modulating each pass by the opacity, gives the wrong results. You really want to render the surface with the full shader, and then composite the result into the framebuffer using the alpha of the surface. I looked carefully at the facilities of the accumulation buffer, and found that it is not up to the task.

What I need, then, is a compositing buffer which holds a single RGB or RGBA image at the same resolution and bit depth as the framebuffer (which would have to be RGBA). At any time I would like to be able to blit the framebuffer into the compositing buffer (using any of the standard blend modes - and the alpha channel of the frame buffer as src alpha). One could then render all the opaque objects in the scene and directly copy the results to the compositing buffer. Next one could clear the framebuffer and render a complex transparent surface (say a large pool of reflective water, that requires many rendering steps since it acts as a partially reflective portal-mirror) clipping it using the Z-buffer. Once the frame buffer was set up so the pool was properly shaded, and the destination alpha was appropriate for compositing, this layer could be alpha blended into the compositing buffer.

The memory demands of such a technique seem manageable, but it would demand atrocious amounts of fillrate unless the copy and blend operations were cleverly optimized to avoid copying empty regions…
I realize that the same effects could be achieved through intelligent use of glCopyTexSubImage, but it seems the direct route would be easier to optimize…
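
For what it’s worth, a rough sketch of that glCopyTexSubImage route (all names hypothetical; it assumes an RGBA framebuffer with destination alpha and a pre-created power-of-two RGBA texture of exactly w x h; it builds the layer first, so it leaves out the Z-clipping detail described above):

```c
#include <GL/gl.h>

void composite_layer(GLuint layer_tex, int w, int h,
                     void (*draw_layer)(void), void (*draw_opaque)(void))
{
    /* 1. Build the transparent layer in a clean framebuffer, so the
          multipass shader sees only its own intermediate results. */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    draw_layer();                     /* full shader; leaves alpha for compositing */

    /* 2. Capture the finished layer, alpha included. */
    glBindTexture(GL_TEXTURE_2D, layer_tex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, w, h);

    /* 3. Render the opaque scene normally. */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    draw_opaque();

    /* 4. Composite the layer over it with a screen-aligned textured quad. */
    glEnable(GL_TEXTURE_2D);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glDisable(GL_DEPTH_TEST);
    glDepthMask(GL_FALSE);
    glColor4f(1.0f, 1.0f, 1.0f, 1.0f);
    glMatrixMode(GL_PROJECTION);
    glPushMatrix(); glLoadIdentity(); glOrtho(0, 1, 0, 1, -1, 1);
    glMatrixMode(GL_MODELVIEW);
    glPushMatrix(); glLoadIdentity();
    glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex2f(0, 0);
    glTexCoord2f(1, 0); glVertex2f(1, 0);
    glTexCoord2f(1, 1); glVertex2f(1, 1);
    glTexCoord2f(0, 1); glVertex2f(0, 1);
    glEnd();
    glPopMatrix();
    glMatrixMode(GL_PROJECTION); glPopMatrix();
    glMatrixMode(GL_MODELVIEW);
    glDepthMask(GL_TRUE);
    glEnable(GL_DEPTH_TEST);
    glDisable(GL_BLEND);
    glDisable(GL_TEXTURE_2D);
}
```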

For now I can live with the errors caused by blindly using multipass - they aren’t too bad with many multipass effects, and usually aren’t noticeable when objects fade in and out quickly.

[This message has been edited by timfoleysama (edited 11-28-2000).]

You can probably do that with an aux buffer and CopyPixels…
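
Something like this, perhaps, assuming the pixel format actually exposes an aux buffer, which consumer drivers often don’t (callbacks are hypothetical; note that glCopyPixels fragments go through blending, so the copies themselves can do the compositing):

```c
#include <GL/gl.h>

void composite_via_aux(int w, int h,
                       void (*draw_opaque)(void), void (*draw_layer)(void))
{
    glRasterPos2i(0, 0);              /* assumes a matching ortho projection */

    /* 1. Opaque scene into the back buffer. */
    glDrawBuffer(GL_BACK);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    draw_opaque();

    /* 2. Stash it in AUX0; depth test off so the copies aren't clipped. */
    glDisable(GL_DEPTH_TEST);
    glReadBuffer(GL_BACK);
    glDrawBuffer(GL_AUX0);
    glCopyPixels(0, 0, w, h, GL_COLOR);

    /* 3. Clear color only (keeping depth Z-clips the layer) and build the
          transparent layer with its full multipass shader. */
    glEnable(GL_DEPTH_TEST);
    glDrawBuffer(GL_BACK);
    glClear(GL_COLOR_BUFFER_BIT);
    draw_layer();

    /* 4. Alpha-blend the finished layer over the stashed opaque scene... */
    glDisable(GL_DEPTH_TEST);
    glDrawBuffer(GL_AUX0);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glCopyPixels(0, 0, w, h, GL_COLOR);
    glDisable(GL_BLEND);

    /* 5. ...and bring the composite back for display. */
    glReadBuffer(GL_AUX0);
    glDrawBuffer(GL_BACK);
    glCopyPixels(0, 0, w, h, GL_COLOR);
    glEnable(GL_DEPTH_TEST);
}
```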

  • Matt

Originally posted by mfugl:
In general I think you should implement ‘less’ important features in hardware and just use fewer transistors on those areas. It does not matter that it’s not the fastest implementation on earth. It would still be so much better and faster than this switching to software mode, and it would prevent customers from getting these disappointments with their new expensive gear.

Umm… the specs are clearly available all over the web, and on most good vendor web sites. Thus, no disappointments, assuming you did your homework.

Now, even a GeForce Ultra (at $419) is not “expensive” compared to the professional workstation cards that DO have accumulation buffers (you know you can spend $20,000 on a workstation GL card, right?)

Last, for the examples that you have mentioned (such as rendering 4 times, each time doing an average between “previous” and “current” pixels), using destination alpha seems like it would do the same thing for you. Or just using accumulation mode, with the right light scaling.
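
A sketch of that “average previous and current” idea with plain blending; this version uses source alpha rather than destination alpha, costs 8-bit precision as discussed above, and assumes fragments take their alpha from the current color (unlit geometry or GL_MODULATE texturing):

```c
#include <GL/gl.h>

/* Drawing pass i with alpha = 1/i keeps a running average of all passes
   in the framebuffer: new = src*(1/i) + old*(1 - 1/i).
   draw_scene() is a hypothetical callback. */
void running_average(void (*draw_scene)(int pass), int n)
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    for (int i = 1; i <= n; ++i) {
        glClear(GL_DEPTH_BUFFER_BIT);  /* fresh depth, keep the color average */
        glColor4f(1.0f, 1.0f, 1.0f, 1.0f / i);
        draw_scene(i);
    }
    glDisable(GL_BLEND);
}
```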

[This message has been edited by bgl (edited 11-29-2000).]

I was wondering, does the NV20 support the accumulation buffer in hardware? Of course, you, mcraighead, probably are under NDA or some fascist “we hate leaks except when it comes to our drivers but we fake we hate that too” contract or something. What special super-nifty extensions will nVidia give us in the future?

I can’t comment on unannounced products.

I’ve said this before, but we do, in fact, hate leaked drivers. No matter what various conspiracy theorists will tell you, we have never intentionally leaked a driver, we do get angry when it happens, and we always groan in agony whenever someone complains that they downloaded some new leaked driver and it didn’t work on their system.

Of course, we also always laugh whenever we see a “review” of some new leaked driver where the reviewer tries to make a big deal out of a statistically insignificant difference in performance.

It’s like Bush and Gore: the framerate numbers (vote counts) change every time you measure (count) them, the two drivers (candidates) are both indistinguishable to any normal observer, and some people get way too worked up about which driver (candidate) is going to win.

However, I think it’s safe to say that our drivers are of higher quality than the candidates.

  • Matt

Matt, just a comment here: I understand nVidia’s position about leaked drivers. You receive complaints about them when nobody should be using them (a waste of time for you)… Fair enough. But I am sure you also receive useful bug reports from some people (OK, not from me… specular highlight… hum…). Don’t you???

About your elections: from outside the US, it does not seem the problem lies in the drivers (candidates), but in the FPS calculation method (vote process)… I received this funny picture the other day: it showed a man counting tons and tons of papers (the votes) manually. The caption was: “The USA, the world’s most powerful nation!”

Regards.

Eric

To be honest, I think I’d prefer IHVs to concentrate on other areas right now than accumulation buffers. Simulating motion blur and depth of field (Cinematic Effects!) isn’t exactly high on the list of what every developer wants. The only real benefits I see for them are full-screen anti-aliasing and jittered soft shadows, but better techniques for doing these will appear.

Is it just me, or does RGSS look too blurry? I mean, sure, it cleans up jaggies, but it seems to exaggerate objects closer to me (um… they look like they pop out of the scene more). Or maybe it was just the screenshots’ fault. It just looked a bit… skewed and blurry. OGSS is cleaner looking but less powerful. I’m not flaming, just pondering.