R520 - no FP texture filtering?

I just want a confirmation on this…
So ATI now supports FP blending (what about alpha testing?) and antialiasing on FP render targets, but no filtering on FP textures? If that is true, what is the reasoning behind this?

I’m not sure why, but I know that the new ATI cards don’t support all of SM3.0 either. Vertex texturing apparently wasn’t included in the ‘minimum’ SM3.0 spec, so ATI skipped it.

-Twixn-

If that is true, what is the reasoning behind this?
That ATi has slacked off since the R300? The fact that they care far more about performance than they do about functionality, because you can’t test functionality in a benchmark?

ATI’s new cards are definitely not worth it. No FP filtering, a vertex texture cap but no texture formats that work with it (very sleazy), and CrossFire… ha hahahaha come on now. Why spend all that money on ATI right now when you can get the same (or better) performance with new cool features from NVIDIA? It’s a no-brainer really.

-SirKnight

Originally posted by skynet:
I just want a confirmation on this…
So ATI now supports FP blending (what about alpha testing?) and antialiasing on FP render targets, but no filtering on FP textures? If that is true, what is the reasoning behind this?

Alpha testing should work AFAIK (haven’t tested). The reasoning is simple. Of all the HDR-related features, FP filtering was one of the most expensive to support, yet it is the least important. Instead, the focus has been put on hard-to-emulate features, such as full multisampling support on FP16, FP16 blending, displayable FP16 surfaces with tonemapping in the display engine, etc. Blending could reasonably be emulated with ping-ponging, but that’s quite cumbersome.
Floating point filtering on the other hand is easily emulated in the shader. Also, FP16 is hardly the best way to store HDR assets. A much better way is to use something like RGB in DXT1 and exponent as L16. This costs less than 1/3 the bandwidth and storage space.
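To make the "easily emulated in the shader" point concrete, here is a minimal sketch of the arithmetic such a shader would do, written in Python rather than shader code. It assumes clamp-to-edge addressing and texel centers at (i + 0.5) / size; the names fetch and bilinear are made up for illustration and are not from any SDK.

```python
import math

def fetch(tex, x, y):
    """Point-sample one texel with clamp-to-edge addressing (assumed)."""
    h, w = len(tex), len(tex[0])
    x = min(max(x, 0), w - 1)
    y = min(max(y, 0), h - 1)
    return tex[y][x]

def bilinear(tex, u, v):
    """Emulate bilinear filtering with four point fetches.

    u, v are normalized coordinates in [0, 1]; texel centers are
    assumed to sit at (i + 0.5) / size.
    """
    h, w = len(tex), len(tex[0])
    # Shift by half a texel so whole numbers land on texel centers.
    x = u * w - 0.5
    y = v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional blend weights
    top = fetch(tex, x0, y0) * (1 - fx) + fetch(tex, x0 + 1, y0) * fx
    bottom = fetch(tex, x0, y0 + 1) * (1 - fx) + fetch(tex, x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy
```

Sampling the exact middle of a 2x2 texture returns the average of all four texels, and sampling at a texel center returns that texel unchanged. The cost compared to hardware filtering is four fetches plus a handful of multiply-adds per lookup.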

Personally, while FP16 is a convenience, I’d rather spend the transistors on supporting multisampling. Users shouldn’t have to turn off AA to play with HDR. That pretty much makes HDR useless IMHO.
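As a rough check on the "less than 1/3" figure for the DXT1 + L16 layout mentioned above, here is the bits-per-texel arithmetic as a small sketch. It assumes the comparison baseline is a four-channel RGBA16F texture and that both textures are stored at the same resolution; neither assumption is spelled out in the post.

```python
# Bits per texel for each layout.
DXT1_BPP = 64 / 16      # DXT1 packs a 4x4 texel block into 64 bits
L16_BPP = 16            # one 16-bit luminance channel for the exponent
RGBA16F_BPP = 4 * 16    # assumed baseline: four FP16 channels

def footprint_ratio():
    """Storage of DXT1 + L16 relative to RGBA16F at equal resolution."""
    return (DXT1_BPP + L16_BPP) / RGBA16F_BPP
```

Under those assumptions the pair costs 20 bits per texel against 64, a ratio of 0.3125, which is just under one third.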

Originally posted by Korval:
That ATi has slacked off since the R300? The fact that they care far more about performance than they do about functionality, because you can’t test functionality in a benchmark?
Uhm, the X1K cards have the best feature set out there. To mention a few that Nvidia doesn’t support: FP16 multisampling, displayable FP16 and RGB10_A2, RGB10_A2/R16/RG16/RGBA16/R16F render targets, fetch4, floating point HiZ, 3Dc+ (now with a single channel format as well), unlimited MRT support, angle invariant AF (well, they used to, but now they don’t).

Or how about dynamic branching now actually being fast enough to be useful? Like the X1800 being over twice as fast as the 7800GTX in the dynamic-branching-heavy samples to be included in the next ATI SDK, and the X1600 competing very well with it. Or instancing that’s fast enough to be useful. The X1600 spanked the 7800GTX in my tests (heck, even my Mobility 9700 does).

What about Vertex Texturing? Are you at least going to support render-to-vertex-array in OpenGL in the near future?

Another reason I saw mentioned on review sites for the lack of FP16 filtering is that developers are expected to implement custom filters via shaders anyway, meaning virtually no one would use the built-in box filter. The site gave UnrealEngine3 as an example; the engine uses a custom filter on all cards (i.e. including NVIDIA, whose hardware filtering is wasted in this case).

Even if you implement a custom filter, you will often get better performance by building it on top of the built-in filter, since each filtered fetch reads 4 or 8 texels instead of one.
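A minimal 1D sketch of that idea, with the hardware linear fetch emulated in Python: a four-texel box filter needs four point fetches, but only two linear fetches if each one is placed halfway between a pair of texels. The helper names are made up for illustration, and texel i is assumed to be centered at i + 0.5.

```python
import math

def linear_fetch(row, x):
    """Emulate a hardware linear-filtered fetch on a 1D row.

    x is in texel units, with texel i centered at i + 0.5 (assumed).
    """
    p = x - 0.5
    i = math.floor(p)
    f = p - i
    a = row[min(max(i, 0), len(row) - 1)]
    b = row[min(max(i + 1, 0), len(row) - 1)]
    return a * (1 - f) + b * f

def box4_point(row, start):
    """Four-texel box filter using four point fetches."""
    return sum(row[start:start + 4]) / 4

def box4_linear(row, start):
    """Same result from two linear fetches placed between texel pairs."""
    return (linear_fetch(row, start + 1.0) + linear_fetch(row, start + 3.0)) / 2
```

Both functions return the same average. In 2D the saving is larger: four bilinear fetches cover the same 16 texels as sixteen point fetches.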

I believe the reason Unreal Engine 3 implements its own filtering is the lack of SM3.0-capable hardware and/or sucky driver support when they were developing this technology. I don’t see much reason for implementing your own filtering if the hardware filter is good enough and produces good results. From the looks of it, I don’t think Epic is doing anything fancy in its own filtering code.

Floating point filtering on the other hand is easily emulated in the shader.
Which is fine for bilinear and to a certain extent trilinear. It’s a whole other matter to (efficiently) do aniso.

Something on-topic, and something off-topic:

On topic: I second the bit about fragment branching. Soft shadow maps will probably benefit. Now your deferred shaders can dispatch on material efficiently. The possibilities are quite intriguing. The displayable HDR (and AA!) is also nice, but that’ll eat up quite a bit of storage/bandwidth. I wonder what the performance of that will be like. And last I heard, render to vertex array was faster than render from vertex texture on NV hardware. But it’s been a long time since I’ve tested that.

Off topic: To be pedantic, the linear filter is a tent filter. The box filter is what they recommend for auto generating mipmap levels. But the point is clear: filters with narrow support universally suck.
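To illustrate the distinction in 1D, here is a small sketch: weighting texels with a tent kernel of radius one texel reproduces linear interpolation, while a width-2 box averages adjacent pairs, which is the usual way to build the next mip level. Texel i is assumed to be centered at i + 0.5, and edge handling is left out for brevity.

```python
def tent(d):
    """Tent (triangle) kernel with a radius of one texel."""
    return max(0.0, 1.0 - abs(d))

def tent_filter(row, x):
    """Reconstruct the signal at x (texel units) with a tent kernel.

    Only valid between the first and last texel centers; no edge
    clamping is done in this sketch.
    """
    return sum(v * tent(x - (i + 0.5)) for i, v in enumerate(row))

def box_downsample(row):
    """Next mip level of a 1D row: average adjacent pairs (width-2 box)."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]
```

For a row of 0, 10, 20, 40, sampling a quarter of the way from the second texel center to the third gives 12.5, exactly what linear interpolation between 10 and 20 gives.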

-W

FP16 multisampling, displayable FP16 and RGB10_A2, RGB10_A2/R16/RG16/RGBA16/R16F render targets, fetch4, floating point HiZ, 3Dc+ (now with a single channel format as well), unlimited MRT support, angle invariant AF (well, they used to, but now they don’t).
Of these, the only one that is of significant importance is multisampling. I don’t need to be able to display FP render targets directly, as I can just render to a texture and use that texture to convert the data to a displayable format. Plus, I get to take the opportunity to do other things (post-processing effects) with the texture while doing this, so in some cases, it isn’t even a performance loss.
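A minimal sketch of what that conversion pass computes per pixel. The specific curve is an assumption for illustration: it uses the simple x / (1 + x) operator followed by a 2.2 gamma, and any other tonemapping curve would slot in the same way. The function names are hypothetical.

```python
def tonemap_channel(x, exposure=1.0, gamma=2.2):
    """Map one HDR channel value to an 8-bit displayable value.

    Uses x / (1 + x) compression and a plain gamma curve; both are
    stand-ins chosen for illustration.
    """
    x = max(x, 0.0) * exposure
    mapped = x / (1.0 + x)  # compresses [0, inf) into [0, 1)
    return round(255 * mapped ** (1.0 / gamma))

def resolve_pixel(rgb, exposure=1.0):
    """Convert one floating-point RGB pixel to 8-bit RGB."""
    return tuple(tonemap_channel(c, exposure) for c in rgb)
```

Since this pass already reads every pixel of the FP texture, bloom or other post-processing can be folded into the same shader.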

Like the X1800 being over twice as fast as the 7800GTX in the dynamic branching heavy samples to be included in the next ATI SDK, and the X1600 competing very well with it.
Yeah, like I’m going to believe a test that ATi designed. When an unbiased 3rd party comes up with such a scenario, then maybe we can talk.

Originally posted by gybe:
What about Vertex Texturing? Are you at least going to support a render-to-vertex-array in OpenGL in a near future?
Well, that’s on the ARB’s table right now. I hope to see it added to FBOs at some point. We had superbuffers implemented in our drivers like two years ago or so. But things took another turn: the work got moved to the ARB, where it eventually got mixed up with other stuff, and we got FBOs, which don’t support R2VB yet.

Originally posted by Won:
The displayable HDR (and AA!) is also nice, but that’ll eat up quite a bit of storage/bandwidth. I wonder what the performance of that will be like.
It’s quite good actually. In the new SDK that was released just today (http://www.ati.com/developer/radeonSDK.html) there’s a new HDR sample that does AA. I saw a 14% performance hit by enabling 6xAA.

Originally posted by Korval:
I don’t need to be able to display FP render targets directly, as I can just render to a texture and use that texture to convert the data to a displayable format.
Well, then you’ll probably at least enjoy an RGB10_A2 backbuffer for better output precision. The displayable FP buffers may not be that useful to regular gamers. To truly take advantage of them you’ll need one of those extremely expensive HDR displays. I haven’t seen one in action myself, but supposedly it’s pretty cool.
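To put a number on "better output precision": a 10-bit channel has 1023 steps between black and white where an 8-bit channel has 255. Here is a minimal sketch of the quantization and packing. The bit order used (red in the low bits, alpha in the top two) is an assumption for illustration and may not match what a given API or piece of hardware uses.

```python
def quantize(value, bits):
    """Round a [0, 1] value to the nearest code of an unsigned normalized channel."""
    levels = (1 << bits) - 1
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * levels)

def pack_rgb10_a2(r, g, b, a):
    """Pack four [0, 1] values into one 32-bit word.

    Assumed layout: R in bits 0-9, G in 10-19, B in 20-29, A in 30-31.
    """
    return (quantize(r, 10)
            | quantize(g, 10) << 10
            | quantize(b, 10) << 20
            | quantize(a, 2) << 30)
```

The pixel stays at 32 bits, so the extra color precision comes out of alpha, which is left with four levels.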

Originally posted by Korval:
Yeah, like I’m going to believe a test that ATi designed. When an unbiased 3rd party comes up with such a scenario, then maybe we can talk.
Talk! :wink:

http://www.xbitlabs.com/articles/video/display/radeon-x1000_21.html
“In both these cases RADEON X1800 XT is undefeated. Moreover, in case of the hardest Heavy Dynamic Branching, it is twice as fast as the rival.”

http://www.xbitlabs.com/articles/video/display/radeon-x1000_27.html
http://www.xbitlabs.com/articles/video/display/radeon-x1000_33.html

I had a look at those benchmarks Humus just posted.

On the first page, I see the 7800 outperforming everything overall, especially on benchmarks that are realistic for today’s market. I would be very far from saying the ATI cards were unbeatable.

However, I do have to point out this quote: “Xbitmark results prove this point once again: the overall performance of RADEON X1800 XT is lower or the same as that of GeForce 7800 GTX.”

But the last two pages don’t mention the 7800; they only compare ATI’s newest and last generations (X1k and X#00) to nVidia’s last generation (GF 6). Perhaps it’s because the rest of the GF 7s haven’t appeared yet. But still, new vs. old. It’s like comparing the GF 6 series to the GF FX series.

I think the second two are biased… they also don’t mention which brand of card was used; an ASUS and a Sparkle would produce different results.

Although, I think we are going off-topic… it’s turning into yet another ATI vs. nVidia thread.

-Twixn-

Originally posted by Humus:
Originally posted by Won:
The displayable HDR (and AA!) is also nice, but that’ll eat up quite a bit of storage/bandwidth. I wonder what the performance of that will be like.
It’s quite good actually. In the new SDK that was released just today (http://www.ati.com/developer/radeonSDK.html) there’s a new HDR sample that does AA. I saw a 14% performance hit by enabling 6xAA.

Wow! That SDK is getting so good I think it easily rivals Nvidia’s now.

Just how many examples and papers did you write in that, Humus? Virtually all the examples smack of your “style”.

OK, new Radeons are as cool as GeForces now and have some decent features I miss on green team cards, but…

WHEN THE HELL WILL ATI START THINKING ABOUT UPDATING THEIR OPENGL DRIVER!? Or are they in the Vista boat - Astalavista GL???

I’m not thinking about getting any of their products until this is fixed!

Originally posted by sqrt[-1]:
Wow! That SDK is getting so good I think it easily rivals Nvidia’s now.

Just how many examples and papers did you write in that, Humus? Virtually all the examples smack of your “style”.
I did write the majority of the framework, so the samples adhere to that more or less. I wrote about 3 out of 4 of those samples, but most of the duplicates between the APIs are mine, so maybe saying a bit over half is more fair. I wrote 3 of the new papers (HDR Texturing, Programming for CrossFire and Framebuffer Objects).