R350 finally! It's faster than FX, but is it as flexible? Apparently yes!



Kosh Naranek
03-05-2003, 11:41 PM
A few minutes ago, the first reviews of the new Radeon 9800, which uses the new R350 core, were published.

Apparently it's as flexible as the GeForce FX, or even more so, and it has the advantage of being faster with filtering enabled. It's also much quieter and consumes less power, all for the same price.

What do you think about it?

cirityone
03-06-2003, 12:15 AM
Hi, this is an OpenGL forum! Anyway, a change of queen among graphics boards is near... the RV350 is better than NV30 and NV34, but I think that's in Direct3D... in OpenGL too?

Kosh Naranek
03-06-2003, 12:37 AM
Yes, but my main doubt is about the support for OGL 2.0...

Besides, there are other things, like stencil buffer acceleration, which is very important for shadow volume rendering in Doom 3, which uses OpenGL...

Besides, such important things only happen once in a while, and I don't think this will really bother anyone :)

Robbo
03-06-2003, 01:18 AM
Can you ladies point me towards the reviews? I haven't come across any yet this morning. A few links would be nice. Will help to add some context to this discussion!

Adrian
03-06-2003, 01:31 AM
www.tomshardware.com (http://www.tomshardware.com) www.hardocp.com (http://www.hardocp.com)
among others.

Kosh Naranek
03-06-2003, 01:43 AM
Even better, go to www.rage3d.com (http://www.rage3d.com); there are links to many other reviews. Unfortunately, at the moment I can't find any review that is really good on technical details.

But apparently the Radeon 9800 is amazing: full support for OGL 2.0, multiple render targets, an accelerated stencil buffer, and more.

Unfortunately again, I can't find any good table comparing the number of registers, temporary registers, and other things of this kind.

Please, if someone finds one, post the link, and of course say what you think about ATI's new flagship :)

[This message has been edited by Kosh Naranek (edited 03-06-2003).]

PH
03-06-2003, 01:52 AM
From what I've read ( www.driverheaven.net/reviews/r350/index.htm (http://www.driverheaven.net/reviews/r350/index.htm) ), the 9800 has an F-buffer which allows for fragment shaders of unlimited length. Very impressive.

Kosh Naranek
03-06-2003, 02:10 AM
They say this is what gives the Radeon 9800 full OGL 2.0 support. Unfortunately I haven't seen this in detail; however, it really is a drop shot.

MickeyMouse
03-06-2003, 02:25 AM
Does anybody know what "Shadow volume rendering acceleration" (http://www.driverheaven.net/reviews/r350/index.htm) means on their features list?

Tom Nuydens
03-06-2003, 03:02 AM
There's some good information at Beyond3D: http://www.beyond3d.com/reviews/ati/r350/

-- Tom

gibber
03-06-2003, 04:59 AM
Does anybody know what "Shadow volume rendering acceleration" means?

I guess it means support for some extension that allows shadow volumes to be rendered once (instead of once for front faces and once for back faces), hopefully EXT_stencil_two_side.
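
To illustrate the guess above: with the usual z-pass shadow volume technique, the stencil value is incremented for front faces of the volume that pass the depth test and decremented for back faces that pass it, which normally takes two passes over the volume geometry. Two-sided stencil lets the operation be chosen per face, so one pass is enough. The following is only a software model of the counting for a single pixel, not real GL code; `Face` and both function names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Face:
    depth: float        # distance from the eye along this pixel's ray
    front_facing: bool  # True if the volume polygon faces the viewer

def stencil_two_pass(faces, scene_depth):
    """Classic z-pass: one pass per facing, with culling switched in between."""
    stencil = 0
    for f in faces:                      # pass 1: front faces only, increment
        if f.front_facing and f.depth < scene_depth:
            stencil += 1
    for f in faces:                      # pass 2: back faces only, decrement
        if not f.front_facing and f.depth < scene_depth:
            stencil -= 1
    return stencil

def stencil_one_pass(faces, scene_depth):
    """Two-sided stencil: the op is chosen per face, so one pass suffices."""
    stencil = 0
    for f in faces:
        if f.depth < scene_depth:        # depth test passes
            stencil += 1 if f.front_facing else -1
    return stencil

# One shadow volume spanning depths 2..6 along this pixel's ray.
volume = [Face(2.0, True), Face(6.0, False)]
in_shadow = stencil_one_pass(volume, scene_depth=4.0)   # surface inside volume
lit = stencil_one_pass(volume, scene_depth=8.0)         # surface behind volume
print(in_shadow, lit)   # 1 0
```

A non-zero count means the pixel is in shadow. In the one-pass version a decrement can arrive before the matching increment, which Python integers tolerate but a clamping stencil buffer would not, so on real hardware a wrapping increment/decrement mode would presumably be needed as well.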

The card looks good. I've been a NV nut with my GF3, but ATI are looking better and better, especially wrt the NV30 vs new radeons.

davepermen
03-06-2003, 05:24 AM
the shadowvolume optimisation is part of hyperZ3, you can read about it on their page. it means that the superfast hierarchical z-buffer thingy they have is now optimized for the stencil operations normally done for shadow volumes as well, meaning those get compressed/put into the hierarchy too.. so much of the stencil-buffer fillrate cost can be forgotten.. and of course stencil-two-side is in as well :D

i am very impressed by the info so far. now i would like to see some specs, especially for the pixelshaders and vertexshaders..

yeah, gibber, currently there is no point in getting a new nv card anymore, is there? high end is now radeon again, high cheaper end is radeon as well (9700 stuff), and cheapest high end is 9500, still a great card.. the lower stuff is still dx8.1 only.. but that'll change sooner or later, too..

nutball
03-06-2003, 05:35 AM
The only worry holding me off getting a Radeon is the Linux support. How good are their Linux drivers these days? NVIDIA dropped the ball with their last Linux set IMO.

Oh, and I'd really like 32-bpc FP too...

[This message has been edited by nutball (edited 03-06-2003).]

V-man
03-06-2003, 05:49 AM
Sounds like a pretty good board but its performance seems close to the R300.

I noticed that the transistor count hadn't changed between R300 and R350.

And what pisses me off is that I don't see a list of supported extensions. Tons of benchmarks but a simple list can't be uploaded?

Tom Nuydens
03-06-2003, 07:10 AM
Originally posted by V-man:
I noticed that the transistor count hadn't changed between R300 and R350.

... The review on AnandTech says that there are no new features except for the "Hyper-Z III" improvement and some color compression optimizations. Huh?!?

The features mentioned in the other reviews sound very impressive, though!


And what pisses me off is that I don't see a list of supported extensions. Tons of benchmarks but a simple list can't be uploaded?

Amen! Most video card (p)reviews seem to consist of a rehash of the marketing blahblah, followed by approximately 25 pages worth of benchmarks and FSAA/AF comparison screenshots, neither of which I ever even so much as look at.

What I want to know before buying anything is what the extension string looks like, whether all new functionality is exposed, AND if all relevant extension specifications are publicly available by the time I get the card in my hands.

-- Tom

Jan
03-06-2003, 07:28 AM
How can they support "full" OpenGL 2.0? As far as I know, 2.0 isn't completely ready yet. And if it were, then there would certainly be a final spec, or so.
I mean, if I buy a GF FX or a 9800, then I am not able to program and use 2.0 programs, am I? I still need the headers, and they are not out yet, are they?

Jan.

Adrian
03-06-2003, 07:31 AM
Have you tried e-mailing them? They're probably not aware the extension list is important.

[This message has been edited by Adrian (edited 03-06-2003).]

kehziah
03-06-2003, 09:06 AM
This article (in French) says Radeon 9800 Pro has 117M transistors (107M for 9700 Pro). http://www.hardware.fr/articles/456/page1.html
I note that THG's article put a ? next to 9800 Pro transistor count.

harsman
03-06-2003, 09:12 AM
The 9800 implements an f-buffer, which means it can support shaders of any length. You can probably even support full dynamic branching in fragment shaders with clever use of discard and the f-buffer. It won't be fast but it'll work. The other new stuff seems to be mostly efficiency improvements, better HyperZ and a tuned memory controller. Of course, it also runs at a faster clock rate with faster memory.

MZ
03-06-2003, 11:50 AM
All I can read about R350's F-buffer is that it "allows infinite instruction count". But instruction count is not the only limited resource of FP, there are others:
- texcoord count
- texture unit count
- dependent read complexity
- temporary register count
- constant register count

If you exceed any of these, you have to do multipass too. If I understand the Stanford paper correctly (I didn't read it thoroughly), the "true" F-buffer would remove all of the above limits, because it combines the results of multiple passes, and each pass is given the full set of resources to use.

So, the fundamental question is: which limits of FP are actually relaxed in R350?

If this is only instruction count, as reviews seem to suggest, then this is nothing to be excited about. In practice the difference between NV30's 1024 and infinite will not matter, especially when neither HW supports true (non-unrolled) loops in FP (otherwise ps3.0 support in R350 would be hyped). In that case R350 is just as GL2-ready as NV30.
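
The point about limits can be sketched in a few lines: a shader needs splitting as soon as it exceeds any single per-pass limit, so lifting the instruction limit alone does not help a shader that also runs out of texcoords. The limit values below are placeholders for illustration. Only the 1024 instruction figure comes from this thread; the other numbers, the dictionary keys and the function name are assumptions, not confirmed specs for any card.

```python
def exceeded_limits(usage, limits):
    """Return the resources for which the shader needs more than one pass.

    A limit of None (or a missing entry) means "unlimited", which is what
    the reviews claim for the R350 instruction count.
    """
    return [name for name, used in usage.items()
            if limits.get(name) is not None and used > limits[name]]

# Hypothetical per-pass limits; only the 1024 figure comes from the thread.
nv30_like = {"instructions": 1024, "texcoords": 8, "texture_units": 16,
             "temporaries": 32, "constants": 256}
# Same limits, but with the instruction limit lifted and nothing else changed.
unlimited_instr = dict(nv30_like, instructions=None)

# A made-up shader that is too long and also uses too many texcoords.
shader = {"instructions": 1500, "texcoords": 10, "texture_units": 4,
          "temporaries": 12, "constants": 40}

print(exceeded_limits(shader, nv30_like))        # ['instructions', 'texcoords']
print(exceeded_limits(shader, unlimited_instr))  # ['texcoords']
```

Under these assumed numbers the second card still has to go multipass for this shader, which is the scenario the question above is about.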

Korval
03-06-2003, 02:42 PM
Most of the 9800's speed improvements were in the areas of antialiasing support and Hyper-Z stuff. If you don't turn on antialiasing (and, to be honest, given the fact that a 9800 will run Unreal2003 at around 90fps with good antialiasing and aniso, why not?), you're probably not going to see much of a speed improvement.

What I really want to know is:

#1: Did they match nVidia's new fragment program instructions?
#2: Did they match nVidia's vertex/fragment program parameter/constant/etc counts?

Humus
03-06-2003, 03:27 PM
Originally posted by MZ:
If this is only instruction count, as reviews seem to suggest, then this is nothing to be excited about.

Don't know if this is true or not, but infinite instruction count alone is enough to get me excited. I don't think I've ever hit any of the other limits, but I sure have hit the instruction limit a few times.

MikeC
03-06-2003, 03:50 PM
Anyone know what's happened to the M10 mobile part? I'd assumed it would be announced at the same time.

deshfrudu
03-06-2003, 04:08 PM
Correct me if I'm wrong, but doesn't the f-buffer give us order-independent-transparency, basically for free? There's something to be excited about.

DaveBaumann
03-06-2003, 04:17 PM
FYI, for those that wanted OGL Extensions:
http://www.beyond3d.com/reviews/ati/r350/index.php?p=apisupp

cass
03-06-2003, 04:34 PM
Originally posted by deshfrudu:
Correct me if I'm wrong, but doesn't the f-buffer give us order-independent-transparency, basically for free? There's something to be excited about.

No, f-buffer renders fragments in the order you send them, so you're still responsible for your own sorting.
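
As a reminder of why submission order matters here, a small sketch of standard alpha blending applied to two translucent fragments in both orders. This is plain arithmetic, independent of any f-buffer implementation; the helper names are made up.

```python
def blend_over(dst, src, alpha):
    """Standard alpha blend: src * alpha + dst * (1 - alpha), per channel."""
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src, dst))

background = (0.0, 0.0, 0.0)
red = ((1.0, 0.0, 0.0), 0.5)    # (color, alpha)
blue = ((0.0, 0.0, 1.0), 0.5)

def draw(fragments):
    color = background
    for src, alpha in fragments:   # processed strictly in submission order
        color = blend_over(color, src, alpha)
    return color

print(draw([red, blue]))   # (0.25, 0.0, 0.5)
print(draw([blue, red]))   # (0.5, 0.0, 0.25)
```

The two results differ, so whatever sits between the rasterizer and the framebuffer, the application still has to submit translucent geometry in the right order.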

I'm interested to hear details on the f-buffer implementation as well. ;)

Cass

DaveBaumann
03-06-2003, 04:42 PM
A few more F-Buffer details here:
http://www.beyond3d.com/forum/viewtopic.php?t=4717

(sireric - ATI)

V-man
03-06-2003, 07:38 PM
Originally posted by DaveBaumann:
FYI, for those that wanted OGL Extensions:
http://www.beyond3d.com/reviews/ati/r350/index.php?p=apisupp



Thanks, I didn't see any link to that page.

What is "Legacy Depth Bias"?

Why do they list "Projected Textures"? This is texture coordinate generation, I guess.

davepermen
03-06-2003, 11:04 PM
Originally posted by cass:
No, f-buffer renders fragments in the order you send them, so you're still responsible for your own sorting.

I'm interested to hear details on the f-buffer implementation as well. ;)

Cass

at least it kicks all problems of multipass away, making transparency a useable feature again. but depthsorting is needed (depending on the transparency..)


about the other limits (someone above noted), texcoords, texcount, etc.. dunno, you never worked on a r300 chip yet, did you? i don't have any problems with texcount.. i mean.. the full lighting equation does not require any texture at all. remember, full floats => lighting equation can be done in the (now infinitely long) shaders. if you wanna do shadowmapping, you need one shadowmap per light, yep, limiting your max amount of lights. but you can emulate for example soft shadowing by supersampling around on the shadowmap with ease, and you _could_ do it like lightmapping, fitting several shadowmaps into one (dunno, it's just an idea :D)..

all i wanna say is texture count is limited, but that definitely makes sense, and is not that much of an issue, as you can actually use it for texturing (specifying materials, that is). texcoord amount is not really important, you _can_ just pass in object space x,y,z and the u,v texcoords, interpolated, and generate all other texcoords from this per pixel (dropping the vertexshader). that is always possible, even if from time to time you'd wish to move as much as possible out of the pixelshader.


it's just the ability to not care about multiple passes that is awesome. something a 9700 and a gfFX can't provide. it's a great step..

about the ones talking about opengl2.0: do we really need a fully released opengl2.0 spec to implement what we already know about gl2? no. there are papers describing more or less what gl2 should look like, and ati is free to implement drivers that support gl2 as it is known today. no, the ati card can't do everything fully in hw. some features simply have to get identity mappings (the ddx and ddy instructions available on the gffx for example), others possibly have to drop to software (texture sampling in vertex shader), but who cares. it's great to play with already, and the hw is definitely capable of supporting/emulating a good gl2 version. just remember, no gf2 can do gl1.3 or gl1.4 features, still it claims to be such a card. it can work around most non-hw features. just don't use 3d textures on the gf2 :D

m2
03-07-2003, 03:59 AM
Originally posted by cass:
No, f-buffer renders fragments in the order you send them, so you're still responsible for your own sorting.

Read the paper by Mark and Proudfoot (you should have already, as they even cite you ;-) As I understand it, as long as you don't change fragment programs (or in general, as long as you don't change the fragment program state), you should be able to get away without sorting. It's part of the f-buffer design. In plain text that means that if you manage to define a single fragment program for an object, you can render the whole object without sorting, even if it's semi-transparent.

OTOH, if you have multiple shaders, you still have to sort. Basically the question is:


Originally posted by cass:
I'm interested to hear details on the f-buffer implementation as well. ;)

... how is this thing really implemented, and more specifically, did ATI adopt one of the strategies proposed by Mark and Proudfoot to handle buffer overflows? My educated (but still wild) guess would be that the f-buffer is flushed whenever you change the state of the fragment program, but that can still lead you to a situation where the on-card memory is not enough.

Gimme a sample board (with working drivers, if that's not much to ask for).

Tom Nuydens
03-07-2003, 04:57 AM
Originally posted by m2:
In plain text that means that if you manage to define a single fragment program for an object, you can render the whole object without sorting, even if it's semi-transparent.

No, the paper explicitly states that "partially-transparent surfaces must still be rendered in back-to-front order".

Let's say you have a translucent sphere with a two-pass shader. Even if you depth-sort the polygons, the first pass will cause all fragments to be covered twice (once for the back of the sphere, once for the front). The fragments on the front and on the back will be blended together and a single color will be written to the framebuffer.

Now you go and do pass two. You generate a new pair of "front" and "back" fragments. What you want is for these to be blended with the corresponding "front" and "back" fragments from pass one, but you don't have these anymore! All you have is the combined color that was written to the framebuffer.

Using the F-buffer you can keep both the original fragments and composite the two passes correctly, then do the blending between the final "front" and "back" fragments in the framebuffer.
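
The sphere example can be modelled for a single pixel. The sketch below is deliberately simplified: one color channel, a two-pass shader whose result is the product of the two pass values, and a FIFO standing in for the F-buffer. The numbers and function names are invented for illustration and say nothing about how the R350 actually does it.

```python
from collections import deque

ALPHA = 0.5
BACKGROUND = 0.2

def blend_over(dst, src, alpha=ALPHA):
    return src * alpha + dst * (1.0 - alpha)

# Two fragments cover the same pixel, submitted back to front.
# Each has a pass-1 and a pass-2 value; the shader's result is their product.
fragments = [
    {"name": "back",  "pass1": 0.8, "pass2": 0.5},
    {"name": "front", "pass1": 0.4, "pass2": 0.25},
]

def single_long_shader():
    """Reference: whole shader evaluated per fragment, then blended."""
    fb = BACKGROUND
    for f in fragments:
        fb = blend_over(fb, f["pass1"] * f["pass2"])
    return fb

def framebuffer_multipass():
    """Pass 1 blends into the framebuffer; pass 2 can only modulate the mix."""
    fb = BACKGROUND
    for f in fragments:
        fb = blend_over(fb, f["pass1"])
    for f in fragments:
        fb *= f["pass2"]
    return fb

def fbuffer_multipass():
    """Pass 1 writes per-fragment results to a FIFO; the last pass blends."""
    fifo = deque()
    for f in fragments:                 # pass 1: no framebuffer writes
        fifo.append(f["pass1"])
    fb = BACKGROUND
    for f in fragments:                 # pass 2: same rasterization order
        intermediate = fifo.popleft()
        fb = blend_over(fb, intermediate * f["pass2"])
    return fb

print(single_long_shader(), framebuffer_multipass(), fbuffer_multipass())
```

In this model the FIFO version reproduces the single-shader result, while the framebuffer version does not, because by pass 2 the "back" and "front" contributions (and the background) have already been mixed into one value. Note that the fragments are still submitted back to front in all three cases.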

-- Tom

m2
03-07-2003, 05:20 AM
No, the paper explicitly states that "partially-transparent surfaces must still be rendered in back-to-front order".

Dang. Wrong letter! I was thinking of the R-Buffer. :-(

V-man
03-07-2003, 07:45 AM
Sounds like it might be like a 3D texture where the depth is used for the stream of fragments.

I wonder what the API looks like.
Would be nice to have their modified MESA at hand. :)

MZ
03-07-2003, 08:27 AM
davepermen, you say that in your real work with R300 you never needed more than 8 texcoords, 16 texunits, etc. That's reasonable, but try to apply your own argument to yourself: did you ever need a shader longer than 1000 instructions?

I can't understand how "unlimited" can impress you guys so much more than "limited to an obscenely long 1024". Regarding purely FP length capability, the difference between NV30 and R350 is negligible.

Now, if the R350 F-buffer doesn't give you everything a true F-buffer could theoretically give, then the difference is even more negligible. I am leaving aside the practical usability of 9+ texcoords or 17+ texunits. But all this would simply mean that R350 will not save you from multipass one bit more than NV30, unless you run a shader of 1025+ instructions.

Another thing: from what sireric wrote, it seems F-buffers will not be transparent to the user and will have to be explicitly allocated (super-buffers were mentioned). This, I guess, will require estimating the maximum number of written fragments, which may involve estimating the depth complexity of the rendered objects.

Yet another thing: we don't know yet how many temp values can be output in a single pass. If there are not enough of them, this may force the shader compiler to produce longer code, because when you can't store a temp value, you have to recompute it from scratch in the next pass(es). On the other hand, the more temp outputs are allowed, the larger the bandwidth overhead of multipass, and the more buffers there are to allocate.
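
To put rough numbers on the allocation question, here is a back-of-the-envelope sketch. Everything in it is an assumption: the function name, the 16 bytes per stored value (one vector of four 32-bit floats) and the example figures are illustrative, not R350 specifications.

```python
def fbuffer_bytes(width, height, depth_complexity,
                  values_per_fragment, bytes_per_value=16):
    """Rough upper bound on F-buffer storage for one object.

    depth_complexity: estimated average number of fragments the object
    generates per covered pixel. bytes_per_value=16 assumes one 4 x 32-bit
    float vector per stored temporary (an assumption, not an R350 spec).
    """
    fragments = width * height * depth_complexity
    return int(fragments * values_per_fragment * bytes_per_value)

# Hypothetical case: object covering a 512x512 region, depth complexity 2,
# one intermediate value carried between passes.
size = fbuffer_bytes(512, 512, 2, values_per_fragment=1)
print(size / 2**20, "MiB")   # 8.0 MiB
```

Carrying two temporaries between passes instead of one doubles the figure, which is the storage and bandwidth trade-off described above.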

These are all speculations, but if they prove true, I'll appreciate the NV30 solution as easier to use.


Now OT:
IMO, all of yesterday's technical news is really overshadowed by the announcement of the GFFX 5200 price ($100).

Tom Nuydens
03-07-2003, 11:07 AM
I agree that today's hardware probably doesn't warrant f-buffers yet, because any shader that requires the f-buffers to kick in would probably run too slow anyway. However, this is the first card with f-buffer support, and that's still very cool from a nerd's perspective :)

From a developer's point of view, the generalized FP texture support (2D/3D/cube) is more interesting, as are the multiple render targets (but 9700 already had those, I believe). These features aren't available on NV30, AFAIK, which is a real shame.

All in all, I'm more interested in details on ATI's f-buffer implementation than I am in any extra flexibility it might give me. How much storage do they allocate for the f-buffers? What do they do in the case of overflow, and is any application intervention required when it happens?

-- Tom

davepermen
03-07-2003, 01:41 PM
Originally posted by MZ:

davepermen, you say that in your real work with R300 you never needed more than 8 texcoords, 16 texunits, etc. That's reasonable, but try to apply your own argument to yourself: did you ever need a shader longer than 1000 instructions?

yes.

i do a lot of raytracing stuff :D

harsman
03-08-2003, 07:04 AM
MZ, if the driver compiles shaders directly from an HLSL, an f-buffer should be completely transparent to the app. However, if you don't use GL2 slang but something more low-level, you'll probably have to use it explicitly. I'm guessing this is why they say they'll support it in GL2, but not in Direct3D. With the Direct3D HLSL, the driver never sees the high-level code AFAIK; MS' runtime compiles it to pixel shader asm and sends that to the driver. And of course, the interesting difference to NV30 is performance: which card performs better on long shaders? Since the Quadro supports 2048 instructions, I think Nvidia could support any number of instructions if they wanted to.

dorbie
03-09-2003, 05:54 PM
I have a few questions about the implementation. The limitations are made clear from the paper but a major feature like this is enormously interesting and details are critical.

Can we get more information on how the API is exposed? Is it just additional registers? Do you require exact fragment replication between passes? How many fragments * stores can be held in the f-buffer before it overflows? What happens when the f-buffer overflows? Will you rely on the HLSL compiler technology to solve any of these, and if so, will this ever be exposed at a lower level or is it too much of a support nightmare?

dorbie
03-09-2003, 06:17 PM
Tom, there may be multipass shaders now that would benefit from the f-buffer. No need to mangle your passes into a combination that works with limited framebuffer 'registers', AND if you have fine enough grained multipass, a lot of the time it's going to be fetching the previous pass's result from the on-chip f-buffer instead of the destination framebuffer in VRAM, so the f-buffer could end up accelerating some of TODAY'S multipass, but ONLY IF it's tuned to take advantage of it. So rather than the f-buffer kicking in and you going to another lower level of performance, it would hopefully kick in currently, after you restructure your application a bit, and you'd see a performance win (this is of course wildly optimistic :-).

We still don't know some critical stuff, most importantly how big is the buffer, and is overflow a disaster. It'd also be interesting to know if this is entirely new memory or does it eat into other on chip cache.

The API is key as well; no one's spoken to this at all. If it's low level with just additional registers then it may be tricky to exploit in conventional apps without some middleware or compilation magic. Then let's say you DO decide to exploit the f-buffer: you may be trading state thrashing across passes and cache flushes on stuff like tiled texture for f-buffer fetches. For some stuff this is still a win, for other stuff it's not. It seemed great to think about the f-buffer in isolation and how it saves a lot of persistent framebuffer memory, but it ain't as simple as that :-(.

For implementing longer shaders with no changes in other state it's clearly a win IF you can switch the shader itself quickly enough, because you don't really care about the drawing order of primitives (you do about meshing though!). But again, what happens with large on-screen primitives that hit a lot of fragments: do you have to subdivide, or just take it in the shorts with an overflow? It almost demands some kind of on-chip subdivision of rasterization and automated sequential application of multiple shaders before f-buffer overflow bites you.
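
One way to picture that last idea is to process the fragment stream in chunks no larger than the buffer, running every pass on a chunk before moving on to the next. The sketch below is purely illustrative (invented names, plain numbers standing in for fragments, lambdas standing in for shader passes) and does not reflect how ATI's hardware or driver actually behaves.

```python
def run_passes(fragments, passes):
    """Apply each pass in turn to every fragment (unbounded F-buffer)."""
    values = list(fragments)
    for shader_pass in passes:
        values = [shader_pass(v) for v in values]
    return values

def run_passes_chunked(fragments, passes, capacity):
    """Same result, but never holding more than `capacity` fragments."""
    out = []
    for start in range(0, len(fragments), capacity):
        chunk = fragments[start:start + capacity]   # fits in the buffer
        out.extend(run_passes(chunk, passes))       # all passes, then flush
    return out

passes = [lambda v: v * 2, lambda v: v + 1]   # stand-ins for shader passes
stream = list(range(10))                      # ten fragments, in order

print(run_passes_chunked(stream, passes, capacity=4) == run_passes(stream, passes))
```

The output order is preserved, but the number of shader switches grows with the number of chunks (passes times chunks instead of just passes), which is why the speed of switching the shader matters so much in this scenario.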

We need more info from ATI, it may be that even they don't know how to best exploit this strange beast yet.

[This message has been edited by dorbie (edited 03-09-2003).]