GF3 Z-occlusion performance? (Sorry, not OpenGL related)

Sorry, this is not OpenGL related … but I think this is the best forum to get good technical answers to this question.
http://www.aceshardware.com/Spades/read.php?article_id=25000228

According to this review, it seems that the GF3’s Z-occlusion isn’t as efficient as ATi’s HyperZ. I thought they were pretty much the same, but in VillageMark the Radeon outperforms the GF3, which I find a little surprising. Does anyone have a good explanation for this?
Matt? Cass?

VillageMark was created and is distributed by Imagination Technologies (PowerVR), and it underlines the strong points of their own product, the Kyro II. So you could say this benchmark is biased.

Says it all really, considering the GeForce 3 comes out on top in all the other comparisons in that review.

Nutty

Sure, but why does the Radeon beat the GF3? I still find that confusing. Besides, the benchmark is written to take pretty good advantage of the hardware; it even uses T&L, even though the Kyro cards don’t support it.

Maybe it simply shows that even the GF3 is not perfect …
Reading various comparisons, I found that the GF3 is not that much faster than the current high-end GF2 cards.
But then the GF3 is the only one to offer so many extensions and DX8 features (once they finally manage to make it stable, lol).


I was rather unimpressed by that article; I saw a lot of obvious errors in it.

I don’t think it’s even worth looking at that particular benchmark.

  • Matt

Originally posted by mcraighead:
[b]I was rather unimpressed by that article; I saw a lot of obvious errors in it.

I don’t think it’s even worth looking at that particular benchmark.

  • Matt[/b]

An example to back up that statement?

I can add that, according to Dave over at beyond3d.com, nVidia culls at the per-pixel level while ATi culls at the per-block level. That would explain why the Radeon is so much more efficient in this benchmark. In fact, it should be faster in the majority of the apps out there, unless of course the polygons get so small on screen that they usually don’t cover a whole block.
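If that’s right, the difference in granularity would be something like this. This is just my own sketch of the idea, not either vendor’s actual logic; the 8x8 block size is what’s usually quoted for HyperZ:

[code]
/* Sketch of the rejection granularity only, assuming a standard "less"
 * depth test; not either vendor's real implementation.                */

/* Block level: one conservative compare can throw away up to 8x8 = 64
 * pixels. polyMinZ is the nearest depth of the incoming poly inside
 * the block, blockMaxZ the farthest depth already stored there.       */
int reject_block(float polyMinZ, float blockMaxZ)
{
    return polyMinZ >= blockMaxZ;
}

/* Pixel level: every covered pixel still costs a depth read + compare,
 * even though the texturing/shading work is saved when it fails.      */
int reject_pixel(float fragZ, float storedZ)
{
    return fragZ >= storedZ;
}
[/code]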

Originally posted by Humus:
An example to back up that statement?

All right, I can’t talk about many of the errors in the article, but here’s one that stuck out as blatantly obvious.

The methodology used for the claimed “boost from using optimized code paths” in Serious Sam is completely broken. The percentage gain numbers given mean absolutely nothing. In fact, it’s downright misleading the way they describe the first benchmark as “optimized”.

And it sounds like Dave is confused…

  • Matt

Sure, I can agree that those numbers don’t mean much, but it’s still interesting to see what effect it has when a game isn’t optimized for a card, or is optimized only for a certain card or range of cards.
(I also understand that you didn’t like that test, given that it has comments like “I was wondering, what would have happened if Croteam (the designers of Serious Sam) would have resorted to non-optimized settings (read: NVIDIA is the lowest common denominator).” )

Anyway, so Dave is wrong/less correct?
So what is the fundamental difference between the ATi way and the nVidia way?
There’s got to be a reason why a GF3 with more than twice the fillrate gets beaten by the almost one-year-old Radeon when there’s heavy overdraw.

I’m pretty sure the GeForce 3 and the Radeon use the same kind of z-buffer compression, which can theoretically reach 4:1 savings.

The GF3 and Radeon also both have similar ‘fast z-clear’ functions that zap all of the data in the z-buffer. This supposedly speeds up performance as well.

Nvidia has actually stated that these features do not provide any major performance gains on the GF3. Note that this is NOT because the functions were poorly implemented, but rather because of the increased memory bandwidth of the GF3: they are simply not needed as badly. In contrast, the Radeon can benefit more from these features because of its comparatively limited memory bandwidth.

I’ve always found great reviews on Ace’s Hardware. Dissing them because the test they used is wack isn’t too cool. Of course, we all know Nvidia’s feelings on anything dealing with the Kyro II.

If you are worried about biased reviews, check out http://www.anandtech.com. It’s the best one out there in my opinion. Anand rules. There’s a bunch of info on his site about the GF3 and Radeon.
http://www.anandtech.com/showdoc.html?i=1426

Funk.


Well, I’ve found that the vast majority of video card web sites are very poor. Ace’s is good for CPUs, but iffy for 3D. B3D has a good reputation, but I’m unimpressed.

The Ace’s Hardware article shows a pretty severe misunderstanding of what the various Serious Sam settings mean. As I already said, the word “optimized” is extremely misleading! “Customized” would be much more accurate, and even then, some of the customizations they use for us are actually bad customizations.

The percentage numbers are absolutely meaningless. They neglect such obvious issues as CPU limitations! (This is a common problem with “FSAA performance hit” comparisons. Instead of benchmarking, say, 12x9 vs. 6x4x4, they benchmark 6x4 against 6x4x4. As a result it is virtually impossible to interpret the results in a meaningful way.)
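To illustrate with completely made-up numbers (these are hypothetical, not measurements from any card):

[code]
/* Purely hypothetical numbers to illustrate the CPU-limit problem;
 * nothing here is measured from a real card.                        */
#include <stdio.h>

int main(void)
{
    double fps_6x4_measured = 100.0; /* 640x480: capped by the CPU       */
    double fps_6x4_gpu_only = 200.0; /* what the GPU alone could sustain */
    double fps_6x4x4        =  60.0; /* 640x480 + 4x FSAA: fill-limited  */

    /* the "FSAA hit" the benchmark reports vs. the real fill-rate cost */
    printf("apparent hit: %.0f%%\n",
           100.0 * (1.0 - fps_6x4x4 / fps_6x4_measured));   /* 40% */
    printf("actual cost:  %.0f%%\n",
           100.0 * (1.0 - fps_6x4x4 / fps_6x4_gpu_only));   /* 70% */

    /* 1280x960 pushes the same number of pixels as 640x480 with 4x FSAA,
     * so 12x9 vs. 6x4x4 would at least compare two fill-limited runs.   */
    return 0;
}
[/code]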

The fact is that there is a huge knowledge gap between the people who run these web sites and those of us who work on these products for our jobs. It took me only a few months inside NVIDIA to realize how frequently these web sites say things that are just outright wrong. It’s extremely bad journalism.

  • Matt

Yes, I understand and agree fully with that.
But back on topic, I’m very interested in knowing how the Hierarchical Z / Z-occlusion stuff differs technically.
My understanding is that the ATi approach divides the screen into many small 8x8 tiles, and for each tile a min & max depth is stored. FastZClear only needs to clear those min/max values. When rendering, it renders to a small on-chip buffer, but first it calculates the min/max depth of the incoming poly in that tile. If the new poly is entirely in front of what’s stored, it renders with the depth test disabled. If it’s entirely behind what’s stored, it goes on to the next tile without needing to render anything. Otherwise, it has to render normally. Then add to this the Z compression, which I guess kicks in when an on-chip tile is finished and about to be written out to memory.
I can’t back this up 100%, but that’s how I think it works. (Maybe someone from ATi can correct me if I’m wrong.)
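Expressed as code, my guess would look roughly like this. The names and the 8x8 tile size come from my description above, not from anything ATi has published, so treat it as a sketch only:

[code]
/* Sketch of a HyperZ-style tile test as I understand it; the names,
 * the 8x8 tile size and the structure are my own guesses, not ATi's. */
typedef struct {
    float minZ;   /* nearest depth stored anywhere in this 8x8 tile  */
    float maxZ;   /* farthest depth stored anywhere in this 8x8 tile */
} Tile;

typedef enum {
    TILE_REJECT,            /* skip the tile completely                */
    TILE_ACCEPT_NO_ZTEST,   /* render with the per-pixel test disabled */
    TILE_NORMAL_ZTEST       /* overlap: fall back to a normal Z test   */
} TileResult;

/* polyMinZ/polyMaxZ: depth range of the incoming poly clipped to the
 * tile, assuming a standard "less" depth test.                       */
TileResult classify(const Tile *t, float polyMinZ, float polyMaxZ)
{
    if (polyMinZ >= t->maxZ)        /* entirely behind everything stored */
        return TILE_REJECT;
    if (polyMaxZ <  t->minZ)        /* entirely in front of everything   */
        return TILE_ACCEPT_NO_ZTEST;
    return TILE_NORMAL_ZTEST;
}

/* "Fast Z clear" then only has to reset the per-tile min/max values
 * instead of touching every Z value in memory.                       */
void fast_z_clear(Tile *tiles, int numTiles, float farZ)
{
    for (int i = 0; i < numTiles; i++) {
        tiles[i].minZ = farZ;
        tiles[i].maxZ = farZ;
    }
}
[/code]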

Now how does the nVidia way differ?
I initially had the impression that it was essentially the same technique as ATi’s, but since it doesn’t give the same performance boost as ATi’s, there has to be something that differs. Dave’s explanation made some sense to me, but if you’re saying that he’s wrong, then how does it work?

For reasons that should be obvious, I am not about to tell everyone what we do.

However, if you think that all of our fast Z work is for nothing, that’s not true. It provides significant real speedups.

  • Matt

<sarcasm>
Hey, you can tell me. It’s not like ATi is going to copy your way of doing it when their own is twice as fast.
</sarcasm>

Yes, I understand that it’s not for nothing. The GF3 is 76% faster than the GF2 in this benchmark with a similar fillrate, so it can’t be useless.
But can you at least tell me whether it’s true that it culls at the pixel level, or whether it’s done at the block level?

I’m pretty sure it’s done on the pixel level.

Nutty

Traditional depth testing compares a pixel against the z-buffer after it has already been shaded and textured. The GF3 and Radeon have features that compare pixels to the z-buffer before the rest of the rendering pipeline. That’s where the savings come from: all of the fragment operations (texturing, fog, stencil, etc.) are skipped for a discarded pixel.
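Just to show where the test moves (the function names are made up for this post, and neither chip actually works like plain C, obviously):

[code]
/* Illustration of late vs. early depth testing only; the names are
 * made up for this post and none of this is vendor-specific.        */
typedef struct { int x, y; float z; } Fragment;

extern float depthBuf[480][640];      /* assumed 640x480 Z-buffer          */
void shade_fragment(Fragment *f);     /* texturing, fog, ... (expensive)   */
void write_color(const Fragment *f);

/* traditional: shade first, depth test at the very end */
void late_z(Fragment *f)
{
    shade_fragment(f);                     /* work is done ...             */
    if (f->z < depthBuf[f->y][f->x]) {     /* ... and possibly thrown away */
        depthBuf[f->y][f->x] = f->z;
        write_color(f);
    }
}

/* early Z: test first, only shade the survivors */
void early_z(Fragment *f)
{
    if (f->z >= depthBuf[f->y][f->x])
        return;                            /* texturing/fog/etc. skipped   */
    shade_fragment(f);
    depthBuf[f->y][f->x] = f->z;
    write_color(f);
}
[/code]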

I don’t know the technique Nvidia uses to feed fragments to the new z-hardware, and since Matt won’t talk about it, I assume it’s proprietary. Whether it’s per pixel or per ‘tile’, I’m sure it’s optimized quite nicely for the hardware.

The GF3 and Radeon also have z-buffer compression and fast z-clear, which add speed as well and are fairly self-explanatory.

One last thing: the Radeon will benefit more from these features because of its limited memory bandwidth. That doesn’t mean it’ll perform better than the GF3 in the end, just that it’ll get a bigger boost in reasonable applications.

Hope I cleared some stuff up for you.

Funk.

also…

I didn’t mean to diss Nvidia hardware or anything when I said that the z-buffer stuff wouldn’t give any major performance gains. I’m sure the statement I read was in regard to reasonable applications and systems, where limited CPUs and game-style geometry are often involved.

The more bandwidth the better!

Funk.

Originally posted by Funk_dat:
[b]Traditional depth testing compares a pixel against the z-buffer after it has already been shaded and textured. The GF3 and Radeon have features that compare pixels to the z-buffer before the rest of the rendering pipeline. That’s where the savings come from: all of the fragment operations (texturing, fog, stencil, etc.) are skipped for a discarded pixel.

I don’t know the technique Nvidia uses to feed fragments to the new z-hardware, and since Matt won’t talk about it, I assume it’s proprietary. Whether it’s per pixel or per ‘tile’, I’m sure it’s optimized quite nicely for the hardware.

The GF3 and Radeon also have z-buffer compression and fast z-clear, which add speed as well and are fairly self-explanatory.

One last thing: the Radeon will benefit more from these features because of its limited memory bandwidth. That doesn’t mean it’ll perform better than the GF3 in the end, just that it’ll get a bigger boost in reasonable applications.

Hope I cleared some stuff up for you.

Funk.[/b]

Thanks, but you sort of summed up what I already knew. What I’m interested in is the difference between the two implementations.
Btw, I don’t know if the reasoning that the Radeon should benefit more because it has less bandwidth actually holds true. You must remember that the Radeon only has two pipelines to feed while the GF3 has four, so bandwidth per rendered pixel is actually higher on the Radeon than on the GF3.
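A quick back-of-the-envelope with the commonly quoted clocks (take the exact figures with a grain of salt, they vary a bit between boards):

[code]
/* Back-of-the-envelope only; clocks and bus widths are the commonly
 * quoted launch figures and may be slightly off for a given board.   */
#include <stdio.h>

int main(void)
{
    /* Radeon DDR: 2 pixel pipes @ ~183 MHz, 128-bit DDR memory @ ~183 MHz */
    double radeonFill = 2 * 183e6;                  /* pixels/s            */
    double radeonBW   = 183e6 * 2 * (128 / 8.0);    /* ~5.9 GB/s           */

    /* GeForce 3: 4 pixel pipes @ ~200 MHz, 128-bit DDR memory @ ~230 MHz  */
    double gf3Fill = 4 * 200e6;                     /* pixels/s            */
    double gf3BW   = 230e6 * 2 * (128 / 8.0);       /* ~7.4 GB/s           */

    printf("Radeon: ~%.0f bytes of bandwidth per drawn pixel\n",
           radeonBW / radeonFill);                  /* ~16 bytes/pixel     */
    printf("GF3:    ~%.0f bytes of bandwidth per drawn pixel\n",
           gf3BW / gf3Fill);                        /* ~9 bytes/pixel      */
    return 0;
}
[/code]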

What I’m interested in is the difference between the two implementations.

Humus, you ain’t gonna get that unless someone breaks their NDA with nVidia. Divulging the inner workings of their implementation is a guaranteed way of getting a slap!

Nutty

Mhhh…
ATI did not give the exact specifications (of course), but at least we have a guess at how it works.
I don’t know how the technology differs, but since the GF3 has much higher memory bandwidth, its low scores here suggest the nVidia system is not as good as the ATI one.