
View Full Version : NV40



Won
04-14-2004, 07:31 AM
Seems really cool. Now that the NDA is lifted, does anyone who might have one of these things already know whether the glReadPixels performance is improved?

-Won

davepermen
04-14-2004, 07:48 AM
looks like great work by nvidia this time, congrats in advance. now their web department is to blame: i can't download anything interesting from nzone at all.. :(

Mazy
04-14-2004, 08:55 AM
Won, why is that so important? They have pixel buffers for async readback, but more importantly, you should let the data stay on the card if you're aiming for speed..
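
Roughly, the async path through a pixel buffer object looks like this. Just a sketch based on the EXT_pixel_buffer_object spec; width and height are placeholders, and an extension loader is assumed for the ARB entry points:

// Sketch: asynchronous readback through a pixel buffer object.
GLuint pbo;
glGenBuffersARB(1, &pbo);
glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, pbo);
glBufferDataARB(GL_PIXEL_PACK_BUFFER_EXT, width * height * 4, NULL, GL_STREAM_READ_ARB);

// With a pack buffer bound, glReadPixels takes a buffer offset and can return
// before the transfer has actually finished.
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

// ... do other work while the DMA completes ...

// Mapping blocks only if the transfer is still in flight.
void* pixels = glMapBufferARB(GL_PIXEL_PACK_BUFFER_EXT, GL_READ_ONLY_ARB);
// ... use pixels ...
glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_EXT);
glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, 0);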

Won
04-14-2004, 09:06 AM
Mazy --

What you say is true and obvious, and in a perfect world I would do just that. However, sometimes you need to get data off the video card, and for my particular application I need to do it quickly. Suffice it to say, I don't intend to use GPUs in only traditional ways. Anyway, this feature is important to many other people besides me.

3Dlabs cards, while they don't support the async stuff (yet), do support fast AGP transfers in both directions. On a Wildcat VP, a glReadPixels call can get 700MB/sec over AGP 4x, which is reasonably close to the AGP 4x limit of roughly 1GB/sec. On a Geforce, a glReadPixels call can get 240MB/sec, which is reasonably close to the 64-bit PCI limit of 266MB/sec. There's an order of magnitude (or more) that I'd like to get back.
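
(A rough way to measure this yourself is sketched below; it assumes a current GL context and a caller-supplied width, height, iteration count and scratch buffer, and it only needs to be accurate enough to tell 240MB/sec from 700MB/sec.)

#include <GL/gl.h>
#include <time.h>

// Crude readback throughput test: bytes moved divided by wall-clock time.
double read_bandwidth_mb_per_sec(int width, int height, int iterations, void* scratch)
{
    clock_t start = clock();
    for (int i = 0; i < iterations; ++i)
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, scratch);
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    double megabytes = (double)iterations * width * height * 4.0 / (1024.0 * 1024.0);
    return megabytes / seconds;
}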

-Won

Corrail
04-14-2004, 09:13 AM
Yep, this chip is really cool! It has some very nice features. I hope the chips with 8 or 12 pipes will be available soon too!

jwatte
04-14-2004, 09:15 AM
The problem is that, for really valid numbers on readback improvements, Intel needs to lift their NDA as well (on PCI Express chip sets).

dorbie
04-14-2004, 10:11 AM
Even if you get the PCI Express version of this card, initially NVIDIA will have an AGP->PCI Express bus bridge on this thing, so it's not going to be native PCI Express out of the gate, although TH says it is "AGP 16x" on the card(?). Either way, for readback it'll only come into its own with PCI Express, although it's anyone's guess what the readback implementation will perform like right now over the bridge. Reads vs. writes are still going to be hopelessly asymmetric on bandwidth until you eliminate the AGP bus.

Having said that, the performance is stunning. They've aimed very high with this thing. It's big and power hungry but that's intentional, they need it to win. Once again I'm amazed by what I can buy for $500 at Besy Buy.

Won
04-14-2004, 11:11 AM
This doesn't mean that their AGP implementation can't have fast AGP transfers in both directions; it only means they can't occur at the same exact time.

I'm guessing the PCX bridge is going to have a fairly minimal performance impact. There might be issues with cost/heat/reliability, but unless you're doing small, frequent bidirectional transfers, the internal AGP "16x" can probably handle the bandwidth. Since the PCX bridge is essentially soldered directly to the GPU, there's very little line capacitance or whatever electrical limitation is typical of, among other things, going across an edge connector.

-Won

Ostsol
04-14-2004, 11:16 AM
One of the guys down at Beyond3d.com calculated that clock for clock, pipe for pipe the NV40's pixel shading power is 1.4x that of the R300. I'm not entirely sure how accurate that is, but it's certain that NVidia has come out with a chip that is fundamentally more powerful and not just bigger.

Won
04-14-2004, 11:27 AM
Whoops. I knew I should've been more specific with the thread title. I was specifically asking about glReadPixels performance.

Having said that:

Clock for clock comparisons are basically meaningless. I remember Anandtech did a clk/clk comparison of various CPUs like the Pentium, Pentium MMX, Pentium Pro, Cyrix 686, AMD K5/6, or whatever the contemporaries were. It found that the Cyrix chip was fastest clock for clock. Who cares? It could only go a fraction of the clock speed of the other microarchitectures, and clock speed scalability matters too. It isn't an independent concern.

-Won

dorbie
04-14-2004, 12:04 PM
Won, PCIX isn't PCI Express; PCI-X is a different standard. The bus bridge is a chip. The point is that there's a protocol translation going on as well as GART aperture remapping etc.; at the bare minimum, the limitations of the original bus apply.

Ostsol, I know it's not just bigger; my point was more about their commitment to winning. They've gone about as big and hot as they could bear to without getting crazy (or maybe it's borderline crazy). It's a monster, but that's a good thing IMHO.

yooyo
04-14-2004, 12:39 PM
Originally posted by Won:
Seems really cool. Now that the NDA is lifted, does anyone who might have one of these things already know whether the glReadPixels performance is improved?
Try to use PDR or PBO. It should be fast enough...
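
From memory, the PDR (NV_pixel_data_range) read path looks roughly like the sketch below; the wglAllocateMemoryNV frequency/priority hints are guesses, so check the extension spec before copying anything:

// Sketch of NV_pixel_data_range readback; pdrMem must come from the driver.
GLsizei bytes = width * height * 4;
void* pdrMem = wglAllocateMemoryNV(bytes, 1.0f, 0.0f, 1.0f);

glEnableClientState(GL_READ_PIXEL_DATA_RANGE_NV);
glPixelDataRangeNV(GL_READ_PIXEL_DATA_RANGE_NV, bytes, pdrMem);

// Readback into the range can now go through the fast DMA path.
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pdrMem);
glFinish();   // or use an NV_fence to overlap CPU work with the transfer

// ... use pdrMem ...

glDisableClientState(GL_READ_PIXEL_DATA_RANGE_NV);
wglFreeMemoryNV(pdrMem);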

yooyo

Elixer
04-14-2004, 01:02 PM
Some of the new features of the card look REALLY cool, like the Vertex Frequency Stream Divider. Wonder if it will be exposed in openGL first, or DX?

Pretty soon, you will need a separate PSU to drive these things. (Not that this is a bad thing, mind you.)

Adrian
04-14-2004, 02:22 PM
Originally posted by yooyo:

Try to use PDR or PBO. It should be fast enough...
PDR does little to improve raw performance. We need an order of magnitude improvement to start doing some really interesting stuff with readpixels.

Won
04-14-2004, 02:28 PM
Dorbie --

You misunderstand.

PCX is the name of the bridge (should've mentioned that). AGP has no INHERENT limitation on read bandwidth; it is simply not implemented on NVIDIA GPUs that I'm familiar with. 3Dlabs, for example, currently has fast AGP in both directions. Details in a previous post.

-Won

zeckensack
04-14-2004, 02:28 PM
I can only get 60 MB/s on my Radeon 9200, so consider yourself lucky :)

Re the NV40, I'm amazed. Looks like a great engineering achievement, flawless, save for the R300ishness in the anisotropic texture filter department. And whether or not that's a flaw is highly debatable, I suppose.

Won
04-14-2004, 02:32 PM
To spell it out further:

The PCX bridge is probably going to be very efficient at translating AGP commands to and from PCI Express commands, and it will do so in both directions. The reason I believe this is that NVIDIA also plans on using PCX to bridge their PCI Express-native chips to AGP. This means that AGP transfers must go fast in both directions; otherwise, when they flip it, you're going to have slow texture uploads. Same with the PCI Express side.

If this is true, that means fast ReadPixels depends only on the GPU/driver, not the bridge.

-Won

dorbie
04-14-2004, 03:22 PM
Won, thanks for the explanation; I misread your PCX statement, sorry. I'm reluctant to infer too much about its performance in this context from its intended use in another, but you make good points and I've learned something new.

jwatte
04-14-2004, 05:58 PM
dorbie: "Once again I'm amazed by what I can buy for $500 at Besy Buy."

Wow. Your Besy Buy must be much better than the Best Buy I go to in San Carlos. Here they don't have them yet, and won't for at least one, if not two more months. Where's yours? :-)

Korval
04-14-2004, 09:24 PM
I can't believe that, with all the power and functionality that NV40 promises, the best thing you can think of to discuss is glReadPixels performance.

It's a graphics chip. Draw some pictures with it.


Re the NV40, I'm amazed. Looks like a great engineering achievement, flawless
Flawless? We don't know that yet. I still want to know the actual performance characteristics of the fragment program with both looping and 32-bit floats, as well as just how fast/slow "texture" accesses in the vertex shader are.

davepermen
04-15-2004, 12:32 AM
Originally posted by Korval:
It's a graphics chip. Draw some pictures with it.
you know what? Adrian does some of the best and most forward-looking graphics work you can imagine. he simply doesn't want to use hw the way it's used now, but tries to find new ways to use the gpu to draw what we all want: beautiful graphics.

just because you don't need it doesn't mean it's useless.

and it's ANNOYING to see those crippled, half-finished agp implementations all the time. agp has great speed, in both directions; they just always "forget" it. disappointing.

but anyway, let's see, Adrian, if we can map your algos to the nv40 completely :D

Adrian
04-15-2004, 12:42 AM
Originally posted by Korval:
I can't believe that, with all the power and functionality that NV40 promises, the best thing you can think of to discuss is glReadPixels performance.

It's a graphics chip. Draw some pictures with it.
Fast readback will allow good quality interactive Global Illumination. There are probably many other uses.

nutball
04-15-2004, 01:55 AM
Blending in floating-point render targets too, don't forget that! Very nice to have! :)

Ysaneya
04-15-2004, 02:21 AM
I'm also waiting for good readback performance. I have a procedural terrain algorithm, currently running on the CPU, that I'd love to shift onto the GPU. Unfortunately, the time to upload the data to the GPU and retrieve the result back (even if the pixel shader running on the GPU is 100x faster than the CPU) is higher than the time it takes to do everything on the CPU.
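
(In outline, the round trip looks something like the sketch below; drawFullscreenQuadWithShader is a placeholder for whatever fragment program does the terrain math, and inputData/outputData are caller-owned buffers. The two transfers at either end are what dominate.)

// One GPU pass of the generator as a round trip; the transfers dominate,
// the draw itself is nearly free.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_FLOAT, inputData);           // upload inputs
drawFullscreenQuadWithShader(width, height);             // placeholder: runs the fragment program per texel
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT,
             outputData);                                 // read results back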

Y.

Won
04-15-2004, 04:25 AM
Korval --

Of course I plan on doing graphics with the bloody thing; I also happen to need glReadPixels (in fact, to do graphics). And who says that GPUs are only good for just graphics? All in all, your tacit value judgement is pretty flip, and your narrow thinking is pretty uncharacteristic of your usually thoughtful comments.

I can get the rest of the relevant information on any of several web sites (I'm sure you have as well). I got most of it weeks ago on the less-reputable rumor sites, and most of it was correct. I know enough to have a good idea what dynamic branching, multiple render targets, floating point orthogonality, etc. etc. buys me because it's been discussed here and elsewhere already; we're all primed to hear what we already know.

However, no hardware review site publishes a glReadPixels benchmark. NVIDIA's marketing doesn't promote it. Since I haven't been sitting on one for the past few weeks, how else can I find out but to ask?

Ok, maybe this question is more to your liking:

Does anyone know anything about the programmable video processor? Is it going to be accessible to OpenGL? Will I be able to use textures/AGP memory as inputs/outputs? Will it accelerate the imaging pipeline?

-Won

romanoGL
04-15-2004, 11:00 AM
Elixer,

do you know what this Vertex Frequency Stream Divider is? Can I find something about it on the Web?

Thanks,
Romano

MZ
04-15-2004, 01:15 PM
Originally posted by Elixer:
Some of the new features of the card look REALLY cool, like the Vertex Frequency Stream Divider. Wonder if it will be exposed in openGL first, or DX?
Too late :cool:

HRESULT IDirect3DDevice9::SetStreamSourceFreq(UINT StreamNumber, UINT Divider);

@Romano: msdn/dx9 (http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/directx9_c/directx/graphics/reference/Shaders/VertexShader3_0/VertexStreamFrequency.asp)
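
For illustration, a minimal instancing-style usage sketch: the buffers, vertex structs and counts are hypothetical, and the D3DSTREAMSOURCE_* flags are the ones the DX9 SDK documents for this call.

// Stream 0: per-vertex mesh data, repeated for each instance.
device->SetStreamSource(0, meshVB, 0, sizeof(MeshVertex));
device->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA | numInstances);

// Stream 1: one element of per-instance data, advanced once per instance.
device->SetStreamSource(1, instanceVB, 0, sizeof(InstanceData));
device->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1u);

device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, numVertices, 0, numTriangles);

// Restore the default divider so later draws behave normally.
device->SetStreamSourceFreq(0, 1);
device->SetStreamSourceFreq(1, 1);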