View Full Version : Carmack's .plan



Zeno
02-11-2002, 02:52 PM
JC has an updated .plan where he discusses feature and performance differences between nVIDIA and ATI, as well as the new naming conventions.
http://www.shacknews.com/finger/?fid=johnc@idsoftware.com

Just thought some of you might be interested http://www.opengl.org/discussion_boards/ubb/smile.gif

-- Zeno

Elixer
02-11-2002, 03:01 PM
Oh sure, (almost)everyone in this forum already said that nvidia should have named the GF4 MX something else, and look who gets on the bandwagon! http://www.opengl.org/discussion_boards/ubb/wink.gif

Humus
02-11-2002, 03:07 PM
Pretty interesting indeed!

Lars
02-11-2002, 03:15 PM
What I find most interesting is that the ATI drivers initially had so many problems. I mean, how can it be that they have so much trouble delivering bug-free drivers?
JC didn't mention the number of bugs he found or where they were in the drivers, but it seemed there weren't any fewer.
What about you guys here with Radeon cards, have you experienced any problems using the ATI extensions?
Or is JC using features nobody else among us has even thought about? I haven't read any thread around here describing serious problems with the Radeon.
I can't test this myself, since I only own GeForce cards.

Lars

dorbie
02-11-2002, 04:13 PM
It's interesting that John Carmack is struggling with Z-buffer invariance between passes when he switches between normal rendering and vertex programs on NVIDIA. I wonder if this is beyond the scope of glPolygonOffset to fix, or whether a reliably working glPolygonOffset implementation would solve the problem.
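
(For reference, the usual way to bias one pass against another is polygon offset; whether it's robust enough for this case is exactly the open question. A minimal sketch, assuming the second pass is the one being pulled forward; drawSecondPass() is a made-up application function:)

    /* Pull the second pass slightly toward the viewer so it doesn't Z-fight
       with the first pass. */
    glEnable(GL_POLYGON_OFFSET_FILL);
    glPolygonOffset(-1.0f, -1.0f);   /* factor, units; negative values bring fragments nearer */
    drawSecondPass();                /* hypothetical: issue the second-pass geometry here */
    glDisable(GL_POLYGON_OFFSET_FILL);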

SirKnight
02-11-2002, 04:41 PM
Well, Carmack did say that NVIDIA is supposed to make a driver fix or something that will let us use vertex programs and fixed function in the same pass. I sure hope so, that problem caused me a lot of grief not too long ago. Also, from what JC was saying about the ATI pixel shader stuff, it actually made me wish I had a Radeon. http://www.opengl.org/discussion_boards/ubb/wink.gif The ability to read from a texture twice sounds darn cool. Would it be possible to add this kind of support for the GeForce 3 and 4 Ti cards in the drivers? Or would that have to be a hardware change? Unfortunately I know the ability to have 6 texture units would have to be a hardware change, too bad.

Also, while we are on the topic of Carmack discussing the two cards (ATI and NVIDIA), I found it kind of weird that the ATI cards support 2 NVIDIA extensions yet the NVIDIA cards do not support any ATI extensions. What is up with that? http://www.opengl.org/discussion_boards/ubb/smile.gif I think it would be cool for NVIDIA to support all ATI extensions and ATI to support all NVIDIA extensions. But I don't think that would be possible since both chipsets are very different. That would prolly make a larger die and make the chips cost a hell of a lot more. Oh well, we can still dream, can't we? http://www.opengl.org/discussion_boards/ubb/wink.gif

-SirKnight

dorbie
02-11-2002, 05:43 PM
This is a complex issue. A reasonable rule of thumb is that both NVIDIA and ATI will do what's in their own interests. ATI & NVIDIA are not in the same position so their actions are different, but I wouldn't put it down to altruism. There may also be other issues we never see like I.P. ownership. Personally I think it's in both their interests to work on common API extensions because it will drive high end sales. The biggest obstacle to high end penetration is advanced feature support in games and the biggest obstacle to advanced feature support is a lack of easy to use, shared vendor API extensions for those features.


[This message has been edited by dorbie (edited 02-11-2002).]

Ysaneya
02-11-2002, 11:46 PM
> What about you guys here with radeon cards, have you experienced any problems using the ati-extensions ?

Yeah, tons of problems. At one point I was thinking of simply switching back to my GeForce 2, but after reinstalling the drivers four times and reinstalling Win2k, I finally got rid of the (main) bugs. To give you an example, wireframe mode with texture mapping was randomly crashing my application.

Y.

Lighthouse
02-12-2002, 01:13 AM
JC has posted the following update to his plan.

"8:50 pm addendum: Mark Kilgard at Nvidia said that the current drivers already
support the vertex program option to be invariant with the fixed function path,
and that it turned out to be one instruction FASTER, not slower."

Thought this may be of interest.

Humus
02-12-2002, 01:42 AM
Originally posted by Lars:
What I find most interesting is that the ATI drivers initially had so many problems. I mean, how can it be that they have so much trouble delivering bug-free drivers?
JC didn't mention the number of bugs he found or where they were in the drivers, but it seemed there weren't any fewer.
What about you guys here with Radeon cards, have you experienced any problems using the ATI extensions?
Or is JC using features nobody else among us has even thought about? I haven't read any thread around here describing serious problems with the Radeon.
I can't test this myself, since I only own GeForce cards.

Lars

I had some problems with my computer freezing with certain vertex shaders on some drivers, but I did away with vertex shaders since they weren't much use in my project anyway, so I'm not sure if that's been fixed. There was also a problem with glSetFragmentShaderConstantATI() which I was going to report, only to find it had been solved in the latest drivers before I got around to it. There was a problem with going in and out of fullscreen mode, not a biggie, and it has been solved. There was a problem with mipmapped cubemaps, which has supposedly been solved, but no driver with the fix is available yet; it should be within days.
With the latest driver the only problem that remains for me is the cubemap bug, but it can be worked around temporarily by turning mipmapping off.
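
(For anyone hitting the same bug, the workaround amounts to forcing a non-mipmapped minification filter on the cube map. A minimal sketch, assuming an ARB_texture_cube_map texture that is already created; cubeMapTex is a placeholder for your existing texture object:)

    /* Temporary workaround: don't minify through the (broken) mipmap chain. */
    glBindTexture(GL_TEXTURE_CUBE_MAP_ARB, cubeMapTex);
    glTexParameteri(GL_TEXTURE_CUBE_MAP_ARB, GL_TEXTURE_MIN_FILTER, GL_LINEAR);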

opla
02-12-2002, 03:48 AM
There was a problem with mipmapped cubemaps, which supposedly has been solved
It IS fixed in driver 6.13.2552.

Nutty
02-12-2002, 04:00 AM
How do we enable this option that lets vertex programs and fixed-function paths be used together in multipass?

Surely if it's faster than before, why not just make it the norm all the time, or would that break stuff?

paddy
02-12-2002, 04:14 AM
Who the hell is this John Carmack ???

...

Sorry, couldn't resist, those who have read a certain post on slashdot will understand the joke http://www.opengl.org/discussion_boards/ubb/wink.gif

knackered
02-12-2002, 04:30 AM
Does the 8500 support VAR? Or have its own version of VAR? (Is that what 'vertex objects' are in Carmack's article?)
I'd be lost without VAR, it's really given my project a boost.
I tried an 8500 a few months ago, but the drivers were so buggy that I didn't have the time to wait for updated drivers. I was also getting worse frame rates on it than on the GeForce2 GTS once I finally got an OpenGL app running on it.

opla
02-12-2002, 05:21 AM
Does the 8500 support VAR?
No.

Or have its own version of VAR? (is that what 'vertex objects' are in Carmack's article?)
Yes, and it's much better than VAR: you don't have to manage and synchronize the AGP memory.
You just call glNewObjectBufferATI() with a byte size and the pointer to your data, and you get back an ID for your data in fast memory.
Then you use glArrayObjectATI() instead of gl*Pointer().

I'd be lost without VAR, it's really given my project a boost.
GL_ATI_vertex_array_object is fast too.

I tried an 8500 a few months ago, but the drivers were so bugged that I didn't have the time to wait for their updated drivers.
It's true that the drivers are still buggy, but it's getting better...
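
(A minimal sketch of that path, assuming the ATI_vertex_array_object entry points behave as described above — the extension functions have to be fetched with wglGetProcAddress as usual, and Vertex/vertices/numVerts/indices are made-up names:)

    /* Create an object buffer from client data; the driver copies it into
       fast (AGP/video) memory and hands back an ID. */
    GLuint buf = glNewObjectBufferATI(numVerts * sizeof(Vertex), vertices, GL_STATIC_ATI);

    /* Point the vertex array at the buffer instead of at a client pointer. */
    glEnableClientState(GL_VERTEX_ARRAY);
    glArrayObjectATI(GL_VERTEX_ARRAY, 3, GL_FLOAT, sizeof(Vertex), buf, 0);

    /* Draw as usual. */
    glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, indices);

    /* When you're done with it for good: */
    glFreeObjectBufferATI(buf);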

Tom Nuydens
02-12-2002, 06:06 AM
Originally posted by Nutty:
How do we enable this option to enable Vertex programs, and fixed function paths to be used together in multipass?

Maybe it'll be exposed through the new GL_NV_vertex_program1_1 extension?

-- Tom

Gorg
02-12-2002, 06:48 AM
Doh! Opened my big mouth again. http://www.opengl.org/discussion_boards/ubb/smile.gif

[This message has been edited by Gorg (edited 02-12-2002).]

cass
02-12-2002, 07:11 AM
Tom,

Yes, position invariance will be exposed in NV_vertex_program1_1.

Docs should be available soon.

Thanks -
Cass

Nutty
02-12-2002, 07:49 AM
Cass, how come Mark Kilgard says this new feature for invariance in mixed-pass rendering is slightly faster than normal?

Just curious, that's all... If it's a tight-lipped secret, I understand.

Any timescale for when nvidia's new extensions are likely to be published?

Cheers,
Nutty

dorbie
02-12-2002, 08:39 AM
I think it's Carmack saying that the vertex program he writes with matched Z is one instruction shorter than his earlier vertex program without matched Z. Maybe it was mjk saying it, but the gist is the same.

[This message has been edited by dorbie (edited 02-12-2002).]

Zak McKrakem
02-12-2002, 11:07 AM
Originally posted by opla:
Yes, and it's much better than VAR: you don't have to manage and synchronize the AGP memory.
You just call glNewObjectBufferATI() with a byte size and the pointer to your data, and you get back an ID for your data in fast memory.
Then you use glArrayObjectATI() instead of gl*Pointer().


This is your opinion, of course. Mine is different. In my opinion, ATI_vertex_array_object has a small design flaw: you need to have your geometry in system memory to use it. For static geometry it is not a big issue, but for dynamic geometry there is a double copy: you have to create it in system memory and then call UpdateObjectBufferATI to use it (which probably copies the data to AGP memory).
With VAR you don't have this problem, as you write directly to AGP memory.
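
(To make the double copy concrete, the dynamic path looks roughly like the sketch below, assuming the ATI_vertex_array_object update entry point; buildDynamicVerts(), cpuVerts and buf are made-up names:)

    /* 1st copy: the application builds this frame's vertices in system memory. */
    buildDynamicVerts(cpuVerts, numVerts);            /* hypothetical app function */

    /* 2nd copy: the driver copies them from system memory into the object buffer. */
    glUpdateObjectBufferATI(buf, 0, numVerts * sizeof(Vertex), cpuVerts, GL_DISCARD_ATI);

With VAR you would build the vertices directly into the AGP pointer and skip the first copy.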

cass
02-12-2002, 11:18 AM
With position invariant programs, you don't calculate o[HPOS] in your vertex program, and that allows the driver to do it a little more efficiently. It actually is equivalent to ~ 1-2 instructions *less* than if you did it yourself.

Sorry, Nutty, the details of this are not something we expose publicly.

Thanks -
Cass
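
(Until the spec is public this is only a guess, but a position-invariant program will presumably look something like the sketch below: a 1.1 header, an OPTION line, and no write to o[HPOS]. The constant register assignments (c[20], c[21]) and progId are made up for illustration:)

    /* Loaded with glLoadProgramNV as usual; note there is no write to o[HPOS]. */
    static const char vp[] =
        "!!VP1.1\n"
        "OPTION NV_position_invariant;\n"   /* position comes from the fixed-function transform */
        "DP3 R0.x, v[NRML], c[20];\n"       /* N . L, light direction assumed in c[20] */
        "MAX R0.x, R0.x, c[21].x;\n"        /* clamp to zero; c[21].x assumed to hold 0.0 */
        "MUL o[COL0], v[COL0], R0.x;\n"
        "MOV o[TEX0], v[TEX0];\n"
        "END\n";

    glLoadProgramNV(GL_VERTEX_PROGRAM_NV, progId, (GLsizei)(sizeof(vp) - 1), (const GLubyte *)vp);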

Nutty
02-12-2002, 11:43 AM
Are you saying all I have to do is not write my transformed vertex position to o[HPOS], and I'll get invariant results with mixed-function multipass?

Thanks,
Nutty

jwatte
02-12-2002, 01:52 PM
Nutty: He's saying that when they use the "plain" OpenGL pipeline on the card, which is apparently NOT entirely composed of shader opcodes that we have access to, the number of clock cycles or opcode equivalents that a vertex transform takes is one less than if you were executing the semantically equivalent program, as defined by the vertex shader language available to us.

Opla:


yes, and it's much better than VAR, you don't have to manage and synchronize the AGP memory.


Except I can manage it more efficiently than they can, as I know what the pattern of writing and accessing is. If I allocate a VAR and split it in two, I only need to test a fence when I pass the end of a chunk, a la double-buffer. This happens maybe once every 10 or 20 meshes I render, depending on the size of the meshes.

Meanwhile, the ATI driver has to set a fence for each buffer I upload and render, as it can't know whether I will soon ask to re-upload to that buffer or not. Or it will have to do some mumbo-jumbo switch-aroo behind the scenes, which degrades to very much the same thing in the end. They cannot do the double-buffering thing, because they don't know the lifetime of each individual object upload I make.

Then there's the problem of having to upload geometry to the buffer in the first place. If I'm dynamically generating the geometry, they impose an extra copy pass on me. I have no idea what the implementation is, but they may even blow my L1 cache when they upload the data, even if I'm conscientious about writing to memory with un-cached stores.

I've heard several times that the ATI people are amenable to adding an extension so you can get access to the buffer. If they do that, and relax synchronization so that I don't have to synchronize per mesh, then they'll be equivalent. Until then, the extension may appear simpler to use, but it's simpler to use in the same way that glVertex3f() is simpler to use than glDrawRangeElements().
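
(For comparison, the double-buffered VAR pattern described above looks roughly like this sketch. It assumes NV_vertex_array_range and NV_fence, omits the wglGetProcAddress setup, and uses made-up names such as Vertex, mesh, meshBytes and fillMeshVertices():)

    #define VAR_SIZE (2 * 1024 * 1024)                 /* 2 MB arena, split in two halves */

    GLubyte *arena = (GLubyte *)wglAllocateMemoryNV(VAR_SIZE, 0.0f, 0.0f, 0.5f);  /* AGP memory */
    glVertexArrayRangeNV(VAR_SIZE, arena);
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

    GLuint fence[2];
    glGenFencesNV(2, fence);
    glSetFenceNV(fence[0], GL_ALL_COMPLETED_NV);       /* start both fences in a completed state */
    glSetFenceNV(fence[1], GL_ALL_COMPLETED_NV);

    int half = 0;
    size_t offset = 0;

    /* Per mesh: */
    if (offset + meshBytes > VAR_SIZE / 2) {           /* current half is full: swap halves */
        half ^= 1;
        offset = 0;
        glFinishFenceNV(fence[half]);                  /* wait until the GPU is done with this half */
    }
    GLubyte *dst = arena + half * (VAR_SIZE / 2) + offset;
    fillMeshVertices(dst, mesh);                       /* hypothetical: write vertices straight into AGP */
    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), dst);
    glDrawElements(GL_TRIANGLES, mesh->numIndices, GL_UNSIGNED_SHORT, mesh->indices);
    glSetFenceNV(fence[half], GL_ALL_COMPLETED_NV);    /* mark this half as in flight */
    offset += meshBytes;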


[This message has been edited by jwatte (edited 02-12-2002).]

Humus
02-12-2002, 03:23 PM
Originally posted by jwatte:
I've heard several times that the ATI people are amenable to adding an extension so you can get access to the buffer. If they do that, and relax synchronization so that I don't have to synchronize per mesh, then they'll be equivalent. Until then, the extension may appear simpler to use, but it's simpler to use in the same way that glVertex3f() is simpler to use than glDrawRangeElements().


I guess the new GL_ATI_map_object_buffer extension may be just this.

dorbie
02-13-2002, 08:32 AM
Nutty, they are deliberately not saying how this is done.

It looks like they gave Carmack access to a back door they are not ready to reveal. There is a different vertex program which doesn't compute this information, but it probably involves doing something else that they don't want to tell everyone about, at least not yet.

cass
02-13-2002, 08:55 AM
Just to be clear, position invariant programs are defined in NV_vertex_program1_1. I won't elaborate on that spec until it's public - which should be Real Soon Now.

Cass

Zak McKrakem
02-15-2002, 02:57 AM
There is also the fact that ATI's Vertex_Array_Object extension is only available on the Radeon 8500, not on lower Radeons (7500, 7200 and previous models). So on those models you have no 'fast' way to pass vertices to the GPU.
What does ATI think about that?
How do they suggest we send geometry on those cards? (On a Radeon 7500, I made a simple test using OGL standard arrays, with and without CVA, and then using D3D with vertex buffers. The D3D version is about 30 times faster. With the VAR extension on a GeForce card, the OGL version is more or less the same speed as the D3D version.)
Anyway, it would be good for everyone to have a single, common way to do this for nVidia and ATI cards (and 3DLabs, Matrox, ...). The lack of a common interface for this part of the pipeline, the fact that it is the same operation for every GPU, and the fact that it has been discussed at ARB meetings (as you can read in the ARB meeting notes) show how hard it is for the ARB to reach a consensus on a single, badly needed feature of modern cards.
In my opinion, this just reflects the inability of two IHVs like nVidia and ATI to reach an agreement that would benefit OGL. And just for marketing reasons. It seems they need a dictator (like MS with D3D) to make things right and useful for all of us.
Do you think having two ways of sending geometry to video cards benefits anybody?
Do you think having two vertex program APIs, for two different cards, to do exactly the same thing (with small differences) benefits anybody?
It doesn't benefit developers, it doesn't benefit OGL and, worst of all, it doesn't benefit the IHVs (as you probably don't use both ways, and maybe neither).
I liked it when there were people like Michael Gold from nVidia and Tom Frisinger from ATI who sat down together and created common extensions like texture_env_combine.

dorbie
02-15-2002, 04:22 AM
Zak, are you asking us or asking ATI?

You are saying that on some cards the fastest dispatch in D3D is 30 times faster than the fastest dispatch in OpenGL. I doubt it; did you try other methods?

Try using glDrawElements, at the very least the drivers will be optimized for this because of Quake3 benchmarking.

Here's what Carmack (I think) wrote on Quake3 dispatch when advising IHVs on optimization:
http://www.quake3arena.com/news/glopt.html

Try sticking to these rendering paths on the card you have performance issues with.

Zak McKrakem
02-15-2002, 05:06 AM
Believe me. It is not a 'real' application because it draws the same model (with ~40000 faces, ~47000 vertices) in 16 different positions, using just one texture with one directional light and an infinite viewer.
Using standard arrays it has to send the geometry from sysmem to the card each time it draws the model (in each position).
QIII uses CVA, but with small chunks of vertices. As CVA 'is not well defined', it seems the IHVs have written the extension just for the QIII case, so it doesn't seem to do anything for my ~47000-vertex model.
With D3D I create a static VB, so it seems the model is stored in video memory.
With VAR, I can put the model in video memory and it is more or less the same speed as the D3D test. Or I can put the model in AGP memory, and depending on the AGP configuration it can be more or less the same speed (with good AGP 4x, fast writes, sideband addressing, ...), a bit slower (a bad AGP 4x configuration), or roughly half the speed (an AGP 2x configuration).
Using standard arrays it is about 30 times slower.
Note that this is just a test. In a real application you don't usually use a 47000-vertex model, so the speed difference between systems in 'my game' is not as noticeable.

But, as I said, with the Radeon 7500, 7200, ... you don't have any other way to send the geometry than 'standard' arrays (see the CVA sketch below).
I tried the test with ATI_vertex_array_object but it locked the computer. (I have to try with the newest drivers, as it seems they have fixed some problems I saw with previous ones.)
I have to say that ATI drivers have improved a lot since the first Radeon. Now, not using CVA, for the first time, with the latest drivers, everything is working OK on my system. It is time to give their extensions another try. I'm happy about it.
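
(For what it's worth, the CVA path being discussed is just this — a minimal sketch assuming EXT_compiled_vertex_array and ordinary client-side arrays; vertices/indices/numVerts/numIndices are placeholders:)

    /* Compiled vertex arrays: lock the arrays so the driver may transform and
       cache the vertices once, then issue the (possibly multipass) draws. */
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, vertices);
    glLockArraysEXT(0, numVerts);
    glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, indices);
    glUnlockArraysEXT();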

Question for you programmers of games and other kinds of applications: would you like a common (across all IHVs) set of API calls for sending geometry to the cards? Maybe the OGL2 proposal as an extension for current OGL?
Question for the IHVs: is it that difficult to create a common way to solve this? Haven't you read these forums, with all the questions about the 'best way to send geometry', 'using CVA', 'using display lists to send geometry', 'using VAR', and similar? Doesn't that mean anything to you?
Thanks.

[This message has been edited by Zak McKrakem (edited 02-15-2002).]

Ysaneya
02-15-2002, 06:40 AM
I 100% agree with Zak. Yes, trust him. I have a Radeon 8500. Without using the VAO extension (just plain vertex arrays), the most I can get is 2 million tris/sec. In D3D on the same system/hardware, I can reach up to 40 million tris/sec.

With VAO I get better results (up to 13 Mtris/sec), but it's still pretty far from D3D's peak.

Y.

dginsburg
02-15-2002, 07:39 AM
ATI_vertex_array_object and ATI_map_object_buffer have recently been implemented for all Radeon family cards (including the 7500). It's not in the current driver release, but it will appear in the next one. The only thing that the 7500 can not support is ATI_element_array because the HW does not support it.

--Dan

Zak McKrakem
02-15-2002, 11:57 AM
Originally posted by dginsburg:
ATI_vertex_array_object and ATI_map_object_buffer have recently been implemented for all Radeon family cards (including the 7500). It's not in the current driver release, but it will appear in the next one. The only thing that the 7500 can not support is ATI_element_array because the HW does not support it.


Dan, it is good to hear that. If you'll let me suggest an addition to the extension, I would suggest including something like OGL2's Direct Access.
The extension is very similar to the Vertex Array Objects that appear in the OGL2 white paper.
That way, for dynamic objects, you wouldn't have to store the model in system memory before calling UpdateObjectBufferATI.
I could be wrong, but I think AdquireDirectPointer is very similar to D3D's Lock, and ReleaseDirectPointer to D3D's Unlock, and since you already have those functions implemented in your driver it could be easy to create the GL interface.
And, as this is not OGL2, you can relax the spec to meet your current hw requirements. It could be a good base for a much-wanted ARB extension and a good bridge to a future OGL 2.0.

Thank you.
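
(If ATI_map_object_buffer, mentioned by Humus and Dan above, exposes what its name suggests, the dynamic path would collapse to something like this sketch; the entry point names and semantics are assumptions until the spec is published, and buf/buildDynamicVerts() are made-up names:)

    /* Map the object buffer and write the frame's vertices straight into it,
       skipping the system-memory staging copy. */
    Vertex *dst = (Vertex *)glMapObjectBufferATI(buf);
    buildDynamicVerts(dst, numVerts);                  /* hypothetical app function */
    glUnmapObjectBufferATI(buf);

    glArrayObjectATI(GL_VERTEX_ARRAY, 3, GL_FLOAT, sizeof(Vertex), buf, 0);
    glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, indices);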

evanGLizr
02-15-2002, 12:04 PM
Originally posted by Zak McKrakem:
Believe me. It is not a 'real' application because it draws the same model (with ~40000 faces, ~47000 vertex) in 16 different positions using just one texture with one directional light and infinite viewer.

The best way to feed a graphics card with static models is using display lists. And if you embed a glDrawElements vertex array inside a display list, even better: that way you hint the driver that:
a) The model is not going to change (it's a display list).
b) It can draw the display list using indexed primitives (the driver could "guess" this without the glDrawElements hint, but just in case).

The driver will choose the fastest method to display that, be it AGP memory or even video memory.
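
(A minimal sketch of that, using only core GL 1.1 calls; vertices/indices/numIndices are placeholders. Note the vertex data is dereferenced and captured when the list is compiled, which is what gives the driver the freedom described above:)

    GLuint list = glGenLists(1);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, vertices);

    glNewList(list, GL_COMPILE);                       /* array data is copied into the list here */
    glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, indices);
    glEndList();

    /* Per frame: */
    glCallList(list);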

As a driver developer said in OpenGL gamedev discussion list:

When you create a display list, I really get to go to town because i
assume you're going to want to use your list more than once.

Korval
02-15-2002, 12:20 PM
The best way to feed a graphics card with static models is using display lists.

Not true. According to nVidia, the fastest way to send vertex data, static or dynamic, is with VAR. VAR, even in AGP memory, beats their own display list code.

evanGLizr
02-15-2002, 12:34 PM
Originally posted by Korval:
Not true. According to nVidia, the fastest way to send vertex data, static or dynamic, is with VAR. VAR, even in AGP memory, beats their own display list code.

According to nvidia http://developer.nvidia.com/view.asp?IO=ogl_performance_faq



10. Should I use display lists for static geometry?
Yes, they are simple to use and the driver will choose the optimal way to transfer the data to the GPU.


And the best thing is that you will get the best from every driver, not only from Nvidia's.

cix>foo
02-15-2002, 12:54 PM
That performance FAQ is out of date; there's another one which says VAR is faster. Display lists take no advantage of the GPU vertex cache; each vertex is sent, lit, and transformed as a separate entity, and the GL driver really can't optimise it easily without slowing down the initial processing of the display list fairly significantly (well, it might do that, but it might not - I suspect not). It's "theoretically" up to nearly 4x faster to draw something using VAR thanks to the cache.

Cas http://www.opengl.org/discussion_boards/ubb/smile.gif

evanGLizr
02-15-2002, 01:19 PM
Originally posted by cix>foo:
That performance FAQ is out of date; there's another one which says VAR is faster. Display lists take no advantage of the GPU vertex cache; each vertex is sent, lit, and transformed as a separate entity, and the GL driver really can't optimise it easily without slowing down the initial processing of the display list fairly significantly (well, it might do that, but it might not - I suspect not). It's "theoretically" up to nearly 4x faster to draw something using VAR thanks to the cache.

Cas http://www.opengl.org/discussion_boards/ubb/smile.gif

That's why I suggested embedding a glDrawElements vertex array inside a display list: the driver doesn't have to do any guesswork at all, it knows for sure it's an indexed geometry with no state changes in the middle. If the driver cannot be bothered to optimize that, it's another matter.

Anyway, I still think there's much more going on behind the scenes of a display list than most people think.

[Edit: Ooops, I thought you were Cas as in Cass http://www.opengl.org/discussion_boards/ubb/smile.gif]

[This message has been edited by evanGLizr (edited 02-15-2002).]

cix>foo
02-16-2002, 02:57 PM
Naw, I'm just plain old Cas wot doesn't know much relatively http://www.opengl.org/discussion_boards/ubb/smile.gif

Interesting idea about drawelements inside the display list but I suspect that this is such a rare path they haven't bothered with it.

Cas http://www.opengl.org/discussion_boards/ubb/smile.gif

Korval
02-16-2002, 05:57 PM
That's why I suggested embedding a glDrawElements vertex array inside a display list: the driver doesn't have to do any guesswork at all, it knows for sure it's an indexed geometry with no state changes in the middle.

Any decent optimized display list can do the same thing for glBegin/End with glArrayElement.

And the FAQ you pointed me to specifically says that the fastest way to transfer geometry to the graphics chip on nVidia hardware is "DrawElements/DrawArrays Using wglAllocateMemoryNV(size,0,0,1)", which means VAR in video memory. The second fastest is "DrawElements/DrawArrays Using wglAllocateMemoryNV(size,0,0,.5)", which is VAR in AGP memory. The third is display lists.

And I seriously doubt that encapsulating glDrawElements calls on VAR memory in a display list is going to be faster than calling them directly. Who knows what drivers have to do behind the scenes to make display lists work; it could take longer than a simple function call.
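
(The difference between those two FAQ entries is just the priority argument to the allocation call — a sketch, with the frequency arguments left at the values the FAQ shows and size as a placeholder:)

    /* Priority ~1.0 asks for video memory, ~0.5 for AGP memory; either call may
       return NULL, in which case you fall back to ordinary system-memory arrays. */
    void *vidmem = wglAllocateMemoryNV(size, 0.0f, 0.0f, 1.0f);
    void *agpmem = wglAllocateMemoryNV(size, 0.0f, 0.0f, 0.5f);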

knackered
02-17-2002, 02:56 AM
I don't understand why we have to keep our index arrays in system memory. I know it's because the driver (which runs in system memory) has to access the indices, but why does it?

Mazy
02-18-2002, 12:37 AM
Anyone have a URL for the NEW performance FAQ?