Sept. Meeting notes



Zak McKrakem
10-07-2002, 08:47 AM
Some interesting stuff...
http://www.opengl.org/developers/about/arb/notes/meeting_note_2002-09-18.html

"Two new vertex array object extension proposals were presented. This has become time-critical, and a working group will be formed immediately to converge on an ARB VAO specification." Oh... My god! Can it be true?

Ozzy
10-07-2002, 09:26 AM
Well, I'm definitely *not* so enthusiastic!!
Quoting from the minutes:
"GL2 objects would be defined in terms of client state and client machine units (also true of the ATI/NV proposal), which makes it hard to move VAOs onto the server. Principal design goal is accelerating rendering for direct rendering clients, not enabling more efficient indirect rendering. ATI/NV proposal is oriented towards more dynamic data, as well."

I haven't seen exactly what 3Dlabs proposed concerning vertex arrays, but I feel a little bit sick when I think about static data, in other words data stored onboard (as it is with VAR, and as it should be -> but is still not proven <- with VAO).
Well, wtf? Programmability with GL2.0... okay, the BIG stuff... it will be cool, everybody is nice, etc etc etc...
And what about these memory issues? I don't want AGP when I'm managing static data.
Just because dynamic stuff & streaming blah is the fashion doesn't mean they should miss the point???
Something sounds wrong, and maybe it's me! ;)

PH
10-07-2002, 09:48 AM
The links to the slides are broken. I would like to see those (if anybody with access to them is reading this).

Ozzy
10-07-2002, 09:53 AM
Same here.

davepermen
10-07-2002, 10:50 AM
I don't see many problems with the VAO thing; they're starting to solve it, that's all.

But there is still only an OS-dependent solution for pbuffers / render-to-texture. That is so utterly stupid. As rendering to texture gets more and more important, this should finally be handled in a real GL_ARB_render_target.

annoyed.

Ozzy
10-07-2002, 01:34 PM
Originally posted by davepermen:
I don't see many problems with the VAO thing; they're starting to solve it, that's all.


Starting to solve it??? Bugged & still not finished... it's been about a year now, davepermen!
I don't want to keep nagging about this VAO problem, but I'm fed up with people (from ATI) telling us bulls.h.i.t.s..

from the specs:
Version

0.9 - 08/15/01
Added support for variant arrays.

0.8 - 07/06/01
Added table of new state.

----------

And today you're able to say that VAO rules? Frankly, it doesn't, and please don't tell me it's because it IS of course faster than CVA! (How could it be slower anyway?) A 2x speedup is not enough when your board is supposed to deliver real hw T&L (with no bus traffic).

VAO is a shame in terms of performance, and I don't care about its current implementation.

Korval
10-07-2002, 04:34 PM
(with no bus traffic).

VAR doesn't achieve this, primarily because doing so would violate numerous laws of physics.


VAO is a shame in terms of performance, and I don't care about its current implementation.

Say what you will about the 8500 implementation of VAO, there is nothing wrong with the spec itself. The spec doesn't say, "You are guaranteed 5-10x the speed of non-VAO vertices." In fact, it doesn't promise any speed improvement at all. All it does is offer a more relaxed version of the rather strict OpenGL vertex array rules.
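
For reference, the interface is small. Here's a minimal sketch of static usage as I read the spec (entry points assumed already fetched via wglGetProcAddress, error handling omitted):

/* Minimal ATI_vertex_array_object sketch. Assumes the extension entry
   points were already fetched via wglGetProcAddress (declarations as
   in ATI's glATI.h header). */
typedef struct { GLfloat x, y, z; } Vertex;

GLuint create_static_vao(const Vertex *verts, GLsizei count)
{
    /* Allocate a driver-owned object buffer, filled in one shot.
       GL_STATIC_ATI hints that the data won't be rewritten. */
    GLuint buf = glNewObjectBufferATI((GLsizei)(count * sizeof(Vertex)),
                                      verts, GL_STATIC_ATI);

    /* Source the classic vertex array from the object buffer instead
       of a client pointer: 3 floats per vertex, tightly packed, offset 0. */
    glArrayObjectATI(GL_VERTEX_ARRAY, 3, GL_FLOAT, sizeof(Vertex), buf, 0);
    glEnableClientState(GL_VERTEX_ARRAY);
    return buf;   /* free later with glFreeObjectBufferATI(buf) */
}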

Likewise, it is not the ARB_render_texture extension's fault that it runs like crap (equivalent to a copy) on a GeForce; that is solely the fault of nVidia's drivers. Just as VAO performance not meeting your expectations (whatever those are) is purely the fault of ATi's drivers.

As an exercise to prove this, implement VAO using VAR. You can do this; it isn't terribly difficult. You'll find, unsurprisingly, that it achieves virtually the same speed as VAR.
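
Something along these lines; this is only a sketch, and the my_* names and the trivial bump allocator are mine, not from either spec:

#include <string.h>   /* memcpy */

/* Toy "VAO on top of VAR": hand out offsets inside one VAR block.
   wglAllocateMemoryNV / glVertexArrayRangeNV assumed already loaded. */
static unsigned char *g_pool;
static size_t g_used, g_size;

int my_var_init(size_t bytes)
{
    /* readfreq 0, writefreq 0, priority 1.0 -> request video memory. */
    g_pool = (unsigned char *)wglAllocateMemoryNV((GLsizei)bytes,
                                                  0.0f, 0.0f, 1.0f);
    if (!g_pool) return 0;
    glVertexArrayRangeNV((GLsizei)bytes, g_pool);
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
    g_used = 0; g_size = bytes;
    return 1;
}

/* The moral equivalent of glNewObjectBufferATI(..., GL_STATIC_ATI):
   copy once, then point glVertexPointer() etc. at the returned address. */
void *my_new_object_buffer(const void *data, size_t bytes)
{
    unsigned char *p = g_pool + g_used;
    if (g_used + bytes > g_size) return NULL;
    memcpy(p, data, bytes);   /* static data, written once: no fences needed */
    g_used += bytes;
    return p;
}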

Ozzy
10-07-2002, 11:30 PM
Korval says-> :)
"VAR doesn't achieve this, primarily because doing so would violate numerous laws of physics."

Then, once you've stored your vertices in VRAM, could you explain for what good reason they'd be sent back to system memory through the BUS again?

Are we talking about VertexArrayRange? :))

Ozzy
10-07-2002, 11:31 PM
Originally posted by Korval:
As an exercise to prove this, implement VAO using VAR. You can do this; it isn't terribly difficult. You'll find, unsurprisingly, that it achieves virtually the same speed as VAR.

Bravo, but it is nonsense.

Ysaneya
10-08-2002, 12:28 AM
Ozzy, VAO alone is not that good, but as soon as you start using the ATI_map_object_buffer extension, VAO becomes as good as VAR. What makes you think it's slower than VAR?

Y.

Ozzy
10-08-2002, 02:22 AM
Y.

Sorry, I'm just landing back on Earth, and I've never heard of that ATI_map_object_buffer?
Where are the specs for this extension? :))

Ozzy
10-08-2002, 02:35 AM
The specs and the raw timing results make me think that VAO (without the map_object stuff you're talking about) can't compete with VAR implementations.

from specs (for Korval also ;) ->
This extension (VAR) is designed to allow very high vertex processing rates which
are facilitated both by relieving the CPU of as much processing burden as
possible and by allowing graphics hardware to directly access vertex data.
Because this extension is implemented as an addition to the vertex array

-------

Right: currently, CPU usage when using VAO is really high! That's the problem.
A good T&L design (since we're talking about vertex processing) should *not* have to involve any CPU cycles. That's all...

Now maybe the ATI_map_object trick can help, but it's not in the specs. (?) Huh?

Am I blind? :))

Ysaneya
10-08-2002, 03:05 AM
The ATI_map_object_buffer extension is just a way to lock / unlock the object buffer; that is, you get a direct pointer to video memory, instead of having to do your stuff in system memory and then transfer it into video memory. It only helps for dynamic updates; I don't think it changes anything for static geometry.

And I agree: normally, when you statically store your vertices in video memory, the CPU should be idle while rendering. That's what I'd expect from VAO. Did you have a different experience..?
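
In code, the contrast is roughly this (a sketch; compute_position is just a hypothetical stand-in for whatever generates your data):

/* (a) Staging copy: build in system memory, then hand it to the driver.
   GL_DISCARD_ATI lets the driver drop the old contents rather than
   stall if the GPU is still reading them. */
void update_via_copy(GLuint buf, const float *sysmem, GLsizei bytes)
{
    glUpdateObjectBufferATI(buf, 0, bytes, sysmem, GL_DISCARD_ATI);
}

/* (b) Direct write: map the object buffer and write vertices in place. */
void update_via_map(GLuint buf, int vertex_count)
{
    float *v = (float *)glMapObjectBufferATI(buf);
    int i;
    for (i = 0; i < vertex_count * 3; ++i)
        v[i] = compute_position(i);   /* hypothetical data generator */
    glUnmapObjectBufferATI(buf);
}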

Y.

Ozzy
10-08-2002, 03:16 AM
Well, Y.,
I've looked into your sources and I've found the map_object_buffer stuff.
So what I've got to do now is test with this locking func...
Btw, I am *only* using GL_STATIC_ATI when creating the VAO. So maybe it needs an explicit lock to VRAM?

Note: where did you find that ext?



[This message has been edited by Ozzy (edited 10-08-2002).]

Ysaneya
10-08-2002, 04:16 AM
It's only useful when you write your data to video memory. Hence, if you've got static stuff that you only write once at runtime, it's useless. But if you need to stream some data in, I'd expect quite a big performance improvement. Btw, I think it requires your VAO to be created with GL_DYNAMIC_ATI.

This extension is not "official", so I won't say more :) But believe me when I say the spec is quite simple, and it's obvious how to use it...

Y.

Ozzy
10-08-2002, 04:39 AM
Well, I'm a BIT disgusted... ;)
I've been knocking at ATI devrel's door about VAO issues etc etc etc for 8 months!
Not a word...
Kewl.
Anyway, we'll see what this ext is all about soon!

Now, from my own experience, as I said (and you know it, Y.) ;) it's the CPU involvement during vertex processing that has driven me mad since the beginning! That's why I was assuming it was AGP transfers.
(What else could it be?)
Moreover, performance gets really bad when lighting is ON.

Tom Nuydens
10-08-2002, 04:44 AM
You mean they still haven't released the ATI_map_object_buffer spec? How long has that extension been around?

Oh well. Get http://www.ati.com/developer/sdk/RadeonSDK/Html/Info/Extensions/glATI.h and you will immediately see how it works:



// Assuming that buffer_handle is your already created vertex array object:
// Get a pointer to your object buffer:
void *objbuf = glMapObjectBufferATI(buffer_handle);

// ... Now write vertex data to *objbuf.

// And then unmap the buffer:
glUnmapObjectBufferATI(buffer_handle);
I've never seen the actual spec of ATI_map_object_buffer, but I can't imagine the above would be too far off.

-- Tom

[This message has been edited by Tom Nuydens (edited 10-08-2002).]

Ozzy
10-08-2002, 05:01 AM
Don't know yet! :))
But we'll definitely see what the buffer address looks like!! AGP or VRAM? ;))

Ozzy
10-08-2002, 05:05 AM
Sorry, Zak, for this thread getting... weird... ;)

Cab
10-08-2002, 07:12 AM
Having a common interface to submit vertex data (static and dynamic) to the graphics card is something really good.

It is true that render-to-texture is something some of us see as important (more important each day for advanced effects), just as a common interface would be for nv_texture_rectangle, occlusion query, or a method for doing things only if a condition becomes true without losing parallelism (something like beginif(condition1, condition2, …)), …

But one of the requirements for rendering things quickly is having the vertices in a place where the GPU can handle them efficiently (and that is true for render-to-texture, for setting up a shadow map, for vertex programs, … for everything). The lack of a common API for this is bad for everybody trying to build something for the mass consumer market. IHVs working to create a common interface is something all of us should welcome as the right approach (it's something we have been asking for for years), and it is a sign that they have stopped (at least for a moment) staring at their own navels and are trying to do the right thing for developers. I think they have understood that this is the only way developers will use the new features of their different hw. So, for me, it is good news. ARB_fragment_program, even with the small limitations exposed by jwatte, is also very good news, as is the large number of ARB extensions approved and shipped in drivers over the last year. What every one of us has to do is give correct feedback to the ARB so they can make the correct decisions.

If ATI's VAO doesn't work for you, it doesn't mean that it is not a good spec; it is probably because their drivers are not optimised or not working properly. I used VAO one year ago, and it froze the computer. I reported it to ATI, with a small demo, and I didn't try it again until last month. Now it is working, and the speed gain is big, from 15FPS to near 100FPS (on one testing machine: PIII 800, W98, R8500), and that is in a real game, with different kinds of vertices, static and dynamic arrays, …

With a common interface, and everyone using it, they will need to optimise their drivers because the benchmarks will show the difference between cards from different vendors.

Of course, this is just my opinion.

Ozzy
10-08-2002, 08:34 AM
Nice thoughts, Cab, for sure. :)
I completely agree about the unified interface stuff...
Anyhow, I don't know that much about streaming data, but today VAR is the fastest, most reliable and handiest mechanism if you need a highly optimised & customised data format. I mean, there are no HINTs or such things... ;) For all these reasons VAO doesn't make it.

Ysaneya
10-08-2002, 10:17 AM
I suppose you refer to things like discard / preserve hints when updating the VAO. That's true. On the other hand, VAR has priorities when allocating memory. And you don't have to worry about synchronization with VAO. All in all, both have nice and bad sides. VAR is more powerful, but it's also easier to mess things up with it. How much time did you spend debugging because you were writing to a zone of memory the GPU was still reading? I find VAOs a lot cleaner... VAR always seemed to me to be a big "hack" to grant access to video memory... but maybe that's just me :)

Y.

Korval
10-08-2002, 11:58 AM
Then, once you've stored your vertices in VRAM, could you explain for what good reason they'd be sent back to system memory through the BUS again?

You didn't specify the AGP bus. The video card has its own bus to its video RAM.


Bravo, but it is nonsense.

Yeah, it's "nonsense." Except that it completely defeats your argument that the VAO extension is somehow worse than VAR.

The 8500 implementation of VAO may not perform as well as the GeForce implementation of VAR. That isn't the same as saying that VAO is worse than VAR.

There is nothing intrinsic to the VAO extension that makes it slower or more CPU-reliant than VAR. The only real difference (besides VAR's direct access to memory) is synchronization: VAR requires manual synchronization (via NV_fence), while VAO handles syncing automatically.
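
To illustrate what "manual synchronization" means in practice, here's a minimal NV_fence sketch (entry points assumed loaded via wglGetProcAddress; fill_var_buffer is a hypothetical writer):

/* One frame of a dynamic VAR workload with explicit NV_fence syncing.
   The fence must have been created with glGenFencesNV and set once at
   init so that the first glFinishFenceNV returns immediately. */
void draw_dynamic_frame(GLuint fence, GLsizei vertex_count)
{
    /* Block until the GPU has finished reading last frame's vertices;
       VAR gives no implicit guarantee before we overwrite them. */
    glFinishFenceNV(fence);

    fill_var_buffer(vertex_count);    /* hypothetical: write new vertices */
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);

    /* Re-arm: the fence completes once the GPU passes this point. */
    glSetFenceNV(fence, GL_ALL_COMPLETED_NV);
}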

It is a well-known fact that ATi's drivers are not as well developed as nVidia's. So when you see an 8500's VAO implementation losing to a GeForce's VAR implementation, I fail to understand why you're surprised. It may even be some hardware issue getting in the way. In any case, it has nothing to do with the VAO extension itself.

And if you think this is "nonsense," then you don't understand the difference between a specification and its implementation.

Ozzy
10-08-2002, 01:21 PM
Y.
As I said, I do *not* use dynamically modified or streaming data. So I don't have to worry that much in terms of implementation... Maybe my only worry was optimising my vertex structure to fit all the needed geometry into 16MB. ;)
Moreover, we're all coders here, and I can understand that the VAR design looks a bit like an alien mechanism inside GL... okay...
But I was *not* able to do what I want & need to do with VAO, regardless of performance...
But overall I agree with what you're saying :)

Ozzy
10-08-2002, 01:39 PM
Originally posted by Korval:
if you think this is "nonsense," then you don't understand the difference between a specification and its implementation.

OK, Korval, I don't want a big long battle here... :) Let's say that on the BUS stuff there was a misunderstanding. The fact is that I suspect VAO stores data in AGP & then sends it to the board when you need to display. (But I can't prove that right now.)

Now, talking about ->T&L<- implementations: it looks obvious that if you plan to write & run a T&L-based engine, then you'll have to face different types of vertex-processing implementations, as there is no unified interface under GL at the moment.
So what do you expect, then? The two current well-known challengers are NV & ATI,
and respectively they've implemented VAR & VAO. Thus, for the same program on the same computer running both video card families, I expect, let's say, similar results. But it's not like that. Even a GF2MX beats an 8500 using the ideal configuration for a T&L chip
->
all geometry stored onboard, 1 prim = 1 strip (16MB)
single texturing & gouraud, with a max of 16MB of textures
lighting enabled, of course; it is supposed to be done by the GPU.

I remember it was worse months ago with ATI drivers. Now it runs... and I could add: it is running within the frame!!
But... when CPU occupancy is 89% with an 8500LE, it is half that on the same machine running a crappy GF.

That's all I've got to say. Take it easy.

Gorg
10-08-2002, 02:01 PM
Originally posted by Ozzy:
The fact is that I suspect VAO stores data in AGP & then sends it to the board when you need to display. (But I can't prove that right now.)



To repeat Korval: maybe the current ATI implementation does. But the spec does not require that.

Ozzy
10-08-2002, 02:07 PM
OK... and in ten years I will be 42. :)
No problem, let's wait for better drivers.

Ysaneya
10-09-2002, 12:23 AM
Ozzy, just curious: what did you want / need to do with VAR that you couldn't do with VAO? Apart from very specific and unusual usage of fences, I don't really see... they should be more or less functionally equivalent.

I don't think a GF2MX beats an R8500 at T&L, when using VAR on one side and VAO on the other. What makes you think that?

CPU usage: I wouldn't trust that one. It's well known that some functions, like SwapBuffers when using vsync, make the CPU go crazy; but that doesn't mean the card is slow at rendering... God only knows (and maybe ATI's driver team too...) what happens in the driver :)

Finally, VAO storing in AGP memory: I have no idea. I'd expect it to store data in AGP memory when using GL_DYNAMIC_ATI and in video memory (hence no bus transfer) when using GL_STATIC_ATI. It might not be the case; I honestly don't know. But well, that's what I'd expect logically. After all, don't forget that before the Detonator 40.41 drivers, VAR memory allocations were limited to 32MB, even when you had a 128MB video card. Talk about driver limitations, heh...

Y.

Ozzy
10-09-2002, 02:38 AM
Basically my structs using VAR look like this ->

typedef struct
{
    VR_SHORT x, y, z, rienz;   // coords ("rien" = unused padding)   $0,$2,$4,$6
    VR_SHORT nx, ny, nz, rien; // normal                             $8,$0a,$0c,$0e
    VR_BYTE  r, g, b, a;       // colors                             $10,$11,$12,$13
    VR_UV    texCoord[VR_MAX_TEXTURE_UNITS]; // texture coords       $14,$18,
                                             //                      $1c,$20,
                                             //                      $24,$28,
                                             //                      $2c,$30
    VR_DWORD pad;              // :(
} NV_VERTEX;


--------------------------
Of course the size varies depending on the texture channels used by the prim.

Moreover, with VAR I explicitly store the vertices in VRAM, while with VAO I can only pray for it to store my custom vertices as they are defined.
Also, customised vertex formats speed up GPU processing on GF hw, so it's really nice to get performance + size advantages at the same time.
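
For what it's worth, binding the fixed-function arrays to that interleaved layout looks roughly like this (a sketch; it assumes VR_UV holds two GLfloats and that pool came from wglAllocateMemoryNV + glVertexArrayRangeNV):

/* Point the classic arrays at the interleaved NV_VERTEX layout above.
   Texture unit 0 only; use glClientActiveTextureARB for the others. */
void bind_nv_vertex_arrays(const NV_VERTEX *pool)
{
    const GLsizei stride = sizeof(NV_VERTEX);

    glVertexPointer(3, GL_SHORT, stride, &pool->x);              /* $0  */
    glNormalPointer(GL_SHORT, stride, &pool->nx);                /* $8  */
    glColorPointer(4, GL_UNSIGNED_BYTE, stride, &pool->r);       /* $10 */
    glTexCoordPointer(2, GL_FLOAT, stride, &pool->texCoord[0]);  /* $14 */

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
}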

As my geometry is 100% static then no need to sync. Thus while rendering frame -1 in these circumstances i enjoy my 2 VBL in one.
understand parallelisation. While GPU is rendering frame-1 , CPU is cooking next lists, manage the game, play music and smoke a joint.

This is *definitely* not a vsync problem while swapping etc... the VAO cpu overhead is occuring while displaying primitive only.
Then if cpu is *too much* involved just say bye bye to parallelism. Frankly i don't enjoy this kind of high penalties. and this is explaining why on even an InnoGf2Mx200 it's smooth like ... u know.. ;)

got to go..

Ysaneya
10-09-2002, 03:38 AM
I fail to see why you couldn't use the same structure with VAO. In addition, the priorities in VAR are just hints. Requesting video memory (priority 1.0) doesn't mean you won't get AGP memory. To convince yourself, try allocating 200MB of priority-1.0 memory; with the latest Detonators, it will succeed, and it certainly doesn't all fit in video memory.

I was not suggesting that the CPU usage was a vsync problem. It was merely an example of why I think CPU usage is meaningless by itself. I'd be more interested to see a test in which you do a CPU calculation (physics, AI, whatever) and, thanks to parallelization, it still runs at 100% framerate with VAR but slows down with VAO. Can you demonstrate that? If not, I for one wouldn't rely on CPU usage :)

Y.

Ozzy
10-09-2002, 04:05 AM
Originally posted by Ysaneya:
I fail to see why you couldn't use the same structure with VAO. In addition, the priorities in VAR are just hints. Requesting video memory (priority 1.0) doesn't mean you won't get AGP memory. To convince yourself, try allocating 200MB of priority-1.0 memory; with the latest Detonators, it will succeed, and it certainly doesn't all fit in video memory.


Since March 2002 (as far as I know) the VAO implementation has been bugged for data types != GL_FLOAT (try GL_SHORT, GL_BYTE, etc.).

And with VAR, 16MB is more than enough for our project. ;) So we don't have to worry about the new memory management introduced in current NV drivers. :)

Ozzy
10-09-2002, 04:07 AM
Originally posted by Ysaneya:
I was not suggesting that the CPU usage was a vsync problem. It was merely an example of why I think CPU usage is meaningless by itself. I'd be more interested to see a test in which you do a CPU calculation (physics, AI, whatever) and, thanks to parallelization, it still runs at 100% framerate with VAR but slows down with VAO. Can you demonstrate that? If not, I for one wouldn't rely on CPU usage :)

Y.

I'm afraid you'll have to wait for the first episode of Orky's adventures! ;))

davepermen
10-09-2002, 04:46 AM
CPU usage is always 100% if your thread is doing something, regardless of whether it's actually doing much... But yeah, it should not have to use the CPU to convert. Possibly it has to, due to some hardware restrictions...
But VAO is cool anyway. Otherwise we might as well ask for a TAR (Texture Array Range), and a PAR, and all that. Or a MAR (Memory Array Range), and just allocate all memory ourselves and do the whole memory management by hand. That is a) not the way OpenGL works, and b) stupid, as the people who write the drivers know better how to optimize for the particular GPU. VAO on GeForces would surely rock...