
ARBvbo posted



cass
03-17-2003, 09:49 AM
http://oss.sgi.com/projects/ogl-sample/registry/ARB/vertex_buffer_object.txt

There will also be a presentation on the OpenGL.org web site soon that Kurt Akeley gave at GDC 2 weeks ago.

gibber
03-17-2003, 10:32 AM
Good news indeed :) Reading it now.

DarkWIng
03-17-2003, 10:36 AM
Nice! When will we see it in the extension string?

cass
03-17-2003, 10:38 AM
It's in NVIDIA beta drivers now, and the functionality may show up in shipping drivers before it appears in the extension string. The best way to check is to try to initialize the entry points.
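
For example, something roughly like this (assuming the PFN* typedefs from a recent glext.h):

PFNGLGENBUFFERSARBPROC glGenBuffersARB =
    (PFNGLGENBUFFERSARBPROC)wglGetProcAddress("glGenBuffersARB");
PFNGLBINDBUFFERARBPROC glBindBufferARB =
    (PFNGLBINDBUFFERARBPROC)wglGetProcAddress("glBindBufferARB");
PFNGLBUFFERDATAARBPROC glBufferDataARB =
    (PFNGLBUFFERDATAARBPROC)wglGetProcAddress("glBufferDataARB");

// If the entry points resolve, the driver exposes VBO even if the
// extension string doesn't advertise it yet.
bool hasVBO = (glGenBuffersARB && glBindBufferARB && glBufferDataARB);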

Thanks -
Cass

davepermen
03-17-2003, 11:22 AM
great news, thanks for the info, cass

kehziah
03-17-2003, 11:31 AM
Just great!

Zeno
03-17-2003, 11:36 AM
Very cool. This is one of the last things I needed to be able to remove all vendor-specific extensions from my programs :)

-- Zeno

[This message has been edited by Zeno (edited 03-17-2003).]

tellaman
03-17-2003, 11:38 AM
sooooo sweeeet!!!

davepermen
03-17-2003, 11:45 AM
Originally posted by Zeno:
Very cool. This is one of the last things I needed to be able to remove all vendor-specific extensions from my programs :)

-- Zeno

[This message has been edited by Zeno (edited 03-17-2003).]

Next thing is render-to-texture without the wgl stuff, and I'm happy :D

Julien Cayzac
03-17-2003, 12:00 PM
Originally posted by cass:

It's in NVIDIA beta drivers now.


Is it in 43.03 drivers?
If so, when do we get a linux upgrade? :D

Good news, indeed!

Julien.

NitroGL
03-17-2003, 12:04 PM
Originally posted by davepermen:
Next thing is render-to-texture without the wgl stuff, and I'm happy :D

I actually would have preferred a new render-to-texture extension over this one, but I guess it doesn't matter THAT much. :)

kehziah
03-17-2003, 12:05 PM
After reading the Issues section, one can truly see the amount of work undertaken by the working group.
Once again, great job!
Can't wait to grab drivers implementing this ;-)

cass
03-17-2003, 12:29 PM
For those of you that are eager to look at Kurt's slides, I have put them at:
http://www.r3.nu/~cass/gdc_slides/GDC2003_OGL_BufferObjects.ppt

I'll leave them up there, but the OpenGL.org webmaster will be putting them up on this site soon too.

Thanks -
Cass

KRONOS
03-17-2003, 01:34 PM
YEAH baby!!!

And it looks good too... Where are those drivers, cass?!

:)

Coriolis
03-17-2003, 01:41 PM
Do those slides contain any useful info that is not in the spec itself? And if so, is there a PDF version?

evanGLizr
03-17-2003, 01:42 PM
Do buffer objects survive screen resolution changes, etc.?

RESOLVED: YES. This is not mentioned in the spec, so by default they behave just like other OpenGL state, like texture objects -- the data is unmodified by external events like modeswitches, switching the system into standby or hibernate mode, etc.


Have you talked to MS about this? AFAIK OpenGL cannot survive modechanges in XP because of XP's design (the OS invalidates the WNDOBJ handler when not in the same resolution the WNDOBJ was created in), and MS acknowledges that shortcoming.

No sensible app should keep an OpenGL context alive across a modechange (unfortunately, there are some non sensible apps floating around). That statement in the issues part of the spec encourages faulty programming.

Care to comment?

[This message has been edited by evanGLizr (edited 03-17-2003).]

tellaman
03-17-2003, 02:05 PM
Originally posted by KRONOS:
YEAH baby!!!

And it looks good too... Where are those drivers, cass?!

:)

yes and where to get headers?
thanks!

jra101
03-17-2003, 02:13 PM
Originally posted by tellaman:
where to get headers?

http://cvs1.nvidia.com/inc/GL/glext.h

Tom Nuydens
03-17-2003, 02:23 PM
Sweet! Working with offsets instead of direct pointers is a bit awkward at first, but other than that it works like a charm.

My GF4 Ti 4200 seems to top out at around 33 MTris/sec (using STATIC_DRAW, tristrips of about 1000 tris each, 1 million triangles total, index arrays also in a buffer object). Does that sound about right? If so, it seems a bit slower than VAR (SphereMark can reach 43 MTris/sec on my machine).
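
For anyone who hasn't looked yet: the spec's example code suggests a little macro for turning a byte offset into the pointer argument, something like this (vboID is just a placeholder):

// Offsets go where a pointer used to go.
#define BUFFER_OFFSET(i) ((char *)NULL + (i))

glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboID);
glVertexPointer(3, GL_FLOAT, 0, BUFFER_OFFSET(0));   // offset 0 into the bound buffer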

-- Tom

mproso
03-17-2003, 03:42 PM
What about the NV10 and NV20 MAX_VERTEX_ARRAY_RANGE_ELEMENT_NV limits of 65535 and 1048575?
Can we render objects that have more indices than those limits with GL_ARB_vertex_buffer_object? I mean, can we render them fast enough without splitting models into more parts? There are no implementation details or restrictions of this kind in the VBO spec.

Thanks!

cass
03-17-2003, 05:04 PM
Tom,

Early implementations of VBO were just for API correctness (so that apps could begin porting). In current internal builds, VBO is as fast as VAR.

mproso,

Many of the strange aspects of VAR have been hidden by VBO. It doesn't mean that there aren't underlying hw restrictions, but the driver has more flexibility to work around those restrictions in ways that do not hinder performance.

evanGLizr,

Contexts survive mode switches and power management events. The things that may not survive are mapped buffer objects.


Coriolis,

The slides provide a higher-level introduction. They don't contain more information than is in the spec.

Thanks -
Cass

mcraighead
03-17-2003, 05:04 PM
Originally posted by evanGLizr:
AFAIK OpenGL cannot survive modechanges in XP because of XP's design (the OS invalidates the WNDOBJ handler when not in the same resolution the WNDOBJ was created in), and MS acknowledges that shortcoming.

I don't believe this is accurate. Our drivers have been able to recover from modeswitches (including color depth switches) for years now. That includes both Win9x and NT in all its variants -- NT4, Win2K, WinXP. We consider it a solved problem.

The real issue has more to do with discarding the vidmem heap. However, this is not an insoluble problem.

- Matt

JD
03-17-2003, 05:07 PM
This extension was sorely needed. Kudos to all the IHVs who worked on it.

Mazy
03-17-2003, 05:15 PM
I tried it out with a GF3 and drivers 43.00. I got it to not generate errors, but I get SLOWER results with it than with normal vertex arrays?? I have vertex, normal and index data in buffer objects, just color and one light..
And STATIC_DRAW once; I only rebind the arrays each frame, not the buffers (yeah I know, an implementation mistake by me in my engine).

Can I query the buffers somehow to validate them?
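
Something like this maybe, going by the queries in the spec (bufferID is just my name for it):

if (glIsBufferARB(bufferID))
{
    GLint size = 0, mapped = 0;
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, bufferID);
    // size of the data store and whether it is currently mapped
    glGetBufferParameterivARB(GL_ARRAY_BUFFER_ARB, GL_BUFFER_SIZE_ARB, &size);
    glGetBufferParameterivARB(GL_ARRAY_BUFFER_ARB, GL_BUFFER_MAPPED_ARB, &mapped);
}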

JD
03-17-2003, 07:20 PM
You know what would be nice? An HTML file with links pointing to split extension specification text files. As it is now, the file is too long and I keep scrolling up and down all the time to read it. The Issues section should be categorized. Having an overview section and a function reference section would be nice as well. Doesn't anyone ever read the DirectX docs? Hyperlinking is so cool :)

Matt, cass, how does streaming differ from mapping? Am I correct in thinking that d3d9 dynamic buffers = mapped vertex buffer objects? Trying to make sense out of this that's all.

V-man
03-17-2003, 10:01 PM
Originally posted by mcraighead:
I don't believe this is accurate. Our drivers have been able to recover from modeswitches (including color depth switches) for years now. That includes both Win9x and NT in all its variants -- NT4, Win2K, WinXP. We consider it a solved problem.

The real issue has more to do with discarding the vidmem heap. However, this is not an insoluble problem.

- Matt

You have funny drivers: when creating the GL surface, if I pass HWND=NULL, it succeeds in creating the GL window every time and works without problems.

The "GDI Generic" Microsoft implementation and Mesa fail.

Well, that wasn't intentional. I didn't initialize my HWND, so I discovered it accidentally.

zeckensack
03-17-2003, 11:50 PM
Isn't NULL supposed to be the 'window' handle to the desktop surface? That would make it valid ... to some extent.

Cab
03-18-2003, 12:41 AM
This is really good news. I can't wait to remove those VAR and VAO paths.

Have you noticed that the slides of the other talks in this year's GDC OpenGL tutorial are also there (some of them are not on the NVIDIA web site): http://www.r3.nu/~cass/gdc_slides
I was there and I recommend all of them, especially GDC2003_OGL_Performance and GDC2003_OGL_ARBSuperbuffers.

Tom Nuydens
03-18-2003, 01:26 AM
Originally posted by Mazy:
I tried it out with a GF3 and drivers 43.00. I got it to not generate errors, but I get SLOWER results with it than with normal vertex arrays??

Try 43.30. I have no problem matching VAR performance on my GF3.

-- Tom

pkaler
03-18-2003, 01:57 AM
Originally posted by Cab:

Have you noticed that the slides of the other talks in this year's GDC OpenGL tutorial are also there (some of them are not on the NVIDIA web site): http://www.r3.nu/~cass/gdc_slides
I was there and I recommend all of them, especially GDC2003_OGL_Performance and GDC2003_OGL_ARBSuperbuffers.



No need to spend all of Cass' bandwidth. They're all located here as well.
http://www.opengl.org/developers/code/tutorials.html#gdc2003

Very cool stuff!!! An update to the Performance FAQ was sorely needed.

V-man
03-18-2003, 07:30 AM
Originally posted by zeckensack:
Isn't NULL supposed to be the 'window' handle to the desktop surface? That would make it valid ... to some extent.

Yes, only the first time. Then it should be failing.

Maybe this "feature" will help people who want to render to the desktop.

ToolTech
03-18-2003, 08:41 AM
When will we see some FGL 9700 drivers supporting this ??

MZ
03-18-2003, 09:55 AM
Originally posted by PK:
Very cool stuff!!!
nVidia and ATI logos side by side in all presentations!!! The world will never be the same again... :)

ehart
03-18-2003, 09:57 AM
The ARB VBO support is in ATI drivers right now. When it appears in shipping drivers just depends on when the qualification cycles for different driver kits complete.

Depending on how things line up, I would say that VBO will be either in the next released driver or the one after that. I would expect those with access to developer drivers will see it extremely soon if not already.

-Evan

ToolTech
03-18-2003, 10:08 AM
How do you get access to ATI dev drivers ? I represent a company in Sweden working with a scene graph and I do need access to that extension.

Humus
03-18-2003, 01:47 PM
You need to register as a developer. There's a form here:
http://apps.ati.com/developers/devform1.asp

sanjo
03-18-2003, 01:58 PM
Does anyone know how to fix the Detonator 43.30 resolution and refresh rate issue?

With 43.00 I could get nearly any resolution and refresh rates above 100 Hz.

Now with 43.30 the resolutions are restricted to just a few, and I get a max 85 Hz refresh rate.

thanks

SirKnight
03-18-2003, 03:11 PM
Just for the heck of it, here is a link to these latest beta NVIDIA drivers in case someone doesn't know where to get them:
http://www.3dchipset.com/drivers/beta/nvidia/nt5/index.php

I actually found these pretty easily on Google, but I'll be nice and put 'em here. :D

-SirKnight

JONSKI
03-18-2003, 03:30 PM
Originally posted by tellaman:
yes and where to get headers?
thanks!

Originally posted by jra101:
http://cvs1.nvidia.com/inc/GL/glext.h

What kind of power curve can I expect with these new headers? Can you post a dyno sheet?

[This message has been edited by JONSKI (edited 03-18-2003).]

jra101
03-18-2003, 04:02 PM
Originally posted by JONSKI:
What kind of power curve can I expect with these new headers? Can you post a dyno sheet?

Power curve? Dyno sheet?

Korval
03-18-2003, 04:56 PM
For those of you that are eager to look at Kurt's slides, I have put them at:

Cass, what's wrong with you? You just got done with ARB_VBO, which means we can finally be satisfied. But no, you go and bring that up to tempt us more with even greater power, flexibility, and, dare I say, easy and trivial Render-to-texture functionality.

What's the ETA on the full ARB_"Uber_Buffer" extension (or extensions)?

jwatte
03-18-2003, 07:12 PM
No sensible app should keep an OpenGL context alive across a modechange


As soon as you figure out a way to transfer an embedded Internet Explorer child window (with all its state/context) across to a newly created window on mode change, you might be able to claim this :-)

However, as that isn't really an option right now (and there seems to be no immediate plan to make it so), there are certainly legitimate reasons to re-use main windows after mode switches.

SirKnight
03-18-2003, 07:46 PM
Originally posted by JONSKI:
What kind of power curve can I expect with these new headers? Can you post a dyno sheet?


Say wha? This is a C/C++ header file not a car engine.

-SirKnight

NitroGL
03-18-2003, 10:23 PM
Originally posted by SirKnight:
Say wha? This is a C/C++ header file not a car engine.

-SirKnight

Bad joke, have a sense of humor. :)

MichaelK
03-19-2003, 03:36 AM
Originally posted by jra101:
Power curve? Dyno sheet?

He's talking 'bout dino sh*t. Didn't you know?

KRONOS
03-19-2003, 05:27 AM
I think I came across a bug in the drivers! I think... ;)

My render procedure is something like this:




// create the buffer objects
for (i = 0; i < Modelo->numMeshes; i++)
{
    glGenBuffersARB(1, &Modelo->Meshes[i].bufferID);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, Modelo->Meshes[i].bufferID);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(MD5Vertex)*Modelo->Meshes[i].numVert, NULL, GL_DYNAMIC_DRAW_ARB);

    buffer = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    memcpy(buffer, Modelo->Meshes[i].geomCopy, sizeof(MD5Vertex)*Modelo->Meshes[i].numVert);
    glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
}

// do some other stuff
// draw to depth
for (i = 0; i < Modelo->numMeshes; i++)
{
    MD5Mesh& mesh = Modelo->Meshes[i];
    if (mesh.materialShader->transparent)
        continue;
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, mesh.bufferID);
    glVertexPointer(3, GL_FLOAT, sizeof(MD5Vertex), calcOffset(mesh.geomRender, &mesh.geomRender->pos));
    glDrawRangeElements(GL_TRIANGLES, 0, mesh.numVert, mesh.numTri*3, GL_UNSIGNED_INT, mesh.Indices);
}
// render some more

// do the animation
MD5SetFrame(Modelo, frame);
for (i = 0; i < Modelo->numMeshes; i++)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, Modelo->Meshes[i].bufferID);
    buffer = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    memcpy(buffer, Modelo->Meshes[i].geomCopy, sizeof(MD5Vertex)*Modelo->Meshes[i].numVert);
    glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
}


glGenBuffersARB returns a valid ID, so does glMapBufferARB, and so on...
But this creates some artifacts. Unless I put a glFinish after MD5SetFrame and before mapping the buffers, it doesn't work.
I believe the driver should do all the synchronization work, right? So, where am I doing things wrong, or is it a driver bug being worked out?
The drivers are beta.

bakery2k
03-19-2003, 05:30 AM
Originally posted by KRONOS:
I think I came across a bug in the drivers!

What drivers are you using?

sanjo
03-19-2003, 05:41 AM
Originally posted by sanjo:
Does anyone know how to fix the Detonator 43.30 resolution and refresh rate issue?

With 43.00 I could get nearly any resolution and refresh rates above 100 Hz.

Now with 43.30 the resolutions are restricted to just a few, and I get a max 85 Hz refresh rate.

thanks

Sorry to bother you again,

but I would really appreciate any answer to this problem.
Or simply: is it even possible to change the settings at all?

thanks

Lars
03-19-2003, 06:37 AM
Originally posted by JD:
You know what would be nice? An HTML file with links pointing to split extension specification text files.

I once made a somewhat more readable version of the registry by generating a CHM of it. I just updated the file, and you can download it at: http://userpage.fu-berlin.de/~larswo/OpenGL-Extensions.chm

It is not perfect; some extensions came out a bit badly formatted. Also, it does not include cross references yet (useful for the dependency section or some function references), but maybe I'll do that sometime.

Lars

KRONOS
03-19-2003, 07:06 AM
Sorry, forgot to mention that! :P I'm using the Detonator 43.30...

zeckensack
03-19-2003, 08:21 AM
KRONOS, aren't you supposed to check the return value of glUnmapBufferARB?

Maybe you should look into that first.
Skipping the maps/memcpys/unmaps and just using glBufferDataARB (with non-NULL parameter) might be worth a shot too.
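
Roughly what I mean (untested; the buffer/size/data names are just placeholders for yours):

// Respecify the whole store in one call instead of map/memcpy/unmap:
glBindBufferARB(GL_ARRAY_BUFFER_ARB, bufferID);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, dataSize, data, GL_DYNAMIC_DRAW_ARB);

// Or, if you keep the mapping path, at least handle a failed unmap --
// the spec says the contents may have become undefined (modeswitch etc):
void *ptr = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
memcpy(ptr, data, dataSize);
if (!glUnmapBufferARB(GL_ARRAY_BUFFER_ARB))
{
    // data store was corrupted while mapped; respecify it
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, dataSize, data, GL_DYNAMIC_DRAW_ARB);
}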

SirKnight
03-19-2003, 09:05 AM
Originally posted by NitroGL:
Bad joke, have a sense of humor. :)

I'm sorry you thought I was making a joke, but I was not. Power curves and dyno sheets have to do with the performance of engines and the like, be it a car engine, a motorcycle, or whatever kind of engine, even wind turbines. I guess I can see how you thought I was making a joke there, but trust me, if I were it WOULD be funnier than that. :D

See here is an example of a dyno sheet: http://www.suprastore.com/800rwhpclub.html

How you can make one for a header file I don't know.

Here it talks about power curve a little for a dirtbike: http://www.onoffroad.com/dubach.html

:D

DISCLAIMER: I am not trying to be funny with all of this. I just don't see what a power curve and a dyno sheet have to do with a header file.

Maybe he meant that he wanted a graph comparing the performance of VAR/VAO with VBO?

-SirKnight


[This message has been edited by SirKnight (edited 03-19-2003).]

Zeno
03-19-2003, 09:37 AM
How you can make one for a header file I don't know.

I can't believe there is so little overlap between computer people and car people. The joke was referring to the new "headers" that go with this extension, i.e. glext.h. On a car, headers are the tubes that the exhaust goes through when it exits the engine. Replacing stock headers with more free-flowing ones can increase engine horsepower by reducing back pressure. Get it? Ha ha :)

-- Zeno

JONSKI
03-19-2003, 11:51 AM
Actually, I don't have to worry about headers. I drive a turbo rotary, so all I have to worry about are those nVidia detonators. I've seen many a turbine and rotor housing suffer from its ill effects.


Oh yeah,
The RX-7 club knows about you.

Thanks to me, they also know about the new VBO.

SirKnight
03-19-2003, 02:21 PM
Haha. So when can we expect new RX-7's with these new VBO headers? I bet they will beat everything else on the road. ;)

-SirKnight

KRONOS
03-19-2003, 03:32 PM
KRONOS, aren't you supposed to check the return value of glUnmapBufferARB?
Maybe you should look into that first.
Skipping the maps/memcpys/unmaps and just using glBufferDataARB (with non-NULL parameter) might be worth a shot too.

I do check it, and it returns TRUE. I just didn't put it in there, to keep the example simpler. And I do not want to upload the data just once. I want to change it every frame.
I create the buffers once, then enter the rendering loop where I draw and then change the data.

bashbaug
03-19-2003, 05:08 PM
Originally posted by KRONOS:
I want to change it every frame.
I create the buffers once, then enter the rendering loop where I draw and then change the data.

I can't think of any reason why calling BufferData would be any worse off than doing the map / unmap for each update. In fact, I can think of a lot of reasons why it might be better.

As said in Kurt's presentation, there are very few times you'll need to use map / unmap. If you're completely respecifying the contents of a buffer you're better off with BufferData.

-- Ben

[This message has been edited by bashbaug (edited 03-19-2003).]

Korval
03-19-2003, 05:47 PM
Why is it that you can only render from one vertex buffer? Why is the spec written to be so limiting in terms of swapping components (wanting to render a model using the same x,y values but a completely different set of z values, mainly for heightmaps)?

bashbaug
03-19-2003, 08:40 PM
Originally posted by Korval:
Why is it that you can only render from one vertex buffer? Why is the spec written to be so limiting in terms of swapping components (wanting to render a model using the same x,y values but a completely different set of z values, mainly for heightmaps)?

ARB_vbo isn't any more or less restrictive than conventional OpenGL vertex arrays. That is, you can't specify different components of a vertex attribute from different arrays, but you certainly can specify different vertex attributes in different buffer objects. This would be useful to draw the same model with different colors or texture coordinates or whatever. It's admittedly a stupid example, but consider:




BindBufferARB(ARRAY_BUFFER_ARB, vertexBufID);
VertexPointer(...);

BindBufferARB(ARRAY_BUFFER_ARB, colorBufID1);
ColorPointer(...);

// draw the model -
// vertex data comes from vertexBufID, color data from colorBufID1
DrawElements(...); // draw the model

BindBufferARB(ARRAY_BUFFER_ARB, colorBufID2);
ColorPointer(...);

// draw the model again -
// vertex data still comes from vertexBufID, color data from colorBufID2
DrawElements(...);


-- Ben

Cab
03-20-2003, 07:54 AM
Is VBO going to be supported on Radeon 7500, 8500?

Thanks

NitroGL
03-20-2003, 08:10 AM
I don't see why it wouldn't be supported on those... It's almost the same as ATI's VAO extension.

pkaler
03-20-2003, 08:13 AM
Originally posted by Cab:
Is VBO going to be supported on Radeon 7500, 8500?


I'd expect all cards that support VAO and VAR will support ARB_vbo.
http://www.delphi3d.net/hardware/extsupport.php?extension=GL_ATI_vertex_array_object
http://www.delphi3d.net/hardware/extsupport.php?extension=GL_NV_vertex_array_range

Cab
03-20-2003, 08:20 AM
Originally posted by NitroGL:
I don't see why it wouldn't be supported on those... It's almost the same as ATI's VAO extension.

Maybe because mapping a buffer was not available via the ATI extension. There was another ATI extension (ATI_map_object_buffer) for doing it, but I don't know if it was available in Radeon 7500.

Thanks.

NitroGL
03-20-2003, 09:06 AM
@Cab - This shows that all (I think) Radeons support that extension: http://www.delphi3d.net/hardware/extsupport.php?extension=GL_ATI_map_object_buffer

And for those interested, I've made a simple (fairly simple) demo of the new extension: http://www.area3d.net/file.php?filename=nitrogl/ARBvbo.zip
The important parts are commented. I'm pretty sure it's all done correctly, though on my 9700 it doesn't work quite right (the unmap fails), but I think that's just a driver thing.

DarkWIng
03-20-2003, 09:16 AM
How expensive is a VBO bind, actually? Especially in cases where you have each array in a separate VBO (one for vertices, one for normals, one for colors, ...).
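
Versus putting everything interleaved into a single buffer and binding it once, something like this (the struct layout is just for illustration):

// struct Vertex { float pos[3]; float normal[3]; unsigned char color[4]; };
glBindBufferARB(GL_ARRAY_BUFFER_ARB, interleavedVBO);
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(0));
glNormalPointer(GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(12));
glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(Vertex), BUFFER_OFFSET(24));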

jwatte
03-20-2003, 10:08 AM
I can't think of any reason why calling BufferData would be any worse off than doing the map / unmap for each update. In fact, I can think of a lot of reasons why it might be better.


Suppose I still do skinning in software. Then I can either:

1) Skin into cached memory (which will thus cut my available L1 cache size in half, which REALLY hurts on the Pentium IV). Then use BufferData to copy the data into the buffer, which means a second copypass.

2) Map the buffer, and skin directly into the buffer, which presumably lives in un-cached memory. This avoids an extra copy pass, AND it gives me more L1 cache for my bone matrices.

On a Pentium IV and a reasonable-size skeleton, the size of the bone matrices really starts hurting if you're going cached-to-cached, as there's only 8 kB of L1 cache, and (rule of thumb) half of that disappears if you're writing to cached memory (maybe only a quarter disappears if you write with MOVNAPS because the cache is 4-way (IIRC), but I wouldn't bet on it).

Anyway, it seems to me that mapping is The Right Thing To Do for any streaming data which you rewrite every frame, and completely rewrite as part of generating the data.
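
In code, option 2 looks roughly like this (SkinnedVertex / SkinVertices and friends are just stand-ins for whatever your engine uses):

// Allocate (or orphan) the store, then skin straight into the mapped
// pointer -- write-only, sequential writes, no intermediate cached copy.
glBindBufferARB(GL_ARRAY_BUFFER_ARB, skinVBO);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, numVerts * sizeof(SkinnedVertex),
                NULL, GL_STREAM_DRAW_ARB);
SkinnedVertex *dst = (SkinnedVertex *)
    glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
SkinVertices(bones, srcVerts, dst, numVerts);
glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);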

cass
03-20-2003, 10:09 AM
Originally posted by Korval:

What's the ETA on the full ARB_"Uber_Buffer" extension (or extensions)?

Korval,

The superbuffers group is working hard to get a finalized spec. As we near completion of the spec, we will strive to get it into public drivers.

I'm really happy about these two extensions. They will fix a lot of outstanding problems with writing portable OpenGL.

Thanks -
Cass

zeckensack
03-20-2003, 10:19 AM
Originally posted by jwatte:
Suppose I still do skinning in software. Then I can either:

1) Skin into cached memory (which will thus cut my available L1 cache size in half, which REALLY hurts on the Pentium IV). Then use BufferData to copy the data into the buffer, which means a second copypass.

2) Map the buffer, and skin directly into the buffer, which presumably lives in un-cached memory. This avoids an extra copy pass, AND it gives me more L1 cache for my bone matrices.

On a Pentium IV and a reasonable-size skeleton, the size of the bone matrices really starts hurting if you're going cached-to-cached, as there's only 8 kB of L1 cache, and (rule of thumb) half of that disappears if you're writing to cached memory (maybe only a quarter disappears if you write with MOVNAPS because the cache is 4-way (IIRC), but I wouldn't bet on it).

Anyway, it seems to me that mapping is The Right Thing To Do for any streaming data which you rewrite every frame, and completely rewrite as part of generating the data.
Your point is very valid, but ... doesn't the P4 bypass its L1 D-Cache for FP-Data?

Korval
03-20-2003, 10:34 AM
ARB_vbo isn't any more or less restrictive than conventional OpenGL vertex arrays. That is, you can't specify different components of a vertex attribute from different arrays, but you certainly can specify different vertex attributes in different buffer objects. This would be useful to draw the same model with different colors or texture coordinates or whatever. It's admittedly a stupid example, but consider:

The spec, and the examples, made it seem like calling a second glBindBufferARB is not a possibility once you've attached a particular attribute array to a buffer. On page 22 of the PowerPoint presentation, it says that glBindBuffer "must precede pointer calls". That seems to rule out your code. Also, the PowerPoint slides seem to indicate that each gl*Pointer call is bound to the same buffer.

[Edit]

On the other hand, I consulted the actual spec, and it says, "It is acceptable for vertex, variant, or attrib arrays to be sourced from any combination of client memory and various buffer objects during a single rendering operation."

So, I guess you're right. Good; I can still do what I wanted to.
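
For instance (if I'm reading it right), mixing a buffer object with a plain client-memory array in one draw call would look something like this (names are made up):

// heights/positions come from a buffer object...
glBindBufferARB(GL_ARRAY_BUFFER_ARB, heightVBO);
glVertexPointer(3, GL_FLOAT, 0, BUFFER_OFFSET(0));

// ...colors come straight from client memory for the same draw
glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);   // 0 = back to plain pointers
glColorPointer(3, GL_FLOAT, 0, clientColors);

glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_INT, indices);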


The superbuffers group is working hard to get a finalized spec. As we near completion of the spec, we will strive to get it into public drivers.

Excellent.

[This message has been edited by Korval (edited 03-20-2003).]

bashbaug
03-20-2003, 10:49 AM
Originally posted by jwatte:
Anyway, it seems to me that mapping is The Right Thing To Do for any streaming data which you rewrite every frame, and completely rewrite as part of generating the data.

You're right. I was thinking of KRONOS's example where he already has his data in cacheable memory, and is simply doing a memcpy into the buffer.

I probably should have said, "If you're completely respecifying the contents of a buffer object and you're not streaming data, you're better off with BufferData."

Nice catch.

-- Ben

KRONOS
03-21-2003, 09:09 AM
I probably should have said, "If you're completely respecifying the contents of a buffer object and you're not streaming data, you're better off with BufferData."


I came to that conclusion shortly after I used the 43.03 drivers. They expose the extension and the issue I had is gone. Maybe the 43.30 drivers don't synchronize...

I haven't benchmarked this, but I don't know what to do: mapping or using BufferSubData. Both work and I can't see a difference. But I guess BufferSubData should be faster, since the driver takes care of the access and it is the only one that knows where the memory truly is...
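
For reference, the BufferSubData path per mesh is just something like this (using the names from my code above):

glBindBufferARB(GL_ARRAY_BUFFER_ARB, mesh.bufferID);
// overwrite the whole store in place: offset 0, full size, no mapping involved
glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0, sizeof(MD5Vertex)*mesh.numVert, mesh.geomCopy);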

Robbo
03-21-2003, 11:43 AM
Originally posted by evanGLizr:
Have you talked to MS about this? AFAIK OpenGL cannot survive modechanges in XP because of XP's design (the OS invalidates the WNDOBJ handler when not in the same resolution the WNDOBJ was created in), and MS acknowledges that shortcoming.

No sensible app should keep an OpenGL context alive across a modechange (unfortunately, there are some non sensible apps floating around). That statement in the issues part of the spec encourages faulty programming.


Hey, thanks for the info! I'm going to add that one into our "bug" tracker at work. I didn't know that! Although our users are very unlikely to change screen mode while the program is running, it is a possible problem they might run into.

Austrian Coder
03-21-2003, 11:46 AM
So as far as I can see, this extension is supported by every NVIDIA and ATI card with the newest drivers, which will be released in the near future.

Should there be a fallback if this superb extension is not supported? Then maybe it is more work.

Humus
03-21-2003, 01:41 PM
A fallback would probably be wise to have, but not much code is needed to support both VBO and standard system memory vertex arrays.
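
Something along these lines is basically all it takes (hasVBO / vboID / sysmemVertices are just illustrative names):

// Same drawing code either way; only the last gl*Pointer argument changes.
if (hasVBO)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboID);
    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(0));
}
else
{
    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), sysmemVertices);
}
glEnableClientState(GL_VERTEX_ARRAY);
glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, indices);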

jwatte
03-21-2003, 06:19 PM
> doesn't the P4 bypass its L1 D-Cache for FP-Data?


That's the first I heard of that. That would be very bad, as it would make latencies for reading the bone matrices very high.

Are you sure you're not thinking of the MOVNTPS SSE instruction, which allows you to manually bypass the cache write?

If this is a special mode in the P4, do you have a reference I could go look-see at?

zeckensack
03-22-2003, 02:47 AM
No hard references, sorry. That's only second-hand info I picked up on the forums, but it should be officially documented somewhere. The P4 supposedly ignores L1 for FP data and instead falls back to its L2 cache. Which is not all that bad; we're definitely not talking about uncached memory access here.

I'm not sure at the moment whether this applies to x87 only, SSE2 only, or both. I don't have a P4, so I can't test it myself. But I'm pretty certain that it's true for at least one of these two.
The basic idea is that
1)FP data often comes in huge batches - a potential cache thrashing hazard
2)typical FP code doesn't suffer as much from increased latency, as long as there's enough bandwidth.
L2 would then be the natural choice, seeing how scarce L1 cache is on the P4.

Please take all of this with the mandatory grain of salt until someone with first hand knowledge clarifies.

fritzlang
03-22-2003, 10:30 AM
Originally posted by cass:

Early implementations of VBO were just for API correctness (so that apps could begin porting). In current internal builds, VBO is as fast as VAR.


When can we expect to see that driver?
I love this extension (thanks :) ), but it is very slow as it stands now.
And will I be guaranteed to get the fastest possible video memory if there is sufficient on board?

From the spec:
"- Applications may still access high-performance memory, but this is optional, and such access is more restricted."

I did a simple test: untextured 2D patch, 257 x 257 vertices as 2D floats, unsigned short indices, drawing tristrips.
VAR = 128 fps.
Std GL = 60 fps.
VBO with indices in RAM = 60 fps.
VBO with indices in a buffer = 60 fps.

Cheers.

[Edit]
Shouldn't VBO then, with optimal drivers, potentially be faster than VAR for static geometry, since the indices can be in video RAM?
In this example, VBO > 128 fps?

[This message has been edited by fritzlang (edited 03-22-2003).]

[This message has been edited by fritzlang (edited 03-22-2003).]

JD
03-22-2003, 11:48 AM
Lars, I saw your docs and it gave me an idea. Download my .chm docs and tell me what you think. It's not complete, just a format to give an idea of what I had in mind. I'm not sure where I will go with this because it looks like a lot of work. This format is useful for vendor specific extensions. I feel sgi should create a new doc format allowing IHVs to plug into it. The user would then download docs from sgi and have it all in one place.

Download from http://forged3d.tripod.com

It's at the bottom of the pic on the main page. Btw, that's my editor, done in D3D9; just recently I thought of moving to GL for flexibility purposes. I'm still undecided though. Take care.

pkaler
03-22-2003, 12:23 PM
Originally posted by JD:
Download my .chm docs and tell me what you think.

Any way to get that in a format that is readable where IE is not available?



This format is useful for vendor specific extensions. I feel sgi should create a new doc format allowing IHVs to plug into it. The user would then download docs from sgi and have it all in one place.


How about DocBook, or LaTeX, or straight-up HTML with some CSS?

I have Perl on my "to learn" list. Maybe I'll put together a script to parse the txt files and spit them out as LaTeX.

[This message has been edited by PK (edited 03-22-2003).]

V-man
03-22-2003, 12:29 PM
Originally posted by zeckensack:
The basic idea is that
1)FP data often comes in huge batches - a potential cache thrashing hazard
2)typical FP code doesn't suffer as much from increased latency, as long as there's enough bandwidth.
L2 would then be the natural choice, seeing how scarce L1 cache is on the P4.

Or maybe Intel made a bad choice when reducing the L1 data cache size to 8 KB, and they noticed that bypassing the L1 for FP often improved performance.

Kind of stupid considering how much performance improved with the 32K cache on the Pentium MMX.

Intel seems to be bent on clock rate, thinking GHz sells chips. These guys are going backwards. Remember the cacheless Celeron? What a joke.
And what about RAMBUS? Gimme a break.

DarkWIng
03-22-2003, 12:31 PM
Will newer builds also reduce the cost of ARB_VBO binding? I'm using 43.30, and reducing binding calls increased my framerate by 15%. I still can't get it to work faster than 70% of the speed of VAR.

JD
03-22-2003, 02:41 PM
PK, yes, a better, perhaps custom, doc tool is needed. That's why I think sgi/IHVs should head it. The nice thing about Microsoft's compressed HTML format is that it offers search, indexing and bookmarking abilities, which I find nifty. Though MS has abandoned HTML Help for its next XML format in the Longhorn OS, there are tools out there that can convert .chm into PDF but lose some functionality in the conversion. The .chm would be a temporary bandage; the help files are written in HTML so they are not hard to port to some other doc formats/tools.

SirKnight
03-22-2003, 03:35 PM
JD, I really like the format of your docs. If all extensions were set up in a format like this it would be much easier to get to the particular info I want to look at right now, instead of scrolling through a huge document of pure text. Plus the bolding, coloring and all that is a nice touch. Maybe this could be a project we all contribute to, to have all the extensions in there, because it would take a bit of work to get them all in. I'd be willing to help though.

-SirKnight

fritzlang
03-22-2003, 03:47 PM
Originally posted by DarkWIng:
Will newer builds also reduce the cost of ARB_VBO binding? I'm using 43.30, and reducing binding calls increased my framerate by 15%. I still can't get it to work faster than 70% of the speed of VAR.

I would like to hear how fast your VBO is compared to regular non-extended vertex arrays. As I mentioned, I get exactly the same result. But if you get a significantly faster result, you must be getting AGP or VRAM memory.

Thanks.

Korval
03-22-2003, 11:33 PM
And will I be guaranteed to get the fastest possible video memory if there is sufficient on board?

I imagine that either nVidia's VBO uses (or will use, when properly optimized) VAR internally, or it will use its own direct access. If it is the latter, understand that most high-performance GL development will shift to the accepted VBO. As such, nVidia will have no choice but to make VBO as fast as possible.

As for the debate on .chms... I have yet to see a better format for on-line programming documentation (which this qualifies as) than a good .chm file. If .chms can't be used on non-Windows systems, maybe somebody ought to write a .chm viewer for them. In fact, I thought that there already was a .chm viewer for Linux.

.pdf's are good for printing, not for the reading/searching/etc. that an on-line document needs.

[This message has been edited by Korval (edited 03-23-2003).]

DarkWIng
03-23-2003, 12:10 AM
fritzlang: VBO is much faster than plain VA. I can't give you exact numbers but I would say about 2x.

fritzlang
03-23-2003, 01:55 AM
Thanks DarkWIng,
I must be doing something wrong. VAR is fast as always (wglAllocateMem(..., 0, 0, 1)), but I cannot get VBO to improve at all over std GL arrays.

This code uses buffered vertices and unbuffered indices, like I do it with VAR.

// Init
glBindBufferARB(GL_ARRAY_BUFFER_ARB, 1);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, m_uiNumVertices * 2 * sizeof(float), m_pfVertices, GL_STATIC_DRAW_ARB);

// Draw
glBindBufferARB(GL_ARRAY_BUFFER_ARB, 1);
glVertexPointer(2, GL_FLOAT, 0, BUFFER_OFFSET(0));
glEnableClientState(GL_VERTEX_ARRAY);
glDrawElements(GL_TRIANGLE_STRIP, m_uiNumIndices, GL_UNSIGNED_SHORT, m_pusIndices);
glDisableClientState(GL_VERTEX_ARRAY);

Thanks.



[This message has been edited by fritzlang (edited 03-23-2003).]

Humus
03-24-2003, 02:01 PM
I can't see any direct problems with your code, but I would recommend storing the indices in an ELEMENT_ARRAY buffer too. It will help performance.

glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, vboIndexBuffer);
glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, nIndices * sizeof(short), indices, GL_STATIC_DRAW_ARB);

// draw
glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, vboIndexBuffer);
glDrawElements(GL_TRIANGLE_STRIP, nIndices, GL_UNSIGNED_SHORT, BUFFER_OFFSET(0));

KRONOS
03-25-2003, 03:39 PM
Is it OK to create about 200 buffers, in terms of performance? I can't think of any reason why it should be slower than just a few buffers. Or is binding very expensive?

Korval
03-25-2003, 05:25 PM
Originally posted by KRONOS:
Is it OK to create about 200 buffers, in terms of performance? I can't think of any reason why it should be slower than just a few buffers. Or is binding very expensive?

Considering that binding could do anything from moving a pointer to uploading that memory from system RAM to the video card... I would consider binding a buffer to be approximately as painful as binding a texture. In short; the fewer the better.

Pop N Fresh
03-26-2003, 12:54 AM
I think I've encountered a driver bug changing my code over to VBO from VAR. The following code repeatedly draws the vertices set with the first calls to gl*Pointer within the loop; the offsets do not update correctly. If I change the if( b > 0 ) test to something like if( b == 45 ) it will render that block of vertices correctly. Moving around the various gl calls and adding glFlush or glFinish after rendering every block has no effect. This happens with NVIDIA's 43.03, 43.30 and 43.45 drivers.



glBindBufferARB( GL_ARRAY_BUFFER_ARB, svbo_blockvertices );

glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);

int br = 0; // blocks rendered

for( int x = 0; x < 32; x++ )
{
    for( int z = 0; z < 32; z++ )
    {
        byte b = s_header->mIndexBlock[x*32+z];
        if( b > 0 )
        {
            LandBlock& lb = s_blocks[b-1];

            int off = sizeof(BlockVertices) * (b-1);

            glVertexPointer(3, GL_FLOAT, sizeof(TerrainVertex), BUFFER_OFFSET(off+20) );
            glNormalPointer(GL_FLOAT, sizeof(TerrainVertex), BUFFER_OFFSET(off+8) );
            glTexCoordPointer(2, GL_FLOAT, sizeof(TerrainVertex), BUFFER_OFFSET(off+0) );

            for( int strip = 0; strip < 32; strip++ )
            {
                glDrawElements(GL_TRIANGLE_STRIP, 18, GL_UNSIGNED_SHORT, &(s_blockelements[strip*18]));
            }

            br++;
        }
        // done drawing block
    }
}

Graham
03-26-2003, 05:35 PM
Originally posted by DarkWIng:
Will newer builds also reduce the cost of ARB_VBO binding? I'm using 43.30, and reducing binding calls increased my framerate by 15%. I still can't get it to work faster than 70% of the speed of VAR.

well..


the extension specs:
Buffer changes (glBindBufferARB) are generally expected to be very lightweight, rather than extremely heavyweight (glVertexArrayRangeNV).

so I guess yes.

vincoof
03-26-2003, 11:56 PM
My guess is that the performance difference between the two extensions depends on the underlying path to the graphics hardware. NV_vertex_array_range is designed for and by NVIDIA, so I would expect NV's extension to be a little bit faster than the ARB's, at least for current cards. The implementation of the ARB spec will probably be better in later drivers, so DarkWIng's 70% could reach 80-90%, but I doubt it will ever be better than 100% (on GF1-4).

cass
03-27-2003, 02:22 AM
Originally posted by Korval:
Considering that binding could do anything from moving a pointer to uploading that memory from system RAM to the video card... I would consider binding a buffer to be approximately as painful as binding a texture. In short; the fewer the better.

No, binding a buffer is a cheap operation. The reason we chose this API was that we get to re-use all of the gl*Pointer() calls and glDraw*() calls and (when PBO arrives) glReadPixels().

We considered adding VBO-style entry points for every call that takes a pointer, but in the end we decided that this way was nice and orthogonal, and thus will be more easily integrated into OpenGL implementations and application code.

Thanks -
Cass

CybeRUS
03-27-2003, 08:04 AM
I'm using a GeForce FX 5800 Ultra and driver 43.40, WinXP.

I can't reach the performance of VAR; I'm about 10% short. I guess it will be fixed :)

My game is using about 2000 buffers (vertex & index). It's a landscape.
When I deform landscape blocks I delete the buffer and create a new one.

And after 500 deformations it crashes in the driver, maybe a bug in the manager.

All blocks together are about 2 MB of video memory. My VAR manager works perfectly and uses fences (maybe a bug in the sync).

Korval
03-27-2003, 09:40 AM
The reason we chose this API was that we get to re-use all of the gl*Pointer() calls and glDraw*() calls and (when PBO arrives) glReadPixels().

We considered adding VBO-style entry points for every call that takes a pointer, but in the end we decided that this way was nice and orthogonal, and thus will be more easily integrated into OpenGL implementations and application code.

I don't think you quite understood what I was suggesting. I was simply suggesting that the data a buffer stores could possibly be in system memory and, upon being bound, would have to be uploaded to either AGP or video RAM. However, if a bind is going to be lightweight, I presume the driver won't be using regular RAM for VBO's.

jwatte
03-27-2003, 10:07 AM
I think data will conceptually be put where it belongs the first time it's used. It might get kicked out of there if it hasn't been used in a long while, somewhat like texture data. I would guess that almost all implementations will put the data in AGP memory, as they'll need video memory for texture and framebuffer traffic.

vincoof
03-27-2003, 11:30 AM
For those of you interested in NVIDIA drivers, new ones are available today. I haven't checked the support for ARBvbo, but I guess something has been done. Maybe someone from NVIDIA can confirm whether ARBvbo support has been enhanced in these drivers?

MZ
03-27-2003, 02:48 PM
43.45? VBO is absent from the extension string. I didn't check the function pointers.

Pop N Fresh
03-27-2003, 06:36 PM
I'm using 43.45 and it still only allows me to set the gl*Pointers (offsets) a single time after binding a buffer object. Still a ways to go I suspect.

cass
03-27-2003, 06:51 PM
What do you mean, PopNFresh?

Pop N Fresh
03-27-2003, 08:52 PM
I bind my buffer object and enable the array client state. I then call gl*Pointer to set my offsets and draw using glDrawElements or glMultiDrawElements. Works fine.

I want to draw several different subblocks of the buffer, however, so in a loop I reset my offsets and then call DrawElements again. I use the same indices in my DrawElements call (I'm drawing chunks of terrain). Each call merely redraws from whatever offset was first set; the subsequent gl*Pointer calls do nothing. If I change my loop variable so it starts at a different chunk I get the same behavior: a different chunk gets rendered first, but the offsets for subsequent chunks don't seem to update.

I posted the code a few posts further up in the thread. I can't see anything wrong with it and it worked fine with VAR.

cass
03-28-2003, 09:56 AM
Pop-n-fresh,

Can you send me a small glut program that illustrates the problem? I'll go ahead and look into this, but a repro case would be very helpful.

Thanks -
Cass

cass
03-28-2003, 10:02 AM
Also, one additional question: If you move the BindBuffer call inside the loop (I know it makes it redundant), does it change anything?

Thanks -
Cass

cass
03-28-2003, 10:39 AM
Ok, I've checked with the engineer that implemented this in our driver, and the bug has been fixed. It will be available in the next driver release.

Unfortunately, there's no convenient work-around.

Thanks -
Cass

Pop N Fresh
03-28-2003, 11:33 AM
Yeah, I moved BindBuffer to within the loop, disabled/re-enabled the client state in the loop, bound a different buffer and then the buffer I'm using every loop... I couldn't find any workaround. I eventually figured the driver must be keeping a dirty bit somewhere that wasn't being set properly (or something like that) and gave up.

Good to know this will be fixed in the next release and thank you very very much for looking into to it. I appreciate it.

Kelvin

Mazy
04-03-2003, 12:53 AM
I think I've got another bug..
Driver version 43.45, using Windows.

I have multiple contexts (not pbuffers, only 'screen' contexts).
After I create the render contexts I use wglShareLists to let them use the same resources.
I create a buffer object with the first context current, and I can then use the buffer in all the contexts. But if I want to reinitialize the contents with glBufferData, the new data only affects the currently active render context; with glBufferSubData it affects all of them.