VBO Performance Test



ToolTech
07-15-2003, 03:51 AM
Hi !

I have made an app that uses VBO, and on my HW I get very low FPS. I have a GeForce4 Ti 4600 with drivers 44.03.

I would be happy if you would test it on Radeon HW with VBO support, and on newer NVIDIA drivers with HW >= GeForce4.

Here is the URL..
http://www.tooltech-software.com/downloads/gizmo3d/binaries/win32/VBO%20Test.zip

Thanx ahead !!!

BTW, you can see some of my IBR stuff in it.

PH
07-15-2003, 04:01 AM
Tested this on a Radeon 9700 with Cat 3.5. The VBO version runs very slowly (0 fps when I press the 'f' key); the non-VBO version is quite fast (35 fps). The output seems messed up in both versions: parts of the teapot are missing.

ToolTech
07-15-2003, 04:05 AM
The teapot is rendered from just one image + depth map. Therefore it is missing a lot of "non visible" patches.

However, you get the same result as I do. The VBO version is SO SLOW !! Strange...

Mazy
07-15-2003, 04:14 AM
I haven't tried that program yet, but I'm using VBOs in my own programs on a Radeon card (and it's been tested on GF4 and GFFX as well), and there we get a pretty nice performance boost, so I wouldn't blame the drivers just yet.

^Fishman
07-15-2003, 04:16 AM
"This application has failed to start because MSVCP60D.dll was not found."
No, I don't use Visual Studio 6, I use Visual Studio .NET.

ToolTech
07-15-2003, 04:20 AM
You can find the missing files here ...
http://www.tooltech-software.com/downloads/gizmo3d/binaries/win32/win32_runtime.zip

I get 3 FPS using VBO and 30 FPS using the non VBO version.

PH
07-15-2003, 04:24 AM
It isn't a driver issue, as ToolTech's test app runs slow on both NV and ATI hardware in VBO mode.

One thing that generally gives me bad performance is when I mess up and get gl errors per frame, but you probably already checked that.
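
PH's point about per-frame GL errors can be checked mechanically. The following is only a minimal sketch, not code from the test app: `drainGlErrors` is a hypothetical helper, and the error source is passed in as a callable so the snippet builds without OpenGL headers. In a real program you would pass `glGetError`. It relies only on the fact that `glGetError` keeps returning error codes until it reports `GL_NO_ERROR` (0).

```cpp
#include <cassert>
#include <cstdio>

// GL_NO_ERROR is defined as 0 by OpenGL; repeated here so the sketch
// builds without <GL/gl.h>.
constexpr unsigned kNoError = 0;

// Hypothetical helper: calls getError() until it reports "no error" and
// returns how many errors were pending. Intended to be called once per frame.
template <typename GetError>
int drainGlErrors(GetError getError, const char* where)
{
    assert(where != nullptr);
    int count = 0;
    for (unsigned err = getError(); err != kNoError; err = getError())
    {
        std::fprintf(stderr, "GL error 0x%04X at %s\n", err, where);
        ++count;
    }
    return count;
}
```

A non-zero return value on every frame would point to the kind of slowdown PH describes.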

Robert Osfield
07-15-2003, 04:28 AM
Originally posted by ToolTech:
Hi !

I have made an app that uses VBO, and on my HW I get very low FPS. I have a GeForce4 Ti 4600 with drivers 44.03.


Hi Anders,

I implemented a VBO path in the OSG a couple of weeks back and found up to a 50% performance boost on coarse-grained, high-polygon models.

However, on models composed of tens of thousands of small pieces of geometry, VBO is slower than display lists. I think this is largely down to OpenGL calling overhead swamping the gains from VBO. The use of extensions, and having to query for them at runtime, makes doing lots of extension calls expensive :-|

The drivers that I am using are NVIDIA's 43.63 release under Linux. Results will obviously vary on different drivers/OSes/graphics hardware, but in general my findings have been positive, save for crashes reported on GeForce2 Go laptops.

Robert.

ToolTech
07-15-2003, 04:34 AM
Hi Robert.

I get the scary feeling that my usage of shorts for vertex coordinates and mixing VertexAttrib with normal VertexPointer slows it down. In my other apps I also do get a gain but in this case it runs really messy. 10X slower !! How could I detect that using VBO is 10X slower on a HW ?? I mean.. VBO should be faster in ANY case right ?

ToolTech
07-15-2003, 04:41 AM
Here is the code used to render





gzVoid gzIBRGeometry::preTraverseAction( gzTraverseAction *actionclass , gzContext *context)
{
    if(actionclass->isExactType(gzRenderAction::getClassType())) // Exact a graphic action
    {
        if(!gzGraphicsEngine::has_vertex_program())
            return;

        //gzDepthFunc(GZ_LESS);

        gzPushMatrix();

        gzMultMatrixr(&m_transform.v11);

        if(gzGraphicsEngine::has_vertex_buffer_object())
        {
            gzULong offsetToDepth=m_width*m_height*sizeof(gzShort)*2;

            if(m_rebindIndex)
            {
                m_rebindDepth=TRUE;

                if(m_bufIndexID)
                {
                    gzDeleteBuffers(1,&m_bufIndexID);
                    m_bufIndexID=0;
                }

                gzGenBuffers(1,&m_bufIndexID);

                gzBindBuffer(GZ_ELEMENT_ARRAY_BUFFER,m_bufIndexID);

                gzBufferData(GZ_ELEMENT_ARRAY_BUFFER,2*m_width*sizeof(gzULong),m_indexSet->getIndexAddress(),GZ_STATIC_DRAW);

                if(m_bufID)
                {
                    gzDeleteBuffers(1,&m_bufID);
                    m_bufID=0;
                }

                gzGenBuffers(1,&m_bufID);

                gzBindBuffer(GZ_ARRAY_BUFFER,m_bufID);

                gzBufferData(GZ_ARRAY_BUFFER,m_width*m_height*(sizeof(gzShort)*2+sizeof(gzFloat)),0,GZ_STATIC_DRAW);

                gzBufferSubData(GZ_ARRAY_BUFFER,0,offsetToDepth,m_indexSet->getXYAddress());

                m_rebindIndex=FALSE;
            }
            else
            {
                gzBindBuffer(GZ_ELEMENT_ARRAY_BUFFER,m_bufIndexID);

                gzBindBuffer(GZ_ARRAY_BUFFER,m_bufID);
            }

            if(m_rebindDepth)
            {
                gzBufferSubData(GZ_ARRAY_BUFFER,offsetToDepth,m_width*m_height*sizeof(gzFloat),m_depthMap->getArray().getAddress());
                m_rebindDepth=FALSE;
            }

            gzEnableClientState(GZ_VERTEX_ARRAY);

            gzEnableVertexAttribArray(1);

            for(gzULong i=0;i<(m_height-1);i++)
            {
                gzVertexAttribPointer(1,1,GZ_FLOAT,FALSE,0,(const gzVoid *)(i*m_width*sizeof(gzFloat)+offsetToDepth));

                gzVertexPointer(2,GZ_SHORT,0,(const gzVoid *)(i*m_width*sizeof(gzShort)*2));

                gzDrawRangeElements(GZ_TRIANGLE_STRIP,0,2*m_width-1,2*m_width,GZ_UNSIGNED_INT,0);
            }

            gzDisableVertexAttribArray(1);

            gzBindBuffer(GZ_ELEMENT_ARRAY_BUFFER,0);
            gzBindBuffer(GZ_ARRAY_BUFFER,0);
        }
        else
        {
            gzEnableClientState(GZ_VERTEX_ARRAY);

            gzEnableVertexAttribArray(1);

            for(gzULong i=0;i<(m_height-1);i++)
            {
                gzVertexAttribPointer(1,1,GZ_FLOAT,FALSE,0,((gzFloat *)m_depthMap->getArray().getAddress())+i*m_width);

                gzVertexPointer(2,GZ_SHORT,0,m_indexSet->getXYAddress()+i*m_width);

                gzDrawRangeElements(GZ_TRIANGLE_STRIP,0,2*m_width-1,2*m_width,GZ_UNSIGNED_INT,m_indexSet->getIndexAddress());
            }

            gzDisableVertexAttribArray(1);
        }

        gzPopMatrix();

        //gzDepthFunc(context->depthFunc);
    }
}
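
To make the buffer layout in the code above easier to follow, here is a small sketch of the offset arithmetic. It is an illustration rather than part of Gizmo3D: `GridLayout` and its member names are hypothetical, and it assumes `gzShort` is a 16-bit integer and `gzFloat` a 32-bit float. The single array buffer holds all XY pairs first, followed by all depth values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout assumed from the posted code: one array buffer holding
// width*height XY pairs (2 x 16-bit each), followed by width*height floats.
struct GridLayout
{
    std::size_t width;
    std::size_t height;

    std::size_t xyBytes() const { return width * height * 2 * sizeof(std::int16_t); }
    std::size_t depthBytes() const { return width * height * sizeof(float); }
    std::size_t totalBytes() const { return xyBytes() + depthBytes(); }

    // Byte offset of the first XY pair of a row (the VertexPointer offset).
    std::size_t xyRowOffset(std::size_t row) const
    {
        assert(row < height);
        return row * width * 2 * sizeof(std::int16_t);
    }

    // Byte offset of the first depth value of a row (the VertexAttribPointer
    // offset); the depth block starts right after the XY block.
    std::size_t depthRowOffset(std::size_t row) const
    {
        assert(row < height);
        return xyBytes() + row * width * sizeof(float);
    }
};
```

Here `xyBytes()` corresponds to `offsetToDepth` in the post, and the two row offsets correspond to the pointer arguments passed inside the loop.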

Robert Osfield
07-15-2003, 04:43 AM
Originally posted by ToolTech:
Hi Robert.

I get the scary feeling that my usage of shorts for vertex coordinates and mixing VertexAttrib with normal VertexPointer slows it down. In my other apps I also do get a gain but in this case it runs really messy. 10X slower !! How could I detect that using VBO is 10X slower on a HW ?? I mean.. VBO should be faster in ANY case right ?

Eventually I'd hope VBO to be efficient for all vertex formats supported by the hardware, but I think it's still early days for the driver support. The spec mentions that float vertex storage is expected to be optimized. Cut and pasted from vertex_buffer_object.txt:

2.8A.1 Vertex Arrays in Buffer Objects
--------------------------------------

Blocks of vertex array data may be stored in buffer objects with the
same format and layout options supported for client-side vertex
arrays. However, it is expected that GL implementations will (at
minimum) be optimized for data with all components represented as
floats, as well as for color data with components represented as
either floats or unsigned bytes.

ToolTech
07-15-2003, 04:46 AM
Robert..

In my case I have just tested with floats instead of shorts and I get the same results :-(

Hmm. Anyone using VBO with VertexAttrib mixed with VertexPointer ?

Mazy
07-15-2003, 04:54 AM
How often do you rebind the depth?

the 'if(m_rebindDepth)'

Since you're defining static buffers, you shouldn't rebind them at all, or only very seldom.

ToolTech
07-15-2003, 04:57 AM
Just once. The depth is uploaded the first time and then again for each depth update, but in the sample app that only occurs once...

ToolTech
07-15-2003, 05:02 AM
I have used Quantify on the app and it shows that glClear and glVertexPointer do all the stalling (97%). They might want to do some flushing etc.

Anyway I don't get any GL errors in the code

ToolTech
07-15-2003, 05:30 AM
Ok. Just found something VERY interesting. The stall occurs when I mix VBO rendering with normal vertex arrays. The moving lamp in the demo is rendered by normal vertex arrays. When I remove the lamp geometry, the FPS goes up to 65 FPS ??

Is it forbidden to mix vertex arrays and vertex buffer objects ?????

PH
07-15-2003, 05:38 AM
Now, that's strange. No, you're allowed to mix and match as you please. I've done that myself with no problems ( vertices, texcoords with VBO, TS vectors using normal arrays ).

ToolTech
07-15-2003, 05:45 AM
Is it ok to do the

gzBindBuffer(GZ_ELEMENT_ARRAY_BUFFER,0);
gzBindBuffer(GZ_ARRAY_BUFFER,0);

to enable the usage of "normal" vertex arrays ?

obirsoy
07-15-2003, 05:47 AM
Do not test performance with debug builds. Make a release build and then compare the results.

zeckensack
07-15-2003, 05:56 AM
Originally posted by ToolTech:
Is it ok to do the

gzBindBuffer(GZ_ELEMENT_ARRAY_BUFFER,0);
gzBindBuffer(GZ_ARRAY_BUFFER,0);

to enable the usage of "normal" vertex arrays ?

AFAIK yes, but you must resupply all pointers. The GL doesn't maintain extra vertex, texcoord, etc. pointers to kick in when you turn VBOs off.

(that's how I understood it anyway)
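
The behaviour zeckensack describes can be illustrated with a toy model. This is deliberately NOT OpenGL code: `ToyContext` and its members are hypothetical names, and the model only mimics the rule from the VBO extension that each vertex array remembers whichever buffer was bound to ARRAY_BUFFER at the moment its gl*Pointer call was made.

```cpp
#include <cstdint>

// Toy model, NOT OpenGL: each array keeps the buffer binding that was
// current when its pointer was specified.
struct ArrayPointerState
{
    unsigned buffer = 0;        // 0 means "client memory"
    std::uintptr_t pointer = 0; // address, or byte offset when buffer != 0
};

struct ToyContext
{
    unsigned boundArrayBuffer = 0;
    ArrayPointerState vertexArray;

    // Models glBindBuffer(ARRAY_BUFFER, id): changes only the binding point.
    void bindArrayBuffer(unsigned id) { boundArrayBuffer = id; }

    // Models glVertexPointer: the current binding is captured here.
    void vertexPointer(std::uintptr_t pointerOrOffset)
    {
        vertexArray.buffer = boundArrayBuffer;
        vertexArray.pointer = pointerOrOffset;
    }

    // True when a draw call would read the vertex array from a buffer object.
    bool vertexArrayReadsFromBuffer() const { return vertexArray.buffer != 0; }
};
```

In this model, binding buffer 0 alone changes nothing for arrays that were already set up; only a fresh `vertexPointer` call after the unbind switches the array back to client memory.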

Korval
07-15-2003, 12:34 PM
Originally posted by Robert Osfield:
Eventually I'd hope VBO to be efficient for all vertex formats supported by the hardware, but I think it's still early days for the driver support. The spec mentions that float vertex storage is expected to be optimized. Cut and pasted from vertex_buffer_object.txt

It isn't the drivers. It's the hardware. It simply can't read shorts or bytes (except for colors). It was made to read floats, because that's probably the most efficient in terms of reading and rendering.

^Fishman
07-15-2003, 02:14 PM
Originally posted by ToolTech:
You can find the missing files here ...
http://www.tooltech-software.com/downloads/gizmo3d/binaries/win32/win32_runtime.zip

I get 3 FPS using VBO and 30 FPS using the non VBO version.

You didn't include the file I'm missing.

MZ
07-15-2003, 02:49 PM
Both the VBO and non-VBO versions crash on my system at the same stage: after it prints some messages to the console and tries to open a window (the window's frame appears), I get the standard M$ crash message box. This happens with or without your DLL pack.
GF3, 44.03, W2K SP2

CybeRUS
07-15-2003, 10:53 PM
ToolTech:
When you use VBO, the data can live in any kind of memory (system, video, AGP). When you call glBindBuffer(....,0); you just bind the zero buffer (system memory) for all gl*Pointer calls made after that.
When you call DrawElements (or similar), all enabled arrays (glEnableClientState(..)) that are not already in system memory are copied there, because the vertex pointer is in system memory, and you see a performance drop.
This happens, for example, when a second texture coordinate array or a normal array is still located in a VBO (in video memory).

You need to call glDisableClientState(..) for every array that the current DrawElements call does not use, and set gl*Pointer for the arrays it does use.

ToolTech
07-15-2003, 11:48 PM
I have got it working now; I had some trouble with mixing the standard vertex array code with the VBO code. However, I get no performance increase. The same performance using VBO... :-(

Jan
07-16-2003, 12:51 AM
Well, that's something, at least.

If you use separate arrays, you can still increase the speed a lot.
Use interleaved arrays instead. It's pretty easy to set them up with VBO.
I changed my code from separate VBOs to an interleaved VBO and got a 25% speed increase!

But first make sure that you are really geometry-limited. With only a few polys, you may be fillrate-limited and won't notice any speedup when changing to interleaved arrays.
I tested it with 90.000 triangles (220.000 vertices).

Jan.
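
As a rough illustration of the interleaved layout Jan recommends, the sketch below computes the stride and byte offsets that the gl*Pointer calls would need. The attribute set (position, normal, one texture coordinate) and the names `InterleavedVertex`, `kStride` and so on are assumptions made for the example; Jan does not say which attributes were used.

```cpp
#include <cstddef>

// Hypothetical interleaved vertex; the attribute set is an assumption.
struct InterleavedVertex
{
    float position[3];
    float normal[3];
    float texCoord[2];
};

// Stride and byte offsets to pass to the gl*Pointer calls when an array of
// InterleavedVertex is stored in a single buffer object.
constexpr std::size_t kStride         = sizeof(InterleavedVertex);
constexpr std::size_t kPositionOffset = offsetof(InterleavedVertex, position);
constexpr std::size_t kNormalOffset   = offsetof(InterleavedVertex, normal);
constexpr std::size_t kTexCoordOffset = offsetof(InterleavedVertex, texCoord);

// Eight tightly packed floats per vertex; fails to compile if the
// compiler inserted padding.
static_assert(kStride == 8 * sizeof(float), "no padding expected");
```

With separate arrays each attribute has its own block and a stride of 0; with this layout every attribute shares one stride and differs only in its starting offset.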

ToolTech
07-16-2003, 01:56 AM
In my code I have, let's say, m x n vertices, and I render 2 x m vertices at a time, because otherwise the buffer gets too large.

Now, is it better to divide the vertex buffer into n-1 buffers with 2 x m vertices in each, instead of one buffer with m x n vertices in it ?

Is there a size limit that makes the VBO run unoptimized just like glDrawRangeElements ?
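
For readers following the 2 x m scheme, this is one way the shared index list for a pair of rows could be built. The thread never shows the contents of m_indexSet, so the zig-zag order and the name `makeRowStripIndices` are assumptions. The indices stay in the range 0 to 2m-1 because the array pointers are moved forward by one row for each draw call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed index order (the thread does not show m_indexSet): the strip
// zig-zags between row i (indices 0..m-1) and row i+1 (indices m..2m-1).
std::vector<std::uint32_t> makeRowStripIndices(std::uint32_t m)
{
    assert(m >= 2);
    std::vector<std::uint32_t> indices;
    indices.reserve(2 * static_cast<std::size_t>(m));
    for (std::uint32_t x = 0; x < m; ++x)
    {
        indices.push_back(x);     // vertex on the upper row
        indices.push_back(m + x); // vertex directly below it
    }
    return indices;
}
```

The result has 2m entries with a maximum value of 2m-1, which matches the count and range arguments passed to gzDrawRangeElements in the posted code.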

CybeRUS
07-16-2003, 09:22 PM
And more:
Do not use GL_SHORT for vertices, normals and texcoords; use GL_FLOAT.
Do not use GL_BYTE for color; use GL_UNSIGNED_BYTE.
Use ELEMENT_ARRAY_BUFFER for the index buffer (maybe your indices are in ARRAY_BUFFER).

DrawRangeElements and DrawElements have the same performance, because video cards have granular DMA and can read memory with a stride, or from many streams, at the same speed as from one place.

Cache all GL states/client states yourself; do not call IsEnabled and the like.

Optimize your data for the pre- and post-T&L caches.

Well, I think you know it already.
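
The advice to cache client state yourself could look roughly like the sketch below. `ClientStateCache` is a hypothetical class, not part of any library mentioned in the thread. The enable and disable functions are injected so the snippet builds without OpenGL headers, and it assumes every client state starts out disabled, which is the OpenGL default.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>

// Hypothetical cache: remembers which client states are enabled so the
// driver is only called when a state actually changes.
class ClientStateCache
{
public:
    using Call = std::function<void(unsigned)>;

    // In a real program, pass glEnableClientState / glDisableClientState.
    ClientStateCache(Call enable, Call disable)
        : enable_(std::move(enable)), disable_(std::move(disable))
    {
        assert(enable_ && disable_);
    }

    void enable(unsigned state)
    {
        if (enabled_.insert(state).second) // newly inserted: was disabled
            enable_(state);
    }

    void disable(unsigned state)
    {
        if (enabled_.erase(state) == 1)    // removed: was enabled
            disable_(state);
    }

    // Answers from the cache instead of querying the GL.
    bool isEnabled(unsigned state) const { return enabled_.count(state) != 0; }

private:
    Call enable_;
    Call disable_;
    std::set<unsigned> enabled_;
};
```

Redundant enable or disable requests are filtered out before they reach the driver, and `isEnabled` never has to query the GL.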

jwatte
07-17-2003, 12:33 PM
Why not use GL_SHORT? It's the one non-float format actually supported by original GeForces, if the recommendations of the time can be trusted. And it's half the size, so it transfers twice as fast across the bus.
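
The "half the size" part of this argument is simple arithmetic, sketched below for a grid with two coordinate components per vertex. The function name `xyBytes` and the 512 x 512 grid in the usage note are illustrative assumptions. The sketch says nothing about whether the hardware consumes shorts natively, which is the point still in dispute in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed for the XY coordinates of a width x height grid, given the
// size of one coordinate component (2 components per vertex).
constexpr std::size_t xyBytes(std::size_t width, std::size_t height,
                              std::size_t componentSize)
{
    return width * height * 2 * componentSize;
}
```

For a 512 x 512 grid, `xyBytes(512, 512, sizeof(std::int16_t))` is 1 MiB, while the same grid stored as 32-bit floats needs twice that.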

ToolTech
07-17-2003, 09:04 PM
jwatte is right. It is faster. I have tried to change to GL_FLOAT but then it just takes a longer time to transfer the data.

However I don't get any faster rendering with VBO compared to the non VBO version ? I am not fill rate limited so I am a bit puzzled...

Anyone who can comment on the VBO size question ?

CybeRUS
07-17-2003, 09:04 PM
Because it will be converted to float every time.
If you have a lot of data, you will lose as much performance converting shorts to floats as you would transferring floats.

Csiki
07-17-2003, 09:45 PM
Originally posted by ^Fishman:

Originally posted by ToolTech:
You can find the missing files here ...
http://www.tooltech-software.com/downloads/gizmo3d/binaries/win32/win32_runtime.zip

I get 3 FPS using VBO and 30 FPS using the non VBO version.
You didn't include the file i'm missing.

I have the same problem. :(
I've downloaded msvcp60d.dll and msvcrtd.dll, but the program fails with a critical error...
I have a GeForce4 Ti 4200, an Athlon XP 1800+, and driver version 44.03.
Was the program compiled in debug mode? Why?

ToolTech
07-17-2003, 10:57 PM
I will post a better IBR demo today that you can have a look at..

Csiki
07-18-2003, 09:30 AM
Originally posted by ToolTech:
I will post a better IBR demo today that you can have a look at..

I will see it after Jul. 23. :)