VBO Performance Test

Hi !

I have made an app that uses VBO, and on my HW I get very low FPS. I have a GeForce4 Ti 4600 with drivers 44.03.

I would be happy if you could test it on Radeon HW with VBO support, and on newer NVIDIA drivers with HW >= GeForce4…

Here is the URL…
http://www.tooltech-software.com/downloads/gizmo3d/binaries/win32/VBO%20Test.zip

Thanx ahead !!!

BTW, you can see some of my IBR stuff in it…

Tested this on a Radeon 9700, Cat 3.5. The VBO version runs very slowly (0 fps when I press the ‘f’ key), the non-VBO one is quite fast (35 fps). The output seems messed up in both versions. Parts of the teapot are missing.

The teapot is just rendered from one image + depth map. Therefore it is missing a lot of “non-visible” patches…

However, you get the same result as I do. The VBO version is SO SLOW !! Strange…

I haven’t tried that program yet, but I’m using VBOs in my own programs on a Radeon card (and it’s been tested on GF4 and GFFX as well), and there we get a pretty nice performance boost, so I wouldn’t blame the drivers just yet.

“This application has failed to start because MSVCP60D.dll was not found.”
No, I don’t use Visual Studio 6, I use Visual Studio .NET.

You can find the missing files here …
http://www.tooltech-software.com/downloads/gizmo3d/binaries/win32/win32_runtime.zip

I get 3 FPS using VBO and 30 FPS using the non VBO version.

It isn’t a driver issue, as ToolTech’s test app runs slow on both NV and ATI hardware in VBO mode.

One thing that generally gives me bad performance is when I mess up and get GL errors every frame, but you probably already checked that.
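For what it’s worth, that per-frame check can be wrapped up so it costs almost nothing when the queue is clean. A minimal sketch — the error getter is injected here so it compiles and runs without a GL context (in a real app you’d pass glGetError), and the function name is made up:

```cpp
#include <cstdio>
#include <functional>

// Drain the GL error queue once per frame and report anything found.
// The getter is injected so this sketch needs no GL context; in a real
// renderer you would pass glGetError. Returns the number of errors seen.
inline int drainGLErrors(const std::function<unsigned()>& getError)
{
    int count = 0;
    unsigned err;
    while ((err = getError()) != 0) // 0 == GL_NO_ERROR
    {
        std::fprintf(stderr, "GL error 0x%04X this frame\n", err);
        ++count;
    }
    return count;
}
```

Calling this once per frame in a debug build catches the “silent GL error every frame” case the post describes.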

Originally posted by ToolTech:
[b]Hi !

I have made an app that uses VBO, and on my HW I get very low FPS. I have a GeForce4 Ti 4600 with drivers 44.03.
[/b]

Hi Anders,

I implemented a VBO path in the OSG a couple of weeks back and found up to a 50% performance boost on coarse-grained, high-polygon models.

However, on models composed of tens of thousands of small pieces of geometry, the performance of VBO is slower than using display lists. I think this is largely down to OpenGL calling overhead swamping the gains from VBO. The use of extensions, and having to query for them at runtime, makes doing lots of extension calls expensive :expressionless:
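One way around that per-piece call overhead is to merge the small pieces into a single vertex/index pair up front, so one bind and one draw replace thousands. A hedged sketch of the idea — MergedMesh and appendPiece are hypothetical names, not OSG or GL API:

```cpp
#include <vector>

// Merge many small meshes (each a local vertex list plus a local index
// list) into one shared vertex/index pair, rebasing each piece's indices
// by the running vertex count. One merged buffer means one bind and one
// draw call instead of one per piece.
struct MergedMesh
{
    std::vector<float>    vertices; // xyz triples, interleaved
    std::vector<unsigned> indices;  // rebased into the merged vertices
};

inline void appendPiece(MergedMesh& out,
                        const std::vector<float>& verts,    // xyz triples
                        const std::vector<unsigned>& localIdx)
{
    const unsigned base = static_cast<unsigned>(out.vertices.size() / 3);
    out.vertices.insert(out.vertices.end(), verts.begin(), verts.end());
    for (unsigned i : localIdx)
        out.indices.push_back(base + i);
}
```

The merged arrays can then go into a single GL_STATIC_DRAW buffer, which is where VBO seems to pay off in practice.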

The drivers that I am using are NVIDIA’s 43.63 release under Linux. Results will obviously vary on different drivers/OSes/graphics hardware, but in general my findings have been positive, save for crashes reported on GeForce2 Go laptops.

Robert.

Hi Robert.

I get the scary feeling that my usage of shorts for vertex coordinates, and mixing VertexAttrib with normal VertexPointer, slows it down. In my other apps I do get a gain, but in this case it runs really badly. 10x slower !! How could I detect that using VBO is 10x slower on a given HW ?? I mean… VBO should be faster in ANY case, right ?

Here is the code used to render:

gzVoid gzIBRGeometry::preTraverseAction( gzTraverseAction *actionclass , gzContext *context)
{
if(actionclass->isExactType(gzRenderAction::getClassType())) // Exactly a graphics action
{
if(!gzGraphicsEngine::has_vertex_program())
return;

  //gzDepthFunc(GZ_LESS);

  gzPushMatrix();

  gzMultMatrixr(&m_transform.v11);

  if(gzGraphicsEngine::has_vertex_buffer_object())
  {
  	gzULong offsetToDepth=m_width*m_height*sizeof(gzShort)*2;

  	if(m_rebindIndex)
  	{
  		m_rebindDepth=TRUE;

  		if(m_bufIndexID)
  		{
  			gzDeleteBuffers(1,&m_bufIndexID);
  			m_bufIndexID=0;
  		}

  		gzGenBuffers(1,&m_bufIndexID);

  		gzBindBuffer(GZ_ELEMENT_ARRAY_BUFFER,m_bufIndexID);

  		gzBufferData(GZ_ELEMENT_ARRAY_BUFFER,2*m_width*sizeof(gzULong),m_indexSet->getIndexAddress(),GZ_STATIC_DRAW);

  		if(m_bufID)
  		{
  			gzDeleteBuffers(1,&m_bufID);
  			m_bufID=0;
  		}

  		gzGenBuffers(1,&m_bufID);

  		gzBindBuffer(GZ_ARRAY_BUFFER,m_bufID);

  		gzBufferData(GZ_ARRAY_BUFFER,m_width*m_height*(sizeof(gzShort)*2+sizeof(gzFloat)),0,GZ_STATIC_DRAW);

  		gzBufferSubData(GZ_ARRAY_BUFFER,0,offsetToDepth,m_indexSet->getXYAddress());

  		m_rebindIndex=FALSE;
  	}
  	else
  	{
  		gzBindBuffer(GZ_ELEMENT_ARRAY_BUFFER,m_bufIndexID);

  		gzBindBuffer(GZ_ARRAY_BUFFER,m_bufID);
  	}

  	if(m_rebindDepth)
  	{
  		gzBufferSubData(GZ_ARRAY_BUFFER,offsetToDepth,m_width*m_height*sizeof(gzFloat),m_depthMap->getArray().getAddress());
  		m_rebindDepth=FALSE;
  	}

  	gzEnableClientState(GZ_VERTEX_ARRAY);

  	gzEnableVertexAttribArray(1);

  	for(gzULong i=0;i<(m_height-1);i++)
  	{
  		gzVertexAttribPointer(1,1,GZ_FLOAT,FALSE,0,(const gzVoid *)(i*m_width*sizeof(gzFloat)+offsetToDepth));

  		gzVertexPointer(2,GZ_SHORT,0,(const gzVoid *)(i*m_width*sizeof(gzShort)*2));

  		gzDrawRangeElements(GZ_TRIANGLE_STRIP,0,2*m_width-1,2*m_width,GZ_UNSIGNED_INT,0);		
  	}

  	gzDisableVertexAttribArray(1);

  	gzBindBuffer(GZ_ELEMENT_ARRAY_BUFFER,0);
  	gzBindBuffer(GZ_ARRAY_BUFFER,0);

  }
  else
  {

  	gzEnableClientState(GZ_VERTEX_ARRAY);
  	
  	gzEnableVertexAttribArray(1);

  	for(gzULong i=0;i<(m_height-1);i++)
  	{
  		gzVertexAttribPointer(1,1,GZ_FLOAT,FALSE,0,((gzFloat *)m_depthMap->getArray().getAddress())+i*m_width);

  		gzVertexPointer(2,GZ_SHORT,0,m_indexSet->getXYAddress()+i*m_width);

  		gzDrawRangeElements(GZ_TRIANGLE_STRIP,0,2*m_width-1,2*m_width,GZ_UNSIGNED_INT,m_indexSet->getIndexAddress());		
  	}

  	gzDisableVertexAttribArray(1);
  }

  gzPopMatrix();

  //gzDepthFunc(context->depthFunc);

}

}
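The byte offsets in the code above (the xy shorts packed first, then the depth floats starting at offsetToDepth) are easy to get wrong, so here is a small sanity-check sketch of the same layout arithmetic. The helper names are made up; the formulas mirror the code:

```cpp
#include <cstddef>
#include <cstdint>

// m_width * m_height xy pairs of 16-bit shorts come first in the buffer,
// followed by m_width * m_height depth floats.
inline std::size_t offsetToDepth(std::size_t w, std::size_t h)
{
    return w * h * sizeof(std::int16_t) * 2; // xy shorts first
}

// Total size passed to gzBufferData for the GZ_ARRAY_BUFFER.
inline std::size_t totalBufferSize(std::size_t w, std::size_t h)
{
    return w * h * (sizeof(std::int16_t) * 2 + sizeof(float));
}

// Byte offset of row i of the depth data, matching the
// gzVertexAttribPointer offset inside the strip loop.
inline std::size_t depthRowOffset(std::size_t w, std::size_t h, std::size_t i)
{
    return offsetToDepth(w, h) + i * w * sizeof(float);
}
```

If totalBufferSize ever disagrees with offsetToDepth plus the depth block, a gzBufferSubData call is writing past the end of the buffer.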

Originally posted by ToolTech:
[b]Hi Robert.

I get the scary feeling that my usage of shorts for vertex coordinates, and mixing VertexAttrib with normal VertexPointer, slows it down. In my other apps I do get a gain, but in this case it runs really badly. 10x slower !! How could I detect that using VBO is 10x slower on a given HW ?? I mean… VBO should be faster in ANY case, right ?[/b]

Eventually I’d hope VBO to be efficient for all vertex formats supported by the hardware, but I think it’s still early days for the driver support. The spec mentions float vertex storage being optimized… cut and pasted from vertex_buffer_object.txt:

2.8A.1 Vertex Arrays in Buffer Objects
--------------------------------------

Blocks of vertex array data may be stored in buffer objects with the
same format and layout options supported for client-side vertex
arrays.  However, it is expected that GL implementations will (at
minimum) be optimized for data with all components represented as
floats, as well as for color data with components represented as
either floats or unsigned bytes.
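Given that wording, one workaround worth trying is widening the 16-bit coordinates to floats once on the CPU before uploading, trading 2x vertex memory for the format the spec promises is optimized. A minimal sketch — shortsToFloats is a hypothetical helper, not part of any API here:

```cpp
#include <vector>
#include <cstdint>

// Widen packed 16-bit xy coordinates to floats before uploading them,
// since vertex_buffer_object only guarantees that all-float vertex data
// is on the optimized path. Done once at load time, not per frame.
inline std::vector<float> shortsToFloats(const std::vector<std::int16_t>& xy)
{
    std::vector<float> out;
    out.reserve(xy.size());
    for (std::int16_t v : xy)
        out.push_back(static_cast<float>(v));
    return out;
}
```

The conversion is exact for the whole 16-bit range, so no precision is lost.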

Robert…

In my case, I have just tested with floats instead of shorts and I get the same results :frowning:

Hmm. Anyone using VBO with VertexAttrib mixed with VertexPointer ?

How often do you rebind the depth?

the ‘if(m_rebindDepth)’

Since you’re defining static buffers, you shouldn’t re-specify them at all, or only very seldom.

Just once. The first time, the depth is uploaded, and then again for each depth update, but in the sample app that only occurs once…
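That matches the usual dirty-flag pattern for GZ_STATIC_DRAW data: leave the buffer alone and only re-upload when the source actually changed, exactly like the m_rebindDepth flag in the code above. A sketch of the pattern — the upload call is injected so it runs without a GL context (in real code it would be the gzBufferSubData call):

```cpp
#include <functional>

// Dirty-flag upload guard: a static buffer is re-specified only when the
// CPU-side data has actually changed. The upload action is injected so
// the pattern can be exercised without a GL context.
class DepthUploader
{
public:
    void markDirty() { m_dirty = true; }

    // Returns true only if an upload actually happened this frame.
    bool uploadIfDirty(const std::function<void()>& upload)
    {
        if (!m_dirty)
            return false;
        upload();
        m_dirty = false;
        return true;
    }

private:
    bool m_dirty = true; // first frame always uploads
};
```

With this in place, a static buffer sees exactly one upload unless someone calls markDirty().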

I have used Quantify on the app, and it shows that glClear and glVertexPointer do all the stalling (97%). They might be doing some flushing etc…

Anyway, I don’t get any GL errors in the code.

OK. Just found something VERY interesting. The stall occurs when I mix VBO rendering with normal vertex arrays. The moving lamp in the demo is rendered with normal vertex arrays. When I remove the lamp geometry, the FPS goes up to 65 ??

Is it forbidden to mix vertex arrays and vertex buffer objects ???

Now, that’s strange. No, you’re allowed to mix and match as you please. I’ve done that myself with no problems (vertices and texcoords with VBO, TS vectors using normal arrays).

Is it ok to do the

gzBindBuffer(GZ_ELEMENT_ARRAY_BUFFER,0);
gzBindBuffer(GZ_ARRAY_BUFFER,0);

to enable the usage of “normal” vertex arrays ?

Do not test performance with debug builds. Make a release build and then compare the results.

Originally posted by ToolTech:
[b]Is it ok to do the

gzBindBuffer(GZ_ELEMENT_ARRAY_BUFFER,0);
gzBindBuffer(GZ_ARRAY_BUFFER,0);

to enable the usage of “normal” vertex arrays ?[/b]
AFAIK yes, but you must resupply all pointers. The GL doesn’t maintain extra vertex, texcoord, etc. pointers to kick in when you turn VBOs off.

(that’s how I understood it anyway)
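That matches my reading too: the pointer argument is interpreted at the moment gzVertexPointer is called, relative to whatever buffer is bound right then, so a pointer set while a VBO was bound is meaningless once you switch back to client arrays. A toy model of that rule — purely illustrative, not the GL API:

```cpp
#include <cstddef>

// Toy model of a single GL vertex-pointer slot: the meaning of the stored
// value (byte offset into a VBO vs. client memory address) is fixed by the
// buffer that was bound when the pointer was set, not at draw time. This
// is why pointers must be resupplied after every bind/unbind switch.
struct VertexPointerSlot
{
    unsigned    bufferAtSetTime = 0; // 0 == client-side arrays
    std::size_t pointerOrOffset = 0;

    void set(unsigned currentlyBoundBuffer, std::size_t p)
    {
        bufferAtSetTime = currentlyBoundBuffer;
        pointerOrOffset = p;
    }

    bool readsFromVBO() const { return bufferAtSetTime != 0; }
};
```

So after gzBindBuffer(GZ_ARRAY_BUFFER, 0), every enabled array needs a fresh gzVertexPointer / gzVertexAttribPointer call with a real client address.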