Lower VAR/VBO performance with newer NVidia drivers

I noticed a while ago that the T&L performance of my terrain engine had dropped.

Back in 2003 my terrain engine was getting 95 million polys/sec with the Detonator 45.23 drivers. I’m not sure at what point it changed, but with the latest drivers (67.66) I am getting 30 million polys/sec.

My hardware and the code have not changed. I have a 1.6 GHz AMD CPU and a GeForce FX 5900 Ultra.

I am using glDrawElements with vertex and texture coordinate pointers and VAR.

glVertexPointer(3, GL_FLOAT, 5 * sizeof(GLfloat), pMeshV);
glTexCoordPointer(2, GL_FLOAT, 5 * sizeof(GLfloat), pMeshV + 3);

glDrawElements(GLMode, count, GL_UNSIGNED_SHORT, pMeshI);
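
The two pointer calls above describe an interleaved layout: three position floats followed by two texture-coordinate floats per vertex. A minimal sketch of that layout as a C struct, assuming a 4-byte float; `TerrainVertex` and the helper names are hypothetical and do not come from the original code:

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical struct mirroring the interleaved layout in the post:
 * 3 position floats followed by 2 texture-coordinate floats. */
typedef struct {
    float pos[3];
    float uv[2];
} TerrainVertex;

/* Stride to pass to glVertexPointer / glTexCoordPointer. */
static size_t terrain_vertex_stride(void)
{
    /* The struct must pack exactly like 5 consecutive floats. */
    assert(sizeof(TerrainVertex) == 5 * sizeof(float));
    return sizeof(TerrainVertex);
}

/* Byte offset of the texture coordinates inside one vertex,
 * i.e. the distance that "pMeshV + 3" expresses in floats. */
static size_t terrain_texcoord_offset(void)
{
    return offsetof(TerrainVertex, uv);
}
```

With this layout every vertex fetch reads 20 bytes when texture coordinates are enabled, and only the first 12 bytes of each 20-byte stride when they are not.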

The interesting thing I’ve found is that with the old drivers, removing the texture coordinate code left performance at 95 million polys/sec. With the newer drivers, removing it doubled performance from 30 to 60 million polys/sec. It’s as though the newer drivers are bottlenecked on fetching the vertex data from video/AGP memory.

I have ruled out any CPU or fill-rate bottleneck.

I thought maybe this was just a VAR issue, but I’ve just changed it to VBO and I get the same poor performance on the newer drivers: about 30 million polys/sec.

Can anyone shed any light on this?

I read about a similar bug on another forum (using VAR and getting a big performance drop on the latest drivers).
Were you using VAR and requesting the buffers in video memory (i.e. not AGP memory)?

The issue turned out to be that the latest drivers are more optimized for VBO (as they should be), and VBOs seem to make use of AGP memory.

The end result was that the motherboard/AGP drivers were not installed on the tested machines, which gave poor AGP performance and therefore poor VAR/VBO performance.
(They seemed to indicate that VAR was giving AGP memory even when the old values that requested video memory were used.)

I’ll dig out a link if you want…

Edit: Link
http://www.gamedev.net/community/forums/topic.asp?topic_id=291251&PageSize=25&WhichPage=1

Thanks for the link. Yes, on newer drivers VAR seems to be allocating AGP instead of video memory even though I’m requesting video memory.

I hope I’ll be able to get the VBO performance up to VAR performance, though if it only ever uses AGP that’s probably impossible.

OK, I have it working at 105 million polys/sec now, so VBO obviously can use video memory.

I was allocating too much memory, so the driver put the VBOs in AGP memory.

Originally posted by Adrian:
OK, I have it working at 105 million polys/sec now, so VBO obviously can use video memory.

I was allocating too much memory, so the driver put the VBOs in AGP memory.

LOL… look at those numbers :)

Makes me feel old.

You’re getting these on recent drivers?

Good. I’m looking into getting a GeForce 6600, so I was kinda hoping that their drivers hadn’t degraded in performance.

You should see some of the new 3Dlabs cards: I’m getting 160 million/sec with one directional light!

That’s a very nice number. I guess it’s only for the geometry part? I.e. without textures, advanced lighting, pixel shaders or geomorphing?

My own terrain engine in that case is getting between 60 and 80 MTris/sec, but that’s on a Radeon 9700. I’ve never tested it on better GPUs. But next week I should get an X850 XT PE; it had better fly :)

Y.

Originally posted by Korval:
You’re getting these on recent drivers?

Yes, 71.8.

Originally posted by Ysaneya:
That’s a very nice number. I guess it’s only for the geometry part? I.e. without textures, advanced lighting, pixel shaders or geomorphing?

It’s rendering the geometry and a single 4k x 4k texture. No advanced lighting or pixel shaders. The LOD code is mainly on, but the code that renders the joins between tiles of different LOD is off because it uses immediate mode :) That’s next on my list to optimise. With the join rendering code on, performance drops to 90 MTris/sec. Fog is switched off.

I’m having a problem with VBO whereby over time performance drops from 100 MTris/sec to 30 MTris/sec.

It’s as though the buffers are being moved from video memory to AGP memory when I bind and map a buffer.

I initialise the buffers like this

	glBindBufferARB(GL_ARRAY_BUFFER_ARB, bufferObject[i]);

	glBufferDataARB(GL_ARRAY_BUFFER_ARB, 330240, NULL, GL_STATIC_DRAW_ARB);

I’m sure at this point the buffers are in video memory.

Before I refill the vertex buffers I do this

	glBindBufferARB(GL_ARRAY_BUFFER_ARB, bufferObject[VArrayNum]);

	pMeshV2 = (float*)glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);

I think at this point the buffers are being moved to AGP memory.

If I recreate the buffer after the bind with this

	glBufferDataARB(GL_ARRAY_BUFFER_ARB, 330240, NULL, GL_STATIC_DRAW_ARB);

then performance stays high and I think the buffers are in video memory.

I would like to avoid continually recreating and initialising the buffers, as I think it is negatively affecting performance, and I don’t think it should be necessary to call glBufferData every time. Am I doing something wrong here, or does it look like a driver issue?

Are your vertices changing each frame?

If you do a glBufferData each frame that’s no better than using VARS since you’re uploading the data to video memory each frame.

It’s possible that glMapBuffer loads the data to AGP memory where you can access it, but I’m not sure about this.

Is your object morphing or something? Can’t you get away with matrix transforms?

I’m only uploading vertices for tiles in the terrain that have changed their LOD or have just appeared in view. Tiles already in view are not touched.

I don’t think glMapBuffer loads the data only to AGP memory, since I can get video memory performance so long as I recreate the buffer before I call glMapBuffer.

If so, can you use glBufferSubDataARB? Dunno anything about it since I’ve never used it, but it might help if you’re only changing part of your data.

So, if I sum up correctly:
1- you build the terrain with glBufferData
2- now you fly over it with high performance for quite some time (a few seconds at least)
3- at some point you update your terrain with glMapBuffer
4- now you fly over it with low performance for quite some time (a few seconds at least)
5- every time you update your terrain again with glMapBuffer you still get low performance.

And if you replace 3 and 5 with glBufferData calls, the upload time is slower but the rendering time is faster, is that it?

Originally posted by ffish:
If so, can you use glBufferSubDataARB? Dunno anything about it since I’ve never used it, but it might help if you’re only changing part of your data.

Unfortunately not. I thought that might be a solution too, but when I tried it, it crashed, and the spec says:
“It is an INVALID_OPERATION error to call BufferSubDataARB to modify
the data store of a mapped buffer.”

Originally posted by vincoof:
So, if I sum up correctly:
1- you build the terrain with glBufferData
2- now you fly over it with high performance for quite some time (a few seconds at least)
3- at some point you update your terrain with glMapBuffer
4- now you fly over it with low performance for quite some time (a few seconds at least)
5- every time you update your terrain again with glMapBuffer you still get low performance.

And if you replace 3 and 5 with glBufferData calls, the upload time is slower but the rendering time is faster, is that it?

That’s pretty much it, but I don’t replace the glMapBuffer calls with glBufferData calls. I call glBufferData with a null pointer and still call glMapBuffer.

I create 90 buffers; each buffer holds the vertex data for one tile of terrain. Although the entire terrain is made up of 1024 tiles, no more than 90 will be visible at any one time. I fill the buffers with the vertex data for the visible tiles. As the user moves over the terrain or turns, the buffers that hold tiles that are out of view can have their vertex data replaced with tiles that have come into view. Usually only one or two buffers have their vertex data changed per frame.
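
The slot reuse described above can be sketched in plain C without any GL calls. This is only an illustration under stated assumptions: `TilePool`, `pool_init` and `pool_acquire` are hypothetical names, the visibility array is assumed to be filled in by the engine's culling code, and the original engine may pick slots differently.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SLOTS 90     /* number of buffers, per the post */
#define NUM_TILES 1024   /* number of terrain tiles, per the post */
#define NO_TILE   (-1)   /* marks a slot that holds no tile yet */

typedef struct {
    int slot_tile[NUM_SLOTS];  /* tile id held by each buffer slot */
} TilePool;

static void pool_init(TilePool *p)
{
    for (int i = 0; i < NUM_SLOTS; ++i)
        p->slot_tile[i] = NO_TILE;
}

/* Returns the slot that holds `tile`, claiming one if necessary.
 * *needs_upload is set to true only when the slot was just claimed,
 * i.e. its vertex data has to be rewritten. visible[t] tells whether
 * tile t is currently in view. Returns -1 if every slot holds a
 * visible tile. */
static int pool_acquire(TilePool *p, int tile,
                        const bool visible[NUM_TILES], bool *needs_upload)
{
    assert(tile >= 0 && tile < NUM_TILES);
    *needs_upload = false;

    int reusable = -1;
    for (int i = 0; i < NUM_SLOTS; ++i) {
        int held = p->slot_tile[i];
        if (held == tile)
            return i;  /* already resident: buffer is left untouched */
        if (reusable < 0 && (held == NO_TILE || !visible[held]))
            reusable = i;  /* first empty or out-of-view slot */
    }

    if (reusable >= 0) {
        p->slot_tile[reusable] = tile;
        *needs_upload = true;
    }
    return reusable;
}
```

Only the slots for which `needs_upload` comes back true would be bound and refilled in a frame, which matches the "one or two buffers per frame" figure above.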

It seems as though the glBufferData calls give me video memory, but as I move over the terrain and map and unmap the buffers (without calling glBufferData), the data is moved to AGP memory.

I have a similar problem with the most recent drivers…

When a static-draw VBO is created and filled a few times with glBufferSubData, it stays in video memory for some time, but after some iterations it seems that the VBO is moved to AGP memory and is not moved back to video memory, even if no further data updates are done. This behavior was introduced with newer drivers (I’m afraid I don’t presently know from which version on; if somebody is interested I will look it up once I’m back at work).
Maybe there is a problem with the driver’s memory management that prevents static VBOs that have been moved to AGP memory from being swapped back into video memory without reinitializing them.

Thanks, it’s good to hear it’s not just me.

You have 90+ VBOs and you set up glVertexPointer 90+ times per frame. This can be a drawback. Did you try putting the data in one big VBO and setting up glVertexPointer once? NVIDIA says the driver does a lot of work whenever you use VBOs and change the vertex pointer.

yooyo
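
For reference, the arithmetic behind the single-buffer suggestion can be sketched as follows. The 330240-byte size and the count of 90 come from the earlier posts; the function names are hypothetical. With a VBO bound, the pointer argument of glVertexPointer is interpreted as a byte offset into the buffer, so each tile slot would start at the offset computed here.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_BYTES 330240u  /* per-tile buffer size used in the posts */
#define TILE_SLOTS 90u      /* number of per-tile buffers in the posts */

/* Byte offset of one tile slot inside a single shared buffer. */
static size_t slot_byte_offset(unsigned slot)
{
    assert(slot < TILE_SLOTS);
    return (size_t)slot * TILE_BYTES;
}

/* Total size to request for the single shared buffer. */
static size_t shared_buffer_bytes(void)
{
    return (size_t)TILE_SLOTS * TILE_BYTES;
}
```

At 20 bytes per vertex, one slot holds 16512 vertices, and the shared buffer comes to 29,721,600 bytes (about 28.3 MiB). Whether a buffer of that size stays in video memory on a given driver is not something the thread establishes.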

Yes, but there is also a vendor-specific limit to the size of a VBO. Go over that limit and you get a serious performance drop.
90 buffer changes per frame doesn’t seem excessive… huh, otherwise we might as well be using VAR if we have to micro-manage our data within VBOs!!