Lower VAR/VBO performance with newer NVidia drivers



Adrian
01-31-2005, 11:56 AM
I noticed a while ago that the T&L performance of my terrain engine had dropped.

Back in 2003 my terrain engine was getting 95 million polys/sec with Detonator 45.23. I'm not sure at what point it changed, but with the latest drivers (67.66) I am getting 30 million polys/sec.

My hardware and the code have not changed. I have a 1.6 GHz AMD CPU and a GeForce FX 5900 Ultra.

I am using glDrawElements with vertex and texture coordinate pointers and VAR.

glVertexPointer(3, GL_FLOAT, 5 * sizeof(GLfloat), pMeshV);
glTexCoordPointer(2, GL_FLOAT, 5 * sizeof(GLfloat), pMeshV + 3);

glDrawElements(GLMode, count, GL_UNSIGNED_SHORT, pMeshI);
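
For readers following along, the two pointer calls above imply an interleaved layout of five floats per vertex: three for position, then two for the texture coordinate. The sketch below spells that out in plain C. The struct and function names are hypothetical and do not come from the original engine; it assumes a 4-byte float with no padding between members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct mirroring the layout implied by the pointer calls:
   3 position floats followed by 2 texture-coordinate floats. */
typedef struct {
    float position[3];
    float texcoord[2];
} TerrainVertex;

/* Stride to pass to both gl*Pointer calls: the size of one whole vertex. */
size_t terrain_vertex_stride(void)
{
    return sizeof(TerrainVertex);
}

/* Byte offset of the texture coordinate inside one vertex
   (the equivalent of pMeshV + 3 when pMeshV is a float pointer). */
size_t terrain_texcoord_offset(void)
{
    return offsetof(TerrainVertex, texcoord);
}

/* Bytes needed to store vertex_count interleaved vertices. */
size_t terrain_buffer_bytes(size_t vertex_count)
{
    return vertex_count * sizeof(TerrainVertex);
}
```

With this layout, every vertex fetch pulls both attributes from the same 20-byte record, so dropping the texture coordinate pointer reduces the bytes read per vertex even though the data stays in the buffer.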

The interesting thing I've found is that with the old drivers, if I removed the texture coordinate code, performance remained at 95 million polys/sec. With the newer drivers, removing it doubled performance from 30 to 60 million polys/sec. It's as though the newer drivers have a bottleneck when accessing the data from video/AGP memory.

I have ruled out any CPU or fill-rate bottleneck.

I thought maybe this was just a VAR issue, but I've just changed it to VBO and I get the same poor performance on the newer drivers (~30 million polys/sec).

Can anyone shed any light on this?

sqrt[-1]
01-31-2005, 01:34 PM
I read about a similar bug on another forum (using VAR, getting a big performance drop on the latest drivers).
Were you using VAR and requesting the buffers in video memory (i.e. not AGP memory)?

The issue turned out to be that the latest drivers are more optimized for VBO (as they should be), and VBOs seem to make use of AGP memory.

The end result was that the motherboard/AGP drivers were not installed on the tested machines -> poor AGP performance -> poor VAR/VBO performance.
(They seemed to indicate that VAR was giving AGP memory even when the old values for video memory were used.)

I'll dig out a link if you want...

Edit: Link
http://www.gamedev.net/community/forums/topic.asp?topic_id=291251&PageSize=25&WhichPage=1

Adrian
01-31-2005, 02:08 PM
Thanks for the link. Yes, on newer drivers VAR seems to be allocating AGP instead of video memory even though I'm requesting video memory.

I hope I'll be able to get the VBO performance up to VAR performance, though if it only ever uses AGP that's probably impossible.

Adrian
01-31-2005, 03:56 PM
OK, I have it working at 105 million polys/sec now, so VBO obviously does use video memory fine.

I was allocating too much memory, so the driver put the VBOs in AGP memory.

dorbie
02-01-2005, 12:50 PM
Originally posted by Adrian:
Ok, I have it working at 105 Million polys/sec now, VBO obviously does use video memory fine.

I was allocating too much memory so VBO made it AGP.
LOL... look at those numbers :-)

Makes me feel old.

Korval
02-01-2005, 02:44 PM
You're getting these on recent drivers?

Good. I'm looking into getting a GeForce 6600, so I was kinda hoping that their drivers hadn't degraded in performance.

knackered
02-02-2005, 06:15 AM
You want to see some of the new 3Dlabs cards - I'm getting 160 mil/sec with 1 directional light!

Ysaneya
02-02-2005, 06:20 AM
That's a very nice number. I guess it's only for the geometry part? I.e. without textures, advanced lighting, pixel shaders or geomorphing?

My own terrain engine in that case is getting between 60 and 80 MTris/sec, but that's on a Radeon 9700. I've never tested it on better GPUs. But next week I should get an X850 XT PE - it had better fly :)

Y.

Adrian
02-02-2005, 05:15 PM
Originally posted by Korval:
You're getting these on recent drivers?
Yes, 71.8.


Originally posted by Ysaneya:
That's a very nice number. I guess it's only for the geometry part? I.e. without textures, advanced lighting, pixel shaders or geomorphing?
It's rendering the geometry and a single 4kx4k texture. No advanced lighting or pixel shaders. The LOD code is mainly on, but the code that renders the joins between tiles of different LOD is off because it uses immediate mode :) That's next on my list to optimise. With the join rendering code on, performance drops to 90 MTris/sec. Fog is switched off.

Adrian
02-09-2005, 01:31 PM
I'm having a problem with VBO whereby over time performance drops from 100 MTris/sec to 30 MTris/sec.

It's as though the buffers are being moved from video memory to AGP memory when I bind and map a buffer.

I initialise the buffers like this:

glBindBufferARB(GL_ARRAY_BUFFER_ARB, bufferObject[i]);

glBufferDataARB(GL_ARRAY_BUFFER_ARB, 330240, NULL, GL_STATIC_DRAW_ARB);

I'm sure at this point the buffers are in video memory.

Before I refill the vertex buffers I do this:

glBindBufferARB(GL_ARRAY_BUFFER_ARB, bufferObject[VArrayNum]);

pMeshV2 = (float*)glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);

I think at this point the buffers are being moved to AGP memory.

If I recreate the buffer after the bind with this:

glBufferDataARB(GL_ARRAY_BUFFER_ARB, 330240, NULL, GL_STATIC_DRAW_ARB);

then performance stays high and I think the buffers are in video memory.

I would like to avoid continually recreating and initialising the buffers, as I think it is negatively affecting performance and I don't think it should be necessary to call glBufferData every time. Am I doing something wrong here, or does it look like a driver issue?

Aeluned
02-09-2005, 01:38 PM
Are your vertices changing each frame?

If you do a glBufferData each frame that's no better than using VARS since you're uploading the data to video memory each frame.

It's possible that glMapBuffer loads the data to AGP memory where you can access it, but I'm not sure about this.

Is your object morphing or something? Can't you get away with matrix transforms?

Adrian
02-09-2005, 01:56 PM
I'm only uploading vertices for tiles in the terrain that have changed their LOD or have just appeared in view. Tiles already in view are not touched.

I don't think glMapBuffer loads the data to AGP memory only, since I can get video memory performance so long as I recreate the buffer before I call glMapBuffer.

ffish
02-09-2005, 06:39 PM
If so, can you use glBufferSubDataARB? Dunno anything about it since I've never used it, but it might help if you're only changing part of your data.

vincoof
02-09-2005, 10:45 PM
So, if I sum up correctly:
1- you build the terrain with glBufferData
2- now you fly over it with high performance for quite some time (a few seconds at least)
3- at some point you update your terrain with glMapBuffer
4- now you fly over it with low performance for quite some time (a few seconds at least)
5- every time you update your terrain again with glMapBuffer you still get low performance.

And if you replace 3 and 5 with glBufferData calls, the upload time is slower but the rendering time is faster, is that it?

Adrian
02-09-2005, 11:19 PM
Originally posted by ffish:
If so, can you use glBufferSubDataARB? Dunno anything about it since I've never used it, but it might help if you're only changing part of your data.
Unfortunately not. I thought that might be a solution too, but when I tried it, it crashed, and the spec says
"It is an INVALID_OPERATION error to call BufferSubDataARB to modify
the data store of a mapped buffer."

Adrian
02-09-2005, 11:37 PM
Originally posted by vincoof:
So, if I sum up correctly:
1- you build the terrain with glBufferData
2- now you fly over it with high performance for quite some time (a few seconds at least)
3- at some point you update your terrain with glMapBuffer
4- now you fly over it with low performance for quite some time (a few seconds at least)
5- every time you update your terrain again with glMapBuffer you still get low performance.

And if you replace 3 and 5 with glBufferData calls, the upload time is slower but the rendering time is faster, is that it?
That's pretty much it, but I don't replace the glMapBuffer calls with glBufferData calls. I call glBufferData with a null pointer and still call glMapBuffer.

I create 90 buffers; each buffer holds the vertex data for one tile of terrain. Although the entire terrain is made up of 1024 tiles, no more than 90 will be visible at any one time. I fill the buffers with the vertex data for the visible tiles. As the user moves over the terrain or turns, the buffers that hold tiles that are out of view can have their vertex data replaced with tiles that have come into view. Usually only one or two buffers have their vertex data changed per frame.

It seems as though the glBufferData calls give me video memory, but as I move over the terrain and I map and unmap the buffers (without calling glBufferData), the data is moved to AGP.
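
The buffer reuse scheme described above (90 buffers shared among 1024 tiles) can be sketched as a small slot pool. This is only an illustration in plain C, not the engine's actual code: the `TilePool` type and the `pool_*` functions are hypothetical names, and the first-free-slot policy is an assumption. The OpenGL calls are left out so that the bookkeeping stands on its own; a comment marks where the bind, re-specify and map calls from the post would go.

```c
#include <assert.h>

#define NUM_TILES   1024  /* tiles in the whole terrain (figure from the post) */
#define NUM_BUFFERS 90    /* buffers, one visible tile each (figure from the post) */
#define NO_SLOT     (-1)

typedef struct {
    int tile_in_slot[NUM_BUFFERS]; /* tile held by each buffer, or NO_SLOT */
    int slot_of_tile[NUM_TILES];   /* buffer holding each tile, or NO_SLOT */
} TilePool;

void pool_init(TilePool *p)
{
    int i;
    for (i = 0; i < NUM_BUFFERS; ++i) p->tile_in_slot[i] = NO_SLOT;
    for (i = 0; i < NUM_TILES; ++i)   p->slot_of_tile[i] = NO_SLOT;
}

/* Free the buffer of a tile that has left the view (no-op if not resident). */
void pool_release(TilePool *p, int tile)
{
    int slot;
    assert(tile >= 0 && tile < NUM_TILES);
    slot = p->slot_of_tile[tile];
    if (slot != NO_SLOT) {
        p->tile_in_slot[slot] = NO_SLOT;
        p->slot_of_tile[tile] = NO_SLOT;
    }
}

/* Return the buffer slot for a visible tile, or NO_SLOT if all are in use.
   *needs_upload is set to 1 only when the tile was not already resident. */
int pool_acquire(TilePool *p, int tile, int *needs_upload)
{
    int slot;
    assert(tile >= 0 && tile < NUM_TILES);
    *needs_upload = 0;
    if (p->slot_of_tile[tile] != NO_SLOT)
        return p->slot_of_tile[tile];      /* already in a buffer: leave it alone */

    for (slot = 0; slot < NUM_BUFFERS; ++slot) {
        if (p->tile_in_slot[slot] == NO_SLOT) {
            p->tile_in_slot[slot] = tile;
            p->slot_of_tile[tile] = slot;
            /* The caller would now bind bufferObject[slot], optionally
               re-specify its storage with a NULL data pointer, then map
               the buffer and write the new vertices, as in the posts. */
            *needs_upload = 1;
            return slot;
        }
    }
    return NO_SLOT;
}
```

Tiles that stay in view never trigger an upload, which matches the description that usually only one or two buffers change per frame.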

Chuck0
02-09-2005, 11:45 PM
I have a similar problem with the most recent drivers...

When a static draw VBO is created and filled a few times with glBufferSubData, it stays in video memory for some time, but after some iterations it seems that the VBO is moved to AGP memory and not moved back to video memory, even if no further data updates are done... This behavior was introduced with newer drivers (I'm afraid I presently don't know from which version on... if somebody is interested I will look it up once I'm back at work).
Maybe there is a problem with the memory management system in the driver that prevents static VBOs that have been moved to AGP memory from being swapped back into video memory without reinitializing them.

Adrian
02-09-2005, 11:54 PM
Thanks, it's good to hear it's not just me.

yooyo
02-10-2005, 03:54 AM
You have 90+ VBOs and you set up glVertexPointer 90+ times per frame. This can be a drawback. Did you try putting the data in one big VBO and setting up glVertexPointer once? NVIDIA says the driver does a lot of work when you are using VBOs and change the vertex pointer.

yooyo

knackered
02-10-2005, 05:03 AM
Yes, but there is also a vendor-specific limit to the size of a VBO. You go over that limit and you get a serious performance drop.
90 buffer changes per frame doesn't seem excessive... huh, otherwise we might as well be using VAR if we have to micro-manage our data within VBOs!!

Adrian
02-10-2005, 09:05 AM
I don't think 90 glVertexPointer calls are causing a significant slowdown. Using one buffer would mean I would have to use int indices instead of shorts, which would negatively affect performance.
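
The point about index width can be checked with a little arithmetic. The sketch below assumes the 330240-byte buffers mentioned earlier in the thread hold the same 5-float interleaved vertices as the original pointer calls; that is an inference from the posted figures, not something the posts confirm. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* GL_UNSIGNED_SHORT indices address vertices 0..65535. */
#define MAX_VERTS_USHORT 65536u

/* Number of whole vertices that fit in a buffer of the given size. */
uint32_t vertices_in_buffer(uint32_t buffer_bytes, uint32_t floats_per_vertex)
{
    return buffer_bytes / (floats_per_vertex * (uint32_t)sizeof(float));
}

/* Non-zero when every vertex can be reached with a 16-bit index. */
int fits_in_ushort_indices(uint32_t vertex_count)
{
    return vertex_count <= MAX_VERTS_USHORT;
}
```

Under those assumptions, one buffer holds `vertices_in_buffer(330240, 5)` = 16512 vertices, which fits comfortably in 16-bit indices, while 90 such buffers merged into one would hold 1,486,080 vertices and would need 32-bit indices unless the vertex pointer were rebased per tile.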

I have another problem with VBOs. My skydome display list is now crashing, but only if I create an index buffer object for the terrain. If I comment out the following line (and obviously the terrain rendering code)

glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, NewNumIndices*2, &mesh_indices[i][0][0], GL_STATIC_DRAW_ARB);

there is no crash. The crash still occurs if the terrain rendering code is commented out and I still create the index buffer. Everything works perfectly if I don't call the display list.

This points to some conflict between the VBO index buffer and display lists.

ffish
02-10-2005, 01:59 PM
Originally posted by Adrian:
"It is an INVALID_OPERATION error to call BufferSubDataARB to modify
the data store of a mapped buffer."
But if you're using glBufferSubDataARB, doesn't that remove the need for a mapped buffer? I.e. use glBufferDataARB to initialise the data (once) then glBufferSubDataARB to change it as opposed to mapping the buffer and using the mapped memory to change the data.

Adrian
02-14-2005, 11:08 PM
Originally posted by ffish:
But if you're using glBufferSubDataARB, doesn't that remove the need for a mapped buffer? I.e. use glBufferDataARB to initialise the data (once) then glBufferSubDataARB to change it as opposed to mapping the buffer and using the mapped memory to change the data.
Sorry, I misunderstood you. Yes, using glBufferSubData instead of the mapped buffer method works fine.

knackered
02-15-2005, 12:38 AM
Doesn't using glBufferSubData remove any parallelism?
The app has to wait for the driver to copy the client data, doesn't it? Or does the driver copy the client data to some temporary area of AGP memory first?

Adrian
02-16-2005, 03:17 AM
I've got to the bottom of the crash I was experiencing when calling a display list after using VBOs.

I wasn't binding my element array buffer to zero after I finished rendering using VBO.

The following line fixed the crash.
glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);

It's not clear from the spec that this should be necessary, though I can see why it is.

"While a non-zero buffer object name is bound to
ELEMENT_ARRAY_BUFFER_ARB, DrawElements and DrawRangeElements source
their indices from that buffer object"

No mention of display lists.

yooyo
02-16-2005, 04:23 AM
Originally posted by Adrian:
I've got to the bottom of the crash I was experiencing when calling a display list after using VBOs.

I wasn't binding my element array buffer to zero after I finished rendering using VBO.

The following line fixed the crash.
glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);

It's not clear from the spec that this should be necessary, though I can see why it is.

"While a non-zero buffer object name is bound to
ELEMENT_ARRAY_BUFFER_ARB, DrawElements and DrawRangeElements source
their indices from that buffer object"

No mention of display lists.
I'm using FW 71.80 on both computers.

I have found the same bug (or feature?), but it works on a 6800GT/6800U and crashes on a 5600Go. I'm using some 3D text (created with wglUseFontOutlines). The result of this call is a set of display lists. I don't know how the character objects are created inside the wglUseFontOutlines call. Using glVertex calls, or vertex arrays and a glDrawElements call? Or does this depend on the hardware? Or maybe it is implemented in the driver?

yooyo

vincoof
02-16-2005, 02:13 PM
If glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0) fixes the crash, this probably means you've got some pointer out of bounds when rendering your triangles.

Adrian
02-16-2005, 04:04 PM
My display list consists of just immediate mode calls. The display list is presumably being compiled into glDrawElements calls that require that no VBOs are bound, otherwise pointers will be out of bounds.

The question is, is it the software developer's responsibility to turn off any VBO binding before calling a display list? If it is, then it should be in the spec; if it isn't, then it is a driver bug.

Korval
02-16-2005, 04:17 PM
Well, a display list is defined as a way of storing OpenGL commands to be used later. So, if you didn't use glDraw* in your display list, it can't do so either. Or, more accurately, it is a bug if it is at all evident that it is doing so rather than simply repeating the commands you built into the display list.

Essentially, if you replace your display list code with the actual call sequence you built into the display list, the results should be the same. If they are not, then it is a bug in the display list compiler.

Adrian
02-17-2005, 05:52 AM
Originally posted by Korval:
Essentially, if you replace your display list code with the actual call sequence you built into the display list, the results should be the same. If they are not, then it is a bug in the display list compiler.
I just tested it; it is a bug in the display list compiler.

Here are some screenshots of the engine in action. I'm using a 10m resolution DEM of Mt. Whitney and a texture created from the data. There are about 1.2 million polys on screen, and it's running at about 70M polys/sec because I increased the view distance and now my VBO buffers are too large to all fit in video memory.

http://www.mars3d.com/Images/Whitney1.jpg
http://www.mars3d.com/Images/Whitney1WF.jpg
http://www.mars3d.com/Images/Whitney2.jpg
http://www.mars3d.com/Images/Whitney2WF.jpg

zed
02-17-2005, 09:27 AM
Nice work Adrian, a couple of questions.
In the wireframe you're using quads; I assume in the filled version it's triangles or tristrips?
With quads you will get cracks at the edges.
Also, what size patches have you found to work best, 33x33, 65x65 or other?
How are the patches fitting together? Are you tessellating the edge of the lower-res patch?
ta zed

Adrian
02-17-2005, 01:36 PM
Originally posted by zed:
Nice work Adrian, a couple of questions.
In the wireframe you're using quads; I assume in the filled version it's triangles or tristrips?
With quads you will get cracks at the edges.
Also, what size patches have you found to work best, 33x33, 65x65 or other?
How are the patches fitting together? Are you tessellating the edge of the lower-res patch?
ta zed
I'm using tristrips; the patches are 128x128 at their maximum LOD and 16x16 at their lowest. I haven't tested any other sizes, I just guessed that would be close to optimum. Much smaller than that and you end up with too many tiles and the CPU overhead of dealing with that many tiles. Use bigger patch sizes and there is a lot of wasted GPU power on polys that are off screen; also, you have to use ints instead of shorts for the indices.
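
As an illustration of why patch size and index width are linked, here is a minimal sketch that builds 16-bit triangle strip indices for one square patch. It is not the engine's code: it assumes a patch of n x n quads stored as (n+1) x (n+1) vertices in row-major order, and it joins the row strips with repeated (degenerate) indices. The thread does not say how the author counts patch size or links strips, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Vertices in a patch of n x n quads, assuming (n+1) vertices per side. */
size_t patch_vertex_count(unsigned n)
{
    return ((size_t)n + 1) * ((size_t)n + 1);
}

/* Indices for n row strips of 2*(n+1) indices each, plus 2 repeated
   indices between consecutive rows to join them into a single strip. */
size_t strip_index_count(unsigned n)
{
    if (n == 0) return 0;
    return (size_t)n * 2 * ((size_t)n + 1) + 2 * ((size_t)n - 1);
}

/* Fill out[] with strip indices. Returns the number written, or 0 when the
   patch has too many vertices for 16-bit indices or out[] is too small. */
size_t build_patch_strip(unsigned n, uint16_t *out, size_t capacity)
{
    size_t verts_per_row = (size_t)n + 1;
    size_t row, col, k = 0;

    if (n == 0 || patch_vertex_count(n) > 65536) return 0;
    if (capacity < strip_index_count(n)) return 0;

    for (row = 0; row < n; ++row) {
        if (row > 0) {
            /* Repeat the last index of the previous row and the first index
               of this row; the resulting zero-area triangles draw nothing. */
            out[k] = out[k - 1];
            ++k;
            out[k++] = (uint16_t)(row * verts_per_row);
        }
        for (col = 0; col < verts_per_row; ++col) {
            out[k++] = (uint16_t)(row * verts_per_row + col);
            out[k++] = (uint16_t)((row + 1) * verts_per_row + col);
        }
    }
    return k;
}
```

Under these assumptions a 128x128 patch has 16641 vertices and fits in 16-bit indices, while a 256x256 patch has 66049 vertices and does not, so `build_patch_strip` returns 0 for it.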

This image shows how the tiles with different LOD are joined.
http://www.mars3d.com/Images/JoinWF.jpg

zed
02-17-2005, 04:17 PM
Seems similar to what I'm doing (displaying quads for the visual beauty in wireframe), though all my patches are the same size (33x33 IIRC).
http://uk.geocities.com/sloppyturds/stuff/landscape_LOD_wire.jpg

Ysaneya
02-17-2005, 11:22 PM
Same here. Using 33x33 or 65x65 patches. I've found this technique to be one of the most CPU/GPU friendly. On the other hand, you do not have topological optimizations (flat patches are as tessellated as mountainous ones), but I can live with that.

Y.

3B
02-18-2005, 12:17 AM
33x33 here also :) I also allow chopping them up a bit though, so that I have a bit more control over tessellation levels. Not sure how much effect that actually has on my poly counts though; I should probably test that at some point.


Originally posted by zed:
How are the patches fitting together? Are you tessellating the edge of the lower-res patch?
How do you organize the data for that? I store all the verts for a given patch in 1 VBO, then use the indices to drop points from the edge of the high res patch...seems like your way would require storing some extra verts somewhere?

zed
02-18-2005, 09:44 AM
Originally posted by 3B:
How do you organize the data for that? I store all the verts for a given patch in 1 VBO, then use the indices to drop points from the edge of the high res patch...seems like your way would require storing some extra verts somewhere?
The number of verts etc. is a non-issue; I've given up caring about them since the GF2 MX's 20 mil/sec.
My method:
Each patch has its own verts, texture data etc. Like I said, the number of verts is a non-issue (vertex popping is very hard to spot; it's the texture popping that you need to concern yourself with). You can happily increase the visibility to 128 km without much of a slowdown; the killer, though, is the extra textures that need to be generated.
---- from my webpage----
Basically you can wander around wherever you want without the terrain repeating (though it all looks similar with the current noise generation method I'm using). All terrain textures/verts etc. are generated whilst you move around with no visible slowdowns (I spread out the work over multiple frames). In the above shot there are 16 vertices to every square meter; shading is 16x that resolution, i.e. 0.0625m x 0.0625m. The visibility above is, I believe, 2 km (increasing/decreasing it has a minor impact on the framerate). I'm using LOD with very little popping during movement.
-------

Adrian
02-22-2005, 03:22 AM
Originally posted by Ysaneya:
My own terrain engine in that case is getting between 60 and 80 MTris/sec, but that's on a Radeon 9700. I've never tested it on better GPUs. But next week i should get a X850 XT PE - it better fly
Did you get the X850? Did it make a lot of difference to the performance?

Adrian
02-24-2005, 06:55 AM
I just uploaded a demo; I'd be grateful if some people would test it out before I publicly release it. I'm interested in the TPS (tris/sec) on the newer ATI/NVIDIA cards.

Thanks.

http://www.mars3d.com/Software/EarthExplorer.zip

AlexN
02-24-2005, 08:22 AM
Crashed after a few ticks into the loading.
Radeon 9800 Pro, 5.1 drivers

Adrian
02-24-2005, 09:19 AM
How many ticks (dots) were shown before it crashed? Thanks.

AlexN
02-24-2005, 10:32 AM
Crashed once or twice with 2 visible, once with 4, and several times with 3 visible.

Adrian
02-24-2005, 11:59 AM
Goodness knows what's going on there then.

Anyone got it to work?

yooyo
02-24-2005, 01:08 PM
6800U with FW 75.90.
TPS is from 90 to 113M.

5600Go with FW 71.80
TPS is ~30M
yooyo

Chuck0
02-24-2005, 11:32 PM
Originally posted by Adrian:
How many ticks(dots) before it crashed. Thanks.
4 dots are displayed, then it crashes. Catalyst 5.1 drivers with a Radeon 9700 Pro.

Adrian
02-28-2005, 09:23 AM
Thanks for the feedback guys. Looks like I have a bug that only manifests itself on ATI hardware... again. I keep having this problem, so I'm going to have to buy a 9600.