I noticed a while ago that the T&L performance of my terrain engine had dropped.
Back in 2003 my terrain engine was getting 95 million polys/sec with the Detonator 45.23 drivers. I'm not sure at what point it changed, but with the latest drivers (67.66) I am getting 30 million polys/sec.
My hardware and the code have not changed. I have a 1.6 GHz AMD CPU and a GeForce FX 5900 Ultra.
I am using glDrawElements with vertex and texture coordinate pointers and VAR (NV_vertex_array_range).
The interesting thing I've found is that with the old drivers, removing the texture coordinate code made no difference: performance stayed at 95 million polys/sec. With the newer drivers, removing it doubles performance from 30 to 60 million polys/sec. It's as though the newer drivers have a bottleneck when fetching the data from video/AGP memory.
I have ruled out any CPU or fill-rate bottleneck.
I thought maybe this was just a VAR issue, but I've just changed the code to use VBOs and I get the same poor performance on the newer drivers, around 30 million polys/sec.
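As a rough sanity check on the bandwidth theory, the quoted triangle rates can be converted into vertex-fetch bandwidth. The sketch below is only an estimate under assumptions that are not stated in the post: 12 bytes of position plus 8 bytes of texture coordinates per vertex, and on average about one vertex fetched per triangle. The function name is made up for illustration.

```c
#include <assert.h>

/* ASSUMPTION (not from the post): 3 floats of position and 2 floats of
   texture coordinates per vertex. */
enum { POSITION_BYTES = 3 * 4, TEXCOORD_BYTES = 2 * 4 };

/* Estimated vertex-fetch bandwidth in MB/s (1 MB = 10^6 bytes).
   mtris_per_sec : triangle rate in millions of triangles per second
   verts_per_tri : average vertices fetched per triangle (ASSUMPTION: ~1.0) */
double fetch_mb_per_sec(double mtris_per_sec, double verts_per_tri,
                        int bytes_per_vertex)
{
    assert(bytes_per_vertex > 0);
    return mtris_per_sec * verts_per_tri * (double)bytes_per_vertex;
}
```

Under those assumptions, 30 Mtris/sec with texture coordinates comes to about 600 MB/s and 60 Mtris/sec without them to about 720 MB/s, so the newer drivers behave as if capped near a similar byte rate. The old figure of 95 Mtris/sec with texture coordinates would need about 1900 MB/s, which is close to the roughly 2.1 GB/s theoretical peak of AGP 8x and suggests the data was being read from video memory back then.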
I read about a similar bug on another forum (using VAR and seeing a big performance drop with the latest drivers).
Were you using VAR and requesting the buffers in video memory (i.e. not AGP memory)?
The issue turned out to be that the latest drivers are more optimized for VBOs (as they should be), and VBOs seem to make use of AGP memory.
The end result was that the motherboard/AGP drivers were not installed on the tested machines, which meant poor AGP performance and therefore poor VAR/VBO performance.
(They seemed to indicate that VAR was handing out AGP memory even when the old parameter values for video memory were used.)
That's a very nice number. I guess it's only for the geometric part? I.e. without textures, advanced lighting, pixel shaders or geomorphing?
My own terrain engine gets between 60 and 80 MTris/sec in that case, but that's on a Radeon 9700. I've never tested it on better GPUs. Next week I should get an X850 XT PE, though, and it had better fly.
Originally posted by Korval: You’re getting these on recent drivers?
Yes, 71.8.
Originally posted by Ysaneya: That's a very nice number. I guess it's only for the geometric part? I.e. without textures, advanced lighting, pixel shaders or geomorphing?
It's rendering the geometry and a single 4k x 4k texture. No advanced lighting or pixel shaders. The LOD code is mostly on, but the code that renders the joins between tiles of different LOD is off because it uses immediate mode. That's next on my list to optimise. With the join rendering code on, performance drops to 90 MTris/sec. Fog is switched off.
then performance stays high and I think the buffers are in video memory.
I would like to avoid continually recreating and initialising the buffers, as I think it is negatively affecting performance, and I don't think it should be necessary to call glBufferData every time. Am I doing something wrong here, or does it look like a driver issue?
I'm only uploading vertices for tiles in the terrain that have changed their LOD or have just appeared in view. Tiles already in view are not touched.
I don't think glMapBuffer only ever places the data in AGP memory, since I get video-memory performance as long as I recreate the buffer before I call glMapBuffer.
So, if I sum up correctly:
1- you build the terrain with glBufferData
2- now you fly over it with high performance for quite some time (a few seconds at least)
3- at some point you update your terrain with glMapBuffer
4- now you fly over it with low performance for quite some time (a few seconds at least)
5- every time you update your terrain again with glMapBuffer you still get low performance.
And if you replace 3 and 5 with glBufferData calls, the upload is slower but the rendering is faster, is that it?
Originally posted by ffish: If so, can you use glBufferSubDataARB? Dunno anything about it since I’ve never used it, but it might help if you’re only changing part of your data.
Unfortunately not. I thought that might be a solution too, but when I tried it, it crashed, and the spec says:
“It is an INVALID_OPERATION error to call BufferSubDataARB to modify
the data store of a mapped buffer.”
Originally posted by vincoof:
So, if I sum up correctly:
1- you build the terrain with glBufferData
2- now you fly over it with high performance for quite some time (a few seconds at least)
3- at some point you update your terrain with glMapBuffer
4- now you fly over it with low performance for quite some time (a few seconds at least)
5- every time you update your terrain again with glMapBuffer you still get low performance.
And if you replace 3 and 5 with glBufferData calls, the upload is slower but the rendering is faster, is that it?
That's pretty much it, but I don't replace the glMapBuffer calls with glBufferData calls. I call glBufferData with a null pointer and then still call glMapBuffer.
I create 90 buffers, and each buffer holds the vertex data for one tile of terrain. Although the entire terrain is made up of 1024 tiles, no more than 90 will be visible at any one time. I fill the buffers with the vertex data for the visible tiles. As the user moves over the terrain or turns, the buffers that hold tiles that are now out of view can have their vertex data replaced with tiles that have come into view. Usually only one or two buffers have their vertex data changed per frame.
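The bookkeeping described above (90 buffers recycled across 1024 tiles) can be sketched as a small slot pool. This is an illustration only, not the poster's code: `TilePool`, `pool_acquire` and the other names are hypothetical, and the free-slot search is a plain linear scan. No GL calls are made; a real renderer would upload the tile's vertices into the buffer for the returned slot whenever `needs_upload` is set.

```c
#include <assert.h>

enum { NUM_SLOTS = 90, NUM_TILES = 1024, NO_TILE = -1, NO_SLOT = -1 };

typedef struct TilePool {
    int tile_in_slot[NUM_SLOTS];  /* tile held by each buffer, or NO_TILE */
    int slot_of_tile[NUM_TILES];  /* buffer holding each tile, or NO_SLOT */
} TilePool;

void pool_init(TilePool *p)
{
    int i;
    for (i = 0; i < NUM_SLOTS; ++i) p->tile_in_slot[i] = NO_TILE;
    for (i = 0; i < NUM_TILES; ++i) p->slot_of_tile[i] = NO_SLOT;
}

/* A tile went out of view: free its buffer for reuse. */
void pool_release(TilePool *p, int tile)
{
    int slot;
    assert(tile >= 0 && tile < NUM_TILES);
    slot = p->slot_of_tile[tile];
    if (slot != NO_SLOT) {
        p->tile_in_slot[slot] = NO_TILE;
        p->slot_of_tile[tile] = NO_SLOT;
    }
}

/* A tile is in view: return its buffer slot. *needs_upload is set to 1
   only when the slot was newly assigned, so resident tiles are not
   touched. Returns NO_SLOT if every buffer is in use. */
int pool_acquire(TilePool *p, int tile, int *needs_upload)
{
    int slot;
    assert(tile >= 0 && tile < NUM_TILES);
    *needs_upload = 0;
    if (p->slot_of_tile[tile] != NO_SLOT)
        return p->slot_of_tile[tile];
    for (slot = 0; slot < NUM_SLOTS; ++slot) {
        if (p->tile_in_slot[slot] == NO_TILE) {
            p->tile_in_slot[slot] = tile;
            p->slot_of_tile[tile] = slot;
            *needs_upload = 1;
            return slot;
        }
    }
    return NO_SLOT;
}

/* Usage: tile 7 comes into view, is drawn again the next frame, then
   leaves and tile 500 takes over its buffer. Returns the upload count. */
int pool_demo(void)
{
    TilePool pool;
    int uploads = 0, needs_upload, slot_a, slot_b;
    pool_init(&pool);
    slot_a = pool_acquire(&pool, 7, &needs_upload);
    uploads += needs_upload;
    slot_b = pool_acquire(&pool, 7, &needs_upload);   /* already resident */
    uploads += needs_upload;
    assert(slot_a == slot_b);
    pool_release(&pool, 7);
    slot_b = pool_acquire(&pool, 500, &needs_upload); /* reuses the slot */
    uploads += needs_upload;
    assert(slot_a == slot_b);
    return uploads;
}
```

Only newly visible tiles trigger an upload, which matches the "one or two buffers per frame" behaviour described in the post.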
It seems as though the glBufferData calls give me video memory, but as I move over the terrain and map and unmap the buffers (without calling glBufferData), the data is moved to AGP memory.
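The update sequence described in this post (respecify the data store with a null pointer, then map, fill and unmap) can be sketched as follows. So that it compiles without a GL context, the three buffer calls go through a hypothetical `BufferOps` table, and `MemBuffer` is an in-memory stand-in added purely for illustration. In a real renderer, with the tile's buffer bound, the three hooks would presumably wrap `glBufferDataARB(GL_ARRAY_BUFFER_ARB, size, NULL, usage)`, `glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB)` and `glUnmapBufferARB(GL_ARRAY_BUFFER_ARB)`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical indirection over the three buffer calls. */
typedef struct BufferOps {
    void  (*respecify)(void *ctx, size_t size); /* BufferData(size, NULL)      */
    void *(*map)(void *ctx);                    /* MapBuffer, NULL on failure  */
    int   (*unmap)(void *ctx);                  /* UnmapBuffer, 0 = data lost  */
} BufferOps;

/* Upload one tile: respecify the store with no data, then map and fill.
   Returns 1 on success, 0 if mapping failed or the unmap reported loss. */
int upload_tile(const BufferOps *ops, void *ctx, const void *verts, size_t size)
{
    void *dst;
    assert(ops != NULL && verts != NULL && size > 0);
    ops->respecify(ctx, size);   /* the previous data store is discarded */
    dst = ops->map(ctx);
    if (dst == NULL)
        return 0;
    memcpy(dst, verts, size);
    return ops->unmap(ctx);      /* on 0 the caller should upload again */
}

/* In-memory stand-in for a buffer object (illustration only). */
typedef struct MemBuffer {
    unsigned char store[256];
    size_t size;
    int mapped;
    int respecify_calls;
} MemBuffer;

static void mem_respecify(void *ctx, size_t size)
{
    MemBuffer *b = (MemBuffer *)ctx;
    assert(size <= sizeof b->store && !b->mapped);
    b->size = size;
    b->respecify_calls++;
}

static void *mem_map(void *ctx)
{
    MemBuffer *b = (MemBuffer *)ctx;
    if (b->mapped)
        return NULL;             /* mapping a mapped buffer is an error */
    b->mapped = 1;
    return b->store;
}

static int mem_unmap(void *ctx)
{
    MemBuffer *b = (MemBuffer *)ctx;
    if (!b->mapped)
        return 0;
    b->mapped = 0;
    return 1;
}

/* Usage: one upload through the stand-in. Returns the number of
   respecify calls if the data arrived intact, -1 otherwise. */
int demo_upload(void)
{
    static const float tile[6] = { 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f };
    MemBuffer buf = { {0}, 0, 0, 0 };
    BufferOps ops = { mem_respecify, mem_map, mem_unmap };
    if (!upload_tile(&ops, &buf, tile, sizeof tile))
        return -1;
    return memcmp(buf.store, tile, sizeof tile) == 0 ? buf.respecify_calls : -1;
}
```

Whether respecifying the store on every update actually keeps the data in video memory depends on the driver; the sketch only shows the call order, not a guaranteed fix.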
I have a similar problem with the most recent drivers…
When a STATIC_DRAW VBO is created and filled a few times with glBufferSubData, it stays in video memory for some time, but after some iterations it seems that the VBO is moved to AGP memory and is not moved back to video memory even if no further data updates are done. This behavior was introduced with newer drivers (I'm afraid I don't presently know from which version on; if somebody is interested I will look it up once I'm back at work).
Maybe there is a problem with the memory management system in the driver that prevents static VBOs that have been moved to AGP memory from being swapped back into video memory without reinitializing them.
You have 90+ VBOs and you set up glVertexPointer 90+ times per frame. This can be a drawback. Did you try putting the data in one big VBO and setting up glVertexPointer once? NVIDIA says the driver does a lot of work whenever you change the vertex pointer while using VBOs.
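To get a feel for the single-VBO suggestion, here is a small sketch of the layout arithmetic. The tile size and vertex format are assumptions that do not come from the thread (33 x 33 vertices per tile, 20 bytes per vertex), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTIONS (not from the thread): 33 x 33 vertices per tile and
   20 bytes per vertex (3 floats position + 2 floats texcoord). */
enum { SLOTS = 90, VERTS_PER_TILE = 33 * 33, BYTES_PER_VERTEX = 20 };

/* Index of the first vertex of a slot inside the shared buffer. */
size_t slot_first_vertex(int slot)
{
    assert(slot >= 0 && slot < SLOTS);
    return (size_t)slot * VERTS_PER_TILE;
}

/* Byte offset of a slot, e.g. for a sub-range update of that tile. */
size_t slot_byte_offset(int slot)
{
    return slot_first_vertex(slot) * BYTES_PER_VERTEX;
}

/* Total size of the shared buffer. */
size_t shared_buffer_bytes(void)
{
    return (size_t)SLOTS * VERTS_PER_TILE * BYTES_PER_VERTEX;
}

/* With a single vertex pointer, indices address the whole buffer, so
   16-bit indices only work if every vertex index fits in 0..65535. */
int fits_ushort_indices(void)
{
    return (size_t)SLOTS * VERTS_PER_TILE <= 65536u;
}
```

With these assumed numbers the shared buffer is a little under 2 MB, but it holds 98,010 vertices, so 16-bit indices could not reach every tile from one vertex pointer. The tiles would need 32-bit indices, or the pointer would have to be set once per group of tiles rather than strictly once per frame.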
Yes, but there is also a vendor-specific limit to the size of a VBO. Go over that limit and you get a serious performance drop.
90 buffer changes per frame doesn't seem excessive… huh, otherwise we might as well be using VAR if we have to micro-manage our data within VBOs!!