Recently I tried to implement VAO support, but it is terribly slow (about 1 fps and 200k tri/sec with indexed strips). The example app from ati.com is completely useless: it renders two triangles at 800 fps, and my code can render two triangles at 800 fps as well. Has anyone implemented fast geometry transfers with VAO? If so, how?
Here is what I tried:
create 32 buffers (100 KB each) with DYNAMIC_ATI flag
for each triangle batch
{
update buffer data with DISCARD_ATI flag
render triangles with glDrawElements
}
Yeah, it’s slow (about the same speed as Vertex Arrays) if you update the buffer every frame. Not sure if it’s a driver thing, or if it’s just the way the extension works.
You need to use a separate extension that lets you write directly into the buffer; otherwise you get an extra copy step induced by the driver. 200 ktri/second doesn’t sound right, though. It sounds as if you’re getting lots of partial evictions into AGP or some similar nonsense.
> Yeah, it’s slow (about the same speed as Vertex Arrays) if you update the buffer every frame. Not sure if it’s a driver thing, or if it’s just the way the extension works.
It’s actually much slower than VA. With VA I get about 2M tri/sec.
Has anyone else tested this?
Is it true that updating the vertex array object every frame is slower than normal vertex arrays?
I have read the spec and this whitepaper (http://www.ati.com/na/pages/resource_centre/dev_rel/ATIVertexArrayObject.pdf) and it seems that you can do it using GL_DYNAMIC_ATI when creating the object and glUpdateObjectBufferATI() every frame.
(I presume that you are creating the objects just one time at the beginning of the program and not every frame…)
What extension do you mean? I believe that VAO should work fast by itself, without any additional extensions.
The VAO extension only lets you point the driver at data you already have in system memory. Thus, the driver will have to copy the data into the VAO. If you change your data every frame, you get a lot of unnecessary memory traffic:
read source data
transmogrify your data
write destination buffer in system memory
read system memory buffer
write AGP memory
Meanwhile, nVIDIA’s VAR extension lets you do this:
read source data
transmogrify your data
write AGP memory
Note that the extra overhead doesn’t matter for static data (data you only update once, and then leave in the buffer), only dynamic data.
I believe there’s another ATI extension which allows you to get a pointer into a VAO, but I can’t find it among the public specifications.
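To put rough numbers on the two lists above, here is a minimal sketch in C that just counts bytes crossing the memory bus per frame. It assumes the transformed data is the same size as the source and that the transform itself adds no memory traffic; `UploadPath` and `bytes_moved_per_frame` are names made up for this illustration, not part of any extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the two upload paths described above. */
typedef enum {
    PATH_DRIVER_COPY,  /* VAO: app writes system memory, driver copies to AGP */
    PATH_DIRECT_WRITE  /* VAR: app writes straight into AGP memory */
} UploadPath;

/* Bytes read plus bytes written per frame for `vertex_bytes` of dynamic data.
 * Driver copy:  read source, write system buffer, read it back, write AGP.
 * Direct write: read source, write AGP.
 * ASSUMPTION: output size equals input size. */
size_t bytes_moved_per_frame(UploadPath path, size_t vertex_bytes)
{
    assert(path == PATH_DRIVER_COPY || path == PATH_DIRECT_WRITE);
    return (path == PATH_DRIVER_COPY ? 4 : 2) * vertex_bytes;
}
```

For one 100 KB batch (102400 bytes) this gives 409600 bytes with the driver copy and 204800 bytes with direct writes, so the copy path moves twice the data before anything is drawn.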
Perhaps that’s what that new GL_ATI_map_object_buffer extension does…
I believe this scheme holds only if you are not doing any writes to other memory areas, because if you do, the CPU write combiners will be flushed. And what about writing to some intermediate variables? Does that cause a write combiner flush?
My point in using VAO is to avoid redundant data copies when doing multipass on dynamic data.
Originally posted by h2:
Then please post your results here.
I ran a quick test this morning… I modified the SimpleVAO example from ATI to draw a sphere using 5120 tris (just vertices and normals, no texturing) and 15360 vertices (it is a quick test, so there is no vertex sharing).
Using a static object (glNewObjectBufferATI(objectSize, verts, GL_STATIC_ATI)) works OK. It returns 1 as the vertex handle and the app runs at ~600 FPS.
Using a dynamic object (glNewObjectBufferATI(objectSize, verts, GL_DYNAMIC_ATI) or glNewObjectBufferATI(objectSize, NULL, GL_DYNAMIC_ATI)) returns 0, which means that the buffer can’t be allocated!
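As a sanity check on the numbers in this test, a small sketch in C. The vertex count follows directly from the post; the buffer size assumes each vertex is three floats of position plus three floats of normal, tightly packed, which the post does not state. All function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* With no vertex sharing, every triangle carries its own 3 vertices. */
unsigned long unshared_vertex_count(unsigned long triangles)
{
    assert(triangles <= ULONG_MAX / 3UL);  /* guard against overflow */
    return triangles * 3UL;
}

/* Triangle throughput implied by a triangle count and a frame rate. */
unsigned long triangles_per_second(unsigned long triangles, unsigned long fps)
{
    return triangles * fps;
}

/* Buffer size in bytes, ASSUMING tightly packed float attributes. */
size_t vertex_buffer_bytes(unsigned long vertices, unsigned floats_per_vertex)
{
    return (size_t)vertices * floats_per_vertex * sizeof(float);
}
```

5120 triangles give 15360 vertices, matching the post, and ~600 FPS works out to roughly 3 million tri/sec for the static case. Under the six-floats assumption (and 4-byte floats) the buffer would be 368640 bytes.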
> I believe this scheme is true if you are
> not doing any writes to other memory
> areas. Because if you do, CPU write
> combiners will be flushed. And what about
> writing to some intermediate variables?
> Does that cause a write combiner flush?
You clearly have to manage your caches and write combiners (nee “line fetch buffers”) correctly. If you spit out an aligned block of 32 bytes (or better yet, 64 bytes) at a time, then that will write combine correctly no matter what the other memory traffic is.
Then you can make sure to manage your caches correctly. As there’s 8 kB of Dcache on a Pentium IV, you probably don’t want to use more than 4 kB of “auxiliary data” (stack + coefficients + whatever), and process your vertices in 4 kB input data chunks, doing a full pre-read (not just pre-fetch) of the data, so you know it’s going to sit in L1 and not confuse your LFBs.
If your data is very scattered, you should probably look into a way to make it less so.
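The chunking and block-write pattern described above can be sketched in plain C. This is only an illustration under several assumptions: `dst` stands in for a pointer into AGP memory (however you obtained it), the "transform" is a placeholder scale, and `stream_scaled` plus the macros are made-up names. Plain C cannot guarantee that the 64-byte copy is issued as a single write-combined burst; real code would more likely use streaming store instructions for that step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES ((size_t)64)     /* one write-combine sized block */
#define CHUNK_BYTES ((size_t)4096)   /* input chunk meant to fit in L1 */
#define FLOATS_PER_BLOCK (BLOCK_BYTES / sizeof(float))
#define FLOATS_PER_CHUNK (CHUNK_BYTES / sizeof(float))

/* Scales `count` floats from src into dst. dst must be 64-byte aligned
 * and count must be a whole number of 64-byte blocks. */
void stream_scaled(float *dst, const float *src, size_t count, float scale)
{
    assert((uintptr_t)dst % BLOCK_BYTES == 0);
    assert(count % FLOATS_PER_BLOCK == 0);

    for (size_t base = 0; base < count; base += FLOATS_PER_CHUNK) {
        size_t left = count - base;
        size_t n = left < FLOATS_PER_CHUNK ? left : FLOATS_PER_CHUNK;

        /* Full pre-read: touch one float per 64-byte line of this chunk
         * before any output is written. */
        volatile float sink = 0.0f;
        for (size_t i = 0; i < n; i += FLOATS_PER_BLOCK)
            sink = src[base + i];
        (void)sink;

        /* Build each output block locally, then emit it as one aligned
         * 64-byte write to the destination. */
        for (size_t i = 0; i < n; i += FLOATS_PER_BLOCK) {
            float block[FLOATS_PER_BLOCK];
            for (size_t k = 0; k < FLOATS_PER_BLOCK; ++k)
                block[k] = src[base + i + k] * scale;
            memcpy(dst + base + i, block, BLOCK_BYTES);
        }
    }
}
```

Because a chunk is a multiple of the block size, every write to `dst` starts on a 64-byte boundary as long as `dst` itself does.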
> My point of using VAO is to avoid
> redundant data copies when doing multipass
> on dynamic data.
You could get the same win with LockArraysEXT(), assuming the Radeon driver copies data to AGP memory when you lock and/or re-set the array pointers (which it should). However, in both VAO and LockArrays cases, you’ll have one redundant copy into system memory where you are generating your data.
I think you are running into a synchronization issue that has since been fixed. If this is the issue, I believe you may be able to double buffer the VAOs on your own to avoid the sync. Just allocate 64 instead of 32 and use half every other frame. Would it be possible to get a copy of the app to confirm that this is fixed?
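The double-buffering idea above comes down to choosing which of the 64 buffers a batch writes to. A minimal sketch in C, assuming 32 batches per frame as in the original post; `vao_slot` and the constants are hypothetical names for illustration, and the returned index would select from an array of buffer handles created once at startup:

```c
#include <assert.h>

/* ASSUMPTION: 32 batches per frame, as in the original post. */
enum { BATCHES_PER_FRAME = 32, BUFFER_SETS = 2 };

/* Hypothetical helper: which of the 64 buffers should batch `batch` of
 * frame `frame` update? Even frames use slots 0..31, odd frames use
 * slots 32..63, so a buffer filled in one frame is not touched again
 * until two frames later. */
unsigned vao_slot(unsigned frame, unsigned batch)
{
    assert(batch < BATCHES_PER_FRAME);
    return (frame % BUFFER_SETS) * BATCHES_PER_FRAME + batch;
}
```

Whether this actually avoids the stall depends on the driver; it only guarantees that the buffer being updated is never one that was drawn from in the previous frame.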
As for the question about MapObjectBuffer, it is implemented. I think we held off publishing the spec publicly because of the possibility of a minor 11th hour revision to the interface.
-Evan
Oh, and as for the demo, it was intended to be as simple as possible to avoid confusing people who just wanted to learn VAO. If you think we went too far on that one, that sort of feedback is good to help us decide what type of content to include.
[This message has been edited by ehart (edited 02-20-2002).]
Probably not. There is no way to tell how a particular ICD is managing CVA. Just look at my results for CVA; they are actually slower than VA. Unfortunately, all that IHVs need is to make Q3 run fast, nothing else.
Originally posted by kehziah:
Note that the driver I have is more recent than the one you used (1.3.2483 vs 1.3.2475). Streaming VAO has pretty good performance.
Kehziah,
How can I find the driver version? I looked under Display Properties -> Settings -> Advanced and found 5.13.01.6015. This number doesn’t look like yours…
Thank you.
[This message has been edited by Zak McKrakem (edited 02-21-2002).]