ATI VAO beneficial for purely dynamic geometry?

I’ve never used VAO, so I’d like to know some things before starting the implementation. Maybe it’s not worth it and I could save some time.

This is an old-school rasterization lib which, unfortunately, has no object knowledge by design: it just takes triangles, does some processing and feeds them to the card. Typical geometry transfers range from 15 to 500 vertices (depending heavily on the client app).

Questions, questions, questions
1) Are ATI’s geometry transfer extensions worth the implementation hassle when dealing with dynamic vertex arrays (updated once per frame)?
2) Can they reasonably handle vertex arrays that are updated many times per frame, or should I resort to double/triple/whatever buffering of geometry to avoid stalls?
3) I’m inexperienced with this stuff. Right now I’m using EXT_compiled_vertex_array plus EXT_draw_range_elements to transfer dynamically updated vertex arrays. Should I expect a painful transition, or is there an easy way?

Reasoning:
I know I’m pretty much transfer bound. That’s why I’m looking for a better way. I was just wondering whether this can be overcome by using a VAO extension (which would alleviate cache contention issues too, as AFAIK AGP/Vid-Mem would be uncached anyway), or whether it’s a general problem with vertex arrays that are too small, one that cannot be solved.

#1: What makes you believe the library is transfer limited? How many polys are you pushing, and what kind of hardware do you have?

#2: If you’re not using relatively large vertex arrays, then state changes are probably going to be the problem.

#3: It is never a good idea to upload into an array that you’re rendering out of. Since VAO has no synchronization token associated with it, you can’t really tell when the card has finished reading from an array. So, I would suggest using several arrays.

#4: Why is the data so dynamic? Can’t you just store them in a static VAO?
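
The idea in #3 can be sketched without any GL calls at all. Below is a minimal round-robin ring in C, assuming three slots and a made-up slot size; VertexRing, ring_init and ring_upload are hypothetical names, not part of any extension. In a real implementation each slot would presumably be its own vertex array (or VAO buffer), and the draw call would read from the slot that was just filled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 3       /* triple buffering; the depth is an assumption */
#define SLOT_BYTES 16384   /* hypothetical per-slot capacity in bytes */

typedef struct {
    unsigned char storage[RING_SLOTS][SLOT_BYTES];
    size_t        used[RING_SLOTS]; /* bytes currently held by each slot */
    int           current;          /* slot written last, -1 before first use */
} VertexRing;

static void ring_init(VertexRing *r)
{
    assert(r != NULL);
    memset(r->used, 0, sizeof r->used);
    r->current = -1;
}

/* Copy one batch into the next slot instead of the one just drawn from.
 * Returns the slot index to render from, or -1 if the batch does not fit. */
static int ring_upload(VertexRing *r, const void *data, size_t bytes)
{
    assert(r != NULL && data != NULL);
    if (bytes > SLOT_BYTES)
        return -1;
    int next = (r->current + 1) % RING_SLOTS;
    memcpy(r->storage[next], data, bytes);
    r->used[next] = bytes;
    r->current = next;
    return next;
}
```

With three slots, a slot is only overwritten after two other batches have been submitted. That does not guarantee the card is done with it (there is still no sync token), it only makes a collision less likely.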

IMHO the use of VAO is highly recommended. I’m using it with a large number of static and dynamic arrays and it works well. The speed gain is really big.
To reuse the same array with different data, you can upload the data with GL_DISCARD_ATI. AFAIK this is similar to D3D vertex buffers: the driver is responsible for allocating a different block of memory if the previous one has not finished rendering. It is transparent to you.
Hope this helps.
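
To illustrate what the discard behaviour described above would mean, here is a toy model in C. It is not driver code and makes no GL calls; ToyBuffer, toy_update and toy_draw are hypothetical names, and the pool of four blocks is an arbitrary assumption. As far as I recall the extension spec, the real request goes through the last argument of glUpdateObjectBufferATI (GL_PRESERVE_ATI or GL_DISCARD_ATI), but check the spec before relying on that.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 4   /* hypothetical number of memory blocks in the pool */

typedef enum { UPDATE_PRESERVE, UPDATE_DISCARD } UpdateMode;

typedef struct {
    int gpu_busy[MAX_BLOCKS]; /* 1 while the card still reads this block */
    int current;              /* block the buffer handle currently refers to */
    int stalls;               /* how often the CPU had to wait for the card */
} ToyBuffer;

/* Model a draw call: the card starts reading the current block. */
static void toy_draw(ToyBuffer *b)
{
    assert(b != NULL);
    b->gpu_busy[b->current] = 1;
}

/* Model an update. With discard, a busy block is swapped for a free one;
 * otherwise (or if no block is free) the CPU waits until the card is done. */
static void toy_update(ToyBuffer *b, UpdateMode mode)
{
    assert(b != NULL);
    if (b->gpu_busy[b->current]) {
        int moved = 0;
        if (mode == UPDATE_DISCARD) {
            for (int i = 0; i < MAX_BLOCKS; ++i) {
                if (!b->gpu_busy[i]) {
                    b->current = i;   /* old contents are abandoned */
                    moved = 1;
                    break;
                }
            }
        }
        if (!moved) {
            b->stalls++;                  /* wait for the card to finish */
            b->gpu_busy[b->current] = 0;
        }
    }
    /* the new vertex data would be written into block b->current here */
}
```

In this model a draw followed by a preserving update costs one stall, while the same sequence with discard costs none, because the write lands in a different block.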

I’ve noticed that with this benchmark you can test VAO (static & dynamic). I have not seen the details but this guy has put the source on his website: http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/007493.html

In this topic, you can read that nVidia & ATI are working on a common extension, similar to VAO. So if you use VAO now, it will probably not be difficult to switch when it becomes available: http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/007473.html

Korval,

Re 1: If I replace the glDrawRangeElementsEXT call with a call to a display list containing one decently sized triangle (just to make sure the implementation applies state changes), my frame rates increase by somewhere between 20% and 100% (depends on client app). I’m pushing a maximum of about 1.5M vertices per second. That’s on a Radeon 8500 with an Athlon XP 1800+ (which even in immediate mode can do north of 6M vertices/s).

Re 2: Not a big problem. Or at least nothing I can change. I’ve made sure that all state changes are batched up, as well as all geometry transfers. I do have to change state a lot (obviously with these small geometry batches), but I already do it the best way I can.

Re 3: Yes, that does make sense. So maybe I should rather try to decouple updates and transfers by using multiple arrays. Should make a difference, even without VAO. After all, that might just be the problem.

Re 4: It just is
That lib has no notion of objects. It’s a rasterizer, as in ‘not a renderer’.
Geometry data comes in as transformed primitives. That’s a design limitation which I can’t do anything about.
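
As a rough back-of-the-envelope check on the Re 1 numbers: at 1.5M vertices per second, batches of 15 to 500 vertices work out to somewhere between 100,000 and 3,000 draw calls per second, i.e. a budget of roughly 10 to 333 microseconds per batch for everything, state changes included. This sketch assumes every batch has the same size, which real client apps won’t do, and the helper names are made up for illustration.

```c
/* Draw calls per second if every batch holds verts_per_batch vertices. */
static double batches_per_second(double verts_per_second, double verts_per_batch)
{
    return verts_per_second / verts_per_batch;
}

/* Average time available per batch, in microseconds. */
static double microseconds_per_batch(double verts_per_second, double verts_per_batch)
{
    return 1.0e6 / batches_per_second(verts_per_second, verts_per_batch);
}
```

At the small end of the range the per-call overhead has very little room, which fits the observation that removing the transfer call alone changes the frame rate so much.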

Thanks so far. #3 really got me thinking. Maybe the whole VAO thing won’t be necessary at all.

New question:
VAO would (if I still should try to use it, which I’m not so sure about atm) allow me to do uncached writes directly to card memory, right?

Originally posted by Cab:
To reuse the same array with different data, you can upload the data with GL_DISCARD_ATI. AFAIK this is similar to D3D vertex buffers: the driver is responsible for allocating a different block of memory if the previous one has not finished rendering. It is transparent to you.
Hope this helps.

That sounds really nice

Originally posted by zeckensack:
New question:
VAO would (if I still should try to use it, which I’m not so sure about atm) allow me to do uncached writes directly to card memory, right?

As you can see in this topic: http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/006981.html
(it is always good to use the search tool)
You need GL_ATI_map_object_buffer in conjunction with VAO to get a direct pointer to the memory.
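
A minimal sketch of how writing through such a mapped pointer might look, in C. The two function pointer types mirror glMapObjectBufferATI and glUnmapObjectBufferATI as I recall them from glext.h (on Windows the real typedefs also carry APIENTRY, and you would fetch them with wglGetProcAddress). write_vertices_mapped and the fake_* stand-ins are hypothetical and only there so the sketch runs without a GL context.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shapes of the two extension entry points (GLuint in both cases). */
typedef void *(*MapObjectBufferFn)(unsigned int buffer);
typedef void  (*UnmapObjectBufferFn)(unsigned int buffer);

/* Copy count floats into the mapped buffer.
 * Returns 1 on success, 0 if the driver refused to map the buffer. */
static int write_vertices_mapped(MapObjectBufferFn map_fn,
                                 UnmapObjectBufferFn unmap_fn,
                                 unsigned int buffer,
                                 const float *src, size_t count)
{
    assert(map_fn != NULL && unmap_fn != NULL && src != NULL);
    float *dst = (float *)map_fn(buffer);
    if (dst == NULL)
        return 0;
    /* Write front to back and never read from dst: if the memory really is
     * uncached, reads from it are very slow. */
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i];
    unmap_fn(buffer);
    return 1;
}

/* Stand-ins for the driver, for illustration only. */
static float fake_card_memory[64];
static int   fake_unmap_calls;

static void *fake_map(unsigned int buffer)
{
    (void)buffer;
    return fake_card_memory;
}

static void fake_unmap(unsigned int buffer)
{
    (void)buffer;
    fake_unmap_calls++;
}
```

Whether the pointer actually refers to uncached AGP or video memory is up to the driver, so treat that part as an assumption.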