VAR & VA & Immediate mode

okapota · June 5, 2002, 4:05am

I read that mixing VAR with the other two will yield poor results. Does mixing mean disabling VAR and then drawing in immediate mode in the same frame? I guess so, just want to be sure.

AdrianD · June 5, 2002, 4:12am

Did you use GL_NV_vertex_array_range2?
The main difference from the original extension is a new token for glEnableClientState/glDisableClientState:

glDisableClientState(GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV);

Unless you use it, you will never get acceptable performance when mixing VAR with other modes, because you will always get a VAR flush when you disable it…

okapota · June 5, 2002, 9:49am

So if I have lots of meshes, and only half of them fit in AGP memory at the same time, won’t it be faster to draw what I can with VAR and the rest, or at least part of the rest, with regular VA instead of memcpying every frame?

AdrianD · June 5, 2002, 10:11am

If you want to mix VAR with other VAs, use NV_vertex_array_range2 and you will get good performance.
If your geometry does not fit in your vertex memory, I suggest using the NV_fence extension, which gives you a good way to synchronize your memcpys with your rendering (so you can upload new geometry while the GPU is rendering another batch).
Take a look at the VAR_simple example in the NVIDIA OpenGL SDK to see how it works.

If I can get a VAR buffer plus the fence extension, I always draw everything using VAR, because then I control what is stored where (and when). Anyway, if you are using standard VAs the driver must still copy your vertex buffers into video (or AGP) memory when DrawElements is called… but you have to wait for it, without a chance to do something else in the meantime.
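
A minimal sketch of the fence idea, not the SDK's code: the VAR block is split into a few chunks, each guarded by its own fence, and the CPU only waits when it is about to overwrite a chunk the GPU may still be reading. The chunk sizes, the `ChunkRing` type, and the `stub_*` functions are assumptions made for illustration; the stubs stand in for glSetFenceNV and glFinishFenceNV so the sketch runs without a GL context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK_COUNT = 4, CHUNK_BYTES = 1024 };   /* illustrative sizes */

/* Stand-ins for the driver. Real code would call
   glSetFenceNV(fence, GL_ALL_COMPLETED_NV) and glFinishFenceNV(fence). */
int stub_wait_count = 0;
void stub_set_fence(int chunk)    { (void)chunk; }
void stub_finish_fence(int chunk) { (void)chunk; stub_wait_count++; }

typedef struct {
    unsigned char *base;        /* start of the VAR memory block */
    int next;                   /* chunk to hand out next */
    int pending[CHUNK_COUNT];   /* 1 while a fence guards the chunk */
} ChunkRing;

void ring_init(ChunkRing *r, unsigned char *base)
{
    r->base = base;
    r->next = 0;
    memset(r->pending, 0, sizeof r->pending);
}

/* Copy `bytes` of vertex data into the next chunk and return its index,
   or -1 if the data does not fit. Blocks only when that particular chunk
   is still guarded by a fence from an earlier draw. */
int ring_upload(ChunkRing *r, const void *src, size_t bytes)
{
    int chunk;
    if (bytes > CHUNK_BYTES)
        return -1;
    chunk = r->next;
    if (r->pending[chunk]) {
        stub_finish_fence(chunk);   /* wait for the GPU to finish reading */
        r->pending[chunk] = 0;
    }
    memcpy(r->base + (size_t)chunk * CHUNK_BYTES, src, bytes);
    r->next = (chunk + 1) % CHUNK_COUNT;
    return chunk;
}

/* Call right after issuing the draw that reads `chunk`. */
void ring_mark_drawn(ChunkRing *r, int chunk)
{
    assert(chunk >= 0 && chunk < CHUNK_COUNT);
    stub_set_fence(chunk);
    r->pending[chunk] = 1;
}
```

With four chunks, the first four uploads never wait; the fifth waits only on the fence of chunk 0, by which time the GPU has usually moved on to later chunks.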

okapota · June 5, 2002, 10:42am

I got you, thanks. I know the fence mechanism, and I will use it.

Quaternion · June 5, 2002, 10:51am

And what about static and dynamic data? Won’t it be faster to use all the available AGP memory for static data (so you don’t need to memcpy it every frame), and render the dynamic without VAR?

harsman · June 5, 2002, 11:12am

What prevents you from using VAR with regular system memory? You don’t have to supply it with pointers to memory you got from wglAllocateMemoryNV, do you? I haven’t actually implemented VAR support yet, but I think this should work nicely. Optimizing the use of precious AGP memory with fences is probably better, but this would be dead easy to implement.

Quaternion · June 5, 2002, 12:15pm

harsman, you must disable the vertex array range before you use pointers to memory outside the specified memory range.

AdrianD · June 5, 2002, 12:54pm

Originally posted by Quaternion:
And what about static and dynamic data? Won’t it be faster to use all the available AGP memory for static data (so you don’t need to memcpy it every frame), and render the dynamic without VAR?

The best solution is to use VAR for both static and dynamic data. When you use VAs, the driver does the memcpy for you, but it still uses video/AGP memory for the vertex data so the GPU can access it when it comes to process that piece of geometry.

JelloFish · June 5, 2002, 2:34pm

We have VAR implemented in conjunction with immediate mode; would this cause problems as well? I know that when you pass regular memory to a gl*Pointer call while VAR is enabled the system grinds to a halt, but I’ve never noticed a penalty for mixing immediate mode with VAR.

Quaternion · June 6, 2002, 12:57am

AdrianD, I am talking about a possible situation where you don’t have enough AGP memory to store both static and dynamic data. I am sure it would be a waste to recopy all the static data every frame. I am asking what will be faster: using the AGP memory only for the static data, or dividing it into two parts, static and dynamic.

AdrianD · June 6, 2002, 6:10am

Originally posted by Quaternion:
I am asking what will be faster: using the AGP memory only for the static data, or dividing it into two parts, static and dynamic.

Use VAR for static and dynamic data. It’s faster than the other solution.

[This message has been edited by AdrianD (edited 06-06-2002).]

AdrianD · June 6, 2002, 6:25am

Originally posted by JelloFish:
We have VAR implemented in conjunction with immediate mode; would this cause problems as well? I know that when you pass regular memory to a gl*Pointer call while VAR is enabled the system grinds to a halt, but I’ve never noticed a penalty for mixing immediate mode with VAR.

It depends on how much you mix them during a frame. In most cases these problems come up when you are displaying a lot of geometry of different kinds (= a lot of state changes). Usually the mixing does not slow the system down outright, but it keeps it from running as fast as it could… Especially when you disable VAR during a frame in the standard way after submitting a huge amount of geometry to the GPU: the driver flushes all rendering tasks at that point and your CPU waits until the GPU has finished them. This destroys any attempt to exploit parallelism in your application.

When you are using VAR and want to mix it with any other mode, I strongly suggest using VAR2 and disabling VAR with GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV.
It’s not a guarantee of faster code, but it makes sure you are not wasting processing time…
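
Since the cost depends on how often the paths are mixed within a frame, one option is to group draws by submission path so VAR is switched as rarely as possible. The following is only a sketch under assumptions: `DrawItem`, `DrawPath`, and both functions are hypothetical names, no GL calls are made, and VAR is assumed to start disabled at the beginning of the frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { PATH_VAR = 0, PATH_VERTEX_ARRAY = 1, PATH_IMMEDIATE = 2 } DrawPath;

typedef struct {
    DrawPath path;   /* how this mesh is submitted */
    int mesh_id;     /* hypothetical mesh handle */
} DrawItem;

/* Count how many times VAR would have to be enabled or disabled to draw
   the items in the given order (VAR assumed off at frame start). */
int count_var_toggles(const DrawItem *items, size_t n)
{
    int toggles = 0, var_on = 0;
    size_t i;
    assert(items != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int wants_var = (items[i].path == PATH_VAR);
        if (wants_var != var_on) {
            toggles++;
            var_on = wants_var;
        }
    }
    return toggles;
}

/* qsort comparator: order by path first, then by mesh_id so the result
   is deterministic. */
int compare_by_path(const void *a, const void *b)
{
    const DrawItem *x = a;
    const DrawItem *y = b;
    if (x->path != y->path)
        return (int)x->path - (int)y->path;
    return x->mesh_id - y->mesh_id;
}

/* Reorder the frame's draw list so all VAR draws come first. */
void group_by_path(DrawItem *items, size_t n)
{
    qsort(items, n, sizeof *items, compare_by_path);
}
```

This ignores everything else that constrains draw order (blending, textures, shaders), so in practice the path would be one key among several rather than the only sort criterion.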

okapota · June 7, 2002, 10:01am

What will happen if we use regular VA or even immediate mode without disabling VAR first?

AdrianD · June 7, 2002, 10:21am

Originally posted by okapota:
What will happen if we use regular VA or even immediate mode without disabling VAR first?

My experience is: crash.

okapota · June 8, 2002, 3:11am

A crash? VA and immediate mode together don’t crash in my experience, but I guess it’s different with VAR. One more question, not related to OpenGL but to the subject:
will one big memcpy be faster than a few smaller ones (besides function call overhead)? Maybe something complicated about the way the CPU works will make it slower, I don’t know.

AdrianD · June 8, 2002, 6:19am

I prefer the big one.

BTW, smart compilers can inline a memcpy as a single string instruction (REP MOVSD), and as far as I know this is the most optimized way you can get for copying memory (on x86 machines of course…)

Or does someone else know a much better way?
(In my experience any kind of loop unrolling is pointless on Pentium machines.)
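
The big-versus-small question is easy to measure rather than guess. A rough sketch, assuming plain system memory (copies into AGP memory may behave differently); `copy_in_pieces` and `time_copies` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Copy `total` bytes as a series of memcpy calls of at most `piece` bytes.
   piece == 0 or piece >= total degenerates to one big memcpy. */
void copy_in_pieces(unsigned char *dst, const unsigned char *src,
                    size_t total, size_t piece)
{
    size_t off = 0;
    assert(total == 0 || (dst != NULL && src != NULL));
    if (piece == 0)
        piece = total;
    while (off < total) {
        size_t n = (total - off < piece) ? (total - off) : piece;
        memcpy(dst + off, src + off, n);
        off += n;
    }
}

/* CPU seconds spent doing `reps` full copies with the given piece size. */
double time_copies(unsigned char *dst, const unsigned char *src,
                   size_t total, size_t piece, int reps)
{
    clock_t start = clock();
    int i;
    for (i = 0; i < reps; i++)
        copy_in_pieces(dst, src, total, piece);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_copies` once with `piece == total` and once with a small piece size, on buffers the size of a real mesh set and with enough repetitions to get past the timer resolution, shows how much the call overhead matters on a given machine.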