Mixing VAR and system memory

Is it possible to use both VAR memory and system memory for rendering?
I'm trying to render a big model, and I want to put some of the data in VAR (video memory).
Is this possible?

Technically you could mix VAR/VAO/VBO with standard OpenGL vertex arrays, which use system memory. Is there a reason why you can’t simply put the whole model into video memory? How big is the model?


You can’t use both at the same time.

You can, however, first enable the VAR and draw some stuff out of it, then disable the VAR and draw some stuff out of system memory.

Use the _WITHOUT_FLUSH version of the VAR extension for better performance when doing this.
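
In code, the pattern would look roughly like this (an untested sketch; varVerts, sysVerts and the counts are placeholder names for your own data, and it assumes the range was already set up with wglAllocateMemoryNV and glVertexArrayRangeNV):

glEnableClientState(GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV);  // token from NV_vertex_array_range2
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, varVerts);                    // pointer inside the VAR block
glDrawArrays(GL_TRIANGLES, 0, varCount);                      // draw out of video/AGP memory

glDisableClientState(GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV); // back to normal vertex arrays, no flush
glVertexPointer(3, GL_FLOAT, 0, sysVerts);                    // plain system-memory array
glDrawArrays(GL_TRIANGLES, 0, sysCount);
glDisableClientState(GL_VERTEX_ARRAY);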

Our current model is about 15 million triangles, but it could be scaled up to hundreds of millions of triangles (though we do cull most of the geometry using occlusion culling, view-frustum culling, and level-of-detail).
The data that has to be rendered in any given frame is usually around 100K to 5 million vertices (depending on where the user is).
My experiments tell me I can allocate about 100 MB of video memory to store vertices (approximately 3 million vertices). This is on a GeForce FX 5800 Ultra.

So we are trying to implement our own memory management using VAR and system memory.
I tried your suggestion of enabling and disabling VAR during rendering.
It works when I'm storing more than 5% of the data in VAR and the rest in system memory.
However, whenever less than 5% of the data is in VAR, it gives a runtime error.

Currently my rendering code looks like the following. (Also, I'm not really sure whether I'm setting the fence correctly; I tried different combinations and locations to no avail, and it doesn't seem to do anything.)

glFinishFenceNV(fence);                          // wait for the previous frame's VAR draws
glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
// render the data in VAR (<5% of the total)
glSetFenceNV(fence, GL_ALL_COMPLETED_NV);        // mark the end of the VAR draws

glDisableClientState(GL_VERTEX_ARRAY_RANGE_NV);
// render the data in system memory (95%)

Another question I have is that it seems whenever I turn double-sided lighting on, it disables VAR (rendering gets a lot slower).
Does anyone know why?
Or how can I use double-sided lighting and VAR at the same time?

Originally posted by budi:
Another question I have is that it seems whenever I turn double-sided lighting on, it disables VAR (rendering gets a lot slower).
Does anyone know why?
Or how can I use double-sided lighting and VAR at the same time?

It may not be disabling VAR at all, but causing you to run into a performance limitation somewhere else, such as fillrate.

Hi,

FWIW, using double-sided lighting on GF1/2 boards forces software lighting, so combining it with VAR kills performance.

I think it is ‘fixed’ on GF3 and later, but I haven't had an occasion to verify this.
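
For reference, the state in question is just the standard fixed-function light-model switch; nothing about it is VAR-specific:

glEnable(GL_LIGHTING);
glLightModeli(GL_LIGHT_MODEL_TWO_SIDE, GL_TRUE);   // back faces lit with flipped normals; may drop older boards to software T&L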

Have you tried the ‘GL_ARB_vertex_buffer_object’ extension for handling your data? It may not be very robust yet, but it should handle the memory swapping for you.
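
A minimal sketch of what that would look like with the ARB entry points (untested; modelVerts, modelBytes and vertexCount are placeholders for your own data):

GLuint vbo;
glGenBuffersARB(1, &vbo);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, modelBytes, modelVerts, GL_STATIC_DRAW_ARB);  // driver decides where it lives

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (void*)0);   // offset into the bound buffer, not a real pointer
glDrawArrays(GL_TRIANGLES, 0, vertexCount);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);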

Cheers,
Nicolas

You could store all the data in system memory, and memcpy() into VAR the parts you want to draw, and then draw out of there. This is usually known as a “streaming” approach. It’s actually usually pretty fast.

Split your VAR into two halves, and set/test a fence when you cross from one half into the other, using a double-buffering scheme. Treat the area as a streaming FIFO of buffer space for geometry queued to the card.
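
A rough, untested sketch of that scheme, assuming varBase came from wglAllocateMemoryNV and was already handed to glVertexArrayRangeNV, the range is enabled, the NV entry points are loaded, and each batch fits in half the allocation (Batch and all the names are made up for illustration):

#include <string.h>   // memcpy

struct Batch { const float* verts; size_t bytes; GLsizei count; };

void drawStreamedVAR(void* varBase, size_t halfSize,
                     const Batch* batches, int numBatches)
{
    GLuint fence[2];
    glGenFencesNV(2, fence);
    glSetFenceNV(fence[0], GL_ALL_COMPLETED_NV);   // so the first FinishFence returns right away
    glSetFenceNV(fence[1], GL_ALL_COMPLETED_NV);

    glEnableClientState(GL_VERTEX_ARRAY);
    int half = 0;
    for (int i = 0; i < numBatches; ++i) {
        glFinishFenceNV(fence[half]);              // wait until the card is done with this half
        char* dst = (char*)varBase + half * halfSize;
        memcpy(dst, batches[i].verts, batches[i].bytes);   // stream this batch into the VAR
        glVertexPointer(3, GL_FLOAT, 0, dst);
        glDrawArrays(GL_TRIANGLES, 0, batches[i].count);
        glSetFenceNV(fence[half], GL_ALL_COMPLETED_NV);    // mark this half as in flight
        half = 1 - half;                           // flip to the other half
    }
    glDeleteFencesNV(2, fence);
    glDisableClientState(GL_VERTEX_ARRAY);
}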

Thanks for all the suggestions. About two-sided lighting not being hardware accelerated: I also think that was fixed after the GeForce 3.
So I still don't know what the problem with double-sided lighting and VAR is. I suspect it is disabling VAR, because the rendering rate is about the same as when I don't use VAR at all. It also doesn't seem like a fill-rate issue; changing the window size doesn't speed up the rendering.

Please disregard the 5% VAR / 95% system memory problem; it turns out I had a bug in my program (inconsistent state during rendering).

You could store all the data in system memory, and memcpy() into VAR the parts you want to draw, and then draw out of there. This is usually known as a “streaming” approach. It’s actually usually pretty fast.
Split your VAR into two halves, and set/test a fence when you cross from one half into the other, using a double-buffering scheme. Treat the area as a streaming FIFO of buffer space for geometry queued to the card.

This is interesting. However, will it be fast?
The reason we are using our own memory management is so that we can exploit frame-to-frame coherence. In fact, most of the geometry changes only a little (10-20%) from one frame to the next. This is good, because the rendering rate with VAR seems to be directly related to the amount of memcpy (changes in the data). So every frame we only need to update about 10-20% of the data in video memory, and the rest stays static.

However, with the technique mentioned in the quote, won't it need to memcpy a lot of the memory every frame? Unless I'm missing something here.

Regards,

Budi


So the part that stays static can be kept in video memory all the time.
For the part that is dynamic, why not use VAR for this as well and stream the data to it?
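
Roughly what I mean, as an untested sketch: one VAR allocation split into a static region that gets uploaded once and a smaller dynamic region that you refresh each frame (all sizes, pointers and names are made-up placeholders, and you would still want a fence before overwriting the dynamic part):

size_t totalSize  = 100 * 1024 * 1024;   // whatever fits on the board
size_t staticSize =  80 * 1024 * 1024;   // the part that never changes

// priority 1.0f asks for video memory; ~0.5f would give AGP memory instead
char* var = (char*)wglAllocateMemoryNV((GLsizei)totalSize, 0.0f, 0.0f, 1.0f);
glVertexArrayRangeNV((GLsizei)totalSize, var);
glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

memcpy(var, staticVerts, staticSize);                  // upload the static 80-90% once

// per frame: refresh only the 10-20% that actually changed
memcpy(var + staticSize, dynamicVerts, dynamicBytes);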

But I guess you have a complex problem, since you have a lot of geometry. How many megs can you keep in VRAM?

memcpy() into AGP + draw is often faster than drawing directly out of system memory, and I’ve never seen it be slower.

Makes me wonder why the drivers don't already do this internally and automatically. I mean, how hard would it be to allocate a cache in AGP memory and, when you issue a glDrawElements call, put the vertices in there? Is there any reason why it's not already done?

Y.

Ysaneva,

I’m sure the drivers try. If you use LockArraysEXT(), they probably do pretty much that, assuming that you don’t lock more data than the pre-allocated buffer.

However, if you use DrawElements(), the driver doesn’t know how much to copy, so it’ll have to do something less efficient than you could on your own. DrawRangeElements fixes this, but still requires the driver to copy the same data twice if you multi-pass.
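
For illustration, the only difference is the extra start/end pair that bounds which vertices the indices may reference (indexCount, indices, start and end are placeholders):

// plain DrawElements: the driver only sees the index pointer, so it can't tell
// how many vertices it would have to copy
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, indices);

// DrawRangeElements: start/end promise that every index lies in [start, end],
// so the driver knows it can copy exactly end - start + 1 vertices
glDrawRangeElements(GL_TRIANGLES, start, end, indexCount, GL_UNSIGNED_INT, indices);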

Also, you can allocate a pretty big FIFO pipe and use fences to double-buffer, whereas the driver really can’t afford to “splurge” that much on the speculation that you MIGHT need it.

Those are some reasons I can think of. I'm sure there are one or two more gnarls that I don't know about, too. :-)