Allocating >32MB of AGP/video memory

I’m trying to allocate a large chunk of video/AGP memory, but whenever I try to allocate more than 32MB, wglAllocateMemoryNV returns null. Is it possible to allocate more than 32MB? I have 64MB of video RAM and 128MB of AGP RAM. For video, I’m using the parameters (0 read, 0 write, 1.0f priority) and for AGP, I’m using (0 read, 0 write, 0.75f priority).

That’s because of memory fragmentation. Try allocating in 1 MB chunks, for example, and you should get a bit more. The maximum I got was around 48 MB on my GeForce 3, but allocating that much is insane anyway. The remaining MBs are taken up by Windows, the frame buffer, the Z-buffer and so on. I think something like 16 MB is the most you should reserve of video T&L memory. I manage it the following way:

Every x frames, check:
- Which of my objects makes the least use of T&L (is drawn very rarely, for example)? Kick that one out.
- Is any of my objects drawn very often, so that it would make a lot of use of T&L? Check whether enough space has become available and “upload” that one.

By stealing all of the card’s T&L memory you certainly won’t make your program faster, because you’ll force it to do extreme texture swapping later.
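The periodic check described above could look roughly like the following. This is only a sketch under assumptions: `MeshEntry`, `rebalance` and the byte budget are hypothetical names, not taken from the poster's engine, and the real upload/evict work is reduced to flipping a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for one drawable object (illustrative names).
struct MeshEntry {
    std::size_t bytes;         // size of its vertex data
    unsigned drawsSinceCheck;  // how often it was drawn since the last check
    bool inVideoMemory;        // currently resident in video (T&L) memory?
};

// Call every x frames. If the most-drawn non-resident mesh does not fit,
// the least-drawn resident mesh is kicked out first. Returns the number of
// bytes in use afterwards.
std::size_t rebalance(std::vector<MeshEntry>& meshes,
                      std::size_t usedBytes, std::size_t budgetBytes)
{
    assert(usedBytes <= budgetBytes);
    MeshEntry* coldest = nullptr;  // resident mesh with the fewest draws
    MeshEntry* hottest = nullptr;  // non-resident mesh with the most draws
    for (MeshEntry& m : meshes) {
        if (m.inVideoMemory) {
            if (!coldest || m.drawsSinceCheck < coldest->drawsSinceCheck) coldest = &m;
        } else {
            if (!hottest || m.drawsSinceCheck > hottest->drawsSinceCheck) hottest = &m;
        }
    }
    if (hottest) {
        bool fits = usedBytes + hottest->bytes <= budgetBytes;
        // Evict only when the candidate is drawn more often than the victim.
        if (!fits && coldest && coldest->drawsSinceCheck < hottest->drawsSinceCheck) {
            coldest->inVideoMemory = false;
            usedBytes -= coldest->bytes;
            fits = usedBytes + hottest->bytes <= budgetBytes;
        }
        if (fits && hottest->drawsSinceCheck > 0) {
            hottest->inVideoMemory = true;  // "upload" in the real engine
            usedBytes += hottest->bytes;
        }
    }
    for (MeshEntry& m : meshes) m.drawsSinceCheck = 0;  // start a new period
    return usedBytes;
}
```

Only one object is moved per check here, which keeps the per-frame cost small; a real engine might move several or weight the draw count by vertex count.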

michael

Thanks for the reply. I tried allocating 1 byte at a time to see how high I can go. But for AGP memory (priority = 0.75), the second allocate call always returns null, and for video memory (priority = 1.0), the fourth allocate call always returns null, i.e.:

wglAllocateMemoryNV(1, 0, 0, 0.75f);
wglAllocateMemoryNV(1, 0, 0, 0.75f); // returns NULL!

wglAllocateMemoryNV(1, 0, 0, 1.0f);
wglAllocateMemoryNV(1, 0, 0, 1.0f);
wglAllocateMemoryNV(1, 0, 0, 1.0f);
wglAllocateMemoryNV(1, 0, 0, 1.0f); // returns NULL!
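Since only a few separate allocations succeed here, one way to find out how much can be had in a single block is to probe downward in fixed steps. The sketch below is an assumption-laden illustration: `probeLargestBlock`, `AllocFn` and `FreeFn` are hypothetical names, and the allocator is passed in as a callback so the logic can be tried without a GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Allocator and deallocator are injected; in a real program they would wrap
// wglAllocateMemoryNV / wglFreeMemoryNV.
using AllocFn = std::function<void*(std::size_t)>;
using FreeFn  = std::function<void(void*)>;

// Tries maxBytes first, then steps down by stepBytes until one allocation
// succeeds. The block is released again; the size that worked is returned,
// or 0 if nothing could be allocated.
std::size_t probeLargestBlock(const AllocFn& alloc, const FreeFn& release,
                              std::size_t maxBytes, std::size_t stepBytes)
{
    assert(stepBytes > 0);
    for (std::size_t size = maxBytes; size >= stepBytes; size -= stepBytes) {
        if (void* p = alloc(size)) {
            release(p);
            return size;
        }
    }
    return 0;
}
```

With the extension loaded and a GL context current, the callbacks would be something like `[](std::size_t n) { return wglAllocateMemoryNV((GLsizei)n, 0.0f, 0.0f, 1.0f); }` and `[](void* p) { wglFreeMemoryNV(p); }`, called with a 64 MB maximum and a 1 MB step.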

Hm… well, I manage it the following way and it works very well… a bit complex, but fine:

I reserve one big block of memory for each area (video/AGP/system).
Each region I call a “chain”. When I reserve memory I use my own function for it, which takes the front part of the smallest remaining “chain” that is still big enough, from my list for the wanted memory type.

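void *C09AphLyxMemoryBank::malloc(int size)
{
    if(mlpBasePointer==NULL)
    {
        return NULL;
    };

    int chain = -1;
    int smallest = 0x7FFFFFFF;
    void *p = NULL, *p2;

tryagain:

    // Best fit: pick the smallest chain that is still large enough.
    for(int i=0; i<mdwChainCount; i++)
    {
        if( (mChains[i].mdwSize>=size) && (mChains[i].mdwSize<smallest) )
        {
            chain=i;
            smallest=mChains[i].mdwSize;
        };
    };

    if(chain!=-1)
    {
        // Hand out the front of the chain and shrink it accordingly.
        p=mChains[chain].mlpPointer;
        mChains[chain].mdwSize-=size;
        mChains[chain].mlpPointer=(void*)((char*)mChains[chain].mlpPointer+size);
        if(mChains[chain].mdwSize==0)
        {
            // Chain is used up: overwrite it with the last one.
            mChains[chain]=mChains[mdwChainCount-1];
            mdwChainCount--;
        };
    }
    else
    if(!mbIsTAndLMem)
    {
        // Only system memory can grow; video/AGP banks keep their fixed size.
        p2=realloc(mlpBasePointer,mdwSize+RESIZE_SIZE);
        if(p2==NULL)
            return NULL;
        // Caveat: if realloc moves the block, chains and pointers handed out
        // earlier still refer to the old address and are not rebased here.
        mlpBasePointer=p2;
        mChains[mdwChainCount].mdwSize=RESIZE_SIZE;
        // mdwSize is a byte count, so step in char units
        // (int* would advance sizeof(int) times too far).
        mChains[mdwChainCount].mlpPointer=(void*)((char*)mlpBasePointer+mdwSize);
        mdwChainCount++;
        mdwSize+=RESIZE_SIZE;
        ConnectChains();
        goto tryagain;
    };
    return p;
};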

When I free the memory again, I add this chain and see if it can be connected to one of the others (i.e. whether the edges of the memory regions touch).

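void C09AphLyxMemoryBank::ConnectChains()
{
    if(mdwChainCount<2)
        return;
    QuickSortChains(mChains,0,mdwChainCount-1);
    // Merge neighbours in place. "out" is the last merged chain, so the
    // sorted order is kept and no adjacent pair is skipped.
    int out=0;
    for(int i=1; i<mdwChainCount; i++)
    {
        if((char*)mChains[out].mlpPointer+mChains[out].mdwSize==(char*)mChains[i].mlpPointer)
        {
            mChains[out].mdwSize+=mChains[i].mdwSize;
        }
        else
        {
            out++;
            mChains[out]=mChains[i];
        };
    };
    mdwChainCount=out+1;
};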

void C09AphLyxMemoryBank::QuickSortChains(C09AphLyxMemoryChain *data,int from,int to)
{
    // Nothing to sort for an empty or single-element range.
    if(from>=to)
        return;

    int Lo,Hi;
    char *Mid;
    C09AphLyxMemoryChain T;

    Lo = from;
    Hi = to;
    // Pivot on the address of the middle chain. Compare as char* rather than
    // int so this also works where pointers are wider than int.
    Mid = (char*)data[(Lo + Hi)/2].mlpPointer;

    do
    {
        while((char*)data[Lo].mlpPointer < Mid)
            Lo++;
        while((char*)data[Hi].mlpPointer > Mid)
            Hi--;

        if(Lo <= Hi)
        {
            T=data[Lo];
            data[Lo] = data[Hi];
            data[Hi] = T;
            Lo++;
            Hi--;
        };

    } while( !(Lo > Hi));

    if(Hi > from)
        QuickSortChains(data, from, Hi);
    if(Lo < to)
        QuickSortChains(data, Lo, to);
};

This works very well for me; I hope it helps you out a bit. In general you should do the memory management inside your program, so you have the most control over everything.

Michael

BlackJack, AFAIK you should allocate one big fast memory buffer, not many small buffers. Is that what you are doing (couldn’t understand what you mean by “chain”)?

Shlomi.

Quaternation: That’s what I do. Each of my “memory banks” has one chain after initialization. This is the one big memory block. I have one bank for each memory type. If the memory type is not available, the size of this first chain is zero, so any reservation attempt will of course fail.

So after init it looks like this:

[----------------1---------------]

Now I reserve memory, taking some of it away, and I have:

[----2----] <-- what I reserved
[-----------1---------] <-- remaining

If I free my reserved memory again, I first get the following, because all freed chains are added at the end by default:
[----------1----------][----2----]

Then I do a quicksort by the memory regions and get again:
[----2----][----------1----------]

The memory regions are checked and, where possible, combined again into one big chain:
[---------------1----------------]

The result of the bank’s “malloc” is a pointer. The parameters of “freemem” are the pointer and the size.
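The reserve/free/sort/merge cycle drawn above can be sketched in a self-contained form. This is not the poster's class: `MemoryBank`, `Chain`, `allocate` (standing in for the bank's “malloc”) and `chainCount` are illustrative names, and the bank is assumed to have a fixed size, as for video/AGP memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One region of the bank that is currently free.
struct Chain {
    char* ptr;
    std::size_t size;
};

class MemoryBank {
public:
    // After init there is exactly one chain: the whole block.
    MemoryBank(void* base, std::size_t size) {
        if (base && size > 0) chains_.push_back({static_cast<char*>(base), size});
    }

    // Best fit: take the front of the smallest chain that is large enough.
    void* allocate(std::size_t size) {
        assert(size > 0);
        Chain* best = nullptr;
        for (Chain& c : chains_)
            if (c.size >= size && (!best || c.size < best->size)) best = &c;
        if (!best) return nullptr;
        void* p = best->ptr;
        best->ptr += size;
        best->size -= size;
        if (best->size == 0) {  // chain used up: replace it with the last one
            *best = chains_.back();
            chains_.pop_back();
        }
        return p;
    }

    // The caller passes the size back; the bank keeps no per-block header.
    void freemem(void* p, std::size_t size) {
        chains_.push_back({static_cast<char*>(p), size});  // added at the end
        std::sort(chains_.begin(), chains_.end(),
                  [](const Chain& a, const Chain& b) { return a.ptr < b.ptr; });
        std::size_t out = 0;  // index of the last merged chain
        for (std::size_t i = 1; i < chains_.size(); ++i) {
            if (chains_[out].ptr + chains_[out].size == chains_[i].ptr)
                chains_[out].size += chains_[i].size;  // edges touch: merge
            else
                chains_[++out] = chains_[i];
        }
        chains_.resize(out + 1);
    }

    std::size_t chainCount() const { return chains_.size(); }

private:
    std::vector<Chain> chains_;
};
```

Sorting on every free keeps the code short; for banks with many chains, inserting the freed chain at its sorted position would avoid the repeated sort.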

Michael

So you have implemented a memory manager that can allocate and deallocate AGP/video memory. I’m still not sure how you use it. If you use it with NV_vertex_array_range, then why do you need to allocate small memory parts? Just memcpy the static data once, and memcpy the dynamic data every frame (using fences to gain parallelism). Am I missing something here?

Shlomi.

If the “static data” is only semi-static, you actually need to manage AGP memory like a regular heap.

For a console type game where geometry and enemies are fixed in stone per level, this doesn’t make sense. For something where your working set can change arbitrarily over time, say a game like EverQuest, this is necessary.

Quaternation: Yes, it is a memory manager. I only wanted to show him a possible way to solve his problem.

You can reserve memory as system, AGP or video if you already know beforehand which makes the most sense for this vertex buffer. Or you can reserve it as managed, to let the program automatically check every few frames which “place” would be best for this vertex buffer (depending on the number of draws, the number of changes…).
The managed ones always have a local copy of the data, as do buffers which aren’t requested as WriteOnly. If enough space in video memory becomes free again, the best mesh is put back in. But the video/AGP memory does not grow, unlike the system memory of course. I decide how much to reserve depending on the maximum available T&L memory, at the moment a quarter, so around 8 MB on my GeForce 3, but I guess I’ll increase it a bit, since with S3TC I don’t need as much space for textures as before anyway.
Well, maybe an important bit of info: I have very many objects that are drawn very often per frame, like trees or houses of the same type. Almost none of them are vertex-animated… except for some birds and fish. I set the chance for animated objects to get into video memory very low, so it only happens if there’s really tons of free space because nothing else is visible. In that case I morph them via a vertex program… otherwise with the good old CPU, as in the old engine made for the TNT card family.
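A placement decision for “managed” buffers along these lines might be sketched as follows. Everything here is an assumption for illustration: `BufferStats`, `choosePlacement` and in particular the “half the budget still free” threshold are made up, not the poster's actual heuristics.

```cpp
#include <cassert>
#include <cstddef>

enum class Placement { System, Agp, Video };

// Hypothetical per-buffer statistics gathered over the last few frames.
struct BufferStats {
    std::size_t bytes;
    unsigned drawsPerFrame;
    unsigned changesPerFrame;  // > 0 means the data is animated/rewritten
};

// Picks a memory type for one buffer. Thresholds are illustrative only.
Placement choosePlacement(const BufferStats& s, std::size_t freeVideoBytes,
                          std::size_t videoBudgetBytes)
{
    assert(freeVideoBytes <= videoBudgetBytes);
    const bool fitsVideo = s.bytes <= freeVideoBytes;
    // Static data that is actually drawn is the best candidate for video memory.
    if (s.changesPerFrame == 0 && s.drawsPerFrame > 0 && fitsVideo)
        return Placement::Video;
    // Animated data only gets in when plenty of space would remain afterwards.
    if (s.changesPerFrame > 0 && fitsVideo &&
        freeVideoBytes - s.bytes >= videoBudgetBytes / 2)
        return Placement::Video;
    // Otherwise frequently rewritten data goes to AGP, the rest to system memory.
    if (s.changesPerFrame > 0)
        return Placement::Agp;
    return Placement::System;
}
```

The draw and change counts would come from whatever statistics the engine already tracks per buffer; re-running the decision every few frames gives the “managed” behaviour described above.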

And that system works very well for me… at least in everything I’ve experienced with it so far. As for AGP… well, I don’t know why, but AGP is useless on my PC. Maybe 5% faster in the best case (if a lot is culled and so on). In general it’s no faster than system memory for me. Maybe it’s different on other machines, but how can I test what I can’t see? I’ll nag my boss a bit to get a new machine; maybe I’ll see a difference on that.

Speed of memory compared to system memory on my PC:

Video memory : ~2.0…~3.5x
AGP memory : ~1.05x

Michael

[This message has been edited by BlackJack (edited 06-22-2002).]

Well, AGP is supposed to be the ultimate place to store both static and dynamic data before sending it to the card, thanks to the DMA reads. I don’t know why that’s not the case on your machine, but in general it is much faster.

One more reason on my list that I can give my boss to finally get a new PC. Well, I guess VIA says it all, doesn’t it?

Thanks in any case… well, I guess I will use it for my morph data then. Since it’s at least not slower than system memory, those who buy the game will get an improvement from it… but that I can’t test this at all is… bad.

Michael