AMD Transform Feedback GPU Memory Leak

Hi everyone.

The scenario is the following:
I wrote a very simple example program where I generate five tris from five points with a geometry shader. I stream those five tris to a vertex buffer via transform feedback and render them with glDrawArrays. The vertex buffer allocated for the tfb is just as large as the geometry output requires, i.e. 5 * 3 * 24 bytes = 360 bytes; the 24 bytes per vertex are two vec3s, position and color. The output is as expected: the five tris are rendered.

The observed problem:
As I figured out, exactly when glDrawArrays is called DURING the transform feedback pass (I do NOT mean rendering the result), the observed GPU memory footprint of the described little program grows by approx. 150 MB. I checked this behavior with Process Explorer by putting a glFlush and glFinish right after the mentioned glDrawArrays, i.e. the one that feeds the input vertices into the transform feedback stage(!).

Pseudo:


// during initialization of program
// transform feedback stage

bind feedback
render five points and stream out to vertex buffer // here the memory grows by 150 MB
unbind feedback

// in runtime loop
// render stage

render result of transform feedback
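
In real GL calls, the capture pass looks roughly like the sketch below. The names (tfbProgram, tfbBuffer, pointCount) are placeholders and not my actual code:

glEnable(GL_RASTERIZER_DISCARD);                  // only capture, don't rasterize
glUseProgram(tfbProgram);                         // VS + GS whose varyings are captured

glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfbBuffer);
glBeginTransformFeedback(GL_TRIANGLES);           // the GS emits triangle strips
glDrawArrays(GL_POINTS, 0, pointCount);           // <-- GPU memory jumps by ~150 MB here
glEndTransformFeedback();
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, 0);

glDisable(GL_RASTERIZER_DISCARD);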


When I query GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN via glGetQueryObjectuiv, I correctly get 5. When I release the transform feedback buffer along with the streamed-out vertex buffer, the 150 MB footprint doesn’t disappear. Even when I release every single GL object in that simple program, the 150 MB footprint doesn’t disappear.
When I do not render with transform feedback, i.e. render an empty buffer, the memory does not grow by 150 MB.
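
For reference, the query around the capture pass is set up like this (a sketch; ‘query’ is a placeholder name for an object created with glGenQueries):

glBeginQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, query);
// ... glBeginTransformFeedback / glDrawArrays / glEndTransformFeedback ...
glEndQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN);

GLuint written = 0;
glGetQueryObjectuiv(query, GL_QUERY_RESULT, &written);   // returns 5 as expected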

Additionally, I only observe that memory growth on my PC with an AMD Radeon HD 7700 Series card.
When I check the same program on my other PC with an NVIDIA card (GeForce GTX 650 Ti), there is no growth in GPU memory.

The real problem is in my production code, where I stream out approx. 350,000 primitives (I assume a lot more) and the GPU memory (1 GB) runs full and wreaks havoc!
350,000 prims * 3 (vertices per tri) * 12 bytes (one vec3) = 12.6 MB, right? Even with both vec3 outputs captured that would only be 25.2 MB.
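
Just to spell out the buffer sizing for the production case, this is the allocation I mean (a sketch; it assumes both vec3 outputs are captured interleaved, and ‘tfbBuffer’ is a placeholder name):

const size_t primCount    = 350000;
const size_t vertsPerTri  = 3;
const size_t bytesPerVert = 2 * 3 * sizeof(float);                  // position + color = 24 bytes
const size_t bufferSize   = primCount * vertsPerTri * bytesPerVert; // ~25.2 MB

glBindBuffer(GL_ARRAY_BUFFER, tfbBuffer);
glBufferData(GL_ARRAY_BUFFER, bufferSize, NULL, GL_STATIC_COPY);    // nowhere near 1 GB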

Has anyone observed the same problem?
I updated to the latest drivers available today, Catalyst version 13.4.

Thanks
Alex

Ok, one step ahead.

When I disable the geometry shader and just stream out two triangles via the vertex shader only, those 150 MB are not allocated. Here is the geometry shader I use, which produces the 150 MB GPU memory allocation on my AMD card:



#version 150

// can be specified here or is retrieved from programparameteri
//layout(points) in;
layout (triangle_strip) out;
layout (max_vertices=3) out;

in VertexData {
	vec3 color ;
} vertices_in[1] ;

out vec3 oa_position ;
out vec3 oa_color ;

uniform mat4 u_mat_proj ;
uniform mat4 u_mat_view ;

void main()
{
	//for( int i=0; i<1;/*gl_in.length();*/ ++i )
	int i=0;
	{
		oa_position = vec3(gl_in[i].gl_Position) ;
		oa_color = vertices_in[i].color ;
		EmitVertex() ;
		oa_position = vec3(gl_in[i].gl_Position + vec4(1.0,0.0,0.0,1.0));
		oa_color = vertices_in[i].color ;
		EmitVertex() ;
		oa_position = vec3(gl_in[i].gl_Position + vec4(-1.0,1.0,0.0,1.0));
		oa_color = vertices_in[i].color ;
		EmitVertex() ;
	}
	
	EndPrimitive() ;
}


It should just produce one triangle for each input primitive (point).
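
For completeness, here is roughly how the program around this shader is set up. This is only a sketch with a placeholder ‘program’ handle; the varying names match the shader above, and the glProgramParameteriARB call is the ARB_geometry_shader4-style path the comment in the shader alludes to (with pure core GL you would use the layout qualifiers instead):

const GLchar* varyings[] = { "oa_position", "oa_color" };
glTransformFeedbackVaryings(program, 2, varyings, GL_INTERLEAVED_ATTRIBS);

// input type comes from the program parameter because layout(points) in; is commented out
glProgramParameteriARB(program, GL_GEOMETRY_INPUT_TYPE_ARB, GL_POINTS);

glLinkProgram(program);   // must (re-)link after setting the captured varyings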

Any idea?

Thanks
Alex

My guess would be that the additional GPU memory is needed for internal purposes by the driver for geometry shaders. I’d bet that if you had multiple different geometry shaders, the GPU memory usage wouldn’t grow with them. The reason for not freeing that memory is probably to avoid allocating/deallocating those internal buffers every single time you use geometry shaders.

Btw, I wouldn’t consider it a memory leak if the memory is freed once you delete the context. Try deleting the context and checking the memory usage. I’d bet it will disappear.

You cannot expect a driver not to need some GPU memory for internal purposes for whatever reason, and you also cannot expect a feature to work exactly the same on two GPUs from different vendors.

Hi random_dude.

Since I can specify exactly how much memory my app should use for the gs and transform feedback, there is no need to allocate that much memory. The transform feedback is limited to the amount of memory you give it via the output VBO, and the gs is limited by that implicitly and by the output layout specifier; there you specify how many output vertices the pipe gets per input primitive! 150 MB would equal some million vertices…

Plus, that behavior gets worse if I use multiple tfb buffers. As said, the GPU memory runs full in no time.

Once again, I was talking about internal purposes, not about your application’s data.

If you would take the time and look up some publicly available documents, you would see that some GPUs require some internal buffers for communication between the geometry shader and previous/subsequent shader stages.

First of all, 150 MB is not something I would consider “some internal storage”, since GPU memory is scarce. Plus, please reference those “publicly available documents” and specify the amount of memory required by the shader stage instead of just rudely claiming it.
Second, playing around with the GL context is something I will not consider. I don’t think it is very smart to invalidate or recreate a GL context just after using a gs in tfb.

I never told you that you should recreate the context, I just told you that it’s not a memory leak if the memory gets released with the context.
And seriously, yes, 150MB is not a small chunk of memory, but is it really that much with today’s GPUs having GBs of it?
I’m not trying to justify the use of that 150 MB of memory; what I’m saying is that it cannot be considered a memory leak if releasing the context frees it up.

Releasing the context would mean breaking the rendering loop. I don’t want to do that. Plus, think about it: how does executing a random gs/tfb combination relate to freeing a whole context? For me, the rendering context is untouchable. It doesn’t need to be released for the whole duration of the application, because there is simply no need for that. I would never consider touching glx/wgl because of pure GL calls; for me, that would be a bug. Plus, why can NVIDIA do it without using 150 MB of GPU memory? And my graphics card does not have tons of “GBs”: I only have 1 GB, the system constantly uses 300 MB of it, which leaves 700 MB of GPU memory for me and other apps!

So please just specify those documents of yours if you’d like to make an honest contribution, so maybe my questions will be answered!
Or, if you don’t want to reveal those docs, at least specify what that amount of memory would be used for “internally”. All data that a gs produces goes straight to the buffer. So the buffers are the “internal” storage, and those buffers also limit the output of those stages. Read the specs for that.

http://www.opengl.org/registry/
and then
GL_ARB_transform_feedback2 and
GL_ARB_vertex_buffer_object and
GL_ARB_geometry_shader4

[QUOTE=random_dude]If you would take the time and look up some publicly available documents, you would see that some GPUs require some internal buffers for communication between the geometry shader and previous/subsequent shader stages.[/QUOTE]

And if you would take the time and look up some publicly available documents, you would see that these “internal buffers” are internal to the GPU. They don’t take up GPU memory and they don’t take up system memory. They don’t get reallocated every frame, and they don’t hang around after a frame and accumulate.

So no, this is not due to those “internal buffers” for inter-stage communication.

[QUOTE=random_dude]What I’m saying is that it cannot be considered a memory leak if releasing the context frees it up.[/QUOTE]

That’s a pointless argument over semantics. It’s clear what he means: the driver is constantly allocating more and more memory. Whether that memory is forgotten about or not, it shouldn’t be constantly allocating new memory like that every frame.

You can argue that it’s technically not a “memory leak” by some arbitrary definition, but it’s not helping to solve the problem.

[QUOTE=Alfonse Reinheart;1251221]And if you would take the time and look up some publicly available documents, you would see that these “internal buffers” are internal to the GPU. They don’t take up GPU memory and they don’t take up system memory. They don’t get reallocated every frame, and they don’t hang around after a frame and accumulate.

So no, this is not due to those “internal buffers” for inter-stage communication.[/QUOTE]

You are confusing things with LDS or GDS. The communication buffer between geometry shaders and other shader stages is not on-chip memory but actual video memory.

That’s a pointless comment. He didn’t say at all that it allocates it every frame. It allocates it only once and never allocates any more, regardless of how many frames/shaders you have.

Just check these documents:

http://www.x.org/docs/AMD/

Then check in R6xx_3D_Registers.pdf the register SQ_GSVS_RING_BASE. There is a GPU address needed there to specify the ring buffer used for communication between geometry shader and other shader stages.

I’ll check out the link. But I still have to mention that those 150 MB are not a one-time allocation.
In my production code, I have 30 tfb/gs shader programs, from which I choose one in a round-robin fashion as long as the other shader programs are busy with transform feedback. And if I use more of those combinations, the memory still grows enormously. I cannot see any relation to the primitive count. So the 150 MB is only for the simple scenario where I only stream out 6 vertices. I would say it relates somehow to the number of tfb/gs combinations used, not to the streamed-out primitives. I can tell you that if I stream out approx. 800k prims, the 1 GB of memory is full. So Alfonse is right: there is constant memory allocation, but there is no freeing of that memory.
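
To illustrate, the round-robin selection is nothing more than the following sketch (placeholder names; ‘programs’ holds the ~30 tfb/gs program objects):

#include <vector>

std::vector<GLuint> programs;   // filled at init with the tfb/gs combinations
size_t nextIndex = 0;

GLuint pickNextProgram()
{
    GLuint p = programs[nextIndex];
    nextIndex = (nextIndex + 1) % programs.size();   // simple round robin
    return p;
}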

Ok, I checked that doc in particular. So it uses a ring buffer. According to that document, the ps also uses such a ring buffer and temp buffers. It doesn’t fully convince me that this could produce that much overhead.

What I also found, but hadn’t realized before, is that my problem has nothing to do with transform feedback.
If I simply render the output of the gs directly, so no stream-out at all, the memory still grows. So the problem is entirely gs-specific. And also entirely AMD-specific; I do not have the same problem on the NVIDIA card.

It would be nice if someone could just run a very simple geometry shader program and tell me what you see in Process Explorer’s GPU memory graph?! Maybe someone has a very simple gs sample program ready to go… That would be great!

[ATTACH=CONFIG]429[/ATTACH]
Here is what my simple gs program (generating 5 triangles from 5 input points) causes in my GPU memory graph (AMD card). Monstrous. That peak is not seen on the NVIDIA card!

[EDIT]
I downloaded gDEBugger (gDEB) from the AMD site in order to check what is causing that memory usage. The gDEB only shows 20 MB used for the same program that shows the extra 150 MB in Process Explorer.
[ATTACH=CONFIG]430[/ATTACH]
BTW, those 20 MB are exactly what I see in Process Explorer on the NVIDIA card for the same program.

Thanks for helping me out.

Okay, apparently I misunderstood your original problem. That in fact means that there is something wrong.

Hi,

Here’s a question: you create multiple geometry shaders and use them for transform feedback. You said memory usage reaches 1 GB. After that, if you delete all shaders and other objects and call glFinish, do you still see 1 GB being used, even though you deleted everything?
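
Something along these lines (a sketch with placeholder names for your objects):

glDeleteProgram(tfbProgram);
glDeleteBuffers(1, &tfbBuffer);
glDeleteTransformFeedbacks(1, &tfbObject);   // if you use transform feedback objects
glDeleteQueries(1, &query);
glFinish();                                  // let the driver process the deletes
// ...then check the GPU memory graph again, without touching the context.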

Oops, sorry, I think my last post disappeared.

I made some tests and filled the GPU memory completely.

I made two observations:

1: The memory consumption steps correlate with the number of independent geometry shaders! I can see this effect only on my AMD card.

2: After filling the whole GPU memory I noticed a memory drop. I do not release any memory myself, so I think it is the driver that releases it. When the GPU memory fills up nearly to the maximum, 1 GB in my case, I see that drop in Process Explorer. The gDEB and Process Explorer graphs correlate there. It makes sense to me now; the thing with the internal memory makes sense now, although I don’t understand why so much memory is required. Anyway. Maybe, if the driver notices a memory shortage, it releases its internal memory. See for yourself.
[ATTACH=CONFIG]134[/ATTACH]
[ATTACH=CONFIG]135[/ATTACH]

As said, my NVIDIA PC (driver, …) does not show that behavior.

But thanks

Edit:
The gDEB shows a memory consumption of 768 MB. That’s OK; it correlates with Process Explorer.

Ok, I think the problem is that what Process Explorer shows is not correct.

When I allocate 600 MB of GPU memory with glBufferData, gDEB shows that, but Process Explorer shows this allocated GPU memory only from time to time. Totally strange.
In other situations, it is exactly the other way around. When I deallocate that 600 MB of GPU memory by resizing the buffer to 0 MB with glBufferData, gDEB shows that correctly, but Process Explorer still shows the 600 MB as allocated. You can see that in the images in my last posts.
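
In code, the test was simply this (a sketch; ‘buffer’ is a placeholder):

glBindBuffer(GL_ARRAY_BUFFER, buffer);
glBufferData(GL_ARRAY_BUFFER, 600 * 1024 * 1024, NULL, GL_STATIC_DRAW);   // allocate ~600 MB
// gDEB shows the 600 MB immediately; Process Explorer only shows it from time to time

glBufferData(GL_ARRAY_BUFFER, 0, NULL, GL_STATIC_DRAW);                   // "free" it again
// gDEB drops back down; Process Explorer keeps showing the 600 MB as allocated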

So, in my opinion and from what I tested, Process Explorer shows false data in its GPU memory graph. Or it shows the memory consumption correctly, but the driver does some optimization by not actually deallocating memory that has been freed and just marking it as free, or something like that. Process Explorer then sees that memory as still allocated, while gDEB sees it as free… or I just don’t understand what is being shown in that graph… x)

However, I think I have to rely on gDEB for real-time (what is currently going on) information. A lot of steam for nothing. Thanks anyway.