glBufferData(GL_STATIC_DRAW) vs DYNAMIC_DRAW

Hi,

My application loads up about 650MB of data into GPU memory. This data is meant to be uploaded once and never modified afterwards. Because of this, I’m using GL_STATIC_DRAW.

I’m using multiple TBOs, the largest of which is about 115 MB. Data is uploaded with multiple glMapBufferRange/glUnmapBuffer cycles, 16 MB at a time. I’m always carefully freeing up the system memory I am using.
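For readers following along, the chunked upload described above might be sketched like this. It is only an illustration of the chunking loop, not the poster's actual code: the `map_range`/`unmap` hooks, the `fake_buffer` stand-in, and all function names are hypothetical, chosen so the loop can be tried without a GL context. In a real program the hooks would presumably wrap `glMapBufferRange` and `glUnmapBuffer`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hooks so the loop can be tried without a GL context.
   In a real program map_range would wrap something like
   glMapBufferRange(GL_TEXTURE_BUFFER, offset, length, GL_MAP_WRITE_BIT)
   and unmap would wrap glUnmapBuffer(GL_TEXTURE_BUFFER). */
typedef void *(*map_range_fn)(size_t offset, size_t length, void *user);
typedef int (*unmap_fn)(void *user);

/* Copies `total` bytes from `src` into the buffer, at most `chunk` bytes per
   map/unmap cycle. Returns the number of cycles, or -1 on failure. */
int upload_in_chunks(const unsigned char *src, size_t total, size_t chunk,
                     map_range_fn map_range, unmap_fn unmap, void *user)
{
    int cycles = 0;
    size_t offset;

    assert(chunk > 0);
    for (offset = 0; offset < total; offset += chunk) {
        size_t remaining = total - offset;
        size_t length = remaining < chunk ? remaining : chunk;
        void *dst = map_range(offset, length, user);
        if (dst == NULL)
            return -1;
        memcpy(dst, src + offset, length);
        if (!unmap(user))   /* glUnmapBuffer can return GL_FALSE: contents lost */
            return -1;
        ++cycles;
    }
    return cycles;
}

/* Stand-in backend (assumption): a plain array instead of GPU memory. */
struct fake_buffer { unsigned char *storage; size_t size; int mapped; };

void *fake_map_range(size_t offset, size_t length, void *user)
{
    struct fake_buffer *b = (struct fake_buffer *)user;
    if (b->mapped || offset + length > b->size)
        return NULL;
    b->mapped = 1;
    return b->storage + offset;
}

int fake_unmap(void *user)
{
    struct fake_buffer *b = (struct fake_buffer *)user;
    b->mapped = 0;
    return 1;
}
```

With 16 MB chunks, a buffer of about 115 MB would take eight map/unmap cycles, the last one covering only the remaining few megabytes.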

Out of curiosity, I compared GL_DYNAMIC_DRAW and GL_STATIC_DRAW.

GL_STATIC_DRAW:

  • GPU memory, as shown by GPU-Z, is about 650 MB.
  • process memory usage = 230 MB

GL_DYNAMIC_DRAW:

  • GPU memory, as shown by GPU-Z, is about 337 MB.
  • process memory usage = 420 MB

Why on earth is GPU memory only 337 MB in the latter case? Could somebody help me interpret this?

When loading less data, I observe the same thing. 395 MB then 222 MB used, for instance.

Thanks,
Fred
Windows XP SP3 32 bits
nVidia 285.58 drivers
GeForce GT 430 1GB DDR3

NB: Every time a frame is rendered, pretty much the whole 650 MB is accessed by my shaders. The frame rate is exactly the same with GL_STATIC_DRAW or GL_DYNAMIC_DRAW.

Is the GPU memory (as reported by GPU-Z) a reliable value?
Maybe a special ‘trashable’ kind of memory is used for dynamic draw, on the assumption that it can be discarded if needed, since the data will be re-uploaded soon anyway?

I may be completely wrong, but I think this extra memory is due to a duplicate copy of the data that OpenGL may keep so that it can be updated frequently. If I had to implement this, I would try something like double buffering: leave the copy that is about to be drawn untouched, and update the duplicate that will be drawn next.

You can query memory usage from your GL driver and compare it with what GPU-Z reports:
http://www.opengl.org/wiki/Common_Mistakes:_Deprecated#glAreTexturesResident_and_Video_Memory
On NVIDIA, the extension to use is GL_NVX_gpu_memory_info.
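As a rough sketch of how the GL_NVX_gpu_memory_info values could be read: the enumerant names and values below are the ones given in the extension spec, which reports sizes in kilobytes. Everything else is an assumption made for illustration. The query function is passed in as a pointer so the sketch can run without a GL context, `fake_get_integerv` returns made-up numbers (not measurements from this thread), and "dedicated minus currently available" is only an approximation of used video memory.

```c
#include <assert.h>
#include <stddef.h>

/* Enumerants from the GL_NVX_gpu_memory_info spec; all sizes are in KB. */
#ifndef GL_GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX
#define GL_GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX         0x9047
#define GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX   0x9048
#define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
#define GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX           0x904A
#define GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX           0x904B
#endif

/* Same shape as glGetIntegerv(GLenum, GLint *). On 32-bit Windows the real
   entry point uses a different calling convention, so a thin wrapper
   function would be needed there. */
typedef void (*get_integerv_fn)(unsigned int pname, int *data);

struct nvx_memory_info {
    int dedicated_kb;
    int total_available_kb;
    int current_available_kb;
    int eviction_count;
    int evicted_kb;
};

/* Reads all five values the extension exposes. */
struct nvx_memory_info query_nvx_memory_info(get_integerv_fn get_integerv)
{
    struct nvx_memory_info info = {0, 0, 0, 0, 0};

    assert(get_integerv != NULL);
    get_integerv(GL_GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX, &info.dedicated_kb);
    get_integerv(GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX, &info.total_available_kb);
    get_integerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &info.current_available_kb);
    get_integerv(GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX, &info.eviction_count);
    get_integerv(GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX, &info.evicted_kb);
    return info;
}

/* Approximate video memory in use, in MB (assumption: dedicated - available). */
int used_vidmem_mb(const struct nvx_memory_info *info)
{
    return (info->dedicated_kb - info->current_available_kb) / 1024;
}

/* Stand-in for glGetIntegerv with invented values for a 1 GB card. */
void fake_get_integerv(unsigned int pname, int *data)
{
    switch (pname) {
    case GL_GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX:         *data = 1048576; break;
    case GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX:   *data = 1048576; break;
    case GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX: *data = 382976;  break;
    default:                                              *data = 0;       break;
    }
}
```

In a real program you would first check that the extension string is present, then pass a wrapper around `glGetIntegerv` instead of the stand-in. A non-zero eviction count or evicted size would suggest the driver has been moving data out of video memory.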

The GL_*_DRAW options are just hints to the driver about how you intend to use the buffer; drivers are in no way obliged to respect these hints and may place the buffer into any memory region they see fit. The definition of “see fit” here would be totally down to the people writing the driver, and may include consideration of other internal states in addition to the hint you give (or may even completely ignore the hint, which is allowed by the spec).

So in your GL_DYNAMIC_DRAW case it seems possible that the driver decided something like “woah, lots of video memory being used here, better keep some spare for emergency needs and shove these other buffers into system memory”. Or maybe something entirely different - without specific knowledge of driver internals it’s just guesswork.

The lack of observed performance difference is most likely explained by your main program bottleneck being elsewhere.

Sounds right to me.

This isn’t the case. All I have is 7 draw calls - doing a lot of work, yes.

The GL_NVX_gpu_memory_info extension would, I guess, report the different kinds of memory being used. Maybe there is a tool around that displays the various memory sizes (“evicted”, “dedicated” and so on).

Even with 7 draw calls you could still be bottlenecked on fillrate, vertex transforms, per-fragment shading, etc. Another possibility is that you’re exceeding a hardware limit (something like GL_MAX_ELEMENTS_VERTICES or GL_MAX_ELEMENTS_INDICES) and are being punted back to software emulation on the vertex pipeline.

There is no bottleneck, I’m just doing an awful lot of processing.

It seems there is no difference between STATIC_DRAW and DYNAMIC_DRAW in terms of performance. I mentioned this because I wanted to rule out possible swapping behavior (there is no swapping).

Now the problem I’ve got is that I am uploading 14 TBOs, totaling 794 MB. I don’t understand why, but I’m running out of video memory (even with STATIC_DRAW). Apparently, the driver doesn’t want to give me access to the whole 1 GB of memory.

The last 2 TBOs to be loaded are 180 MB and then 123 MB. It fails at the 180 MB upload. On its own, though, the 180 MB upload works fine!
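One way to pin down where such a failure occurs is to check for GL_OUT_OF_MEMORY after each allocation and keep a running total of what is already resident. The sketch below is hypothetical throughout: the `alloc`/`get_error` hooks stand in for wrappers around `glBufferData` and `glGetError`, and `fake_gpu` is an invented budget model, not a description of how the NVIDIA driver behaves. Drivers may also defer the real allocation, so the error can surface later than the call that caused it.

```c
#include <assert.h>
#include <stddef.h>

#ifndef GL_NO_ERROR
#define GL_NO_ERROR 0
#endif
#ifndef GL_OUT_OF_MEMORY
#define GL_OUT_OF_MEMORY 0x0505
#endif

/* Hypothetical hooks: alloc would wrap glBufferData(..., NULL, GL_STATIC_DRAW),
   get_error would wrap glGetError(). Sizes are in MB to keep numbers readable. */
typedef void (*alloc_fn)(size_t size_mb, void *user);
typedef unsigned int (*get_error_fn)(void *user);

/* Allocates the buffers in order. Returns the index of the first allocation
   that raised GL_OUT_OF_MEMORY, or -1 if all succeeded. *resident_mb receives
   the total size of the allocations that succeeded. */
int first_failed_allocation(const size_t *sizes_mb, int count,
                            alloc_fn alloc, get_error_fn get_error,
                            void *user, size_t *resident_mb)
{
    size_t resident = 0;
    int i;

    assert(resident_mb != NULL);
    for (i = 0; i < count; ++i) {
        alloc(sizes_mb[i], user);
        if (get_error(user) == GL_OUT_OF_MEMORY) {
            *resident_mb = resident;
            return i;
        }
        resident += sizes_mb[i];
    }
    *resident_mb = resident;
    return -1;
}

/* Invented stand-in: a GPU that refuses anything past a fixed budget. */
struct fake_gpu { size_t budget_mb; size_t used_mb; unsigned int pending_error; };

void fake_alloc(size_t size_mb, void *user)
{
    struct fake_gpu *gpu = (struct fake_gpu *)user;
    if (gpu->used_mb + size_mb > gpu->budget_mb)
        gpu->pending_error = GL_OUT_OF_MEMORY;
    else
        gpu->used_mb += size_mb;
}

unsigned int fake_get_error(void *user)
{
    struct fake_gpu *gpu = (struct fake_gpu *)user;
    unsigned int err = gpu->pending_error;
    gpu->pending_error = GL_NO_ERROR;   /* like glGetError, reading clears it */
    return err;
}
```

For the numbers in this post, the first twelve buffers add up to 794 - 180 - 123 = 491 MB, so the failing 180 MB allocation would have brought the total to 671 MB. Feeding the sizes {491, 180, 123} to the sketch with a made-up budget below 671 MB reproduces a failure at the 180 MB entry.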

Maybe this is worth another thread.