Low framerate during the first seconds.

Hi,

I’m having a really strange performance issue during the first seconds of my app.
When launching it, one of three scenarios occurs:
1) Some erratic frames at the very beginning, then 70+ fps.
2) Some erratic frames at the very beginning, then ~12 fps, then, after a few seconds, BAM, 70+ fps.
3) Some erratic frames at the very beginning, then stuck at ~12 fps.

The camera is static, and so is the scene.

I used Nsight to try to figure out what’s happening, especially in case 2, where I can observe the differences.
When it’s rendering at ~12 fps, some glDrawRangeElementsBaseVertex calls take a lot of time, and SwapBuffers calls too; after switching to “cruise speed”, those calls take a more moderate amount of time.

Memory usage is the same in all cases.
CPU Usage is a bit higher when stuck at 12fps, but nothing remarkable.

My scene is really simple (only a few cubes), but I use a lot of post-processing (SSAO, SSR). Reducing the framebuffer to a smaller size doesn’t change anything (although the top speed is higher).
I also tried disabling all my post-processing; the issue is still there, but case 1 becomes the most common.

It is really similar to what’s described in this post, although I’m not using vsync at all (I double-checked the code, and also forced vsync off in the NVIDIA Control Panel), nor SDL.

I also tried putting some glFlush()/glFinish() calls here and there in my code, without success.

A few specifications:
OS: Windows 7
CPU: Core i7 6700
GPU: NVIDIA 560 Ti
Drivers: 385.41
I’ve been stuck on this issue for months now; it’s driving me crazy :sick:

Thanks for any help/advice/question.

[QUOTE=Crashy;1288440]I’m having a really strange performance issue during the first seconds of my app.
When launching it, one of three scenarios occurs:
1) Some erratic frames at the very beginning, then 70+ fps.
2) Some erratic frames at the very beginning, then ~12 fps, then, after a few seconds, BAM, 70+ fps.
3) Some erratic frames at the very beginning, then stuck at ~12 fps.[/QUOTE]

Here are a few suggestions that might help you out:

[ul]
[li]Pin down the behavior to a specific feature of your engine, and turn everything else off (this helps isolate what you’re doing at the GL API level that’s giving the driver constipation).[/li]
[li]Survey what’s left (texture updates, buffer updates, framebuffer binds, etc.). Disable these one at a time.[/li]
[li]Disable VSync and call glFinish after Swap (make the driver keep up with the work you give it; driver queue-ahead can trigger blocks at random points in your frame and drive you completely nuts trying to track down bottlenecks; by the way, read-ahead is what “Max Prerendered Frames” seems to control, though with this suggestion you don’t have to care).[/li]
[li]Check/tune your settings in the NVidia Control Panel (recommended: Triple Buffering OFF, Threaded Optimization OFF, VSync “Use App” [or OFF], Profile “3D App - Visual Simulation”).[/li]
[li]Disable MSAA on the system framebuffer and your internal FBOs (MSAA consumes more memory and adds implicit downsample processing on glBlitFramebuffer and/or SwapBuffers calls that you’d like to exclude as a contributing cause).[/li]
[li]Plug in a GL debug message callback (glDebugMessageCallback); the driver may actually be trying to tell you what’s wrong, or at least give you a clue.[/li]
[li]Carefully monitor GPU memory consumption via NVX_gpu_memory_info (check at the beginning of your run and at the end of every frame; if the evicted count ever changes, you’re overrunning; if the numbers ever change after startup, you’re doing something to instigate that, which could be related to your perf problem; also verify that you’re not even close to exceeding GPU memory).[/li]
[li]Are you using textures? Are you prerendering with them after upload? If not, you should be.[/li]
[li]Are you using large texture arrays? There are some really odd tricks to getting those allocated properly and performing consistently well (the problems look similar to what you’re seeing). Ask if this applies to you.[/li]
[li]Are you updating buffer objects? There are some tricks there that are more standard across vendors. Ask if this applies.[/li]
[li]Avoid changing FBO configuration (it can be expensive; there are some tricks to optimize this if necessary).[/li]
[li]Run GPUView to get a line on what the driver-GPU interface is up to under the covers (it sometimes reveals very time-consuming uploads from the CPU/driver to the GPU that you might have thought should already have been done).[/li]
[li]Verify that you’re not doing any driver readbacks (pretty obvious, but worth mentioning).[/li]
[li]Finally, whatever you find, if you’d like some insight into those findings, tell us in more detail what you’re doing and post some GL code snippets – that’ll help us provide more useful suggestions.[/li]
[/ul]
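To make the NVX_gpu_memory_info suggestion concrete, here’s a minimal C sketch. The enum values come from the extension spec (all results are in KiB); the `query` function pointer is my own indirection so the check can be exercised without a live GL context — in your app you’d simply pass glGetIntegerv:

```c
#include <stdio.h>

/* Token values from the GL_NVX_gpu_memory_info extension spec (results in KiB). */
#define GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX         0x9047
#define GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX   0x9048
#define GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
#define GPU_MEMORY_INFO_EVICTION_COUNT_NVX           0x904A
#define GPU_MEMORY_INFO_EVICTED_MEMORY_NVX           0x904B

/* Same shape as glGetIntegerv(GLenum, GLint*). */
typedef void (*query_fn)(unsigned int pname, int *out);

/* Call once at startup, then at the end of every frame.
 * Returns 1 if evictions happened since the last check,
 * i.e. you are overrunning GPU memory. */
static int check_gpu_memory(query_fn query, int *last_eviction_count)
{
    int total = 0, avail = 0, evictions = 0;
    query(GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX, &total);
    query(GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &avail);
    query(GPU_MEMORY_INFO_EVICTION_COUNT_NVX, &evictions);

    int overrunning = (evictions != *last_eviction_count);
    *last_eviction_count = evictions;

    if (overrunning)
        fprintf(stderr, "GPU memory overrun: %d KiB free of %d KiB, %d evictions\n",
                avail, total, evictions);
    return overrunning;
}
```

In the running app that would look like `static int evictions = 0; check_gpu_memory((query_fn)glGetIntegerv, &evictions);` right after SwapBuffers. If it ever fires after startup, something is still allocating (or being evicted) mid-run.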

[QUOTE=Crashy;1288440]…When it’s rendering at ~12fps, some glDrawRangeElementsBaseVertex calls take a lot of time, and SwapBuffers calls too; after switching to “cruise speed”, those calls take a more moderate amount of time.[/QUOTE]

Yeah, the apparent time consumption in the draw calls and Swap isn’t too surprising. As I understand it, much of the state validation and driver work happens there.

I definitely empathize with your situation. I’ve diagnosed and fixed a number of such problems within engines running on the NVidia OpenGL driver, and it’s either a fun brain teaser or stressful and frustrating (depending on how much time you have) trying to get some clue that will point you to what “voodoo magic” inside the driver is going wrong, and then even more fun trying to figure out what you can do about it.

I don’t fault NVidia. Their driver quality is great! The fact is that OpenGL is much too high an abstraction over today’s GPUs to keep problems like this from happening. There is so much black magic going on under the covers that most OpenGL developers don’t even know it’s happening. I can’t wait until Vulkan is ubiquitous, or until we at least have much better insight into what the OpenGL driver is doing under the hood (e.g. through Vulkan-level debug info provided to the GL debug message callback to help the app developer track down problems like this).

Hi, and thank you for your reply.
I’ve finally found what was wrong: too much GPU memory used!

[ul]
[li]I had some extra-large (8K) textures I used to generate small SDF font textures, but never removed them after SDF generation.[/li]
[li]My shadow maps are packed into a single big GL_R32F atlas, and the depth buffer of this FBO is as large as the atlas, so it’s a huge amount of memory. Correct me if I’m wrong, but I think that if I use GL_DEPTH_COMPONENT32F as the format instead, I’ll cut the memory used in half.[/li][/ul]
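To put numbers on that second point, here’s a quick back-of-the-envelope sketch in C (the 8192-texel atlas side is my assumption, matching the 8K textures mentioned above):

```c
/* Bytes for one square attachment at 4 bytes per texel
 * (true for both GL_R32F and GL_DEPTH_COMPONENT32F). */
static long long attachment_bytes(long long side)
{
    return side * side * 4;
}

/* 8192 x 8192 GL_R32F color atlas ............ 256 MiB
 * plus a same-size 32-bit depth attachment ... 512 MiB total,
 * i.e. half of a 1 GB card before counting the main framebuffer,
 * the post-process targets, and what the desktop already uses. */
```

So dropping one of the two full-size 32-bit attachments really does halve the shadow atlas footprint, which on a 1 GB card is significant.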
Sometimes I miss console programming where it just crashes when there is no more memory available.

The 560 Ti has “only” 1 GB of memory, and if you add a few render targets plus the memory already used by the system (200 MB on the desktop here), the amount of allocated GPU memory rises quickly.

The driver is indeed doing a good job, trying to manage the memory the best way possible!