OpenGL Context creation time on Win32

An app I’m working on has entered the polishing stage, and one of our goals is to cut startup time to under 700 msec, but OpenGL is causing us to hit a glass ceiling because of context creation time: the fastest I can get context creation (2.x) on Windows (7, x64 if that matters) is 425 msec. First of all, is this normal or am I doing something drastically wrong? And if it IS normal, don’t you guys think it’s a HUGE amount of time? My dev machine (where those results are from, warm startup) is a Core 2 Quad 2.0 GHz with a GTX260M 1 GB. Sampling leads me to believe that most of the time is spent in ChoosePixelFormat, along with a mysterious wglNumHardwareFormats (Google has no results for the latter).

Any tips on reducing the context creation time? Am I really hitting the ceiling there?

On a Core 2 Duo 1.86 GHz I get:
ChoosePixelFormat = 978ms
SetPixelFormat = 148ms
CreateContext = 2ms
MakeCurrent = 5.7ms

Total startup time is 1684ms, so the ChoosePixelFormat call accounts for 58% of it on one core.
The second core is loading 5 DLLs and reading a data file while the first core is in ChoosePixelFormat.
I have also noticed that the call to read the extensions string can take a very long time unless I suspend my other threads, giving the driver exclusive use of both cores.
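
For anyone wanting to reproduce per-call figures like these, a minimal stopwatch sketch in C follows. It is an illustration, not the code used for the numbers above: `now_ms`, `time_step_ms` and `share_percent` are hypothetical helper names, and it uses the portable C11 `timespec_get` where `QueryPerformanceCounter` would be the more usual choice on Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in milliseconds (C11 timespec_get; not guaranteed monotonic). */
double now_ms(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Runs one startup step (passed as a callback) and returns how long it took.
 * The callback would wrap e.g. the ChoosePixelFormat or SetPixelFormat call. */
double time_step_ms(void (*step)(void *), void *arg)
{
    double start;
    assert(step != NULL);
    start = now_ms();
    step(arg);
    return now_ms() - start;
}

/* Percentage of the total startup time taken by a single step. */
double share_percent(double step_ms, double total_ms)
{
    return total_ms > 0.0 ? 100.0 * step_ms / total_ms : 0.0;
}
```

Wrapping each of the four calls in `time_step_ms` and feeding the results to `share_percent` gives a breakdown like the one above; for example, `share_percent(978.0, 1684.0)` comes out at roughly 58.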

Perhaps you could avoid calling ChoosePixelFormat on every run by storing the pixel format number that was used in the previous run and using DescribePixelFormat to check that it’s still suitable.
Then you only need to call ChoosePixelFormat when something changes, such as a driver upgrade that adds new formats.
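
A rough sketch of that idea in C, under stated assumptions: the required attributes (a double-buffered RGBA window format with minimum colour and depth bits) are only a guess at a typical setup, `format_is_suitable` and `pick_pixel_format` are hypothetical names, and the non-Windows branch merely defines stand-in types so the check itself can be compiled and tested anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins so the check compiles off Windows; flag values match wingdi.h. */
typedef unsigned char BYTE;
typedef unsigned int  DWORD;
typedef struct {
    DWORD dwFlags;
    BYTE  iPixelType, cColorBits, cDepthBits, cStencilBits;
} PIXELFORMATDESCRIPTOR;
#define PFD_DOUBLEBUFFER   0x00000001
#define PFD_DRAW_TO_WINDOW 0x00000004
#define PFD_SUPPORT_OPENGL 0x00000020
#define PFD_TYPE_RGBA      0
#endif

/* True if a descriptor (as filled in by DescribePixelFormat for the
 * cached index) still meets the assumed requirements. */
bool format_is_suitable(const PIXELFORMATDESCRIPTOR *pfd,
                        int min_color_bits, int min_depth_bits)
{
    const DWORD required =
        PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    assert(pfd != NULL);
    return (pfd->dwFlags & required) == required
        && pfd->iPixelType == PFD_TYPE_RGBA
        && pfd->cColorBits >= min_color_bits
        && pfd->cDepthBits >= min_depth_bits;
}

#ifdef _WIN32
/* Reuse the index from the previous run when it still describes a
 * suitable format; otherwise fall back to ChoosePixelFormat.
 * `wanted` must have nSize and nVersion filled in by the caller. */
int pick_pixel_format(HDC dc, int cached_index,
                      const PIXELFORMATDESCRIPTOR *wanted)
{
    PIXELFORMATDESCRIPTOR got;
    if (cached_index > 0
        && DescribePixelFormat(dc, cached_index, sizeof got, &got) != 0
        && format_is_suitable(&got, wanted->cColorBits, wanted->cDepthBits))
        return cached_index;
    return ChoosePixelFormat(dc, wanted);
}
#endif
```

How the index is persisted between runs (settings file, registry) is left open here. The DescribePixelFormat call may still pay a one-time initialisation cost of its own, so the saving is worth measuring before relying on it.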

Is it just me or does this amount of time spent feel ridiculous? Can you see how much of this is actually spent inside driver DLLs and how much is spent in Windows DLLs?

I believe that an entire second is too much time to create an OpenGL window, and our guidelines specify that it’s an unacceptable amount of time for app startup. Are the Windows developers being incompetent, are the driver devs lazy or are there technical reasons for this?

PS: I’m not creating a full-screen window. I understand that a mode switch is time consuming and that this is unavoidable, but creating a windowed context shouldn’t consume that much time…

After installing the latest driver version on this computer (NVIDIA 258.96) the ChoosePixelFormat time is now 200-250ms.
But if I stop all my other threads first, so that this call has exclusive access to both cores, then this drops to 120ms.

I traced what it is actually doing and found that ChoosePixelFormat calls DescribePixelFormat repeatedly, and the FIRST time DescribePixelFormat is called it loads several DLLs and initialises OpenGL.
ChoosePixelFormat/DescribePixelFormat itself only takes 30ms; the rest of the time is OpenGL initialisation.

I also found that immediately after rebooting the computer the times to create a window, call ChoosePixelFormat and create a context were all more than doubled, probably because several required DLLs had not yet been loaded into memory.

The only way to make this faster is to have the application start up automatically at boot time with a hidden window and go into an idle state until needed.
This is what the tray applications do, so you could make a tray application that pops up instantly when someone clicks on the icon, by pre-creating the OpenGL window and hiding it until needed.

Even on a warm startup, I find the ~450ms to be a lot. Here’s a copy of my sampling trace:

http://www.mganin.com/gl/trace-censored.png

Simon, while your suggestion does fix the problem, I don’t think it’s a wise choice from an end user experience perspective - I don’t wanna fall victim to the “I’m writing the most important application in the world” syndrome, and I definitely wouldn’t want to be the person responsible for slowing down people’s boot times.

Any ideas?..

If you simply set the process or thread priority of the startup code to a very low value,
i.e. SetPriorityClass( hProcess, IDLE_PRIORITY_CLASS );
or SetThreadPriority( hThread, THREAD_PRIORITY_IDLE );
then it won’t execute until the boot process is complete, so it won’t have any effect on the boot time.
After the boot completes it can then change its thread priority back to normal, set up the (hidden) OpenGL window, and suspend itself until the event occurs that triggers it to activate.

Why does it have to be so fast anyway? What causes it to pop up on the screen and start rendering? Does it really need to start rendering less than 700ms from startup? Or could it just show a window within 700ms and start the actual rendering a second later?

With software, particularly commercial, first impressions are the most important impressions. If your software takes several seconds of total unresponsiveness in which it locks up other programs to start up, you lend the impression that your application is a slow, bloated piece of software even if it’s really quite fast.

Considering how long it takes most software to start up, it should not be hard to look fast.

You can display a normal Windows window in much less than 100ms, and that is fast enough to look instantaneous.
This could just be a splash screen that displays for a second or two until your OpenGL window is ready, or even better, display a menu or some buttons so the user has something to do and doesn’t need to wait for the rendering to start at all.

As long as something happens in the first 100ms and the program is ready to use in a few seconds then I would certainly call it “fast”.

I must go now, I started a game a few minutes ago and should just have enough time to make a cup of coffee before it finishes loading…

NeXEkho, I couldn’t have said it better.

Simon:

  • Displaying an empty window can happen in less than 100ms, but what use is an empty window?

  • Splash screens are a non-solution to a problem that was probably invented by a marketing oriented retard. It IS common, but The Right Thing is to start fast, period. When it takes my PC more time to start a mail app than my iPhone, someone fucked up big time. Outlook, I’m staring at you.

  • How do I display that button, assuming such a button is useful at that stage of the app, before the rendering context is even created? If I do it via the native Win32 API, what happens to it when the OpenGL window is created? Any solution that induces flickering is unacceptable, as the punishment for that here is having to live with a constantly-on strobe light till the bug causing the flickering is fixed. I don’t like strobe lights.

I still find it hard to believe that a warm-startup context creation needs an entire half a second, without a mode switch. I want to say that someone along the call stack is incompetent, but I was hoping to get some more info about whether there’s a valid technical reason for such performance before I start pointing fingers.

Just do 2 windows. The first one appears in 30ms and lets the user choose ‘new file’/‘open file’ or some option relevant to the particular application, like a login name or a password (or just show an open-file dialog).
While the user is dealing with that, the OpenGL window can be initialised so that it is ready to be made visible as soon as they press a button.

If you want an antialiased pixel format then you need 2 windows anyway: the first one to get a context so you can call wglChoosePixelFormatARB, which picks the pixel format for the real OpenGL window.

As to why it takes so long to initialise OpenGL?
The fact that your ChoosePixelFormat on 64-bit Windows 7 takes twice as long as mine on 32-bit XP indicates that it’s probably a Windows OS problem rather than a driver problem.
It may be the new Vista/Win7 driver model that slows it down.
Why is it surprising? Everything Microsoft has ever written has been incredibly slow and riddled with bugs.

Before pointing fingers though:
Are you using a recent driver (257.15 to 258.96)? These seem to initialise much faster than older versions.
Do you have at least one free core on the processor when you call ChoosePixelFormat (i.e. no more than 3 threads active on a quad core)?
Have you tried running this on a machine with a fast flash drive? The delay could be caused by reading DLLs into memory from the hard disk (although if you run the program twice in a row then all required DLLs should already be loaded).
Have you tried it with and without Aero compositing turned on?

The fastest I have ever been able to get my program up and running is just under 1 second (not counting loading textures from disk to the GPU, which can take a long time if you have a lot).
But sometimes it takes more than 5 seconds simply because most of the needed DLLs have been unloaded from memory to make way for a memory-hogging program that I ran prior to mine.

I was on a pre-257 driver. Upgraded to 257.12, and the time is halved! Thanks a mil Simon, that was indispensable.

And while I agree about Microsoft’s code being mostly clueless fucktardness, NTDEV seems to be an exception: their code is always stable, fast and neat; and they’re the ones who I assume will mostly be involved with WDDM.