Endless loop of bitmaps // vsync fails

Dear all,

This is an updated version of my question in the beginners forum (#post1251826) about how to load bitmaps on the GPU before displaying them in an endless loop at exactly 60 Hz.

It was suggested to include vsync functionality to make sure the program waits for the next vblank, which I have done:

// Assumes <windows.h>, <GL/gl.h>, <string.h> are included, and that wglSwapIntervalEXT
// has been declared, e.g. typedef BOOL (WINAPI *PFNWGLSWAPINTERVALFARPROC)(int);
void setVSync(int interval = 1)
{
    // Only set the swap interval if the driver advertises the swap-control extension.
    const char *extensions = (const char *)glGetString(GL_EXTENSIONS);
    const char sub[] = "WGL_EXT_swap_control";
    if (strstr(extensions, sub) != NULL) {
        wglSwapIntervalEXT = (PFNWGLSWAPINTERVALFARPROC)wglGetProcAddress("wglSwapIntervalEXT");
        if (wglSwapIntervalEXT)
            wglSwapIntervalEXT(interval);   // interval = 1: wait for one vblank per swap
    }
}

Also, vsync is enabled through nvidia’s control panel.

When I test the software on two single images (one black, one white), it successfully loads and displays BWBWBWBW… in a loop, but at (seemingly) random intervals glitches occur, in the form of a frame being displayed twice or not at all, as in BWBWBBW… Unfortunately, for my application it is crucial that ALL frames are transmitted at EXACTLY the maximum refresh rate of the monitor (which is 60 Hz).

I have been struggling with this for days, so if anyone knows why this could be happening, or has any suggestions for other paths to take (timer functions?), please let me know!

Thanks in advance,

Sam

PS: if any part of my code could enhance understanding of this behavior, please ask.

  1. Try using three or more images so you know whether it’s doubling a frame (ABCABBC…) or skipping a frame (ABCACABC…); with two images, you can’t distinguish these two cases.

  2. Determine the frame rate without vsync enabled (see the frame-counter sketch just below).
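
A rough way to get that number without external tools (a sketch assuming a GLUT program with <stdio.h> included; it prints the achieved rate about once per second):

// Call once per displayed frame, e.g. at the end of the display callback.
void countFrame(void)
{
    static int frames = 0;
    static int last_report_ms = 0;

    frames++;
    int now_ms = glutGet(GLUT_ELAPSED_TIME);    // milliseconds since glutInit()
    if (now_ms - last_report_ms >= 1000) {
        printf("FPS: %.1f\n", frames * 1000.0 / (now_ms - last_report_ms));
        frames = 0;
        last_report_ms = now_ms;
    }
}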

Possible solutions depend upon the information gleaned from the above two points.

If it’s skipping a frame, then vsync simply isn’t working for whatever reason. I would expect it to be more robust in full-screen mode than displaying in a window. Ensure that “triple buffering” isn’t enabled in the driver’s control panel.

If it only runs marginally above 60 Hz without vsync, you need to get it to run faster so that you have some headroom. For such a simple task, 10x or even 100x that rate is feasible with modern hardware, but use of legacy features may result in a fallback to a suboptimal approach or even software rendering.

If it’s occasionally doubling a frame, but it normally runs well above 60 Hz without vsync, you need to look into whatever support Windows has for real-time scheduling (if it was Linux, I could offer some advice, but I’m not familiar with that aspect of Windows).
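
If you do go down that road, one low-effort thing to try (a hedged sketch, not from the post above; these Win32 calls only raise scheduling priority and give no hard real-time guarantee):

#include <windows.h>

// Ask the Windows scheduler to favour this process and thread. This is only
// a hint; REALTIME_PRIORITY_CLASS also exists, but can starve the rest of the system.
void raisePriority(void)
{
    SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
}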

Thanks for your detailed reply.

First, I have determined the frame rate with vsync disabled in the NVIDIA control panel. My application was benchmarked with Fraps (Windows FPS software, http://www.fraps.com/) to run at approximately 340-350 Hz.
I would think this means the software should be able to run fast enough for a stable 60 Hz stream?

Next, I benchmarked the software with vsync enabled. The software runs relatively stably at 60 Hz on my 60 Hz monitor, with the occasional 59 Hz or 61 Hz spike. I guess this is the source of the glitch I see? Does this mean vsync doesn’t work properly?

Finally, I tried to find out whether it’s doubling or skipping a frame as you suggested. I did this by displaying a sequence of grid images that have been shifted out of phase slightly between each frame. When displayed consecutively, they form a moving grid video.

A full period of such a grid was chosen to be 60 pixels wide, and each grid is shifted one pixel relative to the previous frame. What I saw was the occasional stutter in the image loop (hard to tell at 60 fps, but I would say it doubles a frame rather than skips one). Also, some tearing could be noticed occasionally (the top part of the next image was already displayed)… Is this a vsync issue?

Any thoughts?

If your CPU application presumes that it is running synchronized with VSYNC, you must call glFinish() after SwapBuffers (wglSwapBuffers on Windows, glXSwapBuffers on Linux) in order to force the CPU to wait on VSYNC.

Without a glFinish, the driver will merrily queue up the swap request, return immediately, and let you start processing the next frame on the CPU if you want. While the GPU won’t actually do the swap until vsync, your CPU will continue on past the swap call before vsync happens.

Adding glFinish() forces the driver on the CPU to block until the SwapBuffers is actually executed on the GPU, which if sync-to-vblank is enabled, is probably when the next vsync rolls around.

I say probably because if your swap request requires some driver-internal processing before the swap can occur (e.g. an MSAA downsample and filter), then that has to happen first. Only after it’s really ready to do the swap internally will it start waiting for the next vsync to do the deed. In other words, internal swap overhead may cause you to blow a frame if you’re on the hairy edge, resulting in you missing one vsync and catching the next.

Another possibility beyond CPU-vsync synchronization is that your app processing may be blowing frames (causing your app to overrun a frame and miss a vsync). But that is easy to identify with CPU-side timers: put timing calipers right after the glFinish() that follows SwapBuffers and look at the deltas between successive frames.
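
A minimal sketch of that caliper pattern under GLUT (assumes <stdio.h> and <GL/glut.h>; the 20 ms threshold is a rough cut-off for a missed 60 Hz frame, and the names are illustrative):

// End-of-frame sequence: swap, force the CPU to wait until the swap has
// really executed, then measure the frame-to-frame delta on the CPU.
static int g_last_ms = 0;

void endFrame(void)
{
    glutSwapBuffers();
    glFinish();                           // block until the swap is done (at vsync, if enabled)

    int now_ms = glutGet(GLUT_ELAPSED_TIME);
    int delta  = now_ms - g_last_ms;
    if (g_last_ms != 0 && delta > 20)     // well over ~16.7 ms: a 60 Hz frame was blown
        printf("missed vsync: frame took %d ms\n", delta);
    g_last_ms = now_ms;
}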

If enabling vsync causes the framerate to change from 350 Hz to 60 Hz +/- 1 Hz, then vsync is working.

This suggests that the actual buffer swap is being delayed by a variable amount after vsync.

To get “proper” double-buffering (aka buffer flipping), the entire screen needs to be double-buffered, not just a window. DirectX has a notion of “full-screen exclusive” mode, but I don’t know if this is available to OpenGL, or how it would be enabled (certainly, it won’t be available if you’re using a normal window). You could try different combinations of PFD_* flags in the ChoosePixelFormat() call.
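
For reference, a minimal sketch of setting up such a pixel format by hand (assumes you already have a window’s HDC called hdc; PFD_SWAP_EXCHANGE is only a hint to the driver, not a guarantee of true buffer flipping):

// Request a double-buffered RGBA format; PFD_SWAP_EXCHANGE hints that we
// would prefer a real buffer exchange (flip) over a copy on SwapBuffers().
PIXELFORMATDESCRIPTOR pfd = {0};
pfd.nSize      = sizeof(pfd);
pfd.nVersion   = 1;
pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL |
                 PFD_DOUBLEBUFFER | PFD_SWAP_EXCHANGE;
pfd.iPixelType = PFD_TYPE_RGBA;
pfd.cColorBits = 24;
pfd.iLayerType = PFD_MAIN_PLANE;

int format = ChoosePixelFormat(hdc, &pfd);
if (format != 0)
    SetPixelFormat(hdc, format, &pfd);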

I have included glFinish(); in my display function, to no avail: the glitches/horizontal tearing still occur. I have also included the timers as proposed by Dark Photon. My display function now looks like this:

void display()
{
    g_current_frame_number = g_current_frame_number + 1;

    glClear(GL_COLOR_BUFFER_BIT);
    glPushMatrix();                          // paired with the glPopMatrix() below

    glBindTexture(GL_TEXTURE_2D, texture[current_image]);
    //printf("\n %i", texture[k]);

    // we use white, so all colors of the texture are used
    glColor3f(1.0f, 1.0f, 1.0f);

    // draw one quad (rectangle) with the texture
    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f, -1.0f);
    glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f, -1.0f);
    glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f,  1.0f);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f,  1.0f);
    glEnd();

    // set the next image to be used; if out of images, go back to 0
    current_image++;
    if (current_image >= n)
    {
        current_image = 0;
    }

    glPopMatrix();
    glutSwapBuffers();
    glFinish();

    new_time = glutGet(GLUT_ELAPSED_TIME);
    delta_time = new_time - old_time;
    printf("\ntime_elapsed = %i", delta_time);
    old_time = new_time;
}

With vsync enabled, delta_time comes out as 16 17 17 16 17 17 … ≈ 16.667 ms, as expected.
With vsync disabled, delta_time is around 2 ms, with the occasional 1 or 3 ms.

What would you deduce from this? The numbers seem normal to me…

Another thing I noticed, though I’m not sure it’s worth mentioning: when the window in which the sequence is displayed is enlarged, the tearing becomes increasingly worse.

(I haven’t yet tried the suggestion to experiment with the PFD_* flags in the ChoosePixelFormat() call; I’m not sure where to start with this, but I’ll have a look.)

I would deduce that vsync is working, in that it won’t perform two swaps within the same frame, but that the swap doesn’t occur immediately.

That’s to be expected. If the swap (well, it will be a copy if you’re running in a window) doesn’t occur during the vertical blank period, then you’ll get a tear at the raster position at the time the copy occurred, provided that position was between the top and bottom of the window. The larger the window, the more likely that the raster position will be in the window (if the copy occurs after the raster position has passed the bottom of the window, you won’t get a tear but you will get a duplicated frame).

In full-screen exclusive mode, there isn’t a copy. The driver maintains two complete framebuffers and swaps which one is sent to the monitor during the vertical blank period. Some information about doing this the hard way can be found here and here. If you’re using GLUT, see glutGameModeString() and glutEnterGameMode() (glutFullScreen() isn’t necessarily “exclusive” full-screen, whereas “game mode” has to be, as it allows you to specify screen resolution and refresh rate).
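
A minimal sketch of the game-mode path, reusing the display() callback from earlier (the mode string is just an example; pick one your monitor actually supports, and note that callbacks have to be registered for the new game-mode window):

// Request an exclusive full-screen mode: 1920x1080, 32-bit colour, 60 Hz.
glutGameModeString("1920x1080:32@60");
if (glutGameModeGet(GLUT_GAME_MODE_POSSIBLE)) {
    glutEnterGameMode();            // creates a new full-screen window/context
    glutDisplayFunc(display);       // re-register callbacks for that window
    glutIdleFunc(idle);             // 'idle' stands for whatever drives the redraw loop
} else {
    glutCreateWindow("fallback");   // stay windowed if the mode is unavailable
    glutDisplayFunc(display);
    glutIdleFunc(idle);
}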

Ok great!

Entering game mode has made the frame sequence a lot more stable (sometimes glitches, but no tearing, occur right after I start the display; after a while the loop seems to run perfectly! I can’t say so with 100% certainty, as my eyes are tired from all the staring at moving grids.)

To summarize: the reason this worked is that going into an “exclusive full-screen mode” enables the operating system to use true double buffering? Does this mean windowed displays can never be double buffered, and that this is what caused the glitches? Does it also mean that this trick always requires a dedicated PC system with a GPU to “fuel” a single monitor (no second monitor, as that would disable double buffering)?

Thanks again for your advice,

Sam

They “could” be double buffered, but usually aren’t. The situation may be better with compositing systems such as Aero (Windows) or Compiz (Linux).

It shouldn’t matter whether you have other monitors, so long as the application in question has exclusive use of one of them. Games normally use full-screen exclusive mode, and it’s fairly common for development systems to be dual-monitor so that you can run the game on one monitor and a debugger (and/or similar tools) on another one. I don’t know how well GLUT handles this situation; GLFW has explicit multi-monitor support.

[QUOTE=GClements;1251946]Does this mean windowed displays can never be double buffered

They “could” be double buffered, but usually aren’t.[/QUOTE]

What’s your source for that information?

[QUOTE=GClements;1251946]The situation may be better with compositing systems such as Aero (Windows) or Compiz (Linux).[/QUOTE]

In my experience, compositors make this worse, as you’re no longer directly in charge of when rendering and swapping happen (the compositor is). I always do realtime rendering with compositors disabled; that avoids this problem entirely. Without a compositor enabled, I have never had a problem getting a HW-accelerated, double-buffered window that behaves as such, whether windowed or full-screen.

Some compositing desktops have an option to disable the compositor when you’ve got a window running fullscreen. Without a compositor, it just works and you don’t have to care.

Experience. Note that in this context, “double-buffered” refers to hardware buffer-flipping.

So why doesn’t all this work with single buffering and vsync enabled? The rendering time seems to be close to 2-3 ms, which is fast enough to have the next image ready in time. Is it because there isn’t a second buffer ready to be swapped in “immediately”? Or how does this work?

It seems strange to me to build any application with single buffering, then?

Vsync is only meaningful in conjunction with double-buffering.

With single-buffering, you’re drawing directly to the displayed screen, so the screen changes at the point that the rendering command is issued, which will be whenever the OS decides to give your program a time slice.

With double-buffering, all rendering is performed to an off-screen buffer. This is either copied to the screen (when running in a window) or becomes the screen buffer (when running in full-screen exclusive mode with buffer-flipping) at some point after SwapBuffers() is called.

Without vsync, the copy/flip may occur as soon as SwapBuffers() is called. With vsync, it should be delayed until after the start of the vertical blank period. With buffer-flipping, it should happen within the vertical blank period. But if it uses a copy, that may be just another command in the graphics queue.

Single-buffering can be useful if you have exclusive access to the screen and you’re performing incremental updates (i.e. only drawing the portions which have changed).

If you aren’t displaying continuous animation but only redraw in response to events, and redraw is fast relative to the frequency of events, the expense of a second buffer may not be justified on systems with limited video memory (which may have only been a megabyte or two when OpenGL was originally conceived).
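
For concreteness, a minimal sketch of how the two modes are requested under GLUT (the window title is illustrative; only the double-buffered path pairs meaningfully with vsync):

// Double-buffered: render to an off-screen back buffer, then swap at vsync.
glutInitDisplayMode(GLUT_RGB | GLUT_DOUBLE);
glutCreateWindow("double buffered");
glutDisplayFunc(display);           // display() ends with glutSwapBuffers()

// Single-buffered alternative: drawing goes straight to the visible buffer,
// so there is no swap for vsync to gate.
//   glutInitDisplayMode(GLUT_RGB | GLUT_SINGLE);
//   ...and display() would end with glFlush() instead of glutSwapBuffers().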

Great explanation, thanks to both of you for the clarifications.