
Why does SwapBuffers slow down performance?



wolfman
01-10-2004, 01:02 AM
I tried to measure performance on a Cirrus Logic 2 MB video card (PCI) and saw that the SwapBuffers function itself (without any drawing) slows performance down terribly!
It is the worst thing that can happen.
How can I optimize when it is just one function call?
What can I use instead?

I have Windows XP.

Korval
01-10-2004, 02:19 AM
Turn off V-sync.

MikeC
01-10-2004, 05:27 AM
Originally posted by Korval:
Turn off V-sync.

And, ideally, refrain from double-posting to this forum and the OpenGL under Windows one...

BTW, disabling vsync is only really useful for benchmarking. You can't have more *visible* frames per second than your monitor has refreshes, so it just wastes time and causes ugly tearing for no good reason.

ZbuffeR
01-10-2004, 05:44 AM
>>>>BTW, disabling vsync is only really useful for benchmarking.

Not completely true.

>>>>You can't have more *visible* frames per second than your monitor has refreshes, so it just wastes time and causes ugly tearing for no good reason.

True when you have more fps than the refresh rate of the monitor.
But when you have less, it is much nicer for the eyes to turn off vsync, trust me.

I would really like a "half-vsync", which would wait only when drawing faster than refresh rate...

AdrianD
01-10-2004, 07:18 AM
wolfman, you are trying to measure the OpenGL performance of a 2 MB Cirrus Logic graphics card???
This is a joke, right?
That card does not have any 3D acceleration, so you are benchmarking your CPU...
And your slowdown is simply because your backbuffer is in system memory and must be transferred over the PCI bus every frame to a very, very old graphics card with very slow video memory.
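For a rough sense of scale (my numbers, assumed rather than measured): at 800x600 with a 16-bit backbuffer, one frame is 800 * 600 * 2 bytes, about 0.96 MB. PCI peaks at 133 MB/s in theory, and sustained rates are often closer to 40-80 MB/s, so the swap blit alone would cost roughly 12-24 ms per frame - a hard cap of about 40-80 fps before a single triangle is drawn.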

orbano
01-11-2004, 05:04 PM
Turning on v-sync can lower the "framerate" of other subsystems like physics and input. I run games at 60 Hz, but I feel the difference between 60 Hz and 120 Hz, for example: mouse movement is smoother and faster. (Of course, this usually applies to programs that are more graphically demanding than others.)

ZbuffeR
09-20-2007, 01:02 PM
[WARNING: I just revived a very old thread]

Darn, I really should have patented or whatever my idea of "half-vsync" at the time !!

I would make millions by suing Epic about Gears of War lol :p :


There are hybrid solutions to VSYNC. Gears of War uses VSYNC whenever a frame takes less than 33ms to render and immediately displays the frame if it took more.

This means we VSYNC > 30 FPS (and hence clamping to 30 FPS) and don't drop down to ~20 FPS (32ms + 16ms) just because the framerate might be 29 FPS in rare cases.
Taken from :
http://forums.epicgames.com/showthread.php?p=24608843#post24608843
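In pseudocode, the quoted policy amounts to roughly this (a sketch; the 33 ms threshold comes from the quote, and the wgl call assumes the WGL_EXT_swap_control extension is loaded):

if (lastFrameTimeMs < 33.0)  // fast enough: sync to refresh (clamps to 30+ FPS)
    wglSwapIntervalEXT(1);
else                         // too slow: present immediately rather than waiting
    wglSwapIntervalEXT(0);
SwapBuffers(hdc);            // hdc: the window's device context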

Zengar
09-20-2007, 01:08 PM
This will teach you :p

BTW, does anyone know, did Terry Welsh (mogumbo) actually patent his parallax mapping? About every game uses it now...

Mikkel Gjoel
09-20-2007, 02:00 PM
Excellent idea - so how would you go about implementing that in reality? Does it make sense to use wglSwapIntervalEXT on a per-frame basis?

ZbuffeR
09-20-2007, 05:45 PM
That is a good question indeed. There were talks (some months ago) about an extension or something to generalize NV fences that might allow doing that.
I cannot find it anymore; if anybody has ideas...

knackered
09-21-2007, 04:41 AM
can be done efficiently with a one-frame latency...

if (lastFrameTime < 16 && !vsync)
{
    wglSwapIntervalEXT(1);   // last frame was fast: sync to refresh
    vsync = true;
}
else if (lastFrameTime >= 16 && vsync)
{
    wglSwapIntervalEXT(0);   // last frame was slow: present immediately
    vsync = false;
}
SwapBuffers(hdc);            // hdc: the window's device context

Zengar
09-21-2007, 05:47 AM
Or similar code using the average of the last N frames.
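A minimal sketch of that idea (all names here are mine): keep a small ring buffer of recent frame times and toggle on the average, trading a little reactivity for stability.

const int FRAME_HISTORY = 8;
float frameTimes[FRAME_HISTORY] = {0};   // recent frame times, in milliseconds
int   frameCursor = 0;

void recordFrameTime(float ms)
{
    frameTimes[frameCursor] = ms;
    frameCursor = (frameCursor + 1) % FRAME_HISTORY;
}

float avgFrameTime()
{
    float sum = 0.0f;
    for (int i = 0; i < FRAME_HISTORY; ++i)
        sum += frameTimes[i];
    return sum / FRAME_HISTORY;
}

// Per frame, after measuring lastFrameTime:
//   recordFrameTime(lastFrameTime);
//   if (avgFrameTime() < 16.0f && !vsync)       { wglSwapIntervalEXT(1); vsync = true; }
//   else if (avgFrameTime() >= 16.0f && vsync)  { wglSwapIntervalEXT(0); vsync = false; }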

ZbuffeR
09-21-2007, 07:49 AM
@Zengar: why averaging? On the contrary, you want to be as reactive as possible, or the result will be ruined by tearing.

@knackered: to be tested, but I would say the precision in your snippet is not sufficient.

The goal is to be _certain_ that when below the target refresh rate, there is no vsync at all. From the CPU side it is hard to know precisely when the GPU is done rendering (without doing flushes, I mean).

Maybe this extension would help, but it looks NV-only for now:
http://www.delphi3d.net/hardware/extsupport.php?extension=GL_EXT_timer_query

I will have to test, but having a baby doesn't help with finding the time for that :) .

Zengar
09-21-2007, 11:53 AM
I thought that SwapBuffers included an implicit Finish? So you can just measure the FPS after SwapBuffers?

Lindley
09-21-2007, 01:48 PM
EXT_timer_query is usually pretty good about giving accurate timings for minimal slowdown. But it's not perfect. Adding glFinish() calls between queries can still affect some results.
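For reference, a minimal EXT_timer_query sketch (assuming the extension and a GL 1.5 context; entry-point loading and error handling omitted). The result is reported in nanoseconds, and polling availability a frame later avoids forcing a sync:

GLuint frameQuery;
glGenQueries(1, &frameQuery);

// Around the frame's draw calls:
glBeginQuery(GL_TIME_ELAPSED_EXT, frameQuery);
renderFrame();                             // your drawing; hypothetical name
glEndQuery(GL_TIME_ELAPSED_EXT);

// Later (e.g. the next frame), read the result without stalling:
GLint available = 0;
glGetQueryObjectiv(frameQuery, GL_QUERY_RESULT_AVAILABLE, &available);
if (available) {
    GLuint64EXT ns = 0;
    glGetQueryObjectui64vEXT(frameQuery, GL_QUERY_RESULT, &ns);
    double ms = ns / 1.0e6;                // GPU time for the frame, in ms
}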

Jan
09-21-2007, 02:05 PM
Maybe you could do a glFinish, then check how much time your frame has already taken, and then decide whether to enable or disable vsync before you actually call SwapBuffers:

t1 = time
...
frame update
rendering
...
glFinish
tdiff = time - t1
if tdiff < ...
    EnableVSync
else
    DisableVSync
SwapBuffers


Though I am not sure whether that would actually work at all.

Jan.

Overmind
09-22-2007, 02:49 AM
Yes, this would effectively reduce the wasted time in SwapBuffers to zero :p

Jan
09-22-2007, 03:22 AM
That was my thought. It would mean that SwapBuffers would be doing nothing more than actually swapping the buffer. However, it would either sync this to the monitor refresh or it wouldn't, depending on the time it took to render the frame.

However, one might need a dedicated render thread, since the calling thread will wait for SwapBuffers to return and would thus waste CPU cycles.

Any other ideas? I am not really convinced by my solution myself.

Jan.

tamlin
09-22-2007, 06:52 AM
Please note that SwapBuffers hasn't necessarily "finished" everything it should by the time it returns, as we may be led to believe.

I have a performance-testing piece of code that showed a professional vendor's implementation (one of the big two, on probably quite common hardware) eating over ten million CPU cycles (whether spinning on a spinlock or actually doing work, I don't know; it was closer to 13e6 IIRC, which at a 2.4 GHz clock is over 5 ms!) in glClear(color|z) after a successful swap, unless I added a Sleep() after swapping (an artificial test, only to measure the speed and overheads of the implementation). If the code did Sleep() between the swap and glClear, the clear call "only" took ~13k CPU cycles IIRC (at least it wasn't anywhere close to even 1e6 cycles).

I wrote this to show that what I previously believed (and likely most of you did too) - that calling glFinish or swapping has accounted for all of the time - may not be true. In the case I observed, the time is somehow "amortized" until after the following glClear has completed.

(Should there be interest in trying this code locally, if for nothing else than to gloat "Haaa haaa, you made an error here!" :-), I could probably boil the source down.)

Overmind
09-22-2007, 07:22 AM
Originally posted by Jan:
That was my thought. It would mean that SwapBuffers would be doing nothing more than actually swapping the buffer.

In case my sarcasm got lost: if you call glFinish, you actually empty the pipeline. This is practically the worst thing that can happen to your performance.

SwapBuffers won't take much time after glFinish, because glFinish has already taken much more time than SwapBuffers would have. You are ruining overall performance just to be able to measure the frame time more accurately...

Brolingstanz
09-22-2007, 01:54 PM
Seems there's a variation of Heisenberg's principle at work here: the more precisely you try to measure something, the more intrusive the measurement necessarily becomes, and the more likely you are to adversely affect what you're measuring.

Simon Arbon
09-23-2007, 12:36 AM
I read somewhere (I will post a link when I find it) that the behaviour of SwapBuffers depends on the driver you are using.
Very old drivers did do a glFinish when SwapBuffers was called, but in order to increase pipeline performance most modern drivers put the swap command in the command queue and return immediately (i.e. they DON'T do an implicit glFinish).

Some drivers (NVIDIA?) then let you continue writing to the command queue and only block if you try to do a second SwapBuffers before the first one has finished, or if the command queue is full.

Other drivers will block on the next OpenGL command sent after SwapBuffers, if the swap has not completed yet.

TAMLIN: This is why you were measuring ~13e6 CPU cycles in the glClear: the driver returns from SwapBuffers early so the CPU can do some non-OpenGL tasks while waiting for the queued commands to execute (and optionally the next VSync).
When you send the next OpenGL command, the swap is still pending, so the driver blocks your thread until it has completed.

glFinish should NEVER be called on modern hardware; it will destroy your performance by stalling the GPU.
The only way to accurately measure frame timings on a pipelined system is by using a fence.
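For example, with the NV_fence extension (NVIDIA-only; a sketch, error handling omitted):

GLuint fence;
glGenFencesNV(1, &fence);

// ... issue the frame's rendering commands ...
glSetFenceNV(fence, GL_ALL_COMPLETED_NV);

// Either poll without blocking:
if (glTestFenceNV(fence)) {
    // everything submitted before the fence has completed
}
// ...or block until the fence is reached, at a point of your choosing:
glFinishFenceNV(fence);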

V-man
09-23-2007, 05:56 AM
No, drivers should not call glFinish. That would kill parallelism between the CPU and GPU.
When you call SwapBuffers, the driver pushes all GL commands to the GPU, because it is imperative that the current frame be completed. This is a "glFlush", meaning: empty the command queue.


Originally posted by Simon Arbon:
Other drivers will block on the next OpenGL command sent after swapbuffers, if the swap has not completed yet.

Not on a GL call. GL calls always go to a queue in RAM. They will block on the SwapBuffers call.

Simon Arbon
09-24-2007, 01:27 AM
Originally posted by V-man:
drivers should not call glFinish. That would kill parallelism between the CPU and GPU.

That's what I just said.

Originally posted by V-man:
When you call SwapBuffers, the driver pushes all GL commands to the GPU, because it is imperative that the current frame be completed. This is a "glFlush", meaning: empty the command queue.

Yes, that's what I believe happens with all modern NVIDIA drivers, but I have seen various posts such as www.experts-exchange.com (http://www.experts-exchange.com/Programming/Game/Game_Graphics/OpenGL/Q_20971926.html?testCookie=true) or www.gamedev.net (http://www.gamedev.net/community/forums/topic.asp?topic_id=445562&whichpage=1&) which suggest that ATI drivers may work differently.
Tamlin's results certainly suggest thread blocking during a glClear.

TAMLIN - Are you using ATI or NVIDIA?

The only way to prove how various drivers handle SwapBuffers is to do some profiling to see where the thread spends its time.

Mikkel Gjoel
10-05-2007, 05:42 PM
Just for the record - for my personal, immediate purposes, the following setup seems to work fine :)

if( dt>24 ) {
    if( vsync ) {
        wglSwapIntervalEXT(0);
        vsync = false;
    }
}
else if( dt<12 ) {
    if( !vsync ) {
        wglSwapIntervalEXT(1);
        vsync = true;
    }
}
SwapBuffers();

dt being the time in milliseconds between the start of the last render and the start of the current one - i.e. one frame of lag.
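(As a design note: the gap between the two thresholds, 12 and 24 ms, gives the toggle hysteresis, so the swap interval won't flip back and forth every frame when dt hovers around a single cutoff.)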

dorbie
10-08-2007, 01:48 PM
If you want 60 Hz to be just as smooth as 120 Hz, then reduce transport latency from input through to swap. All the rest about smoothness is B.S. By all means run your physics (collision, really) at a higher rate if you insist (that might help for other reasons), but you will find that once you attain 60 Hz (or whatever your monitor refresh rate is), input quality is more a function of latency than of any other factor. Driving graphics at an even higher frame rate is the least efficient way of reducing transport latency, but it gives good benchmarks.

There's an awful lot of ignorance promulgated about this subject. It seems to have percolated up from benchmark-running gamers, who live with a full frame of buffered draw data, implemented by everyone in the industry to keep throughput up. Throughput is not everything, and that should be obvious when people advocate running at a frame rate higher than the display can show in order to get quality interaction.

One day graphics professionals may actually start implementing the lessons from a quarter of a century ago.

For now the tail still wags the dog.

Hampel
10-09-2007, 12:18 AM
@Dorbie: but how do you implement such low latency with all this queuing and multi-threading in the graphics drivers?

dorbie
10-09-2007, 03:23 PM
You can time your input and runtime loop to minimize transport delay through your simulation (game code), particularly the POV; implement vsync while eliminating the buffering of a full frame (which will make things worse if you don't), and that means blocking to drain the FIFO with a glFinish.

Exactly where depends on the implementation details, and you have choices, like syncing post-clear, at differing costs.

You can actually get really sophisticated about this and time input and draw kickoff relative to vsync; if you do this you may even avoid the need to block to drain the FIFO (something the card makers hate, but too bad).

Using fences would be a useful way of managing this, but ultimately I think you really do need to key off the vsync timer.

Note to driver writers who might chime in: I'm sure you have something more valuable to contribute than the usual "don't block and run balls to the wall", so stretch yourself before posting benchmarketing advice. Too many games are written and run as if they were benchmarks.
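A skeleton of the kind of loop being described might look like this (my arrangement and names, not dorbie's code; hdc is the window's device context, and swap interval 1 is assumed):

while (running) {
    pollInput();         // sample input as late as possible (hypothetical helper)
    updateSimulation();  // game code; update the POV last, from the freshest input
    renderFrame();       // issue the frame's GL commands
    glFinish();          // block to drain the FIFO: no extra frame of buffered data
    SwapBuffers(hdc);    // presented at the next vsync with minimal latency
}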

ZbuffeR
10-09-2007, 05:03 PM
Mikkel Gjoel: you are right, your snippet works very well!
It is really nice to the eye, even with black-and-white vertical stripes in a variable high-speed horizontal scroll, my worst case as far as display refresh is concerned.
I don't get why your first threshold value works so well; I could not get better results with a smaller value. However, an 85 Hz monitor would mean around 11-12 milliseconds, not 24, right?

Jan
10-09-2007, 05:45 PM
I am not that familiar with the details of D3D, but in many games you can select a resolution bundled with a refresh rate (e.g. 60 Hz or 85 Hz). Does this have anything to do with vsync? I mean, why would I want to select 60 Hz for my monitor refresh rate when the game could run faster than that? And why would a game want to set a different refresh rate than what the OS has already set?

Jan.

ZbuffeR
10-10-2007, 02:15 AM
Why?
Because this refresh rate depends on the monitor, not on the game.
LCDs are typically 60 or 75 Hz only.
CRTs can usually do better, but it depends on the resolution, so it is nice to be able to choose it.
(The bundled-list presentation is not the most convenient, I agree on that.)
Having more than 60 Hz is only useful:
- to reduce flicker (on CRTs only)
- when you need a particular refresh rate, such as a multiple of 24 Hz to mimic the original movie rate
- to have even smoother animations in ultra-high-speed cases (the Quake 1-3 games come to mind), but as Dorbie pointed out, other latencies come into play, such as the mouse event rate. In Quake 3 you can select among 3 different mouse filters (roughly: no filter, interpolation filter, or extrapolation filter).

Overmind
10-10-2007, 03:19 AM
Originally posted by dorbie:
and that means blocking to drain the fifo with a glFinish

Of course, glFinish is not generally bad. But calling glFinish between rendering and SwapBuffers, with no simulation code in between, is about as bad as it can get in terms of performance ;)

knackered
10-10-2007, 03:39 AM
to get smoother frame rates, you should really be interpolating between the last frame and the current one using r2t and alpha blending.

Simon Arbon
10-11-2007, 01:09 AM
@Jan: Does this have anything to do with vsync? I mean, why would I want to select 60 Hz for my monitor refresh rate when the game could run faster than that? And why would a game want to set a different refresh rate than what the OS has already set?

There are 3 different effects that need to be accounted for: flicker, jerkiness, and strobing.

Flicker is caused by the physical characteristics of the monitor: as the image is re-drawn on the screen, it is brightest at the most recently drawn lines and has faded in intensity at other parts of the screen.
This mostly occurs with a CRT, and if the field rate is too low the screen will seem to flash on and off.
The same effect occurs with movie film, as the shutter blocks the light while the film advances to the next frame.
For most people a field rate of 50 Hz is enough to prevent flicker (this is the minimum acceptable monitor refresh rate for these types of displays).

Jerkiness is governed by the rate at which the human eye perceives a series of still images as continuous movement.
TV has a frame rate of 25 or 30 Hz, while movie film operates at only 24 frames per second.
This is enough to prevent the jerkiness that can be seen in very old movies (which had a 16 Hz frame rate) or in games run on hardware that isn't fast enough.

TV displays 2 interlaced fields per frame, so it meets the minimum 25 Hz frame rate and 50 Hz field rate required to prevent both of these effects, while movie film shows each frame several times.
Computer monitors are usually run slightly faster than this, as it reduces eyestrain (and non-interlaced, so field rate = frame rate).

Unfortunately there is another effect (strobing) that affects computer games more than video.
Video cameras capture an image of a moving object during most of a frame period, hence it will be MOTION BLURRED.
It is this blurring of moving objects that prevents strobing.
With a computer animation, however, we are generating a series of perfectly sharp still images with no blurring, and even at 60 Hz a moving object will look like it's strobing (similar to a disco with a strobe light flashing).
If you have a camcorder with a variable 'shutter speed' feature, try filming a fast-moving object with both the minimum and maximum settings to see what I mean.

For most OS programs this won't happen, so the monitor can be set to 60 Hz, but when a game starts up it may need to change this to 100 Hz or more to prevent the strobing effect.
The only other way around this is to add your own artificial motion blur when rendering.
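One classic (and expensive) way to approximate that in OpenGL, sketched here assuming a pixel format with an accumulation buffer (renderSceneAt and the timing variables are illustrative names):

const int N = 4;                       // sub-frames averaged per displayed frame
glClear(GL_ACCUM_BUFFER_BIT);
for (int i = 0; i < N; ++i) {
    renderSceneAt(frameStart + i * (frameDuration / N));  // render one sub-frame
    glAccum(GL_ACCUM, 1.0f / N);       // accumulate 1/N of the colour buffer
}
glAccum(GL_RETURN, 1.0f);              // write the averaged image back
SwapBuffers(hdc);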

A game should run as fast as possible, but still locked to VSYNC: if your maximum monitor frame rate is 100 Hz (and the GPU can keep up), then that is what the game should run at.
If it finds itself skipping frames, it should switch to a slower frame rate that the GPU can keep up with.

Stephen A
10-11-2007, 07:03 AM
Thanks for the detailed post, an interesting read that clears many things up.

While this is veering off-topic, do you know of any papers that describe a (semi-) physically correct simulation of a camera suitable for realtime rendering?

dorbie
10-12-2007, 08:28 PM
Originally posted by Overmind:

Originally posted by dorbie:
and that means blocking to drain the fifo with a glFinish

Of course, glFinish is not generally bad. But calling glFinish between rendering and SwapBuffers, with no simulation code in between, is about as bad as it can get in terms of performance ;)

The problem with your observation is that it focuses on a single performance metric. When you consider transport delay, you might arrive at a very different conclusion.

That said, there are ways to be smart about this; you just have to apply yourself to the problem. You'll notice I mentioned fences: you could use one to trigger input and POV update, but run all sorts of physics etc. in the meantime. It really depends a lot on the details of your scenario.

dorbie
10-14-2007, 01:06 AM
Originally posted by knackered:
to get smoother frame rates, you should really be interpolating between the last frame and the current one using r2t and alpha blending.

Unless you're talking about motion blur for intra-frame accumulation, I disagree.

When you have the next frame, show it; anything else will hurt interaction. You can already see ghosting if you drive swaps at less than the refresh rate, and making that artifact even more persistent won't help.

FYI, in the past, when swap rates were below refresh rates, I implemented a dynamic intra-frame video pan to smooth pitch and heading rates on an SGI InfiniteReality. It is not without its own drawbacks if you have moving targets in the scene.

knackered
10-15-2007, 07:31 AM
To be honest, I was just going off my experience with a movie player I wrote. I blended the previous frame with the current one, with alpha derived from the fractional part of movieTime * fps. I hadn't thought too much about how it would work in an interactive setup. Ever so sorry.
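For what it's worth, a fixed-function sketch of that blend (the texture ids and the fullscreen-quad helper are mine): draw the earlier frame, then blend the later one over it with the fractional alpha.

float alpha = fmodf(movieTime * fps, 1.0f);  // fractional position between frames

glDisable(GL_BLEND);
glBindTexture(GL_TEXTURE_2D, prevFrameTex);  // hypothetical texture ids
drawFullscreenQuad();                        // hypothetical helper

glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glColor4f(1.0f, 1.0f, 1.0f, alpha);          // alpha weights the newer frame
glBindTexture(GL_TEXTURE_2D, currFrameTex);
drawFullscreenQuad();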

speedy
10-16-2007, 12:43 PM
From the Unreal Tournament 3 Demo .ini file, default settings:

[Engine.GameEngine]
bSmoothFrameRate=TRUE
MinSmoothedFrameRate=22
MaxSmoothedFrameRate=62

I wonder what they actually do, because the game action really seems smoother when it's enabled, and I was positively surprised by the responsiveness.

Benchmarkers recommend disabling that particular setting because it clamps the FPS.

ZbuffeR
10-21-2007, 05:08 PM
It seems to be the topic du jour for games; ETQW has something similar:
http://community.enemyterritory.com/forums/showpost.php?p=44596&postcount=1

And well, I must admit the default settings are smoother and input delay seems really reduced, compared to "render as much as possible + vsync".
Even the tearing is not so noticeable.

Prune
02-17-2010, 04:46 PM
Originally posted by Mikkel Gjoel:
[the dt-based vsync-toggling snippet quoted above]

Guys, is this still helpful if triple-buffering is used?

Thanks

Prune
02-18-2010, 02:52 PM
Hello?

ZbuffeR
02-18-2010, 03:11 PM
It is different.
Personally, I don't like triple buffering.

Prune
02-18-2010, 07:17 PM
Hi ZbuffeR,

What don't you like about triple-buffering?
Does it make sense to use the methods in combination?

Also, what about interaction between those and the "Maximum pre-rendered frames" setting in the NVIDIA control panel applet?

Prune
02-24-2010, 04:38 PM
... any comment?