
View Full Version : OT: C++ performance timers.



LostInTheWoods
10-24-2002, 11:44 AM
I am looking for an open-platform way to check the time in very small intervals. Basically, I'm looking to create an efficient FPS counter for my engine (among other things), and I'm looking for a timer that will run on ANY OS, so I don't have to mess with platform code (the reason I use OpenGL). Thanks

jwatte
10-24-2002, 12:38 PM
There is no such thing as a platform independent time function, except for time() (which is only "seconds" accurate).

ftime() is somewhat more accurate and a little bit portable. However, it suffers from the general TickCount() drift problem, caused by ISA interrupt controller legacy madness AFAICT.

The RDTSC instruction is portable to all user-mode x86 operating systems, assuming you can find a portable assembly syntax (I like NASM for that reason). Of course, then you're faced with trying to figure out the CPU speed, which is, uh, "hard" on SpeedStep chips.

What I would do is to define an abstract interface that returns time in some useful unit (say, seconds as a double) and then use #ifdef for each platform.

Note that there isn't really any such thing as portable code, only code that has been ported :-)
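A minimal sketch of that #ifdef approach, assuming only gettimeofday() on Unix and QueryPerformanceCounter() on Windows (the function name and structure here are illustrative, not from any particular library):

```cpp
// Abstract "time in seconds as a double", with an #ifdef per platform.
#ifdef _WIN32
#include <windows.h>

double TimerSeconds()
{
    // Performance-counter ticks divided by the counter frequency.
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&now);
    return (double)now.QuadPart / (double)freq.QuadPart;
}
#else
#include <sys/time.h>

double TimerSeconds()
{
    // gettimeofday() has microsecond resolution on most Unices.
    struct timeval tv;
    gettimeofday(&tv, 0);
    return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}
#endif
```

Callers only ever see "seconds as a double", so porting means touching just this one function.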

kansler
10-25-2002, 12:19 AM
What about the clock() function?

Rob Fletcher
10-25-2002, 01:53 AM
eh?

So, on my system ...

1% man clock

NAME
clock - analog clock in a window

SYNOPSIS
/usr/sbin/clock

or ...

DESCRIPTION
CLOCK obtains the current time, in ASCII hh:mm:ss format, from the
real-time clock


or ...

SYNOPSIS
#include <time.h>

clock_t clock (void);

DESCRIPTION
clock returns the amount of CPU time used since the first call to clock.


and ...

BUGS
The implementation of clock conflicts with the definition of the routine
found in the ANSI C Standard. The discrepancy will be transparent,
however, so long as programs which adhere to that Standard use the
difference in two invocations of clock for timing information, as
recommended by the Standard.


clock() probably not very portable?


Perhaps???
Really, what you need to do is NOT look for a high-precision timer so you can work out the time per frame. Instead, simply use a timer which will give you "second accuracy" (and I mean accuracy within a couple of msec, unless you really are looking for frame rates in excess of 1000 fps!), and then count the number of frames displayed per second.

A simple implementation would be to have a counter in your display function, and fire off a timer every second which reads the count and resets it, and lo and behold, a reasonable estimate of FPS.

Mind you, I really don't understand why people get so hung up on FPS. Surely what you need is an FPS that gives you the smooth motion etc. you require for your application.

If you don't get this, then you need to be looking at your code to see what can be optimised to give better perceived visual performance.

An FPS reading is perhaps only a "debug" function, or a benchmarking thing!

Rob.
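A minimal sketch of Rob's counter (names are mine; CountFrame() would be called once per displayed frame, with the current wall-clock second passed in so the rollover logic is easy to see):

```cpp
#include <ctime>

// Count frames, reporting the total for each completed second.
// Returns the frames counted in the last full second, or -1 before
// the first second has rolled over. Names are illustrative only.
int CountFrame(time_t now)
{
    static int frames = 0;     // frames seen in the current second
    static time_t last = 0;    // the second we are counting in
    static int fps = -1;       // last completed second's count

    if (last == 0)
        last = now;            // first call: start counting
    if (now != last)           // the clock ticked over to a new second
    {
        fps = frames;
        frames = 0;
        last = now;
    }
    ++frames;
    return fps;
}
```

In a real loop you would pass time(0) as 'now' and display the returned value whenever it changes.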

kansler
10-25-2002, 03:27 AM
I was talking about the function clock() (mind the parentheses), not a program. If an OS has an ANSI C library it also has the clock() function because this function is defined in ANSI C. So don't say it ain't portable.

LostInTheWoods
10-25-2002, 03:35 AM
OK, so your idea is this: count the number of frames displayed each second. I could do this with the ANSI C time functions. Then, based on how many frames fit in that second, adjust what needs adjusting. Right? Sounds like a plan.

jwatte
10-25-2002, 09:10 PM
Lost,

Unfortunately, with only one second of granularity, the precision of your frame rate measurement goes *DOWN* as your frame rate becomes *LOWER*, because the clock may have ticked over to the next value at any time during the previous frame.

I.e., the maximum error of your sample period is one entire frame's worth of time (as long as each frame takes less than one second).
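A tiny illustration of that quantization error (the numbers are mine, not from the post): with a clock that only reports whole seconds, two intervals of the same true length can measure as one tick or as zero ticks, depending on where they fall.

```cpp
#include <cmath>

// A clock with one-second resolution just truncates the true time.
long CoarseSeconds(double trueTimeSec)
{
    return (long)std::floor(trueTimeSec);
}
```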

GPSnoopy
10-26-2002, 03:39 AM
Personally, I use QueryPerformanceCounter() under Windows and gettimeofday() under Linux.

IMO they're the most precise timers. But they have to be manipulated correctly, otherwise you might easily lose the extra precision you gained. (E.g. don't convert the 64-bit value of QueryPerformanceCounter directly to a double.)

dorbie
10-26-2002, 07:23 AM
Simply counting fps is not enough. You need to measure the delta time each frame to keep in-game physics and animation updating at a consistent rate. Distances traveled should track the elapsed time as closely as possible; that's pretty fundamental for any 3D game, and it needs to be reasonably responsive to short-term variations in frame rate due to instantaneous graphics load.

jwatte
10-26-2002, 12:10 PM
GPSnoopy,

QueryPerformanceCounter() (and any high-resolution timer dependent on the same data) has stability problems where it will sometimes take large jumps forward (or back) in time.

I've noticed this bug on pretty much every chipset from the last few years; I think the list in this Microsoft Knowledge Base article is not exhaustive:
http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q274323&

GPSnoopy
10-26-2002, 12:52 PM
My chipset is in that list (i440BX). But I've never seen this problem, although I've heard a lot about it.

I found that most timer problems I had were related to precision mistakes, not hardware bugs.

jwatte
10-26-2002, 06:12 PM
We have a test size of around 50 machines, and we'd see it once a week or so.

mattc
10-28-2002, 06:56 PM
thanks a lot for the queryperf... info, i wasn't aware of that :) i'm actually shocked that there's a hw bug of this sort:

the "performance counter" is an age-old timer chip, available since the earliest isa pc's - if you've done hw timer interrupt programming under dos (pre-win9x days), it's that same programmable timer at max 1.193 MHz (== 0x1234DD Hz) on ports 0x40 - 0x43...

...which is also the reason why the c runtime clock() function has crap resolution (18.2 ticks per second): it relies on the bios which, on bootstrap, programs the timer to use the max clock divisor of 65536 (for minimum cpu load). 0x1234DD / 0x10000 = 18.2 clock ticks per second...
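As a sanity check on that arithmetic (constants exactly as given above):

```cpp
// PIT base frequency and the BIOS-programmed divisor from the post.
const double kPitHz = 0x1234DD;                  // 1193181 Hz
const double kTicksPerSec = kPitHz / 0x10000;    // ~18.2 ticks per second
```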

so there's basically no good way to get timer services from the c runtime (that's suitable for games or midi sequencers), a bit like printer or networking services. my advice is to write a timer class to handle os specifics; then your code can use the class and you only need to port the timer class for each target platform.

[edit] i noticed that queryperf... calls incur an unreasonable cpu load, presumably because you're asking for info that ultimately has to be obtained from hardware ports, which is a virtualised ring 0 (kernel) resource. not that big a deal on its own, but coupled with that hw bug above it makes me think of the rdtsc instruction. my problem with that opcode is (a) how can it be reliable under a multithreading/multitasking os: programs that use this instruction regularly misreport my cpu speed, and (b) for reasons i can't recall right now, it involves using timeGetTime() over a 500 ms interval to get cpu clocks per second... how can you trust a timing mechanism based on measuring with a different mechanism? help ;)

[This message has been edited by mattc (edited 10-28-2002).]

GPSnoopy
10-29-2002, 06:12 AM
Well, RDTSC works well with QueryPerformanceCounter() to get the CPU frequency (in less than 100 ms).

But then it's the same problem: you use another timer to initialize yours. ;)

LostInTheWoods
10-29-2002, 06:28 AM
Well, my current setup takes the physics into account also. This is what I am planning on doing: I am going to set a variable called TimeStepAverage, which is 60 (the target number of frames) divided by the number of frames fired that second.

Now, my rendering code and my physics code go hand in hand, meaning that the physics code is fired each frame. (I know that's expensive, but on today's hardware I think it's doable.) So what I do is this: I base my rendering and physics code on a set time of 60 FPS.

So if the player move function is something like.

PlayerCtr = PlayerCtr + Move;

Move is based on a speed of fire at 60 times per second, so if I wanted him to move 60 feet in one second, Move would equal 1 foot. Got it?

Ok, from there I simply add this:

PlayerCtr = PlayerCtr + Move*TimeStepAverage;

So if the number of frames this second was 120, the move is scaled by 1/2, making the physics just as accurate as at 60, but allowing the frame rate to be independent at the same time.

Also, if the frame rate dropped to 30, the move would be multiplied by 2, making it still accurate and non-frame-based. Does this sound feasible?
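A sketch of that scaling (all names are mine; per the description, the factor is the 60 FPS target divided by the frames actually counted):

```cpp
// At 120 frames in a second the factor is 1/2; at 30 it is 2.
double TimeStepAverage(int framesThisSecond)
{
    return 60.0 / (double)framesThisSecond;
}

// PlayerCtr = PlayerCtr + Move * TimeStepAverage, as in the post.
double MovePlayer(double playerCtr, double movePerFrameAt60,
                  int framesThisSecond)
{
    return playerCtr + movePerFrameAt60 * TimeStepAverage(framesThisSecond);
}
```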

pATChes11
10-29-2002, 10:43 AM
But this is the problem: you may get 61 FPS, or 1099.0738 FPS, or any other number of frames per second. You cannot possibly ensure that you will get whatever number of frames per second you desire. That's why knowing your exact FPS is important; it also happens to be a pretty good indicator of system performance.

zed
10-29-2002, 10:59 AM
you can also run a fixed time loop (this is what i do, i.e. all that stuff happens at 30fps, no this*dt + that*dt)
there was a TOTD/COTD from about a year ago on flipcode about it.

zeroprey
10-29-2002, 11:46 AM
Originally posted by jwatte:

QueryPerformanceCounter() (and any high-resolution timer dependent on the same data) has stability problems where it will sometimes take large jumps forward (or back) in time.


What's the alternative for Windows machines? I was using this and was unaware of the problem.

marcus256
10-29-2002, 01:03 PM
If you want a portable, high-resolution timer class, have a look at glfwGetTime in GLFW (http://hem.passagen.se/opengl/glfw). It tries to use the best timer available, in this order:

Windows:

1) RDTSC
2) QueryPerformanceCounter
3) GetTickCount

Unix/Linux:

1) RDTSC (x86) or CLOCK_SGI_CYCLE (SGI stations)
2) gettimeofday

There are still things to be resolved. For instance:

- Disable RDTSC on "laptops" (i.e. CPUs with variable core frequencies)
- Make CPU frequency determination more robust (for RDTSC, especially under Unix where we can't SetPriorityClass( ..., REALTIME_PRIORITY_CLASS ) )
- Add support for Sun gethrtime (better than gettimeofday)
- Handle 64-bit wrap arounds



[This message has been edited by marcus256 (edited 10-29-2002).]

marcus256
10-29-2002, 01:32 PM
Originally posted by LostInTheWoods:
Ok, from there I simply add this:

PlayerCtr = PlayerCtr + Move*TimeStepAverage;

So if the number of frames this second was 120, the move is scaled by 1/2, making the physics just as accurate as at 60, but allowing the frame rate to be independent at the same time.

Also, if the frame rate dropped to 30, the move would be multiplied by 2, making it still accurate and non-frame-based. Does this sound feasible?

No... The idea is correct, but the problem you will almost certainly encounter is that the true "FPS" will sometimes vary greatly within a second (or whatever averaging interval you choose), which will result in very strange, jerky movements.

You really need to change TimeStepAverage to TimeStepThisFrame, and then you actually have the "dt" method anyhow.

There are two ways to do this:

1) Constant frame rate (locked with VSync and/or some clever timing loop) - compensate physics etc. if you "lose" one or more frames (you're not fast enough to meet the frame deadline) by "ticking" twice or more the next frame

2) Variable frame rate - measure the time it took to render the last frame, and give it your best shot and guess that the next frame will take the same time to render

In my opinion, the latter method is the simpler one (e.g. it's not possible to guarantee VSync on, nor easy to set the monitor refresh rate to a suitable value based on the performance of the target system).

Note that even with method two, you may need to do several "ticks" for one frame, since if dt gets too large, physics and collision detection will become unstable.
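Such tick subdivision might look like this (the names and the stability limit passed in are mine, not from the post):

```cpp
// Advance physics by dt, but never feed the physics step more than
// maxDt at once; instead tick several times. Returns the tick count.
template <typename TickFn>
int StepPhysics(double dt, double maxDt, TickFn tick)
{
    int ticks = 0;
    while (dt > maxDt)
    {
        tick(maxDt);       // one full-size physics step
        dt -= maxDt;
        ++ticks;
    }
    if (dt > 0.0)
    {
        tick(dt);          // the remainder
        ++ticks;
    }
    return ticks;
}
```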

zeckensack
10-29-2002, 01:43 PM
Originally posted by marcus256:
<...>
- Handle 64-bit wrap arounds

They don't matter if all you want is delta t. Unsigned subtraction eliminates wraparound ... almost.

64 bits is about 100 years @ 5 GHz. If you manage to need a time delta larger than that, it'll wrap around to zero. But I don't really think it'll hurt your animation much ;)

(even one year @ 500GHz still seems good enough to me)
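The unsigned-subtraction trick can be shown with a deliberately narrow 32-bit counter (the values are mine): as long as the true delta fits in the type, modular arithmetic gives the right answer even across the wrap.

```cpp
#include <cstdint>

// later - earlier, computed modulo 2^32, is correct across wraparound.
uint32_t TickDelta(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```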

marcus256
10-31-2002, 02:15 AM
>> Handle 64-bit wrap arounds
> They don't matter if all you want is delta t. Unsigned subtraction eliminates wraparound ... almost.

Yes, I realized that too yesterday (I hadn't thought much about it - just assumed "64 bits is plenty"). I think I'm on the safe side, but I may have to look through my code just to check that I don't do anything foolish.

Rob Fletcher
10-31-2002, 04:27 AM
Would this work then?

Hang your display function off a timer which fires at e.g. 20ms to give you 50 FPS (or whatever to get the frame rate you need)

Set an idle function to compute your physics as required for the next frame. Set a flag to stop the idle loop re-calculating.

In the display func, draw the graphics, and then reset the physics flag to let the next idle loop calculate the next step.

btw: I did notice the () on clock(); I was just pointing out that not all solutions are Windows, and not all Unices are the same either! I was being a little "humorous" ... sorry if it wasn't taken that way!

Rob.

LostInTheWoods
10-31-2002, 05:13 AM
I think I will just stay with my current method. All this talk of performance timing is making my head hurt. There really should be an easier way, but I guess not. lol.

Current method: I have married the physics, collision detection, movement, graphics, EVERYTHING together (seems logical anyhow).

I go through my scene (render loop, physics loop, etc.), then get to the bottom of my loop and call a Timer function based on glutTimerFunc. It simply tells me whether it has been more than 16 ms since my last frame. If not, it waits till then to fire the next loop; if so, it fires the next loop immediately. This way I NEVER get out of sync with my physics; everything moves at an exact speed. The only problem is that if a PC is TOO fast, it will spend a lot of time waiting, but at 60 FPS I don't much care about that. As long as it doesn't drop below 60, because then the movement will get jerky.

I was also thinking about putting in a fallback mechanism. Basically, if I based all my movements on 60 FPS, I would do a timer check every 5 seconds or so (using the C library timer, because I only need second accuracy to count frames per second) to see what the average frame rate was. If the frame rate was TOO low, I would simply drop my frame target from 60 to 45, and so on. This way the picture may become a little choppy, but the physics and player movement wouldn't. Make sense?

pATChes11
10-31-2002, 12:51 PM
Only to the developer :P

I prefer using the previous frame's time, because it doesn't really matter at all if your frame rates don't fluctuate more than, oh, maybe four to eight percent of your FPS..? I dunno, that's just a guess.

zeckensack
10-31-2002, 02:21 PM
I don't think it matters much. By virtue of double or triple buffering and hardware pipes you'll be a bit behind anyway.

It is important that all time spent is measured and applied to physics. Which means that if your animation runs for ten minutes, you'd better let your physics code see ten minutes (with the notable exception of slow motion and other time base 'effects').

I don't think that delay tactics or sparse time sampling can stand this test. Accuracy, and exactly one sample per frame, that's what you need.

[This message has been edited by zeckensack (edited 10-31-2002).]

marcus256
11-01-2002, 11:32 AM
Originally posted by LostInTheWoods:
I think I will just stay with my current method. All this talk of performance timing is making my head hurt. There really should be an easier way, but I guess not. lol.

In my opinion, the easiest answer IS the "performance timer" solution (frame-to-frame timing):



do
{
    // Get time and delta time for this frame
    t = glfwGetTime();
    dt = t - t_old;
    t_old = t;

    // As suggested, do everything here
    UserInput();
    Physics( t, dt );
    Collision( t, dt );
    Draw();

    // We're done with this frame, swap buffers
    glfwSwapBuffers();
}
while( !some_criterion );

In your physics engine, for instance, use dt for calculating propagation, rather than some fixed value or average frame time (I also included 't' in the calls to the functions in the pseudo code, since some things may work better with absolute time than with propagation).

This will give you by far the most accurate simulation, and it's really simple. You don't have to worry about lost frames or dropping frame rates (you will probably get completely different FPS for different scenes; just dropping to the lowest of these would probably give you 10 FPS on many machines).

marcus256
11-01-2002, 11:51 AM
Originally posted by jwatte:
The RDTSC instruction is portable to all user-mode x86 operating systems, assuming you can find a portable assembly syntax (I like NASM for that reason). Of course, then you're faced with trying to figure out the CPU speed, which is, uh, "hard" on SpeedStep chips.

Hi jwatte,

Do you know of a solution for this SpeedStep problem? I was thinking of disabling RDTSC (and fall back to some lesser timer source) on CPUs with variable core frequency, and this is what I came up with:

1) Transmeta apparently has a constant clock for the TSC, even if the core clock is changing - Transmeta rocks!
2) It is simple to detect AMD PowerNOW! using the CPUID instruction => disable RDTSC on those chips
3) There is no way in h*ll to detect Intel SpeedStep (?!)

It may be possible to have a "chip detection list", flagging "no RDTSC" for all Intel chips which are labelled "Mobile" (basically check bits 7-0 of EBX for CPUID 1 - 06,07,0e,0f should be Mobile).

Any better ideas?

jwatte
11-01-2002, 06:28 PM
What we ended up doing was running RDTSC, QueryPerformanceCounter() and timeGetTime() in parallel, and voting.

When RDTSC was out-voted by the other two, we took that to mean that the CPU speed had changed and updated our measurement.

Given the specific nature of our real-time simulation, we ended up using RDTSC for intra-frame timing (because it's so cheap) and actually voting/updating/baselining once per frame.
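That voting step might be sketched like this (the function, its names, and the 10% tolerance are my guesses; the post gives no implementation details):

```cpp
#include <cmath>

// Compare the elapsed time for one interval as reported by RDTSC
// (converted with the currently assumed CPU speed) against the same
// interval from QueryPerformanceCounter and timeGetTime. If the two
// others agree with each other and RDTSC disagrees, RDTSC is out-voted
// and the CPU speed estimate should be re-baselined.
bool RdtscOutvoted(double rdtscDt, double qpcDt, double tgtDt,
                   double tolerance = 0.10)
{
    bool othersAgree = std::fabs(qpcDt - tgtDt) < tolerance * tgtDt;
    bool rdtscAgrees = std::fabs(rdtscDt - tgtDt) < tolerance * tgtDt;
    return othersAgree && !rdtscAgrees;
}
```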