Linux fine, windows **** ?



jide
01-07-2002, 03:50 AM
My program works fine under Linux (around 40 frames per second) and it's fluid.
But under Windows, even though I get a better framerate, the rendering hangs many times a second,
so we can see more than one car when accelerating or braking.

Studying this, I think it's a timing problem:
the Linux timer resolution is 1/1000000 second, whereas the Windows timer resolution is 1/1000 second.
Could that cause incorrect calculations, so that the scene doesn't look right?
Another possibility is the keyboard handling.
What's the problem?

GPSnoopy
01-07-2002, 06:37 AM
I bet you're using GetTickCount() in your timer. Don't use it, it's not precise enough.

Use QueryPerformanceCounter() and QueryPerformanceFrequency() instead.

They are a lot more precise. The only problem is that they work with 64-bit integers, and if you don't use them correctly you could lose the precision you gained.
Try to stay in int64 as long as possible for the math before putting the results into a double.
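
As a minimal sketch of the advice above, the following C keeps the counter arithmetic in 64-bit integers and only converts the final, small delta to a double. The names ticks_to_us and ticks_to_seconds are hypothetical. On Windows, start and end would presumably come from the QuadPart of two QueryPerformanceCounter() readings and freq from QueryPerformanceFrequency(), but the code itself is plain C and assumes nothing about the platform.

```c
#include <assert.h>
#include <stdint.h>

/* Convert the span between two raw counter readings to microseconds.
   The subtraction and scaling stay in 64-bit integers. Splitting into
   whole seconds plus remainder keeps rem * 1000000 from overflowing
   for any realistic counter frequency. */
int64_t ticks_to_us(int64_t start, int64_t end, int64_t freq)
{
    int64_t delta, whole, rem;

    assert(freq > 0);
    delta = end - start;
    whole = delta / freq;   /* full seconds */
    rem   = delta % freq;   /* leftover ticks, always < freq */
    return whole * 1000000 + (rem * 1000000) / freq;
}

/* Only the final, small delta is turned into a double. */
double ticks_to_seconds(int64_t start, int64_t end, int64_t freq)
{
    return (double)ticks_to_us(start, end, freq) / 1e6;
}
```

For example, ticks_to_us(0, 3579545, 3579545) yields 1000000, i.e. one second on a 3.579545 MHz counter. Converting the raw readings to float first would discard the low bits that carry the difference.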

tfpsly
01-07-2002, 12:55 PM
Do not forget that Linux uses an interpolated timer that is precise to 1 ms, whereas Windows uses the hardware clock.

dorbie
01-07-2002, 06:17 PM
Once again I see an excuse to post my old win32 timing code.




// option to low-pass the high res timer
//#define LOW_PASS_HIGH_RES

// Compute elapsed time in ms since last call (call once per frame)
float DeltaTime()
{
    static int first = 1;
    static int count;
    static BOOL HighRes;
    static DWORD this_time, old_time, oelapsed, elapsed[20];
    static float this_timef, old_timef, oelapsedf, elapsedf[3], resolutionf;
    LARGE_INTEGER pcount;

    if(first)
    {
        first = 0;
        // test for high res timer and get resolution in counts per millisecond
        HighRes = QueryPerformanceFrequency(&pcount);
        if(HighRes)
        {
            resolutionf = pcount.LowPart/1000.0f;

            // init low pass array to 60 Hz assumption
            for(count = 0; count < 3; count++)
            {
                oelapsedf = elapsedf[count] = 16.667f;
            }

            QueryPerformanceCounter(&pcount);
            old_timef = (float) pcount.LowPart;
            old_timef /= resolutionf;
        }
        else
        {
            // init low pass array to 60 Hz assumption
            for(count = 0; count < 20; count++)
                oelapsed = elapsed[count] = 16;
            // init time to current value
            old_time = GetTickCount();
        }

        count = 0;
        return 16.667f;
    }
    else
    {
        if(HighRes)
        {
            // Use High Res Timer to compute elapsed time in ms
            // low pass to eliminate any jitter is optional
            QueryPerformanceCounter(&pcount);
            this_timef = (float) pcount.LowPart;
            this_timef /= resolutionf;
            // stick with old elapsed if counter wraparound detected
            if(!(this_timef < old_timef))
            {
                oelapsedf = elapsedf[count] = this_timef - old_timef;
            }
            else
            {
                elapsedf[count] = oelapsedf;
            }

            old_timef = this_timef;
#ifdef LOW_PASS_HIGH_RES // option to low-pass the high res timer
            count ++;
            if(count == 3)
                count = 0;

            return (elapsedf[0] + elapsedf[1] + elapsedf[2]) *.3333333f;
#else
            return (elapsedf[0]);
#endif
        }
        else
        {
            // Use Low res timer to compute elapsed, returns ms
            // Must low pass over several frames since timer
            // resolution may be much coarser than 1 ms
            this_time = GetTickCount();

            // stick with old elapsed if counter wraparound detected
            if(!(this_time < old_time))
            {
                oelapsed = elapsed[count] = this_time - old_time;
            }
            else
            {
                elapsed[count] = oelapsed;
            }
            count ++;
            if(count == 20)
                count = 0;
            old_time = this_time;
            return (elapsed[0] + elapsed[1] + elapsed[2] + elapsed[3] + elapsed[4] +
                    elapsed[5] + elapsed[6] + elapsed[7] + elapsed[8] + elapsed[9] +
                    elapsed[10] + elapsed[11] + elapsed[12] + elapsed[13] + elapsed[14] +
                    elapsed[15] + elapsed[16] + elapsed[17] + elapsed[18] + elapsed[19]) *.05f;
        }
    }
}

jide
01-07-2002, 10:54 PM
Sorry Dorbie, I don't understand anything of what you're doing.

I don't use clock per sec (around 19 ticks per sec). So I use ftime() under Windows and gettimeofday() under Linux, both with double variables.

The first is precise to 1/1000 sec (Windows);
the Linux one is precise to 1/1000000 sec.
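
For reference, a rough sketch of the Linux side of this approach. It assumes a POSIX system with gettimeofday(); the helper names read_now and timeval_diff_seconds are made up for illustration. The difference is taken in integers first, so the double only ever holds a small elapsed value rather than a large absolute timestamp.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Read the current wall-clock time (POSIX). */
void read_now(struct timeval *out)
{
    int rc = gettimeofday(out, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Elapsed seconds between two readings. Seconds and microseconds are
   subtracted as integers; only the resulting delta becomes a double. */
double timeval_diff_seconds(const struct timeval *start,
                            const struct timeval *end)
{
    long long sec  = (long long)end->tv_sec  - (long long)start->tv_sec;
    long long usec = (long long)end->tv_usec - (long long)start->tv_usec;

    return (double)(sec * 1000000LL + usec) / 1e6;
}
```

Typical use would be one read_now() per frame, passing the previous and current readings to timeval_diff_seconds(). A double can still hold an absolute timestamp in seconds to better than a microsecond, but a float cannot, so absolute times are best kept out of float variables.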

Is that a good solution, or are your
QueryPerformanceCounter() and QueryPerformanceFrequency() better?

And maybe it's not a timer problem at all!

JD

jide
01-08-2002, 01:56 AM
... and after switching from double to float for all variables (time, vertices...), everything is faster, but the hangs remain under Windows!!

Hmmm... I use display lists with small or huge models, and it's the same thing (only the framerate differs).

Would multi-threading get rid of this problem?
Could it be the keyboard function (under GLUT), or its implementation?
I remember that the keyboard callback used to be implemented in the main file. Now I have moved the implementation into a class method. I remember that, under Linux, this slowed down the rotation speed.

I don't see any other possibility.

Please help, it's very constraining to have this under Windows.

Thanks a lot

JD

OldMan
01-08-2002, 03:05 AM
I'm not sure, but GLUT has a limit on how frequently it calls your functions (that would explain the hangs). Did you try the GLUT game mode? If I remember well, GLUT doesn't call the display function more than 30 times per second... (I really don't remember the number, but I'm sure I read something about this.)

jide
01-08-2002, 04:38 AM
OldMan, I already have a framerate above 150 frames/sec, so I don't think the GLUT display callback is limited enough to cause this under Windows.

It's very strange, because I remember it worked properly at the beginning (around 1 year ago now).

No, I haven't tried glutGameMode(); I don't know how to use it either. Help would be appreciated here.

cordially,

JD

GPSnoopy
01-08-2002, 06:54 AM
jide, under Windows, use the performance counter.

Both functions are explained at the bottom of this page: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/time_4po3.asp

IMO dorbie's code is way too complex for such a simple matter. ;)
Also, including another code path in case the performance counter doesn't work, as in dorbie's code, is a bit useless IMO. All modern systems (since the Pentium) have a performance counter.

PS: LARGE_INTEGER wraps an __int64 under VC++ (use its QuadPart member).

[This message has been edited by GPSnoopy (edited 01-08-2002).]

jwatte
01-08-2002, 08:07 AM
QueryPerformanceCounter() is available on all PCI and better systems. However, it has a tendency to skip forward about 4 seconds every so often (Microsoft blames the chip sets, but all chip sets cause the same bug...)

Every CPU from the Pentium on has the RDTSC instruction, which returns a 64-bit integer which increments once per CPU cycle. Thus, if you know how fast your CPU is, you can get nanosecond resolution in timing (modulo instruction pipelining/scheduling). And calling RDTSC is much, much less overhead than calling timeGetTime() or QueryPerformanceCounter().

GPSnoopy
01-08-2002, 09:05 AM
Isn't QueryPerformanceCounter actually using RDTSC?

PS: the integer returned by RDTSC increments at a constant rate, but not necessarily on every CPU cycle. (Current CPUs increment it every cycle, but that might change, especially with increasing clock speeds.)

Elixer
01-08-2002, 10:25 AM
Originally posted by jwatte:
QueryPerformanceCounter() is available on all PCI and better systems. However, it has a tendency to skip forward about 4 seconds every so often (Microsoft blames the chip sets, but all chip sets cause the same bug...)

Every CPU from the Pentium on has the RDTSC instruction, which returns a 64-bit integer which increments once per CPU cycle. Thus, if you know how fast your CPU is, you can get nanosecond resolution in timing (modulo instruction pipelining/scheduling). And calling RDTSC is much, much less overhead than calling timeGetTime() or QueryPerformanceCounter().

jwatte, where did you find that it skips ~4 secs every so often? That would explain a situation I had last year; I thought it was because I had an error someplace... I would think they would mention this in the MSDN docs?

jwatte
01-08-2002, 11:41 AM
Elixer,

The bug is documented in the knowledge base on the web MSDN site.

GPSnoopy,

No, QPC does not use RDTSC (I was also under that mis-impression for a long time).

The ia32 architecture definition (instruction reference) explicitly says that the processor increments the time stamp counter every clock cycle, and resets it to 0 whenever the processor is reset. I take this as an iron-clad guarantee that there is a 1:1 relation between clock cycles and RDTSC ticks. Of course, how many instructions/u-ops can actually get executed in a clock cycle may change between CPUs.

The trick is figuring out what your CPU speed actually is; especially when you're on SpeedStep and it might change on the fly. I use one of the other timers as a reference now and then to measure CPU speed, and re-sync my CPU speed estimate. Works well.
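
A sketch of how such a re-sync might look, with the actual timer reads left to the caller. Everything here is illustrative: the TscClock struct and both function names are hypothetical, tsc_now stands for an RDTSC reading, and ref_us_now for a reading of whatever reference timer is used, expressed in microseconds.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t tsc;     /* cycle counter value at the last sync point */
    uint64_t ref_us;  /* reference timer value at the last sync, in us */
    double   hz;      /* current estimate of cycles per second */
} TscClock;

/* Re-estimate the CPU speed from the cycles and reference time that
   have elapsed since the last sync, then move the sync point forward.
   Does nothing if the reference timer has not advanced. */
void tsc_resync(TscClock *c, uint64_t tsc_now, uint64_t ref_us_now)
{
    uint64_t dref = ref_us_now - c->ref_us;

    if (dref > 0) {
        c->hz = (double)(tsc_now - c->tsc) * 1e6 / (double)dref;
        c->tsc = tsc_now;
        c->ref_us = ref_us_now;
    }
}

/* Seconds elapsed since the last sync, using the current estimate. */
double tsc_seconds_since_sync(const TscClock *c, uint64_t tsc_now)
{
    assert(c->hz > 0.0);
    return (double)(tsc_now - c->tsc) / c->hz;
}
```

The longer the interval between syncs, the less the coarse granularity of the reference timer matters, at the cost of reacting more slowly when the CPU speed changes.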

dorbie
01-08-2002, 02:56 PM
It tries to compute the delta time in milliseconds between the last frame and this frame. It doesn't round to 1000th of a second; it gives you a fractional result in milliseconds. Feel free to alter the scale of the return value.

It doesn't need a double, my computer isn't that fast and neither is yours. A deltatime float in milliseconds is enough for anyone with a PC.

There are two methods used: one uses a high performance counter, the other doesn't. It will only fall back on the slower counter if it can't find the better one.

There is also the option to average the result over several frames to avoid jitter. This also helps with a slower counter, because the result is a reasonably accurate average time based on several frames, so it won't get rounded as much by the slower timer option.

Beyond this you don't really need to understand it; if you throw it in your code, it will do what you need.

jide
01-08-2002, 11:25 PM
Dorbie, I would like to understand before putting anything in my code.
In any case, my current time function (it's a method) is smaller than yours. However, if it's better to use the performance counter, I will use it, especially if I get nanosecond precision as under Linux.
The problem is that it will change almost my whole clock class: I use double (or float) values, and now I have to use __int64 alongside them.

In any case, I have tried many times to measure how long my system takes to count to
1000000000 in an empty loop (for(i=0; i<1000000000; i++) for example).
On my old system (AMD K6-2 300MHz) the rate was about 275MHz, and now, on an Athlon XP 1600+, it's about 690MHz. I was astonished! Could anyone explain it?
I hope your code will help (under Windows), but now I must find something better under Linux (it seems).

thank you all

JD

dorbie
01-08-2002, 11:44 PM
I admire and agree with your position, but would have expected you to examine the code. I was a bit surprised it attracted so many comments for something so simple.

I'm not sure I understand the rest of your post, but here goes. I think it's somewhat naive to use a loop like the one you have to measure performance; different compilers could optimize this differently, including unrolling it. It might even be possible to optimize it to the equivalent of:

i=1000000000;

In addition, the ability to pipeline these instructions and the dependency of the loop on the previous iteration's result would affect the performance severely. You also have a branch which will easily be predicted, but it'll still block on the increment.

Basically this is a VERY bad way to try and measure performance. Clock is not the whole picture and the ability to promote instructions is heavily dependent on the suitability of the code for pipelining and the availability of instructions and data being used, and the suitability of the instructions to be run on the multiple instruction units on the processor.
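
To make the optimization point concrete, here is a small sketch with two hypothetical functions. Both return the same value, but an optimizing compiler is free to collapse the first into a single assignment, while the volatile qualifier in the second forces every increment to be performed as a real load and store. Neither one measures clock speed.

```c
#include <assert.h>

/* Plain counter: with optimization on, the compiler may replace the
   whole loop by the equivalent of "i = n". */
unsigned long count_plain(unsigned long n)
{
    unsigned long i;

    assert(n > 0);   /* a zero-length run would measure nothing */
    for (i = 0; i < n; i++) {
        /* empty body */
    }
    return i;
}

/* Volatile counter: every read and write of i must actually happen,
   so the loop survives optimization, but each iteration now goes
   through memory and still depends on the previous one. */
unsigned long count_volatile(unsigned long n)
{
    volatile unsigned long i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        /* empty body */
    }
    return i;
}
```

Timing the two versions at different optimization levels would be expected to give very different "frequencies" on the same machine, which is why the loop count divided by elapsed time says little about the CPU clock.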

jide
01-09-2002, 01:17 AM
Yes, but that's a simple instruction to execute!

OK for your code. It seems to need the CPU frequency to work properly. But in general we don't have the exact CPU speed, so maybe the time would lose accuracy (no?).

// init low pass array to 60 Hz assumption
for(count = 0; count < 3; count++)
{
    oelapsedf = elapsedf[count] = 16.667f;
}

This (16.667f) is not correct: using 16.6666666666666... would be better, wouldn't it?

Do you get real time here?
Anyway, I will test it under Windows. Under Linux, however, I don't have anything like that, and must stay with a basic call to gettimeofday().

You may find your code simple, but not everyone could have done that so easily (I didn't know how). MSDN doesn't show me this approach when I search for time, chrono, clock. So I had to use ftime().

Thanks a lot, I will tell you (I hope soon) how Windows copes with it.

Ah, sorry, I forgot.
Yesterday, I tried to run my demo on a friend's computer (under Windows of course).
He has a Celeron 433. The demo didn't seem to hang as much as on my computer.
Maybe I don't have the correct drivers? (That's another possible cause of my problem.)

JD

dorbie
01-09-2002, 02:15 AM
Jide,

It isn't simple to execute when the processor is designed to simultaneously work on several instructions but must wait on the result of this instruction before proceeding to the next. You can assume it should be simple and continue to be shocked and surprised at the result or you can accept the explanation you asked for.

As for the 16.667, it is a gross assumption for the first time through the loop. What you are complaining about is an error of about 3 ten-millionths of a second in a piece of code which is GUESSING what the frame rate is likely to be. At this stage the frame rate could be anything; it's just a filler which is better than zero. As for the rounding, it's just as likely that the 60Hz video clock has more of an error in it than that number, which is in milliseconds. It is also more likely that the graphics is running at 30Hz or 100Hz and the guess is out by large amounts for the first two frames.
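
The size of that rounding error is easy to check. The helper below is purely illustrative (guess_error_seconds is not part of the posted timing code): it returns how far a frame-time guess in milliseconds is from the exact period of a display refreshing at a given rate.

```c
#include <assert.h>

/* Error, in seconds, of a frame-time guess given in milliseconds,
   relative to the exact frame period at refresh rate hz. */
double guess_error_seconds(double guess_ms, double hz)
{
    assert(hz > 0.0);
    return (guess_ms - 1000.0 / hz) / 1000.0;
}
```

At 60 Hz the guess of 16.667 ms is off by roughly a third of a microsecond. If the display actually runs at 100 Hz, the same guess is off by about 6.7 ms, some twenty thousand times more, so the extra digits would not buy anything.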

jide
01-09-2002, 03:45 AM
OK,

I think I accept your explanations.

If I understood your second part well, 16.667 is just a starting value for the next calls, to be adjusted toward higher or lower rates, I think.
Now, in your code, I didn't understand the low-res part (maybe it computes an average time value?).

So your code gives the elapsed time between two calls in milliseconds and is much better than ftime(). That's all right!

JD

dorbie
01-09-2002, 04:17 AM
The averaging is there to smooth out any noise or jitter over several frames. In the case of the high res timer it's optional.

In graphics, a timer like this typically measures the time taken for the last frame and uses this to animate the next frame. That can have undesirable effects, especially with load balancing, so averaging can help a bit.

With the low res timer you don't have enough resolution for the kind of measurements I was making so averaging helps you extract a reasonable high resolution result from a low resolution timer if you're in a loop.
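
A stripped-down sketch of that averaging idea, separate from the timing code posted earlier. The FrameSmoother type, its function names, and the window of 4 frames are choices made for this illustration only.

```c
#include <assert.h>

#define SMOOTH_FRAMES 4

typedef struct {
    float samples[SMOOTH_FRAMES];  /* most recent frame times, in ms */
    int   next;                    /* slot the next sample overwrites */
} FrameSmoother;

/* Fill the window with an initial guess so early averages are sane. */
void smoother_init(FrameSmoother *s, float guess_ms)
{
    int i;

    for (i = 0; i < SMOOTH_FRAMES; i++)
        s->samples[i] = guess_ms;
    s->next = 0;
}

/* Record one measured frame time and return the windowed average. */
float smoother_push(FrameSmoother *s, float elapsed_ms)
{
    float sum = 0.0f;
    int i;

    assert(elapsed_ms >= 0.0f);
    s->samples[s->next] = elapsed_ms;
    s->next = (s->next + 1) % SMOOTH_FRAMES;
    for (i = 0; i < SMOOTH_FRAMES; i++)
        sum += s->samples[i];
    return sum / SMOOTH_FRAMES;
}
```

If a timer that only ticks every 10 ms reports 20, 20, 10, 10 for four frames in a row, the average comes out at 15 ms, a value the timer could never report for a single frame.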

You can ignore the low res stuff, I think all PCs have the higher resolution timer now.

You also don't want to average the high resolution timer because of the nature of your measurement. It was a reasonable option for me because I was in a rendering loop with consistent frame times, you are not.

You may even want to just look at how I use the QueryPerformanceCounter call and take your timings directly from that call.

jwatte
01-09-2002, 12:29 PM
I believe the problem (under windows) is this:

1) timeGetTime() is easy to call, but returns only milliseconds (which at 50 fps gives an error of up to 5%!) and takes a long time to execute.

2) QueryPerformanceCounter() returns microsecond-or-better resolution, but is slightly harder to call (because you need to divide by the resolution) and will occasionally step forward in time by > 4 seconds. It is faster than timeGetTime(), but still takes a good two microseconds on a P-III 1 GHz.

3) RDTSC is very fast (I measured 47 nanoseconds), and cycle accurate. However, it is hardest to call (you have to define a "naked" assembly function) and you have to find out the effective CPU speed from somewhere (system registry, or measuring it).

What I end up doing is using RDTSC, but using timeGetTime() to calibrate the RDTSC every so often. If I do this calibration every 100 frames, that 5% error has shrunk to 0.05%, which I can live with :-) The draw-back is that it takes some time for the calibration to warm up, and I have to take a wild guess at CPU speed before then based on spinning for 10-20 milliseconds and counting cycles (which gives a good 10% error up front, worst case).
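
The percentages above follow from dividing the timer granularity by the length of the measured span. A tiny illustrative helper (the name quantization_error is made up here):

```c
#include <assert.h>

/* Worst-case relative error when a timer that only advances in steps
   of tick_ms is used to measure a span of span_ms. */
double quantization_error(double tick_ms, double span_ms)
{
    assert(span_ms > 0.0);
    return tick_ms / span_ms;
}
```

One frame at 50 fps lasts 20 ms, so a 1 ms tick gives 1/20 = 5%. Over 100 such frames the span is 2000 ms, and the same 1 ms tick gives 1/2000 = 0.05%.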

GPSnoopy
01-09-2002, 04:08 PM
I've never had that 4 second step forward. However, I've read about it on MSDN.

I only call QueryPerformanceCounter() once per frame, so speed isn't really important there. (0.000002 sec?)

I use RDTSC along with QueryPerformanceCounter(). With two RDTSC reads, a sleep() of 100ms and two QueryPerformanceCounter() calls to get the elapsed time, I get a CPU frequency accuracy of about 0.05%. (With 1000ms of sleep, it's 0.005%.)
However, I don't have that 4 sec step forward bug, so...

jide
01-10-2002, 11:17 PM
hello,

Yesterday I tried Windows 2000 on my computer, so I changed my drivers. And with that, my program didn't hang anymore, but the framerate was lower than before.

Was it a driver problem?