Bug 81693

Summary: Timer interrupts appear longer than they should be
Product: [Retired] Red Hat Linux
Component: kernel
Version: 8.0
Hardware: i386
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Keywords: FutureFeature
Doc Type: Enhancement
Reporter: Leonard Ciavattone <lencia>
Assignee: Arjan van de Ven <arjanv>
QA Contact: Brian Brock <bbrock>
Last Closed: 2004-09-30 15:40:24 UTC
Attachments:
Test program to show time inaccuracy
Output from test program

Description Leonard Ciavattone 2003-01-12 22:35:56 UTC

Description of problem:
In my application I use setitimer(ITIMER_REAL, &tval, NULL) to set a 20 ms
timer. I then sit on select() and wake every 20 ms if no I/O has occurred. I
check the time (with gettimeofday()) and verify that the delta between
interrupts (i.e., wakeups) is 20 ms. With Red Hat 7.2 the interrupts happened
at 20 ms virtually all the time (99.9%), even without real-time scheduling.
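
For reference, here is a minimal sketch of the timing loop described above
(illustrative only; names and constants are assumptions, not the reporter's
actual test program):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Empty handler: its only job is to make select() return with EINTR. */
static void on_alarm(int sig) { (void)sig; }

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);          /* no SA_RESTART, so select() is interrupted */
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval tval;
    memset(&tval, 0, sizeof tval);
    tval.it_interval.tv_usec = 20000;   /* recurring 20 ms interval */
    tval.it_value.tv_usec    = 20000;   /* first expiry */
    setitimer(ITIMER_REAL, &tval, NULL);

    struct timeval prev, now;
    gettimeofday(&prev, NULL);
    for (int i = 0; i < 50; i++) {
        if (select(0, NULL, NULL, NULL, NULL) < 0 && errno == EINTR) {
            gettimeofday(&now, NULL);
            long delta_us = (now.tv_sec - prev.tv_sec) * 1000000L
                          + (now.tv_usec - prev.tv_usec);
            printf("wakeup delta: %ld us\n", delta_us);  /* ~20000 expected */
            prev = now;
        }
    }
    return 0;
}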

However, with 8.0, when I call gettimeofday() after the interrupt, the
interval is always 2 to 10 ms longer than it should be. I tried the non-SMP
kernel (with both CPUs installed) as well as removing the second CPU and
running the non-SMP kernel; both configurations still fail. I upgraded to the
latest kernel patches and it still fails. Finally, I went back to 7.2 (kernel
2.4.7-10) on the same hardware and it worked like a champ.

Is this a bug or did something in timer processing change?

Please help!


How reproducible:
Always

Steps to Reproduce:
See description    

Additional info:

Comment 1 Arjan van de Ven 2003-01-12 22:54:47 UTC
20 ms is an exact multiple of the kernel timer tick in 2.4.7-10, while it
isn't in 2.4.18-14.

Comment 2 Leonard Ciavattone 2003-01-13 16:17:05 UTC
Created attachment 89337 [details]
Test program to show time inaccuracy

Comment 3 Leonard Ciavattone 2003-01-13 16:18:18 UTC
Created attachment 89338 [details]
Output from test program

Comment 4 Leonard Ciavattone 2003-01-14 21:52:59 UTC
Additional testing has shown that it is not related to the kernel timer
granularity.

I've raised the priority because the more I look into this, the more convinced
I am that this is a big problem. It will break existing applications that
expect/require reasonably accurate timers. The problem does not occur with the
base Red Hat 7.3 install, but does occur once the most recent 7.3 kernel
patches are applied.

Comment 5 Arjan van de Ven 2003-01-14 22:37:54 UTC
I still don't see what is so super high priority here:
Timers in 2.4.7-10 are 10 ms accurate.
Timers in 2.4.18-X are 1.9 ms accurate.
Due to a luckily chosen number (a multiple of 10 ms), you were getting
apparently soft-realtime behavior that is better than 10 ms accurate.

Comment 6 Arjan van de Ven 2003-01-14 22:49:29 UTC
If you use a value of 101139 instead of 100000, for example, you'll get closer
results with the 1.9 ms timer.
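
To make the tick arithmetic in the last two comments concrete, here is a small
sketch (assuming, as an illustrative model, that the kernel rounds a requested
timeout up to the next whole tick):

#include <math.h>
#include <stdio.h>

/* Actual expiry when request_ms is rounded up to whole ticks of 1000/hz ms. */
static double expiry_ms(double request_ms, int hz)
{
    double tick_ms = 1000.0 / hz;
    return ceil(request_ms / tick_ms) * tick_ms;
}

int main(void)
{
    printf("20 ms at HZ=100 -> %.2f ms\n", expiry_ms(20.0, 100));  /* 20.00 */
    printf("20 ms at HZ=512 -> %.2f ms\n", expiry_ms(20.0, 512));  /* 21.48 */
    /* Comment 6's tip: a request closer to a whole-tick multiple leaves
     * less rounding error relative to what was asked for. */
    printf("100.000 ms at HZ=512 -> %.2f ms\n", expiry_ms(100.0, 512));   /* 101.56 */
    printf("101.139 ms at HZ=512 -> %.2f ms\n", expiry_ms(101.139, 512)); /* 101.56 */
    return 0;
}

(Compile with -lm.)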

Comment 7 Leonard Ciavattone 2003-01-15 21:24:33 UTC
Thank you for your attention on this. While I agree that a higher resolution
is nice, the fact that it is not a true multiple of the previous value (10 ms)
means that users (even with real-time scheduling) can no longer get exactly
the same timing intervals as before. If it were changed to 1, 2, or 5 ms,
applications wouldn't even notice.

If you consider real-time voice and video protocols (which is where my problem
occurred), you see that they expect packet transmissions at "typical" timing
intervals (10, 20, 100, ... ms). As an example, we're testing (in the AT&T ISP
network) G.729 and G.711 VoIP, and the protocol is expected to transmit a
packet every 20 ms (not 19.4 or 21.3). Although the difference may seem small,
it does mean that we can no longer use Linux as a high-accuracy test and
measurement tool for real-time applications. I think other companies that deal
with VoIP or video may start having similar problems.

Anyway, in light of this not being a "true" bug, could you please help me with
alternatives? Specifically...

Is it possible (via a kernel setting) to change the timing interval back to 
what it was (or a true multiple of 10)?

<and>

Who or what organization could I contact regarding my concerns to try to 
influence future kernel changes in this area?

Thanks again for your help.


Comment 8 Arjan van de Ven 2003-01-15 21:51:38 UTC
> Is it possible (via a kernel setting) to change the timing interval back to 
> what it was (or a true multiple of 10)?

It's a kernel config option; however, you don't need to recompile. The i586
kernel (as opposed to the i686 one) still has the old value.

Comment 9 Michael K. Johnson 2003-01-15 23:28:51 UTC
It's not feasible to make the clock run-time-settable; we looked into
that because we would have loved it to be feasible.

Since networks will perturb timing a little bit anyway, it seems to me
that run-time adjusting your loop waits based on gettimeofday() is both
sufficient and the only real way to keep your typical timing intervals
in appropriate sync.
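
A sketch of the compensation Michael describes (an assumed approach, not taken
from the bug's attachments): keep an absolute schedule of deadlines and re-arm
a one-shot timer against it each cycle, so per-wakeup tick error does not
accumulate:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

#define PERIOD_US 20000L                /* 20 ms target period */

static void on_alarm(int sig) { (void)sig; }

/* Microseconds elapsed since *t0. */
static long usec_since(const struct timeval *t0)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    return (now.tv_sec - t0->tv_sec) * 1000000L + (now.tv_usec - t0->tv_usec);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;           /* interrupts select() with EINTR */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    struct timeval start;
    gettimeofday(&start, NULL);
    long deadline = PERIOD_US;          /* next target, relative to start */

    for (int i = 0; i < 50; i++) {
        long remaining = deadline - usec_since(&start);
        if (remaining > 0) {
            struct itimerval one_shot;
            memset(&one_shot, 0, sizeof one_shot);   /* no it_interval: one-shot */
            one_shot.it_value.tv_sec  = remaining / 1000000L;
            one_shot.it_value.tv_usec = remaining % 1000000L;
            setitimer(ITIMER_REAL, &one_shot, NULL);
            select(0, NULL, NULL, NULL, NULL);       /* blocks until SIGALRM */
        }
        printf("cycle %d fired at %ld us (target %ld us)\n",
               i, usec_since(&start), deadline);
        deadline += PERIOD_US;          /* absolute schedule: lateness doesn't drift */
    }
    return 0;
}

Each wakeup can still land up to one tick (~1.95 ms) late, but because the
next deadline is computed from the absolute schedule rather than from the
previous wakeup, the error no longer accumulates across intervals.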

Comment 10 Leonard Ciavattone 2003-01-16 22:39:49 UTC
I've discovered (I think) the specific setting in question: CONFIG_HZ is set
to 512, which yields the 1.9 ms (1/512 s is about 1.95 ms). Also, I've come
across a couple of things regarding this specific change (see
http://kerneltrap.org/node.php?id=464 and
http://lists.insecure.org/lists/linux-kernel/2002/Oct/6355.html).

As far as modifying the app: if I truly need 20 ms timers, I'd have to take an
interrupt at 19.4 ms and poll/spin (via select) until gettimeofday() shows
that the remaining 0.6 ms has expired - not a very efficient mechanism (a
sketch follows below). With the previous HZ value of 100, I would get 20 ms
99.9% of the time and simply throw out the handful of measurements that did
not meet that requirement.
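
A sketch of that poll/spin workaround (illustrative; as noted, it burns CPU
for the remainder of each interval):

#include <sys/select.h>
#include <sys/time.h>

/* Busy-wait until deadline_us microseconds have elapsed since *start. */
static void spin_until(const struct timeval *start, long deadline_us)
{
    for (;;) {
        struct timeval now, zero = { 0, 0 };
        gettimeofday(&now, NULL);
        long elapsed = (now.tv_sec - start->tv_sec) * 1000000L
                     + (now.tv_usec - start->tv_usec);
        if (elapsed >= deadline_us)
            break;
        select(0, NULL, NULL, NULL, &zero);   /* zero-timeout poll, returns at once */
    }
}

The idea: let the ~19.4 ms tick wakeup arrive early, then call spin_until()
to close the remaining ~0.6 ms.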

I guess the big question is why not use a true multiple of the previous value
(say 500 or 1000). I think the 2.5 kernel is using 1000, and 500 is pretty
close to your current i686 value. Either one would solve this issue and still
give the desired effect. I would very much like to make a formal enhancement
request for just this in a future release - what should I do?

I am interested because we (the AT&T ISP test lab in NJ) have 78 measurement
servers running Red Hat, and if this will change in the future I'll simply
stay on 7.2 for now. However, if CONFIG_HZ is going to stay at 512, I'll have
to change to a different distribution. Unfortunately, I won't have a choice.

Thanks again for your assistance.

Comment 11 Arjan van de Ven 2003-01-16 23:52:11 UTC
1000 and 500 are not really feasible. The upstream 2.5 kernel has this, yes,
but they don't have all the corner cases solved ;(
512 (as a power of two) is possible because divides become shifts (you can't
really divide a 64-bit number in the Linux kernel, but you can shift).
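
A tiny illustration of the divide-versus-shift point (assumed example, not
kernel source):

/* With a power-of-two HZ, converting a 64-bit tick count needs no 64-bit
 * division, which the kernel cannot do directly -- only a right shift. */
static unsigned long long ticks_to_seconds(unsigned long long ticks)
{
    return ticks >> 9;   /* divide by 512 == shift right by log2(512) = 9 */
}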

As I wrote before, the i586 kernel DOES have HZ=100, so even in the current
releases there is always an HZ=100 kernel.

Comment 12 Leonard Ciavattone 2003-01-20 20:51:03 UTC
Given the latest information, I've reclassified this as normal/enhancement
because I do think it would be a good idea for the default HZ to be a true
multiple of the previous value (even though the implementation details may
need to be worked out). Although the i586 kernel still uses the old value
(100), it requires extra installation steps (an undesirable requirement in
large environments).

I've been trying to compensate in my application to create even timing
intervals (20.0, 40.0, 60.0, ...), and I must say all the solutions so far are
pretty ugly. Since I think others will eventually run into the same issue, I
believe it is worth the effort to change this in the future.

Thanks again for your patience and assistance.

Comment 13 Alan Cox 2003-06-05 14:09:26 UTC
One other thing to note here is that in most cases gettimeofday() is far more
accurate than the timer interrupts.

You can also generate a wide range of interrupt timings off the RTC chip. This
is normally done by fancy tools like profilers, but nothing stops you from
using it yourself; a sketch follows below.
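
A sketch of that RTC approach (assumed usage of the Linux /dev/rtc driver;
note the RTC only supports power-of-two rates from 2 to 8192 Hz, so an exact
20 ms / 50 Hz period is still unavailable -- 64 Hz, i.e. 15.625 ms, is the
closest):

#include <fcntl.h>
#include <linux/rtc.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/rtc", O_RDONLY);
    if (fd < 0) { perror("open /dev/rtc"); return 1; }

    ioctl(fd, RTC_IRQP_SET, 64);    /* 64 Hz periodic rate (power of two) */
    ioctl(fd, RTC_PIE_ON, 0);       /* enable periodic interrupts */

    for (int i = 0; i < 10; i++) {
        unsigned long data;
        read(fd, &data, sizeof data);   /* blocks until the next RTC interrupt */
        printf("RTC tick %d\n", i);
    }

    ioctl(fd, RTC_PIE_OFF, 0);
    close(fd);
    return 0;
}

(Rates above 64 Hz typically require root or raising
/proc/sys/dev/rtc/max-user-freq.)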


Comment 14 Bugzilla owner 2004-09-30 15:40:24 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/