Bug 81693
| Field | Value |
|---|---|
| Summary | Timer interrupts appear longer than they should be |
| Product | [Retired] Red Hat Linux |
| Component | kernel |
| Version | 8.0 |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED WONTFIX |
| Severity | medium |
| Priority | medium |
| Keywords | FutureFeature |
| Doc Type | Enhancement |
| Reporter | Leonard Ciavattone <lencia> |
| Assignee | Arjan van de Ven <arjanv> |
| QA Contact | Brian Brock <bbrock> |
| Last Closed | 2004-09-30 15:40:24 UTC |
Description
Leonard Ciavattone, 2003-01-12 22:35:56 UTC
20 ms is an exact multiple of the kernel timer tick in 2.4.7-10, while it isn't in 2.4.18-14.

Created attachment 89337 [details]: Test program to show time inaccuracy
Created attachment 89338 [details]: Output from test program
Additional testing has shown that it is not related to the kernel timer granularity. I've raised the priority because the more I looked into this, the more I became convinced that this is a big problem. It will break existing applications that expect or require reasonably accurate timers. The problem does not occur with the base Red Hat 7.3 install, but does occur when the most recent 7.3 kernel patches are applied.

I still don't see what is so super high priority here: timers in 2.4.7-10 are 10 msec accurate; timers in 2.4.18-X are 1.9 msec accurate. Due to a luckily chosen number (a multiple of 10 msec) you get apparently soft real-time behavior that is better than 10 msec accurate. If you use a value of 101139 instead of 100000, you'll get closer results with the 1.9 ms timer, for example.

Thank you for your attention on this. While I agree that a higher resolution is nice, the fact that it is not a true multiple of the previous value (10) means that users (even with real-time scheduling) can no longer get exactly the same timing intervals as before. If it were changed to 1, 2, or 5, applications wouldn't even notice. If you consider real-time voice and video protocols (which is where my problem occurred), you see that they expect packet transmissions at "typical" timing intervals (10, 20, 100, ...). As an example, we're testing (in the AT&T ISP network) G.729 and G.711 VoIP, and the protocol is expected to transmit a packet every 20 ms (not 19.4 or 21.3). Although the difference may seem small, it does mean that we can no longer use Linux as a high-accuracy test and measurement tool for real-time applications. I think other companies that deal with VoIP or video may start having similar problems.

Anyway, in light of this not being a "true" bug, could you please help me with alternatives? Specifically:

Is it possible (via a kernel setting) to change the timing interval back to what it was (or a true multiple of 10)?
Who or what organization could I contact regarding my concerns to try to influence future kernel changes in this area? Thanks again for your help.

> Is it possible (via a kernel setting) to change the timing interval back to
> what it was (or a true multiple of 10)?
It's a kernel config option; however, you don't need to recompile. The i586
kernel (as opposed to the i686 one) still has the old value.
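The disagreement above comes down to tick quantization: a requested timeout can only expire on a tick boundary of 1/HZ seconds. A rough illustrative model (an assumption for illustration — the real kernel rounding also adds an extra jiffy on some paths) shows why 20 ms is exact at HZ=100 but not at HZ=512:

```python
import math

def quantized_timeout_us(timeout_us, hz):
    """Model a kernel that rounds a requested sleep up to whole
    timer ticks of 1/hz seconds (ignoring the extra jiffy some
    kernel paths add)."""
    tick_us = 1_000_000 / hz              # one timer tick, in microseconds
    ticks = math.ceil(timeout_us / tick_us)
    return ticks * tick_us

# A 20 ms request is an exact multiple of the 10 ms tick at HZ=100 ...
print(quantized_timeout_us(20_000, 100))  # 20000.0
# ... but falls between ticks of ~1.953 ms at HZ=512, so it rounds up:
print(quantized_timeout_us(20_000, 512))  # 21484.375
```

This matches the behavior described in the thread: at HZ=100 the 20 ms interval lands on a tick boundary, while at HZ=512 the nearest achievable intervals are about 19.5 ms or 21.5 ms.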
It's not feasible to make the clock run-time-settable; we looked into that, because we would have loved it to be feasible. Since networks will perturb timing a little bit anyway, it seems to me that run-time adjusting your loop waits based on gettimeofday() is both sufficient and the only real way to keep your typical timing intervals in appropriate sync.

I've discovered (I think) the specific setting in question: CONFIG_HZ is set to 512, which yields the 1.9 ms tick (1/512 s). Also, I've come across a couple of things regarding this specific change (see http://kerneltrap.org/node.php?id=464 and http://lists.insecure.org/lists/linux-kernel/2002/Oct/6355.html). As far as modifying the app goes: if I truly need 20 ms timers, I'd have to take an interrupt at 19.4 ms and poll/spin (via select) until gettimeofday() shows that the remaining 0.6 ms has expired, which is not a very efficient mechanism. With the previous HZ value of 100, I would get 20 ms 99.9% of the time and simply throw out the handful of measurements that did not meet that requirement. I guess the big question is why not use a true multiple of the previous value (say 500 or 1000). I think the 2.5 kernel is using 1000, and 500 is pretty close to your current i686 value. Either one would solve this issue and still give the desired effect. I would very much like to make a formal enhancement request to do just this in a future release. What should I do? I am interested because we (the AT&T ISP test lab in NJ) have 78 measurement servers running Red Hat, and if this will change in the future I'll simply stay on 7.2 for now. However, if CONFIG_HZ is going to stay 512, I'll have to change to a different distribution. Unfortunately, I won't have a choice. Thanks again for your assistance.

1000 and 500 are not really feasible.
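The two workarounds discussed above — adjusting loop waits from gettimeofday(), and sleeping short of the deadline then polling out the remainder — can be combined into one drift-free loop. Here is a minimal sketch in modern Python (using `time.monotonic()` as the analogue of gettimeofday(); the 2 ms spin margin is an assumed value, not from the thread):

```python
import time

def run_periodic(period_s, iterations, work):
    """Call `work` every period_s seconds against absolute deadlines,
    so coarse-sleep overshoot in one cycle never accumulates as drift."""
    next_deadline = time.monotonic() + period_s
    for _ in range(iterations):
        # Coarse sleep to slightly short of the deadline; the kernel
        # may round this sleep up to whole timer ticks.
        remaining = next_deadline - time.monotonic()
        if remaining > 0.002:
            time.sleep(remaining - 0.002)
        # Spin out the last stretch on the fine clock -- the
        # poll/spin step described in the thread.
        while time.monotonic() < next_deadline:
            pass
        work()
        next_deadline += period_s   # absolute schedule, not "now + period"
```

Scheduling against `next_deadline += period_s` (rather than recomputing from "now") is what keeps the intervals centered on 20.0, 40.0, 60.0, ... even when individual sleeps overshoot.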
The upstream 2.5 kernel has this, yes, but they don't have all the corner cases solved. :( 512 (as a power of two) is possible because divides become shifts (you can't really divide a 64-bit number in the Linux kernel, but you can shift). As I wrote before, the i586 kernel DOES have HZ=100, so even in the current releases there is always a HZ=100 kernel.

Given the latest information, I've reclassified this as normal/enhancement, because I do think it would be a good idea for the default HZ to be a true multiple of the previous value (even though the implementation details may need to be worked out). Although the i586 kernel still uses the old value (100), it does require extra installation steps (an undesirable requirement in large environments). I've been trying to compensate in my application to create even timing intervals (20.0, 40.0, 60.0, ...), and I must say all the solutions so far are pretty ugly. Since I think others will eventually run into the same issue, I do believe it is worth the effort to change this in the future. Thanks again for your patience and assistance.

One other thing to note here is that in most cases gettimeofday() is far more accurate than the timer interrupts. You can also generate a wide range of interrupt timings off the RTC chip. This is normally done for fancy tools like profilers, but nothing stops you using it.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
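The "divides become shifts" argument in the thread can be illustrated with a small sketch (illustrative Python, not kernel code): dividing a 64-bit tick count by a power-of-two HZ is just a right shift.

```python
# With HZ a power of two, dividing a 64-bit jiffies count by HZ
# reduces to a right shift, so no 64-bit division is needed
# (64-bit divides are not directly available in the Linux kernel).
HZ = 512
HZ_SHIFT = HZ.bit_length() - 1        # 9, because 512 == 1 << 9

def jiffies_to_seconds(jiffies):
    # Equivalent to jiffies // HZ, expressed as a shift.
    return jiffies >> HZ_SHIFT

print(jiffies_to_seconds(1024))       # 2
```

This cheap conversion is exactly what HZ values like 500 or 1000 would give up, which is why the maintainer calls them "not really feasible" for that kernel.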