Bug 673242
Summary: | Time runs too fast in a VM on processors with > 4GHZ freq | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Alok Kataria <akataria> | ||||
Component: | kernel | Assignee: | Tim Burke <tburke> | ||||
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 5.7 | CC: | dhecht, dhoward, garrett, jasonmc, jiajyang, jmalanik, jpirko, jsavanyo, juzhang, knoel, kzhang, mjenner, plyons, prarit, qcai, qwan, sghosh, tburke | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | 5.7 | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
Previously, on VMware, time ran too fast on virtual machines with a TSC (Time Stamp Counter) processor frequency above 4 GHz if they were using PIT/TSC-based timekeeping. This was due to a calculation bug in the get_hypervisor_cycles_per_sec function. This update fixes the calculation, and timekeeping now works correctly for such virtual machines.
|
Last Closed: | 2011-07-21 09:24:07 UTC | Type: | --- | ||||
Bug Blocks: | 690133, 690134 | ||||||
Attachments: |
|
The bugfix is fine, but it's a bit painful to port to all these releases since they are all on separate branches. We can update the RHEL5 branch first, then move it to all the z-streams. I'm actually porting fixes to RHEL5 now, so I can work this in.

Requesting dev ack and PM ack for 5.7; the work is already done, and it is a simple patch that must be backported.

You got the ack, Zach. Please send the patch to rhkernel with all the relevant kernel versions.

The patch does not apply cleanly due to KVM-specific changes to the code. Fixing this is trivial, but I need to verify that this will not affect kernels run under Xen.

Patches posted for all branches.

Patch(es) available in kernel-2.6.18-252.el5. You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5. Detailed testing feedback is always welcomed.

I verified the -252 kernel and timekeeping seems to work correctly for VMs with a TSC frequency above 4 GHz. Thanks for picking up the fix.

Has anyone seen anything to indicate that this patch would do more than just deal with "wall clock" timekeeping? I've been having a problem with VMware-based VMs running EL5.6 (kernels -238.1.1 and -238.5.1) that are randomly hanging on boot. I'm unable to track this back to anything ESX-related, and it all seems to be related to using TSC as a time source. All of the issues began with the upgrade to EL 5.6 and kernel 2.6.18-238.1.1.el5 and persist in 2.6.18-238.5.1.el5 (we skipped -238.el5 for internal timing reasons). This has affected more than 25 hosts of all different configurations at this point, but always EL 5.6 VMs only. AS4 is not affected and we don't have any EL6 VMs yet. The issue is exactly the same.
During the initial kernel start, it gets as far as:

    PCI: Setting latency timer of device 0000:00:01.0 to 64
    NET: Registered protocol family 2
    IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
    TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
    TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
    TCP: Hash tables configured (established 131072 bind 65536)
    TCP reno registered
    Simple Boot Flag at 0x36 set to 0x80

The next line on all VMs that boot successfully is:

    Using TSC for driving interrupts

However, VMs that hang during boot never reach the "Using TSC..." line. This leads me to believe that the problem is related to the OS electing to use TSC as the clocksource, and that this is somehow an unstable combination with ESX 3.5 and EL 5.6 VMs. The issue is sporadic and I can't reproduce it on demand; I only know that when an EL5.6 VM fails to boot, it always fails in the same place in the same way. Any way this is related? If not, sorry for the noise, but we're grasping at straws and VMware hasn't been very helpful thus far.

Sorry for the noise, I think my problem is related to changes in Bug 538022 that implemented a TSC timer for interrupts.

It wouldn't hurt to test the fixed kernel in a Xen VM as well, but it's not a high priority. The backport just got a bit complex because every version of RHEL5 needed a different fix.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents: Previously, on VMware, time ran too fast on virtual machines with a TSC (Time Stamp Counter) processor frequency above 4 GHz if they were using PIT/TSC-based timekeeping. This was due to a calculation bug in the get_hypervisor_cycles_per_sec function. This update fixes the calculation, and timekeeping now works correctly for such virtual machines.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1065.html
Created attachment 475672: Fix for 4GHZ TSC issue.

Description of problem: We have seen that time in a virtualized RHEL 5.4 or later guest runs too fast when the guest runs on processors with a TSC frequency above 4 GHz. This is due to a calculation bug in get_hypervisor_cycles_per_sec, and it affects only VMware VMs that use tsc_based_timekeeping. The fix is trivial and is attached. Please apply it for the next update. This bug fix is necessary for all updates of RHEL 5.4, 5.5 and 5.6. Thanks.
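
For illustration only: the attached patch is not reproduced in this report, so the exact calculation error is not shown here. The sketch below assumes one plausible failure mode, namely that the TSC frequency in Hz is truncated to 32 bits somewhere in the calculation, which would only matter above 2^32 Hz (about 4.29 GHz). The function names are hypothetical and are not kernel code.

```python
# Hypothetical illustration, NOT the RHEL kernel code.
# Assumption: the TSC frequency in Hz gets truncated to 32 bits.

UINT32_MASK = 0xFFFFFFFF  # largest value an unsigned 32-bit integer can hold


def truncate_to_u32(hz: int) -> int:
    """Keep only the low 32 bits of a frequency, as a 32-bit variable would."""
    return hz & UINT32_MASK


def guest_clock_rate(true_hz: int, assumed_hz: int) -> float:
    """Guest seconds counted per real second.

    The TSC really ticks at true_hz, but the guest converts cycles to
    time using assumed_hz. A result above 1.0 means time runs too fast.
    """
    return true_hz / assumed_hz


if __name__ == "__main__":
    for true_hz in (3_000_000_000, 4_500_000_000):
        assumed_hz = truncate_to_u32(true_hz)
        rate = guest_clock_rate(true_hz, assumed_hz)
        print(f"true={true_hz} Hz assumed={assumed_hz} Hz rate={rate:.2f}x")
```

Under this assumption, a 3 GHz frequency fits in 32 bits and the guest clock runs at the correct rate, while a 4.5 GHz frequency is seen as 205,032,704 Hz, so each cycle is counted as far more time than it represents and the guest clock races ahead. That matches the reported symptom in direction only; the real defect may differ.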