Red Hat Bugzilla – Bug 1145751
kvm_clock lacks protection against tsc going backwards
Last modified: 2016-01-20 02:45:37 EST
Description of problem:
Because the timekeeping functions (specifically timekeeping_get_ns()) use unsigned arithmetic, the kvm clocksource may return a time ~1.2 hrs ahead if the TSC goes slightly backwards (presumably due to a CPU bug).

Version-Release number of selected component (if applicable):
2.6.32-431.20.3.el6, but this seems to apply to all RHEL6 and RHEL7 kernels, and to Fedora too.

How reproducible:
Once every few days or weeks under CPU load on certain AMD Opteron CPUs.

Steps to Reproduce:
1. Use a system with an AMD Opteron CPU (family 0x15 is known to have the problem; maybe others).
2. Run RHEL6 in a virtual machine (observed in Parallels Cloud Server 6; reportedly also seen in KVM) with the kvm clocksource enabled.
3. Create some CPU load in the guest.

Actual results:
Occasionally the guest "hangs" with one VCPU spinning in the timer interrupt handler, processing hrtimers. After ~1.2 hrs it resumes normal operation.

Expected results:
No hangs.

Additional info:
The "hang" was found to be caused by ktime_get_update_offset() returning a time ~4398 seconds in the future. As a result, the hrtimer processing loop in hrtimer_interrupt() didn't terminate until the real time caught up. The apparent forward jump was in turn caused by the TSC going slightly backwards, combined with the unsigned arithmetic in timekeeping_get_ns().

This has been addressed in Linus' tree by returning the previously saved value if it happens to be bigger than the current one. I believe those commits need to be backported to the RHEL6/RHEL7/Fedora kernels.
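For illustration, the arithmetic behind the ~4398 second figure can be sketched in user space. This is a simplified model of an unprotected cycles-to-nanoseconds conversion, not the kernel source: the function name cycles_to_ns_unsigned() and the scaling parameters (mult = 1 << 22, shift = 22) are assumptions, chosen because 2^42 ns is about 4398 s, which matches the observed jump.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scaling parameters (an assumption, not taken from the report). */
#define DEMO_SHIFT 22
#define DEMO_MULT  ((uint64_t)1 << DEMO_SHIFT)

/*
 * Simplified model of an unprotected cycles-to-nanoseconds conversion.
 * The delta is computed in unsigned 64-bit arithmetic, so a counter that
 * moved backwards yields a value close to 2^64 instead of a negative one.
 */
static uint64_t cycles_to_ns_unsigned(uint64_t now, uint64_t last,
                                      uint64_t mult, unsigned int shift)
{
    assert(shift < 64);               /* shifting by >= 64 is undefined */

    uint64_t delta = now - last;      /* wraps around if now < last */

    return (delta * mult) >> shift;   /* product wraps modulo 2^64 */
}
```

Under these assumed parameters, a counter that reads 100 cycles behind the saved value converts to 2^42 - 100 ns (roughly 4398 s) rather than to a small negative interval, while a counter that is 100 cycles ahead converts to 100 ns as expected.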
Relevant commits are:

commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jul 16 21:05:12 2014 +0000

    clocksource: Move cycle_last validation to core code

    The only user of the cycle_last validation is the x86 TSC. In order to
    provide NMI safe accessor functions for clock monotonic and
    monotonic_raw we need to do that in the core.

    We can't do the TSC specific

        if (now < cycle_last)
                now = cycle_last;

    for the other wrapping around clocksources, but TSC has
    CLOCKSOURCE_MASK(64) which actually does not mask out anything so if
    now is less than cycle_last the subtraction will give a negative
    result. So we can check for that in clocksource_delta() and return 0
    for that case.

    Implement and enable it for x86

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: John Stultz <john.stultz@linaro.org>

commit 3a97837784acbf9fed699fc04d1799b0eb742fdf
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jul 16 21:05:10 2014 +0000

    clocksource: Make delta calculation a function

    We want to move the TSC sanity check into core code to make NMI safe
    accessors to clock monotonic[_raw] possible. For this we need to
    sanity check the delta calculation. Create a helper function and
    convert all sites to use it.

    [ Build fix from jstultz ]

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: John Stultz <john.stultz@linaro.org>
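The check described in the first commit message can be sketched as a stand-alone helper. This is a user-space approximation written from the commit description, not a copy of the kernel code: the name demo_clocksource_delta() and the use of plain uint64_t are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Sketch of a delta helper that tolerates a counter stepping backwards.
 * With a full 64-bit mask nothing is masked out, so a backwards step
 * shows up as a value with the top bit set. Reinterpreted as signed
 * (two's complement is assumed here), that value is negative and is
 * clamped to 0, i.e. "no time has passed".
 */
static uint64_t demo_clocksource_delta(uint64_t now, uint64_t last,
                                       uint64_t mask)
{
    uint64_t ret = (now - last) & mask;

    return (int64_t)ret > 0 ? ret : 0;
}
```

With mask set to UINT64_MAX, a reading that is 100 cycles behind the saved value yields a delta of 0 instead of a number near 2^64. For a narrower mask, the masked result is never negative when viewed as signed, so the clamp does not interfere with clocksources that legitimately wrap around.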
For the record, the problem was observed on systems with AMD erratum #759 (see http://support.amd.com/TechDocs/48063_15h_Mod_00h-0Fh_Rev_Guide.pdf, p. 90):

759 One Core May Observe a Time Stamp Counter Skew
==================================================

Description
-----------
During a P-state change or following a C-state change, the processor core may synchronize an internal copy of the time stamp counter (TSC) incorrectly. The processor may then observe TSC values (e.g., RDTSC, RDTSCP and RDMSR 0000_0010h instructions) or MPERF (MSR0000_000E7) values that do not account for the time spent performing this last P-state or C-state change. This error is normally temporary in nature, in that it may be resolved after the next P-state or C-state change.

Potential Effect on System
--------------------------
System software or software with multiple threads may observe that one thread or processor core provides TSC values that are behind all of the other threads or processor cores. While a single thread operating on a single core can not observe successively stored TSC values that incorrectly decrement, it is possible that a single thread may be dispatched on one core, where the software observes a TSC, and is then dispatched by the operating system on another core that has encountered the conditions of the erratum. In this sequence of events, the thread may observe a TSC that appears to decrement. In addition, software may calculate a higher effective frequency (APERF, MSR0000_00E8, divided by MPERF).

Suggested Workaround
--------------------
Contact your AMD representative for information on a BIOS update.

Fix Planned
-----------
Yes

According to https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1200533, the latest amd-ucode has a fix for this erratum; however, the bug was seen on systems with that revision of ucode, too.
[I didn't mean the bug to be private; I'd appreciate if it could be made public]
(In reply to Roman Kagan from comment #4)
> [I didn't mean the bug to be private; I'd appreciate if it could be made
> public]

np.

P.
Created attachment 941540 RHEL PATCH 1/6
Created attachment 941541 RHEL PATCH 2/6
Created attachment 941542 RHEL PATCH 3/6
Created attachment 941543 RHEL PATCH 4/6
Created attachment 941544 RHEL PATCH 5/6
Created attachment 941545 RHEL PATCH 6/6
Sorry everyone, I made the changes for RHEL7 first and accidentally used this BZ. I'm going to clone this to RHEL7 and POST for RHEL7 from there. P.
Created attachment 943020 RHEL PATCH 1/2
Created attachment 943021 RHEL PATCH 2/2
Created attachment 943022 RHEL PATCH 3/2
Created attachment 943023 RHEL PATCH 4/2
Created attachment 943024 RHEL PATCH 5/2
Created attachment 943025 RHEL PATCH 6/2
Sorry everyone, I mucked up this BZ pretty badly and am cleaning it up. I'll push 6.7 patches shortly. P.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Created attachment 975178 RHEL PATCH 1/2
Created attachment 975179 RHEL PATCH 2/2
Patch(es) available on kernel-2.6.32-532.el6
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1272.html