Bug 1145751 - kvm_clock lacks protection against tsc going backwards
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Prarit Bhargava
QA Contact: Cui Chun
Docs Contact:
Depends On:
Blocks: 1300182 1148398
Reported: 2014-09-23 12:08 EDT by Roman Kagan
Modified: 2016-01-20 02:45 EST

See Also:
Fixed In Version: kernel-2.6.32-532.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1148398
Environment:
Last Closed: 2015-07-22 04:21:07 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
RHEL PATCH 1/6 (4.29 KB, patch), 2014-09-26 07:29 EDT, Prarit Bhargava
RHEL PATCH 2/6 (10.81 KB, patch), 2014-09-26 07:29 EDT, Prarit Bhargava
RHEL PATCH 3/6 (1.37 KB, patch), 2014-09-26 07:29 EDT, Prarit Bhargava
RHEL PATCH 4/6 (3.88 KB, patch), 2014-09-26 07:29 EDT, Prarit Bhargava
RHEL PATCH 5/6 (6.63 KB, patch), 2014-09-26 07:29 EDT, Prarit Bhargava
RHEL PATCH 6/6 (4.69 KB, patch), 2014-09-26 07:29 EDT, Prarit Bhargava
RHEL PATCH 1/2 (4.29 KB, patch), 2014-10-01 07:48 EDT, Prarit Bhargava
RHEL PATCH 2/2 (10.81 KB, patch), 2014-10-01 07:48 EDT, Prarit Bhargava
RHEL PATCH 3/2 (1.37 KB, patch), 2014-10-01 07:48 EDT, Prarit Bhargava
RHEL PATCH 4/2 (3.88 KB, patch), 2014-10-01 07:48 EDT, Prarit Bhargava
RHEL PATCH 5/2 (6.63 KB, patch), 2014-10-01 07:48 EDT, Prarit Bhargava
RHEL PATCH 6/2 (4.69 KB, patch), 2014-10-01 07:48 EDT, Prarit Bhargava
RHEL PATCH 1/2 (4.93 KB, patch), 2015-01-02 06:15 EST, Prarit Bhargava
RHEL PATCH 2/2 (5.44 KB, patch), 2015-01-02 06:15 EST, Prarit Bhargava


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2015:1272
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: kernel security, bug fix, and enhancement update
Last Updated: 2015-07-22 07:56:25 EDT

Description Roman Kagan 2014-09-23 12:08:19 EDT
Description of problem:

Due to unsigned arithmetic in the timekeeping functions (specifically, timekeeping_get_ns()), the kvm clocksource may return a time ~1.2 hrs ahead if the TSC goes slightly backwards (presumably due to a CPU bug).


Version-Release number of selected component (if applicable):

2.6.32-431.20.3.el6
but seems to apply to all RHEL6 and RHEL7 kernels, and Fedora too.


How reproducible:

Once every few days or weeks, under CPU load, on certain AMD Opteron CPUs.


Steps to Reproduce:
1. Use a system with an AMD Opteron CPU (family 0x15 is known to have the problem; maybe others).

2. Run RHEL6 in a virtual machine with the kvm clocksource enabled (observed in Parallels Cloud Server 6; reportedly also seen in KVM).

3. Create some CPU load in the guest.


Actual results:

Occasionally the guest "hangs" with one VCPU spinning in the timer interrupt handler processing hrtimers. After ~1.2 hrs it resumes normal operation.

Expected results:

no hangs

Additional info:

The "hang" was found to be caused by ktime_get_update_offset() returning time ~4398 seconds in the future.  As a result, the hrtimer processing loop in hrtimer_interrupt() didn't terminate until the time caught up.

The apparent forward jump in the returned time was caused by the TSC going slightly backwards, combined with the unsigned arithmetic in timekeeping_get_ns().
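
To illustrate the arithmetic, here is a minimal userspace sketch, not the kernel code. The function name unchecked_delta_ns and the scaling values (mult = 1 << 22, shift = 22) are assumptions made for this example; the real conversion lives in timekeeping_get_ns() and uses the clocksource's own mult and shift.

```c
#include <stdint.h>

/* Assumed scaling for this sketch only: mult = 1 << 22, shift = 22,
 * i.e. one clocksource unit corresponds to one nanosecond. */
#define SKETCH_SHIFT 22u
#define SKETCH_MULT  (UINT64_C(1) << SKETCH_SHIFT)

/* Hypothetical helper: converts an unsigned cycle delta to nanoseconds
 * with no check for the clock having gone backwards. */
static uint64_t unchecked_delta_ns(uint64_t now, uint64_t last)
{
    uint64_t delta = now - last;   /* wraps modulo 2^64 when now < last */

    /* The multiplication wraps modulo 2^64 as well, so a tiny negative
     * step ends up as a value just below 2^(64 - SKETCH_SHIFT). */
    return (delta * SKETCH_MULT) >> SKETCH_SHIFT;
}
```

Under these assumptions, a forward step of 100 units yields 100 ns, while a backwards step of a single unit yields 2^42 - 1 ns, which is about 4398 seconds and so consistent with the ~1.2 hrs jump reported here.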

This has been addressed in Linus' tree by returning the previously saved value if it happens to be bigger than the current one.
I believe those commits need to be backported to RHEL6/RHEL7/Fedora kernels.
Comment 1 Roman Kagan 2014-09-23 12:09:19 EDT
Relevant commits are:


commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jul 16 21:05:12 2014 +0000

    clocksource: Move cycle_last validation to core code
    
    The only user of the cycle_last validation is the x86 TSC. In order to
    provide NMI safe accessor functions for clock monotonic and
    monotonic_raw we need to do that in the core.
    
    We can't do the TSC specific
    
        if (now < cycle_last)
                now = cycle_last;
    
    for the other wrapping around clocksources, but TSC has
    CLOCKSOURCE_MASK(64) which actually does not mask out anything so if
    now is less than cycle_last the subtraction will give a negative
    result. So we can check for that in clocksource_delta() and return 0
    for that case.
    
    Implement and enable it for x86
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: John Stultz <john.stultz@linaro.org>

commit 3a97837784acbf9fed699fc04d1799b0eb742fdf
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jul 16 21:05:10 2014 +0000

    clocksource: Make delta calculation a function
    
    We want to move the TSC sanity check into core code to make NMI safe
    accessors to clock monotonic[_raw] possible. For this we need to
    sanity check the delta calculation. Create a helper function and
    convert all sites to use it.
    
    [ Build fix from jstultz ]
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: John Stultz <john.stultz@linaro.org>
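
The check described in the first commit message can be sketched in userspace as follows. This is an illustration of the idea only, not the kernel source: the name clamped_delta and the fixed-width types are stand-ins chosen for this example (the commit message refers to clocksource_delta()).

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the delta helper described above.
 * With a full 64-bit mask nothing is masked out, so "now < last" shows up
 * as a value that is negative when viewed as a signed 64-bit integer.
 * That case is reported as zero elapsed cycles.
 * Note: the unsigned-to-signed conversion is implementation-defined in C
 * for out-of-range values; common compilers use two's complement. */
static uint64_t clamped_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    uint64_t ret = (now - last) & mask;

    return (int64_t)ret > 0 ? ret : 0;
}
```

With a 64-bit mask, clamped_delta(999, 1000, UINT64_MAX) returns 0 rather than a huge value. For a clocksource that legitimately wraps, for example with a 32-bit mask, the masked difference stays small and positive, so the clamp does not interfere with normal wraparound.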
Comment 3 Roman Kagan 2014-09-23 12:22:14 EDT
For the record, the problem was observed on systems with AMD erratum #759 (see
http://support.amd.com/TechDocs/48063_15h_Mod_00h-0Fh_Rev_Guide.pdf, p. 90):

759 One Core May Observe a Time Stamp Counter Skew
==================================================

Description
-----------

During a P-state change or following a C-state change, the processor core may
synchronize an internal copy of the time stamp counter (TSC) incorrectly. The
processor may then observe TSC values (e.g., RDTSC, RDTSCP and RDMSR 0000_0010h
instructions) or MPERF (MSR0000_00E7) values that do not account for the time
spent performing this last P-state or C-state change. This error is normally
temporary in nature, in that it may be resolved after the next P-state or
C-state change.


Potential Effect on System
--------------------------

System software or software with multiple threads may observe that one
thread or processor core provides TSC values that are behind all of the
other threads or processor cores.

While a single thread operating on a single core can not observe
successively stored TSC values that incorrectly decrement, it is
possible that a single thread may be dispatched on one core, where the
software observes a TSC, and is then dispatched by the operating system
on another core that has encountered the conditions of the erratum. In
this sequence of events, the thread may observe a TSC that appears to
decrement.

In addition, software may calculate a higher effective frequency (APERF,
MSR0000_00E8, divided by MPERF).


Suggested Workaround
--------------------

Contact your AMD representative for information on a BIOS update.


Fix Planned
-----------

Yes



According to https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1200533, the latest amd-ucode has a fix to this erratum; however, the bug was seen on systems with that revision of ucode, too.
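
For anyone trying to confirm the symptom on a given machine, one rough approach is to record successive TSC readings taken by a single thread (which the scheduler may move between cores) and count how often a reading is lower than the previous one. The helper below is a hypothetical sketch of that check on an already collected array of samples; the name count_backward_steps is made up for this example, and collecting the samples is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts how many consecutive pairs in a trace of
 * counter readings go backwards (samples[i] < samples[i - 1]). */
static size_t count_backward_steps(const uint64_t *samples, size_t n)
{
    size_t steps = 0;

    for (size_t i = 1; i < n; i++) {
        if (samples[i] < samples[i - 1])
            steps++;
    }
    return steps;
}
```

A non-zero result on a trace taken by one thread would match the "TSC that appears to decrement" case from the erratum text; a zero result does not prove the machine is unaffected, since the problem is reported to show up only once in days or weeks.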
Comment 4 Roman Kagan 2014-09-23 12:34:27 EDT
[I didn't mean the bug to be private; I'd appreciate if it could be made public]
Comment 5 Prarit Bhargava 2014-09-24 08:25:06 EDT
(In reply to Roman Kagan from comment #4)
> [I didn't mean the bug to be private; I'd appreciate if it could be made
> public]

np.

P.
Comment 6 Prarit Bhargava 2014-09-26 07:29:09 EDT
Created attachment 941540
RHEL PATCH 1/6
Comment 7 Prarit Bhargava 2014-09-26 07:29:10 EDT
Created attachment 941541
RHEL PATCH 2/6
Comment 8 Prarit Bhargava 2014-09-26 07:29:11 EDT
Created attachment 941542
RHEL PATCH 3/6
Comment 9 Prarit Bhargava 2014-09-26 07:29:12 EDT
Created attachment 941543
RHEL PATCH 4/6
Comment 10 Prarit Bhargava 2014-09-26 07:29:13 EDT
Created attachment 941544
RHEL PATCH 5/6
Comment 11 Prarit Bhargava 2014-09-26 07:29:15 EDT
Created attachment 941545
RHEL PATCH 6/6
Comment 13 Prarit Bhargava 2014-10-01 07:45:57 EDT
Sorry everyone, I made the changes for RHEL7 first and accidentally used this BZ.  I'm going to clone this to RHEL7 and POST for RHEL7 from there.

P.
Comment 14 Prarit Bhargava 2014-10-01 07:48:42 EDT
Created attachment 943020
RHEL PATCH 1/2
Comment 15 Prarit Bhargava 2014-10-01 07:48:43 EDT
Created attachment 943021
RHEL PATCH 2/2
Comment 16 Prarit Bhargava 2014-10-01 07:48:45 EDT
Created attachment 943022
RHEL PATCH 3/2
Comment 17 Prarit Bhargava 2014-10-01 07:48:46 EDT
Created attachment 943023
RHEL PATCH 4/2
Comment 18 Prarit Bhargava 2014-10-01 07:48:47 EDT
Created attachment 943024
RHEL PATCH 5/2
Comment 19 Prarit Bhargava 2014-10-01 07:48:49 EDT
Created attachment 943025
RHEL PATCH 6/2
Comment 20 Prarit Bhargava 2014-12-18 14:39:25 EST
Sorry everyone, I mucked up this BZ pretty badly and am cleaning it up.  I'll push 6.7 patches shortly.

P.
Comment 21 RHEL Product and Program Management 2014-12-18 15:19:58 EST
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.
Comment 22 Prarit Bhargava 2015-01-02 06:15:34 EST
Created attachment 975178
RHEL PATCH 1/2
Comment 23 Prarit Bhargava 2015-01-02 06:15:36 EST
Created attachment 975179
RHEL PATCH 2/2
Comment 25 Rafael Aquini 2015-02-17 14:07:39 EST
Patch(es) available on kernel-2.6.32-532.el6
Comment 31 errata-xmlrpc 2015-07-22 04:21:07 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html
