Bug 1145751

Summary: kvm_clock lacks protection against tsc going backwards
Product: Red Hat Enterprise Linux 6
Reporter: Roman Kagan <rvkagan>
Component: kernel
Assignee: Prarit Bhargava <prarit>
kernel sub component: Other
QA Contact: Cui Chun <ccui>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: ccui, vvs
Version: 6.5
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-2.6.32-532.el6
Doc Type: Bug Fix
Clones: 1148398
Last Closed: 2015-07-22 08:21:07 UTC
Type: Bug
Bug Blocks: 1148398, 1300182
Attachments (description: flags):
RHEL PATCH 1/6: none
RHEL PATCH 2/6: none
RHEL PATCH 3/6: none
RHEL PATCH 4/6: none
RHEL PATCH 5/6: none
RHEL PATCH 6/6: none
RHEL PATCH 1/2: none
RHEL PATCH 2/2: none
RHEL PATCH 3/2: none
RHEL PATCH 4/2: none
RHEL PATCH 5/2: none
RHEL PATCH 6/2: none
RHEL PATCH 1/2: none
RHEL PATCH 2/2: none

Description Roman Kagan 2014-09-23 16:08:19 UTC
Description of problem:

Due to unsigned arithmetic in the timekeeping functions (specifically, timekeeping_get_ns()), the kvm clocksource may return a time ~1.2 hours ahead if the TSC goes slightly backwards (presumably due to a CPU bug).


Version-Release number of selected component (if applicable):

2.6.32-431.20.3.el6
but seems to apply to all RHEL6 and RHEL7 kernels, and Fedora too.


How reproducible:

Once every few days or weeks under CPU load on certain AMD Opteron CPUs.


Steps to Reproduce:
1. Use a system with an AMD Opteron CPU (family 0x15 is known to have the problem; maybe others).

2. Run RHEL6 in a virtual machine with the kvm clocksource enabled (observed in Parallels Cloud Server 6; reportedly also seen in KVM).

3. Create some CPU load in the guest.


Actual results:

Occasionally the guest "hangs" with one VCPU spinning in the timer interrupt handler, processing hrtimers. After ~1.2 hours it resumes normal operation.

Expected results:

no hangs

Additional info:

The "hang" was found to be caused by ktime_get_update_offset() returning a time ~4398 seconds in the future. As a result, the hrtimer processing loop in hrtimer_interrupt() didn't terminate until the real time caught up.

The apparent forward jump in the returned time was in turn caused by the TSC going slightly backwards, combined with unsigned arithmetic in timekeeping_get_ns(): a small negative cycle delta wraps around to a huge positive one.
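
To illustrate the arithmetic, here is a minimal user-space sketch; it is not the kernel code. It assumes the clocksource scales cycles with mult = 1 << 22 and shift = 22 (an assumption, chosen because it reproduces the ~4398 s figure: 2^64 >> 22 = 2^42 ns). The names sketch_cyc2ns and sketch_get_ns are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scaling factors (not taken from this report): 1 cycle == 1 ns. */
#define SKETCH_SHIFT 22u
#define SKETCH_MULT  (UINT32_C(1) << SKETCH_SHIFT)

/* Hypothetical stand-in for the kernel's cycles-to-nanoseconds scaling. */
static uint64_t sketch_cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    assert(shift < 64);               /* larger shifts are undefined in C */
    return (cycles * mult) >> shift;  /* product is truncated to 64 bits */
}

/* Hypothetical stand-in for the unprotected delta computation. */
static uint64_t sketch_get_ns(uint64_t cycle_now, uint64_t cycle_last)
{
    uint64_t delta = cycle_now - cycle_last;  /* wraps if now < last */
    return sketch_cyc2ns(delta, SKETCH_MULT, SKETCH_SHIFT);
}
```

With cycle_last = 1100 and cycle_now = 1000, sketch_get_ns() yields 2^42 - 100 ns, i.e. about 4398 seconds or 1.2 hours, instead of a small negative value.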

This has been addressed in Linus' tree by returning the previously saved value if it happens to be bigger than the current one.
I believe those commits need to be backported to the RHEL6, RHEL7 and Fedora kernels.

Comment 1 Roman Kagan 2014-09-23 16:09:19 UTC
Relevant commits are:


commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108
Author: Thomas Gleixner <tglx>
Date:   Wed Jul 16 21:05:12 2014 +0000

    clocksource: Move cycle_last validation to core code
    
    The only user of the cycle_last validation is the x86 TSC. In order to
    provide NMI safe accessor functions for clock monotonic and
    monotonic_raw we need to do that in the core.
    
    We can't do the TSC specific
    
        if (now < cycle_last)
                now = cycle_last;
    
    for the other wrapping around clocksources, but TSC has
    CLOCKSOURCE_MASK(64) which actually does not mask out anything so if
    now is less than cycle_last the subtraction will give a negative
    result. So we can check for that in clocksource_delta() and return 0
    for that case.
    
    Implement and enable it for x86
    
    Signed-off-by: Thomas Gleixner <tglx>
    Signed-off-by: John Stultz <john.stultz>

commit 3a97837784acbf9fed699fc04d1799b0eb742fdf
Author: Thomas Gleixner <tglx>
Date:   Wed Jul 16 21:05:10 2014 +0000

    clocksource: Make delta calculation a function
    
    We want to move the TSC sanity check into core code to make NMI safe
    accessors to clock monotonic[_raw] possible. For this we need to
    sanity check the delta calculation. Create a helper function and
    convert all sites to use it.
    
    [ Build fix from jstultz ]
    
    Signed-off-by: Thomas Gleixner <tglx>
    Signed-off-by: John Stultz <john.stultz>
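
The check described in the first commit message can be sketched in user space as follows. This is an illustration modeled on the commit text, not the backported patch; sketch_clocksource_delta is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of a delta helper that refuses to go
 * backwards. With a 64-bit mask nothing is masked out, so a "negative"
 * difference shows up as a value with the top bit set.
 */
static uint64_t sketch_clocksource_delta(uint64_t now, uint64_t last,
                                         uint64_t mask)
{
    uint64_t ret;

    assert(mask != 0);  /* a zero mask would hide every delta */
    ret = (now - last) & mask;

    /* Assumes two's-complement conversion, as on all common compilers. */
    return (int64_t)ret > 0 ? ret : 0;
}
```

For a 64-bit mask, a TSC that steps back by 100 cycles gives a delta of 0 rather than 2^64 - 100. For a narrower mask such as 0xFFFFFFFF, the masked result never has the top bit set, so legitimate counter wraparound is still handled by the mask alone.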

Comment 3 Roman Kagan 2014-09-23 16:22:14 UTC
For the record, the problem was observed on systems with AMD erratum #759 (see
http://support.amd.com/TechDocs/48063_15h_Mod_00h-0Fh_Rev_Guide.pdf, p. 90):

759 One Core May Observe a Time Stamp Counter Skew
==================================================

Description
-----------

During a P-state change or following a C-state change, the processor core may
synchronize an internal copy of the time stamp counter (TSC) incorrectly. The
processor may then observe TSC values (e.g., RDTSC, RDTSCP and RDMSR 0000_0010h
instructions) or MPERF (MSR0000_00E7) values that do not account for the time
spent performing this last P-state or C-state change. This error is normally
temporary in nature, in that it may be resolved after the next P-state or
C-state change.


Potential Effect on System
--------------------------

System software or software with multiple threads may observe that one
thread or processor core provides TSC values that are behind all of the
other threads or processor cores.

While a single thread operating on a single core can not observe
successively stored TSC values that incorrectly decrement, it is
possible that a single thread may be dispatched on one core, where the
software observes a TSC, and is then dispatched by the operating system
on another core that has encountered the conditions of the erratum. In
this sequence of events, the thread may observe a TSC that appears to
decrement.

In addition, software may calculate a higher effective frequency (APERF,
MSR0000_00E8, divided by MPERF).


Suggested Workaround
--------------------

Contact your AMD representative for information on a BIOS update.


Fix Planned
-----------

Yes



According to https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1200533, the latest amd-ucode has a fix for this erratum; however, the bug was also seen on systems with that ucode revision.

Comment 4 Roman Kagan 2014-09-23 16:34:27 UTC
[I didn't mean the bug to be private; I'd appreciate if it could be made public]

Comment 5 Prarit Bhargava 2014-09-24 12:25:06 UTC
(In reply to Roman Kagan from comment #4)
> [I didn't mean the bug to be private; I'd appreciate if it could be made
> public]

np.

P.

Comment 6 Prarit Bhargava 2014-09-26 11:29:09 UTC
Created attachment 941540
RHEL PATCH 1/6

Comment 7 Prarit Bhargava 2014-09-26 11:29:10 UTC
Created attachment 941541
RHEL PATCH 2/6

Comment 8 Prarit Bhargava 2014-09-26 11:29:11 UTC
Created attachment 941542
RHEL PATCH 3/6

Comment 9 Prarit Bhargava 2014-09-26 11:29:12 UTC
Created attachment 941543
RHEL PATCH 4/6

Comment 10 Prarit Bhargava 2014-09-26 11:29:13 UTC
Created attachment 941544
RHEL PATCH 5/6

Comment 11 Prarit Bhargava 2014-09-26 11:29:15 UTC
Created attachment 941545
RHEL PATCH 6/6

Comment 13 Prarit Bhargava 2014-10-01 11:45:57 UTC
Sorry everyone, I made the changes for RHEL7 first and accidentally used this BZ.  I'm going to clone this to RHEL7 and POST for RHEL7 from there.

P.

Comment 14 Prarit Bhargava 2014-10-01 11:48:42 UTC
Created attachment 943020
RHEL PATCH 1/2

Comment 15 Prarit Bhargava 2014-10-01 11:48:43 UTC
Created attachment 943021
RHEL PATCH 2/2

Comment 16 Prarit Bhargava 2014-10-01 11:48:45 UTC
Created attachment 943022
RHEL PATCH 3/2

Comment 17 Prarit Bhargava 2014-10-01 11:48:46 UTC
Created attachment 943023
RHEL PATCH 4/2

Comment 18 Prarit Bhargava 2014-10-01 11:48:47 UTC
Created attachment 943024
RHEL PATCH 5/2

Comment 19 Prarit Bhargava 2014-10-01 11:48:49 UTC
Created attachment 943025
RHEL PATCH 6/2

Comment 20 Prarit Bhargava 2014-12-18 19:39:25 UTC
Sorry everyone, I mucked up this BZ pretty badly and am cleaning it up.  I'll push 6.7 patches shortly.

P.

Comment 21 RHEL Program Management 2014-12-18 20:19:58 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 22 Prarit Bhargava 2015-01-02 11:15:34 UTC
Created attachment 975178
RHEL PATCH 1/2

Comment 23 Prarit Bhargava 2015-01-02 11:15:36 UTC
Created attachment 975179
RHEL PATCH 2/2

Comment 25 Rafael Aquini 2015-02-17 19:07:39 UTC
Patch(es) available on kernel-2.6.32-532.el6

Comment 31 errata-xmlrpc 2015-07-22 08:21:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html