1145751 – kvm_clock lacks protection against tsc going backwards


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Prarit Bhargava
QA Contact:	Cui Chun
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1148398 1300182

Reported:	2014-09-23 16:08 UTC by Roman Kagan
Modified:	2016-01-20 07:45 UTC (History)
CC List:	2 users (show)
Fixed In Version:	kernel-2.6.32-532.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1148398 (view as bug list)
Environment:
Last Closed:	2015-07-22 08:21:07 UTC
Target Upstream Version:
Embargoed:

Attachments
RHEL PATCH 1/6 (4.29 KB, patch) 2014-09-26 11:29 UTC, Prarit Bhargava	no flags
RHEL PATCH 2/6 (10.81 KB, patch) 2014-09-26 11:29 UTC, Prarit Bhargava	no flags
RHEL PATCH 3/6 (1.37 KB, patch) 2014-09-26 11:29 UTC, Prarit Bhargava	no flags
RHEL PATCH 4/6 (3.88 KB, patch) 2014-09-26 11:29 UTC, Prarit Bhargava	no flags
RHEL PATCH 5/6 (6.63 KB, patch) 2014-09-26 11:29 UTC, Prarit Bhargava	no flags
RHEL PATCH 6/6 (4.69 KB, patch) 2014-09-26 11:29 UTC, Prarit Bhargava	no flags
RHEL PATCH 1/2 (4.29 KB, patch) 2014-10-01 11:48 UTC, Prarit Bhargava	no flags
RHEL PATCH 2/2 (10.81 KB, patch) 2014-10-01 11:48 UTC, Prarit Bhargava	no flags
RHEL PATCH 3/2 (1.37 KB, patch) 2014-10-01 11:48 UTC, Prarit Bhargava	no flags
RHEL PATCH 4/2 (3.88 KB, patch) 2014-10-01 11:48 UTC, Prarit Bhargava	no flags
RHEL PATCH 5/2 (6.63 KB, patch) 2014-10-01 11:48 UTC, Prarit Bhargava	no flags
RHEL PATCH 6/2 (4.69 KB, patch) 2014-10-01 11:48 UTC, Prarit Bhargava	no flags
RHEL PATCH 1/2 (4.93 KB, patch) 2015-01-02 11:15 UTC, Prarit Bhargava	no flags
RHEL PATCH 2/2 (5.44 KB, patch) 2015-01-02 11:15 UTC, Prarit Bhargava	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2015:1272	0	normal	SHIPPED_LIVE	Moderate: kernel security, bug fix, and enhancement update	2015-07-22 11:56:25 UTC

Description Roman Kagan 2014-09-23 16:08:19 UTC

Description of problem:

Due to unsigned arithmetic in the timekeeping functions (specifically, timekeeping_get_ns()), the kvm clocksource may return a time ~1.2 hrs ahead if the TSC goes slightly backwards (presumably due to a CPU bug).


Version-Release number of selected component (if applicable):

2.6.32-431.20.3.el6
but seems to apply to all RHEL6 and RHEL7 kernels, and Fedora too.


How reproducible:

once in a few days or weeks under cpu load on certain AMD Opteron CPUs


Steps to Reproduce:
1. use a system with AMD Opteron (0x15 family is known to have the problem; maybe others)

2. run RHEL6 in a virtual machine (was observed in Parallels Cloud Server 6; reportedly was also seen in KVM) with kvm clocksource enabled

3. create some CPU load in the guest


Actual results:

Occasionally the guest "hangs" with one VCPU spinning in the timer interrupt handler processing hrtimers. After ~1.2 hrs it resumes normal operation.

Expected results:

no hangs

Additional info:

The "hang" was found to be caused by ktime_get_update_offset() returning time ~4398 seconds in the future.  As a result, the hrtimer processing loop in hrtimer_interrupt() didn't terminate until the time caught up.

The apparent forward jump in the returned time was in turn caused by the TSC going slightly backwards combined with the unsigned arithmetic in timekeeping_get_ns(): a small negative cycle delta wraps around to a huge positive one.
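The wrap-around described above can be sketched in userspace C. This is only an illustration, not the kernel code: the function name is hypothetical, and the scaling factors used in the usage note below (mult = 1 << 22, shift = 22) are an assumption, picked because they are consistent with the ~4398 second figure; the report itself does not state them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of an unguarded cycles-to-nanoseconds conversion.
 * All arithmetic is unsigned 64-bit, so both the subtraction and the
 * multiplication wrap modulo 2^64.
 */
static uint64_t cycles_to_ns_unguarded(uint64_t now, uint64_t last,
                                       uint32_t mult, uint32_t shift)
{
    assert(shift < 64);

    /* If now < last, this wraps to a value just below 2^64. */
    uint64_t delta = now - last;

    /* The product wraps as well; the shift then scales it down. */
    return (delta * mult) >> shift;
}
```

Under those assumed factors, a counter that moved forward by 100 cycles converts to 100 ns, while a counter that moved backwards by 100 cycles converts to 2^42 - 100 ns, i.e. roughly 4398 seconds or about 1.2 hours.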

This has been addressed in Linus' tree by returning the previously saved value if it happens to be bigger than the current one.
I believe those commits need to be backported to the RHEL6/RHEL7/Fedora kernels.

Comment 1 Roman Kagan 2014-09-23 16:09:19 UTC

Relevant commits are:


commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108
Author: Thomas Gleixner <tglx>
Date:   Wed Jul 16 21:05:12 2014 +0000

    clocksource: Move cycle_last validation to core code
    
    The only user of the cycle_last validation is the x86 TSC. In order to
    provide NMI safe accessor functions for clock monotonic and
    monotonic_raw we need to do that in the core.
    
    We can't do the TSC specific
    
        if (now < cycle_last)
                now = cycle_last;
    
    for the other wrapping around clocksources, but TSC has
    CLOCKSOURCE_MASK(64) which actually does not mask out anything so if
    now is less than cycle_last the subtraction will give a negative
    result. So we can check for that in clocksource_delta() and return 0
    for that case.
    
    Implement and enable it for x86
    
    Signed-off-by: Thomas Gleixner <tglx>
    Signed-off-by: John Stultz <john.stultz>

commit 3a97837784acbf9fed699fc04d1799b0eb742fdf
Author: Thomas Gleixner <tglx>
Date:   Wed Jul 16 21:05:10 2014 +0000

    clocksource: Make delta calculation a function
    
    We want to move the TSC sanity check into core code to make NMI safe
    accessors to clock monotonic[_raw] possible. For this we need to
    sanity check the delta calculation. Create a helper function and
    convert all sites to use it.
    
    [ Build fix from jstultz ]
    
    Signed-off-by: Thomas Gleixner <tglx>
    Signed-off-by: John Stultz <john.stultz>
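As a rough illustration of the check described in the first commit message above, here is a userspace sketch of a guarded delta helper. It is modelled on that description only and is not the code from the commits; the demo_ names, and the mult/shift parameters of the second function, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Masked delta between two counter reads. With a full 64-bit mask a
 * backwards step shows up as a value with the top bit set (a "negative"
 * result), which is clamped to 0. With a narrower mask the top bit is
 * always clear, so ordinary wrap-around of the counter is unaffected.
 */
static uint64_t demo_clocksource_delta(uint64_t now, uint64_t last,
                                       uint64_t mask)
{
    uint64_t delta = (now - last) & mask;

    return (delta & (UINT64_C(1) << 63)) ? 0 : delta;
}

/* Cycles-to-nanoseconds conversion built on the guarded delta. */
static uint64_t demo_cycles_to_ns(uint64_t now, uint64_t last,
                                  uint32_t mult, uint32_t shift)
{
    assert(shift < 64);

    return (demo_clocksource_delta(now, last, UINT64_MAX) * mult) >> shift;
}
```

With this guard, a counter read that is 100 cycles behind the saved value yields a delta of 0 rather than a value near 2^64, so the converted time stands still instead of jumping forward. A 32-bit wrapping counter going from 0xFFFFFFF0 to 5 still yields the expected delta of 21.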

Comment 3 Roman Kagan 2014-09-23 16:22:14 UTC

For the record, the problem was observed on systems with AMD erratum #759 (see
http://support.amd.com/TechDocs/48063_15h_Mod_00h-0Fh_Rev_Guide.pdf, p. 90):

759 One Core May Observe a Time Stamp Counter Skew
==================================================

Description
-----------

During a P-state change or following a C-state change, the processor core may
synchronize an internal copy of the time stamp counter (TSC) incorrectly. The
processor may then observe TSC values (e.g., RDTSC, RDTSCP and RDMSR 0000_0010h
instructions) or MPERF (MSR0000_00E7) values that do not account for the time
spent performing this last P-state or C-state change. This error is normally
temporary in nature, in that it may be resolved after the next P-state or
C-state change.


Potential Effect on System
--------------------------

System software or software with multiple threads may observe that one
thread or processor core provides TSC values that are behind all of the
other threads or processor cores.

While a single thread operating on a single core can not observe
successively stored TSC values that incorrectly decrement, it is
possible that a single thread may be dispatched on one core, where the
software observes a TSC, and is then dispatched by the operating system
on another core that has encountered the conditions of the erratum. In
this sequence of events, the thread may observe a TSC that appears to
decrement.

In addition, software may calculate a higher effective frequency (APERF,
MSR0000_00E8, divided by MPERF).


Suggested Workaround
--------------------

Contact your AMD representative for information on a BIOS update.


Fix Planned
-----------

Yes



According to https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1200533, the latest amd-ucode has a fix for this erratum; however, the bug was seen on systems with that revision of ucode, too.

Comment 4 Roman Kagan 2014-09-23 16:34:27 UTC

[I didn't mean the bug to be private; I'd appreciate if it could be made public]

Comment 5 Prarit Bhargava 2014-09-24 12:25:06 UTC

(In reply to Roman Kagan from comment #4)
> [I didn't mean the bug to be private; I'd appreciate if it could be made
> public]

np.

P.

Comment 6 Prarit Bhargava 2014-09-26 11:29:09 UTC

Created attachment 941540 [details]
RHEL PATCH 1/6

Comment 7 Prarit Bhargava 2014-09-26 11:29:10 UTC

Created attachment 941541 [details]
RHEL PATCH 2/6

Comment 8 Prarit Bhargava 2014-09-26 11:29:11 UTC

Created attachment 941542 [details]
RHEL PATCH 3/6

Comment 9 Prarit Bhargava 2014-09-26 11:29:12 UTC

Created attachment 941543 [details]
RHEL PATCH 4/6

Comment 10 Prarit Bhargava 2014-09-26 11:29:13 UTC

Created attachment 941544 [details]
RHEL PATCH 5/6

Comment 11 Prarit Bhargava 2014-09-26 11:29:15 UTC

Created attachment 941545 [details]
RHEL PATCH 6/6

Comment 13 Prarit Bhargava 2014-10-01 11:45:57 UTC

Sorry everyone, I made the changes for RHEL7 first and accidentally used this BZ.  I'm going to clone this to RHEL7 and POST for RHEL7 from there.

P.

Comment 14 Prarit Bhargava 2014-10-01 11:48:42 UTC

Created attachment 943020 [details]
RHEL PATCH 1/2

Comment 15 Prarit Bhargava 2014-10-01 11:48:43 UTC

Created attachment 943021 [details]
RHEL PATCH 2/2

Comment 16 Prarit Bhargava 2014-10-01 11:48:45 UTC

Created attachment 943022 [details]
RHEL PATCH 3/2

Comment 17 Prarit Bhargava 2014-10-01 11:48:46 UTC

Created attachment 943023 [details]
RHEL PATCH 4/2

Comment 18 Prarit Bhargava 2014-10-01 11:48:47 UTC

Created attachment 943024 [details]
RHEL PATCH 5/2

Comment 19 Prarit Bhargava 2014-10-01 11:48:49 UTC

Created attachment 943025 [details]
RHEL PATCH 6/2

Comment 20 Prarit Bhargava 2014-12-18 19:39:25 UTC

Sorry everyone, I mucked up this BZ pretty badly and am cleaning it up.  I'll push 6.7 patches shortly.

P.

Comment 21 RHEL Program Management 2014-12-18 20:19:58 UTC

This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 22 Prarit Bhargava 2015-01-02 11:15:34 UTC

Created attachment 975178 [details]
RHEL PATCH 1/2

Comment 23 Prarit Bhargava 2015-01-02 11:15:36 UTC

Created attachment 975179 [details]
RHEL PATCH 2/2

Comment 25 Rafael Aquini 2015-02-17 19:07:39 UTC

Patch(es) available on kernel-2.6.32-532.el6

Comment 31 errata-xmlrpc 2015-07-22 08:21:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html
