Bug 1183773

Summary: clock_event_device:min_delta_ns can overflow and can never go down
Product: Red Hat Enterprise Linux 6 Reporter: Roman Kagan <rvkagan>
Component: kernel Assignee: Prarit Bhargava <prarit>
kernel sub component: Virtualization QA Contact: Cui Chun <ccui>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: ccui, fgarciad, vvs
Version: 6.6   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-2.6.32-542.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-07-22 08:38:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1300182    
Attachments:
Description Flags
RHEL PATCH 1/2 none
RHEL PATCH 2/2 none

Description Roman Kagan 2015-01-19 18:39:33 UTC
Description of problem:

As can be seen in kernel/time/tick-oneshot.c:tick_dev_program_event(), clock_event_device:min_delta_ns, which represents the granularity of the clockevent timer increments, can grow until it overflows and is never reduced.

One observable consequence is that, if min_delta_ns ever overflows, the loop in this function never terminates, because

      expires = ktime_add_ns(now, dev->min_delta_ns);

yields either a negative expires or an expires earlier than now; either case results in an error return from clockevents_program_event(), which causes the loop to start over.
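
The failure mode can be illustrated with a small userspace model. This is only a sketch, not the kernel code: grow_min_delta_old() and program_event_model() are hypothetical names, min_delta_ns is assumed to be 64 bits wide, and the growth step (adding half the current value) is an assumption consistent with the mainline commit's remark about a zero increase at min_delta_ns == 1.

```c
#include <stdint.h>

/*
 * Hypothetical model of the pre-fix growth step (assumption: the value
 * grows by half of itself). Unsigned arithmetic wraps silently, and
 * nothing ever lowers the value again.
 */
static uint64_t grow_min_delta_old(uint64_t min_delta_ns)
{
    return min_delta_ns + (min_delta_ns >> 1);
}

/*
 * Hypothetical model of the two checks described in this report:
 * returns 0 when the event could be programmed, -1 when expires is
 * negative or not later than now.
 */
static int program_event_model(int64_t now_ns, uint64_t min_delta_ns)
{
    /*
     * Add in unsigned arithmetic to avoid signed-overflow undefined
     * behaviour; the conversion back to int64_t is assumed to wrap
     * (two's complement), as it does with GCC.
     */
    int64_t expires = (int64_t)((uint64_t)now_ns + min_delta_ns);

    if (expires < 0)
        return -1;              /* negative expiry time */
    if (expires - now_ns <= 0)
        return -1;              /* expiry not in the future */
    return 0;
}
```

In this model, once min_delta_ns is pathologically large, every retry fails the same way: a delta near INT64_MAX gives a negative expires, and a delta near UINT64_MAX wraps to an expires earlier than now. Because min_delta_ns is never reduced, the retry loop has no way out.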


Version-Release number of selected component (if applicable):

2.6.32-504.3.3.el6.x86_64


How reproducible:

The endless loop on one of the CPUs was seen once in a virtual machine in Parallels Server.  The exact details are still being investigated.


Additional info:

These problems have been addressed by the following commits in the mainline kernel:

commit 80a05b9ffa7dc13f6693902dd8999a2b61a3a0d7
Author: Thomas Gleixner <tglx>
Date:   Fri Mar 12 17:34:14 2010 +0100

    clockevents: Sanitize min_delta_ns adjustment and prevent overflows
    
    The current logic which handles clock events programming failures can
    increase min_delta_ns unlimited and even can cause overflows.
    
    Sanitize it by:
     - prevent zero increase when min_delta_ns == 1
     - limiting min_delta_ns to a jiffie
     - bail out if the jiffie limit is hit
     - add retries stats for /proc/timer_list so we can gather data
    
    Reported-by: Uwe Kleine-Koenig <u.kleine-koenig>
    Signed-off-by: Thomas Gleixner <tglx>
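
The sanitizing steps listed in that commit message can be sketched roughly as follows. This is a simplified userspace model, not the upstream patch: increase_min_delta_capped() is a hypothetical name, the 5000 ns floor and HZ value of 1000 are assumptions chosen for illustration, and the retries statistic is left out.

```c
#include <stdint.h>

#define NSEC_PER_SEC_MODEL 1000000000ULL
#define HZ_MODEL           1000ULL   /* assumed tick rate */
/* One jiffy in nanoseconds: the upper bound for min_delta_ns. */
#define MIN_DELTA_LIMIT_NS (NSEC_PER_SEC_MODEL / HZ_MODEL)
#define MIN_DELTA_FLOOR_NS 5000ULL   /* assumed floor, for illustration */

/*
 * Hypothetical capped adjustment. Returns 0 if min_delta_ns was
 * increased, -1 if the jiffy limit had already been reached, so the
 * caller can bail out instead of retrying forever.
 */
static int increase_min_delta_capped(uint64_t *min_delta_ns)
{
    /* Bail out if the jiffy limit is hit. */
    if (*min_delta_ns >= MIN_DELTA_LIMIT_NS)
        return -1;

    /* Avoid the zero increase that x + (x >> 1) gives for x == 1. */
    if (*min_delta_ns < MIN_DELTA_FLOOR_NS)
        *min_delta_ns = MIN_DELTA_FLOOR_NS;
    else
        *min_delta_ns += *min_delta_ns >> 1;

    /* Limit min_delta_ns to one jiffy. */
    if (*min_delta_ns > MIN_DELTA_LIMIT_NS)
        *min_delta_ns = MIN_DELTA_LIMIT_NS;

    return 0;
}
```

Because the value is clamped to one jiffy and the function reports failure once the clamp is reached, a caller that stops retrying on the error return can no longer loop indefinitely or overflow the field.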


commit d1748302f70be7469809809283fe164156a34231
Author: Martin Schwidefsky <schwidefsky.com>
Date:   Tue Aug 23 15:29:42 2011 +0200

    clockevents: Make minimum delay adjustments configurable
    
    The automatic increase of the min_delta_ns of a clockevents device
    should be done in the clockevents code as the minimum delay is an
    attribute of the clockevents device.
    
    In addition not all architectures want the automatic adjustment, on a
    massively virtualized system it can happen that the programming of a
    clock event fails several times in a row because the virtual cpu has
    been rescheduled quickly enough. In that case the minimum delay will
    erroneously be increased with no way back. The new config symbol
    GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic
    adjustment. The config option is selected only for x86.
    
    Signed-off-by: Martin Schwidefsky <schwidefsky.com>
    Cc: john stultz <johnstul.com>
    Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.com
    Signed-off-by: Thomas Gleixner <tglx>
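
The effect of making the adjustment optional can be modelled as below. This is a hedged sketch only: the real switch is the compile-time symbol GENERIC_CLOCKEVENTS_MIN_ADJUST, whereas this model uses a runtime flag so both behaviours can be compared side by side. struct ce_model, try_program(), program_min_delta(), the 5000 ns floor, and the one-jiffy limit at HZ = 1000 are all assumptions for illustration, as is the single-attempt behaviour when adjustment is disabled.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_DELTA_FLOOR_NS 5000ULL      /* assumed floor */
#define MIN_DELTA_LIMIT_NS 1000000ULL   /* one jiffy at an assumed HZ of 1000 */

/* Hypothetical stand-in for a clockevent device. */
struct ce_model {
    uint64_t min_delta_ns;
    int failures_left;   /* programming attempts that will still fail */
};

/* Simulates programming the hardware: fails while failures_left > 0. */
static int try_program(struct ce_model *dev)
{
    if (dev->failures_left > 0) {
        dev->failures_left--;
        return -1;
    }
    return 0;
}

/*
 * With min_adjust == false, a failure is simply reported and
 * min_delta_ns is left untouched. With min_adjust == true, every
 * third consecutive failure raises min_delta_ns, capped at one jiffy.
 */
static int program_min_delta(struct ce_model *dev, bool min_adjust)
{
    int i = 0;

    if (!min_adjust)
        return try_program(dev);

    for (;;) {
        if (try_program(dev) == 0)
            return 0;
        if (++i > 2) {
            if (dev->min_delta_ns >= MIN_DELTA_LIMIT_NS)
                return -1;
            if (dev->min_delta_ns < MIN_DELTA_FLOOR_NS)
                dev->min_delta_ns = MIN_DELTA_FLOOR_NS;
            else
                dev->min_delta_ns += dev->min_delta_ns >> 1;
            if (dev->min_delta_ns > MIN_DELTA_LIMIT_NS)
                dev->min_delta_ns = MIN_DELTA_LIMIT_NS;
            i = 0;
        }
    }
}
```

In this model, a burst of transient failures (for example, a virtual CPU being descheduled between reading the clock and programming the timer) permanently raises min_delta_ns only when the adjustment is enabled; with it disabled, the device keeps its original granularity.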

Comment 1 Roman Kagan 2015-01-19 18:43:02 UTC
Could you please make this bug public? Thanks!

Comment 3 Prarit Bhargava 2015-01-21 13:05:40 UTC
Hi Roman, I agree that these changes need to be made.  Do you have a reproducer that leads to a bug?

P.

Comment 4 Roman Kagan 2015-01-21 13:37:46 UTC
Unfortunately, no.

We've seen several (very few) sporadic reproductions during routine automated testing of Parallels Server, but have been unable yet to identify the exact scenario of how min_delta_ns can grow up to those pathological values initially.

Comment 5 Prarit Bhargava 2015-01-23 12:41:40 UTC
(In reply to Roman Kagan from comment #4)
> Unfortunately, no.
> 
> We've seen several (very few) sporadic reproductions during routine
> automated testing of Parallels Server, but have been unable yet to identify
> the exact scenario of how min_delta_ns can grow up to those pathological
> values initially.

Okay, I've run this in our testing suite and don't see any issues so I'm going to do some additional testing over the weekend.

P.

Comment 6 Prarit Bhargava 2015-01-26 12:23:29 UTC
Created attachment 984199 [details]
RHEL PATCH 1/2

Comment 7 Prarit Bhargava 2015-01-26 12:23:30 UTC
Created attachment 984200 [details]
RHEL PATCH 2/2

Comment 9 RHEL Program Management 2015-01-26 12:29:54 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 10 Rafael Aquini 2015-03-07 05:37:37 UTC
Patch(es) available on kernel-2.6.32-542.el6

Comment 16 errata-xmlrpc 2015-07-22 08:38:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html

Comment 17 Dr. David Alan Gilbert 2018-11-28 15:12:05 UTC
*** Bug 1538078 has been marked as a duplicate of this bug. ***