Red Hat Bugzilla – Bug 1183773
clock_event_device:min_delta_ns can overflow and can never go down
Last modified: 2016-01-20 03:09:47 EST
Description of problem:
As can be seen in kernel/time/tick-oneshot.c:tick_dev_program_event(), clock_event_device:min_delta_ns, which represents the granularity of the clockevent timer increments, can grow until it overflows and can never be reduced.

One possible observable consequence is that, if it ever overflows, the loop in this function becomes endless, because

    expires = ktime_add_ns(now, dev->min_delta_ns);

yields either a negative expires or an expires earlier than now. Either case results in an error return from clockevents_program_event(), which causes the loop to start over.

Version-Release number of selected component (if applicable):
2.6.32-504.3.3.el6.x86_64

How reproducible:
The endless loop on one of the CPUs was seen once in a virtual machine in Parallels Server. The exact details are still being investigated.

Additional info:
These problems have been addressed by the following commits in the mainline kernel:

commit 80a05b9ffa7dc13f6693902dd8999a2b61a3a0d7
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Mar 12 17:34:14 2010 +0100

    clockevents: Sanitize min_delta_ns adjustment and prevent overflows

    The current logic which handles clock events programming failures can
    increase min_delta_ns unlimited and even can cause overflows.

    Sanitize it by:
     - prevent zero increase when min_delta_ns == 1
     - limiting min_delta_ns to a jiffie
     - bail out if the jiffie limit is hit
     - add retries stats for /proc/timer_list so we can gather data

    Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

commit d1748302f70be7469809809283fe164156a34231
Author: Martin Schwidefsky <schwidefsky@de.ibm.com>
Date:   Tue Aug 23 15:29:42 2011 +0200

    clockevents: Make minimum delay adjustments configurable

    The automatic increase of the min_delta_ns of a clockevents device
    should be done in the clockevents code as the minimum delay is an
    attribute of the clockevents device.

    In addition not all architectures want the automatic adjustment, on a
    massively virtualized system it can happen that the programming of a
    clock event fails several times in a row because the virtual cpu has
    been rescheduled quickly enough. In that case the minimum delay will
    erroneously be increased with no way back. The new config symbol
    GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic
    adjustment. The config option is selected only for x86.

    Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: john stultz <johnstul@us.ibm.com>
    Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.com
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
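To make the failure mode in the description concrete, here is a minimal user-space sketch in C. It is not the kernel code: the function names, the 50% growth step, the 5000 ns floor, and the HZ value of 1000 are all assumptions chosen for illustration. It contrasts an unbounded adjustment, which eventually wraps around, with a bounded one in the spirit of the first commit above (clamp to one jiffy, bail out once the limit is reached, avoid the zero increase at min_delta_ns == 1).

```c
#include <stdint.h>

/* Assumed values for illustration only; the real limit depends on the kernel's HZ. */
#define NSEC_PER_SEC    1000000000ULL
#define ASSUMED_HZ      1000ULL
#define MIN_DELTA_LIMIT (NSEC_PER_SEC / ASSUMED_HZ)   /* one jiffy, in ns */

/* Hypothetical model of the unbounded adjustment: grow by 50% per failure. */
uint64_t grow_unbounded(uint64_t min_delta_ns)
{
    if (min_delta_ns == 0)
        return 5000;
    /* For min_delta_ns == 1 the shift yields 0, so nothing is added. */
    return min_delta_ns + (min_delta_ns >> 1);   /* wraps modulo 2^64 */
}

/*
 * Count how many adjustments it takes until the value wraps around.
 * Returns -1 if no wrap is seen within max_steps (e.g. when stuck at 1).
 */
int steps_until_wrap(uint64_t start_ns, int max_steps)
{
    uint64_t cur = start_ns;

    for (int step = 1; step <= max_steps; step++) {
        uint64_t next = grow_unbounded(cur);

        if (next < cur)
            return step;
        cur = next;
    }
    return -1;
}

/*
 * An expiry time of now + delta is only representable if it fits into a
 * signed 64-bit nanosecond value. Assumes now_ns >= 0.
 */
int expiry_fits(int64_t now_ns, uint64_t delta_ns)
{
    return delta_ns <= (uint64_t)(INT64_MAX - now_ns);
}

/*
 * Hypothetical bounded adjustment: returns 0 after adjusting, or -1 when
 * the one-jiffy limit has already been reached and the caller should give up.
 */
int grow_bounded(uint64_t *min_delta_ns)
{
    if (*min_delta_ns >= MIN_DELTA_LIMIT)
        return -1;

    if (*min_delta_ns < 5000)
        *min_delta_ns = 5000;                 /* avoids the zero increase at 1 */
    else
        *min_delta_ns += *min_delta_ns >> 1;

    if (*min_delta_ns > MIN_DELTA_LIMIT)
        *min_delta_ns = MIN_DELTA_LIMIT;      /* clamp to one jiffy */

    return 0;
}
```

Calling steps_until_wrap(5000, 200) shows that the unbounded variant wraps after a finite number of failed programming attempts, while repeated calls to grow_bounded() settle at MIN_DELTA_LIMIT and then return -1, which gives a retry loop a way to terminate.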
Could you please make this bug public? Thanks!
Hi Roman, I agree that these changes need to be made. Do you have a reproducer that leads to a bug? P.
Unfortunately, no. We've seen several (very few) sporadic reproductions during routine automated testing of Parallels Server, but have been unable yet to identify the exact scenario of how min_delta_ns can grow up to those pathological values initially.
(In reply to Roman Kagan from comment #4)
> Unfortunately, no.
>
> We've seen several (very few) sporadic reproductions during routine
> automated testing of Parallels Server, but have been unable yet to identify
> the exact scenario of how min_delta_ns can grow up to those pathological
> values initially.

Okay, I've run this in our testing suite and don't see any issues so I'm going to do some additional testing over the weekend.

P.
Created attachment 984199 RHEL PATCH 1/2
Created attachment 984200 RHEL PATCH 2/2
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Patch(es) available on kernel-2.6.32-542.el6
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1272.html