Bug 1662602

Summary: 4.19.13-300.fc29.x86_64 regression
Product: [Fedora] Fedora Reporter: H.J. Lu <hongjiu.lu>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 29 CC: airlied, bskeggs, ewk, fweimer, hdegoede, ichavero, itamar, jarodwilson, jforbes, jglisse, john.j5live, jonathan, josef, kernel-maint, labbott, linville, mchehab, mjg59, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
URL: https://lkml.org/lkml/2018/12/30/169
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-09 20:20:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description H.J. Lu 2018-12-30 18:21:28 UTC
I got

FAIL: rt/tst-cputimer1
FAIL: rt/tst-cputimer2
FAIL: rt/tst-cputimer3

with glibc 2.29 tests under kernel 4.19.13-300.fc29.x86_64.  4.19.12-300.fc29.x86_64 is OK.

Comment 1 H.J. Lu 2018-12-30 18:31:00 UTC
Under 4.19.13, I got

futex(0x408228, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_timerid=0x2, si_overrun=0, si_value={int=1514201408, ptr=0x7ffc5a40e140}} ---
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1, tv_nsec=805172721}) = 0
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
futex(0x408228, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_3 {si_signo=SIGRT_3, si_code=SI_TIMER, si_timerid=0x3, si_overrun=0, si_value={int=163, ptr=0x33333333000000a3}} ---
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1, tv_nsec=905074445}) = 0
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
futex(0x408228, FUTEX_WAIT_PRIVATE, 0, NULL

It never returns.
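
For illustration (this is not the glibc test itself), the behaviour in the trace can be approximated with a small periodic CPU-time timer. The sketch below assumes a hypothetical helper name, count_cpu_timer_expirations, an arbitrary choice of SIGRTMIN as the notification signal, and caller-chosen interval and CPU budget values. It arms a periodic timer on CLOCK_PROCESS_CPUTIME_ID, burns CPU for the given budget, and reports how many expirations were delivered. On an affected kernel one would presumably see the count stop at 1, because the timer is not rearmed after the first signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Number of timer signals seen so far; only touched by the handler and reset before arming. */
static volatile sig_atomic_t expirations;

static void on_timer(int signo)
{
    (void)signo;
    expirations++;
}

/*
 * Hypothetical helper (not from the bug report): arm a periodic timer on
 * CLOCK_PROCESS_CPUTIME_ID, spin until budget_ms of process CPU time has
 * been consumed, and return the number of expirations delivered.
 * Returns -1 if the timer could not be set up.
 */
int count_cpu_timer_expirations(long interval_ms, long budget_ms)
{
    struct sigaction sa;
    struct sigevent sev;
    struct itimerspec its;
    struct timespec start, now;
    timer_t timer;
    long long budget_ns = (long long)budget_ms * 1000000LL;
    long long elapsed_ns;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, NULL) != 0)
        return -1;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGRTMIN;
    if (timer_create(CLOCK_PROCESS_CPUTIME_ID, &sev, &timer) != 0)
        return -1;

    /* First expiration after one interval, then periodic with the same interval. */
    memset(&its, 0, sizeof its);
    its.it_value.tv_sec = interval_ms / 1000;
    its.it_value.tv_nsec = (interval_ms % 1000) * 1000000L;
    its.it_interval = its.it_value;

    expirations = 0;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start) != 0 ||
        timer_settime(timer, 0, &its, NULL) != 0) {
        timer_delete(timer);
        return -1;
    }

    /* Busy loop: a CPU-time timer only advances while the process is running. */
    do {
        if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &now) != 0)
            break;
        elapsed_ns = (long long)(now.tv_sec - start.tv_sec) * 1000000000LL
                   + (now.tv_nsec - start.tv_nsec);
    } while (elapsed_ns < budget_ns);

    timer_delete(timer);
    return (int)expirations;
}
```

Calling count_cpu_timer_expirations(20, 300) from a test program should return well above 1 on a kernel where periodic CPU timers rearm correctly. With glibc older than 2.34, timer_create lives in librt, so the program needs to be linked with -lrt.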

Comment 2 H.J. Lu 2018-12-31 02:05:29 UTC
This is caused by

commit 82c8dbb376b9fa9b831c157cbb15664cb4a343e3
Author: Thomas Gleixner <tglx>
Date:   Mon Dec 17 13:31:05 2018 +0100

    posix-timers: Fix division by zero bug
    
    commit 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 upstream.
    
    The signal delivery path of posix-timers can try to rearm the timer even if
    the interval is zero. That's handled for the common case (hrtimer) but not
    for alarm timers. In that case the forwarding function raises a division by
    zero exception.
    
    The handling for hrtimer based posix timers is wrong because it marks the
    timer as active despite the fact that it is stopped.
    
    Move the check from common_hrtimer_rearm() to posixtimer_rearm() to cure
    both issues.

Comment 3 Hans de Goede 2018-12-31 10:55:57 UTC
Hi,

(In reply to H.J. Lu from comment #2)
> This is caused by
> 
> commit 82c8dbb376b9fa9b831c157cbb15664cb4a343e3
> Author: Thomas Gleixner <tglx>
> Date:   Mon Dec 17 13:31:05 2018 +0100
> 
>     posix-timers: Fix division by zero bug


Thank you for tracking this down. It is probably best to report this problem upstream directly, by sending a mail to Thomas with the relevant mailing lists in the Cc.

Regards,

Hans

Comment 4 H.J. Lu 2018-12-31 13:17:18 UTC
(In reply to Hans de Goede from comment #3)
> Thank you for tracking this down, it is probably best if you directly report
> this problem upstream, by sending a mail directly to Thomas with the
> relevant mailing lists in the Cc.

https://lkml.org/lkml/2018/12/30/169

Comment 5 Justin M. Forbes 2019-01-29 16:13:00 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 4.20.5-200.fc29.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 6 H.J. Lu 2019-01-29 16:32:51 UTC
Will be fixed in the next 4.19 kernel by:

Subject: Patch "posix-cpu-timers: Unbreak timer rearming" has been added to the 4.19-stable tree
To: 20190111133500.840117406, gregkh, hjl.tools, john.stultz, peterz, tglx
Cc: <stable-commits.org>
From: <gregkh>
Date: Mon, 28 Jan 2019 17:20:22 +0100
Message-ID: <1548692422159200>
MIME-Version: 1.0
Content-Type: text/plain; charset=ANSI_X3.4-1968
Content-Transfer-Encoding: 8bit
X-stable: commit
X-Patchwork-Hint: ignore


This is a note to let you know that I've just added the patch titled

    posix-cpu-timers: Unbreak timer rearming

to the 4.19-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     posix-cpu-timers-unbreak-timer-rearming.patch
and it can be found in the queue-4.19 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable.org> know about it.


From 93ad0fc088c5b4631f796c995bdd27a082ef33a6 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx>
Date: Fri, 11 Jan 2019 14:33:16 +0100
Subject: posix-cpu-timers: Unbreak timer rearming

From: Thomas Gleixner <tglx>

commit 93ad0fc088c5b4631f796c995bdd27a082ef33a6 upstream.

The recent commit which prevented a division by 0 issue in the alarm timer
code broke posix CPU timers as an unwanted side effect.

The reason is that the common rearm code checks for timer->it_interval
being 0 now. What went unnoticed is that the posix cpu timer setup does not
initialize timer->it_interval as it stores the interval in CPU timer
specific storage. The reason for the separate storage is historical as the
posix CPU timers always had a 64bit nanoseconds representation internally
while timer->it_interval is type ktime_t which used to be a modified
timespec representation on 32bit machines.

Instead of reverting the offending commit and fixing the alarmtimer issue
in the alarmtimer code, store the interval in timer->it_interval at CPU
timer setup time so the common code check works. This also repairs the
existing inconsistency of the posix CPU timer code which kept a single shot
timer armed despite the interval being 0.

The separate storage can be removed in mainline, but that needs to be a
separate commit as the current one has to be backported to stable kernels.

Fixes: 0e334db6bb4b ("posix-timers: Fix division by zero bug")
Reported-by: H.J. Lu <hjl.tools>
Signed-off-by: Thomas Gleixner <tglx>
Cc: John Stultz <john.stultz>
Cc: Peter Zijlstra <peterz>
Cc: stable.org
Link: https://lkml.kernel.org/r/20190111133500.840117406@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh>

---
 kernel/time/posix-cpu-timers.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -685,6 +685,7 @@ static int posix_cpu_timer_set(struct k_
 	 * set up the signal and overrun bookkeeping.
 	 */
 	timer->it.cpu.incr = timespec64_to_ns(&new->it_interval);
+	timer->it_interval = ns_to_ktime(timer->it.cpu.incr);
 
 	/*
 	 * This acts as a modification timestamp for the timer,


Patches currently in stable-queue which might be from tglx are

queue-4.19/x86-pkeys-properly-copy-pkey-state-at-fork.patch
queue-4.19/x86-selftests-pkeys-fork-to-check-for-state-being-preserved.patch
queue-4.19/x86-entry-64-compat-fix-stack-switching-for-xen-pv.patch
queue-4.19/net-sun-cassini-cleanup-license-conflict.patch
queue-4.19/kvm-nvmx-do-not-validate-that-posted_intr_desc_addr-is-page-aligned.patch
queue-4.19/x86-kaslr-fix-incorrect-i8254-outb-parameters.patch
queue-4.19/posix-cpu-timers-unbreak-timer-rearming.patch
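
The interaction described in the commit message above can be modelled in a few lines of user-space C. This is only a toy sketch: struct toy_timer, toy_cpu_timer_set and toy_common_rearm are hypothetical names invented for illustration, not kernel code, and plain 64-bit nanosecond counts stand in for both timer->it.cpu.incr and timer->it_interval. The with_fix flag mimics the one-line patch, which copies the CPU-timer interval into the field that the common rearm check reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the two interval fields discussed in the commit message. */
struct toy_timer {
    int64_t cpu_incr_ns;    /* models timer->it.cpu.incr (CPU timer specific storage) */
    int64_t it_interval_ns; /* models timer->it_interval (read by the common code) */
};

/*
 * Models CPU timer setup. Without the fix only the CPU-specific field is
 * written; with the fix the common field is populated as well.
 * The caller is assumed to pass a zero-initialised struct.
 */
void toy_cpu_timer_set(struct toy_timer *t, int64_t interval_ns, bool with_fix)
{
    t->cpu_incr_ns = interval_ns;
    if (with_fix)
        t->it_interval_ns = t->cpu_incr_ns;
}

/* Models the common rearm check: a zero interval means "do not rearm". */
bool toy_common_rearm(const struct toy_timer *t)
{
    return t->it_interval_ns != 0;
}
```

In this model a periodic timer set up without the fix is never rearmed, because the common check sees a zero interval even though the CPU-specific field is non-zero. With the fix, periodic timers are rearmed and single-shot timers (interval 0) are correctly left stopped.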

Comment 7 Justin M. Forbes 2019-01-29 19:44:13 UTC
This is also queued for 4.20.6

Comment 8 Florian Weimer 2019-02-06 11:50:08 UTC
I can confirm that this is fixed in kernel-4.20.6-200.fc29.x86_64.

Comment 9 Laura Abbott 2019-04-09 20:20:38 UTC
Thanks for letting us know