Bug 1490673

Summary: Kernel panic always happens immediately whenever "debug.panic_on_rcu_stall=1" is set on RHEL7.4
Product: Red Hat Enterprise Linux 7 Reporter: Yasuhiro Ozone <yozone>
Component: kernel Assignee: Red Hat Kernel Manager <kernel-mgr>
kernel sub component: kexec - kdump QA Contact: Qiao Zhao <qzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: haruo.tomita, qzhao, ruyang, suse9linux
Version: 7.4   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-3.10.0-720.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Kernel-3.10.0-693.el7 on RHEL7.4(GA) or later
Last Closed: 2018-04-10 22:02:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vmcore-dmesg.txt none

Description Yasuhiro Ozone 2017-09-12 01:56:39 UTC
Description of problem:
A kernel panic always happens immediately whenever "debug.panic_on_rcu_stall=1" is set on RHEL7.4.

Version-Release number of selected component (if applicable):

Kernel-3.10.0-693.el7 on RHEL7.4(GA) or later

How reproducible:

100%

Steps to Reproduce:
1.
# uname -r
3.10.0-693.1.1.el7.x86_64

2.
# cat /proc/sys/debug/panic_on_rcu_stall 
0

3.
# echo 1 > /proc/sys/debug/panic_on_rcu_stall
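
Before running step 3 on a machine that matters, the current value of the knob can be read without writing to it. The following is a minimal sketch of such a read-only check; the helper name read_int_file is hypothetical, and the only path taken from this report is /proc/sys/debug/panic_on_rcu_stall.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of any kernel or libc API): read a single
 * integer from a procfs-style text file such as
 * /proc/sys/debug/panic_on_rcu_stall.
 *
 * Returns 0 on success and stores the number in *value.
 * Returns -1 if the file cannot be opened or does not start with an integer.
 */
int read_int_file(const char *path, int *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;              /* knob missing, e.g. kernel without the sysctl */

    int parsed = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return parsed ? 0 : -1;
}
```

Calling read_int_file("/proc/sys/debug/panic_on_rcu_stall", &v) only reads the file, so it does not trigger the panic described in this report; a return value of -1 suggests the running kernel does not expose the knob.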

Actual results:

A kernel panic always happens immediately after step 3.

Expected results:

When set to 1, the kernel calls panic() after the RCU stall detection messages, that is, only once a stall has actually been detected.

When 

Additional info:

Comment 2 Yasuhiro Ozone 2017-09-12 02:03:16 UTC
Created attachment 1324651 [details]
vmcore-dmesg.txt

Comment 6 Pratyush Anand 2017-09-14 08:37:34 UTC
It looks like RHEL commit 6cbfcf1526c74681ac81cda667f6f89016b47b8f backported the upstream commit incorrectly. We need the following diff on top of the current RHEL7 code to backport upstream commit 088e9d253d3a4ab7e058dd84bb532c32dadf1882 correctly.

-------------------><--------------
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index f46e3e44c0af..981fb8baf895 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1000,6 +1000,8 @@ static void print_cpu_stall(struct rcu_state *rsp)
                                     3 * rcu_jiffies_till_stall_check() + 3;
        raw_spin_unlock_irqrestore(&rnp->lock, flags);
 
+       panic_on_rcu_stall();
+
        set_need_resched();  /* kick ourselves to get things going. */
 }
 
@@ -1016,8 +1018,6 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
                return;
        j = ACCESS_ONCE(jiffies);
 
-       panic_on_rcu_stall();
-
        /*
         * Lots of memory barriers to reject false positives.
         *
-------------------><--------------

I will prepare a brew build with the above diff; I hope that resolves it.
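
The effect of the diff in this comment can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: only the names panic_on_rcu_stall, check_cpu_stall and print_cpu_stall come from the diff, while would_panic, the boolean parameters and the global variables are hypothetical. The model also ignores the early return that precedes the removed call in check_cpu_stall(), and it collapses the "memory barriers to reject false positives" logic into a single stalled flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Models the debug.panic_on_rcu_stall sysctl (hypothetical variable name). */
static int sysctl_panic_on_rcu_stall;
/* Records that panic() would have been called (hypothetical variable). */
static bool panicked;

/* Stand-in for the kernel helper: panics only when the sysctl is set. */
static void panic_on_rcu_stall(void)
{
    if (sysctl_panic_on_rcu_stall)
        panicked = true;
}

/* Stand-in for print_cpu_stall(): reached only after a stall is confirmed. */
static void print_cpu_stall(bool fixed)
{
    puts("model: RCU stall detection messages");
    if (fixed)
        panic_on_rcu_stall();   /* placement added by the diff */
}

/* Stand-in for check_cpu_stall(): runs on routine checks, stall or not. */
static void check_cpu_stall(bool stalled, bool fixed)
{
    if (!fixed)
        panic_on_rcu_stall();   /* placement removed by the diff */

    if (!stalled)
        return;                 /* no stall: nothing to report */

    print_cpu_stall(fixed);
}

/* Returns true if the modelled kernel would panic in this scenario. */
bool would_panic(bool fixed, int sysctl, bool stalled)
{
    assert(sysctl == 0 || sysctl == 1);

    sysctl_panic_on_rcu_stall = sysctl;
    panicked = false;
    check_cpu_stall(stalled, fixed);
    return panicked;
}
```

In this model, would_panic(false, 1, false) is true, which matches the reported behaviour of a panic with no stall present, while would_panic(true, 1, false) is false and would_panic(true, 1, true) is true, which matches the expected behaviour once the call sits in print_cpu_stall().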

Comment 7 Yasuhiro Ozone 2017-09-22 02:25:31 UTC
Hi all, this is a private account test. Please ignore it.

Comment 8 Rafael Aquini 2017-09-30 11:05:08 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 10 Rafael Aquini 2017-10-02 14:22:23 UTC
Patch(es) available on kernel-3.10.0-720.el7

Comment 14 errata-xmlrpc 2018-04-10 22:02:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062