Bug 1389215

Summary: RT kernel panics with dm-multipath enabled
Product: Red Hat Enterprise Linux 7
Reporter: Martin Polednik <mpoledni>
Component: kernel-rt
Assignee: Luis Claudio R. Goncalves <lgoncalv>
kernel-rt sub component: Multipath
QA Contact: Lin Li <lilin>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: high
CC: agk, bhu, bmarzins, jkastner, lgoncalv, mpoledni, msnitzer, nsoffer, williams, xni, yizhan
Version: 7.4
Keywords: ZStream
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1400305 1400930
Environment:
Last Closed: 2017-08-01 18:57:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1353018, 1400305, 1400930
Attachments:
Description                 Flags
boot messages up to manic   none
multipath configuration     none

Description Martin Polednik 2016-10-27 08:24:40 UTC
Created attachment 1214505 [details]
boot messages up to manic

Description of problem:
The realtime version of the kernel seems to panic right when loading the dm-multipath module. I've been able to reproduce this issue on 2 different machines (HP DL160g6 & a custom-built workstation). It seems to fail even with an empty multipath.conf file.

Version-Release number of selected component (if applicable):
[root@dev-16 ~]# rpm -qa | grep multipath
device-mapper-multipath-0.4.9-99.el7.x86_64
device-mapper-multipath-libs-0.4.9-99.el7.x86_64
[root@dev-16 ~]# rpm -qa | grep kernel
kernel-rt-kvm-3.10.0-514.rt56.420.el7.x86_64
kernel-rt-3.10.0-514.rt56.420.el7.x86_64

How reproducible:
100 %

Steps to Reproduce:
1. Install kernel-rt & device-mapper-multipath.
2. Try to boot with the realtime kernel.

Actual results:
Kernel panics.

Expected results:
Kernel boots.

Additional info:
Can be worked around by not loading dm-multipath:

$ cat /etc/modprobe.d/disable-dm-multipath.conf
install dm-multipath /bin/false

Comment 1 Martin Polednik 2016-10-27 08:25:29 UTC
Created attachment 1214506 [details]
multipath configuration

Comment 3 Ben Marzinski 2016-11-15 19:11:29 UTC
Mike, have you seen anything like this before?

Just so you know, an empty multipath.conf file doesn't keep multipath from running. It just uses the default configuration. Do you know what devices are multipathed when this happens?

Comment 4 Mike Snitzer 2016-11-30 19:51:23 UTC
(In reply to Ben Marzinski from comment #3)
> Mike, have you seen anything like this before?
> 
> Just so you know, an empty multipath.conf file doesn't keep multipath from
> running. It just uses the default configuration. Do you know what devices
> are multipathed when this happens?

Not sure, I'd like to see the BUG_ON at drivers/md/dm-rq.c:797 for this rt kernel source.

Does anyone have the git url for the rt kernel in question?

Comment 5 Mike Snitzer 2016-11-30 20:01:52 UTC
(In reply to Mike Snitzer from comment #4)
> (In reply to Ben Marzinski from comment #3)
> > Mike, have you seen anything like this before?
> > 
> > Just so you know, an empty multipath.conf file doesn't keep multipath from
> > running. It just uses the default configuration. Do you know what devices
> > are multipathed when this happens?
> 
> Not sure, I'd like to see the BUG_ON at drivers/md/dm-rq.c:797 for this rt
> kernel source.
> 
> Does anyone have the git url for the rt kernel in question?

Based on the rt kernel's uname, I can just use the 514.el7 source.

So the BUG_ON at drivers/md/dm-rq.c:797 is:

BUG_ON(!irqs_disabled()) in the function dm_old_request_fn().

This speaks to some inherent difference in the rt kernel that breaks the assumption that irqs are enabled inside the block core's old .request_fn() callout.  Need to look closer at this.

But one potential workaround for this issue is to set the following on the kernel command line (this will enable dm multipath to use blk-mq): dm_mod.use_blk_mq=Y

This will completely avoid using dm_old_request_fn().

Comment 6 Mike Snitzer 2016-11-30 20:04:27 UTC
(In reply to Mike Snitzer from comment #5)

> Based on the rt kernel's uname, I can just use the 514.el7 source.
> 
> So the BUG_ON at drivers/md/dm-rq.c:797 is:
> 
> BUG_ON(!irqs_disabled()) in the function dm_old_request_fn().
> 
> This speaks to some inherent difference in the rt kernel that breaks the
> assumption that irqs are enabled inside the block core's old .request_fn()
> callout.  Need to look closer at this.

I meant to say: "... that breaks the assumption that irqs are _disabled_ ..."

Comment 7 Mike Snitzer 2016-11-30 20:08:42 UTC
This is the upstream commit that introduced the BUG_ON in question:
052189a2ec ("dm: remove superfluous irq disablement in dm_request_fn")

Comment 8 Clark Williams 2016-11-30 20:27:52 UTC
(In reply to Mike Snitzer from comment #4)
> (In reply to Ben Marzinski from comment #3)
> > Mike, have you seen anything like this before?
> > 
> > Just so you know, an empty multipath.conf file doesn't keep multipath from
> > running. It just uses the default configuration. Do you know what devices
> > are multipathed when this happens?
> 
> Not sure, I'd like to see the BUG_ON at drivers/md/dm-rq.c:797 for this rt
> kernel source.
> 
> Does anyone have the git url for the rt kernel in question?

Mike, you can get the source for the RT kernel listed above at:

    git://git.app.eng.bos.redhat.com/kernel-rt.git

and check out branch rhel-7.3/rt-master

Comment 9 Clark Williams 2016-11-30 20:29:29 UTC
(In reply to Mike Snitzer from comment #6)
> (In reply to Mike Snitzer from comment #5)
> 
> > Based on the rt kernel's uname, I can just use the 514.el7 source.
> > 
> > So the BUG_ON at drivers/md/dm-rq.c:797 is:
> > 
> > BUG_ON(!irqs_disabled()) in the function dm_old_request_fn().
> > 
> > This speaks to some inherent difference in the rt kernel that breaks the
> > assumption that irqs are enabled inside the block core's old .request_fn()
> > callout.  Need to look closer at this.
> 
> I meant to say: "... that breaks the assumption that irqs are _disabled_ ..."

I'll need to look at the code, but if you're depending on the side-effect of a spinlock disabling irqs, then yes that's changed in the RT kernel. Holding a spinlock_t does *not* mean that interrupts are disabled.
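
The locking difference described in this comment can be sketched as a small userspace model. This is only an illustration, not kernel code: struct model_cpu, model_spin_lock_irq(), model_spin_unlock_irq() and model_check_would_fire() are hypothetical names, and the model assumes that the only relevant difference is whether taking the lock also turns interrupts off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct model_cpu {
	bool irqs_off;   /* stands in for irqs_disabled() */
	bool lock_held;  /* stands in for the queue lock being held */
};

/*
 * Model of taking a spinlock_t with interrupt disabling requested.
 * Stock kernel: interrupts are turned off as a side effect.
 * RT kernel: the lock is a sleeping lock, so interrupts stay enabled.
 */
static void model_spin_lock_irq(struct model_cpu *cpu, bool rt_kernel)
{
	assert(!cpu->lock_held);
	cpu->lock_held = true;
	if (!rt_kernel)
		cpu->irqs_off = true;
}

static void model_spin_unlock_irq(struct model_cpu *cpu, bool rt_kernel)
{
	assert(cpu->lock_held);
	cpu->lock_held = false;
	if (!rt_kernel)
		cpu->irqs_off = false;
}

/*
 * Returns true if a BUG_ON(!irqs_disabled()) style check, made while
 * the lock is held, would fire.
 */
static bool model_check_would_fire(bool rt_kernel)
{
	struct model_cpu cpu = { .irqs_off = false, .lock_held = false };
	bool fires;

	model_spin_lock_irq(&cpu, rt_kernel);
	fires = !cpu.irqs_off;
	model_spin_unlock_irq(&cpu, rt_kernel);
	return fires;
}
```

In this model, model_check_would_fire(false) is false for the stock kernel and model_check_would_fire(true) is true for the RT kernel, which matches the panic seen at the BUG_ON in dm_old_request_fn().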

Comment 10 Clark Williams 2016-11-30 20:46:35 UTC
Sigh. 

We're missing a patch from the upstream RT patchset:

commit fef016306919048a4db85c43aad09593efb3af4b
Author: Thomas Gleixner <tglx>
Date:   Mon Nov 14 23:06:09 2011 +0100

    dm: Make rt aware
    
    Use the BUG_ON_NORT variant for the irq_disabled() checks. RT has
    interrupts legitimately enabled here as we cant deadlock against the
    irq thread due to the "sleeping spinlocks" conversion.
    
    Reported-by: Luis Claudio R. Goncalves <lclaudio>
    
    Signed-off-by: Thomas Gleixner <tglx>

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1ca7463e8bb2..ce512e614ef4 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -802,7 +802,7 @@ static void dm_old_request_fn(struct request_queue *q)
                /* Establish tio->ti before queuing work (map_tio_request) */
                tio->ti = ti;
                queue_kthread_work(&md->kworker, &tio->work);
-               BUG_ON(!irqs_disabled());
+               BUG_ON_NONRT(!irqs_disabled());
        }
 }

I think we'll have to put this into 7.4 as well as do a z-stream update.
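
The effect of the one-line change in the patch above can be sketched in userspace as follows. This assumes BUG_ON_NONRT() behaves as its name and the commit message suggest: an ordinary BUG_ON() on a non-RT build and a no-op on an RT build. All MODEL_* and model_* names are hypothetical, and the simulated BUG only increments a counter instead of halting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in: counts simulated BUG() hits. */
static int model_bug_hits;

#define MODEL_BUG_ON(cond) \
	do { if (cond) model_bug_hits++; } while (0)

/* Stock build: the _NONRT variant is an ordinary BUG_ON(). */
#define MODEL_BUG_ON_NONRT_STOCK(cond)	MODEL_BUG_ON(cond)

/* RT build: the _NONRT variant compiles away; cond is never evaluated. */
#define MODEL_BUG_ON_NONRT_RT(cond)	do { } while (0)

/* Number of simulated BUG hits at the end of a request function, stock build. */
static int model_hits_stock(bool irqs_disabled)
{
	model_bug_hits = 0;
	MODEL_BUG_ON_NONRT_STOCK(!irqs_disabled);
	return model_bug_hits;
}

/* Same check on an RT build: it can never record a hit. */
static int model_hits_rt(bool irqs_disabled)
{
	model_bug_hits = 0;
	(void)irqs_disabled;	/* unused because the macro expands to nothing */
	MODEL_BUG_ON_NONRT_RT(!irqs_disabled);
	assert(model_bug_hits == 0);
	return model_bug_hits;
}
```

With interrupts enabled, model_hits_stock(false) records one hit while model_hits_rt(false) records none, so the check keeps its value on non-RT kernels and no longer trips on RT.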

Comment 22 errata-xmlrpc 2017-08-01 18:57:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2077
