Bug 1389215 - RT kernel panics with dm-multipath enabled
Summary: RT kernel panics with dm-multipath enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Luis Claudio R. Goncalves
QA Contact: Lin Li
URL:
Whiteboard:
Depends On:
Blocks: 1353018 1400305 1400930
 
Reported: 2016-10-27 08:24 UTC by Martin Polednik
Modified: 2017-08-02 00:23 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1400305 1400930
Environment:
Last Closed: 2017-08-01 18:57:57 UTC


Attachments
boot messages up to panic (9.52 KB, text/plain)
2016-10-27 08:24 UTC, Martin Polednik
multipath configuration (1015 bytes, text/plain)
2016-10-27 08:25 UTC, Martin Polednik


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2077 normal SHIPPED_LIVE Important: kernel-rt security, bug fix, and enhancement update 2017-08-01 18:13:37 UTC
Red Hat Bugzilla 1390920 None None None Never

Internal Links: 1390920

Description Martin Polednik 2016-10-27 08:24:40 UTC
Created attachment 1214505
boot messages up to panic

Description of problem:
The realtime version of the kernel seems to panic as soon as the dm-multipath module is loaded. I've been able to reproduce this issue on 2 different machines (HP DL160g6 & custom-built workstation). It seems to fail even with an empty multipath.conf file.

Version-Release number of selected component (if applicable):
[root@dev-16 ~]# rpm -qa | grep multipath
device-mapper-multipath-0.4.9-99.el7.x86_64
device-mapper-multipath-libs-0.4.9-99.el7.x86_64
[root@dev-16 ~]# rpm -qa | grep kernel
kernel-rt-kvm-3.10.0-514.rt56.420.el7.x86_64
kernel-rt-3.10.0-514.rt56.420.el7.x86_64

How reproducible:
100 %

Steps to Reproduce:
1. Install kernel-rt and device-mapper-multipath.
2. Try to boot with the realtime kernel.

Actual results:
Kernel panics.

Expected results:
Kernel boots.

Additional info:
Can be worked around by not loading dm-multipath:

$ cat /etc/modprobe.d/disable-dm-multipath.conf
install dm-multipath /bin/false

Comment 1 Martin Polednik 2016-10-27 08:25:29 UTC
Created attachment 1214506
multipath configuration

Comment 3 Ben Marzinski 2016-11-15 19:11:29 UTC
Mike, have you seen anything like this before?

Just so you know, an empty multipath.conf file doesn't keep multipath from running. It just uses the default configuration. Do you know what devices are multipathed when this happens?

Comment 4 Mike Snitzer 2016-11-30 19:51:23 UTC
(In reply to Ben Marzinski from comment #3)
> Mike, have you seen anything like this before?
> 
> Just so you know, an empty multipath.conf file doesn't keep multipath from
> running. It just uses the default configuration. Do you know what devices
> are multipathed when this happens?

Not sure, I'd like to see the BUG_ON at drivers/md/dm-rq.c:797 for this rt kernel source.

Does anyone have the git url for the rt kernel in question?

Comment 5 Mike Snitzer 2016-11-30 20:01:52 UTC
(In reply to Mike Snitzer from comment #4)
> (In reply to Ben Marzinski from comment #3)
> > Mike, have you seen anything like this before?
> > 
> > Just so you know, an empty multipath.conf file doesn't keep multipath from
> > running. It just uses the default configuration. Do you know what devices
> > are multipathed when this happens?
> 
> Not sure, I'd like to see the BUG_ON at drivers/md/dm-rq.c:797 for this rt
> kernel source.
> 
> Does anyone have the git url for the rt kernel in question?

Based on the rt kernel's uname, I can just use the 514.el7 source.

So the BUG_ON at drivers/md/dm-rq.c:797 is:

BUG_ON(!irqs_disabled()) in the function dm_old_request_fn().

This speaks to some inherent difference in the rt kernel that breaks the assumption that irqs are enabled inside the block core's old .request_fn() callout.  Need to look closer at this.

One potential workaround for this issue is to set the following on the kernel command line, which enables dm multipath to use blk-mq: dm_mod.use_blk_mq=Y

This will completely avoid using dm_old_request_fn().

Comment 6 Mike Snitzer 2016-11-30 20:04:27 UTC
(In reply to Mike Snitzer from comment #5)

> Based on the rt kernel's uname, I can just use the 514.el7 source.
> 
> So the BUG_ON at drivers/md/dm-rq.c:797 is:
> 
> BUG_ON(!irqs_disabled()) in the function dm_old_request_fn().
> 
> This speaks to some inherent difference in the rt kernel that breaks the
> assumption that irqs are enabled inside the block core's old .request_fn()
> callout.  Need to look closer at this.

I meant to say: "... that breaks the assumption that irqs are _disabled_ ..."

Comment 7 Mike Snitzer 2016-11-30 20:08:42 UTC
This is the upstream commit that introduced the BUG_ON in question:
052189a2ec ("dm: remove superfluous irq disablement in dm_request_fn")

Comment 8 Clark Williams 2016-11-30 20:27:52 UTC
(In reply to Mike Snitzer from comment #4)
> (In reply to Ben Marzinski from comment #3)
> > Mike, have you seen anything like this before?
> > 
> > Just so you know, an empty multipath.conf file doesn't keep multipath from
> > running. It just uses the default configuration. Do you know what devices
> > are multipathed when this happens?
> 
> Not sure, I'd like to see the BUG_ON at drivers/md/dm-rq.c:797 for this rt
> kernel source.
> 
> Does anyone have the git url for the rt kernel in question?

Mike, you can get the source to the RT listed above at:

    git://git.app.eng.bos.redhat.com/kernel-rt.git

and check out the branch rhel-7.3/rt-master

Comment 9 Clark Williams 2016-11-30 20:29:29 UTC
(In reply to Mike Snitzer from comment #6)
> (In reply to Mike Snitzer from comment #5)
> 
> > Based on the rt kernel's uname, I can just use the 514.el7 source.
> > 
> > So the BUG_ON at drivers/md/dm-rq.c:797 is:
> > 
> > BUG_ON(!irqs_disabled()) in the function dm_old_request_fn().
> > 
> > This speaks to some inherent difference in the rt kernel that breaks the
> > assumption that irqs are enabled inside the block core's old .request_fn()
> > callout.  Need to look closer at this.
> 
> I meant to say: "... that breaks the assumption that irqs are _disabled_ ..."

I'll need to look at the code, but if you're depending on the side-effect of a spinlock disabling irqs, then yes that's changed in the RT kernel. Holding a spinlock_t does *not* mean that interrupts are disabled.
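
To make the difference concrete, here is a minimal userspace sketch in C of the behaviour described in this comment. It is a model, not kernel code: struct sketch_kernel and the sketch_* helpers are hypothetical names, and the only assumption encoded is the one stated above, namely that on RT taking a spinlock_t with the irq-disabling lock variant leaves interrupts enabled.

```c
#include <stdbool.h>

/* Hypothetical model of the two kernel flavours; not a kernel API. */
struct sketch_kernel {
    bool preempt_rt; /* true when modelling the RT kernel */
    bool irqs_off;   /* models what irqs_disabled() would report */
};

/*
 * Models taking a spinlock_t with the irq-disabling variant.
 * Stock kernel: interrupts end up disabled as a side effect.
 * RT kernel: the lock is a sleeping lock and interrupts stay enabled.
 */
static void sketch_spin_lock_irq(struct sketch_kernel *k)
{
    if (!k->preempt_rt)
        k->irqs_off = true;
}

/* Models BUG_ON(!irqs_disabled()): returns true when the check would fire. */
static bool sketch_check_fires(const struct sketch_kernel *k)
{
    return !k->irqs_off;
}
```

With preempt_rt set, sketch_spin_lock_irq() leaves irqs_off false and sketch_check_fires() returns true, which corresponds to the panic reported here. With preempt_rt clear, the check stays quiet.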

Comment 10 Clark Williams 2016-11-30 20:46:35 UTC
Sigh. 

We're missing a patch from the upstream RT patchset:

commit fef016306919048a4db85c43aad09593efb3af4b
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Nov 14 23:06:09 2011 +0100

    dm: Make rt aware
    
    Use the BUG_ON_NORT variant for the irq_disabled() checks. RT has
    interrupts legitimately enabled here as we cant deadlock against the
    irq thread due to the "sleeping spinlocks" conversion.
    
    Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1ca7463e8bb2..ce512e614ef4 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -802,7 +802,7 @@ static void dm_old_request_fn(struct request_queue *q)
                /* Establish tio->ti before queuing work (map_tio_request) */
                tio->ti = ti;
                queue_kthread_work(&md->kworker, &tio->work);
-               BUG_ON(!irqs_disabled());
+               BUG_ON_NONRT(!irqs_disabled());
        }
 }

I think we'll have to put this into 7.4 as well as do a z-stream update.
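
For readers without the RT tree at hand, the effect of the patch can be sketched in userspace C. This assumes BUG_ON_NONRT() behaves as the commit message implies: identical to BUG_ON() on a non-RT build and compiled out on RT. The SKETCH_* macros, the SKETCH_PREEMPT_RT switch and sketch_request_fn() are hypothetical stand-ins, and the stand-in BUG_ON only counts hits instead of panicking.

```c
#include <stdbool.h>

/* Counts how often the stand-in BUG_ON fired (the real one would panic). */
static int sketch_bug_hits;

/* Hypothetical stand-in for the kernel's BUG_ON(). */
#define SKETCH_BUG_ON(cond) \
    do { if (cond) sketch_bug_hits++; } while (0)

/*
 * Assumed shape of the RT-aware variant: a no-op when building for RT,
 * otherwise the plain check. SKETCH_PREEMPT_RT is a made-up switch that
 * plays the role of the RT kernel's config option.
 */
#ifdef SKETCH_PREEMPT_RT
# define SKETCH_BUG_ON_NONRT(cond) do { } while (0)
#else
# define SKETCH_BUG_ON_NONRT(cond) SKETCH_BUG_ON(cond)
#endif

/* Models the tail of dm_old_request_fn() after the patch. */
static void sketch_request_fn(bool irqs_disabled)
{
    (void)irqs_disabled; /* silences the unused warning in the RT build */
    SKETCH_BUG_ON_NONRT(!irqs_disabled);
}
```

Built as is, the check behaves like the original BUG_ON. Built with -DSKETCH_PREEMPT_RT it disappears, so a call with interrupts enabled no longer trips it.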

Comment 22 errata-xmlrpc 2017-08-01 18:57:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2077

Comment 23 errata-xmlrpc 2017-08-02 00:23:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2077

