Bug 1389215
| Field | Value |
|---|---|
| Summary | RT kernel panics with dm-multipath enabled |
| Product | Red Hat Enterprise Linux 7 |
| Reporter | Martin Polednik <mpoledni> |
| Component | kernel-rt |
| Assignee | Luis Claudio R. Goncalves <lgoncalv> |
| kernel-rt sub component | Multipath |
| QA Contact | Lin Li <lilin> |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | high |
| CC | agk, bhu, bmarzins, jkastner, lgoncalv, mpoledni, msnitzer, nsoffer, williams, xni, yizhan |
| Version | 7.4 |
| Keywords | ZStream |
| Target Milestone | rc |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | No Doc Update |
| Clones | 1400305, 1400930 |
| Last Closed | 2017-08-01 18:57:57 UTC |
| Type | Bug |
| Bug Blocks | 1353018, 1400305, 1400930 |
Created attachment 1214506: multipath configuration
Mike, have you seen anything like this before?

Just so you know, an empty multipath.conf file doesn't keep multipath from running. It just uses the default configuration. Do you know what devices are multipathed when this happens?

(In reply to Ben Marzinski from comment #3)
> Do you know what devices are multipathed when this happens?

Not sure. I'd like to see the BUG_ON at drivers/md/dm-rq.c:797 for this RT kernel source.

Does anyone have the git URL for the RT kernel in question?

(In reply to Mike Snitzer from comment #4)
> Does anyone have the git URL for the RT kernel in question?

Based on the RT kernel's uname, I can just use the 514.el7 source.

So the BUG_ON at drivers/md/dm-rq.c:797 is:

BUG_ON(!irqs_disabled()) in the function dm_old_request_fn().

This speaks to some inherent difference in the RT kernel that breaks the assumption that irqs are enabled inside the block core's old .request_fn() callout. Need to look closer at this.

But one potential workaround for this issue is to set the following on the kernel command line (this enables dm-multipath to use blk-mq):

    dm_mod.use_blk_mq=Y

This completely avoids using dm_old_request_fn().

(In reply to Mike Snitzer from comment #5)
> So the BUG_ON at drivers/md/dm-rq.c:797 is:
>
> BUG_ON(!irqs_disabled()) in the function dm_old_request_fn().
>
> This speaks to some inherent difference in the RT kernel that breaks the
> assumption that irqs are enabled inside the block core's old .request_fn()
> callout. Need to look closer at this.

I meant to say: "... that breaks the assumption that irqs are _disabled_ ..."

This is the upstream commit that introduced the BUG_ON in question:

    052189a2ec ("dm: remove superfluous irq disablement in dm_request_fn")

(In reply to Mike Snitzer from comment #4)
> Does anyone have the git URL for the RT kernel in question?

Mike, you can get the source for the RT kernel listed above at:

    git://git.app.eng.bos.redhat.com/kernel-rt.git

and check out the branch rhel-7.3/rt-master.

(In reply to Mike Snitzer from comment #6)
> I meant to say: "... that breaks the assumption that irqs are _disabled_ ..."

I'll need to look at the code, but if you're depending on the side effect of a spinlock disabling irqs, then yes, that has changed in the RT kernel. Holding a spinlock_t does *not* mean that interrupts are disabled. Sigh.
We're missing a patch from the upstream RT patchset:

    commit fef016306919048a4db85c43aad09593efb3af4b
    Author: Thomas Gleixner <tglx>
    Date:   Mon Nov 14 23:06:09 2011 +0100

        dm: Make rt aware

        Use the BUG_ON_NORT variant for the irq_disabled() checks. RT has
        interrupts legitimately enabled here as we cant deadlock against the
        irq thread due to the "sleeping spinlocks" conversion.

        Reported-by: Luis Claudio R. Goncalves <lclaudio>
        Signed-off-by: Thomas Gleixner <tglx>

    diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
    index 1ca7463e8bb2..ce512e614ef4 100644
    --- a/drivers/md/dm-rq.c
    +++ b/drivers/md/dm-rq.c
    @@ -802,7 +802,7 @@ static void dm_old_request_fn(struct request_queue *q)
     		/* Establish tio->ti before queuing work (map_tio_request) */
     		tio->ti = ti;
     		queue_kthread_work(&md->kworker, &tio->work);
    -		BUG_ON(!irqs_disabled());
    +		BUG_ON_NONRT(!irqs_disabled());
     	}
     }

I think we'll have to put this into 7.4 as well as do a z-stream update.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2077
Created attachment 1214505: boot messages up to panic

Description of problem:

The realtime version of the kernel seems to panic right when the dm-multipath module is loaded. I've been able to reproduce this issue on two different machines (an HP DL160g6 and a custom-built workstation). It seems to fail even with an empty multipath.conf file.

Version-Release number of selected component (if applicable):

    [root@dev-16 ~]# rpm -qa | grep multipath
    device-mapper-multipath-0.4.9-99.el7.x86_64
    device-mapper-multipath-libs-0.4.9-99.el7.x86_64
    [root@dev-16 ~]# rpm -qa | grep kernel
    kernel-rt-kvm-3.10.0-514.rt56.420.el7.x86_64
    kernel-rt-3.10.0-514.rt56.420.el7.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Install kernel-rt and device-mapper-multipath.
2. Try to boot with the realtime kernel.

Actual results:

Kernel panics.

Expected results:

Kernel boots.

Additional info:

This can be worked around by not loading dm-multipath:

    $ cat /etc/modprobe.d/disable-dm-multipath.conf
    install dm-multipath /bin/false