Created attachment 375284 [details]
The backported patch

We need to backport a fix for this bug:
http://www.linux-archive.org/device-mapper-development/275283-deadlock-bug-kernel-side-device-mapper-code.html

The fix is currently at
ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-avoid-_hash_lock-deadlock.patch
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-179.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
What are the steps to reproduce this problem?
<mikulas> coughlan: this is a race condition: failing a path and removing the mpath device simultaneously causes the deadlock.
<mikulas> coughlan: if you want to reliably reproduce the race, you must artificially delay the thread that submits the events.
<mikulas> coughlan: I would recommend adding a msleep(10000) (ten seconds) at the beginning of the trigger_event function.
<mikulas> then, fail the path and within 10 seconds, remove the device. I think that should trigger the bug.
<mikulas> (note, by removing the device, I mean "dmsetup remove <device>")
<mikulas> if it doesn't work (something during device removal uses the same event thread), another way to reproduce it would be to remove schedule_work(&m->trigger_event); from dm-mpath.c
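The sequence described above could be scripted roughly as follows. This is only a sketch under several assumptions: the test kernel has already been rebuilt with the msleep(10000) added to trigger_event, the names mpath0 and 8:16 are hypothetical placeholders for the multipath map and one of its paths, and the path is failed with the multipath target's "fail_path" message (the discussion does not say how the path was failed, so any other method should work equally well). By default the script only prints the commands; set DRY_RUN=0 to actually run them as root on a disposable test machine.

```shell
#!/bin/sh
# Sketch of the reproduction sequence; not a verified reproducer.
# Assumes a kernel patched with msleep(10000) at the start of trigger_event.

# Hypothetical placeholders: multipath map name and one path (major:minor).
MPATH_DEV="${MPATH_DEV:-mpath0}"
PATH_DEV="${PATH_DEV:-8:16}"
# 1 = only print the commands (default), 0 = really execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro() {
    # Step 1: fail one path. This queues the event work that the added
    # msleep() then delays for ten seconds.
    run dmsetup message "$MPATH_DEV" 0 "fail_path $PATH_DEV"
    # Step 2: remove the device while that delayed event is still pending.
    run dmsetup remove "$MPATH_DEV"
}

repro
```

The two commands run back to back, so the removal lands well inside the ten-second window opened by the delay. If the machine hangs in "dmsetup remove", that would be consistent with the deadlock described in this report.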
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
*** Bug 555266 has been marked as a duplicate of this bug. ***