Bug 526249

Summary: Multipath remove/add race condition
Product: Red Hat Enterprise Linux 5 Reporter: Oren Held <oren>
Component: device-mapper-multipath Assignee: Ben Marzinski <bmarzins>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Lin Li <lilin>
Severity: medium Docs Contact:
Priority: low    
Version: 5.4 CC: agk, bdonahue, bmarzins, bmr, christophe.varoqui, dwysocha, edamato, egoggin, heinzm, iannis, jbrassow, junichi.nomura, kueda, lilin, lmb, orenhe, prockai, tranlan
Target Milestone: rc Flags: oren: needinfo-
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-02-06 17:17:39 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description: My /var/log/messages when the problem occurs, see wait_for_file() debug messages towards the ending
Flags: none

Description Oren Held 2009-09-29 13:54:45 UTC
Created attachment 363007 [details]
My /var/log/messages when the problem occurs, see wait_for_file() debug messages towards the ending

Description of problem:
When a new SCSI disk is presented to the system, udev (rules.d/40-multipath) calls 'multipath <major>:<minor>' to add a mapping.

If multipath maps are flushed (-F) and recreated at more or less the same time, this sometimes leads to a very bad situation:
multipathd gets the "add map" uevent on dm-X, but /sys/block/dm-X/dev is never created (so the recently added wait_for_file() patch does not solve it, no matter how long it waits).
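
For readers unfamiliar with the wait-and-retry idea, here is a minimal shell sketch of polling for a sysfs file with a bounded number of retries. It is not the libmultipath wait_for_file() code; the function name wait_for_path, the retry count and the delay are assumptions chosen for illustration. In the failure described above such a loop would simply run out of retries, because the file never shows up.

```shell
# Poll for a path until it exists or the retry budget is used up.
# Hypothetical helper for illustration; not taken from libmultipath.
wait_for_path() {
    local path=$1
    local retries=${2:-5}   # assumed default: 5 polls
    local delay=${3:-1}     # assumed default: 1 second between polls

    while [ "$retries" -gt 0 ]; do
        # Succeed as soon as the path shows up.
        [ -e "$path" ] && return 0
        retries=$((retries - 1))
        sleep "$delay"
    done

    # One last check after the final sleep; non-zero status means "never appeared".
    [ -e "$path" ]
}
```

Example use, with dm-X standing for whichever device the uevent named: wait_for_path /sys/block/dm-X/dev 5 1 || echo "dev file never appeared"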

This leads to losing multipath mappings that should have been created. In many cases, manually adding them with the multipath command does not resolve the situation.


Version-Release number of selected component (if applicable):
Tested on RHEL5.1 -> 5.3 with native kernel/device-mapper-multipath versions, and with latest device-mapper-multipath-0.4.7-30, kernel-2.6.18-164.el5


Steps to Reproduce:
0. (Optional) Add the following line to libmultipath/discovery.c inside wait_for_file()'s while loop:
  condlog(0, "wait_for_file %d %s", loop, filename);
1. Run the following evil one-liner: while [ "1" ]; do multipath -F; multipath; sleep 1; done
2. Add a SCSI device to the system (I use add-single-device to /proc/scsi/scsi)
3. Stop the while loop after udev rule 40 has finished calling the multipath command
4. multipath -l
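
The flush/recreate loop from step 1 can also be written so that it is easy to stop at step 3. The sketch below is a variation on the one-liner, not what was originally run: the stop-file mechanism, the function name flush_recreate_loop and the MULTIPATH override variable are assumptions added for illustration. Only 'multipath -F' and a bare 'multipath' come from the steps above.

```shell
# Command to run; overridable so the loop can be dry-run (assumption).
MULTIPATH=${MULTIPATH:-multipath}

# Flush and recreate multipath maps until the stop file appears.
# Prints the number of completed iterations when it exits.
flush_recreate_loop() {
    local stop_file=$1
    local delay=${2:-1}     # seconds between iterations, as in the one-liner
    local count=0

    while [ ! -e "$stop_file" ]; do
        "$MULTIPATH" -F     # flush all unused multipath maps
        "$MULTIPATH"        # recreate the maps
        count=$((count + 1))
        sleep "$delay"
    done

    echo "$count"
}
```

Started in the background with a stop-file path of your choosing, the loop ends shortly after that file is created with touch, which makes step 3 repeatable.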


 
Actual results:
In cases of failure:
- Some multipath mappings would be missing.
- If step 0 was taken, the syslog would repeatedly print "multipathd: wait_for_file xxx /sys/block/dm-X/dev" lines until xxx == 0 (/sys/block/dm-X wouldn't exist even after a long wait).
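
To spot the second symptom quickly in a large log, something like the following could be used. It is only a sketch: it assumes the debug line from step 0 appears in the log as "wait_for_file <n> <path>" somewhere on the line, and the function name summarize_wait_for_file is made up for this example. For each path it prints the number of wait_for_file lines seen and the lowest counter value; a lowest value of 0 would suggest the wait ran to exhaustion.

```shell
# Read syslog text on stdin and summarize wait_for_file debug lines.
# Output: one line per path, "<path> <line count> <lowest counter>".
summarize_wait_for_file() {
    awk '
        {
            # Find the "wait_for_file" token; the next two fields are
            # the loop counter and the file name (per the step 0 format).
            for (i = 1; i < NF - 1; i++) {
                if ($i == "wait_for_file") {
                    n = $(i + 1) + 0
                    path = $(i + 2)
                    seen[path]++
                    if (!(path in low) || n < low[path])
                        low[path] = n
                }
            }
        }
        END {
            for (p in seen)
                print p, seen[p], low[p]
        }
    ' | sort
}
```

Typical use: summarize_wait_for_file < /var/log/messages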

Expected results:
All multipath maps should be present.


Additional info:
I would gladly provide more info.

Comment 2 Oren Held 2010-04-18 13:20:56 UTC
Maybe it's related to #518575

Comment 3 Ben Marzinski 2010-04-20 14:52:12 UTC
There was definitely a race between multipath and multipathd that could cause all sorts of problems (bz #506715), and it was fixed. The fix was in device-mapper-multipath-0.4.7-30.el5, though, which you say you tried. With this package applied, you shouldn't have a line in 40-multipath.rules that calls multipath; it was commented out.

It is also possible that 518575 has something to do with this, but I can't see what at the moment.

When you tried this with device-mapper-multipath-0.4.7-30.el5, was the following line commented out in 40-multipath.rules?

# KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"
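
One way to answer that question without reading the file by eye is sketched below. The function name multipath_rule_active is hypothetical, and the rules file path is passed in as an argument rather than assumed. It reports whether any uncommented line still contains a RUN+="/sbin/multipath ..." action.

```shell
# Return 0 if the given udev rules file still has an uncommented
# rule that runs /sbin/multipath, non-zero otherwise.
multipath_rule_active() {
    local rules_file=$1

    # Skip lines whose first non-blank character is "#", then look for
    # the RUN action. The trailing space keeps /sbin/multipathd from matching.
    awk '
        !/^[ \t]*#/ && index($0, "RUN+=\"/sbin/multipath ") { found = 1 }
        END { exit !found }
    ' "$rules_file"
}
```

Called with the path to the rules file, it returns success if the rule is still active and failure if the line is commented out or absent.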

Comment 4 RHEL Program Management 2014-01-29 10:39:24 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.