Description of problem:
Multipath UID/GID/MODE settings are not always taking effect. This appears to be a race between udevd and multipathd. The reproducer script no longer triggers if /etc/udev/rules.d/40-multipath.rules is simplified (e.g. use the stripped-down version from Fedora 10 or 11). This may be a udev bug or a multipath bug (or both).

Version-Release number of selected component (if applicable):
Observed on RHEL5.3, but any version of RHEL seems to be affected.

How reproducible:
always

Steps to Reproduce:
1. Configure a simple multipath.conf, e.g.

multipaths {
    multipath {
        wwid  some_wwid
        alias somename
        mode  0644
        uid   201
        gid   201
    }
}

2. Create a new user/group 'disktest' with uid and gid = 201 (one way to do this is sketched just after this report).
3. Run the following script (originally by bmr):

while true; do
    multipath -F
    multipath -v0
    ls -l /dev/mapper | grep -q disktest || break
    date
done
ls -l /dev/mapper

Actual results:
The script breaks out of the loop (nothing owned by user 'disktest'). We also observed that udevd was 'spinning' for several minutes after the script triggered.

Expected results:
The script runs forever and udevd keeps up.

Additional info:
udevd device node creation with default uid/gid/mode seems to be racing with multipathd's chmod/chown, and hence the requested uid/gid/mode are sometimes either not applied or lost, but we haven't positively identified the actual root cause.

Thanks -- Mark Goodwin
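For step 2, one possible command sequence (the uid/gid values follow the multipath.conf above; adjust home directory and shell to taste):

  # Create a 'disktest' group and user with uid = gid = 201
  groupadd -g 201 disktest
  useradd -u 201 -g 201 disktest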
I'm reassigning this to device-mapper. I can still hit this with the line removed from 40-multipath.rules. In fact, I tried removing 40-multipath.rules completely. Then I blacklisted all multipath devices. Lastly, I changed the test script to:

#!/bin/bash
while true; do
    dmsetup remove foo
    dmsetup create foo --uid 201 --gid 201 --mode 0644 --table "0 1 linear 8:48 0"
    ls -l /dev/mapper | grep -q disktest || break
    date
done
ls -l /dev/mapper

which just creates a simple linear device. Running this causes the exact same issue, where occasionally the uid, gid, and mode aren't correctly set.
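An equivalent, slightly more explicit check of the attributes that dmsetup was asked for could be dropped into the loop in place of the ls/grep test (a minimal sketch; the expected values simply mirror the --uid/--gid/--mode flags above):

  # Print numeric uid, gid and octal mode of the node; expect "201 201 644"
  stat -c '%u %g %a' /dev/mapper/foo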
Here's the odd thing about this. When this is happening, what's going on is that _add_dev_node() doesn't create the device node with the correct UID, GID, and mode because the device node already exists. And evidently, it already exists with the wrong UID, GID, and mode. Since the test does this in a loop, the last time that device node was created by _add_dev_node() it had the correct UID, etc. So it seems that the old device node got deleted, and something other than _add_dev_node() created a new device node in its place, but with the wrong attributes. I'm just not sure what we're racing against. udev?
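As a rough illustration of why a pre-existing node "wins" here, the create path behaves like the following shell sketch (this is only an analogue of the libdevmapper logic described above, not the actual C code; the name foo and the 252:2 major/minor pair are placeholders):

  # If the node already exists it is left alone, so whatever attributes some
  # other process created it with (e.g. default root ownership) persist.
  node=/dev/mapper/foo
  if [ ! -e "$node" ]; then
      mknod "$node" b 252 2
      chown 201:201 "$node"
      chmod 0644 "$node"
  fi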
It looks like the problem might actually not be a race on create, but that dmsetup remove isn't always removing the device node.
It seems that the reason I can still see this issue with that 40-multipath.rules line removed is because I had some forgotten udev rules to create raw devices on top of multipath devices in /etc/udev/rules.d/60-raw.rules. Specifically, if I also remove the line

PROGRAM!="/sbin/dmsetup ls --exec /bin/basename -j %M -m %m", GOTO="skip_dm"

that I had in 60-raw.rules, I'm fine. It looks like what is happening in my test is that the device node is getting removed when I remove the dm device. However, for some reason, another dmsetup call is creating it, but without the proper attributes (and without the actual device-mapper device that this device node is supposed to access; in other words, /dev/mapper/foo is there, but dmsetup doesn't show any device named foo). I wonder if the udev rule is racing with the remove call in my script. For instance, the udev call to dmsetup sees that the dm device foo exists, then my script removes it, then the udev call sees that foo has no device node, and helpfully creates one. I'll look and see if that's possible in the morning.
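If that theory is right, the side effect should be reproducible by hand with something like the following (an illustrative sketch; it assumes a dm device named foo exists and that nothing else recreates the node in between):

  # Remove just the node, then run any 'dmsetup ls --exec' command;
  # dmsetup recreates missing nodes before exec'ing the program,
  # using default ownership and mode.
  rm -f /dev/mapper/foo
  dmsetup ls --exec /bin/basename
  ls -l /dev/mapper/foo    # expect root:root with the default mode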
Just wanted to report that I can still reproduce this, even with device-mapper-multipath-0.4.7-30.el5. I'm using an x86_64 KVM running RHEL5.4 2.6.18-164.el5 with one scsi disk (sda). /etc/udev/rules.d/40-multipath.rules is installed unchanged, i.e. the KERNEL line that runs multipath is commented out, exactly as shipped in device-mapper-multipath-0.4.7-30.el5.

My simple multipath.conf:

blacklist {
    devnode "*"
}
blacklist_exceptions {
    devnode "^sda"
}
defaults {
    user_friendly_names yes
}
multipaths {
    multipath {
        wwid  0QEMU_QEMU_HARDDISK_scsi0-hd0
        alias marypoppins
        uid   200
        gid   200
    }
}

Reproducer script (needs user 'disktest' with uid=200):

#! /bin/bash
d=/dev/mapper/marypoppins
udevcontrol reload_rules
sleep 1
n=0
while true; do
    multipath -F
    multipath -v0
    date +%s.%N
    ls -l $d | grep -q disktest || break
    n=$(($n + 1))
done
echo triggered after $n iterations

Tail of output from the script:

1253609662.091550000
1253609662.129400000
1253609662.166118000
1253609662.213284000
1253609662.250868000
1253609662.287375000
1253609662.726043000
1253609662.764775000
triggered after 450 iterations

Snippet from udevmonitor, running in a separate terminal:

UEVENT[1253609662.705175] remove@/block/dm-2
UEVENT[1253609662.718532] add@/block/dm-2
UEVENT[1253609662.719137] change@/block/dm-2
UEVENT[1253609662.743388] remove@/block/dm-2
UEVENT[1253609662.757561] add@/block/dm-2
UEVENT[1253609662.758167] change@/block/dm-2
UDEV  [1253609662.821322] change@/block/dm-2
UDEV  [1253609662.853317] remove@/block/dm-2
UDEV  [1253609663.038904] change@/block/dm-2
UDEV  [1253609663.070454] remove@/block/dm-2
UDEV  [1253609663.142617] change@/block/dm-2
UDEV  [1253609663.173532] remove@/block/dm-2

Notice in the output from udevmonitor that the transition from kernel UEVENT reporting to UDEV closely correlates with the time the script triggers the bug. This is 100% reproducible, but I've no idea if the observation is significant!? Smells like a lost event. Also, if I comment out line 8 in 40-multipath.rules then the script runs forever, but I don't get any /dev/mpath devices created.

Thanks -- Mark Goodwin
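For reference, the trace above comes from running the RHEL5 monitoring tool in a second terminal (on later releases the equivalent is 'udevadm monitor'):

  # UEVENT lines are the raw kernel events as they arrive; UDEV lines are
  # printed once udevd has finished processing the corresponding event.
  udevmonitor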
Created attachment 362548 [details]
Patch to change the multipath udev rules to avoid the race.

When 'dmsetup ls --exec' is run, dmsetup makes sure that the device node exists before execing the program, which is a perfectly sensible thing to do. However, that means if a uevent comes in on a device you are removing, or possibly even creating, any udev rule that calls 'dmsetup ls --exec' has a chance of creating the device node, and it will use the default permissions to do so. The easiest way to fix this is to not call 'dmsetup ls --exec' in udev rules, since removes and creates trigger uevents. This patch replaces these calls in the multipath udev rules, which is the only place where they were used.
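The attachment itself isn't reproduced here, but the general shape of the change, querying the map name without the node-creating side effect, could look something like this (illustrative only, not the actual patch; the rule text is adapted from the 60-raw.rules line quoted earlier):

  # before: has the side effect of (re)creating /dev/mapper nodes
  # PROGRAM!="/sbin/dmsetup ls --exec /bin/basename -j %M -m %m", GOTO="skip_dm"
  # after: query the name via 'dmsetup info', avoiding the 'ls --exec' path
  # (the same substitution used in the RHEL4 attempt later in this bug)
  PROGRAM!="/sbin/dmsetup info -c --noheadings -o name -j %M -m %m", GOTO="skip_dm"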
Actually, this isn't completely right. It almost always works, but sometimes the remove will fail with

<dev>: map in use

I assume this happens because the create hasn't finished yet. When this happens, you run the risk of not only the device nodes getting created with the wrong attributes, but also of having the udev rules run in an endless loop. I'm not sure why, but I'm looking into it.
(In reply to comment #19)
> Created an attachment (id=362548) [details]
> Patch to change the multipath udev rules to avoid the race.

Hi Ben, that all makes sense. I tested it and it seems to work, until, as you observed, the map removal fails with "map in use" and the udev worker goes off into a spin (it grows in memory too, did you notice?). Getting very close to a solution now, I think! Your patch is missing a comma between NAME and SYMLINK on the line for the second dmsetup. Not sure if that matters to the parser.

Cheers -- Mark
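For reference, the shape of rule being referred to is roughly the following (a generic example, not the line from the attachment; %c expands to the PROGRAM output), with the NAME and SYMLINK assignments conventionally separated by a comma:

  KERNEL=="dm-*", PROGRAM=="...", NAME="mapper/%c", SYMLINK+="mpath/%c"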
I take back all my objections to my patch. The reason why the device nodes got created with the wrong permissions with my patch applied was a completely unrelated issue (I still had a dmsetup command in my udev rules that called "dmsetup ls --exec"; I was using it for debugging earlier). Also, the infinite udev looping doesn't have anything to do with the "<dev>: map in use" messages (which are simply caused by unloading while udev is still creating the partitions), and it's not infinite. If you run the test for a couple of seconds and then stop it, you will see the udev looping stop in a minute or so. If you let the test run for a long time, you will see the looping stop eventually, but it takes a long time. If you add a call to udevsettle to the script after you check that all the devices have the correct attributes, then when you stop the script there are no more udev events.
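A minimal sketch of that change, based on the reproducer loop from earlier in this bug (the exact placement after the attribute check is just one reasonable choice):

  while true; do
      multipath -F
      multipath -v0
      ls -l $d | grep -q disktest || break
      udevsettle          # wait for udevd to drain its event queue each pass
      n=$(($n + 1))
  done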
Does this seem to work like I mentioned in comment #22? If so, I'll commit the udev rule fixes.
(In reply to comment #23)
> Does this seem to work like I mentioned in comment #22? If so, I'll commit
> the udev rule fixes.

Yes, I tested it and it seems to work. Did you fix the missing comma? Also, what about the commented-out rule that runs multipath? If we're not keeping that rule, then should the line be deleted altogether?

Cheers -- Mark Goodwin
Yes, I fixed the missing comma. Nobody has complained about commenting out that line, so it's probably safe to go, but since it does change multipath behaviour from what it used to be, and it's not hurting anything being commented out, I'd rather leave it.
Event posted on 10-26-2009 07:22pm EST by mgoodwin

Hi Ryan, I was working on the RHEL4 back-port today. Well, it can hardly be called a "back-port" since it's completely different, though the root cause is probably the same. I concentrated on eliminating "dmsetup ls ..." from the udev exec path, since this is causing all the problems on RHEL5. This boiled down to hacking the RHEL4 scripts /sbin/{mpath_get_name,kpartx_get_name}. My current patch is:

--- /sbin/mpath_get_name.orig	2009-10-26 03:49:11.000000000 +1100
+++ /sbin/mpath_get_name	2009-10-26 06:49:56.000000000 +1100
@@ -19,7 +19,8 @@
 if [ $ret -ne 0 ] ; then
 	exit 1
 fi
-i=`/sbin/dmsetup ls --target multipath --exec basename -j $1 -m $2 2> /dev/null`
+# i=`/sbin/dmsetup ls --target multipath --exec basename -j $1 -m $2 2> /dev/null`
+i=`dmsetup info -c --noheadings -o name -j $1 -m $2 2>/dev/null`
...

but the problem is still reproducible. More tomorrow.

Cheers -- Mark Goodwin

This event sent from IssueTracker by mgoodwin
issue 304239
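For what it's worth, the two commands in that hunk should report the same map name; only the 'ls --exec' form has the node-creation side effect. A quick hand check might look like this (the 253:2 major/minor pair is a placeholder for a real multipath map):

  # Both should print the same name for the given major:minor;
  # the second does not touch /dev/mapper at all.
  /sbin/dmsetup ls --target multipath --exec basename -j 253 -m 2
  dmsetup info -c --noheadings -o name -j 253 -m 2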
Shouldn't this bug be filed against d-m-multipath and not d-m?
(In reply to comment #30)
> Shouldn't this bug be filed against d-m-multipath and not d-m?

Maybe, but then Ben's reproducer only uses dmsetup, not multipath. See comment #6.

-- Mark
But it was device-mapper-multipath that was modified for the fix.
Verified the fix based on comment #14.

# rpm -q device-mapper-multipath
device-mapper-multipath-0.4.7-23.el5

# cat /etc/multipath.conf
blacklist {
    devnode "*"
}
blacklist_exceptions {
    devnode "^sda"
}
defaults {
    user_friendly_names yes
}
multipaths {
    multipath {
        wwid  353333330000007d0
        alias timmay
        uid   200
        gid   200
    }
}

# grep disktest /etc/passwd /etc/group
/etc/passwd:disktest:x:200:200::/home/disktest:/bin/bash
/etc/group:disktest:x:200:

# service multipathd restart
Stopping multipathd daemon:                                [  OK  ]
Starting multipathd daemon:                                [  OK  ]

# ls -l /dev/mapper/timmay
brw-rw---- 1 disktest disktest 252, 2 Jan  7 17:43 /dev/mapper/timmay

# cat check.sh
#!/bin/bash -x
d=/dev/mapper/timmay
udevcontrol reload_rules
sleep 1
n=0
time while true; do
    multipath -F
    multipath -v0
    ls -l $d | grep -q disktest || break
    n=$(($n + 1))
done
echo triggered after $n iterations

# ./check.sh
real    0m14.711s
user    0m9.437s
sys     0m3.047s
triggered after 93 iterations

Repeated multiple times; it always hit in under one minute.

----

Next I took the fixed device-mapper-multipath version.

# rpm -q device-mapper-multipath
device-mapper-multipath-0.4.7-31.el5

# service multipathd restart
Stopping multipathd daemon:                                [  OK  ]
Starting multipathd daemon:                                [  OK  ]

# ls -l /dev/mapper/timmay
brw-rw---- 1 disktest disktest 252, 2 Jan  7 18:23 /dev/mapper/timmay

# time ./check.sh
real    6m44.860s
user    4m30.006s
sys     1m6.954s

Canceled after 6m44s without a hit.
Hello all, is the fix available? Errata number? Thanks.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0255.html