Bug 518575

Summary: Multipath UID/GID/MODE settings not always taking effect
Product: Red Hat Enterprise Linux 5
Component: device-mapper-multipath
Version: 5.3
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: rc
Target Release: ---
Keywords: ZStream
Reporter: Mark Goodwin <mgoodwin>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: agk, bmarzins, bmr, bubrown, christophe.varoqui, cjt, cmarthal, dwysocha, edamato, egoggin, emcnabb, heinzm, jbrassow, jbrier, jruemker, junichi.nomura, kueda, lmb, lrosa, lscalabr, mbroz, mgoodwin, mkearey, mnovacek, prajnoha, prockai, riek, rprice, syeghiay, tao, tranlan
Doc Type: Bug Fix
Clone Of: ---
Cloned As: 531131
Bug Blocks: 499522, 537761
Last Closed: 2010-03-30 08:30:21 UTC

Attachments:
Patch to change the multipath udev rules to avoid the race.

Description Mark Goodwin 2009-08-21 01:49:00 UTC
Description of problem:
Multipath UID/GID/MODE settings are not always taking effect. This appears
to be a race between udevd and multipathd. The reproducer script no longer
triggers if /etc/udev/rules.d/40-multipath.rules is simplified (e.g. using
the stripped-down version from Fedora 10 or 11). This may be a udev bug or
a multipath bug (or both).

Version-Release number of selected component (if applicable):
Observed on RHEL 5.3, but any version of RHEL seems to be affected.

How reproducible:
always

Steps to Reproduce:

1. configure a simple multipath.conf, e.g. 
multipaths {
        multipath {
                wwid    some_wwid
                alias   somename
                mode    0644
                uid     201
                gid     201
        }
}

2. create a new user/group 'disktest' with uid and gid = 201

3. run the following script (originally by bmr) :
   while true; do
	   multipath -F
	   multipath -v0
	   ls -l /dev/mapper | grep -q disktest || break
	   date
   done
   ls -l /dev/mapper
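
   (Here multipath -F flushes all multipath maps and multipath -v0
   silently recreates them, so each pass through the loop deletes and
   recreates the /dev/mapper node together with its uid/gid/mode; the
   loop exits as soon as the node comes back without the configured
   owner.)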
  
Actual results:
script breaks out of the loop (nothing owned by user 'disktest').
We also observed that udevd was 'spinning' for several minutes
after the script triggered.

Expected results:
script runs forever and udevd keeps up.

Additional info:
udevd device node creation with default uid/gid/mode seems to be racing
with multipathd's chmod/chown and hence the requested uid/gid/mode are
sometimes either not applied or lost, but we haven't positively identified
the actual root cause.
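
A quick way to watch for the race from a second terminal (a sketch, not
part of the original report; it assumes the alias 'somename' and the
201:201/0644 settings from the example multipath.conf above):

	#!/bin/bash
	# Poll the multipath node and report whenever it exists with
	# attributes other than the configured uid/gid/mode.
	d=/dev/mapper/somename
	while true; do
		if [ -e "$d" ]; then
			attrs=$(stat -c '%u:%g %a' "$d")
			[ "$attrs" = "201:201 644" ] || echo "$(date +%s.%N) $d is $attrs"
		fi
	done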

Thanks
-- Mark Goodwin

Comment 6 Ben Marzinski 2009-09-08 20:04:43 UTC
I'm reassigning this to device-mapper. I can still hit this with the line removed from 40-multipath.rules. In fact, I tried removing 40-multipath.rules completely. Then I blacklisted all multipath devices. Lastly, I changed the test script to:

#!/bin/bash
while true; do
        dmsetup remove foo
        dmsetup create foo --uid 201 --gid 201 --mode 0644 --table "0 1 linear 8:48 0"
        ls -l /dev/mapper | grep -q disktest || break
        date
done
ls -l /dev/mapper

This just creates a simple linear device. Running it causes the exact same issue, where occasionally the uid, gid, and mode aren't correctly set.
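
For reference, the --table argument is dmsetup's usual "start length
target args" format. An annotated version of the create call (8:48 is a
major:minor pair, normally /dev/sdd; substitute any block device that
exists on the test machine):

        # start=0, length=1 sector, target=linear,
        # backing device=8:48 (major:minor), offset=0 sectors
        dmsetup create foo --uid 201 --gid 201 --mode 0644 \
                --table "0 1 linear 8:48 0"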

Comment 11 Ben Marzinski 2009-09-21 20:58:19 UTC
Here's the odd thing about this. When this is happening, what's going on is that _add_dev_node() doesn't create the device node with the correct UID, GID, and mode because the device node already exists. And evidently, it already exists with the wrong UID, GID, and mode. Since the test does this in a loop, the last time that device node was created by _add_dev_node() it had the correct UID, etc. So it seems that the old device node got deleted, and something other than _add_dev_node() created a new device node in its place, but with the wrong attributes. I'm just not sure what we're racing against. udev?
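
One way to test that hypothesis when the loop from comment #6 breaks (a
sketch, not from the original thread): a device node that exists while
dmsetup knows no such device must have been created by something other
than _add_dev_node().

	# If the node is present but dmsetup has no device 'foo',
	# something external (udev?) recreated the node.
	if [ -e /dev/mapper/foo ] && ! dmsetup info foo >/dev/null 2>&1; then
		echo "stale node:"; ls -l /dev/mapper/foo
	fi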

Comment 12 Ben Marzinski 2009-09-21 23:31:10 UTC
It looks like the problem might actually not be a race on create, but that dmsetup remove isn't always removing the device node.

Comment 13 Ben Marzinski 2009-09-22 06:25:38 UTC
It seems that the reason I can still see this issue with that 40-multipath.rules line removed is because I had some forgotten udev rules to create raw devices on top of multipath devices in /etc/udev/rules.d/60-raw.rules. Specifically, if I also remove the line

PROGRAM!="/sbin/dmsetup ls --exec /bin/basename -j %M -m %m", GOTO="skip_dm"

that I had in 60-raw.rules, I'm fine. It looks like what is happening in my test is that the device node is getting removed when I remove the dm device. However, for some reason, another dmsetup call is creating it, but without the proper attributes (and without the actual device-mapper device that this device node is supposed to access; in other words, /dev/mapper/foo is there, but dmsetup doesn't show any device named foo).

I wonder if the udev rule is racing with the remove call in my script. For instance, the udev call to dmsetup sees that the dm device foo exists, then my script removes it, then the udev call sees that foo has no device node, and helpfully creates one. I'll look and see if that's possible in the morning.
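
One way to keep the skip_dm gate without giving udev a chance to create
the node would be to switch the query to one that only reads the device
through the ioctl interface, e.g. (a sketch, assuming dmsetup info -c
accepts -j/-m here the same way it does in the scripts later in this bug):

PROGRAM!="/sbin/dmsetup info -c --noheadings -o name -j %M -m %m", GOTO="skip_dm"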

Comment 14 Mark Goodwin 2009-09-22 12:01:28 UTC
Just wanted to report that I can still reproduce this, even
with device-mapper-multipath-0.4.7-30.el5. I'm using an x86_64
KVM running RHEL5.4 2.6.18-164.el5 with one scsi disk (sda).

/etc/udev/rules.d/40-multipath.rules is installed unchanged
i.e. the KERNEL line that runs multipath is commented out,
exactly as shipped in device-mapper-multipath-0.4.7-30.el5.

My simple multipath.conf:
	blacklist {
		devnode "*"
	}

	blacklist_exceptions {
		devnode "^sda"
	}

	defaults {
		user_friendly_names yes
	}

	multipaths {
		multipath {
			wwid 0QEMU_QEMU_HARDDISK_scsi0-hd0
			alias marypoppins
			uid 200
			gid 200
		}
	}

Reproducer script (needs user 'disktest' with uid=200)
	#! /bin/bash
	d=/dev/mapper/marypoppins

	udevcontrol reload_rules
	sleep 1
	n=0
	while true; do
		multipath -F
		multipath -v0
		date +%s.%N
		ls -l $d | grep -q disktest || break
		n=$(($n + 1))
	done
	echo triggered after $n iterations 


Tail of output from the script :
	1253609662.091550000
	1253609662.129400000
	1253609662.166118000
	1253609662.213284000
	1253609662.250868000
	1253609662.287375000
	1253609662.726043000
	1253609662.764775000
	triggered after 450 iterations

Snippet from udevmonitor, running in a separate terminal :
	UEVENT[1253609662.705175] remove@/block/dm-2
	UEVENT[1253609662.718532] add@/block/dm-2
	UEVENT[1253609662.719137] change@/block/dm-2
	UEVENT[1253609662.743388] remove@/block/dm-2
	UEVENT[1253609662.757561] add@/block/dm-2
	UEVENT[1253609662.758167] change@/block/dm-2
	UDEV  [1253609662.821322] change@/block/dm-2
	UDEV  [1253609662.853317] remove@/block/dm-2
	UDEV  [1253609663.038904] change@/block/dm-2
	UDEV  [1253609663.070454] remove@/block/dm-2
	UDEV  [1253609663.142617] change@/block/dm-2
	UDEV  [1253609663.173532] remove@/block/dm-2

Notice in the udevmonitor output that the transition from kernel UEVENT
reporting to UDEV closely correlates with the time the script triggers
the bug. This is 100% reproducible, but I've no idea whether the
observation is significant. Smells like a lost event.

Also, if I comment out line 8 in 40-multipath.rules then the script
runs forever but I don't get any /dev/mpath devices created.

Thanks
-- Mark Goodwin

Comment 19 Ben Marzinski 2009-09-24 18:28:21 UTC
Created attachment 362548 [details]
Patch to change the multipath udev rules to avoid the race.

When 'dmsetup ls --exec' is run, dmsetup makes sure that the device node exists before execing the program, which is a perfectly sensible thing to do. However, that means that if a uevent comes in on a device you are removing, or possibly even creating, any udev rule that calls 'dmsetup ls --exec' has a chance of creating the device node, and it will use the default permissions to do so. The easiest way to fix this is to not call 'dmsetup ls --exec' in udev rules, since removes and creates trigger uevents. This patch removes these calls from the multipath udev rules, which was the only place they were used.
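
The attachment body isn't shown inline here. As a purely hypothetical
illustration of the substitution it describes (the shipped rules may
differ), the pattern is to swap the node-creating query for a read-only
one:

# before: dmsetup ensures the node exists before exec'ing basename
PROGRAM=="/sbin/dmsetup ls --target multipath --exec /bin/basename -j %M -m %m", NAME="..."
# after: query the name through the ioctl interface only
PROGRAM=="/sbin/dmsetup info -c --noheadings -o name -j %M -m %m", NAME="..."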

Comment 20 Ben Marzinski 2009-09-24 18:52:00 UTC
Actually, this isn't completely right.  It almost always works. Sometimes the remove will fail with

<dev>: map in use

I assume this happens because the create hasn't finished yet. When this happens, you run the risk of not only the device nodes getting created with the wrong attributes, but also having the udev rules run in an endless loop. I'm not sure why, but I'm looking into it.

Comment 21 Mark Goodwin 2009-09-25 04:40:31 UTC
(In reply to comment #19)
> Created an attachment (id=362548) [details]
> Patch to change the multipath udev rules to avoid the race.

Hi Ben, that all makes sense. I tested it and it seems to work until, as
you observed, the map removal fails with "map in use" and the
udev worker goes off into a spin (it grows in memory too, did you notice?).

Getting very close to a solution now I think!

Your patch is missing a comma between NAME and SYMLINK on the line
for the second dmsetup. Not sure if that matters to the parser.

Cheers
-- Mark

Comment 22 Ben Marzinski 2009-09-25 23:16:08 UTC
I take back all my objections to my patch. The reason why the device nodes got created with the wrong permissions with my patch applied was a completely unrelated issue (I still had a dmsetup command in my udev rules that called "dmsetup ls --exec". I was using it for debugging earlier).

Also, the infinite udev looping doesn't have anything to do with the

<dev>: map in use

messages (which are simply caused by unloading while udev is still creating the partitions), and it's not infinite. If you run the test for a couple of seconds and then stop it, you will see the udev looping stop in a minute or so. If you let the test run for a long time, you will see the looping stop eventually, but it takes a long time. If you add a call to udevsettle to the script after you check that all the devices have the correct attributes, there are no more udev events when you stop the script.
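
For example, the loop from comment #14 with the settle added (a sketch;
udevsettle is the RHEL 5 binary, later renamed udevadm settle):

	while true; do
		multipath -F
		multipath -v0
		ls -l $d | grep -q disktest || break
		# drain the udev event queue each pass so events cannot
		# pile up and keep firing after the script stops
		udevsettle
		n=$(($n + 1))
	done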

Comment 23 Ben Marzinski 2009-09-28 19:40:16 UTC
Does this seem to work like I mentioned in comment #22? If so, I'll commit the udev rule fixes.

Comment 24 Mark Goodwin 2009-09-29 13:30:33 UTC
(In reply to comment #23)
> Does this seem to work like I mentioned in comment #22? If so, I'll commit the
> udev rule fixes.  

Yes, I tested it and it seems to work. Did you fix the missing comma?
Also, what about the commented out rule that runs multipath? If we're
not keeping that rule, then should the line be deleted altogether?

Cheers
-- Mark Goodwin

Comment 25 Ben Marzinski 2009-09-29 17:53:00 UTC
Yes, I fixed the missing comma. Nobody has complained about commenting out that line, so it's probably safe to go, but since it does change multipath behaviour from what it used to be, and it's not hurting anything being commented out, I'd rather leave it.

Comment 27 Issue Tracker 2009-10-26 08:22:32 UTC
Event posted on 10-26-2009 07:22pm EST by mgoodwin

Hi Ryan, I was working on the RHEL4 back-port today. Well, it can hardly
be called a "back-port" since it's completely different, though the root
cause is probably the same.

I concentrated on eliminating "dmsetup ls ..." from the udev exec path,
since this is causing all the problems on RHEL5. This boiled down to
hacking the RHEL4 scripts /sbin/{mpath_get_name,kpartx_get_name}.

My current patch is :

--- /sbin/mpath_get_name.orig	2009-10-26 03:49:11.000000000 +1100
+++ /sbin/mpath_get_name	2009-10-26 06:49:56.000000000 +1100
@@ -19,7 +19,8 @@ if [ $ret -ne 0 ] ; then
 	exit 1
 fi
 
-i=`/sbin/dmsetup ls --target multipath --exec basename -j $1 -m $2 2> /dev/null`
+# i=`/sbin/dmsetup ls --target multipath --exec basename -j $1 -m $2 2> /dev/null`
+i=`dmsetup info -c --noheadings -o name -j $1 -m $2 2>/dev/null`
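
A quick way to compare the two queries by hand (a sketch; 252:2 stands
in for whatever major:minor "ls -l /dev/mapper" reports for the map):

	# old form: dmsetup first ensures the node exists, then execs
	# basename on its path (the behaviour that loses permissions)
	/sbin/dmsetup ls --target multipath --exec basename -j 252 -m 2
	# new form: prints the same map name via the ioctl interface,
	# never touching /dev/mapper
	dmsetup info -c --noheadings -o name -j 252 -m 2

Both should print the map name; only the first can recreate a missing
node as a side effect.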


... but the problem is still reproducible. More tomorrow ..

Cheers
-- Mark Goodwin



This event sent from IssueTracker by mgoodwin, issue 304239.

Comment 30 Corey Marthaler 2009-11-09 19:32:22 UTC
Shouldn't this bug be filed against d-m-multipath and not d-m?

Comment 31 Mark Goodwin 2009-11-13 04:02:36 UTC
(In reply to comment #30)
> Shouldn't this bug be filed against d-m-multipath and not d-m?  

maybe, but then Ben's reproducer only uses dmsetup, not multipath.
See Comment #6.

-- Mark

Comment 32 Ben Marzinski 2009-11-13 23:03:23 UTC
But it was device-mapper-multipath that was modified for the fix.

Comment 36 michal novacek 2010-01-07 17:36:08 UTC
Verified the fix based on comment #14

# rpm -q device-mapper-multipath
	device-mapper-multipath-0.4.7-23.el5

# cat /etc/multipath.conf 
	blacklist {
			devnode "*"
	}
	blacklist_exceptions {
		devnode "^sda"
	}
	defaults {
		user_friendly_names yes
	}
	multipaths {
		multipath {
			wwid 353333330000007d0
			alias timmay
			uid 200
			gid 200
		}
	}

# grep disktest /etc/passwd /etc/group
	/etc/passwd:disktest:x:200:200::/home/disktest:/bin/bash
	/etc/group:disktest:x:200:

# service multipathd restart
	Stopping multipathd daemon:                                [  OK  ]
	Starting multipathd daemon:                                [  OK  ]
# ls -l /dev/mapper/timmay 
	brw-rw---- 1 disktest disktest 252, 2 Jan  7 17:43 /dev/mapper/timmay

# cat check.sh 
	#!/bin/bash -x
	d=/dev/mapper/timmay

	udevcontrol reload_rules
	sleep 1
	n=0
	time while true; do
		multipath -F
		multipath -v0
		ls -l $d | grep -q disktest || break
		n=$(($n + 1))
	done
	echo triggered after $n iterations 

# ./check.sh 
	real	0m14.711s
	user	0m9.437s
	sys	0m3.047s
	triggered after 93 iterations
Repeated multiple times; always had a hit in under one minute.

----
Next I took the fixed device-mapper-multipath version.

# rpm -q device-mapper-multipath
	device-mapper-multipath-0.4.7-31.el5

# service multipathd restart
	Stopping multipathd daemon:                                [  OK  ]
	Starting multipathd daemon:                                [  OK  ]

# ls -l /dev/mapper/timmay 
	brw-rw---- 1 disktest disktest 252, 2 Jan  7 18:23 /dev/mapper/timmay

# time ./check.sh
	real	6m44.860s
	user	4m30.006s
	sys	1m6.954s

Canceled after 6m44s without a hit.


Comment 39 Leandro A. Rosa 2010-02-16 15:34:48 UTC
Hello all, is the fix available? Is there an errata number?

Thanks.

Comment 42 errata-xmlrpc 2010-03-30 08:30:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0255.html