Bug 1368211
| Summary: | RHEL7: device-mapper-multipath fails when removing more than one device. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Rodrigo A B Freire <rfreire> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED ERRATA | QA Contact: | Lin Li <lilin> |
| Severity: | high | Docs Contact: | Steven J. Levine <slevine> |
| Priority: | high | | |
| Version: | 7.2 | CC: | agk, berrange, bmarzins, dasmith, ealcaniz, eglynn, heinzm, kchamart, loberman, lvm-team, lyarwood, msnitzer, panbalag, prajnoha, rbryant, sbauza, sferdjao, sgordon, srevivo, vromanso, yizhan |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | device-mapper-multipath-0.4.9-100.el7 | Doc Type: | Enhancement |
| Doc Text: | New "remove_retries" multipath configuration value. If a multipath device is temporarily in use when multipath tries to remove it, the remove will fail. It is now possible to control the number of times that the "multipath" command will retry removing a busy multipath device by setting the "remove_retries" configuration value. The default value is 0, in which case multipath will not retry failed removes. | | |
| Story Points: | --- | | |
| Clone Of: | 1368191 | Environment: | |
| Last Closed: | 2017-08-01 16:34:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1368191 | | |
| Bug Blocks: | | | |
Description
Rodrigo A B Freire
2016-08-18 17:04:07 UTC
Human-readable reproducer:
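```bash
#!/bin/bash
# Reproducer: repeatedly flush several multipath maps, delete their SCSI
# paths, rescan the bus, and rebuild the maps.
# Replace the <WWID n> placeholders with the WWIDs of real multipath devices.
while true; do
    for MPATH in "<WWID 1>" "<WWID 2>" "<WWID 3>"; do
        # Extract the SCSI address (H:C:T:L) of each running path of the map
        # before it is flushed.
        DEVICES=$(multipath -l "$MPATH" | grep running | awk '{print substr($0, 6, 8)}')
        echo "Flushing: multipath -f $MPATH"
        multipath -f "$MPATH"
        RETFIRST=$?
        # If the first flush fails, wait one second and try again.
        if [ "$RETFIRST" -ne 0 ]; then
            echo "First flush failed, returned $RETFIRST. Sleeping 1 second and trying again."
            logger "First flush failed, returned $RETFIRST. Sleeping 1 second and trying again."
            sleep 1
            echo "Flushing again: multipath -f $MPATH"
            multipath -f "$MPATH"
            RETSECOND=$?
            # If the second flush also fails, log an error and exit.
            if [ "$RETSECOND" -ne 0 ]; then
                echo "Second flush failed, returned $RETSECOND."
                logger "Second flush failed, returned $RETSECOND."
                exit 1
            fi
            # Only reached when the second attempt succeeded.
            echo "Second flush succeeded, returned $RETSECOND."
            logger "Second flush succeeded, returned $RETSECOND."
        fi
        # Delete the SCSI devices that backed the flushed map
        # ($DEVICES is left unquoted on purpose so it splits into addresses).
        for DEVICE in $DEVICES; do
            echo "Deleting: echo 1 > /sys/bus/scsi/drivers/sd/$DEVICE/delete"
            echo 1 > "/sys/bus/scsi/drivers/sd/$DEVICE/delete"
        done
    done
    multipath -ll
    sleep 10
    # Rediscover the deleted SCSI devices, then reload the multipath maps.
    rescan-scsi-bus.sh -i
    sleep 2
    multipath -r
done
```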
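The `substr(..., 6, 8)` extraction in the reproducer depends on the exact column layout of `multipath -l` and would truncate longer addresses such as `10:0:0:12`. A sketch of a layout-independent alternative follows, assuming the standard Linux sysfs layout: `/dev/mapper/<name>` is a symlink to the kernel device (for example `../dm-3`), `/sys/block/dm-N/slaves/` lists the path devices of the map, and `/sys/block/sdX/device` links to a SCSI device directory named after its H:C:T:L address. It also assumes the map is named by its WWID under `/dev/mapper` (as on the test host below); with `user_friendly_names` the alias would be passed instead. The function `map_hctls` and its optional root arguments (there only so it can be exercised against a fake tree) are hypothetical.

```bash
# Hypothetical helper: print the H:C:T:L address of every path device of a
# multipath map, read from sysfs instead of parsing `multipath -l` output.
#   $1: map name (WWID or alias) under /dev/mapper
#   $2: sysfs root, default /sys (overridable for testing)
#   $3: /dev root, default /dev (overridable for testing)
map_hctls() {
    local map=$1 sys=${2:-/sys} dev=${3:-/dev}
    local dm slave
    # /dev/mapper/<name> resolves to the kernel name of the map, e.g. dm-3.
    dm=$(basename "$(readlink -f "$dev/mapper/$map")")
    for slave in "$sys/block/$dm/slaves/"*; do
        # Skip the unexpanded glob when the map has no slaves (or no dm name).
        [ -e "$slave" ] || continue
        # /sys/block/sdX/device resolves to .../<H:C:T:L>; print the last component.
        basename "$(readlink -f "$sys/block/$(basename "$slave")/device")"
    done
}
```

In the reproducer this would stand in for the `multipath -l | grep | awk` pipeline as `DEVICES=$(map_hctls "$MPATH")`, still run before the flush, because the map's `slaves` directory goes away with the map.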
Created attachment 1194080
retry check for opened device up to three times.
Instead of failing immediately when dm reports that the device is in use, multipath will check up to 3 times, sleeping 1 second between checks, before failing the remove.
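The retry policy described in this comment (try the remove, and if the device is busy, try again a fixed number of times with a 1-second sleep before each retry) can be illustrated from the outside in shell. This is only a sketch of the policy, not the patch, which implements the check inside multipath itself; the function name `retry_cmd` and its argument order are hypothetical.

```bash
# Hypothetical helper: run a command; if it fails, retry it up to RETRIES
# more times, sleeping 1 second before each retry.
# Returns 0 on the first success, otherwise the exit status of the last attempt.
# Usage: retry_cmd RETRIES command [args...]
retry_cmd() {
    local retries=$1
    shift
    local status
    "$@"
    status=$?
    while [ "$status" -ne 0 ] && [ "$retries" -gt 0 ]; do
        sleep 1
        retries=$((retries - 1))
        "$@"
        status=$?
    done
    return "$status"
}
```

For example, `retry_cmd 3 multipath -f "$MPATH"` makes at most four attempts. Unlike a retry inside multipath, an external wrapper like this cannot tell "busy" apart from any other failure, so it retries every non-zero exit status.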
Created attachment 1195413
Updated retry check patch
This version of the patch also rechecks the number of partitions on each retry. More importantly, it releases the dm context after each call, since holding it was keeping us from getting an updated open count.
Like I mentioned on IRC, the problem with this patch is that it makes the common case (where removing a device fails because it is actually in use) slow. In fact, if someone had a large number of devices that were being used, running "multipath -F" could take minutes. So, I'd like to make these retries optional by adding another command option, "-R". Adding "-R" to a command would make it retry in cases where the device was in use. Does this sound reasonable?

(In reply to Ben Marzinski from comment #10)
Makes sense, sounds logical. I have no problems with it.

Created attachment 1197979
New versions of the retry patch
This tarball contains the RHEL-7.2 and RHEL-7.3 versions of this patch. I have tested both on the machine that can recreate the issue, and both have run for a day without issues. Since I've been able to understand and avoid the problems that the previous patches were having, I have a high degree of confidence that these will work, but you should still verify them yourself, Rodrigo.
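To put a rough number on the concern raised above that "multipath -F" could take minutes: if every map is genuinely in use, each one costs one 1-second sleep per retry before it is given up, so the added delay is roughly maps × retries seconds, ignoring the time the remove attempts themselves take. The device count below is illustrative, not taken from this report, and `worst_case_delay` is a hypothetical name.

```bash
# Rough worst-case extra delay, in seconds, for flushing MAPS maps that are
# all in use, when each failed remove is retried RETRIES times with a
# 1-second sleep before each retry. Ignores the cost of the attempts themselves.
worst_case_delay() {
    local maps=$1 retries=$2
    echo $((maps * retries))
}

worst_case_delay 100 3   # prints 300, i.e. about 5 minutes
```

With retries at 0 the added delay is zero, which is why keeping the current fail-immediately behavior as the default avoids the slowdown for the common case.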
Comment on attachment 1197979
New versions of the retry patch
Unchecking the isPatch flag, so I can download it!
Controlling the number of remove retries will be done by setting "remove_retries" in the defaults section of /etc/multipath.conf. It will default to zero, which is the current behavior. This bug is too late to make 7.3, but I'm fine with releasing the fix as a z-stream.

Hi Benjamin! Do you have the upstream post, so I can use it as the binding point for an OpenStack change request? Thanks!!

Here's the upstream thread: https://www.redhat.com/archives/dm-devel/2016-November/msg00085.html
and here's the upstream commit: http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commit;h=4a2b3e75719f90e356408401d3c43210a0b2e111
But you should know that RHEL multipath is not going to sync with upstream again until the next major version of RHEL. There is enough churn going on right now that even Fedora isn't tracking it.

Ben: Could you check over the way I summarized the doc text for the release notes for this feature?

Looks good.

Verified on device-mapper-multipath-0.4.9-111.el7
[root@storageqe-06 ~]# rpm -qa | grep multipath
device-mapper-multipath-0.4.9-111.el7.x86_64
device-mapper-multipath-libs-0.4.9-111.el7.x86_64
# man multipath.conf
remove_retries This sets how may times multipath will retry removing a device that is in-use. Between each attempt, multipath will sleep 1 second. The default is 0
# multipathd show config | grep remove_retries
remove_retries 0 -------->The default value is 0
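When scripting this check, the effective value can be pulled out of the `multipathd show config` output. A minimal sketch, assuming the option appears as a `remove_retries <n>` line as shown above; the function name is hypothetical, and it falls back to 0, the documented default, when no such line is present.

```bash
# Hypothetical helper: read `multipathd show config` output (or a
# multipath.conf file) on stdin and print the first remove_retries value.
# Prints 0, the documented default, if the option does not appear.
get_remove_retries() {
    awk '$1 == "remove_retries" { print $2; found = 1; exit }
         END { if (!found) print 0 }'
}

# Usage on a live system:
#   multipathd show config | get_remove_retries
```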
Edit /etc/multipath.conf to set remove_retries 3
# cat /etc/multipath.conf
defaults {
find_multipaths no
user_friendly_names yes
disable_changed_wwids yes
remove_retries 3 <--------------
}
[root@storageqe-06 ~]# service multipathd reload
Redirecting to /bin/systemctl reload multipathd.service
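These steps repeat the same manual edit of /etc/multipath.conf with different values (3, 6, 0 and 8). A sketch of a helper that makes the edit non-interactively, assuming GNU sed and a single-line `defaults {` opener as in the file shown above; the function name is hypothetical. It rewrites an existing `remove_retries` line, or inserts one right after the first `defaults {` line; if the file has no defaults section, it leaves the file unchanged. Reloading multipathd is left to the caller.

```bash
# Hypothetical helper: set remove_retries in a multipath.conf-style file.
# Requires GNU sed (-i, -E, the 0,/re/ address, and \n and \t in replacements).
# Usage: set_remove_retries /etc/multipath.conf 3
set_remove_retries() {
    local conf=$1 value=$2
    # Accept only non-negative integers; this also keeps $value safe inside sed.
    case $value in
        ''|*[!0-9]*)
            echo "remove_retries must be a non-negative integer" >&2
            return 1
            ;;
    esac
    if grep -Eq '^[[:space:]]*remove_retries[[:space:]]' "$conf"; then
        # Replace the value on the existing line, keeping its indentation.
        sed -i -E "s/^([[:space:]]*remove_retries)[[:space:]].*/\1 $value/" "$conf"
    else
        # Add a tab-indented remove_retries line after the first "defaults {".
        sed -i -E "0,/^[[:space:]]*defaults[[:space:]]*[{]/s/^([[:space:]]*defaults[[:space:]]*[{])/\1\n\tremove_retries $value/" "$conf"
    fi
}
```

For example, `set_remove_retries /etc/multipath.conf 6` followed by `service multipathd reload` reproduces the second configuration used in this verification.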
[root@storageqe-06 ~]# fdisk -l
Disk /dev/sda: 146.8 GB, 146778685440 bytes, 286677120 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000f0150
Device Boot Start End Blocks Id System
/dev/sda1 2048 4095 1024 83 Linux
/dev/sda2 * 4096 1028095 512000 83 Linux
/dev/sda3 1028096 17545215 8258560 82 Linux swap / Solaris
/dev/sda4 17545216 286676991 134565888 5 Extended
/dev/sda5 17547264 286676991 134564864 83 Linux
Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk label type: dos
Disk identifier: 0x0003afef
Device Boot Start End Blocks Id System
/dev/sdb1 * 2048 2099199 1048576 83 Linux
/dev/sdb2 2099200 41943039 19921920 8e Linux LVM
Disk /dev/sdc: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk label type: dos
Disk identifier: 0x000b9755
Device Boot Start End Blocks Id System
/dev/sdc1 2048 4194303 2096128 8e Linux LVM
Disk /dev/mapper/360a98000324669436c2b45666c56786d: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk label type: dos
Disk identifier: 0x0003afef
Device Boot Start End Blocks Id System
/dev/mapper/360a98000324669436c2b45666c56786d1 * 2048 2099199 1048576 83 Linux
/dev/mapper/360a98000324669436c2b45666c56786d2 2099200 41943039 19921920 8e Linux LVM
Disk /dev/sdd: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/sde: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/sdf: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/sdg: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk label type: dos
Disk identifier: 0x0003afef
Device Boot Start End Blocks Id System
/dev/sdg1 * 2048 2099199 1048576 83 Linux
/dev/sdg2 2099200 41943039 19921920 8e Linux LVM
Disk /dev/sdh: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk label type: dos
Disk identifier: 0x000b9755
Device Boot Start End Blocks Id System
/dev/sdh1 2048 4194303 2096128 8e Linux LVM
Disk /dev/sdj: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/sdk: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/sdl: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk label type: dos
Disk identifier: 0x0003afef
Device Boot Start End Blocks Id System
/dev/sdl1 * 2048 2099199 1048576 83 Linux
/dev/sdl2 2099200 41943039 19921920 8e Linux LVM
Disk /dev/sdm: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk label type: dos
Disk identifier: 0x000b9755
Device Boot Start End Blocks Id System
/dev/sdm1 2048 4194303 2096128 8e Linux LVM
Disk /dev/sdn: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/sdo: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/sdp: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/sdq: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk label type: dos
Disk identifier: 0x0003afef
Device Boot Start End Blocks Id System
/dev/sdq1 * 2048 2099199 1048576 83 Linux
/dev/sdq2 2099200 41943039 19921920 8e Linux LVM
Disk /dev/sdr: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk label type: dos
Disk identifier: 0x000b9755
Device Boot Start End Blocks Id System
/dev/sdr1 2048 4194303 2096128 8e Linux LVM
Disk /dev/sds: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/sdt: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/sdu: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/mapper/rhel_storageqe--06-root: 18.2 GB, 18249416704 bytes, 35643392 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/mapper/rhel_storageqe--06-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/mapper/360a98000324669436c2b45666c567873: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk /dev/mapper/360a98000324669436c2b45666c567875: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
[root@storageqe-06 ~]# mkfs.ext3 /dev/mapper/360a98000324669436c2b45666c567875
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=16 blocks
131072 inodes, 524288 blocks
26214 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=536870912
16 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
[root@storageqe-06 ~]# mount /dev/mapper/360a98000324669436c2b45666c567875 /mnt
[root@storageqe-06 ~]#
[root@storageqe-06 ~]# echo $?
0
[root@storageqe-06 ~]# multipath -f /dev/mapper/360a98000324669436c2b45666c567875
Jun 11 21:44:34 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:44:35 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:44:36 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:44:37 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:44:37 | failed to remove multipath map /dev/mapper/360a98000324669436c2b45666c567875
---------------------------->remove_retries set to 3: one initial attempt plus 3 retries, 1 second apart, before the remove fails.
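The log lines make the semantics of the option visible: with remove_retries set to N, "multipath -f" reports "map in use" N + 1 times (the first attempt plus N retries) before giving up, as the 3-retry and 6-retry runs here show. A sketch of a check that could automate this comparison, assuming the "map in use" message text shown above; the function name is hypothetical.

```bash
# Hypothetical check: given the expected remove_retries value, read the output
# of a failed `multipath -f` on stdin and succeed only if the map was reported
# as in use exactly retries + 1 times.
check_remove_attempts() {
    local retries=$1 attempts
    # grep -c prints 0 (and exits 1) when nothing matches; only the count is used.
    attempts=$(grep -c ': map in use')
    [ "$attempts" -eq $((retries + 1)) ]
}

# Usage (2>&1 captures the messages whichever stream they are written to):
#   multipath -f /dev/mapper/<WWID> 2>&1 | check_remove_attempts 3
```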
Edit /etc/multipath.conf to set remove_retries 6
[root@storageqe-06 ~]# cat /etc/multipath.conf
defaults {
find_multipaths no
user_friendly_names yes
disable_changed_wwids yes
remove_retries 6 <--------------
}
[root@storageqe-06 ~]# service multipathd reload
Redirecting to /bin/systemctl reload multipathd.service
[root@storageqe-06 ~]# multipath -f /dev/mapper/360a98000324669436c2b45666c567875
Jun 11 21:48:35 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:48:36 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:48:37 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:48:38 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:48:39 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:48:40 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:48:41 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:48:41 | failed to remove multipath map /dev/mapper/360a98000324669436c2b45666c567875
----------------------------->remove_retries set to 6: one initial attempt plus 6 retries, 1 second apart, before the remove fails.
Edit /etc/multipath.conf to set remove_retries 0
[root@storageqe-06 ~]# cat /etc/multipath.conf
defaults {
find_multipaths no
user_friendly_names yes
disable_changed_wwids yes
remove_retries 0 <--------------
}
[root@storageqe-06 ~]# service multipathd reload
Redirecting to /bin/systemctl reload multipathd.service
[root@storageqe-06 ~]# multipath -f /dev/mapper/360a98000324669436c2b45666c567875
Jun 11 21:51:34 | /dev/mapper/360a98000324669436c2b45666c567875: map in use
Jun 11 21:51:34 | failed to remove multipath map /dev/mapper/360a98000324669436c2b45666c567875
------------------------------>remove_retries set to 0: the remove fails immediately, with no retries.
Edit /etc/multipath.conf to set remove_retries 8
[root@storageqe-06 ~]# cat /etc/multipath.conf
defaults {
find_multipaths no
user_friendly_names yes
disable_changed_wwids yes
remove_retries 8 <--------------
}
[root@storageqe-06 ~]# service multipathd reload
Redirecting to /bin/systemctl reload multipathd.service
[root@storageqe-06 ~]# umount /dev/mapper/360a98000324669436c2b45666c567875
[root@storageqe-06 ~]# echo $?
0
[root@storageqe-06 ~]# multipath -f /dev/mapper/360a98000324669436c2b45666c567875
[root@storageqe-06 ~]# echo $?
0
----------------------------------->Once the filesystem is unmounted and the multipath device is no longer in use, the remove succeeds.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:1961