Bug 631009
Summary: | removal of all SCSI devices related to an unmapped LUN doesn't remove the multipath device mapping | | | |
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Yoni Tsafir <tsafir> | |
Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> | |
Status: | CLOSED ERRATA | QA Contact: | Lin Li <lilin> | |
Severity: | medium | Docs Contact: | Steven J. Levine <slevine> | |
Priority: | low | |||
Version: | 7.0 | CC: | agk, batkisso, bdonahue, bmarzins, christophe.varoqui, dmoessne, dwysocha, egoggin, fge, heinzm, iheim, jbrassow, junichi.nomura, kueda, lilin, lilu, lmb, msnitzer, nobody, pavel, pep, prajnoha, prockai, pzhukov, soc, tlavigne, tranlan, tvvcox, yanwang | |
Target Milestone: | rc | Keywords: | Triaged | |
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | device-mapper-multipath-0.4.9-78.el7 | Doc Type: | Enhancement | |
Doc Text: |
The "deferred_remove" option has been added to the multipath.conf file. When set to "yes", the multipathd service performs a deferred remove operation when deleting the last path device; the last device is removed after the user closes the device. The default "deferred_remove" value is "no".
|
Story Points: | --- | |
Clone Of: | ||||
: | 1257704 (view as bug list) | Environment: | ||
Last Closed: | 2015-11-19 12:56:01 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 620148, 645519, 730389, 756082, 952099, 1113511, 1133060, 1205790, 1257704 |
Description
Yoni Tsafir
2010-09-07 15:26:16 UTC
Can you give me more information about this? The only thing that has the multipath device open is blkid, correct? Why is udev calling blkid on the multipath device when a SCSI device is getting removed? Were these SCSI devices not working to start with? Is this blkid perhaps hung from the time when this multipath device was first created? Can you please attach the results of running

```
# multipath -ll
```

both before and after removing the SCSI devices?

Hi Ben, terribly sorry for the delay in answering this one. I have no idea why udev is calling blkid on the device. These SCSI devices were working, and blkid isn't hung from the time when the device was first created, because it doesn't run until I perform the operation described above (deleting all relevant /dev/sgXX devices).

###### Mapped a new volume ######

```
[root@rhel6Beta ~]# lsof /dev/mapper/mpatha    # blkid isn't running just after mapping the volume
[root@rhel6Beta ~]# multipath -ll
mpathb (20017380000161f91) dm-0 IBM,2810XIV
size=144G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 1:0:1:1 sdb 8:16  active ready running
  |- 1:0:2:1 sdc 8:32  active ready running
  |- 1:0:3:1 sdd 8:48  active ready running
  |- 2:0:1:1 sde 8:64  active ready running
  |- 2:0:2:1 sdf 8:80  active ready running
  `- 2:0:3:1 sdg 8:96  active ready running
mpatha (20017380000163dcd) dm-6 IBM,2810XIV
size=16G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 1:0:3:2 sdj 8:144 active ready running
  |- 1:0:1:2 sdh 8:112 active ready running
  |- 1:0:2:2 sdi 8:128 active ready running
  |- 2:0:3:2 sdm 8:192 active ready running
  |- 2:0:2:2 sdl 8:176 active ready running
  `- 2:0:1:2 sdk 8:160 active ready running
```

###### Un-mapped the volume ######

```
[root@rhel6Beta ~]# lsof /dev/mapper/mpatha    # blkid still isn't running
[root@rhel6Beta ~]# multipath -ll
mpathb (20017380000161f91) dm-0 IBM,2810XIV
size=144G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 1:0:1:1 sdb 8:16  active ready running
  |- 1:0:2:1 sdc 8:32  active ready running
  |- 1:0:3:1 sdd 8:48  active ready running
  |- 2:0:1:1 sde 8:64  active ready running
  |- 2:0:2:1 sdf 8:80  active ready running
  `- 2:0:3:1 sdg 8:96  active ready running
mpatha (20017380000163dcd) dm-6 IBM,2810XIV
size=16G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 1:0:3:4 sdj 8:144 active faulty running
  |- 1:0:1:4 sdh 8:112 active faulty running
  |- 1:0:2:4 sdi 8:128 active faulty running
  |- 2:0:3:4 sdm 8:192 active faulty running
  |- 2:0:2:4 sdl 8:176 active faulty running
  `- 2:0:1:4 sdk 8:160 active faulty running
[root@rhel6Beta ~]# xiv_fc_admin -R    # rescan; this deletes the faulty /dev/sgXX devices as described above
[root@rhel6Beta ~]# lsof /dev/mapper/mpatha
COMMAND   PID USER FD TYPE DEVICE    SIZE/OFF   NODE NAME
blkid   18308 root  3r  BLK  253,6 0x3ffff0000 107910 /dev/mapper/../dm-6
[root@rhel6Beta ~]# multipath -ll
mpathb (20017380000161f91) dm-0 IBM,2810XIV
size=144G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 1:0:1:1 sdb 8:16  active ready running
  |- 1:0:2:1 sdc 8:32  active ready running
  |- 1:0:3:1 sdd 8:48  active ready running
  |- 2:0:1:1 sde 8:64  active ready running
  |- 2:0:2:1 sdf 8:80  active ready running
  `- 2:0:3:1 sdg 8:96  active ready running
mpatha (20017380000163dcd) dm-6 ,
size=16G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- #:#:#:# -   #:#   failed faulty running
```

Hope this helps. Sorry about the LUN change (from 2 to 4) in the middle; the output is mixed from different runs, but I assure you that apart from that the results are the same.

My best guess at what's happening is this: when the SCSI devices are deleted, multipath needs to reload the device without that path.
When this happens, a change uevent is triggered, which causes the 13-dm-disk.rules udev rule to call blkid. When you are deleting all the SCSI devices:

1. multipathd receives a remove uevent for one of the SCSI devices that make up a multipath device.
2. multipathd removes the SCSI device and reloads the multipath device without it.
3. Reloading the multipath device causes a change uevent to be sent for it.
4. 13-dm-disk.rules calls blkid on the multipath device that was reloaded, which hangs because there are no working paths and the multipath device is currently set to queue_if_no_path.
5. multipathd receives a remove uevent for the last SCSI device that makes up a multipath device; however, the device cannot be removed, since blkid has it open.

In your case:

6. The no_path_retry timeout expires, multipathd fails the blkid IO and closes the device. However, the uevent that should have removed the device has come and gone.

For devices that don't set a timeout for no_path_retry, the IO will never be failed, and without manual intervention blkid will never complete. We need to try to avoid that blkid call. Also, I believe that udev sends out unmount messages. It would be nice for multipathd to remember when it failed to remove the device, so that on unmount it can try again. This wouldn't help your specific issue, since blkid doesn't have the device mounted, but when all paths are lost and the device is mounted, it would be nice if multipathd cleaned it up on unmount. Possibly, though, this should be handled in the kernel so we can catch all closes, not just the unmounts.

Hi Ben,

What you said makes sense; however, we don't have the 13-dm-disk.rules file you talked about, and we couldn't find any other udev rule that calls blkid. So where do you think that blkid call is coming from? In general, when are you planning to resolve this issue? Until then, is there a workaround we can use? Thanks!
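The remove/change uevent sequence described in steps 1-5 can be watched live while the paths are being deleted. This is an illustrative transcript, assuming a root shell on the affected host; the event lines are abbreviated and the device paths follow the session above:

```
[root@rhel6Beta ~]# udevadm monitor --kernel --udev
KERNEL[...] remove /devices/.../1:0:1:2/block/sdh (block)      # path device goes away
KERNEL[...] change /devices/virtual/block/dm-6 (block)         # multipathd reloads the map
UDEV  [...] change /devices/virtual/block/dm-6 (block)         # udev disk rules (blkid) run here
```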
(In reply to comment #6)
> Hi Ben,
>
> What you said makes sense, however we don't have the 13-dm-disk.rules file you
> talked about and we couldn't find any other udev rule that calls blkid.
> So where do you think that blkid call is coming from?

Huh? You don't have /lib/udev/rules.d/13-dm-disk.rules? What release are you using?

> In general, when are you guys planning to resolve this issue?
> Until then - is there a workaround we can do?

Ideally, multipath would have some mechanism for processes to request that their IO not be queued, even if the device was set to queue_if_no_path. Also ideally, multipath would be able to remove devices on the last close if the device had no paths. The second one might be possible in the near term, but I'm doubtful. The first one seems pretty unlikely to happen at all, unless there turns out to be an easy way to co-opt something like O_NONBLOCK to do this, but I don't think so.

A shorter-term solution would be to make sure that blkid isn't called on the device in this case. Another solution would be for multipathd to occasionally check devices with no paths to see if they can be deleted. This would also handle the case where the device was intentionally open when all the paths were lost. However, it wouldn't help in the blkid case if the device was set to queue forever when there are no paths, and there would still be the lag waiting for blkid to fail in your case. Possibly the best answer is to do both: keep blkid from running on change events where we are simply removing paths, and make multipathd occasionally check devices with no paths to see if they can be removed.

As for a workaround: disable queue_if_no_path in /etc/multipath.conf by setting

```
no_path_retry fail
```

in your devices section, or if you don't have one, in your defaults section. This should work around the problem, although there is a race, so it's possible that blkid won't have closed the device by the time the last path is removed.
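As a minimal /etc/multipath.conf sketch of that workaround (section placement as suggested above; a matching entry in a devices section would override this per array type):

```
defaults {
    # Disable queue_if_no_path behavior: fail IO immediately when no
    # paths remain, so a hung blkid can't keep the device open forever.
    no_path_retry fail
}
```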
However, this workaround will fail IOs whenever all of your paths are down. You can avoid this by instead setting

```
flush_on_last_del yes
```

in your defaults section. This will turn off queueing when the last path is deleted. It won't guarantee that the multipath device will be deleted, but the race will be much tighter with this method. Also, in RHEL 6 the SCSI layer will automatically delete devices that have been failed for dev_loss_tmo seconds. To make sure that you don't lose queueing while all your paths are down, you can set

```
dev_loss_tmo 60
fast_io_fail_tmo 5
```

in your defaults section. This will cause the SCSI layer to return IO from failed paths after 5 seconds, and remove the device after 60 seconds. This means that flush_on_last_del will fire after the last path has been down for 60 seconds. Since it appears that your setup is already stopping queueing after 30 seconds, this shouldn't cause any change in how quickly your paths fail back the queued IO. Let me know if either of these helps.

*** Bug 649508 has been marked as a duplicate of this bug. ***

> Huh? you don't have
>
> /lib/udev/rules.d/13-dm-disk.rules
>
> what release are you using?

Oops, turns out I do have it; I looked in the wrong place.

> no_path_retry fail
> flush_on_last_del yes
> dev_loss_tmo 60
> fast_io_fail_tmo 5
>
> Let me know if either of these helps.

OK, so when setting all four of these, everything works fine. But when setting only the bottom three, leaving 'no_path_retry 5', the same problem still happens, which as you said means a change in behavior when there are no paths available, and we don't want that... Any suggestions?

> But when setting only the bottom three, leaving 'no_path_retry 5', the same
> problem still happens, which as you said means change in behavior when there
> are no paths available, and we don't want that...
>
> Any suggestions?
In that case, until I write some code to make multipathd automatically prune these devices every so often, you'll need to manually run

```
# multipath -f
```

after the device has stopped queuing. This bug is scheduled to be fixed in 6.1.
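For example, using the mpatha map from the transcript earlier (the device name is illustrative; substitute your own map):

```
[root@rhel6Beta ~]# multipath -ll mpatha    # confirm the map has no usable paths left
[root@rhel6Beta ~]# multipath -f mpatha     # flush the stale map by hand
```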
Does the blkid call actually change anything, or does it just refresh information it already cached with identical information? (Flags can be set on a reload to control which subset of udev rules is run.) Peter? And should blkid be changed to check that a device is accessible before trying to read from it?

(In reply to comment #11)
> Does the blkid call actually change anything, or just refresh information it
> already cached with identical information? (Flags can be set on a reload to
> control which subset of udev rules is run.)

blkid is just called on every change uevent unless it is flagged out explicitly by DM_UDEV_DISABLE_DISK_RULES_FLAG. I don't think there's any more information that blkid can acquire in this particular situation (if blkid hangs, other tools can't make any changes either, so blkid would not gain any benefit from that scan, I think). Then it's about identifying such a situation directly in the rules somehow (as we already catch a few situations in 10-dm.rules), or, if possible, setting the DM_UDEV_DISABLE_DISK_RULES_FLAG flag through libdevmapper on the device reload which generates the uevent, as you mention (so identifying that the last usable path has just been removed).

Multipath now sets DM_UDEV_DISABLE_DISK_RULES_FLAG when it reloads the table after a path has been deleted. This keeps blkid from firing at all in these cases. There are still cases that could benefit from multipathd occasionally trying to remove devices that have no paths (or possibly monitoring closes with inotify), but that work isn't happening for 6.1.

The fix for this bug caused a regression (Bug 677937). The problem is that if a multipath or kpartx device was created in the initramfs, the udev disk rules need to be run again after the actual root device is mounted, to set up all the symlinks. However, since the devices have already been created, this fix always sets DM_UDEV_DISABLE_DISK_RULES_FLAG when it reloads them.
To solve this, I need an option for kpartx and multipath that lets them override this behavior. This option will be used by rc.sysinit when it calls multipath and kpartx.

Any news about this? I see RHEL 6.1 is out and this wasn't fixed yet...

multipath and kpartx now have a -u option that will force the udev dm-disk rules to be run for reloads of existing devices.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Whenever a multipath device table was reloaded, udev would regather information about the device with blkid. If the device had no usable paths and was set to queue IO, this would cause blkid to hang forever, keeping the device open. Reloading an already existing multipath device no longer triggers these udev rules, so blkid no longer keeps failed devices open.

Even with the -u option, the fix for this caused yet another regression. Apparently, if blkid doesn't run every time the device gets a change event, that information is removed from the udev database. This causes utilities that rely on the udev database for information about the device to not work correctly. So I'm backing this fix out. It should be possible to have 13-dm-disk.rules fix this by calling IMPORT{db} to repopulate the udev database when a change event comes in with DM_UDEV_DISABLE_DISK_RULES_FLAG set.

The occurrence of this issue has been greatly reduced. Fixing it completely involves having the waiter daemon occasionally check devices with no paths to see if they can be removed. This is work that should get done in RHEL 7 first, and then possibly backported to RHEL 6. This will actually get handled using the new DM_DEFERRED_REMOVE flag.

http://www.redhat.com/archives/dm-devel/2013-September/msg00074.html

This request was resolved in Red Hat Enterprise Linux 7.0.
Contact your manager or support representative in case you have further questions about the request.

The comment above is incorrect. The correct version is below. I'm sorry for any inconvenience.

---------------------------------------------------------------

This request was NOT resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you need to escalate this bug.

Fixed this using the new DM_DEFERRED_REMOVE flag. When deferred_remove is set, multipathd will now use deferred removes when the last path is deleted. If a new path is added before the deferred remove completes, it is cancelled.

Hi. I had the same case before. You need to restart the multipath service. After restarting it, you should see the multipath disk device/LUN information removed from the "multipath -ll" output. Then you should be able to run "blkid" successfully without the system hanging.

I want to correct my message in comment #36. Because I couldn't find a way to edit comment #36, I am writing this as a new comment. If you UNMAP LUNs/disk devices, you should clean up afterwards with "dmsetup remove <device path>":

```
# service multipathd stop
# dmsetup remove /dev/mapper/mpathep1
# dmsetup remove /dev/mapper/mpathep3
# dmsetup remove /dev/mapper/mpathep2
# dmsetup remove /dev/mapper/mpathe
# service multipathd restart
```

That's all.

Change to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2132.html
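The deferred_remove option introduced by the fix can be enabled with a minimal /etc/multipath.conf fragment. This is a sketch based on the Doc Text above (the default value is "no"):

```
defaults {
    # Perform a deferred remove when deleting the last path device;
    # the multipath device is then removed once its last opener closes it.
    deferred_remove yes
}
```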