Bug 570359
Summary: | "lvremove -f" fails to remove an active logical volume | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Michael Solberg <msolberg>
Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha>
Status: | CLOSED ERRATA | QA Contact: | Corey Marthaler <cmarthal>
Severity: | medium | Priority: | low
Version: | 6.0 | Target Milestone: | rc
Hardware: | All | OS: | Linux
CC: | acathrow, agk, ajia, bdwheele, coughlan, davidz, dwysocha, heinzm, herrold, jbrassow, kueda, liko, mbroz, mfuruta, myamazak, prajnoha, prockai, tyasui | |
Fixed In Version: | lvm2-2.02.86-1.el6 | Doc Type: | Bug Fix
Doc Text: |
An lvremove command could fail to remove a logical volume. The failure was caused by the processing of an asynchronous udev event that kept the volume open while the lvremove command was trying to remove it. These asynchronous events are triggered when the 'watch' udev rule is applied (the rule is set for device-mapper/LVM2 devices when the 'udisks' package, which installs /lib/udev/rules.d/80-udisks.rules, is installed).
To fix this issue, the number of device open calls in read-write mode has been minimized, and read-only mode is now used internally wherever possible (the event is generated when a device that has the 'watch' rule set is closed after having been opened in read-write mode).
Although this fixes the problem for devices opened internally during command execution, the failure can still occur when several commands are run in quick succession and each one opens a device in read-write mode and then closes it immediately (for example, in a script). In this case, users are advised to run the 'udevadm settle' command between the commands.
|
Last Closed: | 2011-12-06 16:52:23 UTC | |
Bug Blocks: | 658636, 702260, 703492 | |
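The Doc Text above traces the failure to the udisks 'watch' rule for dm devices. The bash sketch below checks whether that rule is active in a rules file and writes a copy with it commented out, which is the experiment run in the comments below. The function names are hypothetical. The pattern assumes the rule is written exactly as quoted in this report (KERNEL=="dm-*", OPTIONS+="watch"). Installing the copy under /etc/udev/rules.d assumes the local udev gives same-named files there precedence over /lib/udev/rules.d (check udev(7) on the target system). Whether disabling the rule has side effects for udisks or gnome-disk-utility is a question this report leaves open.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. The match assumes the rule text quoted in this report:
#   KERNEL=="dm-*", OPTIONS+="watch"

# Succeeds (exit 0) if FILE contains an uncommented dm-* watch rule.
has_dm_watch_rule() {
    grep -Eq '^[[:space:]]*KERNEL=="dm-\*".*OPTIONS\+="watch"' "$1"
}

# Copy SRC to DST, prefixing every uncommented dm-* watch rule with '#'.
# All other lines are copied unchanged.
disable_dm_watch_rule() {
    sed -E 's/^([[:space:]]*KERNEL=="dm-\*".*OPTIONS\+="watch")/#\1/' "$1" > "$2"
}

# Example (as root; assumes /etc/udev/rules.d overrides the same file name
# in /lib/udev/rules.d on this udev version):
#   f=/lib/udev/rules.d/80-udisks.rules
#   if has_dm_watch_rule "$f"; then
#       disable_dm_watch_rule "$f" /etc/udev/rules.d/80-udisks.rules
#   fi
```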
Description
Michael Solberg
2010-03-04 01:10:27 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Hmm, I tried to reproduce this but did not manage to get the reported error. Please try to rerun the failing commands with verbose output ("-vvvv") and attach the output here. Please also attach the output of the "lvmdump" command. Thanks.

Created attachment 397867
lvremove -vvvv -f /dev/VolGroup00/Test1
Created attachment 397869
lvmdump
Comment on attachment 397867
lvremove -vvvv -f /dev/VolGroup00/Test1
Bah. This is the wrong command.
Created attachment 397871
lvremove -vvvv -f /dev/VolGroup00/Test1
This is the correct output.
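The data requested at the start of this thread (verbose lvremove output plus an lvmdump) can be gathered in one step. This is a hedged bash sketch: collect_lvremove_debug and the log file name are made up for illustration, it must run as root, and it really does run "lvremove -f", so the LV is gone if the removal succeeds. It returns lvremove's exit status so the caller can tell whether the failure was reproduced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect verbose lvremove output and an lvmdump for one LV.
# WARNING: this really runs "lvremove -f"; the LV is removed if that succeeds.
collect_lvremove_debug() {
    local lv=$1 outdir=$2 rc=0   # lv e.g. VolGroup00/Test1
    mkdir -p "$outdir" || return 1
    # Capture stdout and stderr together so the -vvvv debug messages land in the log.
    lvremove -vvvv -f "$lv" > "$outdir/lvremove-vvvv.log" 2>&1 || rc=$?
    # "lvmdump -d DIR" writes the raw dump tree into DIR instead of a tarball.
    lvmdump -d "$outdir/lvmdump" || echo "lvmdump failed" >&2
    return "$rc"
}

# Example (as root):
#   collect_lvremove_debug VolGroup00/Test1 /tmp/lvremove-debug
```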
Well, if the logs are right, then "_deactivate_node" is not called at all (and that is the function responsible for issuing the actual remove ioctl). Otherwise, we would see a log line like this:

    #libdm-deptree:865 Removing VolGroup00-Test1 (<major>:<minor>)

This means that lvremove gets into an erroneous state just after the dependency tree is built and before the actual ioctl is called. The log does not seem to capture the exact cause of the error, though; we probably need to add more information there for future debugging. I'll try to inspect the surrounding code manually and see what the possible cause could be...

Just to be sure, could you please try to reproduce this lvremove problem with the udev daemon killed as well? ("killall udevd"; you can restart it afterwards with "udevd --daemon".) That way we can see whether udev interferes somehow again...

Created attachment 409177
Error output of lvremove -vvvv -f /dev/VolGroup00/test2 without udev running.
I was able to remove the lv with udev dead.
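Because the failure is an intermittent race, a single successful lvremove (with or without udevd) says little. The bash sketch below repeats create/remove cycles on a scratch LV and prints the failure count, so runs with udevd running and with it killed can be compared. It assumes root, a volume group with at least 100 MiB free, and that the hypothetical LV name race_test is unused. After a failed removal it runs 'udevadm settle' and retries, so the next cycle starts clean.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create and force-remove a scratch LV N times and
# print "failures/N". Requires root and a VG with >= 100 MiB free.
count_lvremove_failures() {
    local vg=$1 n=${2:-20} i=0 failures=0
    while [ "$i" -lt "$n" ]; do
        lvcreate -L 100M -n race_test "$vg" > /dev/null || return 2
        if ! lvremove -f "$vg/race_test" > /dev/null 2>&1; then
            failures=$((failures + 1))
            # Let udev finish its queued events, then clean up for the next cycle.
            udevadm settle
            lvremove -f "$vg/race_test" > /dev/null || return 2
        fi
        i=$((i + 1))
    done
    echo "$failures/$n"
}

# Example: run once with udevd running and once after "killall udevd"
# (restart it afterwards with "udevd --daemon"):
#   count_lvremove_failures VolGroup00 50
```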
OK, so let's try to narrow it down. Do you have the "udisks" package installed? If so, could you please try to comment out this one rule in /lib/udev/rules.d/80-udisks.rules:

    #KERNEL=="dm-*", OPTIONS+="watch"

...and see if you can reproduce the problem (with the udev daemon running this time, of course). Thanks.

...and please also try to reproduce the problem again with that udev rule uncommented afterwards, so we are sure we don't have a false positive... I would do it myself, but I had no luck reproducing this on my own test machine, so I have to rely on you :)

I'm able to remove with the line commented. Also - if I create the volume with the line commented and then uncomment the line, I can remove the volume. However, I can still reproduce the error with the line uncommented.

OK, thanks a lot for testing this!

(In reply to comment #12)
> I'm able to remove with the line commented. Also - if I create the volume with

So the "watch" rule is applied on the CHANGE udev event, and that event occurs when a new device-mapper device is created... (I assume you had that rule commented out while creating the device as well.)

> the line commented and then uncomment the line, I can remove the volume.

...yes, because the inotify watch was not registered for that device when it was created...

> However, I can still reproduce the error with the line uncommented.

...and yes, here it comes again. So it seems that the "watch" rule interferes again. Unfortunately, we don't have a solution for this yet. But it seems we *really* need to prevent the use of the watch rule in any other rules that process device-mapper devices until we have a solution for proper synchronization (if that is possible at all). (See also bug #577798)

So - what's the downside of me leaving that line commented? Does it just break the gnome-disk-utility?

(In reply to comment #14)
> So - what's the downside of me leaving that line commented? Does it just break
> the gnome-disk-utility?
Well, CC-ing David; I think he can provide a better answer to this question...

> #KERNEL=="dm-*", OPTIONS+="watch"

As I reported in bug 591606, an I/O to a 'dm-*' device generates a lot of unexpected I/O to that device. Here is a sample I/O trace (see bug 591606 for the I/O tracer):

    <command>
    # dd if=/dev/zero of=/dev/dm-0 bs=4096 count=1

    <I/O trace>
    [9:0:0:0] command=0x2a size=0x1000 sector=0x180
    [9:0:0:0] command=0x28 size=0x1000 sector=0x180
    [9:0:0:0] command=0x28 size=0x1000 sector=0x1b8
    [9:0:0:0] command=0x28 size=0x1000 sector=0x6100
    [9:0:0:0] command=0x28 size=0x1000 sector=0x6170
    [9:0:0:0] command=0x28 size=0x1000 sector=0x188
    [9:0:0:0] command=0x28 size=0x1000 sector=0x6178
    [9:0:0:0] command=0x28 size=0x1000 sector=0x6078
    [9:0:0:0] command=0x28 size=0x1000 sector=0x6140
    [9:0:0:0] command=0x28 size=0x1000 sector=0x6080
    [9:0:0:0] command=0x28 size=0x1000 sector=0x5ff0
    [9:0:0:0] command=0x28 size=0x1000 sector=0x980
    [9:0:0:0] command=0x28 size=0x1000 sector=0x198
    [9:0:0:0] command=0x28 size=0x1000 sector=0x1f8
    [9:0:0:0] command=0x28 size=0x1000 sector=0x190
    [9:0:0:0] command=0x28 size=0x1000 sector=0x200
    [9:0:0:0] command=0x28 size=0x1000 sector=0x1c0
    [9:0:0:0] command=0x28 size=0x1000 sector=0x380
    [9:0:0:0] command=0x28 size=0x1000 sector=0x1a0
    [9:0:0:0] command=0x28 size=0x1000 sector=0x1180
    [9:0:0:0] command=0x28 size=0x1000 sector=0x180

Why is the rule 'KERNEL=="dm-*", OPTIONS+="watch"' set by default? This rule changes system behavior compared to RHEL 5. Commenting out this rule also solves bug 591606.

This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. It has been denied for the current Red Hat Enterprise Linux release. ** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

What am I missing to reproduce this issue?
    [root@grant-01 ~]# lvs -a -o +devices
      LV                VG    Attr   LSize  Log         Copy%  Devices
      mirror            grant mwi-a- 52.00m mirror_mlog 100.00 mirror_mimage_0(0),mirror_mimage_1(0)
      [mirror_mimage_0] grant iwi-ao 52.00m                    /dev/sdb1(0)
      [mirror_mimage_1] grant iwi-ao 52.00m                    /dev/sdb2(0)
      [mirror_mlog]     grant lwi-ao  4.00m                    /dev/sdc3(0)
    [root@grant-01 ~]# lvremove -f /dev/grant/mirror
      Logical volume "mirror" successfully removed

    2.6.32-71.el6.x86_64

    lvm2-2.02.72-8.el6_0.4 BUILT: Thu Dec 9 09:46:33 CST 2010
    lvm2-libs-2.02.72-8.el6_0.4 BUILT: Thu Dec 9 09:46:33 CST 2010
    lvm2-cluster-2.02.72-8.el6_0.4 BUILT: Thu Dec 9 09:46:33 CST 2010
    udev-147-2.29.el6 BUILT: Tue Aug 31 16:44:10 CDT 2010
    device-mapper-1.02.53-8.el6_0.4 BUILT: Thu Dec 9 09:46:33 CST 2010
    device-mapper-libs-1.02.53-8.el6_0.4 BUILT: Thu Dec 9 09:46:33 CST 2010
    device-mapper-event-1.02.53-8.el6_0.4 BUILT: Thu Dec 9 09:46:33 CST 2010
    device-mapper-event-libs-1.02.53-8.el6_0.4 BUILT: Thu Dec 9 09:46:33 CST 2010
    cmirror-2.02.72-8.el6_0.4 BUILT: Thu Dec 9 09:46:33 CST 2010

(In reply to comment #20)
> What am I missing to reproduce this issue?

It's a race, and it's not 100% reproducible. As far as we know, the race is introduced by the "udisks" package, whose "/lib/udev/rules.d/80-udisks.rules" file applies the "watch" rule to DM devices. That rule is the source of the events we cannot yet synchronize with.

Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

*** Bug 638711 has been marked as a duplicate of this bug. ***

We've applied a patch upstream that tries to minimize device RW open calls within LVM itself.
This should also prevent the events based on the watch rule from being fired when not necessary, at least with respect to LVM's internal handling of devices: https://www.redhat.com/archives/lvm-devel/2011-May/msg00025.html (LVM2 v2.02.86)

However, there is still a possibility that someone else, externally, will open a device for read-write and close it (which causes the uevent to occur) just before the device is removed, so we could end up with the same problem as reported here. In this case, we have no control over this asynchronicity. (For more on the trouble with the watch rule and related information, see also https://bugzilla.redhat.com/show_bug.cgi?id=561424)

Adding QA ack for 6.2.

Based on comments #21 and #24, a definitive reproducer for this defect does not exist. This bug will most likely be marked verified (SanityOnly) once final 6.2 regression testing has been completed.

*** Bug 721122 has been marked as a duplicate of this bug. ***

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: the same text as the Doc Text field at the top of this report.

*** Bug 700128 has been marked as a duplicate of this bug. ***

QA was never able to reproduce this issue. Marking verified (SanityOnly).

    [root@taft-02 ~]# lvcreate -L 100M -n LV taft
      Logical volume "LV" created
    [root@taft-02 ~]# lvs -a -o +devices
      LV   VG   Attr   LSize   Devices
      LV   taft -wi-a- 100.00m /dev/sdb1(0)
    [root@taft-02 ~]# lvremove -f taft/LV
      Logical volume "LV" successfully removed

    2.6.32-192.el6.x86_64

    lvm2-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
    lvm2-libs-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
    lvm2-cluster-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
    udev-147-2.37.el6 BUILT: Wed Aug 10 07:48:15 CDT 2011
    device-mapper-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
    device-mapper-libs-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
    device-mapper-event-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
    device-mapper-event-libs-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
    cmirror-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1522.html
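The workaround from the Doc Text (run 'udevadm settle' between commands that open a device read-write, for example in a script) can be wrapped in a small helper. This is a bash sketch, not part of lvm2: settled and the SETTLE_CMD override (there only so the helper can be exercised without udev) are hypothetical names. The helper returns the wrapped command's exit status rather than that of udevadm settle, so error handling in the calling script still sees the LVM result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, then wait until udev has processed the
# events it may have queued (e.g. a "change" event from the watch rule after a
# read-write open/close) before the next command starts.
settled() {
    local rc=0
    "$@" || rc=$?
    # SETTLE_CMD is a hypothetical test override; the default is "udevadm settle".
    # The expansion is left unquoted on purpose so it splits into command + args.
    ${SETTLE_CMD:-udevadm settle} || echo "settled: udev settle failed" >&2
    return "$rc"
}

# Example sequence (as root; VG/LV names from the QA comment above):
#   settled lvcreate -L 100M -n LV taft
#   settled lvremove -f taft/LV
```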