Bug 570359

Summary: "lvremove -f" fails to remove an active logical volume
Product: Red Hat Enterprise Linux 6 Reporter: Michael Solberg <msolberg>
Component: lvm2 Assignee: Peter Rajnoha <prajnoha>
Status: CLOSED ERRATA QA Contact: Corey Marthaler <cmarthal>
Severity: medium Docs Contact:
Priority: low    
Version: 6.0 CC: acathrow, agk, ajia, bdwheele, coughlan, davidz, dwysocha, heinzm, herrold, jbrassow, kueda, liko, mbroz, mfuruta, myamazak, prajnoha, prockai, tyasui
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.86-1.el6 Doc Type: Bug Fix
Doc Text:
Issuing an lvremove command could fail to remove a logical volume. The failure is caused by an asynchronous udev event that keeps the volume open while the lvremove command tries to remove it. These asynchronous events are triggered when the 'watch' udev rule is applied (it is set for device-mapper/LVM2 devices when the 'udisks' package, which installs /lib/udev/rules.d/80-udisks.rules, is installed). To fix this issue, the number of device open calls in read-write mode has been minimized and read-only mode is used internally where possible (the event is generated when a device that has the 'watch' rule set is closed after a read-write open). Although this fixes the problem for devices opened internally during command execution, the failure can still occur when several commands are run in quick succession, each of which opens a device read-write and then closes it immediately (e.g. in a script). In this case, users are advised to run the 'udevadm settle' command between them.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 16:52:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 658636, 702260, 703492    
Attachments:
Description                                                                   Flags
lvremove -vvvv -f /dev/VolGroup00/Test1                                       none
lvmdump                                                                       none
lvremove -vvvv -f /dev/VolGroup00/Test1                                       none
Error output of lvremove -vvvv -f /dev/VolGroup00/test2 without udev running. none

Description Michael Solberg 2010-03-04 01:10:27 UTC
Description of problem:
lvremove doesn't seem to honor the "-f" flag.

Version-Release number of selected component (if applicable):
lvm2-2.02.61-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a logical volume.
2. Run "lvremove -f" on the logical volume.
  
Actual results:
[root@localhost ~]# lvremove -f /dev/VolGroup00/Test1 
  The link /dev/VolGroup00/Test1 should have been removed by udev but it is still present. Falling back to direct link removal.
  Unable to deactivate logical volume "Test1"
[root@localhost ~]# lvremove -f /dev/mapper/VolGroup00-Test1 
  The link /dev/VolGroup00/Test1 should have been removed by udev but it is still present. Falling back to direct link removal.
  Unable to deactivate logical volume "Test1"

Expected results:
The volume should be removed without prompting.

Additional info:
Without the -f, the volume can be removed:
[root@localhost ~]# lvremove /dev/VolGroup00/Test1 
Do you really want to remove active logical volume Test1? [y/n]: y
  Logical volume "Test1" successfully removed

Comment 2 RHEL Program Management 2010-03-04 01:40:37 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Peter Rajnoha 2010-03-04 12:31:34 UTC
Hmm, I tried to reproduce this but could not get the reported error. Please rerun the failing commands with verbose output ("-vvvv") and attach the output here. Please also attach the output of the "lvmdump" command. Thanks.

Comment 4 Michael Solberg 2010-03-04 17:05:14 UTC
Created attachment 397867 [details]
lvremove -vvvv -f /dev/VolGroup00/Test1

Comment 5 Michael Solberg 2010-03-04 17:05:43 UTC
Created attachment 397869 [details]
lvmdump

Comment 6 Michael Solberg 2010-03-04 17:08:22 UTC
Comment on attachment 397867 [details]
lvremove -vvvv -f /dev/VolGroup00/Test1

Bah.  This is the wrong command.

Comment 7 Michael Solberg 2010-03-04 17:09:22 UTC
Created attachment 397871 [details]
lvremove  -vvvv -f /dev/VolGroup00/Test1

This is the correct output.

Comment 8 Peter Rajnoha 2010-04-26 12:33:56 UTC
Well, if the logs are right, then "_deactivate_node" is not called at all (and that is the function responsible for issuing the actual remove ioctl). Otherwise, we would see a log line like this:

  "#libdm-deptree:865    Removing VolGroup00-Test1 (<major>:<minor>)"

This means that lvremove runs into an error just after the dependency tree is built and before the actual ioctl is called (the log does not seem to capture the exact cause of the error, though; we probably need to add more information there for future debugging).

I'll inspect the surrounding code manually and see what the cause could be...

Just to be sure, could you please also try to reproduce this lvremove problem with the udev daemon killed ("killall udevd"; you can restart it afterwards with "udevd --daemon")? That way we can see whether udev is interfering again...

Comment 9 Michael Solberg 2010-04-26 13:56:29 UTC
Created attachment 409177 [details]
Error output of lvremove -vvvv -f /dev/VolGroup00/test2 without udev running.

I was able to remove the lv with udev dead.

Comment 10 Peter Rajnoha 2010-04-26 14:15:28 UTC
OK, so let's try to narrow it down.

Do you have the "udisks" package installed? If so, could you please comment out this rule in /lib/udev/rules.d/80-udisks.rules:

  #KERNEL=="dm-*", OPTIONS+="watch"

...and see if you can reproduce the problem (with the udev daemon running, of course). Thanks.
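To check whether the rule in question is still active on a given machine, one can scan the rules file for an uncommented line that applies OPTIONS+="watch" to dm-* devices. The Python sketch below assumes the rule is written on a single line in the form quoted above; the helper name has_active_dm_watch_rule is hypothetical and is not part of udev or udisks.

```python
import re

# Hypothetical helper. Matches an active (not commented-out) udev rule line
# that applies the "watch" option to device-mapper nodes, assuming the
# one-line form  KERNEL=="dm-*", OPTIONS+="watch"  used in 80-udisks.rules.
_DM_WATCH_RULE = re.compile(r'^\s*KERNEL=="dm-\*".*OPTIONS\+="watch"')


def has_active_dm_watch_rule(rules_text: str) -> bool:
    """Return True if any uncommented line sets the watch option for dm-* devices."""
    return any(_DM_WATCH_RULE.search(line) for line in rules_text.splitlines())


if __name__ == "__main__":
    path = "/lib/udev/rules.d/80-udisks.rules"
    try:
        with open(path) as f:
            active = has_active_dm_watch_rule(f.read())
        print(f"{path}: dm-* watch rule {'ACTIVE' if active else 'not active'}")
    except FileNotFoundError:
        print(f"{path} not found (udisks not installed?)")
```

A line starting with "#" never matches, so the commented-out form suggested above is reported as inactive.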

Comment 11 Peter Rajnoha 2010-04-26 14:23:11 UTC
...and after that, please also try to reproduce the problem again with the udev rule uncommented, so that we can be sure this isn't a false positive... I would do it myself, but I haven't managed to reproduce this on my own test machine, so I have to rely on you :)

Comment 12 Michael Solberg 2010-04-26 14:26:11 UTC
I'm able to remove the volume with the line commented out. Also, if I create the volume with the line commented out and then uncomment the line, I can remove the volume. However, I can still reproduce the error with the line uncommented.

Comment 13 Peter Rajnoha 2010-04-26 14:59:20 UTC
OK, thanks a lot for testing this!

(In reply to comment #12)
> I'm able to remove with the line commented.  Also - if I create the volume with

So the "watch" rule is applied on a CHANGE udev event, which happens when a new device-mapper device is created... (I assume you had that rule commented out while creating the device as well.)

> the line commented and then uncomment the line, I can remove the volume. 

...yes, because no inotify watch was registered for that device when it was created...

> However, I can still reproduce the error with the line uncommented.

...and yes, here it comes again.

So it seems that the "watch" rule interferes again. Unfortunately, we don't have a solution for this yet. But it seems we *really* need to prevent the use of the watch rule in any other rules that process device-mapper devices until we have a solution for proper synchronization (if that is possible at all).

(See also bug #577798)

Comment 14 Michael Solberg 2010-04-26 15:08:27 UTC
So, what's the downside of leaving that line commented out? Does it just break gnome-disk-utility?

Comment 15 Peter Rajnoha 2010-04-26 16:32:27 UTC
(In reply to comment #14)
> So - what's the downside of me leaving that line commented?  Does it just break
> the gnome-disk-utility?    

Well, CC-ing David; I think he can provide a better answer to this question...

Comment 16 Takahiro Yasui 2010-05-13 16:36:42 UTC
>  #KERNEL=="dm-*", OPTIONS+="watch"

As I reported in bug 591606, a single I/O to a 'dm-*' device generates a lot of
unexpected I/O to the device. Here is a sample I/O trace (see bug 591606 for
the I/O tracer):

<command>
# dd if=/dev/zero of=/dev/dm-0 bs=4096 count=1

<I/O trace>
[9:0:0:0] command=0x2a size=0x1000 sector=0x180
[9:0:0:0] command=0x28 size=0x1000 sector=0x180
[9:0:0:0] command=0x28 size=0x1000 sector=0x1b8
[9:0:0:0] command=0x28 size=0x1000 sector=0x6100
[9:0:0:0] command=0x28 size=0x1000 sector=0x6170
[9:0:0:0] command=0x28 size=0x1000 sector=0x188
[9:0:0:0] command=0x28 size=0x1000 sector=0x6178
[9:0:0:0] command=0x28 size=0x1000 sector=0x6078
[9:0:0:0] command=0x28 size=0x1000 sector=0x6140
[9:0:0:0] command=0x28 size=0x1000 sector=0x6080
[9:0:0:0] command=0x28 size=0x1000 sector=0x5ff0
[9:0:0:0] command=0x28 size=0x1000 sector=0x980
[9:0:0:0] command=0x28 size=0x1000 sector=0x198
[9:0:0:0] command=0x28 size=0x1000 sector=0x1f8
[9:0:0:0] command=0x28 size=0x1000 sector=0x190
[9:0:0:0] command=0x28 size=0x1000 sector=0x200
[9:0:0:0] command=0x28 size=0x1000 sector=0x1c0
[9:0:0:0] command=0x28 size=0x1000 sector=0x380
[9:0:0:0] command=0x28 size=0x1000 sector=0x1a0
[9:0:0:0] command=0x28 size=0x1000 sector=0x1180
[9:0:0:0] command=0x28 size=0x1000 sector=0x180
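To make the effect in such a trace easier to see, the commands can be tallied by SCSI opcode: 0x2a is WRITE(10) and 0x28 is READ(10), so the single 4 KiB write from dd is followed by a long series of reads. The Python sketch below assumes the tracer prints one command per line in the "command=0x.. size=0x.. sector=0x.." layout shown above; the function name count_commands is hypothetical.

```python
import re
from collections import Counter

# Standard SCSI CDB opcodes that appear in the trace above.
SCSI_OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# Assumes the tracer's line layout: "[h:c:t:l] command=0x.. size=0x.. sector=0x.."
_TRACE_LINE = re.compile(r"command=0x([0-9a-fA-F]+)")

# First three lines of the trace above, used as sample input.
SAMPLE = """[9:0:0:0] command=0x2a size=0x1000 sector=0x180
[9:0:0:0] command=0x28 size=0x1000 sector=0x180
[9:0:0:0] command=0x28 size=0x1000 sector=0x1b8"""


def count_commands(trace: str) -> Counter:
    """Count traced SCSI commands by name; unknown opcodes are keyed by hex value."""
    counts: Counter = Counter()
    for line in trace.splitlines():
        match = _TRACE_LINE.search(line)
        if match:
            opcode = int(match.group(1), 16)
            counts[SCSI_OPCODES.get(opcode, hex(opcode))] += 1
    return counts


if __name__ == "__main__":
    print(count_commands(SAMPLE))
```

Pasting the full trace into count_commands shows the ratio of reads to writes at a glance.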

Why is the rule 'KERNEL=="dm-*", OPTIONS+="watch"' set by default?
This rule changes the behavior of the system compared to RHEL5.
Commenting out this rule also solves bug 591606.

Comment 17 RHEL Program Management 2010-07-15 14:25:16 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 20 Corey Marthaler 2011-01-10 23:08:54 UTC
What am I missing to reproduce this issue? 

[root@grant-01 ~]# lvs -a -o +devices
  LV                VG         Attr   LSize  Log         Copy%  Devices
  mirror            grant      mwi-a- 52.00m mirror_mlog 100.00 mirror_mimage_0(0),mirror_mimage_1(0)
  [mirror_mimage_0] grant      iwi-ao 52.00m                    /dev/sdb1(0)
  [mirror_mimage_1] grant      iwi-ao 52.00m                    /dev/sdb2(0)
  [mirror_mlog]     grant      lwi-ao  4.00m                    /dev/sdc3(0)

[root@grant-01 ~]# lvremove -f /dev/grant/mirror
  Logical volume "mirror" successfully removed

2.6.32-71.el6.x86_64

lvm2-2.02.72-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
lvm2-libs-2.02.72-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
lvm2-cluster-2.02.72-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
udev-147-2.29.el6    BUILT: Tue Aug 31 16:44:10 CDT 2010
device-mapper-1.02.53-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
device-mapper-libs-1.02.53-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
device-mapper-event-1.02.53-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
device-mapper-event-libs-1.02.53-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
cmirror-2.02.72-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010

Comment 21 Peter Rajnoha 2011-01-11 09:14:25 UTC
(In reply to comment #20)
> What am I missing to reproduce this issue? 

It's a race, so it is not 100% reproducible. As far as we know, the race is introduced by the "udisks" package, whose "/lib/udev/rules.d/80-udisks.rules" file applies the "watch" rule to DM devices. That is the source of the events we can't synchronize with yet.

Comment 22 Suzanne Logcher 2011-03-28 21:07:55 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains 
unresolved, it has been rejected as it is not proposed as an 
exception or blocker.

Red Hat invites you to ask your support representative to 
propose this request, if appropriate and relevant, in the 
next release of Red Hat Enterprise Linux.

Comment 23 Peter Rajnoha 2011-05-30 09:33:50 UTC
*** Bug 638711 has been marked as a duplicate of this bug. ***

Comment 24 Peter Rajnoha 2011-05-30 09:34:58 UTC
We've applied a patch upstream that tries to minimize device RW open calls
within LVM itself. This should also prevent events based on the watch rule
from being fired unnecessarily, at least with respect to LVM's internal
handling of devices:

  https://www.redhat.com/archives/lvm-devel/2011-May/msg00025.html (LVM2 v2.02.86)
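The idea behind the patch can be illustrated with plain open(2) semantics: udev's watch option is based on inotify's IN_CLOSE_WRITE, which is reported only when a descriptor that was opened with write access is closed, and udev turns that into a synthetic "change" uevent. The Python sketch below is only an illustration under that assumption and is not LVM2's actual code (LVM2 is written in C); open_device and closing_fires_watch_event are hypothetical helpers.

```python
import os

# On Linux the access mode occupies the low two bits of the open flags
# (O_RDONLY=0, O_WRONLY=1, O_RDWR=2), so this mask is 3.
_ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def closing_fires_watch_event(flags: int) -> bool:
    """True if closing a descriptor opened with `flags` would be reported by
    inotify as IN_CLOSE_WRITE, i.e. it was opened with write access."""
    return (flags & _ACCMODE) in (os.O_WRONLY, os.O_RDWR)


def open_device(path: str, need_write: bool = False) -> int:
    """Hypothetical helper: open read-only unless the caller must write,
    so that most opens do not generate watch-based change events."""
    flags = os.O_RDWR if need_write else os.O_RDONLY
    return os.open(path, flags)
```

The design choice mirrors the comment above: reads of metadata go through read-only opens, and only the code paths that really write pay the cost of a change event on close.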

However, there is still a possibility that someone else, externally, will open a
device read-write and close it (which causes the uevent to occur) just before
the device is removed, so we could end up with the same problem as reported
here; in that case, we have no control over this asynchronicity.

(For the discussion about the watch rule and more related information, see also
https://bugzilla.redhat.com/show_bug.cgi?id=561424)

Comment 25 Corey Marthaler 2011-06-02 14:58:42 UTC
Adding QA ack for 6.2. 

Based on comments #21 and #24, a definitive reproducer for this defect does not exist. This bug will most likely be marked verified (SanityOnly) once final 6.2 regression testing has been completed.

Comment 27 Peter Rajnoha 2011-08-05 17:24:04 UTC
*** Bug 721122 has been marked as a duplicate of this bug. ***

Comment 28 Peter Rajnoha 2011-08-09 14:52:43 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Issuing an lvremove command could fail to remove a logical volume. The failure is caused by an asynchronous udev event that keeps the volume open while the lvremove command tries to remove it. These asynchronous events are triggered when the 'watch' udev rule is applied (it is set for device-mapper/LVM2 devices when the 'udisks' package, which installs /lib/udev/rules.d/80-udisks.rules, is installed).

To fix this issue, the number of device open calls in read-write mode has been minimized and read-only mode is used internally where possible (the event is generated when a device that has the 'watch' rule set is closed after a read-write open).

Although this fixes the problem for devices opened internally during command execution, the failure can still occur when several commands are run in quick succession, each of which opens a device read-write and then closes it immediately (e.g. in a script). In this case, users are advised to run the 'udevadm settle' command between them.
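The scripting advice in the note can be sketched as follows: before each removal, run "udevadm settle", which waits until the udev event queue has been processed, so that change events generated by the previous command do not keep the next volume open. This is a minimal Python sketch under that assumption; remove_lvs and run_command are hypothetical helpers, and the runner is injectable only so the sequencing can be tested without touching real volumes.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner executes a command and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def remove_lvs(lvs: Sequence[str], run: Runner = run_command) -> List[str]:
    """Remove each LV with 'lvremove -f', running 'udevadm settle' first so the
    udev event queue is drained before every removal. Returns the LVs whose
    removal failed."""
    failed: List[str] = []
    for lv in lvs:
        run(["udevadm", "settle"])  # wait for queued udev events to finish
        if run(["lvremove", "-f", lv]) != 0:
            failed.append(lv)
    return failed
```

For example, remove_lvs(["VolGroup00/Test1"]) runs "udevadm settle" followed by "lvremove -f VolGroup00/Test1". Note that "-f" removes active volumes without prompting, so the list of names should be double-checked before running it as root.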

Comment 29 Peter Rajnoha 2011-08-09 15:06:35 UTC
*** Bug 700128 has been marked as a duplicate of this bug. ***

Comment 32 Corey Marthaler 2011-09-07 14:54:15 UTC
QA was never able to reproduce this issue. Marking verified (SanityOnly).

[root@taft-02 ~]# lvcreate -L 100M -n LV taft
  Logical volume "LV" created
[root@taft-02 ~]# lvs -a -o +devices
  LV      VG        Attr   LSize    Devices         
  LV      taft      -wi-a- 100.00m  /dev/sdb1(0)    
[root@taft-02 ~]# lvremove -f taft/LV
  Logical volume "LV" successfully removed


2.6.32-192.el6.x86_64

lvm2-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6    BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011

Comment 33 errata-xmlrpc 2011-12-06 16:52:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1522.html