Bug 710305 - [regression] 'lvremove' fails sometimes to remove snapshot volumes
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 14
Hardware: All Linux
Priority: unspecified
Severity: high
Assigned To: Peter Rajnoha
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2011-06-02 20:07 EDT by Ian Donaldson
Modified: 2012-02-06 04:16 EST

Doc Type: Bug Fix
Clone Of: 577798
Last Closed: 2012-02-02 07:06:27 EST


Description Ian Donaldson 2011-06-02 20:07:22 EDT
+++ This bug was initially created as a clone of Bug #577798 +++

Description of problem:

# lvdisplay

  --- Logical volume ---
  LV Name                /dev/vg01/.local.backup
  VG Name                vg01
  LV UUID                hkAyO4-M31g-LJw5-Kdcu-AfK1-Bquw-buVrWA
  LV Write Access        read only
  LV snapshot status     active destination for /dev/vg01/local
  LV Status              available
  # open                 0
  LV Size                2,00 GiB
  Current LE             512
  COW-table size         1,00 GiB
  COW-table LE           256
  Allocated to snapshot  0,01% 
  Snapshot chunk size    4,00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:29

# lvremove /dev/vg01/.local.backup
  Can't remove open logical volume ".local.backup"

  ... repeating this several times ...

# lvremove  /dev/vg01/.local.backup
Do you really want to remove active logical volume .local.backup? [y/n]: y
  Logical volume ".local.backup" successfully removed


Version-Release number of selected component (if applicable):

kernel-2.6.33.1-19.fc13.x86_64
lvm2-2.02.61-1.fc13.x86_64

--- Additional comment from enrico.scholz@informatik.tu-chemnitz.de on 2010-03-29 05:52:00 EDT ---

Created attachment 403250
strace in non-working case

--- Additional comment from enrico.scholz@informatik.tu-chemnitz.de on 2010-03-29 05:52:28 EDT ---

Created attachment 403251
strace in working case

--- Additional comment from prajnoha@redhat.com on 2010-03-29 06:08:36 EDT ---

Do you have "udisks" package installed? There is one udev rule that could have possibly caused this...

For starters, just a quick check - could you please try to kill udev daemon temporarily and see if you can reproduce the problem? Thanks.

--- Additional comment from enrico.scholz@informatik.tu-chemnitz.de on 2010-03-30 07:36:02 EDT ---

Yes; udisks is installed, and I cannot reproduce the issue after removing it.

'udevadm control --stop-exec-queue' before lvremove seems to work too.
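
A minimal sketch of the exec-queue workaround mentioned above, assuming bash and root privileges. The function name lvremove_with_queue_stopped is hypothetical; the commands it wraps ('udevadm control --stop-exec-queue', its counterpart '--start-exec-queue', and 'lvremove -f') are the real ones. It pauses udev event execution, attempts the removal, and resumes the queue whether or not the removal succeeded:

```shell
# Hypothetical helper: remove LVs while udev's event execution queue is paused.
lvremove_with_queue_stopped() {
    local status=0

    # Stop udev from starting new event handlers; bail out if that fails.
    udevadm control --stop-exec-queue || return 1

    # Try the removal and remember its exit status.
    lvremove -f "$@" || status=$?

    # Always resume event processing, even if lvremove failed.
    udevadm control --start-exec-queue

    return "$status"
}
```

A call might look like 'lvremove_with_queue_stopped vg01/.local.backup'. Event handlers that were already running when the queue was stopped may still hold the device open, so this probably narrows the race rather than closing it.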

--- Additional comment from prajnoha@redhat.com on 2010-03-30 08:08:22 EDT ---

Just for the record, the rule we have a problem supporting is exactly this one (in /lib/udev/rules.d/80-udisks.rules, which is part of the udisks package):

# Make udevd synthesize a 'change' uevent when last opener of a rw-fd closes the fd - this
# should be part of the device-mapper rules
KERNEL=="dm-*", OPTIONS+="watch"

We recently added a udev synchronisation feature to device-mapper/lvm2, so we always wait until udev processing has settled. This copes with problems where devices are accessed from within udev rules, and it also provides a way to wait for nodes/symlinks to be created.

However, we can't synchronize with events synthesized as a result of this rule (just as we can't with events originating from "udevadm trigger", which generates such events as well). Synchronisation can only be done for events we know about (events originating in device-mapper itself).

There are still ongoing discussions with udev team to properly deal with this issue though...

Now, could you please keep the udisks package and also keep udev running while having the "watch" rule commented out and see if the problem is gone? Thanks.

(...so that we're really sure this is exactly the cause, and as proof.)

--- Additional comment from enrico.scholz@informatik.tu-chemnitz.de on 2010-03-30 08:58:11 EDT ---

Yes; after commenting out this line I cannot reproduce the error anymore.

--- Additional comment from prajnoha@redhat.com on 2010-03-30 09:20:24 EDT ---

So we have another nice and practical example how the "watch" rule disrupts an idea to properly synchronize with udev events...

--- Additional comment from prajnoha@redhat.com on 2010-03-30 09:26:06 EDT ---

...also, CC-ing David (the "udisks" package maintainer).

--- Additional comment from enrico.scholz@informatik.tu-chemnitz.de on 2010-05-13 06:19:18 EDT ---

still with

lvm2-2.02.61-1.fc13.x86_64
udisks-1.0.1-1.fc13.x86_64

--- Additional comment from prajnoha@redhat.com on 2010-05-17 09:07:01 EDT ---

(In reply to comment #9)
> still with
> 
> lvm2-2.02.61-1.fc13.x86_64
> udisks-1.0.1-1.fc13.x86_64    

Unfortunately, we still don't have a proper solution for synchronizing with events like the ones originating from the "watch" rule.

--- Additional comment from bugzilla-redhat@brianneu.com on 2010-11-06 18:04:08 EDT ---

Is there a command that can be issued to ensure that lvremove will work?

I've noticed that if I just run it repeatedly, it eventually works, but I don't exactly want to script a loop like that.

Or is commenting out the "watch" line the only solution available?

--- Additional comment from prajnoha@redhat.com on 2010-11-08 04:21:13 EST ---

(In reply to comment #11)
> Is there a command that can be issued to ensure that lvremove will work?
> 
> I've noticed that if I just run it repeatedly, it eventually works, but I don't
> exactly want to script a loop like that.
> 
> Or is commenting out the "watch" line the only solution available?

For now, yes, this is the only solution we have, unfortunately. I'm sorry for any inconvenience this causes. I'll provide an update as soon as we have a decent solution that is acceptable for both device-mapper and udev/udisks...

--- Additional comment from cyberrider@esmarshall.com on 2010-11-27 04:38:13 EST ---

I have a similar problem.
Scenario: Want to back up IMAP mail spools and DB with minimum outage to cyrus-imap.

Solution: Halt Cyrus; Snapshot IMAP DB & mail spool file systems; mount the snapshot file systems as RO on some mount-point; restart Cyrus; backup RO file systems (ie snapshots); umount and throw away snapshots once backup complete.

Everything works fine until it's time to "lvremove -f ${VG}/${IMAPMB_SNAP} ${VG}/${IMAPDB_SNAP}"

The umount works but the lvremove fails, and "lvs" reports the snapshot LVs as being closed, but they're still showing as active.


I have a work-around that appears to work using "dmsetup".
(extracted from my script)

###### ====== extract from savemail script ====== ######
snap_off()
{
  ## First, force a flush of disc buffers
  sync

  ## Now we dismount the snapshot file system copies
  printf "Dismounting snapshot filesystems...\t"
  umount "${SNAPROOT}/${IMAPDB_DIR}" "${SNAPROOT}/${IMAPMB_DIR}"
  printf "Done!\n"

  ## Pause for a bit so the file systems can complete their dismount
  sleep 10

  ## Flush any buffers to disc again - just to be sure
  sync

  ## Wait another 10 seconds for everything to stabilise
  sleep 10

  ### I have to use "dmsetup remove" to deactivate the snapshots first
  for SNAPVOL in "${VG}-${IMAPMB_SNAP}" "${VG}-${IMAPDB_SNAP}"; do
    printf "Deactivating snapshot volume %s\n" "${SNAPVOL}"
    dmsetup remove "${SNAPVOL}"
    ## For some reason, the copy-on-write devices aren't cleaned up auto-magically,
    ## so I have to remove them auto-manually.
    dmsetup remove "${SNAPVOL}-cow"
  done

  ## Okay - now we can remove the snapshot logical volumes
  lvremove -f "${VG}/${IMAPMB_SNAP}" "${VG}/${IMAPDB_SNAP}"
}

###### ====== end of script extract ====== ######

I am now able to consistently and reliably tear down the snapshot volumes once my backup completes.
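
The extract above builds the dmsetup names as ${VG}-${SNAP}. LVM doubles any hyphen inside a VG or LV name when it forms the device-mapper name, so plain concatenation only holds for names without hyphens (worth checking against 'dmsetup ls' on your own system). A small sketch of a helper that applies this escaping, assuming bash; the function name dm_name is hypothetical:

```shell
# Hypothetical helper: build the device-mapper name for a VG/LV pair.
# Hyphens inside each name are doubled; a single hyphen joins the two parts.
dm_name() {
    local vg=$1 lv=$2
    printf '%s-%s\n' "${vg//-/--}" "${lv//-/--}"
}
```

With this, the loop in the extract could iterate over "$(dm_name "$VG" "$IMAPMB_SNAP")" and "$(dm_name "$VG" "$IMAPDB_SNAP")" instead of the concatenated strings.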

Personally, I would prefer a simpler command when working with snapshots.

Perhaps something like:
lvsnapshot accept <LogicalVolumePath> [<LogicalVolumePath>...]
and
lvsnapshot revert <LogicalVolumePath> [<LogicalVolumePath>...]

where "accept" means "throw away the snapshot and continue with all updates since snapshot created"
and "revert" means "throw away all modifications since snapshot creation, and return to the point at which the snapshot was created"

A modifier flag (eg "-k" or "--keep") would retain the snapshot volume that had been created; this would allow an easy way to accept changes and re-checkpoint a snapshot; or in the case of "revert" allow you to revert to the file system as it was as many times as you like.  (Excellent in a classroom situation for example).

--- Additional comment from cyberrider@esmarshall.com on 2010-11-27 04:40:36 EST ---

Forgot to add:

package versions are as follows:

kernel-2.6.34.7-61.fc13.x86_64
lvm2-2.02.73-2.fc13.x86_64
udev-153-4.fc13.x86_64
udisks-1.0.1-4.fc13.x86_64

--- Additional comment from cyberrider@esmarshall.com on 2010-11-27 04:46:26 EST ---

And the device-mapper package is device-mapper-1.02.54-2.fc13.x86_64

--- Additional comment from pza@pza.net.au on 2011-01-04 11:16:48 EST ---

Also is a problem on RHEL 6.

--- Additional comment from iand@ekit-inc.com on 2011-02-20 19:45:22 EST ---

Also a problem on FC14

--- Additional comment from walter.haidinger@gmx.at on 2011-02-21 03:08:57 EST ---

On RHEL6, the workaround of comment #5 (commenting out the watch rule) does NOT work for me.
kernel-2.6.32-71.14.1.el6.x86_64
lvm2-2.02.72-8.el6_0.4.x86_64
udev-147-2.29.el6.x86_64
udisks-1.0.1-2.el6.x86_64

--- Additional comment from ndevos@redhat.com on 2011-02-21 08:43:18 EST ---

Any customers who see this issue on RHEL6 are advised to open a support case on the Red Hat Customer Portal at https://access.redhat.com and contact Red Hat Global Support Services from there.

The cloned Bug 638711 is the correct bug to follow for RHEL6; this one is for Fedora.

--- Additional comment from wolfgang.pichler@ivv.tuwien.ac.at on 2011-03-14 05:39:22 EDT ---

kernel-2.6.35.11-83.fc14.i686
lvm2-2.02.73-3.fc14.i686
udev-161-8.fc14.i686
udisks-1.0.1-4.fc14.i686
device-mapper-1.02.54-3.fc14.i686

failing lvremove -f /dev/root-snap every week ;-(((

have now done the "commenting out" -

any status changes ???
(and tired of googling the issue)

--- Additional comment from iand@ekit-inc.com on 2011-03-14 06:49:26 EDT ---

My workaround was to add this to /etc/rc.local:

sed -e 's/^KERNEL=="dm-\*", OPTIONS+="watch"/#KERNEL=="dm-*", OPTIONS+="watch"/' < /lib/udev/rules.d/80-udisks.rules > /lib/udev/rules.d/80-udisks.rules.tmp
cp /lib/udev/rules.d/80-udisks.rules.tmp /lib/udev/rules.d/80-udisks.rules
rm /lib/udev/rules.d/80-udisks.rules.tmp


which keeps working after a yum update of the udisks package
(well until somebody changes that line...)
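
A sketch of the same edit as a reusable function, assuming bash and a sed that accepts '-i' with a backup suffix (GNU sed does). The function name disable_dm_watch_rule is hypothetical; the default path and the rule text are the ones quoted in this report. It edits the file in place, keeps a .bak copy, and does nothing if the rule is already commented out:

```shell
# Hypothetical helper: comment out the dm-* "watch" rule in a udisks rules file.
disable_dm_watch_rule() {
    local rules_file=${1:-/lib/udev/rules.d/80-udisks.rules}
    # Matches only the uncommented rule (the line must start with KERNEL).
    local pattern='^KERNEL=="dm-\*", OPTIONS+="watch"'

    # Nothing to do if the rule is absent or already commented out.
    grep -q "$pattern" "$rules_file" || return 0

    # Prefix the matching line with '#', saving the original as <file>.bak.
    sed -i.bak "s/$pattern/#&/" "$rules_file"
}
```

After changing the file, udev may need to be told to re-read its rules, for example with 'udevadm control --reload-rules'.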

--- Additional comment from bugzilla@in-egypt.net on 2011-04-09 22:55:17 EDT ---

I was trying to remove an LV and couldn't. I found this bug report and did the following:
[root@laptop ~]# lvremove -f /dev/vg_laptop/lv_home
  Can't remove open logical volume "lv_home"
[root@laptop ~]# while [ $? -eq "5" ]; do lvremove -f /dev/vg_laptop/lv_home ; done
  Can't remove open logical volume "lv_home"
  Can't remove open logical volume "lv_home"
  Can't remove open logical volume "lv_home"
  Logical volume "lv_home" successfully removed


voila!

--- Additional comment from cyberrider@esmarshall.com on 2011-04-10 01:28:29 EDT ---

(In reply to comment #22)
> I was trying to remove an LV and couldn't. I found this bug report and did the
> following:
> [root@laptop ~]# lvremove -f /dev/vg_laptop/lv_home
>   Can't remove open logical volume "lv_home"
> [root@laptop ~]# while [ $? -eq "5" ]; do lvremove -f /dev/vg_laptop/lv_home ;
> done
>   Can't remove open logical volume "lv_home"
>   Can't remove open logical volume "lv_home"
>   Can't remove open logical volume "lv_home"
>   Logical volume "lv_home" successfully removed
> 
> 
> voila!

Yes, I've seen that too.
What you've done is retry enough times so that at the instant when udev/udisks don't have any open references to the logical volume, the lvremove command actually succeeds.

Unfortunately, I've seen an "lvremove -f" work on the initial attempt, and at other times never succeed (well "never" in that I stopped trying after a forced loop of 100 iterations).

This is why most of us are having to use the "dmsetup remove" workaround to remove the logical volume(s) once they've been unmounted, or otherwise closed.
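
For those who do retry, a bounded loop avoids spinning forever. The following is only a sketch, assuming bash; the function name lvremove_retry and its default attempt count and delay are hypothetical choices, not values taken from this report. It waits for the udev queue to drain with 'udevadm settle' before each attempt and gives up after a fixed number of tries:

```shell
# Hypothetical helper: retry "lvremove -f" a limited number of times.
# Usage: lvremove_retry <vg/lv> [attempts] [delay_seconds]
lvremove_retry() {
    local lv=$1 attempts=${2:-10} delay=${3:-1} i

    for ((i = 1; i <= attempts; i++)); do
        # Let pending udev events finish; ignore a settle timeout.
        udevadm settle --timeout=10 || true

        if lvremove -f "$lv"; then
            return 0
        fi
        sleep "$delay"
    done

    echo "giving up on $lv after $attempts attempts" >&2
    return 1
}
```

Because the watch-rule events are asynchronous, the settle call cannot guarantee success on any particular attempt; the loop only bounds how long the script keeps trying.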

--- Additional comment from zart@zartsoft.ru on 2011-05-01 06:16:28 EDT ---

# lvchange -v -an /dev/vg_name/lv_name
# lvremove -v /dev/vg_name/lv_name

This works for me without force switches; I hope it is useful as a workaround.

--- Additional comment from cyberrider@esmarshall.com on 2011-05-01 07:55:08 EDT ---

If you're scripting, or otherwise automating the lvremove, you will still need to use the "lvremove -f" option, otherwise you will be prompted whether you want to remove the referenced LV or not.

You don't want a script, especially one invoked through "at" or "cron" to prompt for a response.
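
Putting the two comments together, a non-interactive variant could deactivate the volume first and then remove it with -f. This is a sketch only, assuming bash; the function name remove_lv_noninteractive is hypothetical, and whether deactivating first avoids the failure on a given system is untested here:

```shell
# Hypothetical helper: deactivate an LV, then remove it without prompting.
remove_lv_noninteractive() {
    local lv=$1

    # Deactivate first; stop if the LV cannot be deactivated.
    lvchange -an "$lv" || return 1

    # -f suppresses the confirmation prompt, which matters under cron or at.
    lvremove -f "$lv"
}
```

A cron job might call 'remove_lv_noninteractive vg_name/lv_name' and check the exit status to decide whether to alert.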

--- Additional comment from prajnoha@redhat.com on 2011-05-30 04:28:24 EDT ---

We've applied a patch upstream that tries to minimize device RW open calls within the LVM itself. This should also prevent the events based on the watch rule from being fired when not necessary, at least with respect to internal LVM handling of devices:

  https://www.redhat.com/archives/lvm-devel/2011-May/msg00025.html (LVM2 v2.02.86)

However, there's still a possibility that someone else, externally, will open a device for read-write and close it (which will cause the uevent to occur) just before the device is removed, and so we could end up with the same problem as reported here. In this case, we have no control over this asynchronicity.

(For the discussion about the watch rule and more related information, see also
https://bugzilla.redhat.com/show_bug.cgi?id=561424)

--- Additional comment from triage@lists.fedoraproject.org on 2011-06-02 11:48:27 EDT ---


This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

--- Additional comment from iand@ekit-inc.com on 2011-06-02 19:32:25 EDT ---

Still present in FC14; how does one change the Version of the report?
(it's not a link)

--- Additional comment from bugzilla-redhat@brianneu.com on 2011-06-02 20:03:21 EDT ---

Upper right corner, "Clone this bug". Save as an F14 bug.
Comment 1 Robert Story 2011-06-08 10:00:30 EDT
I have seen this too, but with a regular (non-snapshot) LVM volume.
Comment 2 Peter Rajnoha 2012-02-02 07:06:27 EST
The "retry-remove" patch is part of the lvm2 v2.02.89 release (released on 26 January 2012, after a longer interval than usual), but I don't expect any rebase or backport for F14, which is nearing its end of life. Please consider using a newer Fedora release.

If you still need to use this Fedora release, you can try to work around this problem by commenting out the line 'KERNEL=="dm-*", OPTIONS+="watch"' found in /lib/udev/rules.d/80-udisks.rules (if the udisks package is installed).

Sorry for any inconvenience.
Comment 3 Brian 2012-02-03 14:27:23 EST
Understandable for F14, but how about F15?
Comment 4 Peter Rajnoha 2012-02-06 04:16:47 EST
I don't think there's going to be a rebase for F15 to include that patch. Backporting it is a more viable idea. If there's not going to be any update (for whatever reason), I'll probably do an 'out-of-band' build with that patch and attach it to the bz #712100 for direct download. Please, watch that bz for any changes. Thanks.
