577798 – [regression] 'lvremove' fails sometimes to remove snapshot volumes

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	lvm2
Sub Component:
Version:	13
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Peter Rajnoha
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	638711

Reported:	2010-03-29 09:44 UTC by Enrico Scholz
Modified:	2012-03-25 13:50 UTC
CC List:	31 users
Fixed In Version:
Clone Of:
Clones:	638711 710305 712100 715624
Environment:
Last Closed:	2011-06-03 09:11:01 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
strace in non-working case (255.54 KB, text/plain) 2010-03-29 09:52 UTC, Enrico Scholz
strace in working case (278.05 KB, text/plain) 2010-03-29 09:52 UTC, Enrico Scholz

Description Enrico Scholz 2010-03-29 09:44:29 UTC

Description of problem:

# lvdisplay

  --- Logical volume ---
  LV Name                /dev/vg01/.local.backup
  VG Name                vg01
  LV UUID                hkAyO4-M31g-LJw5-Kdcu-AfK1-Bquw-buVrWA
  LV Write Access        read only
  LV snapshot status     active destination for /dev/vg01/local
  LV Status              available
  # open                 0
  LV Size                2,00 GiB
  Current LE             512
  COW-table size         1,00 GiB
  COW-table LE           256
  Allocated to snapshot  0,01% 
  Snapshot chunk size    4,00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:29

# lvremove /dev/vg01/.local.backup
  Can't remove open logical volume ".local.backup"

  ... repeating this several times ...

# lvremove  /dev/vg01/.local.backup
Do you really want to remove active logical volume .local.backup? [y/n]: y
  Logical volume ".local.backup" successfully removed


Version-Release number of selected component (if applicable):

kernel-2.6.33.1-19.fc13.x86_64
lvm2-2.02.61-1.fc13.x86_64
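
The "Can't remove open logical volume" error above depends on the device's open count at the instant lvremove checks it, which can differ from the "# open 0" that lvdisplay printed a moment earlier. A minimal sketch for polling that count through dmsetup before attempting removal; the function name `wait_until_closed`, the one-second interval and the default of 10 polls are illustrative assumptions, and a zero count does not guarantee that nothing reopens the device afterwards:

```shell
# Poll the device-mapper open count of a device until it reaches zero.
# $1 = device-mapper name (e.g. vg01-.local.backup)
# $2 = maximum number of polls (assumed default: 10)
wait_until_closed() {
    wuc_name=$1
    wuc_tries=${2:-10}
    while [ "$wuc_tries" -gt 0 ]; do
        # "-c -o open" restricts the columnar output to the open count.
        wuc_open=$(dmsetup info -c --noheadings -o open "$wuc_name" | tr -d ' ')
        [ "$wuc_open" = "0" ] && return 0
        wuc_tries=$((wuc_tries - 1))
        sleep 1
    done
    # Still open after all polls.
    return 1
}
```

Usage would look like `wait_until_closed vg01-.local.backup 30 && lvremove -f /dev/vg01/.local.backup`.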

Comment 1 Enrico Scholz 2010-03-29 09:52:00 UTC

Created attachment 403250 [details]
strace in non-working case

Comment 2 Enrico Scholz 2010-03-29 09:52:28 UTC

Created attachment 403251 [details]
strace in working case

Comment 3 Peter Rajnoha 2010-03-29 10:08:36 UTC

Do you have "udisks" package installed? There is one udev rule that could have possibly caused this...

For starters, just a quick check: could you please try to kill the udev daemon temporarily and see if you can still reproduce the problem? Thanks.

Comment 4 Enrico Scholz 2010-03-30 11:36:02 UTC

yes; udisks is installed and I can not reproduce the issue after its removal.

'udevadm control --stop-exec-queue' before lvremove seems to work too.
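
A minimal sketch of that second observation, wrapping lvremove between `udevadm control --stop-exec-queue` and `--start-exec-queue`. The function name is hypothetical, and whether pausing the queue interacts well with LVM's own udev synchronisation is not established in this report, so treat it as an experiment rather than a fix:

```shell
# Pause udev's event execution queue while removing one logical volume.
# $1 = LV path, e.g. /dev/vg01/.local.backup
lvremove_with_paused_udev() {
    lp_lv=$1
    # Stop udev from executing queued events; bail out if that fails.
    udevadm control --stop-exec-queue || return 1
    lp_status=0
    lvremove -f "$lp_lv" || lp_status=$?
    # Always resume the queue, even when lvremove failed.
    udevadm control --start-exec-queue
    return "$lp_status"
}
```

The function returns lvremove's exit status, so a caller can still detect a failed removal.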

Comment 5 Peter Rajnoha 2010-03-30 12:08:22 UTC

Just for the record, the rule we have a problem supporting is exactly this one (in /lib/udev/rules.d/80-udisks.rules, which is part of the udisks package):

# Make udevd synthesize a 'change' uevent when last opener of a rw-fd closes the fd - this
# should be part of the device-mapper rules
KERNEL=="dm-*", OPTIONS+="watch"

We recently added a udev synchronisation feature to device-mapper/lvm2, so we always wait until udev processing has settled. This copes with cases where devices are accessed from within udev rules, and it also provides a way to wait for nodes/symlinks to be created.

However, we can't synchronize with events synthesized as a result of this rule (just as we can't with events originating from "udevadm trigger", which generates such events as well). Synchronisation is only possible for events we know about (events originated in device-mapper itself).

There are still ongoing discussions with udev team to properly deal with this issue though...

Now, could you please keep the udisks package and also keep udev running while having the "watch" rule commented out and see if the problem is gone? Thanks.

(...so that we're really sure this is exactly the case, and have proof of it)
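
To check whether the rule is currently active on a given system, something like the following sketch could be used. The function name `has_dm_watch_rule` is hypothetical, and the default path is simply the one quoted in this comment; other releases may ship the rule elsewhere:

```shell
# Succeed if an uncommented dm-* "watch" rule is present in the rules file.
# $1 = rules file (assumed default: the udisks file named in this report)
has_dm_watch_rule() {
    hw_file=${1:-/lib/udev/rules.d/80-udisks.rules}
    # Lines starting with '#' are comments and must not match.
    grep -Eq '^[[:space:]]*KERNEL=="dm-\*".*OPTIONS\+="watch"' "$hw_file"
}
```

For example, `has_dm_watch_rule && echo "watch rule is active"`.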

Comment 6 Enrico Scholz 2010-03-30 12:58:11 UTC

yes; after commenting out this line I can not reproduce the error anymore.

Comment 7 Peter Rajnoha 2010-03-30 13:20:24 UTC

So we have another nice and practical example of how the "watch" rule disrupts the idea of properly synchronizing with udev events...

Comment 8 Peter Rajnoha 2010-03-30 13:26:06 UTC

...also, CC-ing David (the "udisks" package maintainer).

Comment 9 Enrico Scholz 2010-05-13 10:19:18 UTC

still with

lvm2-2.02.61-1.fc13.x86_64
udisks-1.0.1-1.fc13.x86_64

Comment 10 Peter Rajnoha 2010-05-17 13:07:01 UTC

(In reply to comment #9)
> still with
> 
> lvm2-2.02.61-1.fc13.x86_64
> udisks-1.0.1-1.fc13.x86_64    

Unfortunately, we still don't have a proper solution for synchronizing with events like the ones originating from the "watch" rule.

Comment 11 Brian 2010-11-06 22:04:08 UTC

Is there a command that can be issued to ensure that lvremove will work?

I've noticed that if I just run it repeatedly, it eventually works, but I don't exactly want to script a loop like that.

Or is commenting out the "watch" line the only solution available?

Comment 12 Peter Rajnoha 2010-11-08 09:21:13 UTC

(In reply to comment #11)
> Is there a command that can be issued to ensure that lvremove will work?
> 
> I've noticed that if I just run it repeatedly, it eventually works, but I don't
> exactly want to script a loop like that.
> 
> Or is commenting out the "watch" line the only solution available?

For now, yes, this is the only solution we have, unfortunately. I'm sorry for any inconvenience this causes. I'll provide an update as soon as we have a decent solution that is acceptable for both device-mapper and udev/udisks...

Comment 13 Scott Marshall 2010-11-27 09:38:13 UTC

I have a similar problem.
Scenario: Want to back up IMAP mail spools and DB with minimum outage to cyrus-imap.

Solution: Halt Cyrus; Snapshot IMAP DB & mail spool file systems; mount the snapshot file systems as RO on some mount-point; restart Cyrus; backup RO file systems (ie snapshots); umount and throw away snapshots once backup complete.

Everything works fine until it's time to "lvremove -f ${VG}/${IMAPMB_SNAP} ${VG}/${IMAPDB_SNAP}"

The umount works but the lvremove fails, and "lvs" reports the snapshot LVs as being closed, but they're still showing as active.


I have a work-around that appears to work using "dmsetup".
(extracted from my script)

###### ====== extract from savemail script ====== ######
snap_off()
{
  ## First, force a flush of disc buffers
  sync

  ## Now we dismount the snapshot file system copies
  printf "Dismounting snapshot filesystems...\t"
  umount "${SNAPROOT}/${IMAPDB_DIR}" "${SNAPROOT}/${IMAPMB_DIR}"
  printf "Done!\n"

  ## Pause for a bit so the file systems can complete their dismount
  sleep 10

  ## Flush any buffers to disc again - just to be sure
  sync

  ## Wait another 10 seconds for everything to stabilise
  sleep 10

  ### I have to use "dmsetup remove" to deactivate the snapshots first
  for SNAPVOL in "${VG}-${IMAPMB_SNAP}" "${VG}-${IMAPDB_SNAP}"; do
    printf "Deactivating snapshot volume %s\n" "${SNAPVOL}"
    dmsetup remove "${SNAPVOL}"
    ## for some reason, the copy-on-write devices aren't cleaned up auto-magically
    ## so I have to remove them auto-manually.
    dmsetup remove "${SNAPVOL}-cow"
  done

  ## Okay - now we can remove the snapshot logical volumes
  lvremove -f "${VG}/${IMAPMB_SNAP}" "${VG}/${IMAPDB_SNAP}"
}

###### ====== end of script extract ====== ######

I am now able to consistently and reliably tear down the snapshot volumes once my backup completes.
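
One caveat when addressing a snapshot by its device-mapper name, as the loop above does: LVM doubles any hyphen inside a VG or LV name when it builds that name, so `${VG}-${IMAPMB_SNAP}` only matches when neither part contains a hyphen. A small sketch of the mapping; `dm_name` is a hypothetical helper, not an LVM or dmsetup command:

```shell
# Build the device-mapper name LVM uses for a VG/LV pair.
# Hyphens inside either name are doubled; a single hyphen joins the two.
# $1 = volume group name, $2 = logical volume name
dm_name() {
    dn_vg=$(printf '%s' "$1" | sed 's/-/--/g')
    dn_lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '%s-%s\n' "$dn_vg" "$dn_lv"
}
```

With such a helper, the loop could iterate over `"$(dm_name "$VG" "$IMAPMB_SNAP")"` instead of concatenating the names directly.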

Personally, I would prefer a simpler command when working with snapshots.

Perhaps something like:
lvsnapshot accept <LogicalVolumePath> [<LogicalVolumePath>...]
and
lvsnapshot revert <LogicalVolumePath> [<LogicalVolumePath>...]

where "accept" means "throw away the snapshot and continue with all updates since snapshot created"
and "revert" means "throw away all modifications since snapshot creation, and return to the point at which the snapshot was created"

A modifier flag (eg "-k" or "--keep") would retain the snapshot volume that had been created; this would allow an easy way to accept changes and re-checkpoint a snapshot; or in the case of "revert" allow you to revert to the file system as it was as many times as you like.  (Excellent in a classroom situation for example).

Comment 14 Scott Marshall 2010-11-27 09:40:36 UTC

Forgot to add:

package versions are as follows:

kernel-2.6.34.7-61.fc13.x86_64
lvm2-2.02.73-2.fc13.x86_64
udev-153-4.fc13.x86_64
udisks-1.0.1-4.fc13.x86_64

Comment 15 Scott Marshall 2010-11-27 09:46:26 UTC

And the device-mapper package is device-mapper-1.02.54-2.fc13.x86_64

Comment 16 Phil Anderson 2011-01-04 16:16:48 UTC

Also is a problem on RHEL 6.

Comment 17 Ian Donaldson 2011-02-21 00:45:22 UTC

Also a problem on FC14

Comment 18 walter.haidinger 2011-02-21 08:08:57 UTC

On RHEL6, the workaround of comment #5 (commenting out the watch rule) does NOT work for me.
kernel-2.6.32-71.14.1.el6.x86_64
lvm2-2.02.72-8.el6_0.4.x86_64
udev-147-2.29.el6.x86_64
udisks-1.0.1-2.el6.x86_64

Comment 19 Niels de Vos 2011-02-21 13:43:18 UTC

Any customers who see this issue on RHEL6 are advised to open a support case on the Red Hat Customer Portal at https://access.redhat.com and contact Red Hat Global Support Services from there.

The cloned Bug 638711 is the correct bug to follow for RHEL6; this one is for Fedora.

Comment 20 wolfgang pichler 2011-03-14 09:39:22 UTC

kernel-2.6.35.11-83.fc14.i686
lvm2-2.02.73-3.fc14.i686
udev-161-8.fc14.i686
udisks-1.0.1-4.fc14.i686
device-mapper-1.02.54-3.fc14.i686

failing lvremove -f /dev/root-snap every week ;-(((

have now done the "commenting out" -

any status changes ???
(and tired of googling the issue)

Comment 21 Ian Donaldson 2011-03-14 10:49:26 UTC

My workaround was to add this to /etc/rc.local:

sed -e 's/^KERNEL=="dm-\*", OPTIONS+="watch"/#KERNEL=="dm-*", OPTIONS+="watch"/' < /lib/udev/rules.d/80-udisks.rules > /lib/udev/rules.d/80-udisks.rules.tmp
cp /lib/udev/rules.d/80-udisks.rules.tmp /lib/udev/rules.d/80-udisks.rules
rm /lib/udev/rules.d/80-udisks.rules.tmp


which keeps working after a yum update of the udisks package
(well until somebody changes that line...)
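
A shorter variant of the same idea, as a sketch: it edits the file in place, keeps a `.bak` copy, and is safe to run repeatedly because an already commented line no longer matches. The function name is hypothetical, and it still depends on the rule text staying exactly as quoted in this report:

```shell
# Comment out the dm-* "watch" rule in a udev rules file, in place.
# $1 = rules file, e.g. /lib/udev/rules.d/80-udisks.rules
comment_out_watch_rule() {
    cw_file=$1
    # '&' re-inserts the matched rule text after the added '#'.
    sed -i.bak 's/^KERNEL=="dm-\*", OPTIONS+="watch"/#&/' "$cw_file"
}
```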

Comment 22 Mohammed Arafa 2011-04-10 02:55:17 UTC

I was trying to remove an LV and couldn't. I found this bug report, so I did the following:
[root@laptop ~]# lvremove -f /dev/vg_laptop/lv_home
  Can't remove open logical volume "lv_home"
[root@laptop ~]# while [ $? -eq "5" ]; do lvremove -f /dev/vg_laptop/lv_home ; done
  Can't remove open logical volume "lv_home"
  Can't remove open logical volume "lv_home"
  Can't remove open logical volume "lv_home"
  Logical volume "lv_home" successfully removed


voila!

Comment 23 Scott Marshall 2011-04-10 05:28:29 UTC

(In reply to comment #22)
> I was trying to remove an LV and couldn't. I found this bug report, so I did the
> following:
> [root@laptop ~]# lvremove -f /dev/vg_laptop/lv_home
>   Can't remove open logical volume "lv_home"
> [root@laptop ~]# while [ $? -eq "5" ]; do lvremove -f /dev/vg_laptop/lv_home ;
> done
>   Can't remove open logical volume "lv_home"
>   Can't remove open logical volume "lv_home"
>   Can't remove open logical volume "lv_home"
>   Logical volume "lv_home" successfully removed
> 
> 
> voila!

Yes, I've seen that too.
What you've done is retry enough times that one attempt lands at an instant when udev/udisks has no open references to the logical volume, so the lvremove command succeeds.

Unfortunately, I've seen an "lvremove -f" work on the initial attempt, and at other times never succeed (well "never" in that I stopped trying after a forced loop of 100 iterations).

This is why most of us are having to use the "dmsetup remove" workaround to remove the logical volume(s) once they've been unmounted, or otherwise closed.
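
For anyone who does want to script the retry approach from comment #22, a bounded version might look like the sketch below. The function name, the default of 10 attempts and the one-second pause are assumptions; as noted above, even 100 iterations did not always succeed, so the caller still has to handle failure:

```shell
# Retry "lvremove -f" a limited number of times.
# $1 = LV path, $2 = maximum attempts (assumed default: 10)
lvremove_retry() {
    lr_lv=$1
    lr_max=${2:-10}
    lr_try=1
    while [ "$lr_try" -le "$lr_max" ]; do
        if lvremove -f "$lr_lv"; then
            return 0
        fi
        lr_try=$((lr_try + 1))
        # Give udev a moment to finish with the device before retrying.
        sleep 1
    done
    echo "giving up on $lr_lv after $lr_max attempts" >&2
    return 1
}
```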

Comment 24 Konstantin Zemlyak 2011-05-01 10:16:28 UTC

# lvchange -v -an /dev/vg_name/lv_name
# lvremove -v /dev/vg_name/lv_name

works for me without force switches; I hope it might be useful as a workaround.

Comment 25 Scott Marshall 2011-05-01 11:55:08 UTC

If you're scripting, or otherwise automating the lvremove, you will still need to use the "lvremove -f" option, otherwise you will be prompted whether you want to remove the referenced LV or not.

You don't want a script, especially one invoked through "at" or "cron" to prompt for a response.
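
Combining the two comments, a non-interactive form of the deactivate-then-remove workaround could be sketched as follows. The function name is hypothetical, and nothing in this report shows that the sequence avoids the race in every case:

```shell
# Deactivate a logical volume, then remove it without prompting.
# $1 = LV path, e.g. /dev/vg_name/lv_name
deactivate_and_remove() {
    dr_lv=$1
    # Deactivate first; stop here if the LV is still held open.
    lvchange -an "$dr_lv" || return 1
    # -f suppresses the confirmation prompt so this can run from cron or at.
    lvremove -f "$dr_lv"
}
```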

Comment 26 Peter Rajnoha 2011-05-30 08:28:24 UTC

We've applied a patch upstream that tries to minimize device RW open calls within the LVM itself. This should also prevent the events based on the watch rule from being fired when not necessary, at least with respect to internal LVM handling of devices:

  https://www.redhat.com/archives/lvm-devel/2011-May/msg00025.html (LVM2 v2.02.86)

However, there's still a possibility that someone else, externally, will open a device for read-write and close it (which will cause the uevent to occur) just before the device is removed, so we could end up with the same problem as reported here. In this case, we have no control over this asynchronicity.

(For the dispute about the watch rule and more related information, see also
https://bugzilla.redhat.com/show_bug.cgi?id=561424)
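
To see whether an installed LVM2 is at least the upstream version named above, a sketch along these lines could be used. The function name is hypothetical, the parsing assumes the usual "LVM version:" line printed by `lvm version`, and `sort -V` is a GNU coreutils extension; distribution packages may also carry backported patches, so the version number alone is only a hint:

```shell
# Succeed if the installed LVM2 version is at least the given one.
# $1 = minimum version, e.g. 2.02.86
# Returns 2 if the version could not be determined.
lvm_at_least() {
    la_min=$1
    # Take the third field of the "LVM version:" line and drop any "(n)..." suffix.
    la_ver=$(lvm version | awk '/LVM version:/ { print $3 }' | sed 's/(.*//')
    [ -n "$la_ver" ] || return 2
    # With version sort, the minimum comes first only if installed >= minimum.
    la_lowest=$(printf '%s\n%s\n' "$la_min" "$la_ver" | sort -V | head -n 1)
    [ "$la_lowest" = "$la_min" ]
}
```

For example, `lvm_at_least 2.02.86 || echo "LVM2 predates the upstream patch"`.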

Comment 27 Bug Zapper 2011-06-02 15:48:27 UTC

This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 28 Ian Donaldson 2011-06-02 23:32:25 UTC

Still present in FC14; how does one change the Version of the report?
(it's not a link)

Comment 29 Brian 2011-06-03 00:03:21 UTC

Upper right corner, "Clone this bug". Save as an F14 bug.
