Bug 188013

Summary: using lvcreate --snapshot doesn't create a genuine snapshot, or locks up the system when removing it with lvremove
Product: Fedora
Version: 4
Component: lvm2
Hardware: i686
OS: Linux
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Reporter: Dennis Ortsen <dortsen>
Assignee: Alasdair Kergon <agk>
CC: mbroz, petr.tuma
Doc Type: Bug Fix
Last Closed: 2006-07-27 09:50:13 UTC

Description Dennis Ortsen 2006-04-05 12:22:18 UTC
Description of problem:

When creating a snapshot volume, lvcreate reports that the newly created
snapshot volume is in use and will not be removed, and that it couldn't
deactivate the new snapshot:

localhost.localdomain:~# lvcreate -s -n snap -L 512M /dev/vg/home
   LV vg/snap in use: not removing
   Couldn't deactivate new snapshot.
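
(For anyone who wants to dig deeper: the device-mapper state behind the new
volume can be inspected with dmsetup. The device name below assumes the
vg/snap naming from this report; dm device names have the form VG-LV.)

   dmsetup ls             # list all device-mapper devices
   dmsetup info vg-snap   # open count and state of the snapshot device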

When examining the attributes of the various volumes, the snapshot volume
doesn't have the expected attributes, and no origin volume is set:
localhost.localdomain:~# lvs
   LV         VG     Attr     LSize    Origin    Snap%   Move    Copy%
   home       vg     -wi-ao   500.00M
   snap       vg     -wi-a-   512M
   log        vg     -wi-oa   500.00M

Removing that snapshot volume doesn't seem to be a problem. However, when you
create another snapshot volume (or a new one after a reboot) with a somewhat
larger size (let's say 2G), the creation succeeds and it appears to be a
valid snapshot volume:

localhost.localdomain:~# lvs
   LV         VG     Attr     LSize    Origin    Snap%   Move    Copy%
   home       vg     -wi-ao   500.00M
   snap       vg     swi-a-   2.00G    home      0.02
   log        vg     -wi-oa   500.00M

But when you try to remove that snapshot volume, you'll lock up the entire
system. Nothing responds anymore; the only way out is to press the reset
button. When booting, you'll run into a problem activating the volume group
because the snapshot volume hasn't been removed. You'll need to boot from a
rescue CD in order to remove the snapshot volume. Then you'll be able to boot
normally.
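
For reference, a minimal sketch of that rescue procedure, assuming the volume
group and snapshot are named vg and snap as in the examples above (the exact
steps may differ in your rescue environment):

   lvm vgscan                     # locate the volume groups
   lvm vgchange -ay vg            # activate the volume group
   lvm lvremove -f /dev/vg/snap   # remove the stale snapshot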

How reproducible:
All the time. Take a recent kernel (the latest seems to have a problem anyway)
and try creating a snapshot volume. The problem also appears after a yum
update of a few packages that don't seem to have any relation to LVM2: when
updating pciutils, tzdata or xterm to the latest version, for example, the
snapshot problem appears.


Additional info:

kernel: 2.6.15-1.1833_FC4smp
lvm2: lvm2-2.01.08-2.1

Any piece of x86 hardware should experience the same behaviour. There is no
trace of an error message in any log file.

Comment 1 Petr Tuma 2006-04-06 09:50:06 UTC
I seem to have the same problem - I run a regular backup script, something like:

lvcreate -L64G -s -n Snapshot /dev/VolGroup00/LogVol00   # create the snapshot
mount -o ro /dev/VolGroup00/Snapshot /mnt/snapshot       # mount it read-only
tar ...                                                  # archive the mounted snapshot
umount /dev/VolGroup00/Snapshot
lvremove -f /dev/VolGroup00/Snapshot                     # drop the snapshot

For some months now, I've been observing a strange behavior where lvcreate
sometimes (not in all cases, but in most) reports "Couldn't deactivate new
snapshot." A few repetitions of lvremove/lvcreate would, however, fix the
problem.

Since about two weeks ago (probably after some updates), the behavior has
changed: now an attempt to lvremove the snapshot hangs the system. I've only
found one similar report on the LVM mailing list, from summer 2005, but that
thread died off without a resolution.


Comment 2 Dennis Ortsen 2006-05-19 17:29:49 UTC
I've spoken to someone (I forgot the name) from RedHat at SANE-2006. He
confirmed issues like these. He also said that these problems *should* have
been solved since kernel 2.6.16 and lvm2-2.02; any pre-2.6.16 kernel is
likely to have problems with snapshots.

Also, when you update the kernel to at least 2.6.16, you MUST update lvm2 to
at least 2.02 as well: the fixes involved changing some ioctl calls, which is
why the two have to be updated together.
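
A quick way to check both versions (just a sketch; "lvm version" prints the
tool, library and driver versions):

uname -r      # kernel: want 2.6.16 or later
lvm version   # LVM version: want 2.02 or later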

Comment 3 Dennis Ortsen 2006-05-29 18:33:46 UTC
Today we tested with the newer kernel (2.6.16-1.2111_FC4smp) and built new
RPMs from the SRPMS provided by RedHat/Fedora (device-mapper-1.02.07-1.0 and
lvm2-2.02.06-1.2.1). We hoped to see an improvement, but sadly there was no
change...

BTW, in my former comment I mentioned I'd spoken to someone at SANE2006 in
Delft (The Netherlands). I've remembered the name: I spoke to Mr. Kergon
himself :-)

Comment 4 Dennis Ortsen 2006-07-25 11:07:42 UTC
I have installed the updates (kernel-smp-2.6.17-1.2142_FC4, lvm2-2.02.06-1.0.fc4
and device-mapper-1.02.07-2.0 from the FC4 updates by AGK). I tested the
updates about 50 times, with some reboots in between, and it all seemed to be
working like a charm again. Until I updated a (semi-)production server...
Luckily it isn't in production yet, but the problem remains on that server...

However, when I shut down udevd (udev-071-0.FC4.3), I can create as many
snapshot volumes as I like, until the udevd daemon has been respawned. Then
the creation of snapshot volumes stops working again. When I disable udev in
/etc/rc.sysinit (by commenting out the start_udev and udevsend lines) and
reboot, I can create snapshot volumes dozens of times without problems, until
I manually start udevd. Then the fun is over. When I kill udevd again, I can
continue creating snapshot volumes. Both servers (the one that works and the
one that doesn't) have the exact same versions of the kernel, lvm2,
device-mapper and udev.

My best guess is that udev is interfering. I don't know whether I can safely
disable udev on a production server.
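
For what it's worth, the test cycle described above boils down to something
like this (volume names as in the original report; killing udevd is obviously
a diagnostic step, not a real fix):

killall udevd                              # stop udev event handling
lvcreate -s -n snap -L 512M /dev/vg/home   # now succeeds every time
lvremove -f /dev/vg/snap
# ...restarting udevd brings the problem right back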

Comment 5 Dennis Ortsen 2006-07-27 09:50:13 UTC
I have posted my problem to the linux-lvm mailing list, maintained by RedHat,
and received a solution that works.

The problem seems to be related to udev and device-mapper. See
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=343671 for more information.
The short answer is:

In /etc/udev/rules.d/50-udev.rules you need to replace the following
(commented-out) line:

#KERNEL=="dm-[0-9]*",           NAME=""

with this one:

KERNEL=="dm-[0-9]*",            OPTIONS+="ignore_device"

In my case I only needed to add the second line.
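
If the line isn't there yet, appending it is enough (this is just what worked
for me; the rule tells udev to ignore the dm-* device nodes so it doesn't
interfere with device-mapper):

echo 'KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"' >> /etc/udev/rules.d/50-udev.rules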

Reboot and you're done!