Bug 145342 - Snapshot of root filesystem hangs system
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: lvm2
Version: rawhide
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Assigned To: Milan Broz
Keywords: FutureFeature
Reported: 2005-01-17 12:25 EST by Yulianto Z
Modified: 2013-02-28 23:04 EST
CC List: 8 users

Fixed In Version: F7
Doc Type: Enhancement
Last Closed: 2007-06-11 13:36:45 EDT


Attachments
Output of command, with and without strace (113.87 KB, text/plain)
2005-01-17 12:27 EST, Yulianto Z

Description Yulianto Z 2005-01-17 12:25:28 EST
Description of problem:
When running lvcreate on Linux RAID5 to create a snapshot of the mounted
root filesystem (/), the system hangs. I suspect systems without RAID
are affected as well.
After a forced reboot (pressing the reset button), the system complains
about an unclean filesystem, then continues to work properly, and the
snapshot that was being created works.

Pinging the system while it is hung shows that the network stack is
still running (no packet loss).

Running strace on the command shows that it is stuck on DM_DEV_CREATE
(complete output attached):
-------------------------
  write(1, "    Found volume group \"vg00\"\n", 30    Found volume group "vg00"
  ) = 30
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = 0
  ioctl(3, DM_LIST_DEVICES, 0x8e75748)    = 0
  ioctl(3, DM_DEV_STATUS, 0x8e79750)      = 0
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  write(1, "    Loading vg00-snap-cow\n", 26    Loading vg00-snap-cow
) = 26
  ioctl(3, DM_DEV_CREATE
-------------------------
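When a command hangs like this, the interesting line in an strace log is
the last system call that never printed a return value. The Python sketch
below (the function name last_unfinished_call is hypothetical, not part of
strace or lvm2) scans a log in the format shown above, including lines
where the program's own output splits a call from its ") = N" result, and
returns that call. It assumes a single traced process (no -f).

```python
import re
from typing import Optional

# Start of a traced call, e.g. "  ioctl(3, DM_DEV_STATUS, ...".
SYSCALL_START = re.compile(r"^\s*([a-z_0-9]+)\(")
# A return value, e.g. ")      = 0" or ") = -1 ENXIO (...)". It may sit on a
# later line when the traced program's output is interleaved with strace's.
RETURN_VALUE = re.compile(r"\)\s+=\s+")


def last_unfinished_call(strace_text: str) -> Optional[str]:
    """Return the last syscall line that has no return value, or None."""
    pending = None
    for line in strace_text.splitlines():
        if SYSCALL_START.match(line):
            pending = line.strip()
        if pending is not None and RETURN_VALUE.search(line):
            pending = None  # the call completed
    return pending


if __name__ == "__main__":
    log = '''\
  ioctl(3, DM_LIST_DEVICES, 0x8e75748)    = 0
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  write(1, "    Loading vg00-snap-cow\\n", 26    Loading vg00-snap-cow
) = 26
  ioctl(3, DM_DEV_CREATE'''
    print(last_unfinished_call(log))  # ioctl(3, DM_DEV_CREATE
```

This is only a heuristic: it does not understand strace's
"<unfinished ...>" / "resumed" markers used when several processes are
traced.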

The HDD LED shows no activity.
My best guess is that disk I/O is being blocked (a deadlock?).


Version-Release number of selected component (if applicable):
lvm2-2.00.25-1.01

How reproducible:
Always

Steps to Reproduce:
1. Free some extents in the logical volume's volume group (a basic
requirement for snapshot creation).
2. lvcreate -s -l <number of extents> -n snap -pr -v /dev/vg00/rootfs
3. Hang.
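Step 2 takes the snapshot size as a count of extents (-l). Converting a
size such as 2GB into extents is a ceiling division by the volume group's
physical extent size. The sketch below assumes LVM2's default extent size
of 4 MiB, which the report does not confirm for vg00 (vgdisplay shows the
actual "PE Size"); the helper name extents_for is hypothetical.

```python
# Assumption: 4 MiB is LVM2's default physical extent size; vg00 may differ.
DEFAULT_PE_SIZE = 4 * 1024 * 1024


def extents_for(size_bytes: int, pe_size_bytes: int = DEFAULT_PE_SIZE) -> int:
    """Smallest number of extents that covers size_bytes (ceiling division)."""
    if size_bytes <= 0 or pe_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    return -(-size_bytes // pe_size_bytes)


if __name__ == "__main__":
    two_gib = 2 * 1024 ** 3
    print(extents_for(two_gib))  # 512, i.e. lvcreate -s -l 512 ...
```

Rounding up matters: a size that is not a whole number of extents still
needs the partial extent. lvcreate's -L option, which takes a size
directly, does this rounding itself.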
    

Actual Results:  The system hangs, but pinging it shows that the network
stack is still running.
After pressing the reset button and bringing the system back up, the
snapshot turns out to have been created properly.

Expected Results:  The snapshot is created and the system does not hang.

Additional info:

I have tested on 2 systems:
1. FC3 with RAID1 (/boot) and RAID5 (/).
2. FC3 with normal installation (no RAID, with LVM from installer).

Both systems behave identically, but most of my testing was on the RAID5
system. My primary test system uses 3 HDDs with:
- RAID1=/dev/md0=/dev/hda1+/dev/hdb1+/dev/hdc1 (/boot)
- RAID5=/dev/md1=/dev/hda2+/dev/hdb2+/dev/hdc2 (/) => /dev/vg00/rootfs.
- swap=/dev/hda3+/dev/hdb3+/dev/hdc3

I freed 2GB of extents from /dev/vg00/rootfs for the snapshot LV. Once
the system was back up after pushing the reset button, I tested accessing
/dev/vg00/snap (the result of lvcreate -s), and it worked normally, with
no errors at all. Removing /dev/vg00/snap with lvremove also worked well.
Comment 1 Yulianto Z 2005-01-17 12:27:53 EST
Created attachment 109869
Output of command, with and without strace

lvcreate -s output
Comment 2 Alasdair Kergon 2005-01-17 12:42:16 EST
Snapshots of the root filesystem aren't supported yet because of
issues like this.
Comment 3 Alasdair Kergon 2005-01-27 15:30:56 EST
Some testing has shown that mounting the root filesystem noatime
improves things.
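A quick way to see whether the root filesystem is already mounted
noatime is to read the mount table. The Python sketch below parses text in
the /proc/mounts format (device, mount point, type, options, dump, pass);
the function names are hypothetical, and it only inspects options without
changing anything. Applying the option would be done with
"mount -o remount,noatime /" or by adding noatime to the root entry in
/etc/fstab.

```python
from typing import List, Optional


def root_options(mounts_text: str) -> Optional[List[str]]:
    """Mount options of "/" in /proc/mounts-style text, or None if absent."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == "/":
            # Keep the last match: a placeholder "rootfs /" line may come first.
            options = fields[3].split(",")
    return options


def root_is_noatime(mounts_text: str) -> bool:
    opts = root_options(mounts_text)
    return opts is not None and "noatime" in opts


if __name__ == "__main__":
    sample = (
        "rootfs / rootfs rw 0 0\n"
        "/dev/mapper/vg00-rootfs / ext3 rw,noatime 0 0\n"
    )
    print(root_is_noatime(sample))  # True
```

On a live system, pass it the contents of /proc/mounts instead of the
sample text.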
Comment 4 Damian Menscher 2005-08-30 19:18:09 EDT
"Me too" for RHEL4 running on an x86/smp kernel.

If there is "functionality" that is likely to cause a system to hang, it really
should be documented somewhere (man pages?).  It's a bit frustrating to have to
search bugzilla or mailing lists to find out that Red Hat has known for
months/years that your server was going to crash repeatedly on you, but didn't
issue any warning.
Comment 5 Alasdair Kergon 2005-09-21 17:18:06 EDT
See also RHEL4 bug 168824
Comment 6 Matthew Miller 2006-07-10 18:13:44 EDT
Fedora Core 3 is now maintained by the Fedora Legacy project for security
updates only. If this problem is a security issue, please reopen and
reassign to the Fedora Legacy product. If it is not a security issue and
hasn't been resolved in the current FC5 updates or in the FC6 test
release, reopen and change the version to match.

Thank you!
Comment 7 Milan Broz 2007-06-11 13:36:45 EDT
There were some changes to device-mapper and lvm2 which resolved some
deadlock situations when operating on the root volume.

I tested snapshots of a logical volume holding a live root filesystem
(even with undesirable overfilling of the snapshot), and it works in
Fedora 7, at least with my configuration.

Closing this bug. If you hit this problem again, please reopen it and
attach debugging information.
