Bug 145342

Summary: Snapshot of root filesystem hangs system
Product: [Fedora] Fedora
Reporter: Yulianto Z <sehari24jam>
Component: lvm2
Assignee: Milan Broz <mbroz>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: rawhide
CC: agk, dwysocha, mattdm, mbroz, menscher, nospam, pvrabec, zing
Keywords: FutureFeature
Hardware: i686
OS: Linux
Fixed In Version: F7
Doc Type: Enhancement
Last Closed: 2007-06-11 17:36:45 UTC
Attachments:
  Output of command, with and without strace

Description Yulianto Z 2005-01-17 17:25:28 UTC
Description of problem:
When running lvcreate on Linux software RAID5 (I suspect systems without
RAID are affected as well) to create a snapshot of the mounted root
filesystem (/), the system hangs.
After a forced reboot (pressing the reset button), the system complains
about an unclean filesystem and then continues to work properly, with a
working snapshot that was created.

Pinging the system while it is hung shows that the network part is still
running (ping shows no packet loss).

Running strace on the command shows that it is stuck on DM_DEV_CREATE
(complete output attached):
-------------------------
  write(1, "    Found volume group \"vg00\"\n", 30    Found volume group "vg00"
  ) = 30
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = 0
  ioctl(3, DM_LIST_DEVICES, 0x8e75748)    = 0
  ioctl(3, DM_DEV_STATUS, 0x8e79750)      = 0
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  write(1, "    Loading vg00-snap-cow\n", 26    Loading vg00-snap-cow
  ) = 26
  ioctl(3, DM_DEV_CREATE
-------------------------

The HDD LED shows no activity.
My best guess is that disk I/O is being blocked (deadlock?).


Version-Release number of selected component (if applicable):
lvm2-2.00.25-1.01

How reproducible:
Always

Steps to Reproduce:
1. Free some extents in the logical volume's volume group (a basic
requirement for snapshot creation).
2. lvcreate -s -l <number of extents> -n snap -pr -v /dev/vg00/rootfs
3. The system hangs.
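
The steps above can be sketched as a dry run that only prints the command
instead of executing it. This is a rough illustration, not a tested
reproducer: the helper names have_free_extents and snapshot_cmd are
hypothetical, and the extent counts are assumed values. On a real system the
free extent count could be read with `vgs --noheadings -o vg_free_count vg00`.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps. It prints the lvcreate
# command rather than running it, so it is safe on any machine.
# Helper names are hypothetical; vg00, rootfs and snap come from the report.

# Succeeds if the volume group has at least the needed number of free extents.
# $1 = free extents in the VG, $2 = extents needed for the snapshot
have_free_extents() {
    [ "$1" -ge "$2" ]
}

# Prints the lvcreate invocation from step 2 for a given extent count.
snapshot_cmd() {
    printf 'lvcreate -s -l %s -n snap -pr -v /dev/vg00/rootfs\n' "$1"
}

free=64      # assumed value for illustration
needed=32    # assumed snapshot size in extents

if have_free_extents "$free" "$needed"; then
    snapshot_cmd "$needed"
else
    echo "not enough free extents: have $free, need $needed" >&2
fi
```

Running the printed command as root against a mounted root filesystem is
what triggers the hang described here, so it should only be tried on a test
machine.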
    

Actual Results:  The system hangs, but pinging it shows that the network
part is still running.
After pressing the reset button and bringing the system back up, the
snapshot turns out to have been created properly.

Expected Results:  The snapshot is created and the system does not hang.

Additional info:

I have tested on 2 systems:
1. FC3 with RAID1 (/boot) and RAID5 (/).
2. FC3 with a normal installation (no RAID, with LVM from the installer).

Both systems behave identically, but I did most of my testing on the RAID5
system. My primary test system uses 3 HDDs with:
- RAID1=/dev/md0=/dev/hda1+/dev/hdb1+/dev/hdc1 (/boot)
- RAID5=/dev/md1=/dev/hda2+/dev/hdb2+/dev/hdc2 (/) => /dev/vg00/rootfs.
- swap=/dev/hda3+/dev/hdb3+/dev/hdc3

I spared 2GB of extents from /dev/vg00/rootfs for the snapshot LV. Once the
system was back up after pushing the reset button, I tested accessing
/dev/vg00/snap (the result of lvcreate -s), and it worked normally, with no
errors at all. Removing /dev/vg00/snap using lvremove also worked well.

Comment 1 Yulianto Z 2005-01-17 17:27:53 UTC
Created attachment 109869
Output of command, with and without strace

lvcreate -s output

Comment 2 Alasdair Kergon 2005-01-17 17:42:16 UTC
Snapshots of the root filesystem aren't supported yet because of
issues like this.

Comment 3 Alasdair Kergon 2005-01-27 20:30:56 UTC
Some testing has shown that mounting the root filesystem noatime
improves things.
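
As a rough sketch of how one might check for and try this, assuming a Linux
system that exposes /proc/mounts (has_noatime is a hypothetical helper, and
the script only prints the remount command rather than running it):

```shell
#!/bin/sh
# Checks whether the root filesystem is mounted with noatime and, if not,
# prints the remount command that could be tried as root.

# Succeeds if a comma-separated mount option string contains "noatime".
has_noatime() {
    case ",$1," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Mount options of / as the kernel reports them (4th field of /proc/mounts).
# If several entries match "/", the last one is used.
root_opts=$(awk '$2 == "/" { opts = $4 } END { print opts }' /proc/mounts 2>/dev/null) || root_opts=""

if has_noatime "$root_opts"; then
    echo "root is already mounted noatime"
else
    echo "to try it: mount -o remount,noatime /"
fi
```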

Comment 4 Damian Menscher 2005-08-30 23:18:09 UTC
"Me too" for RHEL4 running on an x86/smp kernel.

If there is "functionality" that is likely to cause a system to hang, it really
should be documented somewhere (man pages?).  It's a bit frustrating to have to
search bugzilla or mailing lists to find out that RedHat has known for
months/years that your server was going to crash repeatedly on you, but didn't
issue any warning.

Comment 5 Alasdair Kergon 2005-09-21 21:18:06 UTC
See also RHEL4 bug 168824

Comment 6 Matthew Miller 2006-07-10 22:13:44 UTC
Fedora Core 3 is now maintained by the Fedora Legacy project for security
updates only. If this problem is a security issue, please reopen and
reassign to the Fedora Legacy product. If it is not a security issue and
hasn't been resolved in the current FC5 updates or in the FC6 test
release, reopen and change the version to match.

Thank you!


Comment 7 Milan Broz 2007-06-11 17:36:45 UTC
There were some changes to device mapper and lvm2 which resolved some deadlock
situations when manipulating the root volume.

I tested snapshots of a logical volume with a live root filesystem (even with
an undesirable overfilled snapshot) and it works in Fedora 7 - at least with my
configuration.

Closing this bug - if you hit this problem again, please reopen it and attach
debugging information.