145342 – Snapshot of root filesystem hangs system

Summary: Snapshot of root filesystem hangs system

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	lvm2
Sub Component:
Version:	rawhide
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Milan Broz
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-01-17 17:25 UTC by Yulianto Z
Modified:	2013-03-01 04:04 UTC
CC List:	8 users
Fixed In Version:	F7
Clone Of:
Environment:
Last Closed:	2007-06-11 17:36:45 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Output of command, with and without strace (113.87 KB, text/plain) 2005-01-17 17:27 UTC, Yulianto Z

Description Yulianto Z 2005-01-17 17:25:28 UTC

Description of problem:
Running lvcreate to create a snapshot of the mounted root filesystem
(/) hangs the system. I see this on Linux software RAID5, and I suspect
systems without RAID are affected as well.
After a forced reboot (pressing the reset button), the system complains
about an unclean filesystem and then continues to work properly, with
the newly created snapshot in working order.

Pinging the system while it is hung shows that networking is still
running (no packet loss).

Running strace on the command shows that it is stuck in DM_DEV_CREATE
(complete output attached):
-------------------------
  write(1, "    Found volume group \"vg00\"\n", 30    Found volume group "vg00"
  ) = 30
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = 0
  ioctl(3, DM_LIST_DEVICES, 0x8e75748)    = 0
  ioctl(3, DM_DEV_STATUS, 0x8e79750)      = 0
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)
  write(1, "    Loading vg00-snap-cow\n", 26    Loading vg00-snap-cow
  ) = 26
  ioctl(3, DM_DEV_CREATE
-------------------------
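
For anyone triaging a similar trace, the hung call can be picked out of a saved strace log mechanically: it is the final ioctl line that never received an "= result" suffix. The following is only a minimal sketch; the function name hung_ioctl and the shortened sample log are illustrative assumptions, not part of lvm2 or strace.

```python
import re

# A completed strace line ends with ") = <return value>" plus optional errno text.
_COMPLETED = re.compile(r"\)\s*=\s*(-?\d+|0x[0-9a-fA-F]+|\?)(\s.*)?$")
# Capture the request name of an ioctl call, e.g. DM_DEV_CREATE.
_IOCTL = re.compile(r"ioctl\(\s*\d+,\s*([A-Z0-9_]+)")


def hung_ioctl(trace: str):
    """Return the request name of the final ioctl if it never returned, else None."""
    pending = None
    for line in trace.splitlines():
        match = _IOCTL.search(line)
        if not match:
            continue  # not an ioctl line (e.g. write output)
        # Remember only the state of the most recent ioctl seen so far.
        pending = None if _COMPLETED.search(line) else match.group(1)
    return pending


# Shortened, hypothetical sample modelled on the trace in this report.
sample = "\n".join([
    "  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = 0",
    "  ioctl(3, DM_LIST_DEVICES, 0x8e75748)    = 0",
    "  ioctl(3, DM_DEV_STATUS, 0x8e75748)      = -1 ENXIO (No such device or address)",
    "  ioctl(3, DM_DEV_CREATE",
])

print(hung_ioctl(sample))
```

The sketch only inspects text; it makes no claim about why the kernel never returned from the call.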

The HDD LED shows no activity.
My best guess is that disk I/O is being blocked (deadlock?).


Version-Release number of selected component (if applicable):
lvm2-2.00.25-1.01

How reproducible:
Always

Steps to Reproduce:
1. Free some extents in the logical volume's volume group (a basic
requirement for snapshot creation).
2. lvcreate -s -l <number of extents> -n snap -pr -v /dev/vg00/rootfs
3. The system hangs.
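
When scripting a reproduction attempt, it may help to assemble the command from step 2 programmatically so the extent count is validated first. This is a minimal sketch that only builds the argument list and runs nothing; the helper name snapshot_command and the extent count 64 are hypothetical, while the flags are exactly those from step 2.

```python
def snapshot_command(extents, origin="/dev/vg00/rootfs", name="snap"):
    """Build (but do not run) the lvcreate argument list from step 2."""
    if not isinstance(extents, int) or extents <= 0:
        raise ValueError("extents must be a positive integer")
    # -s: snapshot, -l: size in extents, -n: LV name, -pr: read-only, -v: verbose
    return ["lvcreate", "-s", "-l", str(extents), "-n", name, "-pr", "-v", origin]


# 64 is an arbitrary, hypothetical extent count.
print(" ".join(snapshot_command(64)))
```

On an affected system, actually running the resulting command is expected to hang the machine, so it should only be tried on a test box.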
    

Actual Results:  The system hangs, but pinging it shows that networking
is still running.
After pressing the reset button and bringing the system back up, the
snapshot turns out to have been created properly.

Expected Results:  The snapshot is created and the system does not hang.

Additional info:

I have tested on 2 systems:
1. FC3 with RAID1 (/boot) and RAID5 (/).
2. FC3 with a normal installation (no RAID, with LVM from the installer).

Both systems behave identically, but I did most of my testing on the
RAID5 system. My primary test system uses 3 HDDs with:
- RAID1=/dev/md0=/dev/hda1+/dev/hdb1+/dev/hdc1 (/boot)
- RAID5=/dev/md1=/dev/hda2+/dev/hdb2+/dev/hdc2 (/) => /dev/vg00/rootfs.
- swap=/dev/hda3+/dev/hdb3+/dev/hdc3

I freed 2GB of extents from /dev/vg00/rootfs for the snapshot LV. Once
the system was back up after pushing the reset button, I tested accessing
/dev/vg00/snap (the result of lvcreate -s), and it worked normally, with
no errors at all. Removing /dev/vg00/snap using lvremove also worked well.

Comment 1 Yulianto Z 2005-01-17 17:27:53 UTC

Created attachment 109869 [details]
Output of command, with and without strace

lvcreate -s output

Comment 2 Alasdair Kergon 2005-01-17 17:42:16 UTC

Snapshots of the root filesystem aren't supported yet because of
issues like this.

Comment 3 Alasdair Kergon 2005-01-27 20:30:56 UTC

Some testing has shown that mounting the root filesystem noatime
improves things.
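
To check whether a given machine already has its root filesystem mounted noatime, the options can be read from /proc/mounts, whose fields are device, mount point, filesystem type, and options. The sketch below parses text in that format; the function name root_mount_options and the sample entries (including the ext3 type) are illustrative assumptions.

```python
def root_mount_options(mounts_text):
    """Return the option set of the last entry mounted at "/".

    Later entries shadow earlier ones, so the last match is the effective one.
    """
    options = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/":
            options = set(fields[3].split(","))
    return options


# Hypothetical sample in /proc/mounts format.
sample = (
    "rootfs / rootfs rw 0 0\n"
    "/dev/mapper/vg00-rootfs / ext3 rw,noatime 0 0\n"
    "/dev/md0 /boot ext3 rw 0 0\n"
)

print("noatime" in root_mount_options(sample))
```

On a live system the same function can be fed the contents of /proc/mounts. If noatime is missing, it can typically be applied without a reboot using "mount -o remount,noatime /"; whether that avoids the hang on a particular setup is not established by this report.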

Comment 4 Damian Menscher 2005-08-30 23:18:09 UTC

"Me too" for RHEL4 running on an x86/smp kernel.

If there is "functionality" that is likely to cause a system to hang, it really
should be documented somewhere (man pages?).  It's a bit frustrating to have to
search bugzilla or mailing lists to find out that RedHat has known for
months/years that your server was going to crash repeatedly on you, but didn't
issue any warning.

Comment 5 Alasdair Kergon 2005-09-21 21:18:06 UTC

See also RHEL4 bug 168824

Comment 6 Matthew Miller 2006-07-10 22:13:44 UTC

Fedora Core 3 is now maintained by the Fedora Legacy project for security
updates only. If this problem is a security issue, please reopen and
reassign to the Fedora Legacy product. If it is not a security issue and
hasn't been resolved in the current FC5 updates or in the FC6 test
release, reopen and change the version to match.

Thank you!

Comment 7 Milan Broz 2007-06-11 17:36:45 UTC

There were some changes to device mapper and lvm2 which resolved some deadlock
situations when manipulating the root volume.

I tested snapshots of a logical volume with a live root filesystem (even with
an undesirably overfilled snapshot) and it works in Fedora 7, at least with my
configuration.

Closing this bug. If you hit this problem again, please reopen it and attach
debugging information.
