Description of problem:
When running lvcreate to create a snapshot of the mounted root filesystem (/) on Linux RAID5 (a system with no RAID appears to be affected as well), the system hangs. After a forced reboot (pressing the reset button), the system complains about an unclean filesystem, then continues to work properly, and the snapshot that was being created works too. Pinging the system while it hangs shows that the network stack is still running (ping reports no loss). Running strace on the command shows that it gets stuck on DM_DEV_CREATE (complete output attached):

-------------------------
write(1, "  Found volume group \"vg00\"\n", 30  Found volume group "vg00"
) = 30
ioctl(3, DM_DEV_STATUS, 0x8e75748)    = 0
ioctl(3, DM_LIST_DEVICES, 0x8e75748)  = 0
ioctl(3, DM_DEV_STATUS, 0x8e79750)    = 0
ioctl(3, DM_DEV_STATUS, 0x8e75748)    = -1 ENXIO (No such device or address)
ioctl(3, DM_DEV_STATUS, 0x8e75748)    = -1 ENXIO (No such device or address)
ioctl(3, DM_DEV_STATUS, 0x8e75748)    = -1 ENXIO (No such device or address)
ioctl(3, DM_DEV_STATUS, 0x8e75748)    = -1 ENXIO (No such device or address)
ioctl(3, DM_DEV_STATUS, 0x8e75748)    = -1 ENXIO (No such device or address)
ioctl(3, DM_DEV_STATUS, 0x8e75748)    = -1 ENXIO (No such device or address)
write(1, "  Loading vg00-snap-cow\n", 26  Loading vg00-snap-cow
) = 26
ioctl(3, DM_DEV_CREATE
-------------------------

The HDD LED shows no activity. My best guess is that disk I/O access is being blocked (a deadlock?).

Version-Release number of selected component (if applicable):
lvm2-2.00.25-1.01

How reproducible:
Always

Steps to Reproduce:
1. Free some extents in the logical volume's volume group (a basic requirement for snapshot creation).
2. lvcreate -s -l <number of extents> -n snap -pr -v /dev/vg00/rootfs
3. The system hangs.

Actual Results:
The system hangs, but pinging it shows that the network stack is still running. After pressing the reset button and bringing the system back up, the snapshot turns out to have been created properly.

Expected Results:
The snapshot is created and the system does not hang.

Additional info:
I have tested on 2 systems:
1. FC3 with RAID1 (/boot) and RAID5 (/).
2. FC3 with a normal installation (no RAID, with LVM set up by the installer).

Both systems behave identically, but I did most of my testing on the RAID5 system. My primary test system uses 3 HDDs with:
- RAID1 = /dev/md0 = /dev/hda1+/dev/hdb1+/dev/hdc1 (/boot)
- RAID5 = /dev/md1 = /dev/hda2+/dev/hdb2+/dev/hdc2 (/) => /dev/vg00/rootfs
- swap = /dev/hda3+/dev/hdb3+/dev/hdc3

I spared 2GB of extents from /dev/vg00/rootfs for the snapshot LV. Once the system came back up after pushing the reset button, I tested accessing /dev/vg00/snap (the result of lvcreate -s). It worked normally, with no errors at all. Removing /dev/vg00/snap with lvremove also worked fine.
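For reference, the reproduction steps above can be sketched as a short shell session. This is only a sketch of my procedure: the extent count (500) is an arbitrary example, and vg00/rootfs are the names from my setup; it must be run as root on a machine whose root filesystem lives on LVM.

```shell
# 1. Check that the volume group has free extents for the snapshot
#    (lvcreate needs unallocated space in the VG).
vgdisplay vg00

# 2. Create a read-only snapshot of the mounted root LV:
#    -s = snapshot, -l = size in extents (500 here is an example),
#    -n = snapshot name, -pr = read-only permission, -v = verbose.
lvcreate -s -l 500 -n snap -pr -v /dev/vg00/rootfs

# 3. On the affected kernels the system hangs here. After a forced
#    reboot the snapshot nevertheless exists and can be removed:
lvremove /dev/vg00/snap
```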
Created attachment 109869 [details] Output of the lvcreate -s command, with and without strace
Snapshots of the root filesystem aren't supported yet because of issues like this.
Some testing has shown that mounting the root filesystem with noatime improves things.
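For anyone who wants to try this workaround, noatime can be applied either with a live remount or persistently via /etc/fstab. The device name below is from the reporter's setup and is just an example:

```shell
# Remount the running root filesystem with noatime (takes effect immediately):
mount -o remount,noatime /

# To make it persistent, add noatime to the root entry in /etc/fstab, e.g.:
# /dev/vg00/rootfs  /  ext3  defaults,noatime  1 1
```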
"Me too" for RHEL4 running on an x86/smp kernel. If there is "functionality" that is likely to cause a system to hang, it really should be documented somewhere (man pages?). It's a bit frustrating to have to search bugzilla or mailing lists to find out that RedHat has known for months/years that your server was going to crash repeatedly on you, but didn't issue any warning.
See also RHEL4 bug 168824
Fedora Core 3 is now maintained by the Fedora Legacy project for security updates only. If this problem is a security issue, please reopen and reassign to the Fedora Legacy product. If it is not a security issue and hasn't been resolved in the current FC5 updates or in the FC6 test release, reopen and change the version to match. Thank you!
There were some changes to device-mapper and lvm2 which resolved some deadlock situations when operating on the root volume. I tested snapshots of a logical volume with a live root filesystem (even with undesirable overfilling of the snapshot) and it works in Fedora 7, at least with my configuration. Closing this bug; if you hit this problem again, please reopen it and attach debugging information.
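As a quick sanity check when retesting this on newer releases, the snapshot's fill level can be watched with lvs; snap_percent is a standard lvs output field, and vg00/snap are the names used earlier in this report:

```shell
# Show how full each snapshot's COW area is; when it reaches 100%
# the snapshot becomes invalid, which is the "overfilling" case
# mentioned above.
lvs -o lv_name,origin,snap_percent vg00
```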