Created attachment 791409 [details]
Script used to automate snapshot create/remove and reproduce the error during snapshot creation

Description of problem:
Creating an LVM snapshot occasionally fails. I found this issue while running rsnapshot to back up my system, but I can easily reproduce it by creating and removing snapshots.

Version-Release number of selected component (if applicable):
kernel-3.10.9-200.fc19.x86_64
lvm2-2.02.98-12.fc19.x86_64
lvm2-libs-2.02.98-12.fc19.x86_64
device-mapper-1.02.77-12.fc19.x86_64
systemd-204-9.fc19.x86_64
dmraid-events-1.0.0.rc16-21.fc19.x86_64
mdadm-3.2.6-21.fc19.x86_64

How reproducible:
Seems somewhat random, but reproduces within twenty iterations of the attached shell script.

Steps to Reproduce:
1. Create a snapshot
2. Remove the snapshot
3. If no error occurs, go to step 1; otherwise stop and file a bug

Actual results:
# ./lvm-snapshot-test.sh vg_rivendell root_fedora_15 rsnapshot
-> Starting...
-> Iteration 1, creating...
Logical volume "rsnapshot" created
-> Removing...
Logical volume "rsnapshot" successfully removed
-> Iteration 2, creating...
Logical volume "rsnapshot" created
-> Removing...
Logical volume "rsnapshot" successfully removed
-> Iteration 3, creating...
device-mapper: create ioctl on vg_rivendell-rsnapshot failed: Device or resource busy
Aborting. Failed to activate snapshot exception store.

Note that in this specific instance the snapshot was actually created - usually it is not. I was unable to remove the snapshot, however, and I needed to reboot the system, at which point the snapshot no longer existed in the volume group:

# ls /dev/vg_rivendell/rsnapshot
/dev/vg_rivendell/rsnapshot
# lvremove /dev/vg_rivendell/rsnapshot
One or more specified logical volume(s) not found.
# lvdisplay /dev/vg_rivendell/rsnapshot
One or more specified logical volume(s) not found.

Expected results:
No error

Additional info:
Physical system, with four drives configured as a RAID 6 and formatted as the physical volume of the affected volume group.

The dmesg output shows no messages logged as a result of these operations, and /var/log/messages shows no difference between the successful and failing cases:

Aug 28 09:41:02 rivendell lvm[2230]: Monitoring snapshot vg_rivendell-rsnapshot
Aug 28 09:41:02 rivendell lvm[2230]: No longer monitoring snapshot vg_rivendell-rsnapshot
Aug 28 09:41:02 rivendell systemd-udevd[2202]: inotify_add_watch(7, /dev/dm-19, 10) failed: No such file or directory
Aug 28 09:41:02 rivendell systemd-udevd[2220]: inotify_add_watch(7, /dev/dm-18, 10) failed: No such file or directory
Aug 28 09:41:03 rivendell lvm[2230]: Monitoring snapshot vg_rivendell-rsnapshot
Aug 28 09:41:03 rivendell lvm[2230]: No longer monitoring snapshot vg_rivendell-rsnapshot
Aug 28 09:41:03 rivendell systemd-udevd[2220]: inotify_add_watch(7, /dev/dm-18, 10) failed: No such file or directory
Aug 28 09:41:03 rivendell systemd-udevd[2202]: inotify_add_watch(7, /dev/dm-19, 10) failed: No such file or directory
Aug 28 09:41:03 rivendell systemd-udevd[2246]: inotify_add_watch(7, /dev/dm-17, 10) failed: No such file or directory
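For reference, here is a minimal sketch of the loop the attached script runs (the actual attachment may differ; the 1G snapshot size, the 200-iteration cap, and the argument order are assumptions used only for illustration):

#!/bin/bash
# Sketch of the reproduction loop: repeatedly create and remove a snapshot
# until one of the operations fails. Arguments: volume group, origin LV,
# snapshot name. Snapshot size and iteration count are illustrative.
VG="$1"; ORIGIN="$2"; SNAP="$3"

echo "-> Starting..."
for i in $(seq 1 200); do
    echo "-> Iteration $i, creating..."
    lvcreate --snapshot --size 1G --name "$SNAP" "/dev/$VG/$ORIGIN" || {
        echo "Snapshot creation failed on iteration $i" >&2; exit 1; }
    echo "-> Removing..."
    lvremove -f "/dev/$VG/$SNAP" || {
        echo "Snapshot removal failed on iteration $i" >&2; exit 1; }
done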
The same utility run on a system with the same versions of packages but without a RAID configuration does not exhibit the problem. I ran 200 iterations of the test and did not see the issue.

Original system on which the problem occurs:
Intel Core 2 6400 @ 2.13GHz
8GB RAM
4x SATA drives configured as RAID 6

New system on which the problem does not occur:
Intel Core i7 860 @ 2.80GHz
8GB RAM
2x SATA drives with physical volumes belonging to the same volume group

When reproducing it again on the original system, I found a difference in /var/log/messages that I had not seen earlier, as well as a message that I had previously overlooked. The following snippet shows the entries logged during several iterations. Note the difference in timing in the iteration prior to the failure and during the failure itself:

Aug 29 09:58:43 rivendell lvm[12012]: Monitoring snapshot vg_rivendell-rsnapshot
Aug 29 09:58:43 rivendell lvm[12012]: No longer monitoring snapshot vg_rivendell-rsnapshot
Aug 29 09:58:43 rivendell systemd-udevd[11984]: inotify_add_watch(7, /dev/dm-18, 10) failed: No such file or directory
Aug 29 09:58:43 rivendell systemd-udevd[11982]: inotify_add_watch(7, /dev/dm-19, 10) failed: No such file or directory
Aug 29 09:58:44 rivendell lvm[12012]: Monitoring snapshot vg_rivendell-rsnapshot
Aug 29 09:58:44 rivendell lvm[12012]: No longer monitoring snapshot vg_rivendell-rsnapshot
Aug 29 09:58:44 rivendell systemd-udevd[11984]: inotify_add_watch(7, /dev/dm-18, 10) failed: No such file or directory
Aug 29 09:58:44 rivendell systemd-udevd[11982]: inotify_add_watch(7, /dev/dm-19, 10) failed: No such file or directory
Aug 29 09:58:44 rivendell lvm[12012]: Monitoring snapshot vg_rivendell-rsnapshot
Aug 29 09:58:45 rivendell lvm[12012]: No longer monitoring snapshot vg_rivendell-rsnapshot
Aug 29 09:58:45 rivendell systemd-udevd[11982]: inotify_add_watch(7, /dev/dm-19, 10) failed: No such file or directory
Aug 29 09:58:45 rivendell systemd-udevd[11984]: inotify_add_watch(7, /dev/dm-18, 10) failed: No such file or directory
Aug 29 09:58:45 rivendell lvm[12012]: Monitoring snapshot vg_rivendell-rsnapshot
Aug 29 09:58:45 rivendell lvm[12012]: No longer monitoring snapshot vg_rivendell-rsnapshot
Aug 29 09:58:45 rivendell lvm[12012]: Monitoring snapshot vg_rivendell-rsnapshot
Aug 29 09:58:45 rivendell systemd-udevd[11993]: inotify_add_watch(7, /dev/dm-18, 10) failed: No such file or directory
Aug 29 09:58:45 rivendell systemd-udevd[11984]: inotify_add_watch(7, /dev/dm-19, 10) failed: No such file or directory
Aug 29 09:58:46 rivendell lvm[12012]: Snapshot vg_rivendell-rsnapshot changed state to:
Aug 29 09:58:46 rivendell lvm[12012]: dm_task_run failed, errno = 6, No such device or address
Aug 29 09:58:46 rivendell lvm[12012]: vg_rivendell-rsnapshot disappeared, detaching
Aug 29 09:58:46 rivendell lvm[12012]: No longer monitoring snapshot vg_rivendell-rsnapshot

I suspect this problem may have more to do with device-mapper/udev, or some interaction between device-mapper and lvm2 that is exacerbated by the RAID 6 configuration, but I do not know how to extract additional information to further troubleshoot and triage. Any suggestions would be greatly appreciated - thank you in advance.
(In reply to John F Sullivan from comment #1)
> I suspect this problem may have more to do with device-mapper/udev, or some
> interaction between device-mapper and lvm2 that is exacerbated by the RAID 6
> configuration, but I do not know how to extract additional information to
> further troubleshoot and triage. Any suggestions would be greatly
> appreciated - thank you in advance.

Please try setting global/use_lvmetad=0 in /etc/lvm/lvm.conf. This will disable the lvmetad daemon and the accompanying LVM autoactivation - there might be a problem with the LVM autoactivation that interferes here.
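For clarity, the suggested workaround amounts to a one-line change in the "global" section of /etc/lvm/lvm.conf; the surrounding lines below are illustrative context, not the full file:

# /etc/lvm/lvm.conf
global {
    # Disable the lvmetad daemon and LVM autoactivation as a workaround
    # for the failing snapshot creation; setting it back to 1 re-enables it.
    use_lvmetad = 0
}

After changing the setting, either reboot or stop the lvm2-lvmetad socket and service (if present) so the running daemon is no longer consulted.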
Setting use_lvmetad=0 in /etc/lvm/lvm.conf and rebooting the system appears to have made the problem disappear. With this setting, I ran 200 iterations of the attached test script without error. I should be able to get my backups running again - thank you!

Please let me know if I can gather any additional information that may shed light on what's going wrong with LVM auto-activation. I would be happy to continue triaging and gathering information to help solve the underlying issue so this workaround is no longer necessary.
Thanks for testing. Well, we need to fix how LVM autoactivation is done on MD RAID - we already have a similar bug reported, bug #985638 (though it might not seem similar at first sight).

Just to be sure, please also send the output of the lsblk command so I can see the exact device layout. Thanks.
Created attachment 792178 [details]
output of "lsblk -a"

Hi Peter - you're very welcome, and thank you for the quick responses. I never would have identified that bug as the same issue, partly because I didn't associate the problem with RAID when I first experienced it. I've attached the output of "lsblk -a" - please let me know if there's any other data I can provide.
This patch should fix the issue reported:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=8d1d83504dcf9c86ad42d34d3bd0b201d7bab8f6

I'll try to add it in the next F19 update.
(In reply to Peter Rajnoha from comment #6)
> This patch should fix the issue reported:
>
> https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=8d1d83504dcf9c86ad42d34d3bd0b201d7bab8f6
>
> I'll try to add it in the next F19 update.

We also need a fix for bug #1014067 (dracut).
I really recommend using Fedora 20, which has the most recent bug fixes for lvmetad and the accompanying autoactivation feature. It would be complicated to backport all the necessary patches to F19 (there's a high risk of introducing regressions if the backport is not done correctly).

Since we have a workaround for F19, I'm closing this bug as NEXTRELEASE. This problem should be completely fixed in F20.