Created attachment 516543 [details] output from lvcreate -vvvv -s ... Description of problem: lvcreate -s fails using lvm2 from rawhide on F15. Works fine in a VM with a fresh install, but on real hardware with existing volume group, it fails. Version-Release number of selected component (if applicable): 2.02.86-4.fc17 How reproducible: Sometimes it fails to create the COW LV entirely, other times it creates the LV but does not properly set up the snapshot, warning that 'manual intervention required'. Steps to Reproduce: 1. install Fedora 12 2. time passes 3. install Fedora 14 on a new LV in existing volume group 4. time passes 5. install Fedora 15 on a new LV in existing volume group 6. yum --enablerepo=rawhide update lvm2 (to get systemd enabled lvm2 monitor) 7. lvcreate -s ... Actual results: /dev/mapper/vg_mnetdjm4_2-things: open failed: No such file or directory /dev/vg_mnetdjm4_2/things: not found: device not cleared Aborting. Failed to wipe snapshot exception store. Unable to deactivate open vg_mnetdjm4_2-things (253:13) Unable to deactivate failed new LV. Manual intervention required. Expected results: Logical volume "things" created Additional info: I reproduced this on two real boxes. Can't reproduce in a VM with a clean install, will try again in a VM after creating the VG in F12. Also, successful and unsuccessful command output is sometimes includes these messages: semid 950274: semop failed for cookie 0xd4d720f: incorrect semaphore state Failed to set a proper state for notification semaphore identified by cookie value 223179279 (0xd4d720f) to initialize waiting for incoming notifications. Attached output from lvcreate -vvvv -s ...
One for Peter to comment on - it relies on interactions between udev rules and lvm2 (and can be tweaked with lvm.conf settings).
Until recently, lvm would do operations in /dev itself if udev hadn't done them. But we have switched that off now, so that we rely entirely on udev rules to perform /dev manipulations. "cookies" (semaphores) are the means of communication.
Looks like Peter built lvm2-2.02.86-5 - try that version in case it's something he already fixed.
The lvm2-2.02.86-5 did resolve one specific problem with udev_sync where if we used LVM tools without directly setting udev_sync=1 (either in lvm.conf or within the --config argument), it used a (wrong) default value of udev_sync=0. With .86 version where we rely on udev completely by default without doing any checks, this could end up with a nonexistent node/symlink when needed (bug #723144). ... But looking at the log provided here, the problem here is a bit different: #ioctl/libdm-iface.c:2112 Uevent not generated! Calling udev_complete internally to avoid process lock-up. That means we detected the uevent was not generated correctly in kernel. Hence we're trying to prevent a lock-up that would happen as a result of waiting for the event that would never come. Together with fully relying on udev without making any checks and corrections (the activation/verify_udev_operations setting set to 0 by default now), we sometimes miss the node/symlink as we're not synchronizing with udev. The message like: semid 950274: semop failed for cookie 0xd4d720f: incorrect semaphore state ...happens (most of the time) if the uevent was generated (though reported by kernel as not generated) and so we end up calling udev_complete twice leading to an incorrect semaphore state... This situation is very very rare, it has alredy been reported by a few people (reported in other distros though), but we were never able to properly find the source of the problem (most of the time, people restarted their machines and the problem was gone). It seems like a kernel bug somewhere... What kernel version do you use? Can you reproduce that after a machine reboot? Can you reproduce it with the older version of the kernel used in old Fedoras? Are you compiling kernels yourself or are you using stock Fedora kernels?
(...in device-mapper in kernel, we just set the "uevent generated flag" based on what the function responsible for sending the uevent returns. So it may be a bug even outside device-mapper...)
(In reply to comment #4) > The message like: > > semid 950274: semop failed for cookie 0xd4d720f: incorrect semaphore state > > ...happens (most of the time) if the uevent was generated (though reported by > kernel as not generated) and so we end up calling udev_complete twice leading > to an incorrect semaphore state... The first one being the udev_complete called internally and the other one called from within /lib/udev/rules/95-dm-notify.rules (this was not supposed to be run since the event was not supposed to be sent - as reported by the function sending the event in kernel).
(In reply to comment #4) > What kernel version do you use? Can you reproduce that after a machine reboot? > Can you reproduce it with the older version of the kernel used in old Fedoras? > Are you compiling kernels yourself or are you using stock Fedora kernels? I continue to experience the problem after many reboots. I don't see lvm2-2.02.86-5 yet in rawhide (is there a quicker way to get new rawhide packages?), but i verified udev_sync=1 in my lvm.conf. I'm using a self compiled 3.0 kernel. I see i don't have DM_UEVENTS set, however, i just tried using both 2.6.38.8-35.fc15.x86_64 and 2.6.35.10-74.fc14.x86_64 and it still fails, and i still have this message: #ioctl/libdm-iface.c:2112 Uevent not generated! Calling udev_complete internally to avoid process lock-up. Should i be using udev from rawhide to match the lvm version? Yum didn't think it was necessary, but i'll give it a shot now.
(In reply to comment #7) > I continue to experience the problem after many reboots. I don't see > lvm2-2.02.86-5 yet in rawhide (is there a quicker way to get new rawhide > packages?) Yes, you should be able to download it directly from Koji: http://koji.fedoraproject.org/koji/buildinfo?buildID=256940
(In reply to comment #7) > I'm using a self compiled 3.0 kernel. I see i don't have DM_UEVENTS set, That's not necessary - afaik this is only used for uevents that multipath sends. That option is misleading a bit, probably it would need a better comment there, but anyway...
I updated udev (and as a dep. systemd) from rawhide, but couldn't log on (not sure why...), so i had to resort to single user mode. From there i was able to successfully create a snapshot, but in trying to undo my yum history i ended up with all sorts of rpm conflicts and i'm no longer sure which parts of which packages i'm using, so i'm going back to create another fresh install...
(In reply to comment #0) > Also, successful and unsuccessful command output is sometimes includes these > messages: > semid 950274: semop failed for cookie 0xd4d720f: incorrect semaphore state > Failed to set a proper state for notification semaphore identified by cookie > value 223179279 (0xd4d720f) to initialize waiting for incoming notifications. Milan found a problem that appears when using namespaces - the semaphore is decreased as a fallback action based on the fact whether the kernel sent uevent or not. However, when using namespaces, this information is incorrect - there's still uevent sent and the kernel function reports it as not being sent (see also http://www.redhat.com/archives/dm-devel/2011-August/msg00075.html). That leads to a situation as described in this bug report. So my question now is: do you have Chrome web browser running while executing the lvm commands? :) (Alternatively, try disabling the namespace support in kernel and see if the problem is gone. The kernel setting to look for is "CONFIG_NET_NS").
Sorry for the slow reply. I've tried rebooting and never starting Chrome, and disabling CONFIG_NET_NS, and in both cases it continued to fail.
*** Bug 744532 has been marked as a duplicate of this bug. ***
(In reply to comment #12) > I've tried rebooting and never starting Chrome, and disabling CONFIG_NET_NS, > and in both cases it continued to fail. Doug, just to be sure, please look at the debug log and see if there's still the message "Uevent not generated! Calling udev_complete internally to avoid process lock-up."
(In reply to comment #14) > (In reply to comment #12) > > I've tried rebooting and never starting Chrome, and disabling CONFIG_NET_NS, > > and in both cases it continued to fail. > > Doug, just to be sure, please look at the debug log and see if there's still > the message "Uevent not generated! Calling udev_complete internally to avoid > process lock-up." ..also, please, send the .config of the self-compiled kernel you use. Thanks.
We also sometimes see the semaphore stuff when opening a LUKS encrypted disk...
Can you hit this problem with stock Fedora kernel or does this occur only with self-compiled ones?
I only run self-compiled kernels. (or at least 99.9x % of the uptime)
Two out of three times: # cryptsetup luksOpen /dev/sdf1 crypto2 Enter passphrase for /dev/sdf1: semid 1998857: semop failed for cookie 0xd4dcb72: incorrect semaphore state Failed to set a proper state for notification semaphore identified by cookie value 223202162 (0xd4dcb72) to initialize waiting for incoming notifications. # cryptsetup luksOpen /dev/sdf1 crypto2 Enter passphrase for /dev/sdf1: semid 2129929: semop failed for cookie 0xd4daa8d: incorrect semaphore state Failed to set a proper state for notification semaphore identified by cookie value 223193741 (0xd4daa8d) to initialize waiting for incoming notifications.
Would some debug logging of cryptsetup events like these help?
If you compile own kernel, please can you add this patch if it helps? Thanks. https://lkml.org/lkml/2011/11/9/965
I can try this with 3.0.8 this weekend. I.e.: boot into new kernel, do some tests and report about semaphores.
BTW 3.0.8 is not enough, patch above in queue for 3.0.9.
Applies cleany here to 3.0.8? But 3.0.8 is not enough? http://kernel.org/ appears to have pulled (?) 3.1.0. No 3.0.9 yet. Please advise.
It is Greg queue for release candidate for 3.0.9 and 3.1.1 (3.2 Linus' devel has it already). So for stable you need to patch it manually. Or wait cca week until stable 3.0.9 is out.
3.0.9 is out. Compiled and booted into it. Still saw a number of these: dracut: /dev/mapper/myvg-usrlv not set up by udev: Falling back to direct node creation.
I did luksOpen , mount, rsync, umount and luksClose of a backup disk: # umount /media # cryptsetup luksClose crypto2 No semaphore issues so far. Then, without stopping or removing the SATA disk I tried to luksOpen: # cryptsetup luksOpen /dev/sdf1 crypto2 Enter passphrase for /dev/sdf1: semid 655367: semop failed for cookie 0xd4da6e3: incorrect semaphore state Failed to set a proper state for notification semaphore identified by cookie value 223192803 (0xd4da6e3) to initialize waiting for incoming notifications. And immediately close: # cryptsetup luksClose crypto2 semid 720903: semop failed for cookie 0xd4dbefa: incorrect semaphore state Failed to set a proper state for notification semaphore identified by cookie value 223198970 (0xd4dbefa) to initialize waiting for incoming notifications.
Ever since upgrading to Fedora 16 my own built kernels using F16 don't boot anymore. dracut: /dev/mapper/myvg-rootlv: open failed: No such file or directory dracut: /dev/mapper/myvg-usrlv: open failed: No such file or directory dracut: /dev/mapper/myvg-varlv: open failed: No such file or directory dracut: /dev/mapper/myvg-datalv: open failed: No such file or directory dracut: /dev/mapper/myvg-srclv: open failed: No such file or directory dracut: /dev/mapper/myvg-home2lv: open failed: No such file or directory My older 3.0.3 kernel (built using F15) does boot.
and the official F16 kernel ends with a dracut error about finding the root fs on /dev/mapper/myvg-rootlv
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Are you still hitting the problem reported with recent version of lvm2 and recent Fedoras (F18+)?
I'm closing this bug report. If you still encounter the problems reported here, feel free to reopen. Thanks.