Description of problem:
[This is probably not really a vdsm bug, but I couldn't identify a better category]

I was trying to test something else, using ovirt installed on a collection of F17 machines. I discovered that with kernel versions newer than 3.3.4-5, wdmd fails to start, which causes sanlock to fail, which in turn leaves vdsm unable to act as a storage controller.

On a newer kernel, such as 3.6.6-1, attempting to start wdmd causes the following to appear in /var/log/messages:

Nov 19 10:46:36 f17z systemd-wdmd[9900]: Starting wdmd: [ OK ]
Nov 19 10:46:36 f17z wdmd[9921]: could not set RR|RESET_ON_FORK priority 99 err 1
Nov 19 10:46:36 f17z wdmd[9921]: /dev/watchdog failed to set timeout
Nov 19 10:46:36 f17z wdmd[9921]: /dev/watchdog disarmed

After that, wdmd is not running, which causes sanlock to fail. Booting the same node with kernel 3.3.4-5 allows wdmd to start, and everything else works. The failure appears to affect all of the 3.6 series kernels, but I cannot claim to have tested them exhaustively.

Version-Release number of selected component (if applicable):
[root@f17z ~]# rpm -qa kernel sanlock vdsm
vdsm-4.10.0-10.fc17.x86_64
kernel-3.3.4-5.fc17.x86_64
kernel-3.6.6-1.fc17.x86_64
sanlock-2.4-2.fc17.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install vdsm on a vanilla F17 machine
2. Configure a storage domain with that machine as controller (local or remote)
3.

Actual results:
wdmd/sanlock won't start on a 3.6+ kernel

Expected results:
wdmd/sanlock should start :-)

Additional info:
We have had several issues caused by selinux, so my first suggestion is to update the selinux-policy package to the latest version (3.10.0-160.fc17):

https://koji.fedoraproject.org/koji/buildinfo?buildID=366128

You can also try temporarily disabling selinux; if wdmd then starts, you are hitting something new in selinux.

The error:

/dev/watchdog failed to set timeout

might also be related to your watchdog driver (I found that some laptops have a watchdog that seems to reject the timeout configuration).

Please report the output of:

# ls -l /dev/watchdog*

and try to find out which watchdog driver is loaded.
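One way to look for the loaded watchdog driver is to filter the module list by name. This is only a rough sketch: the helper name `filter_watchdog_modules` and the name patterns it matches are assumptions made for illustration, and a driver with an unusual name will slip past it.

```shell
# Print loaded modules whose names look like watchdog drivers.
# Expects input in /proc/modules format (module name in the first column).
# The patterns (wdt, wdog, watchdog, softdog, tco) are a heuristic
# assumption, not an exhaustive list of watchdog driver names.
filter_watchdog_modules() {
    awk 'tolower($1) ~ /(wdt|wdog|watchdog|softdog|tco)/ { print $1 }'
}

# Only query the running kernel when /proc/modules is readable.
if [ -r /proc/modules ]; then
    filter_watchdog_modules < /proc/modules
fi
```

Running `modinfo` on any module name it prints should show the module's description, which helps confirm whether it really is a watchdog driver.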
I have two machines on which the problem manifests: one is a Dell OptiPlex 755, the other is an Intel Piketon SDP. Both work with kernel 3.3.4-5 and fail with kernel 3.6.6-1. Both machines have selinux set to permissive mode.

[root@f17z ~]# rpm -qa selinux-policy
selinux-policy-3.10.0-159.fc17.noarch

yum upgrade selinux-policy yields no updates; perhaps I need to subscribe to a newer channel? But remember that with no changes to selinux-policy, it works with the 3.3 kernel and fails with 3.6. Do you suspect that selinux itself is behaving differently w/r/t /dev/watchdog in the newer kernel?

On the Piketon box, running 3.6:

[root@f17z ~]# ls -l /dev/watchdog*
crw-------. 1 root root 10, 130 Nov 20 08:23 /dev/watchdog
crw-------. 1 root root 253, 0 Nov 20 08:23 /dev/watchdog0
crw-------. 1 root root 253, 1 Nov 20 08:23 /dev/watchdog1

On the same box, running 3.3:

[root@f17z ~]# ls -l /dev/watchdog*
crw-------. 1 root root 10, 130 Nov 20 08:25 /dev/watchdog

So something different is happening w/r/t the watchdog device initialization.

I poked around in the lsmod output, but nothing jumped out at me as a watchdog driver. What are good candidates for me to look for?
After further debugging we discovered that the culprit is the iTCO_wdt module:

iTCO_wdt 17948 0
iTCO_vendor_support 13419 1 iTCO_wdt

It exposes two watchdogs (one of which is unusable):

[root@f17z ~]# ls -l /dev/watchdog*
crw-------. 1 root root 10, 130 Nov 20 08:23 /dev/watchdog
crw-------. 1 root root 253, 0 Nov 20 08:23 /dev/watchdog0
crw-------. 1 root root 253, 1 Nov 20 08:23 /dev/watchdog1

wdmd is in fact able to use /dev/watchdog1.

One workaround (while we wait for iTCO_wdt to be fixed) is to blacklist iTCO_wdt/iTCO_vendor_support and use the softdog module.

I will also go ahead and add an option to sanlock to select the preferred watchdog device (so that it will eventually be possible to select /dev/watchdog1).
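The blacklist-plus-softdog workaround could be set up roughly as follows. This is a sketch under assumptions, not a tested recipe for these machines: the function name `write_softdog_workaround` and both file names are arbitrary choices, and it assumes the standard /etc/modprobe.d and /etc/modules-load.d mechanisms are in use. The optional argument is a root prefix, so the files can be staged in a scratch directory and inspected before installing them for real.

```shell
# Write the workaround configuration under an optional root prefix
# (empty prefix means the real /etc). File names are arbitrary.
write_softdog_workaround() {
    root="${1:-}"
    mkdir -p "$root/etc/modprobe.d" "$root/etc/modules-load.d" || return 1
    # Keep the iTCO watchdog driver and its helper from being auto-loaded.
    printf 'blacklist iTCO_wdt\nblacklist iTCO_vendor_support\n' \
        > "$root/etc/modprobe.d/blacklist-iTCO.conf" || return 1
    # Ask for the software watchdog to be loaded at boot instead.
    printf 'softdog\n' > "$root/etc/modules-load.d/softdog.conf"
}
```

After calling `write_softdog_workaround` as root with no argument, a reboot should bring the machine up with softdog in place of iTCO_wdt; whether wdmd then starts cleanly still needs to be checked in /var/log/messages.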
This has been fixed in sanlock-2.6-7.fc18:

* Sun Jan 13 2013 Federico Simoncelli <fsimonce> 2.6-6
- wdmd: dynamically select working watchdog device
sanlock-2.6-7.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/sanlock-2.6-7.fc18
Package sanlock-2.6-7.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing sanlock-2.6-7.fc18'
as soon as you are able to, then reboot.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-2857/sanlock-2.6-7.fc18
then log in and leave karma (feedback).
sanlock-2.6-7.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.