Bug 878119 - wdmd/sanlock incompatibility with modern F17 kernel(s)
Summary: wdmd/sanlock incompatibility with modern F17 kernel(s)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: sanlock
Version: 18
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Federico Simoncelli
QA Contact: Haim
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2012-11-19 17:09 UTC by jrd
Modified: 2014-01-13 00:55 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-03-15 00:09:53 UTC
Type: Bug
Embargoed:



Description jrd 2012-11-19 17:09:07 UTC
Description of problem:

[This is probably not really a vdsm bug, but I couldn't identify a better category]

I was trying to test something else, using ovirt installed on a collection of F17 machines.  I discovered that on kernel versions newer than 3.3.4-5 wdmd fails to start, which causes sanlock to fail, which in turn leaves vdsm unable to act as a storage controller.

On a newer kernel, such as 3.6.6-1, attempting to start wdmd causes the following to appear in /var/log/messages:

Nov 19 10:46:36 f17z systemd-wdmd[9900]: Starting wdmd: [  OK  ]
Nov 19 10:46:36 f17z wdmd[9921]: could not set RR|RESET_ON_FORK priority 99 err 1
Nov 19 10:46:36 f17z wdmd[9921]: /dev/watchdog failed to set timeout
Nov 19 10:46:36 f17z wdmd[9921]: /dev/watchdog disarmed

After that wdmd is not running, causing sanlock to fail.  

Running the same node with kernel 3.3.4-5 allows wdmd to start, and everything else works.

The same appears to be true with all the 3.6 series kernels, but I cannot claim to have tested them exhaustively.

Version-Release number of selected component (if applicable):

[root@f17z ~]# rpm -qa kernel sanlock vdsm
vdsm-4.10.0-10.fc17.x86_64
kernel-3.3.4-5.fc17.x86_64
kernel-3.6.6-1.fc17.x86_64
sanlock-2.4-2.fc17.x86_64


How reproducible:

Always

Steps to Reproduce:
1.  Install vdsm on a vanilla F17 machine
2.  Configure a storage domain with that machine as controller (local or remote)
3.  
  
Actual results:

wdmd/sanlock won't start in 3.6+ kernel

Expected results:

wdmd/sanlock should start :-)

Additional info:

Comment 1 Federico Simoncelli 2012-11-20 12:30:47 UTC
We have had several issues caused by selinux, so my first suggestion is to update the selinux-policy package to the latest version (3.10.0-160.fc17):

https://koji.fedoraproject.org/koji/buildinfo?buildID=366128

You can also try to temporarily disable selinux and if it works it means that you're hitting something new (in selinux).

The error:

/dev/watchdog failed to set timeout

Might also be related to your watchdog driver (I found that some laptops have a watchdog that seems to reject the timeout configuration).

Please report:

# ls -l /dev/watchdog*

and try to find out which watchdog driver is loaded.
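
One rough way to look for a loaded watchdog driver is to filter lsmod output by module name. The sketch below is only a heuristic: the name pattern (wdt, watchdog, softdog) and the helper name watchdog_modules are assumptions made for illustration, and a driver whose module name does not match would be missed.

```python
import re

# Heuristic (an assumption, not an exhaustive list): many Linux watchdog
# drivers have "wdt", "watchdog" or "softdog" in their module name.
WATCHDOG_NAME = re.compile(r"wdt|watchdog|softdog", re.IGNORECASE)


def watchdog_modules(lsmod_text):
    """Return module names from lsmod-style text that look like watchdog drivers."""
    found = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if not fields or fields[0] == "Module":
            continue
        if WATCHDOG_NAME.search(fields[0]):
            found.append(fields[0])
    return found


# Hypothetical sample in lsmod format, used only to exercise the function.
sample = "Module Size Used by\nsoftdog 13319 0\next4 473259 2\n"
print(watchdog_modules(sample))
```

On a real machine the input would be the text printed by lsmod (or the contents of /proc/modules, which uses the same leading module-name column).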

Comment 2 jrd 2012-11-20 13:30:55 UTC
I have two machines on which the problem manifests; one is a dell optiplex 755, the other is an Intel Piketon SDP.  Both work with kernel 3.3.4-5 and fail with kernel 3.6.6-1.  Both machines have selinux set to permissive mode.

[root@f17z ~]# rpm -qa selinux-policy
selinux-policy-3.10.0-159.fc17.noarch

yum upgrade selinux-policy yields no updates; perhaps I need to subscribe to a newer channel?

But remember that with no changes to selinux-policy, it works in 3.3 kernel and fails in 3.6.  Do you suspect that selinux itself is behaving differently w/r/t /dev/watchdog in the newer kernel?

On the piketon box, running 3.6

[root@f17z ~]# ls -l /dev/watchdog*
crw-------. 1 root root  10, 130 Nov 20 08:23 /dev/watchdog
crw-------. 1 root root 253,   0 Nov 20 08:23 /dev/watchdog0
crw-------. 1 root root 253,   1 Nov 20 08:23 /dev/watchdog1

On the same box, running 3.3

[root@f17z ~]#  ls -l /dev/watchdog*
crw-------. 1 root root 10, 130 Nov 20 08:25 /dev/watchdog

So something different is happening w/r/t the watchdog device initialization

I poked around in lsmod output, but nothing jumped out at me about watchdog driver.  What are good candidates for me to look for?

Comment 3 Federico Simoncelli 2012-11-20 14:32:21 UTC
After further debugging we discovered that the culprit is the iTCO_wdt module:

 iTCO_wdt               17948  0 
 iTCO_vendor_support    13419  1 iTCO_wdt

It exposes two watchdog devices (one of which is unusable):

[root@f17z ~]# ls -l /dev/watchdog*
crw-------. 1 root root  10, 130 Nov 20 08:23 /dev/watchdog
crw-------. 1 root root 253,   0 Nov 20 08:23 /dev/watchdog0
crw-------. 1 root root 253,   1 Nov 20 08:23 /dev/watchdog1

wdmd is in fact able to use /dev/watchdog1

One workaround (while we wait for iTCO_wdt to be fixed) is to blacklist iTCO_wdt/iTCO_vendor_support and use the softdog module.

I will also go ahead and add an option to sanlock to select the preferred watchdog device (so that it will eventually be possible to select /dev/watchdog1).
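
For reference, the blacklist workaround above could look roughly like the following. This is a sketch under assumptions: the two file paths are made-up names (any *.conf name in those directories should do), the helper workaround_files is hypothetical, and loading softdog through systemd's modules-load.d is just one possible way to get the module loaded at boot. The script only prints the suggested file contents; it does not write anything.

```python
def workaround_files(blacklisted=("iTCO_wdt", "iTCO_vendor_support"),
                     replacement="softdog"):
    """Map illustrative config file paths to the contents they would hold."""
    # modprobe.d syntax: one "blacklist <module>" line per module.
    blacklist = "".join(f"blacklist {name}\n" for name in blacklisted)
    return {
        # Both file names are assumptions chosen for this example.
        "/etc/modprobe.d/blacklist-itco.conf": blacklist,
        # modules-load.d files list one module name per line.
        "/etc/modules-load.d/softdog.conf": replacement + "\n",
    }


for path, text in workaround_files().items():
    print(f"# {path}")
    print(text)
```

After creating files like these (as root), a reboot would be needed for the blacklist to take effect.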

Comment 4 Federico Simoncelli 2013-02-21 10:40:19 UTC
This has been fixed in sanlock-2.6-7.fc18:

* Sun Jan 13 2013 Federico Simoncelli <fsimonce> 2.6-6
- wdmd: dynamically select working watchdog device
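
To illustrate the idea of picking a working device, here is a minimal sketch. It is not the actual wdmd code (wdmd is written in C and its logic may differ); the function names accepts_timeout and select_watchdog and the 10 second timeout are assumptions for illustration. The probe tries WDIOC_SETTIMEOUT on a device and the selector returns the first candidate that accepts it. Caution: opening a watchdog device arms it, so the probe should only be run by root on a machine where an unexpected reset is acceptable.

```python
import fcntl
import os
import struct

# WDIOC_SETTIMEOUT from <linux/watchdog.h>, i.e. _IOWR('W', 6, int).
# This numeric value assumes the common ioctl encoding (e.g. x86).
WDIOC_SETTIMEOUT = 0xC0045706


def accepts_timeout(path, seconds=10):
    """Open a watchdog device, try to set its timeout, then try to disarm it."""
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError:
        return False
    try:
        fcntl.ioctl(fd, WDIOC_SETTIMEOUT, struct.pack("i", seconds))
        ok = True
    except OSError:
        ok = False
    try:
        # "Magic close": writing "V" before close asks the driver to disarm.
        # Drivers without magic close support may stay armed.
        os.write(fd, b"V")
    except OSError:
        pass
    os.close(fd)
    return ok


def select_watchdog(candidates, probe=accepts_timeout):
    """Return the first candidate path the probe accepts, or None."""
    for path in candidates:
        if probe(path):
            return path
    return None
```

With the device nodes listed in comment 3, a call such as select_watchdog(["/dev/watchdog", "/dev/watchdog0", "/dev/watchdog1"]) would return the first node that accepts the timeout. The probe argument is there so the selection logic can be exercised with a fake probe, without touching real hardware.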

Comment 5 Fedora Update System 2013-02-21 10:45:09 UTC
sanlock-2.6-7.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/sanlock-2.6-7.fc18

Comment 6 Fedora Update System 2013-02-23 00:56:37 UTC
Package sanlock-2.6-7.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sanlock-2.6-7.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-2857/sanlock-2.6-7.fc18
then log in and leave karma (feedback).

Comment 8 Fedora Update System 2013-03-15 00:09:56 UTC
sanlock-2.6-7.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

