Bug 832056

Summary: wdmd initscript should verify that /dev/watchdog exists
Product: Red Hat Enterprise Linux 6
Reporter: Rami Vaknin <rvaknin>
Component: sanlock
Assignee: David Teigland <teigland>
Status: CLOSED DUPLICATE
QA Contact: Yaniv Kaul <ykaul>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 6.3
CC: ajia, cluster-maint, cpelland, dron, jkt, yeylon
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-07-26 21:10:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rami Vaknin 2012-06-14 12:00:13 UTC
Environment:
RHEL 6.3, sanlock-2.3-1.el6.x86_64

Scenario:
The sanlock and wdmd services die right after startup, although their initscripts report that they started OK.
The cause is a missing /dev/watchdog device, which should be provided by a hardware watchdog driver or by the softdog module.

Unless sanlock is explicitly started without wdmd, it should check for a /dev/watchdog device during startup.
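
A minimal sketch of the kind of pre-start check being requested, assuming it would be called from the initscript's start path. The function names watchdog_present and check_watchdog are hypothetical and are not taken from the sanlock package; the error text mirrors the wdmd log message quoted in this report.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the shipped wdmd/sanlock initscripts.

# Return 0 if the given path is a character device, non-zero otherwise.
# Defaults to /dev/watchdog when no argument is given.
watchdog_present() {
    dev="${1:-/dev/watchdog}"
    [ -c "$dev" ]
}

# Pre-start check: fail loudly instead of reporting a successful start.
check_watchdog() {
    dev="${1:-/dev/watchdog}"
    if ! watchdog_present "$dev"; then
        echo "no $dev, load a watchdog driver" >&2
        return 1
    fi
    return 0
}
```

With a check like this, start could return a failure status when the device is absent rather than printing OK.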


# /etc/init.d/sanlock restart
Sending stop signal sanlock:                               [FAILED]
Waiting for sanlock to stop:                               [  OK  ]
Starting sanlock:                                          [  OK  ]
# /etc/init.d/sanlock status
sanlock is stopped
#


# /etc/init.d/wdmd restart
Stopping wdmd:                                             [FAILED]
Starting wdmd:                                             [  OK  ]
# /etc/init.d/wdmd status
wdmd is stopped
#

From /var/log/messages:
Jun 14 14:43:34 localhost wdmd[15565]: wdmd started tests_built client
Jun 14 14:43:34 localhost wdmd[15565]: no /dev/watchdog, load a watchdog driver
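
The transcripts above show start printing OK while status reports the daemon as stopped. One possible way for an initscript to catch this, sketched under the assumption that the daemon's PID is known after launch, is to confirm the process is still alive after a short delay. The helper name still_running is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; not part of the shipped initscripts.
# Usage: still_running PID [DELAY_SECONDS]
# Waits DELAY_SECONDS (default 1), then returns 0 only if PID still exists.
still_running() {
    pid="$1"
    delay="${2:-1}"
    sleep "$delay"
    # kill -0 sends no signal; it only tests whether the process exists.
    kill -0 "$pid" 2>/dev/null
}
```

A start function could call this before printing OK, so that a daemon that exits immediately (as wdmd does here when /dev/watchdog is missing) is reported as FAILED.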

Comment 2 Dafna Ron 2012-06-25 13:24:15 UTC
Moving to urgent: after installing si7 with vdsm-4.9.6-17.0.el6.x86_64,
my VMs failed to run with:

Thread-1216::ERROR::2012-06-25 15:05:57,958::vm::604::vm.Vm::(_startUnderlyingVm) vmId=`3b80bb3c-8fad-4c48-aa5c-e5d26224bcfb`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 570, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1364, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2490, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error Failed to open socket to sanlock daemon: No such file or directory


/var/log/sanlock.log shows the following when we try to start sanlock:

wdmd connect failed for watchdog handling

Comment 3 David Teigland 2012-06-25 15:49:31 UTC
Can you include any wdmd messages from /var/log/messages?

Do things start properly if you 'modprobe softdog' yourself before wdmd and sanlock are started?

Comment 4 Dafna Ron 2012-06-25 15:52:21 UTC
Yes.
After I ran 'modprobe softdog' and restarted all services, I was able to run the VMs.
However, I would have to run it after every host reboot, or the VMs will fail to run.
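
As a stopgap for the reboot problem, the module load could be made persistent. The sketch below assumes the RHEL 6 convention that rc.sysinit runs executable /etc/sysconfig/modules/*.modules files at boot; the file name softdog.modules and the helper name install_softdog_modules_file are hypothetical, and this is a workaround, not the packaged fix.

```shell
#!/bin/sh
# Hypothetical helper: writes a boot-time module script for softdog.
# Assumption: executable /etc/sysconfig/modules/*.modules files are run
# at boot on RHEL 6. Pass a different path as $1 to try it out safely.
install_softdog_modules_file() {
    dest="${1:-/etc/sysconfig/modules/softdog.modules}"
    cat > "$dest" <<'EOF'
#!/bin/sh
# Load the software watchdog so /dev/watchdog exists before wdmd starts.
/sbin/modprobe softdog >/dev/null 2>&1
EOF
    chmod 755 "$dest"
}
```

Run as root with no argument to install it in the default location.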

Comment 5 David Teigland 2012-06-25 16:06:59 UTC
In my test, 'service wdmd start' loads the softdog module and starts wdmd without a problem.  Do you have the latest sanlock package installed?
 sanlock-2.3-1.1.gitfee5d9c.el6_3.x86_64.rpm 

I wonder if /dev/watchdog already exists prior to starting wdmd?  That would cause init.d/wdmd to not load the softdog module.  Could you run the following and verify that /dev/watchdog doesn't exist, and that the "Loading the softdog..." message appears?

[root@bull-01 ~]# ls -l /dev/watchdog
ls: cannot access /dev/watchdog: No such file or directory
[root@bull-01 ~]# service wdmd start
Loading the softdog kernel module:                         [  OK  ]
Starting wdmd:                                             [  OK  ]
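
The behaviour described in this comment (load softdog only when /dev/watchdog is absent) can be sketched roughly as follows. This is an illustration, not the actual init.d/wdmd code: the function name ensure_watchdog, the MODPROBE override variable, and the retry loop are all assumptions.

```shell
#!/bin/sh
# Sketch only; the real init.d/wdmd may differ.
MODPROBE="${MODPROBE:-/sbin/modprobe}"   # overridable, e.g. for testing

# Usage: ensure_watchdog [DEVICE] [WAIT_SECONDS]
ensure_watchdog() {
    dev="${1:-/dev/watchdog}"
    tries="${2:-5}"
    # Device already present (hardware driver or softdog loaded): nothing to do.
    [ -c "$dev" ] && return 0
    echo "Loading the softdog kernel module"
    "$MODPROBE" softdog || return 1
    # Give the device node a moment to appear after the module loads.
    while [ "$tries" -gt 0 ]; do
        [ -c "$dev" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

If the device exists beforehand, nothing is printed and modprobe is never called, which matches the "already exists" case raised above.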

Comment 8 Dafna Ron 2012-06-26 08:45:52 UTC
Starting wdmd does not load softdog:

[root@blond-vdsh ~]# ls -l /dev/watchdog
ls: cannot access /dev/watchdog: No such file or directory
[root@blond-vdsh ~]# service wdmd start
Starting wdmd:                                             [  OK  ]
[root@blond-vdsh ~]# service wdmd status
wdmd is stopped
[root@blond-vdsh ~]# ls -l /dev/watchdog
ls: cannot access /dev/watchdog: No such file or directory
[root@blond-vdsh ~]#

Comment 9 David Teigland 2012-06-26 15:31:59 UTC
I see in the original description you're using sanlock-2.3-1.el6.x86_64.
You need to use the latest build that Federico set up for you to test with before he left:  sanlock-2.3-1.1.gitfee5d9c.el6_3.x86_64

Comment 10 Rami Vaknin 2012-06-26 15:35:51 UTC
The original description was mine; the reproduction was made by Dafna Ron.

Dafna, could you please add the sanlock version you tested with?

Comment 11 Dafna Ron 2012-06-26 15:42:20 UTC
[root@blond-vdsh ~]# rpm -qa |grep sanlock
sanlock-python-2.3-1.el6.x86_64
libvirt-lock-sanlock-0.9.10-21.el6.x86_64
sanlock-lib-2.3-1.el6.x86_64
sanlock-2.3-1.el6.x86_64

Comment 12 David Teigland 2012-07-26 21:10:28 UTC

*** This bug has been marked as a duplicate of bug 832935 ***