Description of problem:
Sometimes, while stopping cluster services, I receive this message:
clurgmgrd: <crit> Watchdog: Daemon died, rebooting...
Version-Release number of selected component (if applicable):
Steps to Reproduce:
i would like to have more info why it happened. I can't find any info about such
That happens if rgmanager crashes. There are a few crash-fixes coming in the
It could theoretically also happen if rgmanager isn't down and cman tells it to
die (e.g. running cman_tool leave force ... could have this effect).
Try starting rgmanager with:
ulimit -c unlimited
clurgmgrd -d
That will disable the watchdog. Additionally, if rgmanager crashes on the way
down, it will produce a core file. I need the core file and what version of
rgmanager you're using as well as processor architecture in order to debug this.
(The core file is most important)
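For anyone gathering the details requested above, here is a minimal sketch of a helper that prints the rgmanager version, the processor architecture, and the current core-file size limit. The function name collect_debug_info is made up for illustration; it assumes an RPM-based system and only reads information, it does not start or stop anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of rgmanager): prints the details the
# maintainer asked for so they can be pasted into the bug report.
collect_debug_info() {
    # Processor architecture, e.g. i686 or x86_64
    echo "arch: $(uname -m)"

    # Installed rgmanager package; assumes rpm is the package manager
    if command -v rpm >/dev/null 2>&1; then
        ver=$(rpm -q rgmanager 2>/dev/null) || ver="not installed"
        echo "rgmanager: $ver"
    else
        echo "rgmanager: rpm not available"
    fi

    # Core-file size limit of the current shell; it should read
    # "unlimited" after running "ulimit -c unlimited"
    echo "core limit: $(ulimit -c)"
}
```

Calling collect_debug_info in the same shell that will start clurgmgrd shows whether the core limit is really in effect for that session.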
Unfortunately, so far I haven't been able to reproduce this problem in a
controlled environment, but I'm still trying.
Could you give me more information about the clurgmgrd parameters? What exactly
does the -d option do? Are there any other options available?
-d turns on debugging and disables the internal self-monitoring "watchdog" daemon.
There aren't any other helpful options in this case.
This is really a serious bug. We now have at least 5 production clusters that
have hit it. The problem is that when we enable debug mode, it doesn't happen.
The problem also occurs more often during or after disabling a service with
"clusvcadm -d ...".
That has happened at least three times.
Ok, I at *least* need the version of rgmanager you guys are using.
No problem, I can send you any information you like.
This is a standard RHEL4 AS U4 / Cluster / GFS installation.
root@lilr622b:~# rpm -qa | grep rgmanager
Hi, we have exactly the same system version, and we still aren't able to
reproduce the problem in a controlled environment (it only dies when no one is
watching :) ).
We are testing rgmanager-1.9.54-3.228823test now; if the problem occurs I'll pass
some info.
(In reply to comment #8)
> we are testing rgmanager-1.9.54-3.228823test now, if problem occurs I'll pass
> some info
P.S. In production we have rgmanager-1.9.54-1, and rgmanager-1.9.54-3.228823test
on an identically configured test environment.
Hi, I'll try extensive testing with clusvcadm -d ??? and clusvcadm -e ???; maybe
it works and I get a crash.
for i in $(seq 1 1000); do
    clusvcadm -d "$SERVICENAME"
    clusvcadm -e "$SERVICENAME"
done
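A variant of the loop above that stops at the first failing call might make it easier to tell which iteration triggered the problem. This is only a sketch: the function name toggle_service and the CLUSVCADM override variable are made up for illustration, and it assumes clusvcadm exits non-zero when a disable or enable request fails.

```shell
#!/bin/sh
# Hypothetical wrapper around the disable/enable loop.
# CLUSVCADM can be pointed at a stub for a dry run; it defaults to the
# real clusvcadm binary.
CLUSVCADM=${CLUSVCADM:-clusvcadm}

toggle_service() {
    service=$1   # service name as passed to clusvcadm
    count=$2     # number of disable/enable cycles to run
    i=1
    while [ "$i" -le "$count" ]; do
        # Assumption: a non-zero exit status means the request failed
        "$CLUSVCADM" -d "$service" || { echo "disable failed at iteration $i"; return 1; }
        "$CLUSVCADM" -e "$service" || { echo "enable failed at iteration $i"; return 1; }
        i=$((i + 1))
    done
    echo "completed $count iterations"
}
```

It would be called as toggle_service "$SERVICENAME" 1000. Stopping at the first failure keeps the iteration number close to the moment the daemon went away.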
The watchdog fires when the daemon crashes - ostensibly due to a segmentation
fault. The 3.228823test package has two fixes that, if left open, could cause
Tomasz - in response to C#9 - the configuration between .54-0 and .54-3.228823
packages should be identical; there are no backwards-compatibility issues there
Michael - in response to C#10 - that will eventually cause a crash due to a
race on .54, but is fixed in .54-3.228823 and the update 5 beta packages.
Could I get everyone who is on this bugzilla who is not already using
.54-3.228823 to use it? I have a very strong suspicion that the crash causing
this symptom is fixed already. All of the fixes in .54-3.228823 are included in
If you need a different architecture than what is on my people page, let me know.
Is it possible to get a hotfix package from Red Hat Support for .54-3.228823?
Sorry for the late response; this is fixed in 4.5
No problem, we also received a hotfix package from Support.