Bug 227823 - clurgmgrd[6673]: <crit> Watchdog: Daemon died, rebooting...
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: clumanager
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Reported: 2007-02-08 08:39 EST by Tomasz Jaszowski
Modified: 2009-04-16 16:22 EDT (History)
Fixed In Version: RHBA-2007-0133
Doc Type: Bug Fix
Last Closed: 2007-05-16 11:47:26 EDT
Description Tomasz Jaszowski 2007-02-08 08:39:46 EST
Description of problem:
Sometimes, while stopping cluster services, I receive:
clurgmgrd[6673]: <crit> Watchdog: Daemon died, rebooting...

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Actual results:

Expected results:
I would like to have more info on why it happened. I can't find any info about such

Additional info:
Comment 1 Lon Hohberger 2007-02-08 10:23:44 EST
That happens if rgmanager crashes.  There are a few crash-fixes coming in the
next update.

It could theoretically also happen if rgmanager isn't down and cman tells it to
die (e.g. by running cman_tool leave force ...).
Comment 2 Lon Hohberger 2007-02-08 10:29:41 EST
Try starting rgmanager with:

ulimit -c unlimited
clurgmgrd -d

That will disable the watchdog.  Additionally, if rgmanager crashes on the way
down, it will produce a core file.  I need the core file and what version of
rgmanager you're using as well as processor architecture in order to debug this.

(The core file is most important)
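
The items requested above (rgmanager version, processor architecture, and a core file) could be gathered with something like the following. This is only a sketch: the helper name collect_debug_info is made up for illustration, and it assumes an RPM-based system with rgmanager installed from a package.

```sh
#!/bin/sh
# Hypothetical helper (not part of rgmanager): print the details
# asked for in the comment above.
collect_debug_info() {
    echo "arch: $(uname -m)"
    if command -v rpm >/dev/null 2>&1; then
        # Prints the installed package version, or rpm's "not installed" message.
        echo "rgmanager: $(rpm -q rgmanager 2>&1)"
    else
        echo "rgmanager: rpm not available"
    fi
    echo "core limit: $(ulimit -c)"
}

# Allow core files for anything started from this shell.
ulimit -c unlimited 2>/dev/null || echo "could not raise core limit"
collect_debug_info

# Then start the daemon in debug mode, as in the comment above:
# clurgmgrd -d
```

The clurgmgrd line is left commented out so the snippet can be run safely on a machine that is not a cluster node.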
Comment 3 Tomasz Jaszowski 2007-03-01 01:00:52 EST
Unfortunately, so far I haven't been able to reproduce this problem in a
controlled environment, but I'm still trying.

Could you give me more info about the clurgmgrd parameters? What exactly does
the -d option mean? Are there any other options available?

Comment 4 Lon Hohberger 2007-03-05 10:54:53 EST
-d turns on debugging and disables the internal self-monitoring "watchdog" daemon.

There aren't any other helpful options in this case.
Comment 5 Michael Hagmann 2007-03-15 08:47:21 EDT

This is really a serious bug. We now have at least 5 production clusters that
have hit it. The problem is that when we enable debug mode, it doesn't happen.

Also, the problem occurs more often during or after disabling a service with
"clusvcadm -d ...".

That has happened at least three times.

Comment 6 Lon Hohberger 2007-03-15 10:47:48 EDT
Ok, I at *least* need the version of rgmanager you guys are using.
Comment 7 Michael Hagmann 2007-03-15 12:09:35 EDT

No problem, I can send you any information you like.

This is a normal RHEL4 AS u4 / Cluster /GFS installation.

root@lilr622b:~# rpm -qa | grep rgmanager


Comment 8 Tomasz Jaszowski 2007-03-15 12:22:44 EDT
Hi, we have exactly the same system version, and we still aren't able to
reproduce the problem in a controlled environment (it only dies when no one is
watching :) )

we are testing rgmanager-1.9.54-3.228823test now, if problem occurs I'll pass
some info

Comment 9 Tomasz Jaszowski 2007-03-15 12:27:04 EDT
(In reply to comment #8)
> we are testing rgmanager-1.9.54-3.228823test now, if problem occurs I'll pass
> some info

PS. on production we have rgmanager-1.9.54-1, and rgmanager-1.9.54-3.228823test
on identically configured test environment
Comment 10 Michael Hagmann 2007-03-15 12:39:50 EDT
Hi, I'll try extensive testing with clusvcadm -d ??? and clusvcadm -e ???; maybe
it works and I get a crash.

for i in $(seq 1 1000); do
    clusvcadm -d "$SERVICENAME"   # disable the service
    sleep 10
    clusvcadm -e "$SERVICENAME"   # enable it again
    sleep 10
done

regards mike
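
A variant of the loop above that stops as soon as the daemon disappears might look like this. It is a sketch under assumptions: the function names daemon_alive and stress_service and the CLUSVCADM override variable are invented for illustration, and the liveness check simply looks for a process named clurgmgrd with pidof.

```sh
#!/bin/sh
# Hypothetical check: succeeds while a clurgmgrd process exists.
daemon_alive() {
    pidof clurgmgrd >/dev/null 2>&1
}

# stress_service NAME COUNT DELAY
# Disable and re-enable service NAME up to COUNT times, sleeping DELAY
# seconds after each step, and stop early if the daemon has gone away.
stress_service() {
    name=$1
    count=$2
    delay=$3
    i=1
    while [ "$i" -le "$count" ]; do
        # CLUSVCADM is an assumed override hook; defaults to the real tool.
        "${CLUSVCADM:-clusvcadm}" -d "$name"
        sleep "$delay"
        "${CLUSVCADM:-clusvcadm}" -e "$name"
        sleep "$delay"
        if ! daemon_alive; then
            echo "clurgmgrd gone after iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count iterations"
}
```

It could be invoked as: stress_service "$SERVICENAME" 1000 10. Stopping at the first failure keeps the iteration count and logs close to the moment of the crash.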
Comment 11 Lon Hohberger 2007-03-16 13:29:40 EDT
The watchdog fires when the daemon crashes, ostensibly due to a segmentation
fault.  The 3.228823test package fixes two bugs that, if left unfixed, could
cause this behavior.
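
As a rough illustration of the idea only (this is not rgmanager's actual implementation, and the function name run_with_watchdog is made up), a parent process can watch a child and report when it dies from a signal such as SIGSEGV. The sketch relies on the shell convention that a child killed by signal N is reported with exit status 128 + N, and it only prints a message where the real watchdog reboots the node.

```sh
#!/bin/sh
# Illustrative only: run a command as a child and report how it ended.
run_with_watchdog() {
    "$@" &
    child=$!
    wait "$child"
    status=$?
    # By shell convention, a status above 128 means death by signal
    # (status - 128). A plain exit code above 128 would look the same.
    if [ "$status" -gt 128 ]; then
        echo "Watchdog: Daemon died (signal $((status - 128)))"
        return 1
    fi
    echo "Daemon exited with status $status"
    return 0
}
```

For example, run_with_watchdog sh -c 'kill -s SEGV "$$"' reports a death by signal, while run_with_watchdog true reports a normal exit.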

Tomasz - in response to C#9 - the configuration between .54-0 and .54-3.228823
packages should be identical; there are no backwards-compatibility issues there.

Michael - in response to C#10 - that will eventually cause a crash due to a
race on .54, but it is fixed in .54-3.228823 and the update 5 beta packages.

Could I get everyone who is on this bugzilla who is not already using
.54-3.228823 to use it?  I have a very strong suspicion that the crash causing
this symptom is fixed already.  All of the fixes in .54-3.228823 are included in
update 5.

If you need a different architecture than what is on my people page, let me know.

Comment 12 Michael Hagmann 2007-03-28 15:03:26 EDT
Hi Lon

Is it possible to get a hotfix package from Red Hat Support for .54-3.228823?

thx mike
Comment 14 Lon Hohberger 2007-05-16 11:47:26 EDT
Sorry for the late response; this is fixed in 4.5
Comment 15 Michael Hagmann 2007-05-16 12:35:22 EDT
Hi Lon

no problem, we also received a Hotfix package from Support.

thx mike
