Bug 227823 - clurgmgrd[6673]: <crit> Watchdog: Daemon died, rebooting...
Summary: clurgmgrd[6673]: <crit> Watchdog: Daemon died, rebooting...
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: clumanager
Version: 4
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
Depends On:
Reported: 2007-02-08 13:39 UTC by Tomasz Jaszowski
Modified: 2009-04-16 20:22 UTC
CC List: 4 users

Fixed In Version: RHBA-2007-0133
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2007-05-16 15:47:26 UTC

Description Tomasz Jaszowski 2007-02-08 13:39:46 UTC
Description of problem:
Sometimes, while stopping cluster services, I receive: clurgmgrd[6673]: <crit>
Watchdog: Daemon died, rebooting...

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Actual results:

Expected results:
I would like more information about why it happened. I can't find any
information about this message.
Additional info:

Comment 1 Lon Hohberger 2007-02-08 15:23:44 UTC
That happens if rgmanager crashes.  There are a few crash fixes coming in the
next update.

It could theoretically also happen if cman tells rgmanager to die while
rgmanager is still running; for example, running cman_tool leave force ... could
have this effect.

Comment 2 Lon Hohberger 2007-02-08 15:29:41 UTC
Try starting rgmanager with:

ulimit -c unlimited
clurgmgrd -d

That will disable the watchdog.  Additionally, if rgmanager crashes on the way
down, it will produce a core file.  I need the core file and what version of
rgmanager you're using as well as processor architecture in order to debug this.

(The core file is most important)
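Comment 2 asks for three things: the core file, the rgmanager version, and the processor architecture. A minimal sketch that gathers them is shown here, assuming rpm is available and that you know which directory to search for core files. The function name collect_rgmanager_info and its directory argument are hypothetical. Where cores are actually written depends on the daemon's working directory and on /proc/sys/kernel/core_pattern, so the sketch prints that pattern too.

```shell
#!/bin/sh
# Hypothetical helper: collect the details requested in Comment 2.
# Assumption: core files land in the directory passed as $1 (default /);
# the real location depends on clurgmgrd's working directory and core_pattern.
collect_rgmanager_info() {
    local core_dir ver pattern found f
    core_dir="${1:-/}"

    # Installed rgmanager package; "unknown" if rpm or the package is missing.
    ver=$(rpm -q rgmanager 2>/dev/null) || ver=unknown
    echo "rgmanager: $ver"

    # Processor architecture.
    echo "arch: $(uname -m)"

    # Kernel core file naming pattern (Linux only).
    pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null) || pattern=unknown
    echo "core_pattern: $pattern"

    # List core files directly inside core_dir (non-recursive).
    found=0
    for f in "$core_dir"/core*; do
        [ -f "$f" ] || continue   # skip the unexpanded glob if nothing matched
        echo "core: $f"
        found=1
    done
    [ "$found" -eq 1 ] || echo "core: none found in $core_dir"
}
```

Run it as root after a crash, e.g. collect_rgmanager_info /, and attach the output together with the core file itself.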

Comment 3 Tomasz Jaszowski 2007-03-01 06:00:52 UTC
Unfortunately, so far I haven't been able to reproduce this problem in a
controlled environment, but I'm still trying.

Could you give me more information about the clurgmgrd parameters? What exactly
does the -d option mean? Are there any other options available?

Comment 4 Lon Hohberger 2007-03-05 15:54:53 UTC
-d turns on debugging and disables the internal self-monitoring "watchdog" daemon.

There aren't any other helpful options in this case.
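Comment 4 only says that -d disables the internal self-monitoring watchdog. The general idea of such a watchdog (a parent process that waits on the daemon and reacts when the daemon is killed by a signal) can be sketched in shell as follows. This is an illustration under assumptions, not rgmanager's implementation: the function name watchdog is hypothetical, and the reboot is replaced by a message.

```shell
#!/bin/sh
# Illustration only, NOT rgmanager's code: a parent "watchdog" runs the
# daemon as a child, waits for it, and reacts if the child was killed by a
# signal (e.g. SIGSEGV). The real watchdog reboots; this one just reports.
watchdog() {
    local pid status
    "$@" &            # start the monitored command in the background
    pid=$!
    wait "$pid"       # block until the child exits
    status=$?
    # A status of 128+N means the child was terminated by signal N.
    # Caveat: a child that calls exit() with 129 or higher looks the same.
    if [ "$status" -gt 128 ]; then
        echo "Watchdog: Daemon died (signal $((status - 128))), would reboot here"
        return 1
    fi
    echo "Daemon exited with status $status (no signal)"
    return 0
}
```

For example, watchdog sh -c 'kill -TERM $$' reports signal 15, while watchdog true reports a plain exit. Because the supervising parent is what turns a crash into a reboot, running without it lets a crash leave a core file behind instead.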

Comment 5 Michael Hagmann 2007-03-15 12:47:21 UTC

This is a really serious bug. We now have at least 5 production clusters that
have hit it. The problem is that when we enable debug mode, it doesn't happen.

Also, the problem occurs more often during or after disabling a service with
"clusvcadm -d ...".

That has happened at least three times.


Comment 6 Lon Hohberger 2007-03-15 14:47:48 UTC
Ok, I at *least* need the version of rgmanager you guys are using.

Comment 7 Michael Hagmann 2007-03-15 16:09:35 UTC

No problem, I can send you any information you like.

This is a standard RHEL4 AS U4 / Cluster / GFS installation.

root@lilr622b:~# rpm -qa | grep rgmanager


Comment 8 Tomasz Jaszowski 2007-03-15 16:22:44 UTC
Hi, we have exactly the same system version, and we still aren't able to
reproduce the problem in a controlled environment (it only dies when no one is
watching :) )

We are testing rgmanager-1.9.54-3.228823test now; if the problem occurs, I'll
pass on some info.


Comment 9 Tomasz Jaszowski 2007-03-15 16:27:04 UTC
(In reply to comment #8)
> we are testing rgmanager-1.9.54-3.228823test now, if problem occurs I'll pass
> some info

PS: In production we have rgmanager-1.9.54-1, and rgmanager-1.9.54-3.228823test
on an identically configured test environment.

Comment 10 Michael Hagmann 2007-03-15 16:39:50 UTC
Hi, I'm trying extensive testing with clusvcadm -d ... and clusvcadm -e ... in
the hope of triggering a crash.

for i in $(seq 1 1000); do
    clusvcadm -d "$SERVICENAME"
    sleep 10
    clusvcadm -e "$SERVICENAME"
    sleep 10
done

regards mike
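A hypothetical variant of the Comment 10 loop can stop at the first iteration after which clurgmgrd is no longer running, which records when the crash happened. It assumes the daemon was started with -d (otherwise the watchdog would reboot the node before the loop could report anything) and that pgrep is installed; the function name stress_service and its iteration and pause parameters are illustrative only.

```shell
#!/bin/sh
# Hypothetical variant of the Comment 10 loop: after each disable/enable
# cycle, check that clurgmgrd is still running and stop at the first failure.
# Arguments: service name, iteration count (default 1000), pause in seconds (default 10).
stress_service() {
    local service iterations pause i
    service="$1"
    iterations="${2:-1000}"
    pause="${3:-10}"
    i=1
    while [ "$i" -le "$iterations" ]; do
        clusvcadm -d "$service"
        sleep "$pause"
        clusvcadm -e "$service"
        sleep "$pause"
        # pgrep -x matches the exact process name.
        if ! pgrep -x clurgmgrd >/dev/null; then
            echo "clurgmgrd not running after iteration $i ($(date))"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $iterations iterations without a crash"
}
```

Usage: stress_service "$SERVICENAME" 1000 10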

Comment 11 Lon Hohberger 2007-03-16 17:29:40 UTC
The watchdog fires when the daemon crashes, ostensibly due to a segmentation
fault.  The 3.228823test package contains two fixes for bugs that, if left
unfixed, could cause this behavior.

Tomasz, in response to C#9: the configuration between the .54-0 and .54-3.228823
packages should be identical; there are no backward-compatibility issues there.

Michael, in response to C#10: that will eventually cause a crash due to a race
condition in .54, but it is fixed in .54-3.228823 and the Update 5 beta packages.

Could everyone on this bug who is not already using .54-3.228823 please switch
to it?  I strongly suspect that the crash causing this symptom is already fixed.
All of the fixes in .54-3.228823 are included in Update 5.

If you need a different architecture than what is on my people page, let me know.


Comment 12 Michael Hagmann 2007-03-28 19:03:26 UTC
Hi Lon

Is it possible to get a hotfix package for .54-3.228823 from Red Hat Support?

thx mike

Comment 14 Lon Hohberger 2007-05-16 15:47:26 UTC
Sorry for the late response; this is fixed in 4.5.

Comment 15 Michael Hagmann 2007-05-16 16:35:22 UTC
Hi Lon

No problem; we also received a hotfix package from Support.

thx mike
