Bug 202637 - better error reporting from rgmanager is needed to debug restarts/failovers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 199678 202569
Depends On:
Blocks:
Reported: 2006-08-15 17:01 UTC by Lon Hohberger
Modified: 2009-04-16 20:20 UTC (History)

Fixed In Version: RHBA-2007-0149
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-10 21:17:19 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2007:0149 0 normal SHIPPED_LIVE rgmanager bug fix update 2007-05-10 21:16:41 UTC

Description Lon Hohberger 2006-08-15 17:01:45 UTC
Description of problem:

In some situations, the reason a status check fails is not reported correctly to
the system logs prior to causing a failover or a service restart.  The reason a
service is restarted or migrated without intervention is required for proper
debugging of service problems.

There are a couple of things that should happen:

(a) Each resource agent should log an error whenever it is about to return a
failure code.

(b) Rgmanager should log an error whenever a resource agent status check fails.

(c) Rgmanager should pass in "<type>:'<attr>'" as OCF_INSTANCE_NAME to resource
agents, and resource agents should log OCF_INSTANCE_NAME whenever an error occurs.

(d) Error messages and logs should be more uniform in the way they are reported
throughout the resource-agents.
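
Points (a), (c) and (d) above could look roughly like the following in a
shell resource agent. This is only a sketch under assumptions: log_error and
check_status are hypothetical names, the directory test stands in for a real
status check, and the message goes to stderr here so the example is
self-contained, whereas a real agent would send it to the system log. It
assumes rgmanager has exported OCF_INSTANCE_NAME in the "<type>:'<attr>'"
form proposed in (c).

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical helper: one uniform error format for every agent, always
# prefixed with the instance name so the failing resource is identifiable.
log_error() {
    echo "${OCF_INSTANCE_NAME:-unknown}: $*" >&2
}

# Hypothetical status check: fails if the given path is not a directory.
# The error is logged *before* the failure code is returned.
check_status() {
    if [ ! -d "$1" ]; then
        log_error "status check failed: $1 is not a directory"
        return "$OCF_ERR_GENERIC"
    fi
    return "$OCF_SUCCESS"
}
```

With OCF_INSTANCE_NAME set to fs:'data', a failed check on /mnt/data would
emit "fs:'data': status check failed: /mnt/data is not a directory" and
return 1, so the log line names both the resource and the reason.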

Comment 1 Lon Hohberger 2006-08-15 17:23:43 UTC
*** Bug 202569 has been marked as a duplicate of this bug. ***

Comment 2 Lon Hohberger 2006-08-15 17:29:58 UTC
*** Bug 199678 has been marked as a duplicate of this bug. ***

Comment 5 Lon Hohberger 2006-10-31 15:19:20 UTC
Moving back to assigned.  We need to be sure we notify callers about fatal
signal errors when a child process dies.
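
As an illustration of the idea only (the actual rgmanager change is not shown
in this report), a shell wrapper can tell a child killed by a signal apart
from an ordinary non-zero exit. The function name run_and_report is
hypothetical, and the sketch relies on the shell convention that an exit
status above 128 means "terminated by signal (status - 128)"; a child that
deliberately exits with such a value would be misreported.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, report how it ended, and pass the
# exit status back to the caller unchanged.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        # Shell convention: 128 + N means the child died from signal N.
        echo "child '$1' killed by signal $((status - 128))" >&2
    elif [ "$status" -ne 0 ]; then
        echo "child '$1' exited with status $status" >&2
    fi
    return "$status"
}
```

For example, `run_and_report sh -c 'kill -TERM $$'` should report a death by
signal 15 and return 143 to the caller.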

Comment 6 Lon Hohberger 2006-11-21 15:06:15 UTC
This is just about done, except for clusterfs.sh needing to be updated.

Comment 9 Red Hat Bugzilla 2007-05-10 21:17:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0149.html


