Bug 418541 - RFE: Make fence_ack_manual in RHEL5 branch talk to manual override socket
Summary: RFE: Make fence_ack_manual in RHEL5 branch talk to manual override socket
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.0
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Lon Hohberger
QA Contact: GFS Bugs
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2007-12-10 18:02 UTC by Lon Hohberger
Modified: 2011-06-13 21:58 UTC
Last Closed: 2008-05-21 15:58:34 UTC


Attachments
Makes fence_ack_manual work as override (needs -e flag) (4.87 KB, text/plain)
2007-12-14 16:57 UTC, Lon Hohberger


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2008:0347 | normal | SHIPPED_LIVE | cman bug fix and enhancement update | 2008-05-20 12:39:41 UTC

Description Lon Hohberger 2007-12-10 18:02:31 UTC
Description of problem:

In RHEL5, we introduced a socket whereby administrators could issue commands to
unstick cluster nodes where fencing has failed.  It looks like this:

   echo "nodename.mydomain.com" > /var/run/cluster/fenced_override

The problem is that this is highly timing-dependent: an administrator must
issue the command within the 5-second fence retry window.

In the head branch of CVS, fence_ack_manual is a script which waits for
/var/run/cluster/fenced_override to exist before issuing the command.

fence_ack_manual in the RHEL5 branch should also be able to do this.  This will
enable administrators to fix broken clusters with less difficulty.
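
As a rough illustration of the wait-then-write behavior described above, here
is a minimal shell sketch. It is not the head-branch script or the attached
patch. The function name wait_and_override, the 1-second poll interval, and the
retry limit are assumptions made for this example; only the override path and
the "echo" usage come from this report.

```shell
#!/bin/sh
# Hypothetical helper (not the real fence_ack_manual): wait until the
# override path exists, then write the node name to it.
wait_and_override() {
    override_path=$1    # e.g. /var/run/cluster/fenced_override
    node=$2             # e.g. nodename.mydomain.com
    max_tries=${3:-60}  # assumed limit on the number of 1-second polls

    tries=0
    # -e is true for any file type, so this works for a FIFO as well.
    while [ ! -e "$override_path" ]; do
        tries=$((tries + 1))
        if [ "$tries" -gt "$max_tries" ]; then
            echo "timed out waiting for $override_path" >&2
            return 1
        fi
        sleep 1
    done

    # Same action as the manual "echo" method, minus the guesswork on timing.
    echo "$node" > "$override_path"
}
```

It could be invoked as
wait_and_override /var/run/cluster/fenced_override nodename.mydomain.com
which removes the need for the administrator to hit the retry window by hand.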

Comment 1 RHEL Product and Program Management 2007-12-10 18:54:24 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 2 Lon Hohberger 2007-12-14 16:57:14 UTC
Created attachment 289271
Makes fence_ack_manual work as override (needs -e flag)

Comment 3 Lon Hohberger 2007-12-14 16:57:41 UTC
Worked for me.

Dec 14 11:53:53 molly fenced[434]: frederick not a cluster member after 6 sec post_join_delay
Dec 14 11:53:53 molly fenced[434]: fencing node "frederick"
Dec 14 11:53:53 molly fenced[434]: fence "frederick" failed
Dec 14 11:53:54 molly fenced[434]: fence "frederick" overridden by administrator intervention


Comment 4 Lon Hohberger 2007-12-17 20:05:06 UTC
Patch in CVS

Checking in agents/manual/Makefile;
/cvs/cluster/cluster/fence/agents/manual/Makefile,v  <--  Makefile
new revision: 1.7.2.1; previous revision: 1.7
done
Checking in agents/manual/ack.c;
/cvs/cluster/cluster/fence/agents/manual/Attic/ack.c,v  <--  ack.c
new revision: 1.3.16.1; previous revision: 1.3
done


Comment 6 Lon Hohberger 2008-03-27 19:43:26 UTC
I have tested this using the following command with cman-2.0.81:

  fence_ack_manual -e -n frederick

It works as expected: fence_ack_manual waits for fencing to fail (I forced the
failure by doing a 'reboot -fn' with the fencing device disabled) and then
issues the override for us.  It's far easier to use than timing the "echo" method.


Comment 8 errata-xmlrpc 2008-05-21 15:58:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0347.html


