Bug 418541

Summary: RFE: Make fence_ack_manual in RHEL5 branch talk to manual override socket
Product: Red Hat Enterprise Linux 5
Component: cman
Version: 5.0
Status: CLOSED ERRATA
Severity: low
Priority: low
Reporter: Lon Hohberger <lhh>
Assignee: Lon Hohberger <lhh>
QA Contact: GFS Bugs <gfs-bugs>
Docs Contact:
CC: cluster-maint, teigland
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: RHBA-2008-0347
Doc Type: Bug Fix
Last Closed: 2008-05-21 15:58:34 UTC
Attachments:
  Makes fence_ack_manual work as override (needs -e flag) (flags: none)

Description Lon Hohberger 2007-12-10 18:02:31 UTC
Description of problem:

In RHEL5, we introduced a socket whereby administrators could issue commands to
unstick cluster nodes where fencing has failed.  It looks like this:

   echo "nodename.mydomain.com" > /var/run/cluster/fenced_override

The problem is that this is highly timing-dependent: an administrator must
issue the command within the 5-second fence retry window.

In the head branch of CVS, fence_ack_manual is a script which waits for
/var/run/cluster/fenced_override to exist before issuing the command.

fence_ack_manual in the RHEL5 branch should also be able to do this.  This will
enable administrators to fix broken clusters with less difficulty.
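
The wait-then-write behaviour described above can be sketched roughly as
follows.  This is only an illustration, not the actual fence_ack_manual script
from CVS: the function name, the one-second polling interval and the timeout
default are all assumptions made for the example.  Only the override path and
the "echo the node name into it" step come from this report.

```shell
#!/bin/sh
# Hypothetical helper; NOT the real fence_ack_manual implementation.
# Polls until the override path exists, then writes the node name to it.
# Usage: wait_and_override NODE [PATH] [TIMEOUT_SECONDS]
wait_and_override() {
    node=$1
    path=${2:-/var/run/cluster/fenced_override}
    timeout=${3:-300}   # assumed default; not taken from the bug report
    waited=0

    if [ -z "$node" ]; then
        echo "usage: wait_and_override NODE [PATH] [TIMEOUT_SECONDS]" >&2
        return 2
    fi

    # Assumption: the path only appears while fenced is waiting for an
    # override, so poll for it instead of timing the echo by hand.
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Same step as the manual method: write the node name to the path.
    echo "$node" > "$path"
}
```

With a helper like this, "wait_and_override nodename.mydomain.com" could be
started as soon as fencing is known to be stuck, rather than trying to hit the
retry window manually.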

Comment 1 RHEL Program Management 2007-12-10 18:54:24 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 2 Lon Hohberger 2007-12-14 16:57:14 UTC
Created attachment 289271
Makes fence_ack_manual work as override (needs -e flag)

Comment 3 Lon Hohberger 2007-12-14 16:57:41 UTC
Worked for me.

Dec 14 11:53:53 molly fenced[434]: frederick not a cluster member after 6 sec post_join_delay
Dec 14 11:53:53 molly fenced[434]: fencing node "frederick"
Dec 14 11:53:53 molly fenced[434]: fence "frederick" failed
Dec 14 11:53:54 molly fenced[434]: fence "frederick" overridden by administrator intervention
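
One way to check for the "overridden by administrator" message shown above
without reading the log by eye is a small grep wrapper.  This is a sketch only:
the function name is made up for the example, and the default log location
(/var/log/messages) is an assumption about where fenced's syslog output ends
up on a given system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if LOGFILE records an administrator
# override for NODE, matching the fenced message quoted in this comment.
# Usage: override_confirmed NODE [LOGFILE]
override_confirmed() {
    node=$1
    logfile=${2:-/var/log/messages}   # assumed syslog destination
    # -F: match the text literally; -q: report only via the exit status.
    grep -F -q "fence \"$node\" overridden by administrator" "$logfile"
}
```

For example, "override_confirmed frederick && echo ok" would print "ok" only
if the override line for frederick is present in the log.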


Comment 4 Lon Hohberger 2007-12-17 20:05:06 UTC
Patch in CVS

Checking in agents/manual/Makefile;
/cvs/cluster/cluster/fence/agents/manual/Makefile,v  <--  Makefile
new revision: 1.7.2.1; previous revision: 1.7
done
Checking in agents/manual/ack.c;
/cvs/cluster/cluster/fence/agents/manual/Attic/ack.c,v  <--  ack.c
new revision: 1.3.16.1; previous revision: 1.3
done


Comment 6 Lon Hohberger 2008-03-27 19:43:26 UTC
I have tested this using the following command with cman-2.0.81:

  fence_ack_manual -e -n frederick

It works as expected: fence_ack_manual waits for fencing to fail (I forced a
failure by doing a 'reboot -fn' with the fencing device disabled) and then
issues the override for us.  It's far easier to use than timing the "echo"
method.


Comment 8 errata-xmlrpc 2008-05-21 15:58:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0347.html